How to Prevent Your WordPress Site Content from Being Scraped

Posted in WordPress on 11th Jan 2013

If you publish original content on your WordPress site or blog, others may “scrape” it. This simply means that unscrupulous users and online marketers copy other authors’ original content so that they can profit from it. Content scraping is a big problem today, partly because it is so easy to do and partly because many people are too lazy to write their own content. They rely on theft to fill their sites, and they usually profit from it in some way.

What Exactly is Content Scraping?

Content scraping is the practice of lifting content from one blog or site and illegally posting it on another. The theft is usually carried out with scripts, and several WordPress plug-ins will even perform the scraping for an unscrupulous web developer or marketer.

The problem is that when original content is scraped from an author’s blog or site and illegally published elsewhere, the copy may rank higher in the search engine results pages (SERPs) than the original. This is frustrating to those who create original content. To add insult to injury, the original content creator may witness the stolen content generating revenue for the developer or marketer who stole it.

The main reason unscrupulous web developers and marketers will steal your content is to generate profit from the post or article.

How Do You Know if Your Content is Being Scraped?

One way to see if your content has been scraped is to do a Google search using the title of your post. However, if you have written about a popular topic, you may not be able to find either your article or the scraped copies this way. There are tools that will help you find sites that scrape your content, though.

Tools that Monitor Trackbacks to Your Site
First of all, you need to add a few internal links to your post. When a scraper republishes the post, those links still point back to your site and generate trackbacks. You can then monitor the trackbacks to your content with tools such as Akismet. If someone is scraping your content, you will see trackback links from their site in the Akismet Spam folder.

Another tool you can use to find scraped content is Google Alerts. This free tool from Google will email an alert to you if the content you specify, such as your article title, ends up in Google search results.

A manual tool you can use is Copyscape. Copyscape offers a free version that allows you to enter the URL of your site and find other sites with duplicate content. Copyscape also offers a premium service for about $5.00 per 100 searches. The premium service allows you to paste in the full text of an article or blog post and find other content that is too similar to yours to be attributable to chance.

In other words, someone may not steal ALL of your article, but they may appropriate sections, paragraphs or bulleted lists. The premium version of Copyscape will help you find those sites quickly and easily. A “con” of using Copyscape is that, unless you use its API, checking your content is a manual process. A “pro” of using Copyscape is that it is very accurate and will often locate content that merely has similar sentences.
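
To illustrate how partial copying can be detected, here is a minimal sketch that compares two texts by their overlapping five-word sequences ("shingles"). This is not Copyscape's algorithm, which is not described here; the function names `shingles` and `containment` and the sample texts are made up for illustration.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one shingle: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def containment(original, suspect, size=5):
    """Fraction of the original's shingles that also appear in the suspect text."""
    source = shingles(original, size)
    if not source:
        return 0.0
    return len(source & shingles(suspect, size)) / len(source)


# Hypothetical sample texts: the second reuses most of the first sentence.
mine = "Content scraping is the practice of lifting content from one site and posting it on another."
theirs = "Welcome! Content scraping is the practice of lifting content from one site and reposting."
print(f"Share of my text found in theirs: {containment(mine, theirs):.2f}")
```

A score near 1.0 suggests a wholesale copy, while a score well above zero suggests that sections or paragraphs were lifted. Where to draw the line is a judgment call.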

Yet another set of tools for locating stolen content is included in the Mahalo Plagiarism Detection toolbar. The toolbar installs into popular browsers, so the tools are available on every website you view. Though not quite as in-depth as Copyscape, these tools function in much the same way. You highlight a block of text and then click the button in the toolbar. A Google search is performed to find exact and similar matches for the selected text. Again, this is a manual tool, but tools such as these help tremendously in locating stolen content.
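
The same kind of exact-match search can be done by hand. The sketch below builds a Google search URL that wraps a snippet in quotation marks so that only exact matches are returned. The function name `exact_phrase_search_url` is hypothetical, and this is not the toolbar's own code.

```python
from urllib.parse import urlencode


def exact_phrase_search_url(snippet):
    """Build a Google search URL for an exact-phrase (quoted) query."""
    phrase = " ".join(snippet.split())  # collapse line breaks and repeated spaces
    return "https://www.google.com/search?" + urlencode({"q": f'"{phrase}"'})


# Paste a distinctive sentence from your post, then open the printed URL in a browser.
print(exact_phrase_search_url("a distinctive sentence\nfrom your own post"))
```

Short, distinctive sentences tend to work better than long passages, since a scraper may have changed a word here and there.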

What Do I Do About A Content Thief?

Save a Copy of the Stolen Content

There are a few things you can do when you encounter content theft. First, save a copy of the offending page as evidence that your content was stolen. You want to do this because when you locate content theft and confront the site’s owner, the content will often disappear. Web archiving services such as WebCite and the Internet Archive will preserve a cached copy of the page in its current state.

Contact the Site Owner

If the site has a Contact Us form, use it to tell the site owner that you have identified your content on their site. Many times, the site owner will remove the content. If there is no way to contact the site owner, use a WHOIS lookup through Domain Tools or who.is to find the administrative contact for the site’s domain name. Contact this person and tell them you have located stolen content on their site. If you get no response or the site owner refuses to remove the content, report the site.

Reporting the Site for Stolen Content

If contacting the site owner is unproductive, contact the host that provides their service. In most cases, this is the most effective way to get the stolen content removed. The DMCA contact list at Plagiarism Today, which covers over 100 popular web hosting companies, may provide contact information for the host the site is using. Many hosting companies provide DMCA forms for complaints about stolen content on their hosted sites.

How To Protect Your WordPress Content Against Scraping

A few WordPress plug-ins are emerging that enable you to protect your original content that is posted on a WordPress site.

Digital Fingerprint Plugin by CopyFeed
This WordPress plug-in inserts a unique character string into your original content. The tool then searches the web for that string. Any other site that turns up in the results is republishing your content. The tool has been well received within the WordPress community.
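
The fingerprinting idea can be sketched in a few lines. This is only an illustration of the technique, not the plug-in's actual code. The function names, the secret, and the post ID are all hypothetical.

```python
import hashlib
import hmac


def make_fingerprint(secret, post_id):
    """Derive a short, hard-to-guess marker for one post from a private secret."""
    digest = hmac.new(secret.encode(), str(post_id).encode(), hashlib.sha256).hexdigest()
    return "fp" + digest[:16]


def add_fingerprint(content, fingerprint):
    """Append the marker to the post body before it is published."""
    return f"{content}\n{fingerprint}"


def is_copy(page_text, fingerprint):
    """True if the text of some other page contains our marker."""
    return fingerprint in page_text


# Hypothetical secret and post ID, for illustration only.
marker = make_fingerprint("keep-this-private", 42)
published = add_fingerprint("My original article text.", marker)
print(marker, is_copy(published, marker))
```

Because the marker is derived from a private secret, it is unlikely to appear on another site by coincidence, so a web search for it should return only copies of the post.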

Anti-Leach
This WordPress plug-in identifies scrapers and redirects them to a dummy content site which you specify. Some users report that they have problems integrating this plug-in with Feedburner.

Angsuman’s Feed Copyrighter
This WordPress plug-in adds a copyright to your RSS and Atom feeds. This tool is a great help if you want to feed your full content and not just a summary.

AntiScraper
AntiScraper is another great WordPress plug-in to deter content stealing. This plug-in operates by creating a blacklist of content scraper domains and disallowing these domains access to your content. First, you must create an account with AntiScraper. The account is valid for 30 days. You will then install and activate the plug-in.
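
How AntiScraper applies its blacklist internally is not covered here, but the general idea of checking a requesting site against a list of known scraper domains can be sketched as follows. The domains and the function name `is_blacklisted` are made up for illustration.

```python
from urllib.parse import urlparse

# Hypothetical scraper domains; a real list would come from the blacklist service.
BLACKLIST = {"scraper.example", "copycat.example"}


def is_blacklisted(referrer_url, blacklist=BLACKLIST):
    """True if the URL's host is a blacklisted domain or one of its subdomains.

    Expects a full URL including the scheme (http:// or https://).
    """
    host = (urlparse(referrer_url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in blacklist)


print(is_blacklisted("http://feeds.scraper.example/latest"))
print(is_blacklisted("https://honest-reader.example/"))
```

Matching on the full domain or a dot-separated suffix avoids blocking an unrelated site whose name merely ends with the same letters.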

Additional Ways to Deter Content Scraping

Disable Trackbacks in WordPress
You can disable trackbacks on existing and new WordPress sites. The WPBeginner.com article “What, Why and How Tos of Trackbacks and Pingbacks in WordPress” shows you how to accomplish this.

Use Summary RSS Feeds
Because many content thieves rely on full-content RSS feeds to steal content, providing only summaries through your feeds will hamper their efforts.
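
In WordPress itself this is a feed setting rather than something you code. Still, a rough sketch of what a summary feed exposes makes the benefit clear. The function name `summarize` and the 55-word default are assumptions for illustration.

```python
import html
import re


def summarize(post_html, max_words=55):
    """Strip HTML tags and keep only the first max_words words of a post."""
    text = html.unescape(re.sub(r"<[^>]+>", " ", post_html))
    words = text.split()
    if len(words) <= max_words:
        return " ".join(words)
    return " ".join(words[:max_words]) + " [...]"


post = "<p>Content scraping is a big problem today.</p><p>Here is the full story.</p>"
print(summarize(post, max_words=5))
```

A scraper that pulls the feed gets only the opening words of each post, so readers who want the rest still have to visit your site.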

Using Content Scrapers to Your Advantage

Using internal links to your content will not only boost your page rank, it will actually enable you to use the content scraper’s efforts to generate traffic back to your own site.

Use an AutoLink Tool to Internally Link Keywords

Tools such as Ninja Affiliate enable you to perform an automatic search and replace across your WordPress content for keywords that you specify in the plug-in. Say you have referenced a certain company, such as Microsoft, in your content. You can enter the www.microsoft.com URL into the Ninja Affiliate plug-in, search all of your content for the keyword "Microsoft", and have the keyword automatically linked to that URL each time it is found in your content.

Alternatively, you can use the Ninja Affiliate plug-in to associate the keyword with an internal link to a page on your site where the user will find information about the company. In other words, you could use Ninja Affiliate to associate “Microsoft” with a Microsoft page on your site where the user will ultimately find the URL to the company. You benefit from having the affiliate link as well as from the internal links within your own site. This throws a wrench into the content scraper’s strategy, because every internal link you provide automatically sends the user back to your original content.
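
The keyword-to-link replacement described above can be sketched like this. It is not Ninja Affiliate's implementation, and the function name `autolink` and the example URL are hypothetical. The sketch assumes plain text input, so it makes no attempt to skip keywords that already sit inside a link or an HTML tag.

```python
import html
import re


def autolink(text, keyword, url):
    """Wrap each whole-word occurrence of keyword in plain text with a link to url."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b")
    href = html.escape(url, quote=True)

    def to_anchor(match):
        return f'<a href="{href}">{match.group(0)}</a>'

    return pattern.sub(to_anchor, text)


# Hypothetical internal page on your own site.
print(autolink("Microsoft released an update.", "Microsoft", "https://example.com/microsoft/"))
```

Pointing the keyword at a page on your own site, rather than straight at the company, is what makes every scraped copy link back to you.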

Add an RSS Footer

Plug-ins such as RSS Footer allow you to add custom content and banner ads to the footer of your RSS feeds. Add a banner ad that sends users back to your original content. Because feeds are often how content is stolen, RSS Footer and other footer customizers will add your ad to all of your content, which will, in turn, appear on the scraper’s site!
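
As a rough sketch of what such a footer does, the function below appends an attribution link to the body of each feed item. The function name `add_feed_footer`, the wording of the footer, and the sample values are assumptions for illustration, not the plug-in's actual output.

```python
import html


def add_feed_footer(item_html, title, permalink, site_name):
    """Append an attribution paragraph that links back to the original post."""
    footer = (
        f'<p>The post <a href="{html.escape(permalink, quote=True)}">'
        f"{html.escape(title)}</a> appeared first on {html.escape(site_name)}.</p>"
    )
    return item_html + "\n" + footer


# Hypothetical post details.
print(add_feed_footer("<p>Body of the post.</p>", "My Post",
                      "https://example.com/my-post/", "Example Blog"))
```

A scraper that republishes the feed item verbatim also republishes the link, which credits you as the source and sends readers back to the original.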

Conclusion

Content scraping is stealing, and if you are an author of original content, you may need to take steps to protect it from unscrupulous and lazy marketers and web developers. Hopefully, this post sheds some light on how you can protect yourself against, and even benefit from, content scrapers.

Terrance is a versatile web developer and the technical editor at OXP. He enjoys creating functional websites and is particularly engrossed in all the tiny details that combine to create great user experiences. He always believes that every web user deserves the best!