Protecting our original content from site scrapers is a constant challenge. It’s even harder now that full RSS feeds make blogs easier to scrape.
There’s the usual prevention method most bloggers use: switching to partial feeds instead of full feeds. I never really worried about it even though I publish full feeds. Lately, however, I’ve noticed that the scraper sites (splogs) sometimes even rank higher than mine, which has caused some alarm.
Search engines promise publishers that their systems can intelligently tell the original from the dupes, but I don’t think their success rate is any good either. So I thought getting a backlink from the splogs would solve that dupe issue.
Lately, I’ve been using the Feed Footer plugin, which adds custom footers (copyright, notices, advertisements) to the bottom of blog posts in the RSS feed. I’m sure most of you have seen them already.
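To illustrate the idea of a feed footer, here is a minimal sketch in Python. It is not the Feed Footer plugin's code; the function name `add_feed_footer` and the footer wording are assumptions made up for this example. It simply appends a notice with a backlink to the post body before it goes into the feed, so that any copy of the feed item carries a link to the original.

```python
from html import escape


def add_feed_footer(post_html, post_url, site_name):
    """Return the post body with a footer that links back to the original post.

    Hypothetical helper for illustration; not part of any real plugin.
    """
    footer = (
        "<hr /><p>Originally published at "
        # Escape the URL and site name so they are safe inside HTML.
        f'<a href="{escape(post_url, quote=True)}">{escape(site_name)}</a>. '
        "If you are reading this on another site, it may have been copied "
        "without permission.</p>"
    )
    return post_html + footer


if __name__ == "__main__":
    body = "<p>My original article text.</p>"
    print(add_feed_footer(body, "https://example.com/my-post/", "Example Blog"))
```

The footer only reaches readers of the feed (and anyone republishing it); the post on the site itself is left unchanged.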
However, if that’s not enough, you can try the AntiLeech plugin:
AntiLeech produces a fake set of content especially for them that includes links back to your site and sends it only to them. When they steal this content, it appears online just like normal, except now you’ve turned the tables on them and have provided them with useless content.
AntiLeech can detect a splogger bot using its User-Agent string (an identifier that some bots send when they are collecting data), or by IP address. You can enter a User-Agent or an IP address into the Options panel of your WordPress blog. When a visitor with a qualifying (any checked option on the options page) User-Agent or IP address visits your site, they will see only the generated content. They will see it in your page layout and in your feeds. Anywhere you’re normally outputting content, that’s where the fake content will appear to them.
Regular users whose browsers do not match these strings will see your normal content. RSS aggregators should be able to display your content normally, too.
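The detection described above can be sketched roughly as follows. This is a Python illustration of the general technique (matching a User-Agent substring or an IP address against a blocklist), not AntiLeech's actual code. The blocklist values, `is_splogger`, and `content_for` are all hypothetical names chosen for this example, and the IP range is one reserved for documentation.

```python
import ipaddress

# Hypothetical blocklists; with the plugin, these would be entered
# in the Options panel rather than hard-coded.
BLOCKED_AGENT_SUBSTRINGS = ["ExampleScraperBot"]
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def is_splogger(user_agent, remote_ip):
    """Return True if the User-Agent or the IP address matches a blocklist."""
    ua = (user_agent or "").lower()
    if any(s.lower() in ua for s in BLOCKED_AGENT_SUBSTRINGS):
        return True
    try:
        addr = ipaddress.ip_address(remote_ip)
    except ValueError:
        # An unparseable address cannot match an IP rule.
        return False
    return any(addr in net for net in BLOCKED_NETWORKS)


def content_for(user_agent, remote_ip, real_content, site_url):
    """Serve decoy content with a backlink to flagged visitors only."""
    if is_splogger(user_agent, remote_ip):
        return f'<p>Read the original at <a href="{site_url}">{site_url}</a>.</p>'
    return real_content


if __name__ == "__main__":
    print(content_for("ExampleScraperBot/1.0", "198.51.100.7",
                      "<p>Real post</p>", "https://example.com/"))
    print(content_for("Mozilla/5.0", "198.51.100.7",
                      "<p>Real post</p>", "https://example.com/"))
```

Keep in mind that a User-Agent string is easy to fake, so a check like this only catches bots that identify themselves or that come from known addresses.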
You can download the plugin here. AntiLeech does not really prevent the splogger bots or the sploggers themselves from accessing your site; they can still manually copy and paste. Still, you have one less thing to worry about.
Thanks for the tip!
There are so many people scraping blogs these days. Argh! And my rockersworld.com blog is, of course, one of those blogs being scraped.
They’re doing it for the money. Or for testing their scripts. Other than those two, I don’t see any plausible reason for this kind of thing. ;) At least for me.
Nice tip. I’ll give the plugin a try; my blog has been a victim of splogs since last year. This would help turn the tide against them.
@ChrisMo: It’s a good solution. However, sites and/or blogs whose content is legally re-published or syndicated by other sites, or by members of online newspapers, rely heavily on feeds with full post content.
They have no option but to provide it. Secondly, there are feed subscribers who prefer to read the whole content in the feed rather than visit the site just to read the rest of the post.
It is a war that the feed/syndication community will soon have to face in full force. However, based on my experience and other people’s, RSS-based feeds are mostly the victims, while Atom-based feeds have fewer victims. To begin with, Atom is a web standard; RSS, with its endless flavors, is not.
My only solution would be an RSS feed with SE-friendly URLs, I mean URLs only… That way there isn’t any real content to scrape, just a link to the article on the site. You’d need to write better post titles, though…
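A links-only feed like the one suggested above could look something like this minimal Python sketch. It builds an RSS 2.0 document in which each item carries only a title and a link, with no post body. The function name `build_links_only_feed` and the sample post data are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET


def build_links_only_feed(site_title, site_url, posts):
    """Build an RSS 2.0 feed whose items contain only a title and a link."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = site_title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = "Titles and links only"
    for post in posts:
        item = ET.SubElement(channel, "item")
        # No <description> element: there is no article text to scrape.
        ET.SubElement(item, "title").text = post["title"]
        ET.SubElement(item, "link").text = post["url"]
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical sample posts.
    sample_posts = [
        {"title": "Fighting splogs", "url": "https://example.com/fighting-splogs/"},
        {"title": "Full vs partial feeds", "url": "https://example.com/full-vs-partial/"},
    ]
    print(build_links_only_feed("Example Blog", "https://example.com/", sample_posts))
```

Since the title is all a subscriber sees, this is where the better post titles come in.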
I’ve actually seen some of my articles on another website.
Its Alexa rank is far higher than mine, but I don’t understand why they have to do that.
nice post master yuga.
For BrianB.: what’s the connection of your link? Tsk tsk, self-promoting. It seems you’re doing that more and more often.
How fragile is the intarwebs…
http://www.ohgizmo.com/2008/03/03/the-internet-its-more-tangled-than-you-think/
Yep. If you are researching, you’ll end up on sites that don’t have the content you are looking for because they just stole it via feeds. I’ve encountered around 12 already that specifically target Pinoy-owned blogs.
Fun to watch, but not fun anymore if you are one of the victims. :p
nice feed abe :)
@calvin, those are sploggers feeding off the rss.
It looks like other sites are taking content from pinoytravelblog. travelhostel, I think, or something. It’s exactly the same, even the categories. Hahaha, it’s like a duplicate of the site but with a different theme. Did you do that on purpose, Abe?
I will give this plugin a try. Thanks Yuga.