Authorship link attributes: fighting content scrapers
Posted: Jan 11, 2012 by Sebastien Daniel
Updated: Oct 26, 2013 by Sebastien Daniel
Google’s eternal battle against web spam has intensified lately. Consider the huge impact of the Panda Update on search results, larger than that of any other Google algorithm update, along with Google’s recent authorship markup: the search engine is clearly putting more pressure than ever on web spam.
I’m not saying these two recent actions will radically change the SERP scene; however, they have interesting potential to surface more quality content and fewer scraper websites in results.
In this article I'll cover:
- A quick review of the Panda Update
- Google's curative action
- Linking everything together
- Author profiling
A quick review of the Panda Update
This section summarizes much of my previous article on Google’s Panda Update. The most important point to retain from this algorithm change is that duplicate content (in most of its shapes and sizes) now has a very detrimental impact on a website’s rankings. In addition, thin content anywhere on a website (redundant, low-quality, semi-duplicate or ad-filled pages) also has a detrimental impact on the entire domain.
However, along with all the good the update brought came some unexpected side effects. The most reported one was scraper sites ranking above the original publishers for the same articles. This angered many people in the search industry and showed that the Panda Update improved some aspects of search while making others worse.
This created a new problem: ranking content scrapers and original publishers correctly relative to each other.
Google’s curative action
Google recently announced authorship markup for web search. This may or may not be a response to the Panda Update's side effects; either way, I see interesting potential for better content ranking and, of course, better valuation of authors’ content and reputation. The markup relies on two link attributes:
rel="author": identifies the author of an article. When you add a "written by" link to an article, Google now expects you to tag that link with rel="author" and point it to the author's page on the same website.
rel="me": links to other relevant pages about the author, outside the current website. This is where things get more interesting, since authors can link to their social media accounts and other personal websites.
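To make the two attributes concrete, here is a minimal Python sketch that reads a page the way a crawler might and collects its rel="author" and rel="me" links, using only the standard library's html.parser. The sample markup is hypothetical: the author page path and the social profile URLs are placeholders, and the article and author page are shown in one snippet for brevity. I also assume rel can hold several space-separated tokens (e.g. rel="me nofollow") and check <link> tags as well as <a> tags. This is an illustration, not how Google actually parses pages.

```python
from html.parser import HTMLParser


class AuthorshipLinkParser(HTMLParser):
    """Collect href values of <a> and <link> tags according to their rel tokens."""

    def __init__(self):
        super().__init__()
        self.author_links = []  # targets of rel="author"
        self.me_links = []      # targets of rel="me"

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel is a space-separated list of tokens, e.g. rel="me nofollow"
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "author" in rel_tokens:
            self.author_links.append(href)
        if "me" in rel_tokens:
            self.me_links.append(href)


def extract_authorship_links(html):
    """Return (rel="author" targets, rel="me" targets) found in html."""
    parser = AuthorshipLinkParser()
    parser.feed(html)
    parser.close()
    return parser.author_links, parser.me_links


# Hypothetical markup; all URLs below are placeholders.
SAMPLE_HTML = """
<article>
  <p>Written by <a rel="author" href="http://www.sebweb.ca/author">Sebastien Daniel</a></p>
</article>
<ul class="elsewhere">
  <li><a rel="me" href="https://twitter.com/example">Twitter</a></li>
  <li><a rel="me nofollow" href="https://www.linkedin.com/in/example">LinkedIn</a></li>
</ul>
"""

if __name__ == "__main__":
    authors, profiles = extract_authorship_links(SAMPLE_HTML)
    print("rel=author ->", authors)
    print("rel=me     ->", profiles)
```

Splitting rel into tokens matters: a plain string comparison against "me" would miss the LinkedIn link above.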
Linking everything together
Let’s take the current article as an illustrative example:
- I publish the article on www.sebweb.ca and place a rel="author" link in the article, pointing to my "author page" on www.sebweb.ca
- My author page contains multiple rel="me" links pointing to external pages about me: Facebook page, LinkedIn pages, Twitter account, company profile, etc.
- These external pages also link to each other (standard social media strategy), and they link back to www.sebweb.ca since it is my website.
- Each of my social media accounts also mentions my recent article, linking back to this page as well.
The picture quickly becomes clear: everything relevant about the author AND the article ends up tightly linked together.
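One way to picture why this tight linking is useful is a toy link graph with a simple reciprocity check. This is a hedged sketch under my own assumptions, not Google's actual method: each page maps to the set of URLs it links to with rel="me", and a claim counts as "confirmed" only when the target links back. I simplify by assuming the accounts link back to the author page itself, and every URL (including the scraper's fake author page) is a hypothetical placeholder.

```python
# Toy model, not Google's actual method: each page maps to the set of URLs it
# links to with rel="me". All URLs are hypothetical placeholders.
REL_ME_LINKS = {
    "http://www.sebweb.ca/author": {
        "https://twitter.com/example",
        "https://www.linkedin.com/in/example",
    },
    "https://twitter.com/example": {"http://www.sebweb.ca/author"},
    "https://www.linkedin.com/in/example": {"http://www.sebweb.ca/author"},
    # A scraper's fake author page claims my Twitter account...
    "http://scraper.example/author": {"https://twitter.com/example"},
    # ...but nothing on my side links back to it.
}


def is_reciprocal(graph, page, target):
    """True when page -> target and target -> page are both rel="me" links."""
    return target in graph.get(page, set()) and page in graph.get(target, set())


def confirmed_identities(graph, page):
    """rel="me" targets of page that link back to it, sorted for stable output."""
    return sorted(t for t in graph.get(page, set()) if is_reciprocal(graph, page, t))


def unconfirmed_claims(graph):
    """Every (page, target) rel="me" link that is not reciprocated, sorted."""
    return sorted(
        (page, target)
        for page, targets in graph.items()
        for target in targets
        if not is_reciprocal(graph, page, target)
    )


if __name__ == "__main__":
    print(confirmed_identities(REL_ME_LINKS, "http://www.sebweb.ca/author"))
    print(confirmed_identities(REL_ME_LINKS, "http://scraper.example/author"))
    print(unconfirmed_claims(REL_ME_LINKS))
```

In this model my author page is confirmed by both accounts, while the scraper's one-way claim confirms nothing; to look legitimate it would need a network of its own accounts that link back.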
Does this additional feature stop scrapers? No. However, it makes their task slightly more complex: to justify ownership of an article, they will now have to create and maintain various social media accounts and author web pages. Real authors do this daily, with a natural variety that is hard to imitate. Programmers will eventually find ways to circumvent this new anti-scraper feature, but with so many signals in play it will be much more difficult to game than before.
Author profiling
As I wrote this article, another thought crossed my mind: by using this authorship tagging feature, we make it much easier for crawlers to "profile" an author. We are literally telling these user agents who we are and what we do.
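To show how little effort that profiling could take, here is a hedged Python sketch of a breadth-first walk that follows only rel="me" links from a single starting page. The dictionary is a hypothetical stand-in for fetched pages (a real crawler would download and parse each URL, for example with Python's html.parser), all URLs are placeholders, and max_pages is an arbitrary limit I chose to keep the walk bounded.

```python
from collections import deque

# Hypothetical stand-in for fetched pages: URL -> rel="me" targets on that page.
REL_ME_LINKS = {
    "http://www.sebweb.ca/author": [
        "https://twitter.com/example",
        "https://www.linkedin.com/in/example",
    ],
    "https://twitter.com/example": [
        "http://www.sebweb.ca/author",
        "http://personal.example.com",
    ],
    "https://www.linkedin.com/in/example": ["http://www.sebweb.ca/author"],
    "http://personal.example.com": [],
}


def build_profile(start_url, graph, max_pages=50):
    """Breadth-first walk over rel="me" links; returns pages in discovery order."""
    seen = {start_url}
    order = []
    queue = deque([start_url])
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for target in graph.get(url, []):
            if target not in seen:  # skip pages already queued or visited
                seen.add(target)
                queue.append(target)
    return order


if __name__ == "__main__":
    for page in build_profile("http://www.sebweb.ca/author", REL_ME_LINKS):
        print(page)
```

Starting from one author page, the walk reaches a personal site that the author page never links to directly, found only through the Twitter account. That is exactly the kind of aggregation the markup makes easy.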