Google Indexing '?tmpl=component&type=raw' Webpages

When running Search Engine Optimization tests, you may discover that your web pages are duplicated in the Google index. One record is correct and the other includes '?tmpl=component&type=raw' in the URL. Usually panic strikes instantly, since the hype about Google's "duplicate content penalty" is still high: a search for it gives more than 1.2 million hits, despite the fact that Google has officially denied the existence of a dumb penalty for such "duplicates", claiming that it penalizes only copycat pages. So it's mainly a false alarm. But since all SEO workers live (and die) by the confidence, or lack of confidence, of their clients, it's a problem which should be solved.

But let's first see where these links come from; then we will find an easy cure for the problem itself.

The '?tmpl=component&type=raw' parameters have been around for a long time and work independently of the template, despite the fact that RocketTheme templates were recently pointed out as the culprits. What are these parameters for, and what do they do? Adding them to the end of a Joomla URL returns a copy of the web page without any additional page elements, just the main content area. For example: http://joomla-tips.net/?tmpl=component&type=raw. And this explains their reason to exist: they are a way to create versions of the page suitable for printing, e-mailing, PDF generation and certain Ajax functions. So there is nothing evil about them; on the contrary, they are very useful, and a good webmaster uses them to generate interesting effects.
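To make the relationship between the two URLs concrete, here is a minimal Python sketch that converts a normal page URL into its content-only twin and back. The function names (to_raw_view, to_canonical) are made up for this illustration; the only things taken from Joomla are the two parameters themselves.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# The two query parameters discussed in this article.
RAW_VIEW_PARAMS = {"tmpl": "component", "type": "raw"}


def to_raw_view(url):
    """Return the URL with tmpl=component&type=raw appended to its query."""
    parts = urlsplit(url)
    # Drop any existing tmpl/type values so they are not duplicated.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k not in RAW_VIEW_PARAMS]
    query.extend(RAW_VIEW_PARAMS.items())
    return urlunsplit(parts._replace(query=urlencode(query)))


def to_canonical(url):
    """Return the URL with the two raw-view parameters removed."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if (k, v) not in RAW_VIEW_PARAMS.items()]
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Feeding the example URL above through to_canonical gives back http://joomla-tips.net/, which is the version you actually want in the index.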

So the links are there whenever a page uses these special versions, for example when you have "Print" buttons enabled. Some template builders add "nofollow" to these links, or use other tactics to prevent indexing of these pseudo-duplicates; others don't bother, and this is the main difference between templates in this regard. There are a couple of tips (we have listed some on this site in the past) on how to do this yourself. You can adjust your template to be "duplicate proof".

A newer source of these links is the use of Ajax to load the full content of an article when you click a Read More link, instead of reloading the page. In this case you need to ask yourself: what is more important to you? That nifty Ajax function or your "duplicate free" records at Google?

But wait, before you do something silly (such as disabling Ajax), try some really simple things:

  • Use a SEF component. Core SEF is fine, but there are better ones around, such as sh404SEF.
  • Revisit your settings and see whether you need the extras that produce these links (Print, PDF, Email icons, Ajax-powered read-more buttons, etc.).
  • Scan your pages for these links. How? Easy: right-click on the page, choose "View Source" in the menu (the exact wording depends on your browser, OS and language), and then search the resulting page for the infamous "?tmpl=component&type=raw" string.
  • Tweak the "robots.txt" file in your site's root directory by adding a 'Disallow' entry for each page you want kept out of the Google search results. Note that each entry is relative to your website's domain. The original robots.txt convention has no wildcards, so under it each directory needs to be listed separately (Googlebot does additionally understand '*' as a wildcard, if you prefer a single pattern):
    User-agent: * 
    Disallow: /?tmpl=component&type=raw
    Disallow: /joomla-install-config/joomla-seo/?tmpl=component&type=raw
  • Use a good sitemap creator component which knows the trick (such as the one we use ;)), or create a new sitemap.xml with whatever tool you are using; make sure it does not contain the "bad" links, and upload it to your site's root folder.
  • Open your Google Webmaster account (I can't believe you don't have one ;)), submit your new sitemap, and request removal from Google's index of every page you think shouldn't be there!
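If you have more than a handful of pages, the scanning, robots.txt and sitemap steps above can be scripted. What follows is only a rough Python sketch using the standard library: the helper names (find_raw_links, robots_disallow_lines, without_raw_views) are invented for this example, and it assumes you already have the HTML source of a page as a string. Note that in the page source the ampersand is often written as "&amp;"; the parser below decodes that for you, which a plain text search does not.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, parse_qs


def is_raw_view(url):
    """True if the URL carries both tmpl=component and type=raw."""
    query = parse_qs(urlsplit(url).query)
    return query.get("tmpl") == ["component"] and query.get("type") == ["raw"]


class RawLinkFinder(HTMLParser):
    """Collect the href of every <a> tag that points at a raw view."""

    def __init__(self):
        super().__init__()
        self.raw_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")  # attribute values arrive already unescaped
        if href and is_raw_view(href):
            self.raw_links.append(href)


def find_raw_links(html):
    """Return all raw-view links found in one page's HTML source."""
    finder = RawLinkFinder()
    finder.feed(html)
    return finder.raw_links


def robots_disallow_lines(links):
    """Build one Disallow line per distinct path, in the style shown above.

    Assumption: the two parameters directly follow the path, as in the
    article's examples; links with other parameters in front need their
    own entries.
    """
    paths = sorted({urlsplit(link).path or "/" for link in links})
    return ["User-agent: *"] + [
        f"Disallow: {path}?tmpl=component&type=raw" for path in paths
    ]


def without_raw_views(urls):
    """Filter a list of sitemap URLs down to the ones worth submitting."""
    return [url for url in urls if not is_raw_view(url)]
```

Run find_raw_links on the source of each page, pass the combined result to robots_disallow_lines, and paste the returned lines into your robots.txt after checking them by eye. The same is_raw_view check is reused by without_raw_views to keep the "bad" links out of a hand-made sitemap.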

And you're done. You can sleep well: Google's duplicate content penalty no longer threatens you.