Tackling Potential Content Duplication Issues in Your WebStore

Duplicate content is one of the most common Search Engine Optimization (SEO) issues today. It can hurt your WebStore's ranking in search results, yet it remains one of the toughest problems for webmasters to solve. In this post, we show how you can use one of our recent patches to tackle it.

1. Consequences of Duplicate Content Pages
Webpages with duplicate content cause at least three major problems for search engines:

  • Since there are potentially several versions of the same page, it is difficult for search engines to identify which version should be indexed;
  • Similarly, search engines cannot work out which version of the page should be ranked in search results;
  • Search engines cannot tell whether to direct the link metrics to one page or keep them split across multiple versions.

2. Types and Examples of Duplicate Content Pages
In this post, we will consider two types of pages with duplicate content:

  • Any page that is requested with Google Adwords parameters;
  • Pages that include product listings, such as department and category pages.

In the first case, a link can look like this:

http://www.yourstore.com/dept?gclid=CKTf7smRu7sCFcEnpQodSDYAVA

As we can see, this link carries the Google Click ID (gclid) parameter, which is added by Google Adwords links. Another version of the same page can be viewed at this URL:
 http://www.yourstore.com/dept
As stated previously, in this situation the search engine will not know which version should be included in search results. We will call this type of duplication a hard duplication.
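
To make the idea of hard duplication concrete, here is a minimal Python sketch of how the two URLs above collapse to a single canonical address once the tracking parameter is ignored. The function name strip_tracking and the TRACKING_PARAMS set are hypothetical names chosen for illustration; this is not how the WebStore computes its canonical URLs, only a way to reason about which URLs are duplicates of each other.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: only gclid is mentioned in the post; extend this set as needed.
TRACKING_PARAMS = {"gclid"}


def strip_tracking(url):
    """Return the URL with tracking-only query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


print(strip_tracking("http://www.yourstore.com/dept?gclid=CKTf7smRu7sCFcEnpQodSDYAVA"))
```

Any two URLs for which this function returns the same value are candidates for sharing one canonical tag.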

In the second case, let us suppose, without loss of generality, that the products of a given department are listed across several pages, each assigned a number, with URLs of the following form:

 http://www.yourstore.com/dept?page=1

Another example of hard duplication occurs with persistent-filtered search pages. These types of pages were covered in this post.

For paginated listings, we clearly need to tell the search engine that those pages are in fact related to each other so that they can be displayed appropriately in the search results. We will refer to this kind of duplication as soft duplication. The next section will show you how to address these two types of duplication.

3. Enter Our Solution
The core of our technique is to include appropriate HTML tags in the head of each page in your store that has potential duplication issues. These HTML tags are precomputed and stored in NitroScript variables. All you have to do is include those NitroScript variables in your header template so that the tags appear in the final page.

In filtered search pages, you can specify the canonical URL by including the following code snippet in your header template.

{if (pageproperty['pageid'] eq 'filtered')}
 {ifThereAre pfscanonical}
  {forEach pfscanonical}
   <link rel="canonical" href="{pfscanonical['url']}"/>
   <meta property="og:url" content="{pfscanonical['url']}"/>
  {endForEach}
 {endIfThereAre}
{endIf}

To specify the canonical URL for pages requested by Google Adwords, you should include this code in your header template:

{if (pageproperty['crawled_parameters_canonical_url'])}
  <link rel="canonical" href="{pageproperty['crawled_parameters_canonical_url']}"/>
  <meta property="og:url" content="{pageproperty['crawled_parameters_canonical_url']}"/>
{endIf}

Finally, addressing soft duplication is done by including this code:

{forEach linkdata}
  {if ((linkdata['rel'] ne 'canonical') || (pageproperty['crawled_parameters_canonical_url'] eq ''))}
     <link rel="{linkdata['rel']}" href="{linkdata['href']}"/>
  {endIf}
{endForEach}
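
The condition in the last snippet is easy to misread, so here is the same logic restated as a small Python sketch. The function name links_to_emit and the dictionary layout are assumptions made for illustration; only the rel and href keys and the crawled_parameters_canonical_url value come from the template above. The point is that a canonical entry from linkdata is skipped whenever the Adwords canonical URL is already set, so the page does not end up with two canonical tags.

```python
def links_to_emit(linkdata, crawled_canonical_url):
    """Mirror the template condition: keep every link unless it is a
    canonical link and a crawled-parameters canonical URL already exists."""
    return [
        link
        for link in linkdata
        if link["rel"] != "canonical" or crawled_canonical_url == ""
    ]


linkdata = [
    {"rel": "canonical", "href": "http://www.yourstore.com/dept"},
    {"rel": "next", "href": "http://www.yourstore.com/dept?page=2"},
]

# With an Adwords canonical already present, only the "next" link is kept.
print(links_to_emit(linkdata, "http://www.yourstore.com/dept"))
# Without one, both links are kept.
print(links_to_emit(linkdata, ""))
```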

To see those fixes in action, open a listing page, for instance. You should notice in the head of the resulting HTML that this tag was added:

<link rel="canonical" href="LINK TO THE PAGE"/>

If the page is potentially vulnerable to soft duplication issues, you will notice additional tags of the form:

<link rel="next" href="LINK OF THE NEXT PAGE"/>

These links are computed dynamically, so they differ from one page to the next.
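
As a rough illustration of why these links differ from page to page, the Python sketch below derives pagination links for one page of a listing. It assumes the ?page=N URL form shown earlier; the function name pagination_links is hypothetical, and the rel="prev" link is the standard counterpart of rel="next" rather than something the post confirms the WebStore emits.

```python
def pagination_links(base_url, page, last_page):
    """Return (rel, href) pairs for one page of a paginated listing."""
    links = []
    if page > 1:
        # Assumption: a "prev" link is emitted for every page after the first.
        links.append(("prev", f"{base_url}?page={page - 1}"))
    if page < last_page:
        links.append(("next", f"{base_url}?page={page + 1}"))
    return links


for rel, href in pagination_links("http://www.yourstore.com/dept", 2, 5):
    print(f'<link rel="{rel}" href="{href}"/>')
```

Note that base_url keeps its full path, so the next page stays inside the same department or category.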

4. Bibliographic Notes
In this section, we provide links to webpages that we thought would be of great interest to you.

We hope you have found this post useful. Let us know your feedback.

This feature is currently only available in our Beta and Alpha channels.

Hi Franclin

My Webmaster Tools is showing duplicated meta descriptions and duplicated title tags. Will this rectify the issue?

Thanks
David

Hi David,

Thanks for the feedback. Can you provide some examples of duplication reported by your Webmaster tools please?

Regards,
Franclin

These are the two areas we are seeing a lot of duplicate content.

Hi David,

Adding those NitroScript code snippets in your header template as specified in the post will definitely fix the problem. One of the problems with the reported pages is that they lack a canonical URL. As I explained in the original post, the goal of canonical URLs is to inform the crawler which version of a given webpage (assuming there are many versions of the URL) should be used.

Please note that having made the necessary changes, it may take some time for the results to be reflected on your account. This is because the reports that you are seeing are not by any means live.

Hope this all makes sense now.

Hi Franclin

That's great, thanks a mill.

David

Hi Franclin

It's great that this can be done. Would you be able to assist when it suits?

Thanks
David

Hi David,

I have addressed this issue in this thread Webmaster Tools Errors - #4 by david_acheson.

Thanks,
Franclin

Hi Franclin

I was just checking our Webmaster Tools and it seems that the number of duplicate meta descriptions has increased to 6000+.
White Ninja Costume: This White Ninja Costume is the perfect disguise for a true warrior and perfect
/White-Ninja-Costume/3627-L01/
/white-ninja-costume/3627-l01/

and also the duplicate title tags:
/Black-60th-Birthday-Balloons/80897/
/black-60th-birthday-balloons/80897/

To me it looks like it is seeing the same URL twice, except that one is in caps.
Your help would be appreciated.

Thanks
David

Hi David,

The URLs with upper-case letters redirect to the current version of your URLs (the lower-case ones).
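
If the report is long, a quick way to confirm that entries differ only by letter case is to group the reported paths by their lower-cased form. The following Python sketch does that using the paths quoted above; the function name case_variants is hypothetical, and the script only inspects the strings, it does not check the redirects themselves.

```python
from collections import defaultdict


def case_variants(paths):
    """Group paths that are identical once letter case is ignored."""
    groups = defaultdict(list)
    for path in paths:
        groups[path.lower()].append(path)
    # Keep only the groups that really contain more than one variant.
    return {key: found for key, found in groups.items() if len(found) > 1}


reported = [
    "/White-Ninja-Costume/3627-L01/",
    "/white-ninja-costume/3627-l01/",
    "/Black-60th-Birthday-Balloons/80897/",
    "/black-60th-birthday-balloons/80897/",
]

for lowered, variants in case_variants(reported).items():
    print(lowered, "<-", variants)
```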

I would encourage you to ask Google to re-crawl those URLs and see what happens. The steps to follow are:

  1. Go to: https://www.google.com/webmasters/tools/ and log in

  2. If you haven’t already, add and verify the site with the “Add a Site” button

  3. Click on the site name for the one you want to manage

  4. Click Health → Fetch as Google

  5. Optional: if you want to do a specific page only, type in the URL

  6. Click Fetch

  7. Click Submit to Index

  8. Select either “URL” or “URL and all linked pages”

  9. Click OK and you’re done.

Hope this helps.
Franclin

Hi Franclin,

I added the canonical statements you recommended for filtered pages and for soft duplication.

However, I still need a canonical setup for all of my product pages with duplicate content. The major differences are the sizes within each brand, but the descriptions and names are mostly the same.

We were using:

…but it was affecting all of our pages throughout our site; not just the product pages as intended.

Any solution?

Stan

Hi Stan,

I would appreciate it if you could provide some examples. Pages with potential content duplication are mainly products belonging to a matrix. In that case, the canonical URL is computed as follows:

1 - We use any primary item specified in PAM or
2 - The item with the lowest ID
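
The two rules above can be sketched in a few lines of Python. The field names id, primary and url, as well as the sample URLs, are hypothetical and chosen for illustration only; the actual data model and implementation in the WebStore may differ.

```python
def canonical_item(matrix_items):
    """Pick the item whose URL serves as the canonical for the matrix.

    Rule 1: use an item flagged as primary, if there is one.
    Rule 2: otherwise fall back to the item with the lowest ID.
    """
    primaries = [item for item in matrix_items if item.get("primary")]
    if primaries:
        return primaries[0]
    return min(matrix_items, key=lambda item: item["id"])


items = [
    {"id": 12, "url": "/product/widget-large/"},
    {"id": 7, "url": "/product/widget-small/"},
    {"id": 9, "url": "/product/widget-medium/", "primary": True},
]

print(canonical_item(items)["url"])
```

With no primary item flagged, the same call would return the item with ID 7.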

Let us know if that answers your question please.
Regards,

Franclin

Hi Franclin,

Sorry about the wait on this one.

The examples you asked for are as follows:

Brand page, which lists all of the products in the matrix, HAS a canonical in the code:
http://www.coronacigar.com/cigar-brands/A-Flores-Serie-Privada-Maduro/

Product page in the same brand matrix DOES NOT HAVE a canonical in the code:
http://www.coronacigar.com/product/Cigars/Cigar-Boxes/A.-Flores-1975-Serie-Privada-Maduro-SP-52-AFSP2452M/

I’d like the canonicals to link back to the brand matrix page where a brand name is specified in PAM.

Does this help?

Stan

Hi Stan,

By default, a canonical URL is provided on product pages. As I explained in the post, the canonical URL is either specified in PAM or taken from one of the items in the matrix (the one with the lowest ID). If you want to change the way we compute the canonical URLs, it will have to be done as custom development and will attract a quote.

On brand pages, it is set to the current page, since two brand pages cannot have the same URL. The reason you are not currently seeing any canonical URL on your product pages is an edit made to your header template. You have to include the following code in it for the canonical URL to appear:

{if (pageproperty['pageid'] eq 'product')}

<link rel="canonical" href="{product['product_canonical_link']}"/>
 
{endIf}

Hope this helps,
Franclin

Hi Franclin,

That makes more sense. I added the code directly below the pfs canonical statement in the header.

Can you send me instructions for changing the Canonical URL in PAM? I did not see it in the Attribute Visibility.

Thanks,

Stan

Hi Stan,

You can find more about that in this thread: Selecting the Default Displayed Item in a Matrix Drop-Down List

Regards,
Franclin

Hi Franclin,

Thanks very much for the step-by-step guide.

I used your code, and I’m having an issue where the rel=“next” tag drops the category from the URL, so the href is pointing to the second page of its parent department rather than the second page of the category.

This is the code I’ve added to our header template:

{if (pageproperty['pageid'] eq 'filtered')}
 {ifThereAre pfscanonical}
  {forEach pfscanonical}
   <link rel="canonical" href="{pfscanonical['url']}"/>
   <meta property="og:url" content="{pfscanonical['url']}"/>
  {endForEach}
 {endIfThereAre}
{endIf}

{if (pageproperty['crawled_parameters_canonical_url'])}
  <link rel="canonical" href="{pageproperty['crawled_parameters_canonical_url']}"/>
  <meta property="og:url" content="{pageproperty['crawled_parameters_canonical_url']}"/>
{endIf}

{forEach linkdata}
  <link rel="{linkdata['rel']}" href="{linkdata['href']}"/>
{endForEach}

Here’s an example URL where this occurs: http://www.racksforcars.com/Rack-Systems/Yakima-Q-Clips/

Output:

<link rel="next" href="http://www.racksforcars.com/Rack-Systems/?page=2"/>
<link rel="canonical" href="http://www.racksforcars.com/Rack-Systems/Yakima-Q-Clips/"/>

The “next” link URL should actually be as follows:

<link rel="next" href="http://www.racksforcars.com/Rack-Systems/Yakima-Q-Clips/?page=2"/>
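
A mismatch like this can be detected automatically by comparing the path of the "next" link with the path of the canonical link in the page head. The Python sketch below does this with the standard library HTML parser; the names LinkCollector and next_matches_canonical are hypothetical, and the check assumes that the next page should live under the same path as the canonical URL, which holds for the ?page=N form used here.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collect the href of every <link> tag, keyed by its rel value."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag == "link":
            attributes = dict(attrs)
            if "rel" in attributes and "href" in attributes:
                self.links[attributes["rel"]] = attributes["href"]


def next_matches_canonical(head_html):
    """Return False when the next link points at a different path."""
    collector = LinkCollector()
    collector.feed(head_html)
    canonical = collector.links.get("canonical")
    next_link = collector.links.get("next")
    if canonical is None or next_link is None:
        return True  # nothing to compare
    return urlsplit(next_link).path == urlsplit(canonical).path


reported = (
    '<link rel="next" href="http://www.racksforcars.com/Rack-Systems/?page=2"/>'
    '<link rel="canonical" href="http://www.racksforcars.com/Rack-Systems/Yakima-Q-Clips/"/>'
)
print(next_matches_canonical(reported))
```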

Any help would be much appreciated.

Thanks,

Tyler

Hi @tylerkrys

Thanks a million for reporting that. I will have a look at that issue and get back to you with a possible solution sometime next week hopefully. Otherwise, I agree with you that the output is not what was expected. It looks like a little bug and we will investigate that.

Regards,
Franclin

Thanks @franclin_foping, I’ll stay tuned.

Tyler

Hi @franclin_foping,

I’ve had a play with this on our demostore and I’m getting some funny results. I pasted the code snippets above into the header template and then looked at this page:

http://broughtons.demostore.nitrosell.com/store/department/2/Door-Furniture/page5.html

I see it’s added this:

To me they all look wrong. Is that right? Is it me?

Thanks,
Emma