Tackling Potential Content Duplication Issues in Your WebStore

zach · October 29, 2015, 5:35pm

Also, if not running in Beta, then there is no way to counter duplicate content for these issues when using Nitrosell?

franclin_foping · October 29, 2015, 5:48pm

Hi @zach

As per our guidelines, before migrating a store to Beta, the retailer has to accept a disclaimer.

We currently have over 100 stores on Beta and as far as I can tell, they are all happy there.

Let us know your thoughts.
Franclin

andy · October 29, 2015, 11:45pm

Hi Zach,

We have been on the beta channel for a couple of years now without any major issues. We process 450 plus orders most days so it’s probably safe to assume we would know if there were any major bugs or flaws.

Hope that’s helpful.

regards.
Andy.

franclin_foping · October 30, 2015, 9:19am

Thanks for the testimony @andy

dustin · November 12, 2015, 7:15pm

Hi @franclin_foping,

Have there been any updates on this issue?

We are still seeing this when using the code provided:

When we should be seeing this:

As such we have a lot of duplicate content issues in Google Webmaster Tools.

Thanks,
Dustin

franclin_foping · November 16, 2015, 10:41am

Hi,

We have just released a fix for the issue surrounding the paginated links for legacy or version 0 URLs. Those URLs contain the word ‘store’ in them. Thanks to both @emma and @tylerkrys for reporting it to us.

@andy, we have also fixed the issue with extra parameters in the canonical URL; thanks for bringing it to our attention once more. I have updated the necessary nitroscript tags to avail of this fix, and the change is now reflected in the original post.

Let us know your feedback.

Regards,
Franclin

emma · November 16, 2015, 1:45pm

Hi @franclin_foping,

Thanks for fixing this. It’s looking a lot better now. I’ve looked at the prev and next tags on departments, categories, subcategories and brands and all are OK, but there is a small oddity I’d like to report when looking at the “next” tag on categories and subcategories:

link rel="prev" href="http://broughtons.demostore.nitrosell.com/store/category/63/449/Pendant-Ceiling-Lights/page5.html"
link rel="next" href="http://broughtons.demostore.nitrosell.com/store/category/63/449/Pendant-Ceiling-Lights/?page=7"
link rel="canonical" href="http://broughtons.demostore.nitrosell.com/store/category/63/449/Pendant-Ceiling-Lights/page6.html"

Can you see that its syntax is different to the others? It’s ?page=7 rather than page7.html. This is just on subcategories and categories. I have no idea whether this would have any impact with Google, but I’d have thought it would be better to correct it. It’s fine on departments and brands, so it seems like a bug.
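The mismatch described above goes away if prev, next and canonical are all built by the same helper. The following is a minimal sketch of that idea, not NitroSell's actual implementation: the function names are hypothetical, the last page number used in the test is invented, and it assumes page 1 lives at the bare listing URL while later pages use the pageN.html style.

```python
def page_url(base: str, page: int) -> str:
    """Return the listing URL for `page` in the pageN.html style.

    Assumption: page 1 is served at the bare listing URL (ending in "/").
    """
    if page < 1:
        raise ValueError("page numbers start at 1")
    base = base.rstrip("/") + "/"
    return base if page == 1 else f"{base}page{page}.html"


def pagination_links(base: str, page: int, last_page: int) -> dict:
    """Build canonical/prev/next hrefs with one URL format for all three."""
    if not 1 <= page <= last_page:
        raise ValueError("page is outside the listing")
    links = {"canonical": page_url(base, page)}
    if page > 1:
        links["prev"] = page_url(base, page - 1)   # omitted on the first page
    if page < last_page:
        links["next"] = page_url(base, page + 1)   # omitted on the last page
    return links
```

Because every href goes through page_url, a next link in the ?page=7 style cannot appear next to a canonical in the page6.html style.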

Emma

franclin_foping · November 16, 2015, 2:38pm

Hi @emma

Thanks for your feedback. We will definitely fix it regardless of the impact it may have on search engines. So, watch this space for more.

Regards,
Franclin

stephen · November 17, 2015, 12:39am

Hi Franclin,

With regard to Andy’s query:

“Also, there is another issue which we have raised as a ticket recently describing that the canonical tag appears to be self referencing and therefore allows for stray URL parameters like sort=price&order, gclid, utm_source etc to end up being referenced in the canonical tag, obviously undesirable”

Could you explain how you did that, just so we’re on the same page? It seems to be working, upon checking.

There seems to be a separate issue now, whereby the paginated code is now referencing the offending duplicate pages.

For example, on this page: www.mypetwarehouse.com.au/cat-food/holistic?gclid=CIWnjN7nxr0CFQdepQodEjUAOw, the paginated code reads:

franclin_foping · November 18, 2015, 9:37am

Hi @stephen

I don’t see an issue with the next or prev urls containing extra parameters. Actually some of them may be useful such as the adwords ones. Ultimately, a page is uniquely identified by its canonical url only. If you (or a web crawler) follow a link that contains a gclid then the resulting canonical url of that page doesn’t contain any extra parameter.
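As an illustration of the behaviour described above, here is a minimal sketch of deriving a canonical URL by dropping tracking and sorting parameters. It is not the platform's code: the function name and the parameter list are assumptions, based only on the parameters mentioned in this thread (gclid, utm_source, sort, order).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of parameters that should never reach the canonical tag.
NON_CANONICAL_PARAMS = {"gclid", "sort", "order"}


def canonical_url(url: str) -> str:
    """Return `url` without tracking/sorting parameters or a fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in NON_CANONICAL_PARAMS and not key.startswith("utm_")
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(canonical_url(
    "http://www.mypetwarehouse.com.au/cat-food/holistic?gclid=CIWnjN7nxr0CFQdepQodEjUAOw"
))
# prints http://www.mypetwarehouse.com.au/cat-food/holistic
```

The same helper could be applied to the prev and next hrefs if stripping the extra parameters there turns out to be worthwhile.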

Recall from @andy’s query that the issue raised was to do with canonical urls. As a result, we left the other urls untouched. We can, of course, change that but I don’t see its added value.

Let us know your thoughts please.

Regards,
Franclin

emma · November 18, 2015, 1:35pm

Hi @franclin_foping,

I am wondering how this solution tackles the duplicate page problem that we have reported via the ticket system whereby google is indexing and reporting as duplicates some department pages with page numbers so high that they don’t actually exist with products on them (and have never existed).

These pages don’t return 404 errors (they say “Sorry your search resulted in no matches”) but don’t have any products on them. How/why google has indexed these is a complete mystery but we just need google to drop them.

For example I look at broughtons.demostore.nitrosell.com/store/department/9/home-accessories/page30000.html

I see I have

link rel="prev" href="http://broughtons.demostore.nitrosell.com/store/department/9/home-accessories/page29999.html"
link rel="next" href="http://broughtons.demostore.nitrosell.com/store/department/9/home-accessories/page30001.html"
link rel="canonical" href="http://broughtons.demostore.nitrosell.com/store/department/9/home-accessories/page30000.html"

I can’t see how this will tell google that this page should not be indexed. The real last page in the series for this department is Page 6, so why does page 30000 have a prev and a next and a canonical of itself?

Won’t this potentially cause me more duplicate problems if google then crawls page 30001, then page 30002 etc…

Wouldn’t it be better for non-existent page numbers on departments and categories to return 404 errors? I see that if you try to access a brand with an irrelevant page number it returns a 404, which seems like the best solution.
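The rule being asked for can be stated in a few lines. This is a minimal sketch under stated assumptions, not the platform's code: the function name is hypothetical, the product count and page size in the example are invented so that the last real page is page 6, and an empty listing is assumed to still serve page 1.

```python
import math


def listing_status(page: int, total_products: int, per_page: int) -> int:
    """Return the HTTP status a paginated listing should answer with."""
    if per_page < 1:
        raise ValueError("per_page must be at least 1")
    # Assumption: an empty listing still has a (blank) first page.
    last_page = max(1, math.ceil(total_products / per_page))
    return 200 if 1 <= page <= last_page else 404


# Hypothetical department with 120 products at 20 per page: pages 1 to 6 exist.
print(listing_status(6, 120, 20))      # prints 200
print(listing_status(30000, 120, 20))  # prints 404
```

With a 404 on out-of-range pages, a crawler that requests page30000.html has nothing to index and no prev or next links to follow.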

Thanks,
Emma

stephen · November 19, 2015, 10:18pm

Hi Franclin,

Thanks for your message.

Re: “Ultimately, a page is uniquely identified by its canonical url only.” I suppose only time will tell. We hope that the URLs with gclid parameters drop from the index, so I will monitor this closely.

Either way, we’ll let you know.

Best Regards,

Stephen.

franclin_foping · November 20, 2015, 2:07pm

Hi,

Thanks a lot for reporting that @emma . This is clearly an edge case and yes I agree with you, we should definitely be returning a 404 on those pages. They appear to have been followed by web crawlers which, as you already know, browse websites in a slightly different way than human beings would. A fix for this issue will be included in our next patch.

@stephen we will be more than happy to strip the extra parameters from the paginated links if the need arises. In the meantime, let us know whether those URLs drop from the index.

@zach we are still awaiting an update from you. Please feel free to get back to us if we can be of any assistance.

Wishing you a good weekend!
Franclin

zach · November 20, 2015, 3:26pm

Hi @franclin_foping,

I am not sure what update you are awaiting from me?

At this point I am not interested in moving our site to the beta version.

Thanks,

franclin_foping · November 20, 2015, 5:23pm

Hi @zach

Thanks for letting us know.

Have a great weekend ahead.

Kind regards,
Franclin

emma · January 12, 2016, 2:34pm

Hi @franclin_foping,

When will you be implementing this and returning 404s on out of scope listing pages?

I don’t really want to implement the prev/next thing until this is fixed, but we have some worrying webmaster tools data that indicates that google has suddenly massively increased our indexed pages - from a fairly constant 25,000 over the last year to nearly 90,000 in the last 2 months… we only have 24,000 pages in our sitemap! Obviously a worry for us and needs addressing - but at the moment the implementation could potentially cause even more duplication.

Thanks,
Emma

franclin_foping · January 13, 2016, 5:49pm

Hi @emma

We are currently working on that issue.
I believe by the end of this week or early next week we should nail it down.
Thanks for your patience and custom.

Regards,
Franclin

franclin_foping · January 15, 2016, 2:47pm

Hi @emma

A fix for that particular issue is now in place and those pages should now yield a 404.

Let us know if there is anything else we can do for you today.

Have a wonderful weekend ahead.
Kindest regards,
Franclin

emma · January 19, 2016, 1:26pm

Hi @franclin_foping,

Thanks for that fix, I have now implemented the prev/next/canonicalization code on our live site and will see if it helps with our duplication errors and indexing.

I have spotted another bug, however. This one is on page 1 of a brand page,
e.g. http://www.broughtons.com/store/search/brand/brolite-bakelite/
It lists the next page as link rel=“next” href=“http://www.broughtons.com/store/search/brand/page2.html”
Can you see the brand name itself is actually missing - the URL given doesn’t exist and it should be http://www.broughtons.com/store/search/brand/brolite-bakelite/page2.html
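A bug like this one can be caught automatically by checking that the prev and next links sit in the same listing directory as the canonical URL. The sketch below uses only Python's standard library; the class and function names are hypothetical, and it assumes the page emits a canonical tag next to the pagination tags. It compares paths only, so it would not flag the ?page=7 format oddity reported earlier.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkRelParser(HTMLParser):
    """Collect the href of each prev/next/canonical <link> tag."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel, href = attr.get("rel"), attr.get("href")
        if rel in ("prev", "next", "canonical") and href:
            self.links[rel] = href


def listing_dir(url: str) -> str:
    """Return the directory part of the URL path, with a trailing slash."""
    path = urlsplit(url).path
    return path if path.endswith("/") else posixpath.dirname(path) + "/"


def misplaced_links(html: str) -> list:
    """List the rel values whose href lies outside the canonical's directory."""
    parser = LinkRelParser()
    parser.feed(html)
    canonical = parser.links.get("canonical")
    if canonical is None:
        return []  # nothing to compare against
    expected = listing_dir(canonical)
    return [
        rel
        for rel in ("prev", "next")
        if rel in parser.links and listing_dir(parser.links[rel]) != expected
    ]
```

Fed the head of the brand page above, misplaced_links would report "next", because /store/search/brand/ is not the directory of the brolite-bakelite listing.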

The bug reported on Nov 15th, whereby the “next” tag uses the V1 format when it should use the V0 format on categories and subcategories, is also still present.

Thanks,
Emma

franclin_foping · January 28, 2016, 10:26am

Hi @emma

Thanks for your very valuable feedback. We will extend the pagination links to brand pages as well.

The other bug is currently being looked at, so watch this space for an update when a patch is released.

Thanks for your custom.

Kindest regards,
Franclin
