
How to Check for and Avoid Duplicate Content

March 07, 2016


Google hates duplicate content. Use these tips to make sure you don’t have any duplicate content on your site!


Everyone knows that Google hates duplicate content. What many people don’t know are the different ways Google may be finding duplicate content on their site. The goal of this blog is to help you go through your site, find any possible causes of duplicate content, and address them. It will not cover ALL the ways that Google may find your content duplicative, but it should help with a lot of them!

Here is Google’s official stance on duplicate content. Essentially, it only wants to provide users with the most relevant, timely, helpful and unique content in response to a search query. When it finds two pieces of content that appear to be the same at two different URLs, it starts evaluating the content using other factors. The list of other factors is incredibly long and complicated, but in the interest of brevity, each one falls into one of the following buckets - page and site technical functionality, the user’s device and location, and the off-site links or authority of the content itself. So, if it appears that you have created duplicate content, you have essentially confused Google when it comes to rankings. And if it is confusing to Google, it is most likely confusing to a user - which is not good for you.

The Most Common Causes
As I mentioned above, there are a ton of reasons a site may have duplicate content, but let’s take a look at a few of the most common:

  • The www vs. non-www webpages: Most CMS platforms let you set a default version of the website’s address, either http://www.yourwebsitehere.com or http://yourwebsitehere.com. The problem is that, with all of the other things that go into web design, this setting can be overlooked. If no default is chosen, users and crawlers can reach both versions. The www vs. non-www split is the most common cause of duplicate content, and when it has been overlooked it is not just one or two of your pages that are duplicated, it is your entire site. The good news is that this is a relatively easy fix. How to fix: Create a 301 redirect map pointing to whichever version you will set as the default. For example, if you are going to use www as the default, first put a 301 redirect map in place pointing all of the non-www URLs to their www counterparts.
  • Product Pages: On an e-commerce site, or any website that sells products, duplicate content issues are incredibly common. It’s even more complicated if wholesalers also sell your products, or vice versa. Let’s deal with the product issues on your own site first. If you have a very large site with a lot of products, there are often several ways to get to the same product. For example, if I’m looking for a grey sweater I can usually get to the sweater pages through Men’s Clothing > Shirts and Sweaters > Sweaters, so the final URL would be something like: shoppingwebsite.com/mens/shirts-and-sweaters/sweaters/cotton-sweater47-grey. But I may be able to get to the same product or list of products by starting at sweaters, arriving at a URL that reads: shoppingwebsite.com/sweaters/men/cotton-sweater47-grey. While Google is getting better at understanding this type of duplicate content and does not punish sites as much for it, it can still be an issue. Usually the best fix is to use canonical tags to pass authority from one page to another. For products you make that may also be listed on other sites, it is best practice to keep a keyword-optimized page on your own site and pass a separate distribution copy (for example, the old version of the product page from before your keyword optimization) to distributors or wholesalers. That way the duplicate content is not an issue for your site, and your page is viewed as unique when compared to competitors’ product listings. How to fix: This will depend on your site. The best approach is to minimize the cross-listing of products if possible. The next step is to decide which URL for each product gets the most sales or visits. If there is a most popular path to a product, we recommend adding canonical tags to the other versions that point to the most popular URL. This tells Google: “Google, I know there are duplicate versions of this content on my site, and this is the most important one I want your crawler to pay attention to.” This can be a bit of a complicated process and should only be done with the help of a search engine optimization expert. Please contact us if you would like to discuss this in a bit more detail. Lastly, you could create unique pages for each product regardless of how the user gets to the page.
  • Duplicate Content: There is always the possibility that you used the same blocks of content on several different pages of your website. How to fix: This one is relatively straightforward. If your site has duplicate content, you need to change those blocks based on your keyword strategy and relevance. Keep the original content on the most important pages and rewrite it on the others.
  • Duplicate Content in Blogs: Blog posting is an area where we tend to see a lot of duplicate content on websites. A common problem is having the same piece of blog content in several different blog categories. For example, if you have a blog that covers 7-10 different topics, it is likely that you will tag one post to fit in 4 of them. Depending on the website, the post could then have a different URL for each corresponding topic. For example: http://www.yourwebsite.com/blog/news/my-new-blog and http://www.yourwebsite.com/blog/products/my-new-blog, both with the exact same content. We typically recommend that clients keep the number of blog topics to three or four and assign each post to one category, or two if they absolutely have to, in order to minimize the amount of duplicate content that may come out of the blog. How to fix: The best course of action is to narrow down your blog to 4 or 5 topics if possible. After that process has been completed, identify which posts belong in which topics. From there, 301 redirect any of the duplicate post versions (in some cases there may be more than one) to the one category you want to use moving forward.
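
The redirect and canonical fixes described in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: www is taken to be the chosen default, the host names reuse the placeholder yourwebsitehere.com, and the helper names (to_preferred_url, build_redirect_map, canonical_tag) are hypothetical. On a real site the 301 redirects themselves would be configured in your web server or CMS, not in a script like this.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Assumption: www is the version chosen as the site default.
PREFERRED_HOST = "www.yourwebsitehere.com"
BARE_HOST = "yourwebsitehere.com"


def to_preferred_url(url):
    """Return the same URL rewritten onto the preferred (www) host."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host == BARE_HOST:
        host = PREFERRED_HOST
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


def build_redirect_map(urls):
    """Map each non-preferred URL to its 301 target; URLs already correct are skipped."""
    redirects = {}
    for url in urls:
        target = to_preferred_url(url)
        if target != url:
            redirects[url] = target
    return redirects


def canonical_tag(preferred_url):
    """Build the link element that goes in the <head> of each duplicate page."""
    return '<link rel="canonical" href="{}">'.format(escape(preferred_url, quote=True))
```

Calling build_redirect_map on a list of crawled URLs returns only the non-www entries, each paired with its www counterpart, which is the redirect map you would hand to whoever manages the server. The same idea applies to duplicate blog category URLs: map each duplicate to the one category URL you decided to keep.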

How to check your site for Duplicate Content
There are several great tools that you can use to see if your website may have problems with duplicate content. All of the methods I have listed here are free. Some may need a bit of assistance to set up, but they are pretty easy to use after setup or installation.

  • Doing a ‘Site:’ search: If you think you may have a URL that is causing a duplicate, you can always check Google to find out. If you simply enter ‘site:’ followed by the URL of the page you would like to check, you can see whether it appears in Google’s index. If no results are displayed, the URL has not been indexed.
  • Google Search Console (formerly Webmaster Tools): This piggy-backs a bit on the idea above. Google’s Search Console is an extremely helpful tool for understanding how Google views your website. You can view duplicate content under the ‘HTML Improvements’ section. It lists only duplicate page titles and meta descriptions, but that insight can help you identify whether those pages merely share a title or are full duplicates.
  • Screaming Frog: Screaming Frog is a tool you can download for free for Mac or Windows and use to crawl up to 500 URLs to identify duplicates. It is a great tool for identifying 4xx or 5xx errors on your site as well as duplicate content.
  • SiteLiner or CopyScape: Both of these are web applications you can run without installing any software or confirming any site ownership. These tools can help identify duplicate content not only on your own site but also on other sites. In the latter case, either the information was disseminated from the same source, or the content was copied word for word and applied to another site.
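
To make the idea behind these checks concrete, here is a rough Python sketch of how repeated content blocks can be spotted across pages. It is not how any of the tools above work internally. It assumes you already have the visible text of each page in a dictionary keyed by URL, with paragraphs separated by blank lines, and the names find_duplicate_blocks and min_chars are hypothetical. It only catches blocks that match exactly once case and spacing are ignored, so lightly reworded copies will slip through.

```python
import hashlib
import re
from collections import defaultdict


def normalize(block):
    """Lowercase and collapse whitespace so trivial formatting differences do not hide a match."""
    return re.sub(r"\s+", " ", block).strip().lower()


def find_duplicate_blocks(pages, min_chars=40):
    """Return {block text: [urls]} for every paragraph that appears on more than one URL.

    pages is assumed to be a dict of {url: page text}.
    """
    urls_by_hash = defaultdict(set)
    text_by_hash = {}
    for url, text in pages.items():
        # Treat blank lines as paragraph boundaries.
        for block in re.split(r"\n\s*\n", text):
            cleaned = normalize(block)
            if len(cleaned) < min_chars:
                continue  # ignore very short blocks such as headings or menu labels
            digest = hashlib.sha256(cleaned.encode("utf-8")).hexdigest()
            urls_by_hash[digest].add(url)
            text_by_hash[digest] = cleaned
    return {
        text_by_hash[digest]: sorted(urls)
        for digest, urls in urls_by_hash.items()
        if len(urls) > 1
    }
```

Any block that comes back with two or more URLs is a candidate for rewriting, a canonical tag, or a 301 redirect, depending on which of the causes above applies.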

Hopefully you have found this blog helpful in understanding and cleaning up causes of duplicate content. Please feel free to reach out to us if you have any questions about this or other articles on the site!

Until next time!!

Gravity Admin

This e-mail was sent from Gravity Global (https://www.gravityglobal.com/)
