Google Power Tools for Eliminating Duplicate Content & Index Problems

By Tad Miller | May 18, 2018


I’m letting you in on a secret of Google technical SEO: stop making Google work so hard. Try to eliminate the extra work it takes Google to crawl your website, and think of it from this perspective: what can I do with my website to save Google on their electric bill? Yes, that’s right, their electric bill.

Yes. Reducing the electrical burden on Google is an SEO Strategy

Salaries are obviously a huge component of Google’s expenses, but infrastructure, and the electricity that powers that infrastructure, is something that you might not have thought about before.

Google owns 14 data centers on 4 continents, and each one contains the entire internet. Yes, all of it. Each of those data centers has the processing power to deliver tens of thousands of search results in the blink of an eye, and the capacity to go out to all the websites in the world and index and store all of their pages. So essentially, there are 14 data centers holding multiple petabytes of data (1 petabyte is equal to a million gigabytes).

Contemplate the Amount of Electricity it Takes to Store the Entire Internet

Admirably, Google has invested a lot in reducing its carbon footprint and in renewable energy to power itself. But it’s still a very big deal that Google needs the power to contain the internet and serve it across more than 3.5 billion searches per day, and those needs are certain to keep growing.

There are absolute benefits to taking the strategic approach of saving on Google’s electric bill with your website, primary among them being improved Google rankings, improved organic traffic from Google, and more frequent and deeper indexing of your content. Google itself now gives site owners more visibility than ever into the ways a website annoys it, and it does so with the new and much improved Google Search Console and its Index Coverage reports.

Carrot and Stick Approach to Lowering the Burdens on Data Centers

Google can wield great power and influence over site owners merely by stating that features on websites can be ranking factors (both positive and negative) in the algorithm it uses to rank search results. When Google talks, people listen and act out of fear of losing their most valuable stream of traffic.

In 2010 Google announced that site speed was a ranking factor and framed it as something site owners needed to improve for users. What it didn’t say was that getting site owners to speed up their sites, through everything from faster servers to smaller images, would likely save Google many millions of dollars in electrical processing power by reducing the load it takes to crawl and index a slow website.

Google can rightfully keep claiming this is in the interest of users (which it absolutely is), but it’s also in the interest of reducing the load on its data centers. Similarly, Google is pushing more site owners to adopt AMP pages and mobile page speed improvements, all with the goal of improving the user experience and reducing the load on its servers.

Google’s latest attempt at incentivizing site owner behavior is subtler, dressed up as what at first glance appears to be a mere analytics dashboard. Many have looked at it and scratched their heads wondering what to do with it. But the tools in Google’s new Search Console interface, particularly the Index Coverage reports, offer webmasters and site owners the opportunity to see their websites exactly the way Google sees them. This kind of data wasn’t available in the old Search Console tool set, and it should usher in a new era of technical SEO.

The New Google Search Console Solves A Problem That No Other SEO Tool Does

Marketing Mojo had beta access to the new Google Search Console for a few of its clients before it rolled out to the world, and it excited us with possibilities. In that time, we discovered that it was the definitive resource for solving a very common problem in SEO. That problem is duplicate content. Interestingly, even some of the best-known people in SEO couldn’t initially see the value in these tools.

 


We have clients with massive duplicate content problems, and diagnosing those problems, along with other indexing problems, is easy with the instant visibility provided in the Index Coverage reports.

When the new Search Console was rolled out to all our customers, it unleashed a flurry of activity specifically related to fixing index problems. Fixing them isn’t bold and flashy; in fact, it’s downright boring and tedious. But site owners and webmasters need to take their cues from Google on what these reports are saying.

Indexed but Not Submitted in an XML Sitemap

Every client we have Search Console access to has the same problem: an incomplete XML sitemap. The XML sitemap is how you show Google the fastest and most efficient route to index all the pages of your website.

Without it, the Googlebot that indexes your site must stumble through it like a maze, inefficiently (how does that impact Google’s electric bill?). With the XML sitemap, Google essentially has the map to the maze that is your website and can quickly find and index all your pages… day after day after day.

Give Google the Short Cut with an XML Sitemap

If you see a high number of pages in this report that aren’t in your sitemap, you either need to generate a new sitemap or install a CMS plugin that automatically creates and updates your sitemap as you make changes. Doing so makes your site a “friendly” place for Google to keep coming back to.
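
If your CMS doesn’t offer a sitemap plugin, even a small script can keep the file current. Below is a minimal sketch using only the Python standard library; the URLs and the output file name are placeholders you would swap for your own list of pages.

    # Minimal sitemap.xml generator (placeholder URLs; adapt to your own site).
    from xml.etree import ElementTree as ET

    def build_sitemap(urls, output_path="sitemap.xml"):
        # The sitemaps.org namespace is required by the sitemap protocol.
        urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
        for page_url in urls:
            url_el = ET.SubElement(urlset, "url")
            ET.SubElement(url_el, "loc").text = page_url
        ET.ElementTree(urlset).write(output_path, encoding="utf-8", xml_declaration=True)

    if __name__ == "__main__":
        build_sitemap([
            "https://www.example.com/",
            "https://www.example.com/services/",
            "https://www.example.com/blog/",
        ])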

[Screenshot: the “Indexed, not submitted in sitemap” report. Get these pages into your sitemap.]

Alternate Page, With Proper Canonical

These pages aren’t a duplicate content problem. They are duplicates that utilize a canonical tag to tell Google which page is the correct page to rank.

However, what we typically see with these pages is that they are often just URLs with tracking parameters appended to them, which can be a valuable clue for improving your indexing.

Once you identify which tracking parameters are being appended to the URLs, you can go to the URL Parameters tool in the old Google Search Console interface and tell Google to ignore URLs with those parameters.

Doing so can save the “crawl budget” Google allots to your site during its regular indexing, so that budget can go toward canonical pages and your most important pages. It also saves Google exactly the effort and electricity these optimizations are meant to reduce.
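
To see how many of these alternate URLs collapse down to the same page once tracking parameters are removed, a short script is enough. This is a minimal sketch, not anything Google provides; the parameter names below are common analytics tags and are only an assumption, so adjust the list to whatever your tracking actually appends.

    # Strip assumed tracking parameters so alternate URLs collapse to one page.
    from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

    TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign",
                       "utm_term", "utm_content", "gclid"}

    def strip_tracking(url):
        parts = urlsplit(url)
        kept = [(key, value) for key, value in parse_qsl(parts.query, keep_blank_values=True)
                if key not in TRACKING_PARAMS]
        # Rebuild the URL without tracking parameters (and without any fragment).
        return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

    print(strip_tracking("https://www.example.com/page?utm_source=newsletter&utm_medium=email"))
    # -> https://www.example.com/page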

[Screenshot: the URL Parameters tool. If your duplicate URLs are created by URL parameters used for tracking, tell Googlebot to ignore them.]

Duplicate Page Without Canonical

These are the bad ones. Somehow, some way, you created more than one page with the exact same content and didn’t use a canonical tag to tell Google which one to rank and which one to ignore, or even just to give credit to the canonical URL.


This often is due to:

  • Having both http and https
  • Having both www and non-www pages
  • Having both http/https and www/non-www pages at the same time
  • Having sub-domains with identical content as your top-level domain
  • Having a trailing slash on one page and not on another (even worse, using them inconsistently in your site navigation)
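
These variants are easy to surface if you normalize your URLs and group the ones that collapse to the same page. The sketch below is just an illustration with placeholder URLs, not a Google tool; it ignores the scheme, the “www.” prefix and trailing slashes so the duplicates listed above stand out.

    # Group URLs that differ only by scheme, "www." or a trailing slash.
    from collections import defaultdict
    from urllib.parse import urlsplit

    def normalize(url):
        parts = urlsplit(url)
        host = parts.netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        path = parts.path.rstrip("/") or "/"
        return host + path

    crawled = [
        "http://example.com/services",
        "https://www.example.com/services/",
        "https://example.com/services",
    ]

    groups = defaultdict(list)
    for url in crawled:
        groups[normalize(url)].append(url)

    for key, variants in groups.items():
        if len(variants) > 1:
            print("Duplicate variants of " + key + ":")
            for variant in variants:
                print("   " + variant)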

Nothing pointlessly wastes a bot’s time (and electricity) more than indexing the same page content over and over again. It doesn’t make the bot very happy, and when the bot’s not happy, you aren’t going to be happy with your rankings and organic traffic. The impact of fixing these problems can be huge.

It doesn’t seem intuitive that getting rid of pages (with 301 redirects in this case) could be a strategy for getting more traffic. But in the case of duplicate content, it absolutely is.

Identifying and resolving duplicate content with 301 redirects, canonical tags and, in some desperate cases, noindex tags and robots.txt exclusions is a very healthy thing for your site… and for Google’s electric bill.

[Screenshot: the “Duplicate page without canonical” report. If you have a lot of these, you need to get rid of them with 301 redirects or give them canonical tags.]

Google Chose a Different Canonical Than User

In our experience, this duplicate content issue seems to be exclusive to having both a secure and a non-secure version of a page live, each with a self-referring canonical pointing to itself. In almost every instance we found, the site’s webmaster had no idea that both versions of the website were live. The preferred solution is to 301 redirect the non-secure page to the secure page. If that isn’t an option (it really should be THE option), then you could point the canonical tag on the non-secure page to the secure page. At this point, you hopefully understand the ranking benefits of having a secure website and have made the switch to https.
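
A quick way to check whether you are in this situation is to request both versions of a page and look at where each one ends up and which canonical it declares. The sketch below is only an illustration with placeholder URLs, and its canonical-tag regex is deliberately simplified (it assumes the rel attribute appears before href).

    # Fetch the http and https versions of a page, then report the final URL
    # after any redirects and the canonical tag each version declares.
    import re
    import urllib.request

    def inspect(url):
        with urllib.request.urlopen(url, timeout=10) as resp:
            html = resp.read().decode("utf-8", errors="replace")
            final_url = resp.geturl()  # where redirects, if any, ended up
        match = re.search(r'<link[^>]+rel=["\']canonical["\'][^>]*href=["\']([^"\']+)',
                          html, re.IGNORECASE)
        canonical = match.group(1) if match else "none found"
        print(url)
        print("   final URL: " + final_url)
        print("   canonical: " + canonical)

    for page in ("http://www.example.com/", "https://www.example.com/"):
        inspect(page)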

[Screenshot: the “Google chose different canonical than user” report. It’s good that you tried to put a canonical tag on there, but having two distinct self-referring canonicals for the same page content is a problem.]

The Wages of Sin are Death

Did you ever have a relationship break up and the other person stopped taking your phone calls or texts? Wasting Google’s time, energy, and money on duplicate content is similar, except Google doesn’t just get mad. It gets even, and that means excluding your pages from its index. It doesn’t want anything to do with those pages, and there isn’t much you can do to convince it otherwise.

[Screenshot: excluded pages in the Index Coverage report. Your duplicate content is BANISHED from the index!]

Google just stops indexing duplicate content pages, and they are excluded from ever showing up unless some action is taken to correct the problem. But the problem is bigger than that. You have created a bad neighborhood (one that requires too much work and electricity) that Google doesn’t want to waste its time on, and as a result, the rest of your site that is indexed may not enjoy the full ranking advantages it could.

[Screenshot: valid vs. excluded pages in the Index Coverage report. Have you created a situation where Google excludes more pages than it has indexed?]

Getting the balance tilted back in favor of valid URLs far outnumbering excluded URLs is a central goal of the indexing side of technical SEO.

Not all of the Index Coverage report is about duplicate content. It also covers 404 errors, “crawl anomalies” (which appear to almost always be 404 errors), redirecting pages (not that big of a deal unless they are still linked in your navigation), and pages blocked by the robots.txt file (many times accidentally).

You can take the same electricity-saving strategy with these issues to improve your technical SEO. Redirect error pages. Get rid of internal links on your site that point to redirecting URLs, so Google has the clearest path to your other pages. Don’t do confusing things like list non-secure URLs in your sitemap when those URLs have canonical tags pointing to secure URLs or redirect to secure URLs. Avoid putting Googlebot through gymnastics or tail chasing to accomplish the most basic tasks; give it the easiest path from A to B and your performance will improve.
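
Finding those stray internal links is mostly a matter of checking where each linked URL actually resolves. Here is a minimal sketch with placeholder URLs: it flags links that redirect (so you can point them straight at the destination) and links that return errors.

    # Check a list of internally linked URLs for redirects and HTTP errors.
    import urllib.error
    import urllib.request

    def check(url):
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                final_url = resp.geturl()
            if final_url != url:
                print("REDIRECTS: " + url + " -> " + final_url)
            else:
                print("OK: " + url)
        except urllib.error.HTTPError as err:
            print("ERROR " + str(err.code) + ": " + url)

    for link in ("https://www.example.com/old-page", "https://www.example.com/blog/"):
        check(link)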

The Hard Part

Never has there been such a clear way to see how Google actually sees your pages. Google is clearly showing you the problems now. That doesn’t mean the problems are easy or fast to resolve. Many of these steps can be very time consuming and, sorry, they aren’t tasks that are very friendly to automation. Human judgment and intervention on these issues are necessary, and having someone experienced (yes, like someone at an agency that does this stuff all the time!) can ensure that attempting to fix your problems doesn’t end up creating new ones.

This is easy to do, but not necessarily glamorous and fun. It’s tedious and time consuming…

Shortcuts should be avoided. Getting it right, and right the first time, when correcting duplicate content and indexing problems is vital to getting the indexing side of SEO to a level that helps you rather than hurts you.

If you would like help with identifying duplicate content & index problems, or would simply like to know where you stand from an SEO standpoint, check out our SEO audit services, as well as other SEO services we offer.
