By Tad Miller
May 18, 2018
I’m letting you in on a secret of Google technical SEO: stop making Google work so hard. Eliminate the extra work it takes Google to crawl your website, and think of it from this perspective: what can I do with my website to save Google money on its electric bill? Yes, that’s right, its electric bill.
Salaries are obviously a huge component of Google’s expenses, but infrastructure, and the electricity that powers that infrastructure, is something that you might not have thought about before.
Google owns 14 data centers on four continents, each holding a copy of its index of the web (essentially, the entire searchable internet). Each has the processing power to deliver tens of thousands of search results in the blink of an eye, and to go out to all the websites in the world and crawl and store all of their pages. So essentially, there are 14 data centers each containing multiple petabytes of data (1 petabyte is equal to a million gigabytes).
Admirably, Google has invested a lot in reducing its carbon footprint and in renewable energy to power itself. But it is still a very big deal that Google needs the power to store the internet and serve more than 3.5 billion searches per day, and those needs are certain to grow.
There are absolute benefits to taking the strategic approach of saving on Google’s electric bill with your website. Primary among them are improved Google rankings, improved organic traffic from Google, and more frequent, deeper indexing of your content. Google is now giving site owners more visibility than ever into the ways a website annoys it, via the new and much improved Google Search Console and its Index Coverage reports.
Google can wield great power over site owners merely by stating that features of websites can be ranking factors (both positive and negative) in the algorithm it uses to rank results for search queries. When Google talks, people listen and act, out of fear of losing their most valuable stream of traffic.
In 2010, Google announced that site speed was a ranking factor and framed it as something that site owners needed to do for users. What it didn’t say was that getting site owners to improve site speed (everything from server response times to image sizes) would likely save Google many millions of dollars in processing power by reducing the load it takes to crawl and index a slow website.
Google can rightfully keep claiming this is in the interest of users (which it absolutely is), but it is also in the interest of reducing the load on its data centers. Similarly, Google is pushing more site owners to adopt AMP pages and mobile page speed improvements, all with the goal of improving the user experience and reducing the load on its servers.
Google’s latest attempt at incentivizing site owner behavior is subtler, dressed up as what at first looks like a mere analytics dashboard. Many have looked at it and scratched their heads, wondering what to do with it. But the tools in Google’s new Search Console interface, particularly the Index Coverage reports, offer webmasters and site owners the opportunity to see their websites exactly the way Google sees them. This kind of data wasn’t available in the old Search Console tool set, and it should usher in a new era of technical SEO.
Marketing Mojo had beta access to the new Google Search Console for a few of its clients before it rolled out to the world, and it excited us with possibilities. In that time, we discovered that it is the definitive resource for solving a very common problem in SEO: duplicate content. Interestingly, even some of the best-known people in SEO couldn’t initially see the value in these tools.
We have clients with massive duplicate content problems, and diagnosing those problems and other indexing issues is easy with the instant visibility provided by the Index Coverage reports.
When the new Search Console rolled out to all of our customers, it unleashed a flurry of activity specifically related to fixing index problems. Fixing these problems isn’t bold and flashy work; in fact, it’s downright boring and tedious. But site owners and webmasters need to take their cues from Google on what these reports are saying.
Every client whose Search Console we have access to has the same problem: an incomplete XML sitemap. The XML sitemap is how you show Google the fastest, most efficient way to index all the pages of your website.
Without it, the Googlebot that indexes your site must stumble through it like a maze, inefficiently (how does that impact Google’s electric bill?). With the XML sitemap, Google essentially has the map to the maze that is your website and can quickly find and index all your pages…day after day after day.
If you see a high number of pages in this report that aren’t in your sitemap, you either need to generate a new sitemap or install a CMS plugin that automatically creates and updates your sitemap as you make changes. Doing so makes your site a “friendly” place for Google to keep coming back to.
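If your CMS can’t generate one for you, a sitemap is just an XML file at the root of your site. A minimal sketch, with placeholder URLs and dates, looks like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per canonical page you want Google to crawl -->
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2018-05-01</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/services/</loc>
    <lastmod>2018-04-15</lastmod>
  </url>
</urlset>
```

Then reference the sitemap’s URL in Search Console (and in your robots.txt file) so Google knows where to find it.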
These pages aren’t a duplicate content problem. They are duplicates that utilize a canonical tag to tell Google which page is the correct page to rank.
However, what we typically see with these pages is that they often just have URL parameters that are used for tracking appended to them, which can be a valuable clue to enhance your indexing.
Once you identify which tracking parameters are being appended to your URLs, you can go to the URL Parameters tool in the old Google Search Console interface and tell Google to ignore URLs with those parameters.
Doing so can preserve the “crawl budget” Google allots to your site during regular indexing, so that budget goes to your canonical and most important pages instead. It also saves Google exactly the effort and electricity these optimizations aim to reduce.
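To see why tracking parameters multiply crawl work, here is a small Python sketch (the utm_* names are the standard Google Analytics tracking parameters; your site’s parameters may differ) that strips those parameters to show how many distinct pages a set of crawlable URLs really represents:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Common Google Analytics tracking parameters; adjust to match your own URLs.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content"}

def strip_tracking(url):
    """Return the URL with known tracking parameters removed."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(query) if k not in TRACKING_PARAMS]
    return urlunsplit((scheme, netloc, path, urlencode(kept), fragment))

# Three crawlable URLs, but only one real page:
urls = [
    "https://www.example.com/pricing?utm_source=newsletter&utm_medium=email",
    "https://www.example.com/pricing?utm_source=twitter",
    "https://www.example.com/pricing",
]
distinct_pages = {strip_tracking(u) for u in urls}  # collapses to a single URL
```

Every parameter combination the bot crawls is electricity spent re-fetching a page Google already has.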
These are the bad ones. Somehow, some way, you created more than one page with the exact same content and didn’t use a canonical tag to tell Google which one to rank and which one to ignore (or even just to give credit to the canonical URL).
This can happen in any number of ways.
Nothing wastes a bot’s time (and electricity) more pointlessly than indexing the same page content over and over again. It doesn’t make the bot very happy, and when the bot’s not happy, you aren’t going to be happy with your rankings and organic traffic. The impact of fixing these problems can be huge.
It doesn’t seem intuitive that getting rid of pages (with 301 redirects, in this case) could be a strategy for getting more traffic. But with duplicate content, that is absolutely true.
Identifying and resolving duplicate content with 301 redirects, canonical tags and, in desperate cases, noindex tags and robots.txt exclusions is a very healthy thing for your site…and for Google’s electric bill.
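For reference, a canonical tag is one line in each duplicate page’s head section; the URL below is a placeholder for whichever version you want Google to rank:

```html
<!-- Place on every duplicate version of the page, pointing at the one URL to rank -->
<link rel="canonical" href="https://www.example.com/products/widget/">
```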
In our experience, this duplicate content issue seems to occur exclusively when both a secure and a non-secure version of a page are live, each with a self-referring canonical tag pointing to itself. In almost every instance we found, the site’s webmaster had no idea that both versions of the website were live. The preferred solution is to 301 redirect the non-secure page to the secure page. If that isn’t an option (it really should be THE option), then you could put a canonical tag on the non-secure page that points to the secure page. At this point, you hopefully understand the ranking benefits of a secure website and have already made the switch to HTTPS.
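As a sketch, assuming an Apache server with mod_rewrite enabled (nginx and most CMS platforms have equivalents), a site-wide HTTP-to-HTTPS 301 redirect can be as simple as:

```apache
# .htaccess: permanently redirect every HTTP request to its HTTPS equivalent
RewriteEngine On
RewriteCond %{HTTPS} off
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
```

A single server-level rule like this beats page-by-page fixes: it catches every non-secure URL, including ones you forgot existed.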
Have you ever been through a breakup where the other person stopped taking your calls or texts? Wasting Google’s time, energy, and money on duplicate content is similar, except Google doesn’t just get mad. It gets even, and that means excluding your pages from its index. It doesn’t want anything to do with those pages, and there isn’t much you can do to convince it otherwise.
Google simply stops indexing duplicate content pages, and they are excluded from ever showing, absent some action to correct the problem. But the problem is bigger than that. You have created a bad neighborhood (one that requires too much work and electricity) that Google doesn’t want to waste time on, and as a result, the rest of your indexed site may not enjoy the full ranking advantages it could.
Tilting the balance back so that valid URLs far outnumber excluded URLs is a central goal of the indexing side of technical SEO.
Not all of the Index Coverage report is about duplicate content. It also covers 404 errors; “crawl anomalies,” which appear almost always to be 404 errors; redirecting pages (not that big of a deal unless they are still linked in your navigation); and pages blocked by the robots.txt file (often accidentally).
You can take the same electricity-saving strategy with these issues to improve your technical SEO. Redirect error pages. Get rid of internal links that point to redirecting URLs, so your other pages can be reached by the clearest path. Don’t do confusing things like include non-secure URLs in your sitemap that canonicalize or redirect to secure URLs. Avoid putting Googlebot through gymnastics or tail chasing to accomplish the most basic tasks, and give it the easiest path from A to B to improve your performance.
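Robots.txt accidents in particular are worth double-checking, because one stray line can block the whole site. A minimal example, with placeholder paths:

```
# robots.txt at https://www.example.com/robots.txt
User-agent: *
# A bare "Disallow: /" here would block your entire site; scope rules narrowly.
Disallow: /internal-search/
Sitemap: https://www.example.com/sitemap.xml
```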
Never has there been such a clear way to see how Google actually sees your pages. Google is clearly showing you the problems now. That doesn’t mean the problems are easy or fast to resolve. Many of these steps can be very time-consuming and, sorry, they aren’t tasks that are very friendly to automation. Human judgment and intervention are necessary, and having someone experienced (yes, like someone at an agency that does this stuff all the time!) can ensure that attempting to fix your problems doesn’t end up creating new ones.
Shortcuts should be avoided. Getting it right, and right the first time, when correcting duplicate content and indexing problems is vital to making the indexing side of SEO help you rather than hurt you.
If you would like help with identifying duplicate content & index problems, or would simply like to know where you stand from an SEO standpoint, check out our SEO audit services, as well as other SEO services we offer.