How-To: Troubleshoot a Page That Isn’t Getting Indexed
So, you’ve just launched a new site, or a new page on your current site, and you’re having trouble with its rankings. The first thing to check is whether the page is getting indexed at all. If it’s not in Google’s index, you’re definitely not going to find it in the search engine results pages.
Checking If a Page Is Indexed
To check whether a page is in Google’s index, do a search with the site: operator on that page’s URL. For instance, type site:http://www.search-mojo.com/services/google_analytics.php into the search box.
If you find a result for that URL, good news: your page is in the index, and you just need to work on getting it ranked higher. If instead you see something like the image below, your page isn’t getting picked up, and you need to start working on getting it indexed.
Here are a few things to get you on your journey to solving the mystery of why this page isn’t getting indexed:
Check for NOINDEX, NOFOLLOW.
A sure way to stay out of the index is a robots meta tag in the <head> section of a page. It tells the engines’ bots not to index the page and not to scan it for links to follow. Check for this tag first, and remove or edit it if you find it. To check, navigate to the page in question, right click, and select View Page Source. The tag will look something like this: <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
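If you have a lot of pages to check, viewing source one page at a time gets tedious. Here’s a minimal sketch, using only Python’s standard library, that scans a page’s HTML for robots meta directives (the sample HTML is a placeholder; in practice you’d feed it the source of your own page):

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collects the content of any <meta name="robots"> tags in a page."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # HTMLParser lowercases tag and attribute names; values keep their case.
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            self.directives.append(attrs.get("content", "").lower())

def find_robots_directives(html):
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives

# Placeholder page source standing in for your own page's HTML.
page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></head><body></body></html>'
print(find_robots_directives(page))  # ['noindex, nofollow']
```

If the returned list contains "noindex", that tag is the reason the page is staying out of the index.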
Check the /robots.txt File for Blocked URLs.
Navigate to your robots.txt file (for instance, www.yourdomain.com/robots.txt) and make sure there are no rules keeping your page from getting indexed. The page might live in a subdirectory that you’ve instructed the engines to ignore and not crawl. Double check what you’ve indicated in the Disallow: sections and be sure none of it covers your page. You can also see the contents of your /robots.txt file within your Google Webmaster Tools account, under the Health, Blocked URLs section. Based on its contents, Webmaster Tools will even tell you how many URLs are ultimately blocked.
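You can also test a specific URL against your rules programmatically. Here’s a small sketch using Python’s standard-library robots.txt parser; the rules and URLs below are hypothetical stand-ins for your own domain:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; substitute your own site's rules.
rules = """
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A page inside the disallowed subdirectory is blocked from crawling...
print(parser.can_fetch("Googlebot", "http://www.yourdomain.com/private/new-page.html"))  # False
# ...while a page outside it is fair game.
print(parser.can_fetch("Googlebot", "http://www.yourdomain.com/services/new-page.html"))  # True
```

If can_fetch returns False for the page you want indexed, the robots.txt rule is your culprit.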
Replacement Page? Check Your Redirects.
If the page not getting indexed is a new URL for an old page, or a new page meant to replace an old one, be sure you’ve taken the appropriate measures to make the transition smooth. Since the old page was likely up for a while, and may have a lot of links pointing to it, Google will probably prioritize it over your new page, at least for a while. Be sure you’ve implemented a proper 301 permanent redirect from the old page to the new one, indicating that all traffic should go to the new URL. A 301 passes on much of the old page’s link juice to the new URL, though some is lost in the move. With the redirect set up correctly, searchers who land on the old URL will be sent to the new one, as will the search engine bots. This will hopefully alert Google to the fact that the new page is the one that should be indexed.
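To sanity-check a redirect without hitting the live server, you can model the chain as data and walk it. This is a simplified sketch, with hypothetical URLs and responses standing in for your site; the goal is just to confirm the old URL answers with a 301 that lands on a 200:

```python
def follow_redirects(url, responses, max_hops=5):
    """Follow a chain of simulated responses until a non-redirect or the hop limit.
    `responses` maps each URL to a (status_code, location) pair, standing in
    for what the server would actually return."""
    hops = []
    for _ in range(max_hops):
        status, location = responses.get(url, (404, None))
        hops.append((url, status))
        if status in (301, 302) and location:
            url = location
            continue
        break
    return hops

# Hypothetical old/new page URLs for illustration.
responses = {
    "http://www.yourdomain.com/old-page.html": (301, "http://www.yourdomain.com/new-page.html"),
    "http://www.yourdomain.com/new-page.html": (200, None),
}
chain = follow_redirects("http://www.yourdomain.com/old-page.html", responses)
print(chain)
# [('http://www.yourdomain.com/old-page.html', 301), ('http://www.yourdomain.com/new-page.html', 200)]
```

What you want to see is exactly that shape: one 301 hop, ending on a 200. A 302 in place of the 301, or a chain of several hops, is a sign the redirect needs cleaning up.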
Check Your XML Sitemap.
Don’t have one? Start creating one, pronto. An XML Sitemap is essentially a table of contents for the search engines, listing out the URLs on your site. It most likely lives at www.yourdomain.com/sitemap.xml; go check to see if you have one. If not, check whether your CMS can create one for you (many automatically update it when you add or change content. Win!). If it can’t, you can easily download an XML Sitemap generator like GSiteCrawler and create one yourself. If you already have one, be sure to update it any time changes are made to your site; if pages are removed or added, regenerate it. The key here is to submit it to the engines once you’ve got it up and running. In your Webmaster Tools account, navigate to the Optimization section and select Sitemaps. Here you can add a new sitemap, and see how many URLs are submitted as well as indexed. The bots can crawl your site naturally, but a sitemap gives them another outlet to find and crawl more of your URLs. Think of it as doing your due diligence; you’ll likely see more pages getting indexed after you submit one.
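A minimal sitemap is simple XML: a urlset element containing one url/loc entry per page. If your CMS can’t build one, a short script can. Here’s a sketch using Python’s standard library; the URLs are placeholders for your own pages:

```python
from xml.etree.ElementTree import Element, SubElement, tostring

def build_sitemap(urls):
    """Build a minimal XML Sitemap (sitemaps.org protocol) from a list of page URLs."""
    urlset = Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for page in urls:
        url = SubElement(urlset, "url")
        SubElement(url, "loc").text = page  # the page's full URL
    return tostring(urlset, encoding="unicode")

# Placeholder URLs standing in for your site's pages.
xml = build_sitemap([
    "http://www.yourdomain.com/",
    "http://www.yourdomain.com/new-page.html",
])
print(xml)
```

Save the result as sitemap.xml at your site root, then submit it in Webmaster Tools as described above. Real sitemaps often also carry lastmod dates per URL, which this sketch omits for brevity.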
Check Your XML Sitemap, Again.
One thing the engines have started doing is pushing for cleaner XML Sitemaps. Go back and check yours: do you see the new URL? Good. Do you also see the old URL? Get it out of there. Comb through the URLs and clean out the ones you don’t want indexed, or that you’ve replaced with new pages. As long as they are in there, the bots will consider them fair game to crawl and index. Also remove any DUST: Different URLs with the Same Text. This can be as simple as checking the XML Sitemap for duplicate URLs, one with and one without a trailing slash. Check which one actually loads on your site and remove the other from the list.
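The trailing-slash flavor of DUST is mechanical enough to clean up with a script. Here’s a small sketch that collapses slash/no-slash pairs in a URL list, with hypothetical URLs for illustration; you choose which form survives based on which one your server actually returns a 200 for:

```python
def clean_sitemap_urls(urls, keep_trailing_slash=False):
    """Collapse DUST pairs that differ only by a trailing slash, keeping one form.
    Set keep_trailing_slash to match the form your site actually serves."""
    cleaned = []
    seen = set()
    for url in urls:
        bare = url.rstrip("/")
        if bare in seen:
            continue  # the other variant of this URL was already kept
        seen.add(bare)
        cleaned.append(bare + "/" if keep_trailing_slash else bare)
    return cleaned

# Hypothetical sitemap entries with one DUST pair.
urls = [
    "http://www.yourdomain.com/services",
    "http://www.yourdomain.com/services/",
    "http://www.yourdomain.com/about",
]
print(clean_sitemap_urls(urls))
# ['http://www.yourdomain.com/services', 'http://www.yourdomain.com/about']
```

The same dedupe-by-canonical-key idea extends to other DUST patterns, such as URLs that differ only by case or by a tracking parameter.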
Submit That Specific Page URL to the Index.
Back in your Webmaster Tools account, navigate to Health, Fetch as Google. Enter the page information and click FETCH; the URL will be listed below. Then click Submit to Index, and hopefully you’ll get a notification reading URL submitted to index. The page still may not get picked up right away, but you should do this to cover all your bases.
Check for Duplicate Content.
You may not even realize it, but you could have more than one URL serving the same content. This is definitely frowned upon by the engines, and you need to resolve it as soon as you can. One way of fixing this is a 301 permanent redirect from an old page to a new one, as discussed earlier; that fixes the duplicate content issue in that instance. Another way is to tell Google straight up that you have two pages with similar content, through a canonical tag. A canonical tag tells Google that you have another page with the same content, and to give the credit to the primary URL. This way, you can keep both URLs up without suffering the consequences of engine penalties. A canonical tag looks like this: <link rel="canonical" href="http://www.yourdomain.com/PageYouWantToGiveCreditTo"/> Place this tag in the <head> section of each page that is a duplicate of the primary. You can also do this with mobile pages that are similar to your main site pages, using switchboard tags.
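When auditing, it helps to confirm which URL a duplicate page actually points its credit at. Here’s a sketch, in the same stdlib-parser style as the robots meta check above, that pulls the canonical href out of a page’s HTML (the sample page is a placeholder):

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Records the href of a <link rel="canonical"> tag, if one is present."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel", "").lower() == "canonical":
            self.canonical = attrs.get("href")

def find_canonical(html):
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical

# Placeholder duplicate-page HTML pointing credit at a hypothetical primary URL.
page = '<head><link rel="canonical" href="http://www.yourdomain.com/PageYouWantToGiveCreditTo"/></head>'
print(find_canonical(page))  # http://www.yourdomain.com/PageYouWantToGiveCreditTo
```

If find_canonical returns None on a page you meant to mark as a duplicate, the tag is missing; if it returns the wrong URL, the credit is flowing to the wrong page.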
Check for Links to That Page.
If you are still having issues after taking all of the above measures, see if anyone has linked to your content. Chances are, even though people can reach the page through your site, no one is linking to it yet. When an external site links to your page, it alerts Google to its whereabouts, helping it get picked up. Try talking about the new content via social media channels, linking back to the page. Also try linking to it in a new press release. Adding a more prominent link to the page on your own site can help too: if it’s new and buried pretty deep, the bots may simply not be getting to it when crawling your site. Try placing a link somewhere on the homepage, giving the page a better chance of getting crawled during the bots’ visits.
Hurry Up & Wait.
Now, after you’ve taken every measure you can think of to make that page search engine friendly and indexable, sit and wait. Check back every day to see if it’s been picked up; sometimes it just takes a while for the bots to get back to your site and crawl it.