By Catherine Potts
Mar 25, 2009
What do you do when the updated Meta Data for legacy videos on your site is not showing up in Google's results? You analyze.
What is a Google XML sitemap?
First, an HTML sitemap is the “representation of the architecture” of a web site, and it helps humans navigate the site. I tend toward visual examples (see picture), so the map below is very helpful in understanding what is going on. As you can see, the main site (in light blue) has a ton of pages branching out from it. The image is a little small, yes, but it is a great example. It’s easy to see that the pages on the lower side of the sitemap have less visibility.
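A Google XML Sitemap, by contrast, is written for search engines rather than humans: it is an XML file that lists the URLs of a site so that crawlers such as Googlebot can find them.
The XML sitemaps protocol also defines autodiscovery, i.e. how search engines can automatically discover a website's XML sitemaps. The answer is to link to the XML sitemap, e.g. sitemap.xml, from robots.txt.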
Instead of pointing to just one XML sitemap file for autodiscovery, you can list multiple sitemaps:
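As a minimal sketch, the robots.txt lines for several sitemaps could be generated like this. The `Sitemap:` directive is the one the protocol uses for autodiscovery. The domain and file names are placeholders, not our client's real URLs, and the helper name is made up for illustration.

```python
def robots_sitemap_lines(sitemap_urls):
    """Return one 'Sitemap:' directive per sitemap URL, ready to paste into robots.txt."""
    return [f"Sitemap: {url}" for url in sitemap_urls]

# Placeholder URLs (assumptions for illustration only).
lines = robots_sitemap_lines([
    "http://www.example.com/sitemap-videos.xml",
    "http://www.example.com/sitemap-articles.xml",
])
print("\n".join(lines))
```

Each sitemap gets its own line, and the URLs should be written out in full, including the domain.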
Or point to an XML sitemap index file:
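A sitemap index is itself a small XML file that lists other sitemap files. The sketch below builds one with Python's standard library. The URLs are placeholders and `build_sitemap_index` is a hypothetical helper, so treat it as an illustration rather than a finished tool.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap_index(sitemap_urls):
    """Return a sitemap index document (UTF-8 bytes) listing the given sitemap files."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = url
    # xml_declaration requires Python 3.8 or newer.
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)

# Placeholder sitemap URLs (assumptions for illustration only).
index_xml = build_sitemap_index([
    "http://www.example.com/sitemap-videos.xml",
    "http://www.example.com/sitemap-articles.xml",
])
print(index_xml.decode("utf-8"))
```

The robots.txt file then only needs a single `Sitemap:` line that points at the index file.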
Information about the XML sitemaps protocol:
- Each XML sitemap file can contain at most 50,000 URLs and be at most 10 MB in size.
- A sitemap index file can link up to 1,000 XML sitemaps.
- You can read our article about page priorities in XML sitemaps.
- XML sitemap files and sitemap index files have to be stored as UTF-8 documents.
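To get a feel for what these limits mean for a large site, here is a small back-of-the-envelope sketch. It uses only the URL-count limits quoted in the list above and ignores the 10 MB size limit. The function names are made up for illustration.

```python
MAX_URLS_PER_SITEMAP = 50_000   # per-file limit quoted in the list above
MAX_SITEMAPS_PER_INDEX = 1_000  # per-index limit quoted in the list above

def sitemap_files_needed(total_urls):
    """Return how many sitemap files are needed to hold total_urls (ceiling division)."""
    return (total_urls + MAX_URLS_PER_SITEMAP - 1) // MAX_URLS_PER_SITEMAP

def fits_in_one_index(total_urls):
    """Return True if all the required sitemap files fit in a single index file."""
    return sitemap_files_needed(total_urls) <= MAX_SITEMAPS_PER_INDEX

print(sitemap_files_needed(120_000))   # 3
print(fits_in_one_index(50_000_000))   # True
print(fits_in_one_index(50_000_001))   # False
```

In other words, with these numbers a single index file covers up to 50,000 x 1,000 = 50 million URLs.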
Example of an XML sitemap file:
<?xml version="1.0" encoding="UTF-8"?>
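After the XML declaration, a sitemap file consists of a `urlset` element with one `url` entry per page. Each entry has a `loc` and, optionally, a `lastmod` date. As a sketch, the following builds such a file. The page URLs and the date are placeholders, and `build_sitemap` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Build a sitemap document (UTF-8 bytes) from (url, last_modified) pairs."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(root, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:  # lastmod is optional in the protocol
            ET.SubElement(url, "lastmod").text = lastmod
    # xml_declaration requires Python 3.8 or newer.
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)

# Placeholder pages (assumptions for illustration only).
sitemap_xml = build_sitemap([
    ("http://www.example.com/", "2009-03-25"),
    ("http://www.example.com/videos/legacy-clip.html", None),
])
print(sitemap_xml.decode("utf-8"))
```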
Part of the issue is that our client has a huge site and is always adding content. Information and pages are added every day, so content is never really “retired” officially. It just ends up so far down the list that it has essentially dropped off the linked map. Googlebot is then unable to discover and index the page, and thus it ends up appearing as a gateway or doorway page. For our client, all the information is useful, so helping keep those pages indexed is key. This is why links are important to keeping content up to date and indexed by Google.
We took 5 sample pages of legacy videos from our client’s site that had duplicate Meta descriptions and titles in Google:
There are really no inbound links to these videos, either from the sitemap or from other website pages, that would allow Googlebot to get back in and re-index the videos and their new information. We have three possible fixes:
1. Internal linking: link the pages from somewhere within the site.
2. Create smaller, individual section sitemaps, including these videos, so that they can be linked from the sitemap and Googlebot can find them. (Better long-term solution.)
3. Build a few inbound links from other websites to these videos.
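For the second fix, one simple way to carve a big site into section sitemaps is to group URLs by their first path segment. This is only a sketch. It assumes the site's sections map onto top-level folders such as /videos/, which may not hold for every site, and the URLs are placeholders.

```python
from collections import defaultdict
from urllib.parse import urlparse

def group_by_section(page_urls):
    """Group URLs by their first path segment, e.g. '/videos/a.html' -> 'videos'."""
    sections = defaultdict(list)
    for url in page_urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        # Pages sitting directly under the root go into a catch-all 'root' section.
        section = segments[0] if len(segments) > 1 else "root"
        sections[section].append(url)
    return dict(sections)

# Placeholder URLs (assumptions for illustration only).
groups = group_by_section([
    "http://www.example.com/",
    "http://www.example.com/videos/legacy-clip.html",
    "http://www.example.com/videos/new-clip.html",
    "http://www.example.com/articles/sitemaps.html",
])
for section, urls in groups.items():
    print(section, len(urls))
```

Each group can then be written out as its own sitemap file and listed in a sitemap index.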
Some pages are indexed and some are not. There was a LIVE page on our client’s site that was not indexed. It was in the sitemap, though very low (the lower, the less visible), which makes it more likely that it is not getting indexed. Additionally, there were no inbound links to this particular page according to both Google and Yahoo.
The fix is the same as for the previous issue. We need Googlebot to find the pages, so we need to do some internal linking, as well as inbound linking and individual section sitemaps.
These pages may be seen as Doorway Pages by Google. Why is this a problem?
Doorway pages are intended to deceive the search engine in order to gain higher rankings. Such pages are meant specifically for spiders, and a visitor who lands on a doorway page is instantly redirected to the “real” website.
“As a rule of thumb, if you can’t reach the page by following the site navigation, then it is a doorway page. You are not supposed to “visit” the page. Instead, you are just supposed to find it in the search results and then click through to get to the site in question. In essence, a doorway page is no more than a one-page click-through advertisement for a website.” SEO Logic
Here is what Google has to say about it:
“Whether deployed across many domains or established within one domain, doorway pages tend to frustrate users, and are in violation of our webmaster guidelines. Therefore, we frown on practices that are designed to manipulate search engines and deceive users by directing them to sites other than the ones they selected, and that provide content solely for the benefit of search engines. Google may take action on doorway sites and other sites making use of these deceptive practices, including removing these sites from the Google index.”
The best fix for this unintended issue is to add internal links to the page within the client’s site. What we often suggest to clients who are not interested in promoting certain legacy material is to create an HTML sitemap that is tucked away in the footer. That sitemap can have dozens of links on it, which Googlebot CAN follow. However, Googlebot only indexes something like the first 1 MB of data on a page, so it’s possible to run out of room.
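The rule of thumb quoted earlier (a page you can't reach by following the site navigation looks like a doorway page) can be turned into a quick check. The sketch below walks a link graph from the home page and reports sitemap pages that are never reached. The link graph here is hand-written and hypothetical. A real audit would need a crawl of the site to produce it.

```python
from collections import deque

def unreachable_pages(links, home, sitemap_pages):
    """Return sitemap pages that cannot be reached from home by following links.

    links maps each page to the list of pages it links to.
    """
    seen = {home}
    queue = deque([home])
    while queue:  # breadth-first walk over internal links
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return sorted(p for p in sitemap_pages if p not in seen)

# Hypothetical link graph and sitemap contents (assumptions for illustration only).
links = {
    "/": ["/videos/"],
    "/videos/": ["/videos/new-clip.html"],
}
sitemap_pages = ["/", "/videos/", "/videos/new-clip.html", "/videos/legacy-clip.html"]
print(unreachable_pages(links, "/", sitemap_pages))  # ['/videos/legacy-clip.html']
```

Any page this reports is a candidate for an internal link, for example from the HTML sitemap in the footer.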
According to Google, an XML sitemap can have up to 50,000 URLs or be up to 10 MB in size, so it’s good to know how many URLs are in your site’s sitemap.
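Counting the URLs in an existing sitemap takes only a few lines. This sketch parses a sitemap held in memory and compares it against the two limits mentioned above. The sample document is a placeholder, and `sitemap_stats` is a hypothetical helper. For a real site you would read the bytes from your own sitemap.xml file.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS = 50_000
MAX_BYTES = 10 * 1024 * 1024  # 10 MB, the size limit quoted above

def sitemap_stats(xml_bytes):
    """Return the URL count, the byte size, and whether both limits are respected."""
    root = ET.fromstring(xml_bytes)
    url_count = len(root.findall(f"{SITEMAP_NS}url"))
    size = len(xml_bytes)
    return {
        "urls": url_count,
        "bytes": size,
        "within_limits": url_count <= MAX_URLS and size <= MAX_BYTES,
    }

# Placeholder sitemap (assumption for illustration only).
sample = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b"<url><loc>http://www.example.com/</loc></url>"
    b"<url><loc>http://www.example.com/videos/legacy-clip.html</loc></url>"
    b"</urlset>"
)
stats = sitemap_stats(sample)
print(stats["urls"], stats["within_limits"])  # 2 True
```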