There are many reasons why your website, or pages from it, are not indexed or have been dropped from Google's or Bing's index. The causes range from faulty coding practices and hardware problems to poor content, and may require investigation by an SEO expert. If your content is not being indexed, here are the issues you should investigate so that Google and Bing can readily find your content.
You have a missing or faulty robots.txt file, or you have mistakenly disallowed crawlers from crawling the site or selected pages. It may also be as simple as your XML sitemap not being referenced in the robots.txt file.
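You can check a robots.txt file against Python's standard-library parser before deploying it. The sketch below uses a hypothetical robots.txt for example.com in which a stray `Disallow: /` locks every crawler out of the whole site:

```python
import urllib.robotparser

# Hypothetical robots.txt contents; the wildcard "Disallow: /"
# blocks all crawlers from every page on the site.
robots_txt = """\
User-agent: *
Disallow: /

Sitemap: https://www.example.com/sitemap.xml
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# Googlebot is refused access to an ordinary page:
print(rp.can_fetch("Googlebot", "https://www.example.com/products.html"))  # → False
```

Running this kind of check against your live robots.txt is a quick way to confirm you have not accidentally disallowed the pages you want indexed.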
You have updated your site, and your 301 redirect process for the new URLs is faulty. Using JavaScript redirects instead of traditional server-side 301 redirects can be problematic for crawlers. Another problematic practice, when completely overhauling a website, is executing a mass 301 redirection of every old URL to a single page on the new website instead of mapping each old URL to its new equivalent.
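A one-to-one redirect map avoids the mass-redirect problem. The following is a minimal sketch with hypothetical URLs, where each retired URL gets its own 301 target rather than funnelling everything to the homepage:

```python
# Hypothetical mapping of old URLs to their new equivalents.
REDIRECT_MAP = {
    "/old/widgets.html": "/products/widgets/",
    "/old/about-us.html": "/about/",
}

def redirect_for(path):
    """Return a (status, location) pair for a retired URL, or None."""
    target = REDIRECT_MAP.get(path)
    if target is None:
        return None  # no redirect configured; serve the page or a 404
    return (301, target)

print(redirect_for("/old/widgets.html"))  # → (301, '/products/widgets/')
```

The same idea applies whether the map lives in application code, an Apache rewrite map, or an Nginx configuration: the crawler should receive a 301 pointing at the specific replacement page.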
You may have a simple canonical problem: your preferred URL is not set in Webmaster Tools, your pages lack rel="canonical" tags, or your XML sitemap lists non-canonical URLs.
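Declaring the preferred URL on the page itself is a one-line fix. A sketch, using a hypothetical address:

```html
<!-- Placed in the <head> of every duplicate or variant page,
     this points search engines at the preferred URL: -->
<link rel="canonical" href="https://www.example.com/products/widgets/" />
```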
You may have a faulty, partial, or malformed XML sitemap (broken links, invalid XML, etc.). An XML sitemap should contain a link to every indexable page on your website; a partial sitemap defeats the purpose of the XML sitemap protocol.
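For reference, a well-formed sitemap following the sitemaps.org protocol looks like this (URLs and dates are illustrative):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2015-01-15</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/products/widgets/</loc>
  </url>
  <!-- one <url> entry per indexable page -->
</urlset>
```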
Your programmer left a noindex robots meta tag in the page source code.
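This tag is easy to miss because it is often added deliberately on staging servers and then carried over to production by mistake. If it appears in a page's head, search engines will drop that page from their index:

```html
<!-- Tells all crawlers not to index this page or follow its links: -->
<meta name="robots" content="noindex, nofollow" />
```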
Your website has been hacked.
Your website was deemed spam by Google's spam detection process because of questionable content, links, or intent.
You may have misconfigured the URL Parameters options in Google Webmaster Tools, or you did not add and verify your new domain in your Webmaster Tools account.
Your hosting provider's server may have been down when the crawler tried to visit your website; if this happens several times, the crawler may conclude you are out of business. It is also possible that a firewall is blocking the crawler.
Your website is performance-challenged and takes an excessive amount of time to load. This can cause the crawler to time out, or the search engine may designate the page as low quality and choose not to index the content.
Your website is problematic for crawlers because its pages are dynamically generated and the URLs contain excessive parameters or session IDs, which the crawler is unable to parse reliably. It is better practice to present static HTML pages with clean URLs to the crawler when it comes to indexing your website.
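A rough heuristic for spotting crawler-unfriendly URLs can be sketched with the standard library. The parameter threshold and the session-ID names below are illustrative assumptions, not official search-engine limits:

```python
from urllib.parse import urlparse, parse_qs

# Common session-ID parameter names (illustrative list).
SESSION_KEYS = {"sessionid", "sid", "phpsessid", "jsessionid"}

def crawler_friendly(url, max_params=2):
    """Flag URLs that carry session IDs or an excessive number of
    query parameters. A heuristic sketch, not a definitive rule."""
    params = parse_qs(urlparse(url).query)
    if any(key.lower() in SESSION_KEYS for key in params):
        return False
    return len(params) <= max_params

print(crawler_friendly("https://example.com/item?id=42"))                   # → True
print(crawler_friendly("https://example.com/item?id=42&sid=af3&x=1&y=2"))   # → False
```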
Your critical content is password-protected or hidden behind registration forms.
Your website has poor link or coding quality. This can happen when you do not validate the source code to identify broken links and incomplete markup before going live. Another cause is missing support for the If-Modified-Since HTTP header in your server code.
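If-Modified-Since support lets a crawler skip re-downloading unchanged pages: the server answers 304 Not Modified instead of resending the full page. A minimal sketch of that logic, with a hypothetical page timestamp (most web servers and frameworks handle this for you when configured correctly):

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Hypothetical last-modified time for the page being served.
PAGE_LAST_MODIFIED = datetime(2015, 3, 1, tzinfo=timezone.utc)

def respond(if_modified_since=None):
    """Return (status, headers): 304 when the client's copy is current,
    otherwise 200 with a Last-Modified header. A sketch, not a full server."""
    if if_modified_since:
        client_copy = parsedate_to_datetime(if_modified_since)
        if PAGE_LAST_MODIFIED <= client_copy:
            return (304, {})  # unchanged: crawler keeps its cached copy
    return (200, {"Last-Modified": format_datetime(PAGE_LAST_MODIFIED)})

print(respond("Sun, 01 Mar 2015 00:00:00 GMT"))  # → (304, {})
print(respond()[0])                              # → 200
```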
Less Common But Possible
Your website has violated Google's quality guidelines. This could include using deceptive SEO practices such as cloaking. Other quality issues include:
- The page contains too many links.
- The page lacks clear, descriptive text (it is text-poor).
- The page is image-heavy, with no descriptive file names or alt text.
- The website lacks a clear navigational hierarchy.
- Pages have missing title or description text.
- The website has orphan pages – there are no static text links to the pages.
- Your website generates a large number of 404 errors.
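Two of the issues above, images without alt text and a missing title, can be caught with a small standard-library check. This is a sketch of the idea, not a full quality audit:

```python
from html.parser import HTMLParser

class QualityCheck(HTMLParser):
    """Flags images lacking alt text and the absence of a <title> tag."""
    def __init__(self):
        super().__init__()
        self.has_title = False
        self.images_missing_alt = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.has_title = True
        elif tag == "img" and not dict(attrs).get("alt"):
            self.images_missing_alt += 1

checker = QualityCheck()
checker.feed('<html><head></head><body><img src="photo1.jpg"></body></html>')
print(checker.has_title, checker.images_missing_alt)  # → False 1
```

Running a pass like this over your templates before going live catches problems the search engine would otherwise penalize.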
All of these are reasons why search engines may decide not to index a page, or may drop a site or page from their search index. Google knows of millions of URLs that have not been indexed and probably never will be, because they have been classified as spam or as low-quality content. Google is in the business of indexing high-quality content, and is not interested in allocating resources to poorly coded websites with marginal content.