We have built our own website crawler tool that's able to detect features that can help (or hinder) local search rankings — as well as normal organic ranking factors.
However, our crawler sometimes runs into issues and cannot crawl a website successfully. This is usually due to one or more of the following reasons:
1) Crawler is blocked by a third-party service such as Cloudflare
Services like Cloudflare are used to protect websites from being accessed by malicious services.
Cloudflare can't always tell that our crawler is perfectly safe, so it occasionally blocks us.
2) Website structure is non-standard
Websites can be built in many different ways. Our crawl success rate is high, at 93%, but we're sometimes asked to crawl a site whose navigation is implemented in a way that our crawler can't read. In these cases we can access the website's home page but can't reach deeper pages, because we're unable to identify the links on the home page to navigate from.
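To illustrate why some navigation can't be read, here is a minimal Python sketch. It is not how the BrightLocal crawler is implemented; it simply assumes a crawler that looks for standard `<a href="...">` links in a page's HTML. The `LinkExtractor` class, the `extract_links` function, and both sample navigation snippets are hypothetical and exist only for this example.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag found in static HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only standard anchor tags with a non-empty href count as links.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return the list of link targets found in the given HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


# Hypothetical navigation built with ordinary links.
standard_nav = '<nav><a href="/services">Services</a> <a href="/contact">Contact</a></nav>'

# Hypothetical navigation that relies on a script handler instead of links.
script_nav = "<nav><div onclick=\"goTo('services')\">Services</div></nav>"

print(extract_links(standard_nav))  # ['/services', '/contact']
print(extract_links(script_nav))    # []
```

In the second snippet a visitor can still click through to the page, but a crawler that relies on anchor tags finds nothing to follow and stops at the home page.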
We are continually extending the capabilities of our crawler to handle new and different navigation approaches. Should you encounter an issue when crawling a website, please report it to our customer success team, who will create a development ticket so that the site's structure can be investigated.
It's also worth checking whether unsuccessful crawls are confined to the BrightLocal crawler. You can quickly check whether Google is able to crawl your site by going to Google and using this query to search:
If Google displays lots of results for this specific domain, then it is being crawled by Google, and the crawling issue is, therefore, confined to BrightLocal. In these instances, we'll try to fix it for you.
3) Pages are not interlinked
Our crawler ignores the sitemap.xml file and instead scans through each page looking for links to other pages on your website, so it may not be able to find standalone pages that no other page links to.
If your website has pages which are not interlinked, our crawler cannot reach those pages — even if those pages are indexed by Google and other search engines.
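The effect of missing interlinks can be sketched in a few lines of Python. This is an illustration only, under the assumption that a link-following crawler starts at the home page and visits every page it can reach through links. The `reachable_pages` function and the sample `site` map, including the `/spring-offer` page, are hypothetical.

```python
from collections import deque


def reachable_pages(link_graph, start):
    """Return every page reachable from `start` by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in link_graph.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site: each page maps to the pages it links to.
site = {
    "/": ["/services", "/contact"],
    "/services": ["/", "/contact"],
    "/contact": ["/"],
    "/spring-offer": ["/contact"],  # links out, but no page links to it
}

found = reachable_pages(site, "/")
orphans = sorted(set(site) - found)
print(orphans)  # ['/spring-offer']
```

Adding a link to the standalone page from any page the crawler can already reach (for example, from the home page or a footer menu) is enough for it to be discovered.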