We have built our own website crawler, which detects features that help (or hinder) local search rankings as well as standard organic ranking factors.
Sometimes our crawler runs into issues. These are the most common reasons for not being able to crawl a website:
1) Crawler is blocked by a third-party service such as Cloudflare
Services like Cloudflare are used to protect websites from being accessed by malicious services.
Our crawler is not malicious, but Cloudflare can't always tell, so it occasionally blocks us.
2) Website structure is non-standard
Websites can be built in many different ways. We have a crawl success rate of 93%, but from time to time we’re asked to crawl a site whose navigation is implemented in a way that our crawler can't read. In that case we can access the homepage but not the deeper pages, because we can't identify any links on the homepage to follow.
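To illustrate the kind of problem described above, here is a minimal Python sketch. It is not BrightLocal's actual crawler code. It assumes a crawler that finds links by reading `<a href="...">` tags in a page's static HTML, and it uses two made-up homepage snippets: one with ordinary anchor links, and one whose navigation relies on a script handler (one possible example of a "non-standard" structure). The names `LinkCollector`, `extract_links`, `STANDARD_NAV`, and `SCRIPTED_NAV` are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags found in static HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only plain anchor tags count as links in this sketch.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return every anchor href in the given HTML string, in order."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


# Hypothetical homepage with ordinary anchor-based navigation.
STANDARD_NAV = '<nav><a href="/services">Services</a> <a href="/contact">Contact</a></nav>'

# Hypothetical homepage whose navigation depends on a script handler.
SCRIPTED_NAV = '<nav><div onclick="goTo(1)">Services</div></nav>'

print(extract_links(STANDARD_NAV))
print(extract_links(SCRIPTED_NAV))
```

Under these assumptions, the first snippet yields two links to follow, while the second yields none, so a crawl of that site would stop at the homepage.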
We are continually extending the capabilities of our crawler to handle new and different navigation approaches. If you encounter an issue crawling a website, please report it to our customer success team, and they will create a development ticket so that the site structure can be investigated.
It is also worth checking whether unsuccessful crawls are confined to the BrightLocal crawler. You can quickly check whether Google can crawl your site by going to Google and searching using this query:
If Google displays lots of results for this specific domain, then Google is crawling it. In that case, the crawling issue is confined to BrightLocal, and we'll try to fix it for you.
3) Pages are not interlinked
Our crawler may not be able to find standalone pages which aren’t linked to from other pages on your website.
Our crawler ignores the sitemap.xml file and scans through each page looking for links to other pages on your website. This task is repeated for each page our crawler finds.
If your website has pages which are not interlinked, then our crawler cannot reach those pages even if those pages are indexed by Google and other search engines.
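The link-following behaviour described above can be sketched in Python as a breadth-first traversal. This is an illustration only, not BrightLocal's implementation: the `SITE` dictionary is a made-up website in which each key is a page and each value is the list of pages it links to, and the `crawl` function is a hypothetical stand-in for the crawler.

```python
from collections import deque

# Hypothetical site: each page maps to the pages it links to.
SITE = {
    "/": ["/services", "/about"],
    "/services": ["/", "/contact"],
    "/about": ["/"],
    "/contact": ["/"],
    "/spring-offer": ["/contact"],  # no other page links here
}


def crawl(site, start="/"):
    """Follow links breadth-first from start; return the set of pages reached."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        # Scan the current page for links and queue any page not seen yet.
        for link in site.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


reachable = crawl(SITE)
unreachable = sorted(set(SITE) - reachable)
print("Reached:", sorted(reachable))
print("Never reached:", unreachable)
```

In this made-up site, `/spring-offer` links out to another page, but nothing links in to it, so a crawl that starts at the homepage and only follows links never discovers it. Adding a link to that page from any reachable page would fix this.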