When browsing the internet, the go-to tool is a search engine. It works wonders: you enter a keyword or two and, in a matter of milliseconds, see millions of related results. This enormous index is ever-changing, with thousands of webpages and websites being added and removed every day. With more than 1.8 billion registered domains and trillions of webpages, it is practically impossible to sort through them manually. That is where search engines enlist the help of web crawlers, a.k.a. web spiders.
They are computer programs that scour the web, ‘reading’ everything they find. These automatic indexers scan webpages for words and, depending on the spider, other forms of media, recording where that content appears. You can think of a search engine bot as a librarian of sorts: it takes the information it finds, sorts it into categories, and indexes it so that it can be found again when a request is made.
These spiders collect not just the visible contents of the page, but also its URL, meta tags, the links inside the page and their destinations, along with other relevant information. The bots also keep a record of where the most important keywords are found, so placing your keywords in the headings, the metadata and the first few sentences of the page tells the spiders directly what your webpage is about, which improves your SEO results.
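To make the collection step concrete, here is a minimal sketch of the parsing a crawler performs on a fetched page. It uses Python's standard-library `html.parser` to pull out the three things mentioned above: the title, the meta tags, and the outgoing links. The `PageIndexer` class name and the sample HTML are illustrative inventions, not part of any real crawler.

```python
from html.parser import HTMLParser


class PageIndexer(HTMLParser):
    """Collects the pieces of a page a crawler typically records:
    the <title>, the meta tags, and the outgoing links."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}       # meta tag name -> content
        self.links = []      # href values found in <a> tags
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and "name" in attrs:
            self.meta[attrs["name"]] = attrs.get("content", "")
        elif tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# A tiny made-up page standing in for a fetched document.
html_doc = """
<html><head>
<title>Gardening Tips</title>
<meta name="description" content="Beginner gardening tips and tools">
</head><body>
<h1>Gardening Tips</h1>
<p>See our <a href="/tools">tool guide</a> and
<a href="https://example.com/seeds">seed shop</a>.</p>
</body></html>
"""

indexer = PageIndexer()
indexer.feed(html_doc)
print(indexer.title)                 # Gardening Tips
print(indexer.meta["description"])  # Beginner gardening tips and tools
print(indexer.links)                # ['/tools', 'https://example.com/seeds']
```

A real spider would then queue each discovered link for its own crawl, which is how it hops from page to page across the web.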
Although web crawlers are designed and programmed by the search engines themselves, there is a way for you to provide instructions on how you want the spiders to crawl your site. This is done by creating a robots.txt file with specific directives that tell the bots which parts of the site to crawl and which to skip. To learn more about the robots.txt file and how to create it, check out this article.
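As a brief illustration, a robots.txt file is plain text placed at the root of your site (e.g. example.com/robots.txt). The paths below are made up for the example; a minimal file might look like this:

```
# Rules for all crawlers
User-agent: *
# Keep bots out of this directory
Disallow: /admin/
# ...but allow this subdirectory within it
Allow: /admin/public/
# Point crawlers at the sitemap
Sitemap: https://example.com/sitemap.xml
```

Note that these directives are requests, not enforcement: well-behaved crawlers honor them, but nothing technically prevents a rogue bot from ignoring the file.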