Evangelism by Search Engine – Part 2 - Basic Requirements
Posted by Bill Anderton
Today’s blog posting will address the most basic of requirements for doing Evangelism by Search Engine: having a website and making its pages “crawlable” in order to feed the indexes of the search engines.
Search engines work by using highly automated, purpose-built software that resides on the search engines’ servers to crawl the web, going from website to website and harvesting the text on each page along with other information and metadata.
The crawlers are, in essence, very specialized web browsers that, instead of displaying pages on a screen for a human to read, harvest the information on the pages they crawl and write it to databases on the search engines’ servers. The harvested data are then further processed by additional software to build the indexes of websites that are searched whenever we enter a query into a search engine.
As an aside, the metaphor of these automated robots crawling the web has led to crawlers also being called “spiders” in web jargon.
In order to do Evangelism by Search Engine, we first have to have web pages that can be found and crawled, and their information harvested and entered into the search engines’ indexes that are used for searching. If we don’t, our web pages, and even our whole websites, are “non-entities” in any search; our pages and sites can’t be presented to users because they are missing from the indexes.
The first requirement of doing Evangelism by Search Engine is to …. well …. have a website! I’m not trying to be sarcastic with this rather obvious statement; it is worth mentioning because many churches simply don’t have any type of website. Some surveys have indicated that perhaps 25% of all churches in North America have no website at all and, additionally, that perhaps 50% of all churches have only very small, rarely updated websites of just a few pages.
Both of these circumstances will limit how effective Evangelism by Search Engine can be and will be individually addressed in this blog posting.
Obviously, to use search engines for evangelism, you have to have a website. Yeah, I know, pretty basic. If you don’t have a website, get one! Also yeah, I know, easy for me to say. If your church doesn’t have a website by now, you are likely struggling with building one. However, fight the good fight; face your challenges and get your site built.
To help, see my development series that just finished:
If you don’t have any website at all, this Evangelism-by-Search-Engine series will be ahead of your ability to use its advice until you get your website up and running. However, keep these things in mind as you plan your new website. If you do, you will be able to go directly into evangelism upon the launch of your new website. Also, everything written about here is just basic common sense and best practices that will help your human visitors too.
For those of you who have very small, rarely updated websites, I first have to introduce you to the “advantage of large numbers.” A website with only a few pages will not have much impact in the search engines (or with human visitors, for that matter). Conversely, a website with lots of pages will make a far larger impact.
Sites with lots of pages will get at least proportionally more referrals from search engines than smaller sites, and often disproportionately more. Simply put, a site with hundreds or thousands of pages will be sent many more visitors by the search engines than a church site with three to six pages.
Referrals from search engines are based on users’ queries matching individual web pages, NOT websites as a whole. Sites with more pages will have statistically higher probabilities of matching more users’ queries and, therefore, will get more referrals.
It may surprise some of you to learn that even very small churches have sufficient content to easily produce websites of two to three hundred pages. With a little work, these same churches could easily increase their page inventory to a couple of thousand pages. I will write more about how even very small churches can do this in future blog postings.
Any church, regardless of size or available budget, can produce enough pages to do Evangelism by Search Engine.
Large sites often have other advantages too, which come from having people administering the website who pay attention to optimizing it for being found by search engines. Websites that pay even basic attention (and give some value) to search engine optimization are consistently rewarded with better results and disproportionately more visitors than those that don’t. They get more seekers knocking on their virtual front doors than websites that do little or nothing.
With a working domain stocked with as many pages as possible, the next step is to make sure that all of your pages are crawlable; that is, that the search engines’ web crawlers can find and crawl all of your pages for indexing.
If the search engine crawlers can’t find your pages to read them and report what they find back to the indexing systems on the search engines’ servers, you won’t show up in the search engines’ indexes. When people search on a query that might otherwise match one of your pages, you won’t be shown at all because your pages are absent from the indexes.
Some webmasters make the mistake of not making it easy for crawlers to find all of their pages. Review your site and make sure that all of your pages can be easily found. Making your content easy for crawlers to find also helps your human visitors find it. This is a good thing.
Crawlers find pages just like we humans do: they follow the hyperlinks embedded in web pages. They discover your site in the first place because other sites have linked to it. Once inside your website, the crawler then looks for links to all of the pages contained within your site (as well as links to other sites). By following the links to your own pages, the crawler ultimately discovers all of the pages you make available in your browseable interface (all of the links that users can click to get around your site).
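The link-following described above can be sketched in a few lines of code. This is a minimal illustration using Python’s standard library, not a real crawler; the page markup and the example.com/example.org URLs are made up for the demonstration:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL,
    the same way a crawler harvests links from a fetched page."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Relative links are resolved against the page's URL.
                    self.links.append(urljoin(self.base_url, value))

# A hypothetical page on a church site with two internal links and one external.
page = """<html><body>
  <a href="/about.html">About us</a>
  <a href="/sermons/index.html">Sermons</a>
  <a href="https://example.org/">A friend's site</a>
</body></html>"""

parser = LinkExtractor("https://www.example.com/")
parser.feed(page)
print(parser.links)
```

A real crawler would then fetch each newly discovered internal URL and repeat the process, which is how every reachable page in your browseable interface eventually gets harvested.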
The links in your browseable interface can be in any of your navigation (primary, secondary and tertiary) as well as links simply embedded in the text of a page.
The designs of your various navigation schemes are very important not only for your human visitors but also for any crawlers visiting your site. Also remember that crawlers might not come in through your home page; they can enter your site from anywhere and land on any page at any time. For example, one of the ways crawlers find your pages is by crawling other sites that contain one or more links to one or more of your pages. The crawlers will use these links harvested from third-party sites to enter your site via the linked page (which might not be your home page). From that landing page, the crawlers (and your human visitors too) should be able to easily navigate to the other pages in your site.
Don’t lock up or hide your public content; make it easy to find with as few clicks as possible regardless of how or where the crawler (or human user) enters your site.
There are also two important technical things you can do to make your website very crawler friendly: put both a robots.txt file and a sitemap.xml file in your web root folder.
The robots.txt file is a simple text file that follows the Robots Exclusion Standard. See http://en.wikipedia.org/wiki/Robots_exclusion_standard and http://www.robotstxt.org/ for more information.
The robots.txt file lets any webmaster give certain instructions to crawlers coming into the website. These instructions are placed in a text file called robots.txt in the root of the web site hierarchy (e.g. https://www.example.com/robots.txt). The robots.txt file contains certain defined instructions in a specific format. Well-behaved crawlers that elect to follow the instructions (compliance is voluntary on the part of the people building the crawler) first fetch this file and read the instructions before attempting to fetch any other file or page from the web site. If the robots.txt file doesn’t exist, crawlers coming into a website assume that the webmaster wishes to provide no specific instructions, and crawl everything they can find in the entire website through the browsable interface.
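As an illustration, a typical church-site robots.txt might look like the fragment below. The /private/ folder is a made-up example of an area a webmaster might not want indexed; a site with nothing to exclude can simply use an empty Disallow line:

```
User-agent: *
Disallow: /private/
```

The “User-agent: *” line means the rule applies to all crawlers; each “Disallow” line names a path prefix that compliant crawlers should not fetch.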
The robots.txt file can be made with any program/text editor (like Notepad) to by various generators that can be found on the Internet.
Some crawlers also support a “Sitemap” directive in the robots.txt file that lets the webmaster list the locations of one or more sitemap.xml files so that crawlers can discover them. The sitemap.xml file provides additional important information to the search engines that use the standard. Google introduced the standard, but it is also supported by Bing, Yahoo and Ask.
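Python’s standard library happens to include a robots.txt parser, which makes it easy to see how a compliant crawler interprets these instructions, including the Sitemap directive. A minimal sketch, with a made-up robots.txt for the placeholder domain example.com:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that blocks a /private/ folder and
# advertises the sitemap location via the "Sitemap" directive.
robots_txt = """\
User-agent: *
Disallow: /private/

Sitemap: https://www.example.com/sitemap.xml
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# A well-behaved crawler checks each URL against the rules before fetching it.
print(rp.can_fetch("*", "https://www.example.com/sermons/easter.html"))  # True
print(rp.can_fetch("*", "https://www.example.com/private/notes.html"))   # False
print(rp.site_maps())  # the sitemap URLs listed in the file (Python 3.8+)
```

This is exactly the check a compliant crawler performs on arrival: fetch robots.txt first, then consult it before requesting anything else.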
The sitemap.xml protocol allows a webmaster to advise search engines about URLs in their website that are available for crawling. See http://en.wikipedia.org/wiki/Sitemaps and http://www.sitemaps.org/ for more information.
The sitemap.xml file lists the URLs for a site along with certain other information, such as each URL’s relative importance, how often it typically changes and when it was last updated. The sitemap.xml file provides this information to the search engines, allowing them to plan, schedule and crawl websites that employ sitemaps more intelligently.
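A small sitemap.xml following the standard looks like the fragment below; the URLs and dates are placeholders for illustration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2013-05-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://www.example.com/sermons/easter.html</loc>
    <lastmod>2013-04-28</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>
```

Only the loc element is required for each URL; lastmod, changefreq and priority are optional hints that help the search engines decide what to re-crawl and when.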
Sitemaps are particularly beneficial on websites where some areas of the site are not reachable through the browseable interface, where pages are new or change frequently, or where the site is so large or so deeply linked that crawlers might otherwise overlook pages.
Sitemaps supplement and do not replace the existing crawl-based mechanisms that search engines already use to discover pages in websites. Using this protocol does not guarantee that web pages will be included in search indexes and their use does not influence the way that pages are ranked in search results.
The use of a sitemap.xml file can, however, greatly speed up the process of keeping the search engines’ indexes up to date. This is very important when visitors might be searching for timely events or other information. If you post something a week in advance of an event but it takes two weeks for the search engines to find the new information and get it indexed, you are out of luck for having the search engines send visitors to your event information in a timely fashion. By using a sitemap, you can greatly reduce the time it takes for the search engines to update their indexes by making them aware of what is new or what has changed; updates to the search engines’ indexes overnight are common, and updates within hours or even minutes can happen for highly trafficked sites.
Sitemap.xml files can be made manually with text editors according to the established standard, or with personal crawler-based applications made for crawling your own site and producing the properly formatted XML file. Also, some advanced content management systems (CMS) automatically generate the sitemap.xml file and update it every time a page in the site is updated or a new page is added. This is very helpful because it tells the search engines very quickly about all new and updated pages in the site that should be crawled anew or re-crawled. This helps get new material into the search engines’ indexes as soon as possible and gets the search engines to update their indexes when pages are modified. The use of these features in CMS-based sites requires much less “housekeeping” work, and the sitemaps are kept up to date in real time.
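For sites without a CMS that generates sitemaps automatically, even a short script can produce a valid file from a list of pages. This is a minimal sketch, not a full implementation of the standard (it emits only the required loc element plus lastmod); the page URLs are placeholders:

```python
from datetime import date
from xml.sax.saxutils import escape

def build_sitemap(pages):
    """Build a minimal sitemap.xml document from (url, last_modified) pairs."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for url, lastmod in pages:
        lines.append("  <url>")
        lines.append(f"    <loc>{escape(url)}</loc>")          # escape &, <, > in URLs
        lines.append(f"    <lastmod>{lastmod.isoformat()}</lastmod>")
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines)

pages = [
    ("https://www.example.com/", date(2013, 5, 1)),
    ("https://www.example.com/sermons/easter.html", date(2013, 4, 28)),
]
sitemap = build_sitemap(pages)
print(sitemap)
```

The resulting text would be saved as sitemap.xml in the web root and regenerated whenever pages are added or changed.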
All websites hoping to do Evangelism by Search Engine should have both a robots.txt file and a sitemap.xml file that is updated regularly (in real time or daily if possible, or at least a couple of times per week if you are using a crawler to make your sitemap.xml file manually).
You can test the efficacy of your basic work of getting your pages found and indexed by finding out, and keeping track of, the number of your pages actually contained in the index of each search engine. You can see what is in a search engine’s index by putting the following command into its search box:
“site:www.mydomain.com” (without the quotes and replacing "www.mydomain.com" with your own domain)
Using this command, the search engine will return a list of all of the pages for the specified site that are currently in its index. The results page will also show a count of the total pages indexed.
Getting your pages found by search engines and included in their indexes is only the first step in doing Evangelism by Search Engine. As basic as this step is, it will get you started and produce modest results without additional work.
However, in the remainder of this series, I will write about additional best practices you can use to make Evangelism by Search Engine much more effective. Many of the practices are extremely easy to do but will yield greatly improved results.
Tomorrow, I will write about how to write the text for your web pages. The words that you use in each of your pages have a big impact on determining how the pages are indexed and how many people find your pages.
Category: (05-13) May 2013