Get updates on SEO news and build your knowledge of both organic and paid SEO.
Ask crawler
Q: What is a website crawler?
A: A website crawler is a software program designed to follow hyperlinks throughout a Web site, retrieving and indexing pages to document the site for searching purposes. The crawlers are innocuous and cause no harm to an owner's site or servers.
Q: Why does Ask use website crawlers?
A: Ask utilizes website crawlers to collect raw data and gather information that is used in building our ever-expanding search index. Crawling ensures that the information in our results is as up-to-date and relevant as it can possibly be. Our crawlers are well designed and professionally operated, providing an invaluable service that is in accordance with search industry standards.
Q: How does the Ask crawler work?
* The crawler goes to a Web address (URL) and downloads the HTML page.
* The crawler follows hyperlinks from the page, which are URLs on the same site or on different sites.
* The crawler adds new URLs to its list of URLs to be crawled. It continually repeats this function, discovering new URLs, following links, and downloading them.
* The crawler excludes some URLs if it has downloaded a sufficient number from the Web site or if it appears that the URL might be a duplicate of another URL already downloaded.
* The files of crawled URLs are then built into a search catalog. These URLs are displayed as part of search results on sites powered by Ask's search technology when a relevant match is made.
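The steps above amount to a breadth-first crawl loop with duplicate filtering, which can be sketched in a few lines of Python. This is an illustrative sketch, not Ask's actual crawler; `fetch_page` and `extract_links` are hypothetical stand-ins for real HTTP and HTML-parsing code:

```python
from collections import deque

def crawl(seed_urls, fetch_page, extract_links, max_pages=100):
    """Follow links from seed URLs, skipping URLs already seen."""
    seen = set(seed_urls)           # URLs discovered so far
    queue = deque(seed_urls)        # URLs waiting to be crawled
    catalog = {}                    # downloaded pages, keyed by URL
    while queue and len(catalog) < max_pages:
        url = queue.popleft()
        page = fetch_page(url)      # download the page
        catalog[url] = page         # store it for the search catalog
        for link in extract_links(page):
            if link not in seen:    # exclude duplicates already queued
                seen.add(link)
                queue.append(link)
    return catalog
```

A real crawler adds politeness delays, robots.txt checks, and duplicate-content detection on top of this basic loop.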
Q: How frequently will the Ask Crawler download pages from my site?
A: The crawler will download only one page at a time from your site (specifically, from your IP address). After it receives a page, it will pause a certain amount of time before downloading the next page. This delay time may range from 0.1 second to hours. The quicker your site responds to the crawler when it asks for pages, the shorter the delay.
Q: Can I prevent Teoma/Ask search engine from showing a cached copy of my page?
A: Yes. We obey the "noarchive" meta tag. If you place the following command in your HTML page, we will not provide an archived copy of the document to the user.
<META NAME="ROBOTS" CONTENT="NOARCHIVE">
If you would like to specify this restriction just for Teoma/Ask, you may use "TEOMA" in place of "ROBOTS".
Q: Does Ask observe the Robot Exclusion Standard?
A: Yes, we obey the 1994 Robots Exclusion Standard (RES), which is part of the Robot Exclusion Protocol. The Robots Exclusion Protocol is a method that allows Web site administrators to indicate to robots which parts of their site should not be visited by the robot. For more information on the RES, and the Robot Exclusion Protocol, please visit http://www.robotstxt.org/wc/exclusion.html.
Q: Can I prevent the Ask crawler from indexing all or part of my site/URL?
A: Yes. The Ask crawler will respect and obey commands that direct it not to index all or part of a given URL. To specify that the Ask crawler visit only pages whose paths begin with /public, include the following lines in your robots.txt file:
# Allow only specific directories
User-agent: Teoma
Disallow: /
Allow: /public
Q: Where do I put my robots.txt file?
A: Your file must be at the top level of your Web site, for example, if www.mysite.com is the name of your Web site, then the robots.txt file must be at http://www.mysite.com/robots.txt.
Q: How can I tell if the Ask crawler has visited my site/URL?
A: To determine whether the Ask crawler has visited your site, check your server logs. Specifically, you should be looking for the following user-agent string:
User-Agent: Mozilla/2.0 (compatible; Ask Jeeves/Teoma)
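For example, a short Python script can scan your access log for that user-agent string. The log lines below are illustrative samples, not a real log format specification:

```python
# Scan web server log lines for visits from the Ask crawler.
ASK_UA = "Ask Jeeves/Teoma"

def ask_crawler_hits(log_lines):
    """Return the log lines whose user-agent field mentions the Ask crawler."""
    return [line for line in log_lines if ASK_UA in line]

# Sample log lines for illustration (the IPs and format are made up):
sample = [
    '66.235.124.7 - - "GET /index.html HTTP/1.1" 200 "Mozilla/2.0 (compatible; Ask Jeeves/Teoma)"',
    '10.0.0.5 - - "GET /index.html HTTP/1.1" 200 "Mozilla/5.0 (X11; Linux)"',
]
print(len(ask_crawler_hits(sample)))
```

In practice you would read the lines from your server's access log file instead of a hard-coded list.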
Q: How can I prevent the Ask crawler from indexing my page or following links from a particular page?
A: If you place the following command in the <head> section of your HTML page, the Ask crawler will not index the document and, thus, it will not be placed in our search results:
<META NAME="ROBOTS" CONTENT="NOINDEX">
The following command tells the Ask crawler to index the document, but not follow hyperlinks from it:
<META NAME="ROBOTS" CONTENT="NOFOLLOW">
You may turn all directives off by using the following:
<META NAME="ROBOTS" CONTENT="NONE">
See http://www.robotstxt.org/wc/exclusion.html#meta for more information.
Q: Why is the Ask crawler downloading the same page on my site multiple times?
A: Generally, the Ask crawler should only download one copy of each file from your site during a given crawl. There are two exceptions:
* A URL may contain commands that "redirect" the crawler to a different URL. This may be done with the HTML command:
<META HTTP-EQUIV="REFRESH" CONTENT="0; URL=http://www.your page address here.html">
or with the HTTP status codes 301 or 302. In this case the crawler downloads the second page in place of the first one. If many URLs redirect to the same page, then this second page may be downloaded many times before the crawler realizes that all these pages are duplicates.
* An HTML page may be a "frameset." Such a page is formed from several component pages, called "frames." If many frameset pages contain the same frame page as components, then the component page may be downloaded many times before the crawler realizes that all these components are the same.
Q: Why is the Ask crawler trying to download incorrect links from my server? Or from a server that doesn't exist?
A: It is a property of the Web that many links will be broken or outdated at any given time. Whenever any Web page contains a broken or outdated link to your site, or to a site that never existed or no longer exists, Ask will visit that link trying to find the Web page it references. This may cause the crawler to ask for URLs which no longer exist or which never existed, or to try to make HTTP requests on IP addresses which no longer have a Web server or never had one. The crawler is not randomly generating addresses; it is following links. This is why you may also notice activity on a machine that is not a Web server.
Q: How did the Ask Website crawler find my URL?
A: The Ask crawler finds pages by following links (HREF tags in HTML) from other pages. When the crawler finds a page that contains frames (i.e., it is a frameset), the crawler downloads the component frames and includes their content as part of the original page. The Ask crawler will not index the component frames as URLs themselves unless they are linked via HREF from other pages.
Q: What types of links does the Ask crawler follow?
A: The Ask crawler will follow HREF links, SRC links and re-directs.
Q: Can I control the rate at which the Ask crawler visits my site?
A: Yes. We support the "Crawl-Delay" robots.txt directive. Using this directive you may specify the minimum delay between two successive requests from our spider to your site.
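For example, a robots.txt entry asking the Teoma crawler to wait at least ten seconds between requests might look like this (the delay value is illustrative):

```
User-agent: Teoma
Crawl-Delay: 10
```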
Q: Why has the Ask crawler not visited my URL?
A: If the Ask crawler has not visited your URL, it is because we did not discover any link to that URL from other pages (URLs) we visited.
Q: Does Ask crawler support HTTP compression?
A: Yes, it does. Both the HTTP client and server must support compression for the feature to work. When supported, it lets web servers send compressed documents (compressed using gzip or other formats) instead of the originals, resulting in significant bandwidth savings for both the server and the client. There is a small CPU overhead at both ends for encoding and decoding, but it is usually worth it: using a popular compression method such as gzip, one can easily reduce file size by about 75%.
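The saving is easy to demonstrate with Python's standard-library gzip module. The sample HTML page below is invented for illustration; real pages compress less or more depending on how repetitive their markup is:

```python
import gzip

# A deliberately repetitive sample page (markup repeats, so gzip does well).
html = ("<html><body>"
        + "<p>Ask crawler FAQ example paragraph.</p>" * 200
        + "</body></html>").encode("utf-8")

compressed = gzip.compress(html)
saving = 1 - len(compressed) / len(html)
print(f"original: {len(html)} bytes, gzipped: {len(compressed)} bytes, "
      f"saving: {saving:.0%}")
```

A server advertises support by honoring the client's `Accept-Encoding: gzip` request header and responding with `Content-Encoding: gzip`.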
Q: How do I register my site/URL with Ask so that it will be indexed?
A: We appreciate your interest in having your site listed on Ask.com and the Ask.com search engine. Your best bet is to follow the open-format Sitemaps protocol, which Ask.com supports. Once you have prepared a sitemap for your site, add the sitemap auto-discovery directive to robots.txt, or submit the sitemap file directly to us via the ping URL. (For more information on this process, see Does Ask.com support sitemaps?) Please note that sitemap submissions do not guarantee the indexing of URLs.
Create your Web site and set up your Web server to optimize how search engines look at your site's content, and how they index and trigger based upon different types of search keywords. You'll find a variety of resources online that provide tips and helpful information on how to best do this.
Q: Why aren't the pages the Ask crawler has indexed showing up in the search results at Ask.com?
A: If you don't see your pages indexed in our search results, don't be alarmed. Because we are so thorough about the quality of our index, it takes some time for us to analyze the results of a crawl and then process the results for inclusion into the database. Ask does not necessarily include every site it has crawled in its index.
Q: How do I authenticate the Ask crawler?
A: The User-Agent string is no guarantee of authenticity, as it is trivial for a malicious user to mimic the properties of the Ask crawler. To properly authenticate the Ask crawler, a round-trip DNS lookup is required. First, take the IP address of the Ask crawler and perform a reverse DNS lookup, ensuring that the resulting host name belongs to the ask.com domain. Then perform a forward DNS lookup on that host name, ensuring that the resulting IP address matches the original.
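The round-trip check can be sketched as follows. The resolver functions are injected as parameters so the logic is testable without network access; in production you would pass Python's real resolvers, e.g. `lambda ip: socket.gethostbyaddr(ip)[0]` and `socket.gethostbyname`:

```python
def is_ask_crawler(ip, reverse_lookup, forward_lookup):
    """Round-trip DNS check: ip -> hostname in ask.com -> same ip."""
    try:
        hostname = reverse_lookup(ip)       # reverse DNS: IP -> host name
    except OSError:
        return False
    # The host name must belong to the ask.com domain.
    if not (hostname == "ask.com" or hostname.endswith(".ask.com")):
        return False
    try:
        # Forward DNS: the host name must resolve back to the original IP.
        return forward_lookup(hostname) == ip
    except OSError:
        return False
```

Rejecting on either a mismatched domain or a mismatched forward resolution defeats both forged User-Agent strings and forged reverse-DNS records.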
Q: Does Ask.com support sitemaps?
A: Yes, Ask.com supports the open-format Sitemaps protocol. Once you have prepared the sitemap, add the sitemap auto-discovery directive to robots.txt as follows:
Sitemap: http://www.the URL of your sitemap here.xml
The sitemap location should be the full sitemap URL. Alternatively, you can also submit your sitemap through the ping URL:
http://submissions.ask.com/ping?sitemap=http%3A//www.the URL of your sitemap here.xml
Please note that sitemap submissions do not guarantee the indexing of URLs. To learn more about the protocol, please visit the Sitemaps web site at http://www.sitemaps.org.
Q: How can I add Ask.com search to my site?
A: We've made this easy: you can generate the necessary code here.