How to use robots.txt correctly, in detail

The robots.txt file is an important channel through which a site communicates with search-engine spiders. Through this file, a site declares the parts of its content that it does not want search engines to index, or restricts a specified search engine to indexing only certain parts. Note that you only need a robots.txt file if your site contains content that should not be indexed; if you want search engines to index everything on your site, do not create a robots.txt file.

What is the robots.txt file?

The robots.txt file is usually placed in the site's root directory. Comments can be added using "#", following the conventions of UNIX configuration files. The file contains one or more records separated by blank lines (each line terminated by CR, CR/LF, or LF). A record typically starts with one or more User-agent lines, followed by any number of Disallow and Allow lines, in the format described below:
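As a sketch of this format, a small robots.txt file might look like the following (the bot name and paths are illustrative, not prescriptive):

```
# Comments start with "#", as in UNIX configuration files.

# First record: applies only to a bot named "SomeBot" (illustrative)
User-agent: SomeBot
Disallow: /admin/
Allow: /admin/public/

# Second record, separated by a blank line: applies to all other robots
User-agent: *
Disallow: /tmp/
```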

User-agent: the value of this field names the search-engine robot that the record applies to. A robots.txt file must contain at least one User-agent record; if the file contains several User-agent records, then several robots are restricted by the file. If the value is set to *, the record applies to every robot, and only one "User-agent: *" record may appear in the file. If the file contains "User-agent: SomeBot" followed by a number of Disallow and Allow lines, then the robot named SomeBot is bound only by the Disallow and Allow lines that follow that User-agent line.
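This per-record scoping can be checked with Python's standard-library `urllib.robotparser`. A minimal sketch, assuming hypothetical bot names "SomeBot" and "OtherBot" and a hypothetical /private/ path:

```python
from urllib import robotparser

# Hypothetical robots.txt: "SomeBot" is bound only by its own record;
# every other robot falls through to the catch-all "*" record.
rules = """\
User-agent: SomeBot
Disallow: /private/

User-agent: *
Disallow:
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# SomeBot is restricted by the Disallow line under its own User-agent record...
print(rp.can_fetch("SomeBot", "http://example.com/private/page.html"))   # False
# ...while other robots match the "*" record, whose empty Disallow allows everything.
print(rp.can_fetch("OtherBot", "http://example.com/private/page.html"))  # True
```

An empty Disallow value, as in the "*" record above, means nothing is blocked for the robots that match it.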

Disallow: the value of this field describes a set of URLs that should not be visited. The value can be a complete path or a non-empty path prefix; any URL that begins with the value of a Disallow entry will not be accessed by the robot. For example, "Disallow: /help" forbids the robot from accessing /help.html, /helpabc.html, and /help/index.html, while "Disallow: /help/" allows it to access /help.html and /helpabc.html but not /help/index.html.
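The prefix-matching behavior of these two Disallow values can be verified with `urllib.robotparser` (example.com stands in for any site):

```python
from urllib import robotparser

# "Disallow: /help" matches every URL whose path starts with /help.
rp = robotparser.RobotFileParser()
rp.parse("""\
User-agent: *
Disallow: /help
""".splitlines())
print(rp.can_fetch("*", "http://example.com/help.html"))        # False
print(rp.can_fetch("*", "http://example.com/helpabc.html"))     # False
print(rp.can_fetch("*", "http://example.com/help/index.html"))  # False

# "Disallow: /help/" matches only paths under the /help/ directory.
rp2 = robotparser.RobotFileParser()
rp2.parse("""\
User-agent: *
Disallow: /help/
""".splitlines())
print(rp2.can_fetch("*", "http://example.com/help.html"))        # True
print(rp2.can_fetch("*", "http://example.com/helpabc.html"))     # True
print(rp2.can_fetch("*", "http://example.com/help/index.html"))  # False
```

The trailing slash therefore decides whether sibling files like /help.html are also blocked, which is a common source of accidental over-blocking.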
