• shanghailongfeng
  • How to find spider crawling patterns through web logs

    For webmasters, getting pages indexed by Baidu is the top concern, so understanding how the Baidu spider crawls is a must-have skill for improving indexing. Many websites today run on virtual hosting, and most hosts provide access logs. The logs are kept in the logfiles folder under the site's root folder, as text files named by date (date.txt). There are already plenty of guides on spotting spiders by checking the HTTP return codes, so I won't repeat that here. More and more hosts now provide logs in a format that log-viewing software can't read, typically like this:

    , 03:28:34, GET, /goods.php,, 200, 34696, 390

    First, 03:28:34: the access time.

    Second, GET: the request method, here a request for the page /goods.php.

    Third: the source IP that accessed the website.

    Fourth, 200: the status code, meaning the access succeeded.

    Fifth, 34696 390: the record size.
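    Because hosts lay these fields out differently, a parser can pick each field out by its shape rather than by its position. Below is a minimal Python sketch, assuming the fields are comma-separated as in the sample line above. The names `parse_line` and `LogRecord` are hypothetical, made up for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

TIME_RE = re.compile(r"^\d{2}:\d{2}:\d{2}$")       # e.g. 03:28:34
IP_RE = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$")   # dotted IPv4 address
METHODS = {"GET", "POST", "HEAD"}

@dataclass
class LogRecord:
    time: Optional[str] = None
    method: Optional[str] = None
    path: Optional[str] = None
    ip: Optional[str] = None
    status: Optional[int] = None

def parse_line(line: str) -> LogRecord:
    """Pick out the fields of one comma-separated log line by their shape.

    Fields that are missing (such as an empty IP column) stay None.
    """
    rec = LogRecord()
    for field in (f.strip() for f in line.split(",")):
        if rec.time is None and TIME_RE.match(field):
            rec.time = field
        elif rec.method is None and field in METHODS:
            rec.method = field
        elif rec.path is None and field.startswith("/"):
            rec.path = field
        elif rec.ip is None and IP_RE.match(field):
            rec.ip = field
        elif rec.status is None and field.isdigit() and len(field) == 3:
            # Takes the first 3-digit number, so this assumes the status
            # code comes before the size fields, as in the sample line.
            rec.status = int(field)
    return rec
```

For example, `parse_line(", 03:28:34, GET, /goods.php,, 200, 34696, 390")` fills in the time, method, path and status, and leaves `ip` as `None` because that column is empty.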

    That is the log format. So how do you analyze it? Just looking at it is enough to give you a headache: each site's log runs to more than 1 MB, and reading through thousands of records will make anyone dizzy.
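    A script can do the reading in place of your eyes. A sensible first step is collecting the log files in date order. The minimal Python sketch below assumes the logs sit in the logfiles folder under the site root as `.txt` files named by zero-padded date (for example a hypothetical `2008-08-19.txt`), so that sorting by name also sorts by date. The function name `find_log_files` is hypothetical.

```python
from pathlib import Path
from typing import List

def find_log_files(site_root: str) -> List[Path]:
    """Return the .txt files in <site_root>/logfiles, sorted by file name.

    Assumption: files are named by zero-padded date (e.g. 2008-08-19.txt),
    so sorting by name also puts them in date order.
    """
    log_dir = Path(site_root) / "logfiles"
    if not log_dir.is_dir():
        return []  # no logfiles folder under this site root
    return sorted(
        p for p in log_dir.iterdir()
        if p.is_file() and p.suffix.lower() == ".txt"  # also matches .Txt
    )
```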

    Here is a tip. After long-term observation, I found that the Baidu spider's requests come from server IP addresses in a single network segment: they all start with 202.108, that is, they look like 202.108.X.X. This segment is located in the Beijing Netcom cable building and belongs to the national Internet backbone, although this segment has since disappeared. Open your log and use Ctrl+F to search for IPs in this segment. If you find any, note their access times, and you will discover the pattern of when the Baidu spider visits your site. That is what gives you leverage in timing your updates.
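    The Ctrl+F search can also be automated. Below is a minimal Python sketch, assuming each log line contains an IPv4 address and an HH:MM:SS time. It keeps the lines whose IP starts with the 202.108. prefix from the tip above and counts them by hour of day, so the hours when the spider usually comes stand out. The function name `spider_visit_hours` is hypothetical. Because the segment is said to have disappeared, the prefix is a parameter you can change.

```python
import re
from collections import Counter
from typing import Iterable

# Prefix taken from the tip above; it may no longer be accurate.
SPIDER_PREFIX = "202.108."
IP_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")   # first IPv4 address in the line
TIME_RE = re.compile(r"\b(\d{2}):\d{2}:\d{2}\b")       # captures the hour of HH:MM:SS

def spider_visit_hours(lines: Iterable[str], prefix: str = SPIDER_PREFIX) -> Counter:
    """Count log lines whose IP starts with `prefix`, grouped by hour of day."""
    hours = Counter()
    for line in lines:
        ip = IP_RE.search(line)
        when = TIME_RE.search(line)
        if ip and when and ip.group(1).startswith(prefix):
            hours[int(when.group(1))] += 1
    return hours
```

You can pass an open log file directly, because iterating over a file yields its lines. Sorting the result with `sorted(counts.items())` then lists the spider's visits hour by hour.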

    Finally, this is www.521dyy.cn. Criticism is welcome; this is entirely my own original experience. Please credit the source if you repost. Thanks.
