[Python] Python demo code: fetch a web page and all the links on it

Posted 2020-02-07 in Python, viewed 131 times.
URL http://www.code666.cn/view/c37a2122
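    # Usage: python3 this_script.py example.com   (host only, no "http://")
    import sys
    import urllib.request
    from html.parser import HTMLParser


    class AnchorCollector(HTMLParser):
        """Record the href of every <a> tag (what htmllib's anchorlist held)."""

        def __init__(self):
            super().__init__()
            self.anchorlist = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.anchorlist.append(value)


    def get_links(url):
        """Download url and return the hrefs of all <a> tags on the page."""
        with urllib.request.urlopen(url, timeout=10) as website:
            charset = website.headers.get_content_charset() or "utf-8"
            data = website.read().decode(charset, errors="replace")
        parser = AnchorCollector()
        parser.feed(data)
        parser.close()
        return parser.anchorlist


    links = get_links("http://" + sys.argv[1])
    seen = set()  # without this, pages that link to each other are fetched forever
    # links grows inside the loop, so the crawl keeps spreading outward until interrupted
    for link in links:
        if link.startswith("http") and link not in seen:
            seen.add(link)
            print(link)
            try:
                morelinks = get_links(link)
            except Exception as exc:  # dead link, timeout, malformed URL, ...
                print("  skipped:", exc, file=sys.stderr)
                continue
            links.extend(m for m in morelinks if m.startswith("http"))
    #//python/7382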
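The snippet only follows links whose href already contains "http", so relative links such as /about or page2.html are skipped. A minimal sketch, assuming you want those pages too, resolves every href against the URL of the page it came from with urllib.parse.urljoin and strips #fragments with urldefrag. The names extract_links and _HrefParser are hypothetical, and the function takes an HTML string so it can be tried without network access.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class _HrefParser(HTMLParser):
    """Collect raw href values from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Return absolute http(s) links found in html, resolved against base_url.

    Hypothetical helper: duplicates are dropped (first occurrence wins) and
    #fragments are removed, so page#a and page#b count as one page.
    """
    parser = _HrefParser()
    parser.feed(html)
    parser.close()
    links, seen = [], set()
    for href in parser.hrefs:
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute.startswith(("http://", "https://")) and absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links


page = ('<a href="/about">About</a> <a href="news.html#top">News</a> '
        '<a href="mailto:x@y.z">Mail</a>')
print(extract_links(page, "http://example.com/blog/"))
# → ['http://example.com/about', 'http://example.com/blog/news.html']
```

In a crawler, call it with the decoded page text and the URL that page was downloaded from, then queue the returned links; non-web schemes such as mailto: are filtered out automatically.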
