[Java] Java crawler: scraping a novel

Posted 2019-07-06, written in Java, viewed 108 times.
URL http://www.code666.cn/view/83fa5a43
import java.io.BufferedWriter;
import java.io.FileWriter;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.safety.Whitelist; // newer jsoup releases rename this class to Safelist

public class App {

    // Fetches one chapter page and returns its title followed by the plain-text body.
    public static String getContent(int id) throws Exception {
        Document doc = Jsoup.connect("http://www.xstxt.com/fanrenxiuxianchuan/" + id)
                .timeout(30000) // 30-second timeout, in milliseconds
                .get();

        // Drop the fixed 6-character prefix and 8-character suffix around the chapter name.
        String title = doc.title();
        if (title.length() >= 14) {
            title = title.substring(6, title.length() - 8);
        }

        Element body = doc.getElementById("booktext");
        if (body == null) {
            throw new IllegalStateException("No #booktext element on page " + id);
        }

        // Strip every HTML tag, then turn spaces into line breaks.
        String txt = Jsoup.clean(body.toString(), Whitelist.none());
        txt = txt.replace(" ", "\n");

        // Collapse runs of consecutive line breaks into a single one.
        txt = txt.replaceAll("\\n{2,}", "\n");

        return title + txt;
    }

    public static void main(String[] args) throws Exception {
        String filename = "z:/dd.txt";

        // try-with-resources closes the writer even if a request fails midway.
        try (BufferedWriter bw = new BufferedWriter(new FileWriter(filename))) {
            for (int i = 0; i < 1000; i++) {
                System.out.println(i); // progress indicator
                String str = getContent(1071907 + i);

                bw.write(str);
                bw.write("\n\n"); // blank line between chapters
            }
        }
    }
}

// Source snippet from Yuncode (云代码): http://yuncode.net
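
The text cleanup in getContent can be tried without any network access. The following is a minimal sketch, not part of the original snippet: the class name ChapterText and the helpers normalize and trimTitle are hypothetical, and it assumes that every run of whitespace (including the non-breaking and full-width spaces common on Chinese novel pages) should become a single line break. That is slightly broader than the original, which only replaces plain spaces.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class ChapterText {

    // One or more whitespace characters. U+00A0 (non-breaking space) and
    // U+3000 (full-width space) are listed explicitly because \s does not
    // match them by default.
    private static final Pattern BREAKS = Pattern.compile("[\\s\\u00A0\\u3000]+");

    // Replaces every whitespace run with a single line break and drops
    // leading and trailing line breaks.
    static String normalize(String raw) {
        return BREAKS.matcher(raw).replaceAll("\n").trim();
    }

    // Cuts a fixed-length prefix and suffix off a page title, returning the
    // title unchanged when it is too short to contain both.
    static String trimTitle(String title, int prefix, int suffix) {
        if (title.length() < prefix + suffix) {
            return title;
        }
        return title.substring(prefix, title.length() - suffix);
    }

    public static void main(String[] args) {
        // Prints "line-one" and "line-two" on separate lines.
        System.out.println(normalize("line-one   \n\n  line-two"));
    }
}
```

With these helpers, the title handling in getContent would read trimTitle(doc.title(), 6, 8), and the two replacement steps would become a single normalize call. Note that normalize also splits text at ordinary spaces inside a sentence, so it suits Chinese text better than space-separated languages.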
