[Python] A custom Scrapy pipeline class that saves scraped data to MongoDB

From , 2019-03-23, posted in Python, viewed 126 times.
URL http://www.code666.cn/view/3a09a524
# Standard Python library imports
import logging

# 3rd party modules
import pymongo

from scrapy.exceptions import DropItem

logger = logging.getLogger(__name__)


class MongoDBPipeline(object):
    def __init__(self, server, port, db, col):
        self.server = server
        self.port = port
        self.db = db
        self.col = col

    @classmethod
    def from_crawler(cls, crawler):
        # scrapy.conf.settings was removed; read settings from the crawler instead
        settings = crawler.settings
        return cls(
            settings['MONGODB_SERVER'],
            settings['MONGODB_PORT'],
            settings['MONGODB_DB'],
            settings['MONGODB_COLLECTION'],
        )

    def open_spider(self, spider):
        # pymongo.Connection was removed in PyMongo 3.x; use MongoClient
        self.client = pymongo.MongoClient(self.server, self.port)
        self.collection = self.client[self.db][self.col]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        # Collect every empty field; drop the item if any field is missing
        err_msg = ''
        for field, data in item.items():
            if not data:
                err_msg += 'Missing %s of poem from %s\n' % (field, item['url'])
        if err_msg:
            raise DropItem(err_msg)
        # insert() is deprecated in PyMongo 3.x; insert_one() is the replacement
        self.collection.insert_one(dict(item))
        # scrapy.log was removed; the standard logging module is used instead
        logger.debug('Item written to MongoDB database %s/%s', self.db, self.col)
        return item
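The item-validation rule used by `process_item` (any falsy field drops the whole item) can be sketched standalone, without Scrapy or MongoDB installed. `find_missing_fields` is a hypothetical helper name, not part of the original pipeline:

```python
def find_missing_fields(item):
    """Return the names of fields whose value is empty/falsy,
    mirroring the check the pipeline performs before inserting."""
    return [field for field, data in item.items() if not data]


complete = {'url': 'http://example.com/poem/1', 'title': 'Ode', 'body': 'text'}
partial = {'url': 'http://example.com/poem/2', 'title': '', 'body': 'text'}

assert find_missing_fields(complete) == []
assert find_missing_fields(partial) == ['title']
```

An item with any missing field raises `DropItem` in the real pipeline, so it never reaches the `insert_one` call.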

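For the pipeline to run, the Scrapy project's `settings.py` must enable it and define the four `MONGODB_*` keys it reads. A minimal sketch, assuming the project and module are named `myproject`/`pipelines` (placeholders) and a local MongoDB instance:

```python
# settings.py (fragment) — values shown are illustrative placeholders
ITEM_PIPELINES = {
    'myproject.pipelines.MongoDBPipeline': 300,  # priority 0-1000, lower runs first
}

MONGODB_SERVER = 'localhost'
MONGODB_PORT = 27017
MONGODB_DB = 'scrapy_db'
MONGODB_COLLECTION = 'poems'
```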