How to store image paths and serialize items to JSON

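How are downloaded image paths stored on an item, and how is the item serialized to JSON? This article walks through the problem step by step, in the hope of giving readers who face it a simple, practical approach.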


1. The item_completed() method

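Syntax: item_completed(results, item, info)
ImagesPipeline.item_completed() is called once all image requests for a single item have finished, whether the downloads succeeded or failed. Its return value is what gets sent to the subsequent item pipeline stages, so it must either return the item or drop it (by raising DropItem), just like any other pipeline. By default, item_completed() returns the item unchanged.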
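
To make the shape of results concrete, here is a minimal sketch in plain Python. It assumes the structure described in the Scrapy documentation: a list of (success, info) 2-tuples, where info is a dict with keys such as url, path and checksum when the download succeeded, and a failure object otherwise. The helper name successful_paths and all sample values are hypothetical.

```python
def successful_paths(results):
    """Return the storage paths of every successfully downloaded image."""
    return [info['path'] for ok, info in results if ok]


# Hand-written sample data (hypothetical values) imitating what the pipeline receives.
sample_results = [
    (True, {'url': 'https://example.com/a.jpg', 'path': '2019/aaa.jpg', 'checksum': 'c1'}),
    (False, Exception('download failed')),  # Scrapy passes a Failure object here
]

paths = successful_paths(sample_results)
print(paths)
```

Failed downloads are simply skipped, so the list can be empty; that is why the pipeline in the next step falls back to a default path.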

2. Overriding item_completed in the pipeline

Override item_completed in ImagePipeline to capture the path where each image was saved:

import hashlib
from datetime import datetime

from scrapy.http import Request
from scrapy.pipelines.images import ImagesPipeline
from scrapy.utils.python import to_bytes


class ImagePipeline(ImagesPipeline):
    def file_path(self, request, response=None, info=None):
        ## start of deprecation warning block (can be removed in the future)
        # This block is copied from an older Scrapy ImagesPipeline. If your
        # Scrapy version does not define file_key/image_key, delete it.
        def _warn():
            from scrapy.exceptions import ScrapyDeprecationWarning
            import warnings
            warnings.warn('ImagesPipeline.image_key(url) and file_key(url) methods are deprecated, '
                          'please use file_path(request, response=None, info=None) instead',
                          category=ScrapyDeprecationWarning, stacklevel=1)
        # check if called from image_key or file_key with url as first argument
        if not isinstance(request, Request):
            _warn()
            url = request
        else:
            url = request.url
        # detect if file_key() or image_key() methods have been overridden
        if not hasattr(self.file_key, '_base'):
            _warn()
            return self.file_key(url)
        elif not hasattr(self.image_key, '_base'):
            _warn()
            return self.image_key(url)
        ## end of deprecation warning block
        image_guid = hashlib.sha1(to_bytes(url)).hexdigest()  # change to request.url after deprecation
        # Use the current year as the directory name
        return '{}/{}.jpg'.format(datetime.now().year, image_guid)

    def item_completed(self, results, item, info):
        # Collect the storage paths of the successfully downloaded images
        values = [value['path'] for ok, value in results if ok]
        # Store the first path on the item, or a default if nothing was downloaded
        item['image_path'] = values.pop(0) if values else 'default.jpg'
        return item
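
The path that file_path builds can be tried out without Scrapy. The following is a minimal sketch, assuming the same scheme as above (the year as the directory, the SHA-1 hex digest of the URL as the file name). The function name year_based_path and the example URL are hypothetical.

```python
import hashlib
from datetime import datetime


def year_based_path(url, year=None):
    """Build '<year>/<sha1 of url>.jpg', the same layout file_path returns above."""
    if year is None:
        year = datetime.now().year
    image_guid = hashlib.sha1(url.encode('utf-8')).hexdigest()
    return '{}/{}.jpg'.format(year, image_guid)


example = year_based_path('https://example.com/a.jpg', year=2019)
print(example)
```

Because the file name is derived only from the URL, the same image URL always maps to the same file within a given year.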

3. Creating an md5 helper function

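We can hash the URL with hashlib.md5 from the Python standard library. First, in the same directory as the project's settings file, create a package named utils, then create the md5 module inside it (the spider in step 5 imports it as md5_tool). Import md5 from hashlib, instantiate it, pass the URL in with update(), and read the digest with hexdigest(). Because update() only accepts bytes, use isinstance() to check whether the input is a str and, if so, call encode() to convert it to bytes first.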
```
from hashlib import md5


def get_md5(url):
    if isinstance(url, str):
        # Convert to bytes first; update() does not accept str
        url = url.encode()
    obj = md5()
    obj.update(url)
    return obj.hexdigest()


if __name__ == '__main__':
    print(get_md5('www.baidu.com'))
```
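
The digest is what makes the helper useful as an ID: it has a fixed length no matter how long the URL is, and the str and bytes forms of the same URL give the same result. A minimal sketch of those two properties follows; get_md5 is declared again in compact form only so the snippet runs on its own.

```python
from hashlib import md5


def get_md5(url):
    # Same behaviour as the helper above, written in compact form.
    if isinstance(url, str):
        url = url.encode()
    return md5(url).hexdigest()


digest_from_str = get_md5('https://dribbble.com/stories')
digest_from_bytes = get_md5(b'https://dribbble.com/stories')

# An MD5 hex digest is always 32 characters long.
print(len(digest_from_str), digest_from_str == digest_from_bytes)
```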

4. Adding fields to the item

import scrapy
class XkdDribbbleSpiderItem(scrapy.Item):
    title = scrapy.Field()
    image_url = scrapy.Field()
    date = scrapy.Field()
    # Local path of the downloaded image
    image_path = scrapy.Field()
    # URL of the page the item was scraped from
    url = scrapy.Field()
    # Hash (md5) of the page URL
    url_id = scrapy.Field()

5. Returning the item from the spider

import scrapy
from urllib import parse
from scrapy.http import Request
from datetime import datetime
from ..items import XkdDribbbleSpiderItem
from ..utils.md5_tool import get_md5
class DribbbleSpider(scrapy.Spider):
    name = 'dribbble'
    allowed_domains = ['dribbble.com']
    start_urls = ['https://dribbble.com/stories']
    def parse(self, response):
        # Select the <a> tags that link to the detail pages
        a_selectors = response.css('div.teaser a')
        a_selectors = response.css('div.teaser a')
        for a_selector in a_selectors:
            image_url = a_selector.css('img::attr(src)').extract()[0]
            page_url = a_selector.css('::attr(href)').extract()[0]
            yield Request(url=parse.urljoin(response.url, page_url), callback=self.parse_analyse,meta={'a_image_url': image_url})
    def parse_analyse(self, response):
        title = response.css('header h2::text').extract_first()
        image_url = response.meta.get('a_image_url')
        date_raw = response.css('p span.date::text').extract()[0]
        date_str = date_raw.strip()
        date = datetime.strptime(date_str, '%b %d, %Y').date()
        item = XkdDribbbleSpiderItem()
        item['title'] = title
        item['image_url'] = [image_url]
        item['date'] = date
        item['url'] = response.url
        item['url_id'] = get_md5(response.url)
        # Hand the item to the item pipelines, which persist it
        yield item
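
Two standard-library calls do most of the work in the spider: parse.urljoin turns a relative href into an absolute URL, and datetime.strptime parses the date text. The sketch below shows both in isolation. The href and the date string are hypothetical samples, and %b matches English month abbreviations only under an English (or C) locale.

```python
from datetime import datetime
from urllib import parse

# Resolve a relative href (hypothetical) against the listing page URL.
page_url = parse.urljoin('https://dribbble.com/stories', '/stories/2019/03/05/example-post')

# Date text as it might appear on the page, including surrounding whitespace.
date_raw = '\n  Mar 05, 2019  '
date = datetime.strptime(date_raw.strip(), '%b %d, %Y').date()

print(page_url)
print(date.isoformat())
```

Note that the result is a datetime.date object rather than a string, which matters when the item is later serialized to JSON.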

6. Creating JsonSavePipeline to write items to a file

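import codecs
import json


class JsonSavePipeline:
    def process_item(self, item, spider):
        # Convert the item returned by the spider into a plain dict
        dict_item = dict(item)
        # Serialize the dict to JSON; default=str converts the date field,
        # which json cannot serialize on its own
        line = json.dumps(dict_item, ensure_ascii=False, default=str) + '\n'
        # Append the line to the file as UTF-8 (needed because ensure_ascii=False)
        with codecs.open('blog.json', mode='a', encoding='utf-8') as file:
            file.write(line)
        # Return the item so that later pipelines can keep processing it
        return item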
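
One detail is easy to miss when serializing this item: the date field holds a datetime.date object, which json.dumps rejects with a TypeError unless it is told how to convert it. The following is a minimal sketch of one way to handle that, passing default=str so unsupported values are converted with str(). The record contents are hypothetical.

```python
import json
from datetime import date

record = {'title': 'Café', 'date': date(2019, 3, 5)}

try:
    json.dumps(record)
    failed = False
except TypeError:
    failed = True  # date objects are not JSON serializable by default

# default=str is called for values json cannot handle; ensure_ascii=False
# keeps non-ASCII characters readable instead of escaping them.
line = json.dumps(record, ensure_ascii=False, default=str)
print(failed, line)
```

Converting the date to a string inside the spider, for example with isoformat(), would be an alternative that keeps the pipeline unaware of field types.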

7. Registering JsonSavePipeline in the settings file

'XKD_Dribbble_Spider.pipelines.JsonSavePipeline': 2,
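
For context, the entry above belongs inside the ITEM_PIPELINES dict. The sketch below shows what the relevant part of settings.py might look like. Only the JsonSavePipeline entry comes from this article; the ImagePipeline entry, its priority of 1, and the IMAGES_STORE value are assumptions. IMAGES_URLS_FIELD is set because the item keeps its image URLs in image_url rather than Scrapy's default field name image_urls.

```python
# Pipelines run in ascending order of their number, so the image pipeline
# (assumed priority 1) fills in image_path before the item is written to JSON.
ITEM_PIPELINES = {
    'XKD_Dribbble_Spider.pipelines.ImagePipeline': 1,  # assumed module path and priority
    'XKD_Dribbble_Spider.pipelines.JsonSavePipeline': 2,
}

# The item defined in step 4 stores its image URLs in 'image_url'.
IMAGES_URLS_FIELD = 'image_url'

# Assumed download directory, relative to where the crawl is started.
IMAGES_STORE = 'images'
```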

That concludes this walkthrough of storing image paths and serializing items to JSON. We hope it helps.

Source: http://vcdvsql.cn/article34/pghpe.html
