Several Ways to Download Files in Python

  • 2021-10-25 07:14:47
  • OfStack

Contents:
1. Simple synchronous download
2. Streaming download with requests.get(stream=True)
3. Downloading files asynchronously
4. Splitting a file and downloading the parts asynchronously
5. Notes

1. Simple Synchronous Download

Sample code:


import requests


def download(url, file_path):
  headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Firefox/68.0"
  }
  # The entire response body is held in memory before being written out
  r = requests.get(url=url, headers=headers)
  with open(file_path, "wb") as f:
    f.write(r.content)
    f.flush()

2. Streaming Download with requests.get(stream=True)

By default, stream is False, which means requests downloads the whole file immediately and holds it in memory. If the file is too large, this can exhaust memory and crash the program.
When the stream parameter of requests.get is set to True, the download does not start immediately. The body is only fetched when you iterate over it with iter_content or iter_lines, or access the content attribute. One thing to note: the connection must stay open until the download finishes.


iter_content: iterates over the response body chunk by chunk
iter_lines: iterates over the response body line by line

Using either of these functions to download large files keeps memory usage low, because only a small portion of the data is held in memory at any one time.

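A minimal streaming download built on iter_content might look like the following sketch (the 1 MB chunk size and the helper name download_stream are illustrative choices, not part of the requests API):

```python
import requests


def download_stream(url, file_path, chunk_size=1024 * 1024):
  headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Firefox/68.0"
  }
  # stream=True defers fetching the body until we iterate over it
  with requests.get(url=url, headers=headers, stream=True) as r:
    r.raise_for_status()
    with open(file_path, "wb") as f:
      # Only chunk_size bytes are held in memory at any moment
      for chunk in r.iter_content(chunk_size=chunk_size):
        f.write(chunk)
```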

3. Downloading Files Asynchronously

Since requests is blocking, the aiohttp module is used to make the request instead.

Sample code:


import aiohttp
import asyncio
import os


async def handler(url, file_path):
  headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Firefox/68.0"
  }
  async with aiohttp.ClientSession() as session:
    async with session.get(url=url, headers=headers) as r:
      with open(file_path, "wb") as f:
        f.write(await r.read())
        f.flush()
        os.fsync(f.fileno())


url = ""        # fill in the download URL
file_path = ""  # fill in the destination path
loop = asyncio.get_event_loop()
loop.run_until_complete(handler(url, file_path))

4. Splitting a File and Downloading the Parts Asynchronously

The example above downloads a file with a single coroutine. The following approach splits the file into several parts, downloads each part with its own coroutine, and finally writes everything into one file.

The example below writes in a streaming fashion, flushing each chunk to disk as it arrives.


import aiohttp
import asyncio
import time
import os


async def consumer(queue):
  option = await queue.get()
  start = option["start"]
  end = option["end"]
  url = option["url"]
  filename = option["filename"]
  i = option["i"]

  print(f"Task {i} started")
  async with aiohttp.ClientSession() as session:
    # HTTP Range is inclusive at both ends, hence end - 1
    headers = {"Range": f"bytes={start}-{end - 1}"}
    async with session.get(url=url, headers=headers) as r:
      with open(filename, "rb+") as f:
        f.seek(start)
        while True:
          chunk = await r.content.read(end - start)
          if not chunk:
            break
          f.write(chunk)
          f.flush()
          os.fsync(f.fileno())
          print(f"Task {i} writing...")
    queue.task_done()
    print(f"Task {i} finished")


async def producer(url, headers, filename, queue, coro_num):
  async with aiohttp.ClientSession() as session:
    resp = await session.head(url=url, headers=headers)
    file_size = int(resp.headers["content-length"])
    # Pre-create the target file so each coroutine can seek into it
    with open(filename, "wb") as f:
      pass
    part = file_size // coro_num
    for i in range(coro_num):
      start = part * i
      if i == coro_num - 1:
        end = file_size
      else:
        end = start + part
      info = {
        "start": start,
        "end": end,
        "url": url,
        "filename": filename,
        "i": i,
      }
      queue.put_nowait(info)


async def main():
  # Fill in url, filename, and coro_num (coro_num must be > 0)
  url = ""
  filename = ""
  coro_num = 0
  headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Firefox/68.0"
  }
  queue = asyncio.Queue(coro_num)
  await producer(url, headers, filename, queue, coro_num)
  task_list = []
  for i in range(coro_num):
    task = asyncio.create_task(consumer(queue))
    task_list.append(task)
  await queue.join()
  # All queue items are done; cancel any consumers still waiting
  for task in task_list:
    task.cancel()
  await asyncio.gather(*task_list, return_exceptions=True)


start_time = time.time()
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
print(f"Took {time.time() - start_time:.2f} seconds")

5. Notes

The examples above illustrate the basic ideas and are not robust. A production-quality program needs error capture and error handling on top of them.
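As one illustration of such hardening (the retry count, backoff, and the helper name download_with_retry are arbitrary choices), the synchronous download could be wrapped with error capture and retries:

```python
import time

import requests


def download_with_retry(url, file_path, retries=3, backoff=2.0):
  """Retry transient failures; re-raise after the last attempt."""
  for attempt in range(1, retries + 1):
    try:
      with requests.get(url, stream=True, timeout=30) as r:
        r.raise_for_status()  # turn HTTP error statuses into exceptions
        with open(file_path, "wb") as f:
          for chunk in r.iter_content(chunk_size=1024 * 1024):
            f.write(chunk)
      return
    except requests.RequestException as exc:
      if attempt == retries:
        raise  # out of attempts, let the caller handle it
      print(f"Attempt {attempt} failed ({exc}), retrying...")
      time.sleep(backoff * attempt)
```

Catching requests.RequestException covers connection errors, timeouts, and HTTP error statuses in one place, while unrelated bugs still surface normally.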

That concludes this overview of several ways to download files in Python. For more on downloading files with Python, please see the other related articles on this site!
