Taking a screenshot of a web page in Python with Selenium

  • 2020-04-02 13:53:28
  • OfStack

Selenium is a browser automation tool. It is most often used for automated testing, but it can also take screenshots of web pages. It has client libraries for several languages, including Java, C#, Ruby, and Python. If you use Python, run "pip install selenium" on the command line to install the Python client library.

In Python, for example, the following script takes a screenshot of a given page (such as the home page of a site):


# -*- coding: utf-8 -*-
#
# author: oldj <oldj.wu@gmail.com>
#

from selenium import webdriver
import time


def capture(url, save_fn="capture.png"):
  browser = webdriver.Firefox() # Get local session of firefox
  browser.set_window_size(1200, 900)
  browser.get(url) # Load page
  browser.execute_script("""
    (function () {
      var y = 0;
      var step = 100;
      window.scroll(0, 0);

      function f() {
        if (y < document.body.scrollHeight) {
          y += step;
          window.scroll(0, y);
          setTimeout(f, 50);
        } else {
          window.scroll(0, 0);
          document.title += "scroll-done";
        }
      }

      setTimeout(f, 1000);
    })();
  """)

  for _ in range(30):  # poll for up to 30 seconds until the scroll has finished
    if "scroll-done" in browser.title:
      break
    time.sleep(1)

  browser.save_screenshot(save_fn)
  browser.quit()  # end the WebDriver session and close the browser


if __name__ == "__main__":

  capture("http://www.jb51.net")  # browser.get() needs a full URL, scheme included

Note that the code above does not take the screenshot immediately after opening the page. It first runs a piece of JavaScript in the page that scrolls to the bottom step by step and then back to the top, and only then captures the image. The advantage is that any lazily loaded content further down the page will generally have been loaded by the time the screenshot is taken.
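The script signals completion by appending a marker to the page title and polling for it from Python. As an alternative sketch (not the author's method), the same scroll can be run through Selenium's execute_async_script, which blocks until the page calls the callback that Selenium passes as the last script argument. The names SCROLL_SCRIPT and scroll_through_page, and the default values, are assumptions made for illustration.

```python
# Sketch only: SCROLL_SCRIPT and scroll_through_page are illustrative names,
# not part of the original script.
SCROLL_SCRIPT = """
var step = arguments[0];
var delay = arguments[1];
var done = arguments[arguments.length - 1];  // callback supplied by Selenium
var y = 0;
window.scroll(0, 0);

function f() {
  if (y < document.body.scrollHeight) {
    y += step;
    window.scroll(0, y);
    setTimeout(f, delay);
  } else {
    window.scroll(0, 0);
    done(y);  // hand control back to Python
  }
}

f();
"""


def scroll_through_page(browser, step=100, delay_ms=50, timeout=30):
    """Scroll to the bottom and back; return the last scroll offset reached."""
    browser.set_script_timeout(timeout)  # seconds allowed for the async script
    return browser.execute_async_script(SCROLL_SCRIPT, step, delay_ms)
```

With this helper, the title check and the sleep loop would no longer be needed: calling scroll_through_page(browser) right after browser.get(url) returns only once the scroll has finished, or raises a timeout error if it takes longer than the configured limit.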

Selenium is more powerful than browser plug-ins such as PageSaver because it can inject and execute JavaScript in a page, simulate actions such as mouse clicks, and run multiple instances at once (for example, several threads taking screenshots at the same time). For these reasons, Selenium is a good choice for taking screenshots of web pages.
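To illustrate running several screenshot jobs at the same time, here is a minimal sketch built on the standard library's thread pool. The helper name capture_all and its parameters are hypothetical. It assumes that the capture function passed in starts its own browser on every call, as the capture function in the script above does, so that no browser session is shared between threads.

```python
from concurrent.futures import ThreadPoolExecutor


def capture_all(jobs, capture_fn, max_workers=4):
    """Run capture_fn(url, save_fn) for each (url, save_fn) pair in parallel.

    Returns a dict that maps each URL to None on success,
    or to the exception that was raised for that URL.
    """
    def run(job):
        url, save_fn = job
        try:
            capture_fn(url, save_fn)
            return url, None
        except Exception as exc:  # one failed page should not stop the others
            return url, exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run, jobs))
```

A call might look like capture_all([("http://www.jb51.net", "home.png")], capture). Each worker opens a separate browser window, so max_workers is best kept small enough for the machine's memory.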

