Automating screenshots with Selenium

One of our web sites has a help section which includes screenshots of the web site itself. Needless to say, the screenshots quickly become out-of-date when we modify the site.

After manually updating the screenshots a couple times, I decided to automate the process using Selenium and Python. This post describes:

Setting up Selenium in a Python virtual environment
Initializing the WebDriver
Interacting with the web site
Taking screenshots of specific elements

Set up the virtual environment

We’ll install Selenium in a new Python virtual environment. Selenium also requires a driver to interface with the browser which must be installed separately.

First, create and activate a virtual environment:

$ cd ~/.envs
$ python3 -m venv selenium
$ source ./selenium/bin/activate.fish

Note that I run the .fish variation of the virtual environment activate script because I use the fish shell. For Bash you’d run source ./selenium/bin/activate instead.

Next, download a WebDriver such as ChromeDriver or geckodriver. Ensure that the driver version is compatible with your browser version. I use ChromeDriver because Chrome’s font rendering on Linux is visually superior to Firefox’s. Make sure that the chromedriver or geckodriver executable is in your PATH.

To save screenshots we’ll use the Pillow imaging library. Install it into the virtual environment with:

$ python3 -m pip install Pillow

The following sections how to use Selenium and its WebDriver in Python to capture screenshots.

Initialize the WebDriver

The WebDriver API provides high-level commands that make it easy to control the browser. It takes just a few lines of code to initialize the WebDriver and load a web site:

from selenium import webdriver

# Initialize driver
driver = webdriver.Chrome()
driver.implicitly_wait(5)
driver.get("http://localhost:3000/")
driver.maximize_window()

The .implicitly_wait() function sets the timeout for an element to be found or for a command to complete. This timeout remains in effect for the entire script.

Interacting with the web site

After loading the web site, we log in and navigate to where we want to take the first screenshot:

import os

from selenium.webdriver.common.keys import Keys

# Get credentials from environment variables
username = os.environ.get("MY_USERNAME")
password = os.environ.get("MY_PASSWORD")

# Log in
elem = driver.find_element_by_name("username")
elem.send_keys(username)
elem = driver.find_element_by_name("password")
elem.send_keys(password)
elem.send_keys(Keys.RETURN)

Selenium provides many methods to locate elements on a page, including:

find_elements_by_name
find_elements_by_xpath
find_elements_by_link_text
find_elements_by_partial_link_text
find_elements_by_tag_name
find_elements_by_class_name
find_elements_by_css_selector

For example, to navigate by clicking on an element that has an ID, run:

elem = driver.find_element_by_id("my-menu")
elem.click()

The XPath variation is useful when the target element doesn’t have an ID and isn’t easily identified by another attribute.

Now that we’ve navigated to the page of interest, we’re ready to take a screenshot.

Taking screenshots

Selenium provides a function to capture the entire window, but not one to capture only specific elements. Therefore, taking a screenshot of specific elements involves several steps:

Scroll the elements into view
Capture the entire window
Crop the image
Save the cropped image

Before walking through these steps, let’s define some data structures and a utility function:

import math
from typing import NamedTuple


class Bounds(NamedTuple):
    """
    A rectangular bounding box defined by two points.
    Lower left: (x0, y0)
    Upper right: (x1, y1)
    """
    x0: int
    y0: int
    x1: int
    y1: int


class Margin(NamedTuple):
    """
    Margin around the elements in an image.
    x: left and right margin
    y: top and bottom margin
    """
    x: int
    y: int


def get_bounding_box(elems):
    """
    Get the bounding box of a list of HTML elements.
    """
    x0 = math.inf
    y0 = math.inf
    x1 = -1
    y1 = -1

    for elem in elems:
        x0 = min(x0, elem.location["x"])
        y0 = min(y0, elem.location["y"])
        x1 = max(x1, elem.location["x"] + elem.size["width"])
        y1 = max(y1, elem.location["y"] + elem.size["height"])

    return Bounds(x0=x0, y0=y0, x1=x1, y1=y1,)

Given a list of HTML elements, get_bounding_box() returns a Bounds object that represents the rectangular bounding box containing all the elements. This provides a convenient way to capture a specific area of the web page.

We’ll use the Margin class later to represent a margin around the elements in a captured image.

Scroll the elements into view

We can use the Selenium locator functions to identify the elements to capture. In the code below, assume the elements of interest are in the list elems.

# Scroll to first element and adjust for margin
elems[0].location_once_scrolled_into_view
driver.execute_script(f"window.scrollBy(0, -{margin[1]});")

The first line calls the location_once_scrolled_into_view property for its side effect of scrolling the entire element into view; see https://github.com/SeleniumHQ/selenium/blob/b4b7674/py/selenium/webdriver/remote/webelement.py#L562. Note that relying on this behavior isn’t robust and a more explicit API call would be better, once it exists.

The second line scrolls up to account for the desired vertical margin around the elements.

Capture the screen

The WebDriver API provides several functions to take a screenshot of the current window:

get_screenshot_as_base64: Returns the screenshot in a Base64-encoded string.
get_screenshot_as_file: Saves the screenshot to a file.
get_screenshot_as_png: Returns the screenshot as binary data.

Because we need to crop the screenshot to include only the elements of interest, we use get_screenshot_as_png:

# Capture the visible window
png_data = driver.get_screenshot_as_png()

Crop the capture

To crop the screenshot to the desired area, we consider the bounding box of the elements, the specified margin around the elements, and the vertical scroll position of the window:

import io

from PIL import Image


# Get Y scroll value
scroll_y = driver.execute_script("return window.scrollY;")

# Crop image to bounds specified elements, with margin
bounds = get_bounding_box(elems)
image = Image.open(io.BytesIO(png_data))
image = image.crop(
    (
        # left
        bounds.x0 - margin.x,
        # upper
        bounds.y0 - scroll_y - margin.y,
        # right
        bounds.x1 + margin.x,
        # lower
        bounds.y1 - scroll_y + margin.y,
    )
)

Save the image

Finally, we save the cropped image to a PNG file:

image.save(filename)

The script can repeat the workflow of navigating, locating elements, and capturing a screenshot.

Advanced interaction

The code snippets above showed basic ways of locating and interacting with a web page using Selenium, such as clicking on an element. It’s possible to script more complex actions including hovering and drag-and-drop using ActionChains.

For example, the following code moves the mouse cursor to hover over a list element:

elem = driver.find_element_by_xpath(
    "//li[contains(string(), 'My Item')]"
)

actions = ActionChains(driver)
actions.move_to_element(elem)
actions.perform()

Many actions can be queued in an ActionChains object. Calling .perform() runs the queued actions.

Wrapping up

When finished, the script closes the browser and stops the WebDriver:

driver.quit()

The updated screenshots are now ready for the web site.

Set up the virtual environment#

Initialize the WebDriver#

Interacting with the web site#

Taking screenshots#

Scroll the elements into view#

Capture the screen#

Crop the capture#

Save the image#

Advanced interaction#

Wrapping up#