One of our web sites has a help section which includes screenshots of the web site itself. Needless to say, the screenshots quickly become out-of-date when we modify the site.
After manually updating the screenshots a couple of times, I decided to automate the process using Selenium and Python. This post describes:
- Setting up Selenium in a Python virtual environment
- Initializing the WebDriver
- Interacting with the web site
- Taking screenshots of specific elements
Set up the virtual environment
We’ll install Selenium in a new Python virtual environment. Selenium also requires a browser-specific driver, which must be installed separately.
First, create and activate a virtual environment:
$ cd ~/.envs
$ python3 -m venv selenium
$ source ./selenium/bin/activate.fish
Note that I run the .fish variation of the virtual environment activate script because I use the fish shell. For Bash you’d run source ./selenium/bin/activate instead.
Then install Selenium into the active environment:
$ python3 -m pip install selenium
Next, download a WebDriver such as ChromeDriver or geckodriver. Ensure that the driver version is compatible with your browser version. I use ChromeDriver because Chrome’s font rendering on Linux is visually superior to Firefox’s. Make sure that the chromedriver or geckodriver executable is in your PATH.
To crop and save screenshots we’ll use the Pillow imaging library. Install it into the virtual environment with:
$ python3 -m pip install Pillow
The following sections show how to use Selenium and its WebDriver in Python to capture screenshots.
Initialize the WebDriver
The WebDriver API provides high-level commands that make it easy to control the browser. It takes just a few lines of code to initialize the WebDriver and load a web site:
from selenium import webdriver
# Initialize driver
driver = webdriver.Chrome()
driver.implicitly_wait(5)
driver.get("http://localhost:3000/")
driver.maximize_window()
The .implicitly_wait() method sets how long, in seconds, the driver waits for an element to be found or for a command to complete. This timeout remains in effect for the entire session.
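To make the idea of a wait timeout concrete, here is a minimal, standard-library-only sketch of polling until a condition holds or a timeout expires. The name wait_until and its parameters are hypothetical and are not part of Selenium. Selenium's own waiting happens inside the driver, so treat this only as an illustration of the concept.

```python
import time


def wait_until(condition, timeout=5.0, poll_interval=0.1):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed. Hypothetical helper for illustration; not Selenium API.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Example: a stand-in condition that becomes true on the third attempt.
attempts = []


def ready():
    attempts.append(1)
    return len(attempts) >= 3


wait_until(ready, timeout=1.0, poll_interval=0.01)
```

A longer timeout only costs time when the condition never becomes true, because the loop returns as soon as the condition holds.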
Interacting with the web site
After loading the web site, we log in and navigate to where we want to take the first screenshot:
import os
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
# Get credentials from environment variables
username = os.environ.get("MY_USERNAME")
password = os.environ.get("MY_PASSWORD")
# Log in
elem = driver.find_element(By.NAME, "username")
elem.send_keys(username)
elem = driver.find_element(By.NAME, "password")
elem.send_keys(password)
elem.send_keys(Keys.RETURN)
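Because os.environ.get() returns None for an unset variable, a missing credential would likely surface only later, as a less obvious error when the value is typed into the form. As a small sketch, a hypothetical helper (require_env is my own name, not a Selenium or standard-library function) could fail early with a clearer message:

```python
import os


def require_env(name):
    """Return the value of environment variable `name`.

    Raises RuntimeError if the variable is unset or empty.
    Hypothetical helper for illustration.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Demo only: provide a placeholder value so the example runs on its own.
os.environ.setdefault("MY_USERNAME", "demo-user")
username = require_env("MY_USERNAME")
```

In the real script the placeholder line would be dropped, so that an unset variable stops the run before the browser starts typing.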
Selenium locates elements with find_element() (first match) and find_elements() (all matches). Both take a locator strategy from the By class, including:
- By.ID
- By.NAME
- By.XPATH
- By.LINK_TEXT
- By.PARTIAL_LINK_TEXT
- By.TAG_NAME
- By.CLASS_NAME
- By.CSS_SELECTOR
For example, to navigate by clicking on an element that has an ID, run:
from selenium.webdriver.common.by import By
elem = driver.find_element(By.ID, "my-menu")
elem.click()
The XPath variation is useful when the target element doesn’t have an ID and isn’t easily identified by another attribute.
Now that we’ve navigated to the page of interest, we’re ready to take a screenshot.
Taking screenshots
Selenium provides functions to capture the entire window or a single element, but not a region that spans several elements plus a margin. Therefore, taking a screenshot of a group of elements involves several steps:
- Scroll the elements into view
- Capture the entire window
- Crop the image
- Save the cropped image
Before walking through these steps, let’s define some data structures and a utility function:
import math
from typing import NamedTuple


class Bounds(NamedTuple):
    """
    A rectangular bounding box defined by two points, in page
    coordinates where y increases downward.
    Upper left: (x0, y0)
    Lower right: (x1, y1)
    """

    x0: int
    y0: int
    x1: int
    y1: int


class Margin(NamedTuple):
    """
    Margin around the elements in an image.
    x: left and right margin
    y: top and bottom margin
    """

    x: int
    y: int


def get_bounding_box(elems):
    """
    Get the bounding box of a list of HTML elements.
    """
    x0 = math.inf
    y0 = math.inf
    x1 = -1
    y1 = -1
    for elem in elems:
        x0 = min(x0, elem.location["x"])
        y0 = min(y0, elem.location["y"])
        x1 = max(x1, elem.location["x"] + elem.size["width"])
        y1 = max(y1, elem.location["y"] + elem.size["height"])
    return Bounds(x0=x0, y0=y0, x1=x1, y1=y1)
Given a list of HTML elements, get_bounding_box() returns a Bounds object that represents the rectangular bounding box containing all the elements. This provides a convenient way to capture a specific area of the web page. We’ll use the Margin class later to represent a margin around the elements in a captured image.
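To see what get_bounding_box() computes without starting a browser, the sketch below feeds it stand-in objects that expose only the location and size attributes the function reads. The fake_element helper and the coordinates are made up for illustration. Real Selenium elements provide the same two attributes.

```python
import math
from types import SimpleNamespace
from typing import NamedTuple


class Bounds(NamedTuple):
    x0: int
    y0: int
    x1: int
    y1: int


def get_bounding_box(elems):
    """Get the bounding box of a list of elements."""
    x0 = math.inf
    y0 = math.inf
    x1 = -1
    y1 = -1
    for elem in elems:
        x0 = min(x0, elem.location["x"])
        y0 = min(y0, elem.location["y"])
        x1 = max(x1, elem.location["x"] + elem.size["width"])
        y1 = max(y1, elem.location["y"] + elem.size["height"])
    return Bounds(x0=x0, y0=y0, x1=x1, y1=y1)


def fake_element(x, y, width, height):
    """Stand-in for a Selenium element (hypothetical, for illustration)."""
    return SimpleNamespace(
        location={"x": x, "y": y},
        size={"width": width, "height": height},
    )


heading = fake_element(20, 100, 300, 40)
table = fake_element(10, 150, 500, 200)

box = get_bounding_box([heading, table])
print(box)  # Bounds(x0=10, y0=100, x1=510, y1=350)
```

The box starts at the smallest x and y of any element and extends to the largest right and bottom edges, so it covers both the heading and the table.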
Scroll the elements into view
We can use the Selenium locator functions to identify the elements to capture. In the code below, assume the elements of interest are in the list elems and that margin is a Margin instance.
# Scroll to first element and adjust for margin
elems[0].location_once_scrolled_into_view
driver.execute_script(f"window.scrollBy(0, -{margin.y});")
The first line accesses the location_once_scrolled_into_view property for its side effect of scrolling the entire element into view; see https://github.com/SeleniumHQ/selenium/blob/b4b7674/py/selenium/webdriver/remote/webelement.py#L562. Note that relying on this behavior isn’t robust and a more explicit API call would be better, once it exists.
The second line scrolls up to account for the desired vertical margin around the elements.
Capture the screen
The WebDriver API provides several functions to take a screenshot of the current window:
- get_screenshot_as_base64: Returns the screenshot as a Base64-encoded string.
- get_screenshot_as_file: Saves the screenshot to a file.
- get_screenshot_as_png: Returns the screenshot as binary data.
Because we need to crop the screenshot to include only the elements of interest, we use get_screenshot_as_png:
# Capture the visible window
png_data = driver.get_screenshot_as_png()
Crop the capture
To crop the screenshot to the desired area, we consider the bounding box of the elements, the specified margin around the elements, and the vertical scroll position of the window:
import io
from PIL import Image
# Get Y scroll value
scroll_y = driver.execute_script("return window.scrollY;")
# Crop image to the bounds of the specified elements, with margin
bounds = get_bounding_box(elems)
image = Image.open(io.BytesIO(png_data))
image = image.crop(
    (
        # left
        bounds.x0 - margin.x,
        # upper
        bounds.y0 - scroll_y - margin.y,
        # right
        bounds.x1 + margin.x,
        # lower
        bounds.y1 - scroll_y + margin.y,
    )
)
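The crop arithmetic can be pulled into a small pure function, which makes it easy to check by hand. The sketch below is one possible variation, not the code used above: crop_box, image_size, and pixel_ratio are hypothetical names. It clamps the box to the image so a margin near the window edge does not reach past the screenshot, and it optionally scales the coordinates, on the assumption that on a high-DPI display the screenshot may have more pixels than the CSS coordinates reported for elements (in which case window.devicePixelRatio could supply the factor).

```python
from typing import NamedTuple


class Bounds(NamedTuple):
    x0: int
    y0: int
    x1: int
    y1: int


class Margin(NamedTuple):
    x: int
    y: int


def crop_box(bounds, margin, scroll_y, image_size, pixel_ratio=1.0):
    """Return (left, upper, right, lower) in screenshot pixels.

    The box is the element bounds plus margin, shifted by the vertical
    scroll offset, scaled by pixel_ratio, and clamped to the image.
    Hypothetical helper for illustration.
    """
    width, height = image_size
    left = (bounds.x0 - margin.x) * pixel_ratio
    upper = (bounds.y0 - scroll_y - margin.y) * pixel_ratio
    right = (bounds.x1 + margin.x) * pixel_ratio
    lower = (bounds.y1 - scroll_y + margin.y) * pixel_ratio
    return (
        max(0, round(left)),
        max(0, round(upper)),
        min(width, round(right)),
        min(height, round(lower)),
    )


bounds = Bounds(x0=10, y0=600, x1=510, y1=850)
margin = Margin(x=20, y=20)

box = crop_box(bounds, margin, scroll_y=580, image_size=(1280, 720))
print(box)  # (0, 0, 530, 290)
```

The resulting tuple has the same (left, upper, right, lower) order that image.crop() expects, so it could be passed to it directly.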
Save the image
Finally, we save the cropped image to a PNG file:
image.save(filename)
The script can repeat the workflow of navigating, locating elements, and capturing a screenshot.
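One way to organize that repetition is to describe each screenshot as data and loop over the list. The sketch below is only an outline under assumptions: Shot, run_shots, and the file names are hypothetical, and the capture argument stands for a function that would perform the navigate, locate, capture, crop, and save steps described above. A stand-in is used here so the example runs without a browser.

```python
from typing import NamedTuple


class Shot(NamedTuple):
    """One screenshot to take (hypothetical structure)."""

    url: str           # page to load
    css_selector: str  # selects the elements to capture
    filename: str      # where to save the cropped PNG


def run_shots(shots, capture):
    """Call capture(shot) for each shot and return the filenames that failed."""
    failed = []
    for shot in shots:
        try:
            capture(shot)
        except Exception as exc:  # keep going so one failure doesn't stop the batch
            print(f"Failed to capture {shot.filename}: {exc}")
            failed.append(shot.filename)
    return failed


# Stand-in for the real Selenium-based capture function.
captured = []


def fake_capture(shot):
    if not shot.css_selector:
        raise ValueError("empty selector")
    captured.append(shot.filename)


shots = [
    Shot("http://localhost:3000/reports", "#summary", "help/summary.png"),
    Shot("http://localhost:3000/settings", "", "help/settings.png"),
]
failed = run_shots(shots, fake_capture)
```

Keeping the list of shots separate from the capture logic means adding a new help screenshot is a one-line change.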
Advanced interaction
The code snippets above showed basic ways of locating elements and interacting with a web page using Selenium, such as clicking on an element. It’s possible to script more complex actions, including hovering and drag-and-drop, using ActionChains.
For example, the following code moves the mouse cursor to hover over a list element:
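from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
elem = driver.find_element(
    By.XPATH, "//li[contains(string(), 'My Item')]"
)
actions = ActionChains(driver)
actions.move_to_element(elem)
actions.perform()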
Many actions can be queued in an ActionChains object. Calling .perform() runs the queued actions.
Wrapping up
When finished, the script closes the browser and stops the WebDriver:
driver.quit()
The updated screenshots are now ready for the web site.