Take and Compare Website Screenshots with Python

I maintain a few WordPress websites, and every time I install multiple updates I have to click through the sites to check that everything still works. Even then, small design issues can easily be overlooked. To simplify and improve this process, I created a Python script that takes screenshots of several pages of a website. I run this script once before and once after the upgrade. Afterwards, I run a second script that compares the two sets of screenshots and reports any differences. This blog post is about those two scripts: one that takes screenshots, and one that compares images.

The screenshot script uses Selenium, so we first need to download a WebDriver for the browser that Selenium should control. The second script uses the Pillow library to compare images.

Prerequisites

  1. Download the WebDriver for your browser (the scripts below use Firefox, whose driver is geckodriver)
  2. Extract the executable into a folder and add that folder to your PATH
  3. Install Selenium and Pillow:
pip install selenium
pip install pillow

# or 
# python -m pip install selenium
# python -m pip install pillow
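
Before running the scripts, it can help to verify that the prerequisites are actually in place. The following is a minimal sketch, assuming Firefox is the browser and that its driver executable is called geckodriver; the function name check_prerequisites is made up for this example. Newer Selenium 4 releases can also fetch a matching driver on their own, so a missing geckodriver on the PATH is not necessarily fatal.

```python
import importlib.util
import shutil


def check_prerequisites(driver_name="geckodriver"):
    """Return a dict that maps each prerequisite to True or False."""
    return {
        # find_spec returns None when a top-level package cannot be found
        "selenium": importlib.util.find_spec("selenium") is not None,
        # Pillow is installed as "pillow" but imported as "PIL"
        "pillow": importlib.util.find_spec("PIL") is not None,
        # shutil.which searches the folders listed in PATH
        driver_name: shutil.which(driver_name) is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(name.ljust(12) + ("found" if ok else "MISSING"))
```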

Python script to take screenshots of a website

The following is a simplified version of the “screenshot taker” script. It contains a hardcoded URL and a hardcoded list of subsites. Later in this blog post, I show a nicer version that accepts arguments, looks up the URLs of the website, and offers a few more parameters.

import os
from datetime import datetime
from time import sleep
from selenium import webdriver
from selenium.webdriver.firefox.options import Options

siteurl = "https://arminreiter.com"
screen_width = 2560
screen_height = 1440
output_directory = 'output_' + datetime.now().strftime('%Y%m%d_%H%M%S')

sites = [ "/", "/about", "/resources", "/privacy-policy"]

options = Options()
options.add_argument("--headless")

driver = webdriver.Firefox(options=options)
driver.set_window_size(screen_width, screen_height)

os.makedirs(output_directory , exist_ok=True)

for url in sites:
    print("get " + url + "...")
    filename = url.replace('/','_') + ".png"

    driver.get(siteurl + url)
    sleep(3)
    outfile = os.path.join(output_directory, filename)
    driver.get_screenshot_as_file(outfile)

driver.quit()
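
One detail of the loop above: each path is turned into a file name with url.replace('/','_'). That works for the four hardcoded paths, but paths with query strings or other special characters could produce awkward or invalid file names. A slightly more defensive variant might look like this; url_to_filename is a hypothetical helper, not part of the original script, and the set of allowed characters is an assumption.

```python
import re


def url_to_filename(path, extension=".png"):
    """Map a URL path such as '/about' to a file name such as '_about.png'."""
    # keep letters, digits, dot, dash and underscore; replace everything else
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", path)
    # an empty path would otherwise yield a file called just '.png'
    return (safe or "_") + extension


for path in ["/", "/about", "/privacy-policy", "/?s=test&page=2"]:
    print(path, "->", url_to_filename(path))
```

For the four paths used in the script, this returns the same names as the plain replace call, so existing screenshot folders stay comparable.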

Even though the script above uses hardcoded URLs and only takes screenshots, it should already be sufficient for many use cases. Once the script has run, we have screenshots of our website. If we run it before and after the upgrade, we have two sets of screenshots to compare. So, let's write the script that compares the images in two folders:

Python script to compare two images

import argparse
import os
from datetime import datetime
from PIL import Image, ImageChops

parser = argparse.ArgumentParser()
parser.add_argument("--first",  "-f", help="path to the first folder for image comparison", required=True)
parser.add_argument("--second", "-s", help="path to the second folder for image comparison", required=True)
args = parser.parse_args()

dir1 = args.first
dir2 = args.second

outputdir = 'result_' + datetime.now().strftime('%Y%m%d_%H%M%S')

for filename in os.listdir(dir1):

    file1 = os.path.join(dir1, filename)
    file2 = os.path.join(dir2, filename)
    im1 = Image.open(file1)
    im2 = Image.open(file2)

    print('Compare ' + filename + ' (' + file1 + ' AND ' + file2 + ')')

    diff_img = ImageChops.difference(im1, im2).convert('RGB')
    if diff_img.getbbox():
        outpath = outputdir + '/' + filename + "-s.png"
        print("Images are different, store difference in " + outpath)
        os.makedirs(outputdir , exist_ok=True)
        diff_img.save(outpath)
    else:
        print("Images are equal")
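
To see what ImageChops.difference and getbbox do in the script above, it helps to try them on two small synthetic images. ImageChops.difference returns the per-pixel absolute difference, and getbbox returns the bounding box of all non-zero pixels, or None if the difference image is entirely black. The sketch below assumes Pillow is installed; the helper name diff_bbox and the size check are additions for illustration and not part of the original script.

```python
from PIL import Image, ImageChops


def diff_bbox(im1, im2):
    """Return the box (left, upper, right, lower) that encloses all
    differing pixels, or None if both images are identical."""
    if im1.size != im2.size:
        # assumption: images of different size always count as different
        return (0, 0, max(im1.width, im2.width), max(im1.height, im2.height))
    # convert to RGB first so that only the colour channels are compared
    diff = ImageChops.difference(im1.convert("RGB"), im2.convert("RGB"))
    return diff.getbbox()


before = Image.new("RGB", (100, 100), "white")
after = before.copy()
after.paste((255, 0, 0), (10, 20, 30, 40))  # simulate a changed region

print(diff_bbox(before, before.copy()))  # identical images
print(diff_bbox(before, after))          # box around the red rectangle
```

The returned box could also be used to crop the difference image or to draw a frame around the changed region in the original screenshot.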

Improved Python Scripts

The following scripts are improved versions of the two above. The improved screenshot script takes the URL of the website and the screen resolution as parameters. It uses a web service to get all subsites of the website and takes a screenshot of each one. The number of subsites can be limited with the -l parameter.
The image compare script is only extended by a description and one or two small adaptations, such as skipping images that are missing from the second folder.
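
The link handling described above boils down to two steps: keep only the links that start with the base URL, without duplicates, and then cut the list down to the limit. The following sketch shows that logic on a hardcoded list so that it runs without the web service; the function names filter_subsites and apply_limit and the example links are assumptions for illustration.

```python
def filter_subsites(baseurl, links):
    """Return the unique relative paths of all links below baseurl."""
    sites = ["/"]
    for link in links:
        if not link.startswith(baseurl):
            continue  # external link, ignore it
        path = link[len(baseurl):]
        if path and path not in sites:
            sites.append(path)
    return sites


def apply_limit(sites, limit):
    """Return at most `limit` entries; 0 or less means no limit."""
    return sites[:limit] if limit > 0 else list(sites)


links = [
    "https://example.com/",
    "https://example.com/about",
    "https://example.com/about",
    "https://other.example.org/page",
    "https://example.com/contact",
]
sites = filter_subsites("https://example.com", links)
print(sites)
print(apply_limit(sites, 2))
```

Slicing has the advantage that a limit larger than the number of sites found simply returns the whole list instead of raising an error.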

Improved Image Compare Python Script

# The following script gets 2 folders as input and
# compares the images with each other. Possible
# differences are stored in a new folder starting
# with compresult_
# usage: python imgCompare.py -f folder1 -s folder2
import argparse
import os
from datetime import datetime
from PIL import Image, ImageChops

parser = argparse.ArgumentParser()
parser.add_argument("--first", "-f", help="path to the first folder for image comparison", required=True)
parser.add_argument("--second", "-s", help="path to the second folder for image comparison", required=True)
args = parser.parse_args()

dir1 = args.first
dir2 = args.second
outputdir = "compresult_" + datetime.now().strftime("%Y%m%d_%H%M%S")

for filename in os.listdir(dir1):
    file1 = os.path.join(dir1, filename)
    file2 = os.path.join(dir2, filename)

    # only compare images that exist in both folders
    if not os.path.exists(file2):
        print("File " + file2 + " does not exist in second folder. Skip image")
        continue

    im1 = Image.open(file1)
    im2 = Image.open(file2)
    print("Compare " + filename + " (" + file1 + " AND " + file2 + ")")

    # getbbox() returns None if the difference image is completely black
    diff_img = ImageChops.difference(im1, im2).convert("RGB")
    if diff_img.getbbox():
        outpath = outputdir + "/" + filename + "-s.png"
        print(" Images are different, store difference in " + outpath)
        os.makedirs(outputdir, exist_ok=True)
        diff_img.save(outpath)
    else:
        print(" Images are equal")
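
The script above walks through the first folder and skips images that are missing from the second one. Images that exist only in the second folder are never looked at, for example pages that were added by the upgrade. The following is a small sketch of how both cases could be reported, using only the standard library; pair_screenshots is a hypothetical helper and not part of imgCompare.py.

```python
import os
import tempfile


def pair_screenshots(dir1, dir2):
    """Return (common, only_first, only_second) as sorted lists of file names."""
    files1 = set(os.listdir(dir1))
    files2 = set(os.listdir(dir2))
    return (
        sorted(files1 & files2),
        sorted(files1 - files2),
        sorted(files2 - files1),
    )


# demo with two temporary folders that hold empty placeholder files
with tempfile.TemporaryDirectory() as d1, tempfile.TemporaryDirectory() as d2:
    for name in ["_.png", "_about.png"]:
        open(os.path.join(d1, name), "w").close()
    for name in ["_.png", "_shop.png"]:
        open(os.path.join(d2, name), "w").close()

    common, only_first, only_second = pair_screenshots(d1, d2)
    print("compare:", common)
    print("missing after upgrade:", only_first)
    print("new after upgrade:", only_second)
```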

Improved Website Screenshot Python Script

# this script takes multiple screenshots of a website
# including subsites and saves these screenshots into
# a folder starting with the name output_
# usage:
# python webscreenshot.py -u https://arminreiter.com
# python webscreenshot.py -u https://arminreiter.com -x 1920 -y 1080 -l 30
import argparse
import os
import urllib.request
from datetime import datetime
from time import sleep
from selenium import webdriver
from selenium.webdriver.firefox.options import Options

def get_subsites(baseurl):
    # ask the web service for all links on the page and keep
    # only those that belong to the website itself
    sites = ['/']
    urls = urllib.request.urlopen('https://api.hackertarget.com/pagelinks/?q=' + baseurl).read()
    for u in urls.decode('utf-8').splitlines():
        rurl = u.replace(baseurl, '')
        if u.startswith(baseurl) and not (rurl in sites):
            sites.append(rurl)
    return sites

# Arguments
parser = argparse.ArgumentParser()
parser.add_argument("--url", "-u", help="url of the website, e.g. https://arminreiter.com", required=True)
parser.add_argument("--width", "-x", help="Screenshot resolution width, default: 2560", default=2560, type=int)
parser.add_argument("--height", "-y", help="Screenshot resolution height, default: 1440", default=1440, type=int)
parser.add_argument("--limit", "-l", help="Limits the number of urls that will be screenshotted (0 is no limit), default: 10", default=10, type=int)
args = parser.parse_args()

siteurl = args.url
screen_width = args.width
screen_height = args.height
output_directory = 'output_' + datetime.now().strftime('%Y%m%d_%H%M%S')
wait_time = 3 # time to wait until screenshot is taken

print("Start script for " + siteurl + ".")
print("Screenshot resolution: " + str(screen_width) + "x" + str(screen_height))
print("Save all files to: " + output_directory)

print("get subsites for " + siteurl)
sites = get_subsites(siteurl)
print("received " + str(len(sites)) + " sites")

print("start selenium webdriver")
options = Options()
options.add_argument("--headless")
driver = webdriver.Firefox(options=options)
driver.set_window_size(screen_width, screen_height)

os.makedirs(output_directory, exist_ok=True)

total = len(sites)
if args.limit > 0:
    # never try to visit more sites than were actually found
    total = min(args.limit, total)

for i in range(total):
    url = sites[i]
    print(str(i+1).rjust(3,' ') + " of " + str(total) + ": get " + url + "...")
    filename = url.replace('/','_') + ".png"
    driver.get(siteurl + url)
    sleep(wait_time)
    outfile = os.path.join(output_directory, filename)
    driver.get_screenshot_as_file(outfile)

driver.quit()
print("done")
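
The script waits a fixed wait_time of three seconds before each screenshot. That is simple, but it wastes time on fast pages and may be too short for slow ones. One possible alternative is to poll the page's document.readyState through the driver's execute_script method. The sketch below is an assumption about how that could look, not part of webscreenshot.py; wait_until_loaded and FakeDriver are made-up names, and FakeDriver only stands in for a real Selenium driver so that the example runs without a browser. A readyState of "complete" does not guarantee that lazy-loaded images or animations have finished, so a short additional sleep may still be needed.

```python
import time


def wait_until_loaded(driver, timeout=10.0, poll=0.25):
    """Poll document.readyState until it is 'complete' or the timeout expires.

    Returns True if the page reported 'complete' in time, otherwise False.
    """
    deadline = time.monotonic() + timeout
    while True:
        if driver.execute_script("return document.readyState") == "complete":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


class FakeDriver:
    """Stand-in for a Selenium driver: reports 'loading' twice, then 'complete'."""

    def __init__(self):
        self.calls = 0

    def execute_script(self, script):
        self.calls += 1
        return "complete" if self.calls >= 3 else "loading"


driver = FakeDriver()
print(wait_until_loaded(driver, timeout=2.0, poll=0.01))
print(driver.calls)
```

In the screenshot script, the call sleep(wait_time) could then be replaced by wait_until_loaded(driver), optionally followed by a much shorter sleep.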
