Institutional Knowledge

Wherein we write down some stuff that we know.


Automating Website Screen Captures on OS X

March 24th, 2008

We have 23 campuses in the California State University system, and I like to know what all of them are doing on their web sites. Sure, I could dig into the Wayback Machine, but I find it gives inconsistent results. Instead, I decided to automate taking screenshots of all the CSU homepages. This requires some shell scripting, so I'm going to assume a fair amount of technical knowledge here. If you aren't comfortable mucking around on the command line, this may not be the best solution for you.

1. Install webkit2png

If you are on Mac OS X 10.5 (Leopard), webkit2png is the only thing you'll need to install.
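Before scripting anything, it can help to confirm the pieces are in place. Here is a minimal preflight sketch, assuming webkit2png is installed on your PATH under that name and runs under the system `python`. Both function names are made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the named command is on the PATH.
require_cmd() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "missing: $1" >&2
        return 1
    fi
}

# Hypothetical helper: webkit2png is a Python script built on the PyObjC
# bridge, so check that the AppKit and WebKit modules import cleanly.
# The interpreter defaults to "python" but can be passed as an argument.
have_pyobjc() {
    "${1:-python}" -c 'import AppKit, WebKit' >/dev/null 2>&1
}
```

Something like `require_cmd webkit2png && have_pyobjc || echo "fix your setup first"` at the top of the capture script would stop a cron run early instead of failing 23 times.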

2. Script Your Shots

I’ve got a small shell script with an array of 23 URLs and a for loop that passes each one to webkit2png.

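#!/bin/bash
# Homepage URLs for the campuses (list abbreviated here)
SITES=( http://www.csub.edu/
# ...
http://www.csustan.edu/ )

# Where the screenshots are saved
OUTDIR=/Users/pberry/Projects/csushots/

# -F: save only the full-size image; -W 1024: use a 1024-pixel-wide browser;
# -d: put the date in the filename; -D: directory to save into
for site in "${SITES[@]}"; do
    webkit2png -F -W 1024 -d -D "$OUTDIR" "$site"
done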

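The loop above moves on when a capture fails, but it won't tell you which campus was missed. Here's a sketch of a variant that reports the failed URLs, assuming webkit2png exits with a non-zero status when a capture fails (not verified for every failure mode). `capture_all` and `CAPTURE_CMD` are hypothetical names.

```shell
#!/bin/bash
# Command used for each capture; can be overridden (e.g. with a stub).
CAPTURE_CMD=${CAPTURE_CMD:-webkit2png}
# Output directory from the script above; change to suit.
OUTDIR=${OUTDIR:-/Users/pberry/Projects/csushots/}

# Hypothetical helper: capture every URL passed as an argument. The
# capture tool's own output goes to stderr, so stdout carries only the
# URLs that failed. Returns non-zero if any capture failed.
capture_all() {
    local url failed=0
    for url in "$@"; do
        if ! "$CAPTURE_CMD" -F -W 1024 -d -D "$OUTDIR" "$url" >&2; then
            printf '%s\n' "$url"
            failed=1
        fi
    done
    return "$failed"
}
```

Calling `capture_all "${SITES[@]}" > failed.txt` (the file name is just an example) leaves a list of campuses to retry.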

3. Profit!

From there, you could just set up a cron job for the script and let it run. The -d option puts the date in the filename, so you aren't blowing away your archive of screenshots with each run. I decided it would be nice to organize these in iPhoto, and I didn't want to import my shots manually each time the script ran. I'm no good with AppleScript, so I turned to Automator.
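For the cron route, here's a minimal sketch that builds a crontab line and appends it only if it isn't already there. The script path and the weekly schedule (Mondays at 6:00 AM) are assumptions for illustration, as are the function names.

```shell
#!/bin/bash
# Hypothetical helper: print a crontab line that runs SCRIPT on SCHEDULE.
# SCHEDULE uses the five cron fields: minute hour day-of-month month weekday.
cron_entry() {
    local schedule=$1 script=$2
    printf '%s %s\n' "$schedule" "$script"
}

# Hypothetical helper: append ENTRY to the current user's crontab unless an
# identical line is already present. "crontab -l" fails when no crontab
# exists yet, so its error output and status are ignored.
install_cron_entry() {
    local entry=$1 current
    current=$(crontab -l 2>/dev/null) || true
    if printf '%s\n' "$current" | grep -qxF -- "$entry"; then
        return 0
    fi
    { [ -n "$current" ] && printf '%s\n' "$current"; printf '%s\n' "$entry"; } | crontab -
}
```

Usage would look like `install_cron_entry "$(cron_entry '0 6 * * 1' "$HOME/bin/csushots.sh")"`, where `$HOME/bin/csushots.sh` stands in for wherever you saved the script.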

I created a workflow that creates the directory, runs the shell script, imports the new screenshots into iPhoto (iPhoto is set to copy on import) and then deletes the directory. Creating and deleting the directory was only there to give the workflow somewhere to collect the files; I'm guessing there is a way to do it without those steps, but I was in a hurry. I then created a Smart Album for each campus, based on the unique part of its file names (by default webkit2png uses the URL for the filename). Using iPhoto also lets me do “fun” stuff, like create QuickTime movies showing how a page changes over time.
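The Smart Albums work because each campus's domain survives into its filenames. As a rough command-line stand-in for one of those albums, here is a sketch that lists one site's shots oldest-first. It assumes filenames like `20080324-wwwcsubedu-full.png`: the `-d` date stamp, then the URL with the scheme and all punctuation stripped, then `-full.png`. That naming is an assumption, so check it against what your copy of webkit2png actually writes. Both function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: approximate the filename stem derived from a URL
# (assumed: scheme dropped, everything but letters, digits, and
# underscores removed), e.g. http://www.csub.edu/ -> wwwcsubedu.
shot_stem() {
    printf '%s' "${1#*://}" | tr -cd '[:alnum:]_'
    echo
}

# Hypothetical helper: list one site's full-size shots in DIR, oldest first.
# Each name is assumed to start with the YYYYMMDD stamp that -d adds, so
# sorting the names also sorts them by date. Matching the whole stem keeps
# "csus" (Sacramento) from also catching "csusb" or "csustan".
site_shots() {
    local dir=$1 stem f
    stem=$(shot_stem "$2")
    for f in "$dir"/*-"$stem"-full.png; do
        [ -e "$f" ] && printf '%s\n' "$f"
    done | sort
}
```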

Now I'm able to spot trends and changes across our system. I could extend the process to grab other pages as well, but for now I'm okay with just the main page for each campus.
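Grabbing other pages mostly means growing the URL list. One way to keep the script itself unchanged is to read URLs from a plain text file, one per line. A sketch, assuming a hypothetical file such as `sites.txt`; it skips blank lines and `#` comments, and it avoids `mapfile`, which the bash 3.2 that ships with OS X doesn't have.

```shell
#!/bin/bash
# Hypothetical helper: read URLs from FILE (one per line) into the global
# SITES array, skipping blank lines and lines starting with "#".
# The "|| [ -n ... ]" test keeps a final line that lacks a newline.
load_sites() {
    local line
    SITES=()
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            ''|'#'*) continue ;;
        esac
        SITES+=("$line")
    done < "$1"
}
```

Then `load_sites sites.txt` replaces the hard-coded `SITES=( ... )` assignment, and the loop stays the same.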

Caveats and Misc.

The webkit2png script can have trouble with Flash, so screenshots of sites that use a big Flash movie to show photos may look “unfinished.” /looks at Bakersfield

If you're on a version of OS X earlier than 10.5, you'll have to install the PyObjC bridge first. It sounds nasty, but it's easy: double-click the installer and you're on your way.

You could do the first two steps on Linux with khtml2png.

Tags: Misc.
