Institutional Knowledge

Wherein we write down some stuff that we know.


Routing Around Damage with Ruby

October 26th, 2007

We have an installation of some monitoring software. It monitors the accessibility of web pages and creates static HTML reports, one for each “site” we monitor. Separate HTML reports are fine, even preferred at times, but they are useless for getting a bigger picture of what we are monitoring, because there is no way to query the monitoring software for the information we need. “How many sites are 100% compliant?” “How many aren’t?” “How many are over 80%?” Each report stands alone, with its data encased in semantically poor HTML. Scott had already built a system where we could store the results of the monitoring to give us the big picture we were looking for. But, and you knew this was coming, the data for each report has to be entered manually.

This will just not do.

Scott crafted a plan wherein his software would expose a RESTful web service that would give us the current state of things in XML. A glue script would be written that would query the web service, find the static HTML report location, scrape the information we need from each report, and then finally update the information in our custom reporting tool.
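The shape of that glue script can be sketched roughly as follows. This is not the actual script: the XML layout, the `Site` struct, and the `parse_sites` and `sync_reports` names are all assumptions made for illustration, and the scrape and update steps are passed in as callables so they can be stubbed. In a real run the XML would come from the web service (for example via `Net::HTTP.get`) rather than from a constant.

```ruby
require "rexml/document"

# Hypothetical XML shape; the real web service's format is not shown in this post.
SAMPLE_XML = <<~XML
  <sites>
    <site id="1" name="Example A" report="/reports/a/index.html"/>
    <site id="2" name="Example B" report="/reports/b/index.html"/>
  </sites>
XML

# One monitored site and the location of its static HTML report.
Site = Struct.new(:id, :name, :report_path)

# Step 1: read the current state of things from the service's XML.
def parse_sites(xml)
  doc = REXML::Document.new(xml)
  sites = []
  doc.elements.each("sites/site") do |el|
    sites << Site.new(el.attributes["id"], el.attributes["name"], el.attributes["report"])
  end
  sites
end

# Steps 2-4: for each site, scrape its report and push the result to the
# reporting tool. `scrape` and `update` are callables supplied by the caller.
# Returns a hash of site id => scraped score.
def sync_reports(xml, scrape:, update:)
  parse_sites(xml).map do |site|
    score = scrape.call(site.report_path)
    update.call(site, score)
    [site.id, score]
  end.to_h
end

# Stubbed usage: made-up scores stand in for real scraping and updating.
fake_scores = { "/reports/a/index.html" => 100, "/reports/b/index.html" => 72 }
updates = []
result = sync_reports(
  SAMPLE_XML,
  scrape: ->(path) { fake_scores.fetch(path) },
  update: ->(site, score) { updates << "#{site.name}: #{score}%" }
)
puts updates
```

Keeping the scrape and update steps as separate callables means the HTML scraping, which is the fragile part, can be swapped or tested without touching the rest of the loop.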

Now, I’m not a huge fan of scraping HTML for content, especially semantically poor HTML. Design changes cause breakage. Luckily, there is Hpricot. It does quite an amazing job of taking cruddy HTML and letting you parse it as if it were XML. Less chance of breakage is always good.
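To see why a forgiving parser matters, here is a small sketch of what a strict XML parser does with sloppy markup. It uses REXML from Ruby's standard distribution rather than Hpricot, and the HTML fragment and the `strict_parse_error` helper are made up for illustration; they are not taken from the real reports.

```ruby
require "rexml/document"

# Hypothetical fragment of a report: the <b> tag is never closed.
MESSY_HTML = "<html><body><p>Compliance: <b>85%</p></body></html>"

# Returns nil if the markup is well-formed XML, otherwise the first line
# of the parser's error message.
def strict_parse_error(markup)
  REXML::Document.new(markup)
  nil
rescue REXML::ParseException => e
  e.message.lines.first.strip
end

error = strict_parse_error(MESSY_HTML)
puts error ? "strict parser gave up: #{error}" : "parsed cleanly"
```

A strict parser rejects the whole document over one unclosed tag. Hpricot instead builds a tree from whatever it is given, so a lookup along the lines of `Hpricot(html).at("b").inner_text` still has something to work with.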

The script is completed and runs from cron every night. Manual data entry just isn’t our thing.

Tags: Web Development