At the start of spring 2008, we looked at portal traffic on the first day of the term. It was a busy day, to say the least. Now that we have data for the entire term, what more do we know?
- The second day of school was the busiest day of the term for visits (not visitors, as defined by Google Analytics), with 32,915.
- The Monday after spring break had the most visitors, 18,391.
- Saturday, March 15, the start of spring break, was our least busy day for visits, with 4,499.
Now, you might look at these numbers and say, “Pat, that’s exactly what we would expect to happen.” That is correct, but until you actually have the numbers, you don’t really know that your expectations of reality and reality itself match up. Now you do.
The numbers for the whole term look like this:

You might notice some abnormalities in our sparkline graphs. These come from our introduction of tracking for some popular “external links” during spring break. Because external links are tracked as views, this will slightly distort our page view data for spring 2008, but as long as we break our reporting into pre- and post-break segments we’ll be fine. Our visitor information remains consistent, though.
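If the daily page views are exported to a CSV, the pre/post split takes only a few lines of awk. This is a sketch, not our actual reporting: the file layout (`YYYY-MM-DD,pageviews`, no header row), the function name `segment_totals`, and the cutoff date you pass in are all assumptions.

```shell
#!/usr/bin/env bash
# Sum page views before and on/after a cutoff date.
# Assumed input (hypothetical): CSV lines of "YYYY-MM-DD,pageviews", no header.
# ISO dates sort correctly as plain strings, so a string comparison is enough.
segment_totals() {
  local file="$1" cutoff="$2"
  awk -F, -v cutoff="$cutoff" '
    $1 < cutoff { pre += $2; next }   # days before tracking changed
    { post += $2 }                    # cutoff day and everything after
    END { printf "pre=%d post=%d\n", pre, post }
  ' "$file"
}

# Example call (file name is hypothetical):
#   segment_totals pageviews.csv 2008-03-15
```

Reporting the two totals separately keeps the inflated post-break views from being compared directly against the pre-break ones.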
The external link tracking is important in the portal so we know where people are going when they leave. Our portal strategy has never been to bring applications into the portal, just provide easy access. Before Google Analytics it was difficult for us to track that information. Now we track clicks to all major applications from the portal. That information should prove illuminating in the future.
Overall, having Google Analytics in the portal is a big win for us. We’re still keeping all the raw log file information and doing our usual processing, but this wins hands down just for the amazing breadth of information we can look at now.
Tags: Portal

Previously I talked about improving the SSL performance in Tomcat simply by upgrading the JVM. Here we have a somewhat not-to-scale chart showing how we did in a “real world” test. Last night at 6pm an application opened up and sent a flood of users to our authentication service (CAS). Last year we could not handle the flood. CAS stalled, which caused a flood of calls to the help desk. In the business, we call that less than optimal.
We’re rolling CAS on Java 6 now and SSL performance is no longer an issue.
Tags: Web Development
Meriam Library has announced the arrival of Multi-Search. This is an exciting new way to search the catalog as well as several online databases that we have access to like JSTOR and Academic Search. In the past students would have to search each journal database and our catalog separately. Now, everything is handled through a single search box. Hopefully users will find this means of search much more convenient and a faster way of researching.
Tags: Information Design · Search
In the last two weeks, WEBD has upgraded two of our big services: Confluence and JIRA. As part of these upgrades we turned over authentication to CAS, our single-sign-on service. Now users will be able to jump back and forth between Confluence and JIRA without having to log in a second time. Confluence now becomes a great place to keep documentation and support for web applications that also employ CAS authentication.
Tags: Authentication
We have 23 campuses in the California State University system and I like to know what all of them are doing on their web sites. Sure, I could dig into the Wayback Machine, but I find that can give inconsistent results. Instead, I decided that I wanted to automate getting screenshots of all the CSU homepages. This requires some shell scripting, so I’m going to make a lot of assumptions about technical knowledge here. If you aren’t comfortable mucking around on the command line, this may not be the best solution for you.
If you are on Mac OS X 10.5, you won’t have to install anything but the webkit2png script.
2. Script Your Shots
I’ve got a small shell script with an array of 23 URLs and a for loop that passes each one to webkit2png.
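# Homepages to capture, one URL per campus (list elided here).
SITES=( http://www.csub.edu/
...
http://www.csustan.edu/ )
# -F: full-size image only, -W: browser width, -d: date in filename, -D: output directory.
for site in "${SITES[@]}"; do
  webkit2png -F -W 1024 -d -D /Users/pberry/Projects/csushots/ "$site"
done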
3. Profit!
From there, you could just cron the script up and let it run. The -d option will put the date in the filename, so you aren’t blowing away your archive of screen shots with each run. I decided it would be nice if I could organize these in iPhoto and I didn’t want to manually import my shots each time my script ran. I’m no good with AppleScript, so I turned to Automator.
I created a workflow that creates the directory, runs the shell script, imports them into iPhoto (iPhoto is set to copy on import) and then deletes the directory. The extra steps were really just to make the workflow happen. I’m guessing there is a way to do it without creating/deleting, but I was in a hurry. I then created a Smart Album based on the unique parts of the file names (by default webkit2png will use the URL for the filename) for each campus. Using iPhoto also lets me do “fun” stuff, like create movies showing how a page changes over time (QuickTime).
Now I’m able to spot trends and changes in our system. I could extend the process to grab other pages as well, but for now I’m okay with just the main page for each campus.
Caveats and Misc.
The webkit2png script can have trouble with Flash, so screenshots will look “unfinished” if a site uses a big Flash movie to show photos. /looks at Bakersfield
If you don’t have 10.5, you’ll have to install the PyObjC bridge. It sounds nasty, but it’s easy. Double-click and you’re on your way.
You could do the first two steps on Linux with khtml2png.
Tags: Misc.
March 13th, 2008 · 1 Comment
We run a number of applications in Tomcat (both 5.0.x and 5.5.x) and for the most part we’re very happy with the performance we get. There is one time of the year where our CAS (Central Authentication Service) gets killed though, and it’s because of SSL connections. More precisely, it’s because of Tomcat 5.0.x running under JDK 1.4.x. One application for one hour out of the year floods CAS with so many requests that it can’t keep up due to the overhead of SSL. JDK 1.4 just can’t deal with SSL very well, or rather, very quickly. The threads fill up and start blocking connections. In the business we call that LTO (Less Than Optimal).
Now, there are many technical solutions (Tomcat has native APR libraries, we could front with Apache httpd, or we have hardware that can do SSL, though the latter has security issues), none of which we ever deployed because for all but 1 hour out of the 8,760 hours in a year we do just fine with the existing setup. Yes, I understand that’s only 99.989% uptime, but still it’s pretty good.
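For anyone who wants to check the uptime figure, here is the arithmetic as a small shell helper. The function name `uptime_pct` is made up for illustration; the inputs are the 1 bad hour and the 8,760 hours in a non-leap year from the paragraph above.

```shell
#!/usr/bin/env bash
# Percentage of hours the service was healthy.
# $1 = hours of trouble, $2 = total hours in the period.
uptime_pct() {
  awk -v down="$1" -v total="$2" \
    'BEGIN { printf "%.3f\n", (total - down) / total * 100 }'
}

uptime_pct 1 8760   # prints 99.989
```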
Now, you’re probably thinking to yourself “Where the heck have these guys been? Java 5 gives you a huge performance boost and Java 6 just adds to the gains provided by 5!” We’ve been deploying Java 5 on upgrades and new applications. We just never got to CAS and honestly there was no real need because CAS is so simple and so solid, you rarely think about it once it’s running.
Give me numbers, Mrs. Landingham!
I fired up httperf and grabbed some numbers.
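The exact invocation isn't recorded here, so treat the following as a sketch of a comparable run rather than the command that produced these results. The host name, login path, and connection rate are placeholders; only the 2,000 connections matches the totals reported below. The `--ssl` flag requires an httperf build with SSL support.

```shell
#!/usr/bin/env bash
# Open 2000 SSL connections against a CAS login page and report timings.
# $1 = server host, $2 = URI path, $3 = new connections per second.
# All three values are supplied by the caller; none come from the original test.
run_ssl_benchmark() {
  local host="$1" uri="$2" rate="$3"
  httperf --server "$host" --port 443 --ssl \
          --uri "$uri" --num-conns 2000 --rate "$rate"
}

# Example call (hypothetical host and path), not executed here:
#   run_ssl_benchmark cas.example.edu /cas/login 50
```

Running the same call against each JVM, with nothing else changed, is what makes the two sets of numbers comparable.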
JDK 1.4.2_06
Total: connections 2000 requests 2000 replies 2000 test-duration 83.398 s
Connection rate: 24.0 conn/s (41.7 ms/conn, <=311 concurrent connections)
Connection time [ms]: min 449.9 avg 6122.5 max 47219.6 median 3891.5 stddev 6211.0
Connection time [ms]: connect 6011.3
Connection length [replies/conn]: 1.000
Request rate: 24.0 req/s (41.7 ms/req)
JDK 1.5.0_15
Total: connections 2000 requests 2000 replies 2000 test-duration 57.203 s
Connection rate: 35.0 conn/s (28.6 ms/conn, <=26 concurrent connections)
Connection time [ms]: min 79.7 avg 255.5 max 3421.0 median 163.5 stddev 230.4
Connection time [ms]: connect 225.6
Connection length [replies/conn]: 1.000
Request rate: 35.0 req/s (28.6 ms/req)
That’s roughly a 31% decrease in the time per request (41.7 ms down to 28.6 ms), or about 46% more requests per second. Now, we all know that there are lies, damn lies, and statistics. This is by no means an exhaustive breakdown of the differences in SSL performance between these two JVMs. This is simply a small bit of empirical data. That being said, it’s probably the cheapest and easiest performance gain you’re ever likely to get.
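Working from the httperf figures above, the relative changes can be computed directly. The helper name `pct_change` is an illustration; the inputs are copied from the two result sets (JDK 1.4.2_06 first, JDK 1.5.0_15 second).

```shell
#!/usr/bin/env bash
# Percent change from an old value to a new one; negative means a decrease.
pct_change() {
  awk -v old="$1" -v new="$2" \
    'BEGIN { printf "%.1f\n", (new - old) / old * 100 }'
}

pct_change 41.7 28.6      # time per request (ms): prints -31.4
pct_change 24.0 35.0      # requests per second:   prints 45.8
pct_change 6122.5 255.5   # avg connection time (ms): prints -95.8
```

The average connection time shows by far the largest swing, which fits the description of threads filling up and blocking under JDK 1.4.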
Tags: Authentication · Recent Projects
February 21st, 2008 · 1 Comment
We’ve all taken the Myers-Briggs Personality Test. Answer a series of questions and you arrive at a 4-letter description of yourself based on four dichotomies covering attitudes, functions, and lifestyle. For example, back in my freshman year in college, I was an INTP or Introversion-Intuition-Thinking-Perceiving.
Based on Rands’ book Managing Humans, I’m proposing the idea of a Rands Personality Test based on the following dichotomies of management styles.
| Dichotomies | |
| Inward | Holistic |
| Incrementalist | Completionist |
| Mechanic* | Organic* |
* Or combinations of the two: Organic/mechanic and Mechanic/organic
From these dichotomies, at this point in my career I would describe myself as an ICOm, or Inward-Completionist-Organic/mechanic.
For those of you who have read Managing Humans, what’s your Rands Personality Type?
Tags: Project Management
It’s inevitable. You’re developing a web application and the “need” arises for it to send e-mail. Now, we’re not trying to sound too dismissive of e-mail, but let’s face it, e-mail has a number of flaws.
- SMTP is not a real-time protocol. There is absolutely no guarantee of delivery and most applications that send e-mail are not aware of bounces.
- Spam is a problem. Everywhere. The tools put in place to battle spam will inevitably treat one of your messages as spam.
- E-mail overload is an even bigger problem. Even if you “solve” the spam problem, you are often just tossing more needles in the ever-growing haystack that is the INBOX.
These are issues we see time and time again when some kind of workflow or notification scheme needs to be put into place. What are the alternatives? The easiest is probably RSS, but that assumes that the users are already using some form of RSS reader. Luckily mail programs are starting to incorporate syndication formats and present them to people with the e-mail interface they know and love/loathe.
But if, at the end of the day, you decide that you really do have to send e-mail, the best thing to do is to keep it short. That is, if you want people to actually read the e-mail.
Tags: E-mail
February 20th, 2008 · Comments Off
Our friend Stewart Mader is putting out 21 Days of Wiki Adoption videos. Even if you are already going with your wiki project, there will be some episodes worth watching.
Rock on, Stewart.
Tags: Web 2.0
Metrics are a funny thing. We have four main ways of looking at the portal. We have concurrent users, which we measure at 5-minute intervals. We track logins and unique users on a daily basis. We have Google Analytics. Last but not least, we also track how many people click through to PeopleSoft through our CAS access logs. For instance, here is what the first day of spring term looked like in terms of concurrent users on an hourly basis.

Yeah, it was hopping. You can even see the network outage we had late that night (yeah, that was fun). This is just a snapshot though. It just tells us that, in general, it was really busy.
Our login stats also show that Monday was indeed very, very busy with 31,984 logins made by 13,412 users. A good number to have, but it doesn’t say much. We don’t currently run an analysis to get a real logins per user average.
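A per-user breakdown would not take much. The sketch below assumes a log with one username per line, one line per login; that format and the function name `login_distribution` are hypothetical, not a description of our actual login records.

```shell
#!/usr/bin/env bash
# Show how many users logged in once, twice, three times, and so on.
# Assumed input (hypothetical): one username per line, one line per login.
# Output: "<logins> <number of users>" per line, sorted by login count.
login_distribution() {
  sort "$1" \
    | uniq -c \
    | awk '{ users[$1]++ } END { for (n in users) print n, users[n] }' \
    | sort -n
}

# Example call (file name is hypothetical):
#   login_distribution logins-2008-01-28.txt
```

A distribution like this says more than a single average, because it shows whether a few users are logging in dozens of times or everyone is logging in twice.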
Google Analytics shows us where people went in the portal but not where they went when they were leaving the portal.

We know there are three main destinations: PeopleSoft, WebCT/Blackboard Vista, and webmail. Because of the way we send users to PeopleSoft from the portal, we have good numbers — although they are not tied to unique users.

Clearly the first day of the term (the 28th) shows a lot of activity. We racked up 70k+ clicks for only 13,412 users and 31,984 logins. The performance of the PeopleSoft system suffered a little that day, which is what really prompted us to start trying to tie all these numbers together.
While nothing is definitive, it seems that the performance problems were due to a number of issues — as you would expect in an enterprise system like PeopleSoft. With ~32k logins and some poor statistical assumptions you get just over 2 clicks into PeopleSoft per login and users logging in at least twice during the day.
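The back-of-the-envelope ratios work out as follows. The click count is only known as "70k+", so the 70,000 used here is an assumed floor, and the helper name `ratio` is just for illustration.

```shell
#!/usr/bin/env bash
# Divide two counts and print the result to two decimal places.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

ratio 70000 31984   # PeopleSoft clicks per login (70,000 is an assumption): prints 2.19
ratio 31984 13412   # logins per user: prints 2.38
```

Both are plain averages, so they say nothing about how the clicks and logins were spread across individual users.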
The questions that these numbers can’t answer for us are many. Is the system usable? Are usability problems leading people to log in more times than they would in a perfect world? Are students needing to log in frequently on the first day to find or adjust their schedule? All of these will remain “unknown unknowns” until we actually talk to students.
What do we know from these numbers? The first day of school is busy.
Tags: Portal