Wherein we write down some stuff that we know.

Archive for the ‘Portal’ Category

Portal Stats: The Term in Review

Friday, June 13th, 2008

At the start of spring 2008, we looked at portal traffic on the first day of the term. It was a busy day, to say the least. Now that we have data for the entire term, what more do we know?

  • The second day of school was the busiest day of the term for visits (not visitors, as defined by Google Analytics), with 32,915.
  • The Monday after spring break had the highest total visitors, with 18,391.
  • Saturday, March 15, the start of spring break, was our least busy day for visits, with 4,499.

Now, you might look at these numbers and say, “Pat, that’s exactly what we would expect to happen.” That is correct, but until you actually have the numbers, you don’t really know that your expectations of reality and reality itself match up. Now you do.

The numbers for the whole term look like this:

[Image: Google Analytics spring 2008]

You might notice some abnormalities in our sparkline graphs. These are due to our introducing tracking of some popular “external links” during spring break. Because external links are tracked as page views, this slightly distorts our page view data for spring 2008, but as long as we break our reporting into pre- and post-break segments we’ll be fine. Our visitor information remains consistent, though.

The external link tracking is important in the portal so we know where people are going when they leave. Our portal strategy has never been to bring applications into the portal, just provide easy access. Before Google Analytics it was difficult for us to track that information. Now we track clicks to all major applications from the portal. That information should prove illuminating in the future.

Overall, having Google Analytics in the portal is a big win for us. We’re still keeping all the raw log file information and doing our usual processing, but Google Analytics wins hands down just for the amazing breadth of information we can look at now.

First Day Metrics

Thursday, January 31st, 2008

Metrics are a funny thing. We have four main ways of looking at the portal. We have concurrent users, which we measure at 5-minute intervals. We track logins and unique users on a daily basis. We have Google Analytics. Last but not least, we use our CAS access logs to track how many people click through to PeopleSoft. For instance, here is what the first day of spring term looked like in terms of concurrent users on an hourly basis.

[Image: portal-20080128-sm.png]

Yeah, it was hopping. You can even see the network outage we had late that night (yeah, that was fun). This is just a snapshot, though. It just tells us that, in general, it was really busy.

Our login stats also show that Monday was indeed very, very busy with 31,984 logins made by 13,412 users. A good number to have, but it doesn’t say much. We don’t currently run an analysis to get a real logins per user average.

Google Analytics shows us where people went in the portal but not where they went when they were leaving the portal.

[Image: Google Analytics screen shot showing the portal was very, very busy.]

We know there are three main destinations: PeopleSoft, WebCT/Blackboard Vista, and webmail. Because of the way we send users to PeopleSoft from the portal, we have good numbers — although they are not tied to unique users.

[Image: cms-clicks-sm.png]

Clearly the first day of the term (the 28th) shows a lot of activity. We racked up 70k+ clicks from only 13,412 users across 31,984 logins. The performance of the PeopleSoft system suffered a little that day, which is what really prompted us to start trying to tie all these numbers together.

While nothing is definitive, it seems that the performance problems were due to a number of issues — as you would expect in an enterprise system like PeopleSoft. With ~32k logins and some poor statistical assumptions you get just over 2 clicks into PeopleSoft per login and users logging in at least twice during the day.
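The back-of-the-envelope arithmetic behind those ratios can be sketched like this. The login and user counts are the figures quoted above; the click total is only given as “+70k”, so 70,000 is an assumed round figure and the results are rough.

```python
# Rough first-day ratios for spring 2008.
logins = 31_984             # from the login stats quoted in the post
unique_users = 13_412       # from the login stats quoted in the post
peoplesoft_clicks = 70_000  # assumption: the post only says "+70k"

clicks_per_login = peoplesoft_clicks / logins
logins_per_user = logins / unique_users
clicks_per_user = peoplesoft_clicks / unique_users

print(f"clicks per login: {clicks_per_login:.2f}")
print(f"logins per user:  {logins_per_user:.2f}")
print(f"clicks per user:  {clicks_per_user:.2f}")
```

These are plain averages. They say nothing about how logins or clicks were distributed across users, which is exactly the poor statistical assumption mentioned above.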

The questions that these numbers can’t answer for us are many. Is the system usable? Are usability problems leading people to log in more times than they would in a perfect world? Are students needing to log in frequently on the first day to find or adjust their schedule? All of these will remain “unknown unknowns” until we actually talk to students.

What do we know from these numbers? The first day of school is busy.

Grades Day, Or the Effect of a Simple Message in the Right Place

Wednesday, January 9th, 2008

Right before finals week last semester, the university registrar asked that a brief announcement be placed just above the link to PeopleSoft in the Portal. The announcement let students know that grades would post on January 8. It went up on December 18, the second day of finals.

The result on January 8 was the busiest Portal day ever. (Pat or Scott will hopefully speak to the details of the usage numbers and Portal system performance.)

Clearly, the message was in the right place for a sufficient amount of time and helped to fulfill the compelling need of students to see their grades as soon as they were available. This was the first semester that the message went up at the beginning of finals week. It was also the first time that the message was nearly alone in terms of other content in the channel. My guess is that those were the chief contributing factors to the effectiveness of the announcement.

It’s also interesting to note where this information wasn’t publicized:
I mention those places not to criticize them for not having the information. Instead, it is to point out that while it might have been nice to have the information in those places, it is far more powerful to have the information as close to the point of contact for the user as possible.

Experimenting with Twitter

Friday, August 31st, 2007

Not long after the Virginia Tech shootings there was an article on the College Web Editor blog asking if Twitter could be used for emergency communications. I argued that this would be a poor use of Twitter and of campus resources in a situation where resources are critical. I still stand by my reasoning for that particular use of Twitter. That isn’t to say that I think Twitter can’t be used on campus; I’ve been experimenting a tiny bit this week with something that I think Twitter is actually good at: disseminating casual information to those who decide they want it.

So, my little experiment is to send portal usage stats out via Twitter. The updates are sent every two hours between 8am and 6pm, Monday through Friday. Thank you, cron! There are so many moving parts in this that I’m tempted to fire up OmniGraffle to illustrate it, but I’ll try with just words and see how it goes.
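For the curious, here is a minimal sketch of that schedule in Python. The actual crontab entry isn’t shown here; something like `0 8-18/2 * * 1-5` would produce it, assuming the 6pm update is included and updates fire on the hour. The function name `is_update_time` is made up for illustration.

```python
from datetime import datetime

def is_update_time(moment: datetime) -> bool:
    """Return True if a stats update would fire at this minute."""
    is_weekday = moment.weekday() < 5          # Monday=0 .. Friday=4
    in_window = 8 <= moment.hour <= 18         # 8am through 6pm inclusive (assumed)
    on_even_step = (moment.hour - 8) % 2 == 0  # every two hours, counting from 8am
    on_the_hour = moment.minute == 0           # assumed: updates fire at :00
    return is_weekday and in_window and on_even_step and on_the_hour

# Friday, August 31, 2007 at 10:00 falls on the schedule; 9:00 does not.
print(is_update_time(datetime(2007, 8, 31, 10, 0)))
print(is_update_time(datetime(2007, 8, 31, 9, 0)))
```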

The portal has multiple servers running behind the scenes so that we can deal with lots of traffic. Each of these servers has its own set of user sessions. We can query these servers for their session information, and we do this every 5 minutes and stick the information in a database (thank you, Perl and MySQL). We have another internal system that allows us to graph and browse this data, which is incredibly helpful in looking for usage trends. I wrote a small Ruby script that uses the Twitter gem written by John Nunemaker to query the stats database and then post the information to Twitter. Scott was even so kind as to place a WordPress widget in our theme (top right) to show the information pulled directly from Twitter.
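To make the moving parts a little more concrete, here is a rough sketch of the store-then-summarize step. It is not the real thing: the real pieces are Perl, MySQL, and a Ruby script, while this uses Python with an in-memory SQLite database as a stand-in. The table name, server names, message wording, and sample counts are all invented for illustration, and the posting step is left out.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL stats database (assumption).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE session_samples (sampled_at TEXT, server TEXT, sessions INTEGER)"
)

def record_sample(sampled_at, server, sessions):
    """Store one server's session count for one 5-minute poll."""
    db.execute(
        "INSERT INTO session_samples VALUES (?, ?, ?)",
        (sampled_at, server, sessions),
    )

def latest_total():
    """Return (sampled_at, total sessions across servers) for the newest poll."""
    return db.execute(
        "SELECT sampled_at, SUM(sessions) FROM session_samples "
        "WHERE sampled_at = (SELECT MAX(sampled_at) FROM session_samples) "
        "GROUP BY sampled_at"
    ).fetchone()

def format_status(sampled_at, total):
    """Build the short message that would be posted."""
    return f"Portal sessions at {sampled_at}: {total}"

# Made-up sample values for two hypothetical servers.
record_sample("2007-08-31 10:00", "app1", 640)
record_sample("2007-08-31 10:00", "app2", 655)
record_sample("2007-08-31 10:05", "app1", 702)
record_sample("2007-08-31 10:05", "app2", 688)

print(format_status(*latest_total()))
```

Summing per-server counts for the newest timestamp gives one portal-wide number per poll, which is the kind of figure that fits in a short status update.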

Now, some might call this a frivolous exercise, but that would ignore the fact that we have to push a lot of information around campus, and we need to look more at using proper APIs rather than just doing batch dumps of data to be imported by some other system. Our systems need to talk and, more importantly, listen to what other systems are saying.

Dealing with Downtime

Tuesday, August 28th, 2007

Yesterday we were cruising right along in the portal. Things were looking quite good for the first day of the fall term. We were sustaining over 1,500 concurrent users, but let me stop here for just a second and visit what this means. Before we were running uPortal we ran another “monolithic” portal product and it suffered from severe performance problems, especially under high load at peak times. I could argue how “monolithic” it was since many of the components were on different pieces of hardware, but I’ll let it slide because there was no effort made to distinguish the different applications from “the portal.”

We currently run two production app servers (Tomcat, Java 5, Dell 2850s) and yes, we keep the portal as simple as possible. People log in, and then they go to e-mail, WebCT, or PeopleSoft. So what makes the portal different from a static HTML page with links? Two things. We know who you are and to some extent what you do on campus, so we show you things that are appropriate for your group. The real killer app is single sign-on, which in our case is CAS. You log in once, and then we can get you to all these other applications, safely, without making you log in again.

Anyway, back to our Monday debacle. A big requirement of the initial uPortal deployment was that it had to stand up to peak demand, and by all measurements the portal meets and exceeds this requirement. So, where did it all go wrong? It all started deep in the core of the network…

The day the core network died

Around 2:30pm we noticed that network connectivity was gone. I think it’s time to interject a small quote from Rands.

The fact that a computer without an Internet connection is essentially a very expensive DVD player is a recent development, but the fact is, when I sit down at my MacBook and there is no wireless I think, “Well, I could play Bejeweled, right?”

When the network goes funky, CAS can lose touch with LDAP, and without LDAP it can’t authenticate people. Our killer app is useless at that point, and that’s when you know things aren’t going well. You also know your day is about to get worse, because the only solution is to bounce Tomcat. This, of course, means (in our current configuration) that we lose any existing user sessions. So, as soon as we are back up, we are inundated with a wave of people needing to re-authenticate. CAS was a “little” slow, but it eventually slogged through the onslaught of requests.

So, what’s with the odd bump in portal sessions after the network recovered? Scott surmised that it was phantom sessions. When the network went down many people probably closed their browser windows, killing their current session on the browser side but not on the portal side. After the server-side timeout was hit, the sessions (on the server) started to drop as Scott predicted.
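Scott’s explanation can be illustrated with a toy model. This is only a sketch of the idea, not how the portal actually tracks sessions: the 30-minute timeout, the minute-based clock, and the user counts are all assumptions made up for the example.

```python
from dataclasses import dataclass

SESSION_TIMEOUT_MIN = 30  # assumed value; the real timeout is not stated

@dataclass
class Session:
    user: str
    last_seen_min: int  # minute of the user's last request

def live_sessions(sessions, now_min, timeout_min=SESSION_TIMEOUT_MIN):
    """Sessions the server still counts: idle for less than the timeout."""
    return [s for s in sessions if now_min - s.last_seen_min < timeout_min]

# Users last seen at minute 0 (just before the outage) closed their browsers.
# The server cannot know that, so their sessions linger as phantoms.
phantoms = [Session(f"old{i}", 0) for i in range(3)]
# The same people come back at minute 10 and get fresh sessions.
returning = [Session(f"new{i}", 10) for i in range(3)]
all_sessions = phantoms + returning

print(len(live_sessions(all_sessions, now_min=15)))  # phantoms still counted
print(len(live_sessions(all_sessions, now_min=35)))  # phantoms have expired
```

While the phantoms are still inside the timeout window, each returning user is counted twice, which produces a bump; once the timeout passes, the count falls back to the number of real users.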

Luckily we were in communication with the help desks and they were able to explain to people what was going on. It looks like a CAS 3.1 cluster could be useful sooner rather than later.