August 18, 2007

Blackboard upgrades 8/18/2007

Today's updates did not go well. I'm finishing the process of rolling back at this point.
Blackboard is up and running in the state it was before the upgrade.

There were a number of problems, but the short version is that the upgrade was taking way too long and finally errored out. I started at about 6:30 and started the rollback at about 11:45.
I have a few ideas on how to make the update work that involve the
database. I'm also submitting some errors to Blackboard in a few minutes.
If I can get the mirrors rebuilt I can try the update again this Sunday night at
10PM. Otherwise I can try it some night this coming week.

The first unexpected "gotcha" that hit me was that the upgrade was unable to modify the
database due to the replication subscriptions. We are replicating our Blackboard databases to a special database for access by our UMBC apps. I had to remove the replication subscriptions so that I could remove the publications, which in turn allowed the updater to modify the database.

What this means is that I'll need to completely reconfigure the replication
setup next week. This should be a good thing. When I first set up the replication,
I felt quite a bit of pressure to get it working very quickly, so I never really had time to figure out what I was doing, nor was I able to document how to set up replication for the Blackboard database. Now, hopefully, I will be able to set this up better and document the procedure properly.

The second problem was that the updater had not finished the first server after 2 hours,
when it normally takes 30 minutes. I've seen this before in test, and it didn't finish after a
full day. So I had to start over. On the next try I let the updater run for 2.5 hours and it finished with a critical error. Hence the rollback.
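One way to avoid watching a stuck updater by hand is to wait on it with a deadline. The following is only a sketch: the function name `wait_with_deadline` and the commented-out `updater.bat` command are hypothetical, and nothing here reflects how the Blackboard updater is actually launched. It reports an overdue run without killing the process, so the decision to abort and roll back stays with the operator.

```python
import subprocess


def wait_with_deadline(proc, deadline_seconds):
    """Wait for an already started process.

    Returns "ok" (exit code 0), "failed" (non-zero exit code), or
    "overdue" if the process is still running at the deadline.
    The process is NOT killed when the deadline passes.
    """
    try:
        exit_code = proc.wait(timeout=deadline_seconds)
    except subprocess.TimeoutExpired:
        return "overdue"
    return "ok" if exit_code == 0 else "failed"


# Hypothetical usage: the updater normally takes about 30 minutes,
# so flag the run if it is still going after an hour.
# proc = subprocess.Popen(["updater.bat"])        # assumed command name
# status = wait_with_deadline(proc, 60 * 60)
```

A status of "overdue" or "failed" would be the cue to stop and start the rollback rather than waiting several more hours.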

Netsaint is broken. Going to netsaint.umbc.edu gives a PGP page.

August 17, 2007

Blackboard downtime 8/16

The first problem I heard about was around 8/13, while I was still away on vacation. Kip Canfield could not access his assessments and students could not take tests. I did some testing and worked with Jeffery to prove that this was not system wide. I also found a possible workaround.
I then tried rebooting the servers to see if the problem would clear. This caused the access problems to spread from assessments to every course, systemwide. I pushed out a reconfig, and this caused even more problems as it would not complete cleanly. For example, the tomcat service would not delete, so it could not be installed correctly. When I finally got the system to "config" without errors, Blackboard started working again.

Then I found one really weird network issue. The database server could not reach the fileserver/collabserver: "ping 130.85.29.12" (from 130.85.29.13) returned "unreachable". I fixed that by removing some of the Broadcom advanced features I had enabled on the database server.
The features had been running for over a year, so my guess is that some of the recent network changes in the computer room caused this problem to show up now.
This was probably not the cause of the access problems, although it would have broken the collab service.
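A reachability problem like this can be caught earlier with a small scheduled check run from the database server. The sketch below uses a TCP connect instead of ICMP ping (a deliberate substitution, since ping needs raw sockets or an external command). The function name `can_connect` and the port in the usage comment are assumptions, not details from the actual setup.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


# Hypothetical usage from the database server (130.85.29.13):
# port 445 (SMB file sharing) is an assumed choice for the file server.
# if not can_connect("130.85.29.12", 445):
#     print("file server unreachable from database server")
```

A TCP check only proves that one service port answers, so it complements ping rather than replacing it.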

July 23, 2007

Blackboard pauses

On Sunday 7/22/2007 ~10:30 PM - 11:00PM, patches were applied to
our Blackboard servers. Users may have noticed some "pauses"
or incomplete pages during this time.

June 12, 2007

Blackboard file system repair

Last night, 6/12/07, we ran a repair operation on Blackboard's file system. During this time
(12:15AM - 12:45AM) parts of the Blackboard system would not function correctly. Any files
stored within Blackboard's file system would have been inaccessible.

The repair corrected some corruption that had found its way into the file system.



June 4, 2007

Blackboard hang again

Blackboard hung again at ~12:50PM on 6/4. It seems the database server
got so heavily loaded that it stopped responding to Blackboard.

New media was running queries at the time. I restarted the database server, and Blackboard started working again.

After this, new media started the query again and Blackboard stopped working. I then restarted just the SQL process and Blackboard started working again.

May 24, 2007

Blackboard Hang

On the evenings of 5/23 and 5/24 Blackboard hung from ~1:30 AM
till 3:00AM. I am currently investigating the cause.
It seems that during the above times I was unable to map the
drive on the file server from the app servers.

May 8, 2007

Blackboard slowdown?

On 5/7 at ~1pm one of the Blackboard front end servers (app2) started reporting very slow HTTP responses. I was not able to determine what was causing this, but I restarted tomcat and failed all users over to the other server. After restarting App2, App1 started exhibiting the same behavior.
This points to a user process causing the problem. I spoke with Bob
and he had run some large gradebook operations, but nothing else unusual.

April 25, 2007

Blackboard partial "hang"

One of our Blackboard app servers hung today at ~1:12pm. Someone in the newmedia group tried to run a sort on the gradebook of a community with ~13,000 users. The server then became unresponsive.
The other server kept working. I restarted the JVM and everything worked again.

April 19, 2007

Blackboard security updates reboot

The Blackboard backend servers were rebooted on 4/19 at 12:30 AM to finish the installation
of OS security patches. This may have caused a pause of up to 2 minutes for some users.

April 12, 2007

More Blackboard slowdowns

I noticed the same "hanging" problem on Blackboard at ~12:20 on 4/12.

I waited on a page load for about 2 minutes. I checked the database
and the same process was active that was causing the problems before.
My guess is that the query is locking tables that do not need to be locked.
There are a large number of blocked processes during the slowdown.
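When many processes are blocked at once, it helps to find the session at the head of the blocking chain instead of scanning the list by eye. The sketch below assumes the database can report, for each session, which session is blocking it (SQL Server's sysprocesses view does this with its `spid` and `blocked` columns; the entry above does not name the database engine, so that part is an assumption). The function name `head_blockers` and the sample rows are made up for illustration.

```python
def head_blockers(rows):
    """Count how many sessions are stuck behind each head blocker.

    rows: iterable of (spid, blocked_by) pairs, where blocked_by == 0
    means the session is not waiting on anyone.
    Returns a dict mapping head-blocker spid -> number of sessions
    blocked behind it, directly or indirectly.
    """
    blocked_by = {spid: blocker for spid, blocker in rows}
    heads = {}
    for spid, blocker in blocked_by.items():
        if blocker == 0:
            continue  # this session is not blocked
        # Walk up the chain until reaching a session that is not blocked.
        # The 'seen' set stops the walk if the chain loops (a deadlock).
        seen = {spid}
        current = blocker
        while blocked_by.get(current, 0) != 0 and current not in seen:
            seen.add(current)
            current = blocked_by[current]
        heads[current] = heads.get(current, 0) + 1
    return heads


# Made-up sample: 52 and 54 wait on 51, and 53 waits on 52.
sample = [(51, 0), (52, 51), (53, 52), (54, 51), (55, 0)]
print(head_blockers(sample))  # {51: 3}
```

The session with the largest count is the first one to look at, since its query is the one holding the locks everyone else is waiting for.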