UMBC logo


August 18, 2007

Storage Failure Results in Blackboard and Additional System Outages

On Tuesday, September 18th, at approximately 5pm, a failure in our campus file storage system caused Blackboard and other Windows services to be unavailable until 8pm. The storage outage affected a number of other campus services in addition to Blackboard, including:

Blackboard Pilot Server (OIT)
Content DM (Library)
Illiad (Library)
Reslife Server (ORL)
ColdFusion (CPS)
iStrategy (PeopleSoft)
Counseling Services Scheduler

The outage resulted from a storage hardware failure that, combined with a software error, caused the storage system to go offline. The failure was completely independent of Blackboard and the Windows services noted, but it nonetheless made them unavailable. We apologize for any impact this had on courses and other campus business.

OIT fully understands that these systems, especially Blackboard, are critical campus resources, and we are reviewing this incident to make certain it does not happen again. We have included additional details below for those who are interested.

Details
Over the summer, OIT installed a new redundant file storage system that is housed in two separate facilities. This redundant file system is used for critical services, such as Blackboard, and ensures that in a disaster such as a fire or flood we will still have the data associated with each service and be able to restore service in the other facility within a few hours. Earlier this week, a hardware failure occurred on one of the fiber optic links that connect the two facilities. When a failed hardware component is replaced, the system should automatically fail over to an alternate fiber optic link and remain up. This time, when the component was replaced, a software error caused the file storage system to go offline.
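For readers who want a concrete picture of the intended behavior, here is a minimal sketch of link failover. It is an illustration only and does not represent our storage vendor's software; the names Link and select_active_link are hypothetical, and we assume a simple model in which each fiber optic link is either healthy or not.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """One fiber optic link between the two facilities (hypothetical model)."""
    name: str
    healthy: bool = True


def select_active_link(links, current):
    """Return the link the storage system should use, or None if no path remains.

    The current link is kept while it is healthy. Otherwise the first
    healthy alternate is chosen (the failover). None means offline.
    """
    if current is not None and current.healthy:
        return current
    for link in links:
        if link.healthy:
            return link
    return None


primary = Link("link-a")
alternate = Link("link-b")
links = [primary, alternate]

active = select_active_link(links, primary)   # primary link is in use
primary.healthy = False                       # hardware failure on the primary link
active = select_active_link(links, active)    # intended result: fail over to the alternate
print(active.name if active else "offline")
```

In this model the storage system stays up as long as one link is healthy. In the incident described above, the failover did not complete as designed because of a software error.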

We are working with our storage vendor to fully understand how this occurred and will adjust our procedures to make certain that this kind of error does not happen again. We are committed to providing a robust set of services that you can depend on.
