<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
   <title>OIT SysCore</title>
   <link rel="alternate" type="text/html" href="http://www.umbc.edu/blogs/oit-syscore/" />
   <link rel="self" type="application/atom+xml" href="http://www.umbc.edu/blogs/oit-syscore/atom.xml" />
   <id>tag:www.umbc.edu,2007:/blogs/oit-syscore//29</id>
   <updated>2007-08-18T17:39:58Z</updated>
   
   <generator uri="http://www.sixapart.com/movabletype/">Movable Type 3.34</generator>

<entry>
   <title>Blackboard upgrades 8/18/2007</title>
   <link rel="alternate" type="text/html" href="http://www.umbc.edu/blogs/oit-syscore/2007/08/blackboard_upgrades_8182007.html" />
   <id>tag:www.umbc.edu,2007:/blogs/oit-syscore//29.4728</id>
   
   <published>2007-08-18T17:37:50Z</published>
   <updated>2007-08-18T17:39:58Z</updated>
   
   <summary>Today&apos;s updates did not go well. I&apos;m finishing the process of rolling back at this point. Blackboard is up and running in the state it was before the upgrade. There were a number of problems but the short version is...</summary>
   <author>
      <name>David Freeman</name>
      
   </author>
   
   
   <content type="html" xml:lang="en-us" xml:base="http://www.umbc.edu/blogs/oit-syscore/">
      Today&apos;s updates did not go well.  I&apos;m finishing the process of rolling back at this point.
Blackboard is up and running in the state it was before the upgrade.

There were a number of problems, but the short version is that the upgrade was taking far too long and finally errored out.  I started the upgrade at about 6:30 and began the rollback at about 11:45.
I have a few ideas on how to make the update work that involve the
database.  I&apos;m also submitting some errors to Blackboard in a few minutes.
If I can get the mirrors rebuilt, I can try the update again this Sunday night at
10PM.  Otherwise I can try it some night this coming week.

The first unexpected &quot;gotcha&quot; that hit me was that the upgrade was unable to modify the
database due to the replication subscriptions.  We are replicating our Blackboard databases to a special database for access by our UMBC apps.  I had to remove the replication subscriptions in order to remove the publications, which was needed to allow the updater to modify the database.

What this means is that I&apos;ll need to completely reconfigure the replication
setup next week.  This should be a good thing.  When I first set up the replication,
I felt like there was quite a bit of pressure to get it working very quickly, so I never really had time to figure out what I was doing.  Nor was I able to document &quot;how to set up replication&quot; for the Blackboard database.  Hopefully, I should now be able to set this up better and document the procedure properly.

The second problem was that the updater had not finished the first server after 2 hours,
when it normally takes 30 minutes.  I&apos;ve seen this before in testing, where it didn&apos;t finish even after a
full day.  So I had to start over.  On the next try, I let the updater run for 2.5 hours and it finished with a critical error.  Hence the rollback.

Netsaint is broken.  Going to netsaint.umbc.edu gives a PGP page.

      
   </content>
</entry>
<entry>
   <title>Blackboard downtime 8/16</title>
   <link rel="alternate" type="text/html" href="http://www.umbc.edu/blogs/oit-syscore/2007/08/blackboard_downtime_816.html" />
   <id>tag:www.umbc.edu,2007:/blogs/oit-syscore//29.4726</id>
   
   <published>2007-08-17T17:43:23Z</published>
   <updated>2007-08-17T18:29:04Z</updated>
   
   <summary>The first problem I heard about came around 8/13, while I was still away on vacation. Kip Canfield could not access his assessments and students could not take tests. I did some testing and worked with Jeffery to prove...</summary>
   <author>
      <name>David Freeman</name>
      
   </author>
   
   
   <content type="html" xml:lang="en-us" xml:base="http://www.umbc.edu/blogs/oit-syscore/">
      The first problem I heard about came around 8/13, while I was still away on vacation. Kip Canfield could not access his assessments and students could not take tests.  I did some testing and worked with Jeffery to prove that this was not system wide.  I also found a possible workaround.
I then tried rebooting the servers to see if the problem would clear.  Instead, the access problems spread from assessments to every course, systemwide.  I pushed out a reconfig, which caused even more problems because it would not complete cleanly.  One of the problems was that the Tomcat service would not delete, so it could not be installed correctly.  When I finally got the system to &quot;config&quot; without errors, Blackboard started working again.

Then I found one really weird network issue.  The database server could not reach the fileserver/collabserver: &quot;ping 130.85.29.12&quot; (from 130.85.29.13) returned &quot;unreachable&quot;.  I fixed that by removing some of the Broadcom advanced features I had enabled on the database server.
The features had been running for over a year, so my guess is that some of the recent network changes in the computer room caused this problem to show up now.
This was probably not the cause of the access problems, although it would have broken the collab service.

      
   </content>
</entry>
<entry>
   <title>Blackboard pauses</title>
   <link rel="alternate" type="text/html" href="http://www.umbc.edu/blogs/oit-syscore/2007/07/blackboard_pauses.html" />
   <id>tag:www.umbc.edu,2007:/blogs/oit-syscore//29.4626</id>
   
   <published>2007-07-23T17:14:01Z</published>
   <updated>2007-07-23T18:10:15Z</updated>
   
   <summary>On Sunday 7/22/2007 ~10:30 PM - 11:00PM, patches were applied to our blackboard servers. Users may have noticed some &quot;pauses&quot; or incomplete pages during this time....</summary>
   <author>
      <name>David Freeman</name>
      
   </author>
   
   
   <content type="html" xml:lang="en-us" xml:base="http://www.umbc.edu/blogs/oit-syscore/">
      On Sunday 7/22/2007 ~10:30 PM - 11:00PM, patches were applied to 
our blackboard servers.  Users may have noticed some &quot;pauses&quot;
or incomplete pages during this time.  
      
   </content>
</entry>
<entry>
   <title>Blackboard file system repair</title>
   <link rel="alternate" type="text/html" href="http://www.umbc.edu/blogs/oit-syscore/2007/06/blackboard_file_system_repair.html" />
   <id>tag:www.umbc.edu,2007:/blogs/oit-syscore//29.4577</id>
   
   <published>2007-06-12T11:35:20Z</published>
   <updated>2007-06-12T11:46:13Z</updated>
   
   <summary>Last night, 6/12/07, we ran a repair operation on Blackboard&apos;s file system. During this time (12:15AM - 12:45AM) parts of the Blackboard system would not function correctly. Any files stored within Blackboard&apos;s file system would have been inaccessible. The repair...</summary>
   <author>
      <name>David Freeman</name>
      
   </author>
   
   
   <content type="html" xml:lang="en-us" xml:base="http://www.umbc.edu/blogs/oit-syscore/">
      Last night, 6/12/07, we ran a repair operation on Blackboard&apos;s file system.  During this time
(12:15AM - 12:45AM) parts of the Blackboard system would not function correctly.  Any files
stored within Blackboard&apos;s file system would have been inaccessible.

The repair corrected some corruption that had found its way into the file system.


                                                                     

      
   </content>
</entry>
<entry>
   <title>Blackboard hang again</title>
   <link rel="alternate" type="text/html" href="http://www.umbc.edu/blogs/oit-syscore/2007/06/blackboard_hang_again.html" />
   <id>tag:www.umbc.edu,2007:/blogs/oit-syscore//29.4554</id>
   
   <published>2007-06-04T20:02:02Z</published>
   <updated>2007-06-04T20:07:28Z</updated>
   
   <summary>Blackboard hung again at ~12:50PM on 6/4. It seems the database server got so heavily loaded that it stopped responding to Blackboard. New media was running queries at the time. I restarted the database server, and Blackboard started working again. After this new...</summary>
   <author>
      <name>David Freeman</name>
      
   </author>
   
   
   <content type="html" xml:lang="en-us" xml:base="http://www.umbc.edu/blogs/oit-syscore/">
      Blackboard hung again at ~12:50PM on 6/4.  It seems the database server
got so heavily loaded that it stopped responding to Blackboard.

New media was running queries at the time.  I restarted the database server, and Blackboard started working again.

After this, new media started the query again and Blackboard stopped working.  I then restarted just the SQL process and Blackboard started working again.
      
   </content>
</entry>
<entry>
   <title>Blackboard Hang</title>
   <link rel="alternate" type="text/html" href="http://www.umbc.edu/blogs/oit-syscore/2007/05/blackboard_hang.html" />
   <id>tag:www.umbc.edu,2007:/blogs/oit-syscore//29.4546</id>
   
   <published>2007-05-24T20:14:45Z</published>
   <updated>2007-05-24T20:55:41Z</updated>
   
   <summary>On the evening of 5/23 and 5/24 Blackboard hung from ~1:30 AM till 3:00AM. I am currently investigating the cause. It seems that during the above times I was unable to map the drive on the file server from the...</summary>
   <author>
      <name>David Freeman</name>
      
   </author>
   
   
   <content type="html" xml:lang="en-us" xml:base="http://www.umbc.edu/blogs/oit-syscore/">
      On the evening of 5/23 and 5/24  Blackboard hung from ~1:30 AM
till 3:00AM.  I am currently investigating the cause.  
It seems that during the above times I was unable to map the 
drive on the file server from the app servers.  


      
   </content>
</entry>
<entry>
   <title>Blackboard slowdown?</title>
   <link rel="alternate" type="text/html" href="http://www.umbc.edu/blogs/oit-syscore/2007/05/blackboard_slowdown.html" />
   <id>tag:www.umbc.edu,2007:/blogs/oit-syscore//29.4512</id>
   
   <published>2007-05-08T20:53:18Z</published>
   <updated>2007-05-08T21:02:18Z</updated>
   
   <summary>On 5/7 at ~1pm one of the Blackboard front end servers (app2) started reporting very slow HTTP response. I was not able to determine what was causing this but restarted Tomcat and failed all users over to the other server. After...</summary>
   <author>
      <name>David Freeman</name>
      
   </author>
   
   
   <content type="html" xml:lang="en-us" xml:base="http://www.umbc.edu/blogs/oit-syscore/">
      On 5/7 at ~1pm one of the Blackboard front end servers (app2) started reporting very slow HTTP response.  I was not able to determine what was causing this but restarted Tomcat and failed all users over to the other server.  After restarting App2, App1 started exhibiting the same behavior.
This points to a user process causing the problem.  I spoke with Bob
and he had run some large gradebook operations, but nothing else unusual.


      
   </content>
</entry>
<entry>
   <title>Blackboard partial &quot;hang&quot;</title>
   <link rel="alternate" type="text/html" href="http://www.umbc.edu/blogs/oit-syscore/2007/04/blackboard_partial_hang.html" />
   <id>tag:www.umbc.edu,2007:/blogs/oit-syscore//29.4443</id>
   
   <published>2007-04-25T18:18:22Z</published>
   <updated>2007-04-25T18:24:49Z</updated>
   
   <summary>One of our blackboard app servers hung today at ~1:12pm. Someone in the newmedia group tried to run a sort on the gradebook of a community with ~13,000 users. The server then became unresponsive. The other server did not stop...</summary>
   <author>
      <name>David Freeman</name>
      
   </author>
   
   
   <content type="html" xml:lang="en-us" xml:base="http://www.umbc.edu/blogs/oit-syscore/">
      One of our blackboard app servers hung today at ~1:12pm.  Someone in the newmedia group tried to run a sort on the gradebook of a community with ~13,000 users.  The server then became unresponsive.
The other server kept working.  I restarted the JVM and everything worked again.
      
   </content>
</entry>
<entry>
   <title>Blackboard security updates reboot</title>
   <link rel="alternate" type="text/html" href="http://www.umbc.edu/blogs/oit-syscore/2007/04/blackboard_security_updates_re.html" />
   <id>tag:www.umbc.edu,2007:/blogs/oit-syscore//29.3726</id>
   
   <published>2007-04-19T05:40:52Z</published>
   <updated>2007-04-19T05:43:43Z</updated>
   
   <summary>The Blackboard backend servers were rebooted on 4/19 at 12:30 AM to finish the installation of OS security patches. This may have caused a pause of up to 2 minutes for some users....</summary>
   <author>
      <name>David Freeman</name>
      
   </author>
   
   
   <content type="html" xml:lang="en-us" xml:base="http://www.umbc.edu/blogs/oit-syscore/">
      The Blackboard backend servers were rebooted on 4/19 at 12:30 AM to finish the installation
of OS security patches.  This may have caused a pause of up to 2 minutes for some users.


      
   </content>
</entry>
<entry>
   <title>More Blackboard slow downs</title>
   <link rel="alternate" type="text/html" href="http://www.umbc.edu/blogs/oit-syscore/2007/04/more_blackboard_slow_downs.html" />
   <id>tag:www.umbc.edu,2007:/blogs/oit-syscore//29.3677</id>
   
   <published>2007-04-12T17:26:50Z</published>
   <updated>2007-04-12T17:35:10Z</updated>
   
   <summary>I noticed the same &quot;hanging&quot; problem on Blackboard at ~12:20 on 4/12. I waited on a page load for about 2 minutes. I checked the database and the same process was active that was causing the problems before. My guess is...</summary>
   <author>
      <name>David Freeman</name>
      
   </author>
   
   
   <content type="html" xml:lang="en-us" xml:base="http://www.umbc.edu/blogs/oit-syscore/">
      I noticed the same &quot;hanging&quot; problem on Blackboard at ~12:20 on 4/12.

I waited on a page load for about 2 minutes.  I checked the database 
and the same process was active that was causing the problems before.
My guess is that the query is locking tables that do not need to be locked.
There are a large number of blocked processes during the slowdown.


      
   </content>
</entry>
<entry>
   <title>Blackboard slowdown 4/12</title>
   <link rel="alternate" type="text/html" href="http://www.umbc.edu/blogs/oit-syscore/2007/04/blackboard_slowdown_412.html" />
   <id>tag:www.umbc.edu,2007:/blogs/oit-syscore//29.3676</id>
   
   <published>2007-04-12T16:22:16Z</published>
   <updated>2007-04-12T16:39:33Z</updated>
   
   <summary>At ~11:15 on 4/12/07 Blackboard slowed down to the point where we were receiving calls about it. We determined that someone in the new media group was running some very intensive database queries. As soon as these were killed blackboard...</summary>
   <author>
      <name>David Freeman</name>
      
   </author>
   
   
   <content type="html" xml:lang="en-us" xml:base="http://www.umbc.edu/blogs/oit-syscore/">
      At ~11:15 on 4/12/07 Blackboard slowed down to the point where we were receiving calls about it.  We determined that someone in the new media 
group was running some very intensive database queries.  As soon as 
these were killed blackboard returned to normal operation.  


      
   </content>
</entry>
<entry>
   <title>Blackboard</title>
   <link rel="alternate" type="text/html" href="http://www.umbc.edu/blogs/oit-syscore/2007/04/blackboard.html" />
   <id>tag:www.umbc.edu,2007:/blogs/oit-syscore//29.3673</id>
   
   <published>2007-04-11T20:20:43Z</published>
   <updated>2007-04-11T20:23:24Z</updated>
   
   <summary>There was a possible pause on the collaboration service today 4/11 at ~2:40PM. I do not know if this affected anyone. The service was restarted due to &quot;human error&quot;...</summary>
   <author>
      <name>David Freeman</name>
      
   </author>
   
   
   <content type="html" xml:lang="en-us" xml:base="http://www.umbc.edu/blogs/oit-syscore/">
      There was a possible pause on the collaboration service today 4/11 at ~2:40PM.  I do not know if this affected anyone.
The service was restarted due to &quot;human error&quot;.

      
   </content>
</entry>
<entry>
   <title>AFS security change on core-managed systems</title>
   <link rel="alternate" type="text/html" href="http://www.umbc.edu/blogs/oit-syscore/2007/04/afs_security_change_on_coreman.html" />
   <id>tag:www.umbc.edu,2007:/blogs/oit-syscore//29.3672</id>
   
   <published>2007-04-11T16:43:38Z</published>
   <updated>2007-04-11T16:45:54Z</updated>
   
   <summary>As per OpenAFS Security Advisory 2007-01, setuid status has been disabled on all of the core-managed servers and workstations. This has been done via a cfengine change, and the introduction of an init script which will disable setuid status...</summary>
   <author>
      <name>Rob Banz</name>
      
   </author>
         <category term="Software" scheme="http://www.sixapart.com/ns/types#category" />
   
   
   <content type="html" xml:lang="en-us" xml:base="http://www.umbc.edu/blogs/oit-syscore/">
      <![CDATA[As per <A HREF="http://www.openafs.org/security/OPENAFS-SA-2007-001.txt">OpenAFS Security Advisory 2007-01</A>, setuid status has been disabled on all of the core-managed servers and workstations.  This has been done via a cfengine change, and the introduction of an init script which will disable setuid status on bootup.]]>
      
   </content>
</entry>
<entry>
   <title>Blackboard pauses at 1:30 - 2:00 AM</title>
   <link rel="alternate" type="text/html" href="http://www.umbc.edu/blogs/oit-syscore/2007/04/blackboard_pauses_at_130_200_a.html" />
   <id>tag:www.umbc.edu,2007:/blogs/oit-syscore//29.3628</id>
   
   <published>2007-04-04T06:33:47Z</published>
   <updated>2007-04-04T06:37:21Z</updated>
   
   <summary>Blackboard had two ~3 minute pauses at about 1:50 AM this morning. This was necessary in order to apply patches to the back end servers....</summary>
   <author>
      <name>David Freeman</name>
      
   </author>
   
   
   <content type="html" xml:lang="en-us" xml:base="http://www.umbc.edu/blogs/oit-syscore/">
      Blackboard had two ~3 minute pauses at about 1:50 AM this morning.  This was necessary
in order to apply patches to the back end servers.  
      
   </content>
</entry>
<entry>
   <title>New backup server to begin testing</title>
   <link rel="alternate" type="text/html" href="http://www.umbc.edu/blogs/oit-syscore/2007/04/new_backup_server_to_begin_tes.html" />
   <id>tag:www.umbc.edu,2007:/blogs/oit-syscore//29.3627</id>
   
   <published>2007-04-03T23:32:10Z</published>
   <updated>2007-04-03T23:48:03Z</updated>
   
   <summary>Our new Legato Networker server and LTO-3 tape library have been set up and have begun early testing! The new server specs are: one Sun T1000 with 4GB of RAM and a fibre channel interface to connect it to four...</summary>
   <author>
      <name>Dale Ghent</name>
      
   </author>
   
   
   <content type="html" xml:lang="en-us" xml:base="http://www.umbc.edu/blogs/oit-syscore/">
      Our new Legato Networker server and LTO-3 tape library have been set up and have begun early testing!

The new server specs are: one Sun T1000 with 4GB of RAM and a fibre channel interface to connect it to four LTO-3 drives in a new Qualstar TLS-88264 tape library. This tape library has 131 data tapes in it with 400GB of native capacity each. In the future, this tape library can be expanded to a total of 264 tapes and eight LTO-3 or LTO-4 drives.

This new system will replace two existing, administratively separate backup systems and AIT-based libraries operated by BSG and OIT.
      
   </content>
</entry>

</feed>
