May 2005 Archives

May 2, 2005

bb restart

The Blackboard app server was restarted at 12:14 AM (Monday) to fix the video driver.

May 10, 2005

Legato Client stuff

After installing the Legato backup client on two machines (syscoredb and jumpcore) this morning (and wondering why this hadn't been done already), I copied the Legato packages and install/config scripts out of Tim's work directory to /afs/umbc.edu/depts/oit/systems/legato.

May 11, 2005

Another bb restart

Both production Blackboard servers were restarted to clear up the Java memory leak. This will hopefully keep us from crashing before the semester ends.

New MySQL server backup regimen

The new MySQL server now does nightly dumps of its databases at 3:30 AM. It creates a full dump and an individual dump of each database in a backup directory, organized by date. This will make it easy to reconstitute the entire database server, or individual databases, in the event that needs to be done.
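
As a rough illustration of that layout (not the actual script in use), here is a minimal Python sketch that plans one full dump plus one dump per database under a dated directory. The backup root, the database names, and the file names are all hypothetical; only the mysqldump options --all-databases and --databases are real, and credentials are left out entirely.

```python
from datetime import date
from pathlib import Path


def plan_dumps(backup_root, databases, day):
    """Return (command, output_path) pairs for one night's dumps.

    The directory layout and file names are assumptions for
    illustration; the real script may organize things differently.
    """
    day_dir = Path(backup_root) / day.isoformat()
    # One full dump of everything on the server.
    plan = [(["mysqldump", "--all-databases"], day_dir / "full.sql")]
    # Plus one dump per database, so a single one can be restored alone.
    for name in databases:
        plan.append((["mysqldump", "--databases", name], day_dir / f"{name}.sql"))
    return plan


if __name__ == "__main__":
    # Hypothetical root and database names.
    for cmd, out in plan_dumps("/backup/mysql", ["wiki", "blog"], date.today()):
        print(" ".join(cmd), ">", out)
```

Each planned command could then be run with its output redirected to the listed path, after creating the dated directory.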

Titan.umbc.edu crash

Titan.umbc.edu went down this morning. One of its CPU boards completely failed and has been removed from service. Titan is only a 20 processor machine now... This makes us very, very sad; it's so hard to hold back the tears.

May 12, 2005

hfs10 on-line

We've brought hfs10, one of the new Sun V20z fileservers, on-line and are starting to migrate users to it for some good old-fashioned live load testing. Perhaps you're lucky enough to be one of them...

May 18, 2005

Fixing "broken" AFS backup volumes

This entry has been moved.

May 19, 2005

hfs11 online

We've brought the second of our Sun V20z/Xraid-backed fileservers online. The first, hfs10, has been performing quite well -- so we figured we'd fire up the second.

Update: Just a reminder -- *someone* configured an alternate interface on this server (an internal, 192.* address). This causes the AFS fileserver to register that address as a valid address in the VLDB. To fix it, an entry had to be added to the server's /usr/afs/local/NetRestrict file, and the server restarted.
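
For the curious, a small Python sketch of how one might pick out which of a server's addresses belong in NetRestrict. It assumes the file takes one dotted-decimal address per line and that "internal" means a private-range address; the addresses in the example are made up.

```python
import ipaddress


def netrestrict_lines(interface_addrs):
    """Return the addresses that should not be registered in the VLDB.

    Assumption for this sketch: any private-range address is internal
    and belongs in NetRestrict, one address per line.
    """
    return [a for a in interface_addrs if ipaddress.ip_address(a).is_private]


if __name__ == "__main__":
    # Hypothetical public and internal addresses for one fileserver.
    print("\n".join(netrestrict_lines(["130.85.1.10", "192.168.5.11"])))
```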

May 20, 2005

Real Live Account Deletion...

We've started a process to purge "really old" accounts from our system; that is, we're clearing out the files of accounts that have been deactivated for a really long time...

May 23, 2005

hfs11.afs memory problems(?)

hfs11 began crashing periodically at 5:30 PM on Sunday, May 22, seemingly due to some memory problems -- at least that's what the logs seem to suggest. Of course, it had to wait until it was mostly full of users (including the CIO), because that's just the way these things go.

The memory was replaced around 9am. We'll see how she works now.

Update: Sun concurred that it was the memory.

May 25, 2005

ftp1 problems.

FTP1's system disk failed yesterday evening. It came up fine after a power cycle, but the disk should be replaced very soon.

Mail delivery delays

Between May 20th and May 23rd, we experienced some mail delivery delays for two unrelated reasons.

First, mx5in seemed to have lost its time sync. Since everything around here works via Kerberos, and Kerberos requires a relatively synchronized clock to do its thing, mx5in wasn't able to get privileges to deliver mail into folks' accounts, and for a period over the weekend it was queuing mail instead.

The second reason is a bit stranger: it appears that the AFS client software on a couple of the mail delivery boxes didn't refresh its volume location information. As we've been doing quite a few volume moves, it seems that for a small subset (probably fewer than 100) of user volumes, these servers lost track of where they were. We noticed this Monday morning and forcibly refreshed the volume location information on all of our servers, which kicked loose the mail that was pending delivery.
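
On the first cause (the clock drift): Kerberos rejects requests when the client and server clocks disagree by more than an allowed skew. The check boils down to something like the Python sketch below. The five-minute tolerance is the common Kerberos default and is assumed here, not taken from our configuration; the function name is made up.

```python
from datetime import datetime, timedelta, timezone

# Common Kerberos default tolerance; assumed, not read from our config.
MAX_SKEW = timedelta(minutes=5)


def within_skew(local, reference, max_skew=MAX_SKEW):
    """True if the local clock is close enough to the reference clock."""
    return abs(local - reference) <= max_skew


if __name__ == "__main__":
    reference = datetime.now(timezone.utc)
    drifted = reference - timedelta(minutes=9)  # a host that lost time sync
    print(within_skew(drifted, reference))
```

A host that drifts past the tolerance can't get tickets, so anything that depends on them (like delivering into AFS home directories) stalls until the clock is fixed.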

Towards a more balanced life

We periodically run a process which examines the space usage on our AFS home directory servers, and moves user volumes from one server to another in an attempt to balance the usage. However, just balancing on usage isn't enough. Recently, we've made some changes to the process to take into account other factors, such as each volume's average activity, so that load is spread more evenly across the servers.
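
To give a flavor of the idea (this is not our actual process), here is a minimal greedy sketch in Python. Each volume gets a score combining its size and its activity, and volumes are moved from the most loaded server to the least loaded one as long as a move narrows the gap. The Volume fields, the weighting, and the greedy strategy are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    name: str
    size_gb: float
    accesses_per_day: float


def score(vol, activity_weight):
    """Combine space and activity into one load number (assumed formula)."""
    return vol.size_gb + activity_weight * vol.accesses_per_day


def plan_moves(servers, activity_weight=0.01, max_moves=10):
    """Greedily plan (volume, source, destination) moves to even out load."""
    layout = {s: list(vols) for s, vols in servers.items()}
    moves = []
    for _ in range(max_moves):
        load = {s: sum(score(v, activity_weight) for v in vols)
                for s, vols in layout.items()}
        src = max(load, key=load.get)
        dst = min(load, key=load.get)
        gap = load[src] - load[dst]
        # Only a volume scoring less than the gap makes things more even.
        candidates = [v for v in layout[src]
                      if 0 < score(v, activity_weight) < gap]
        if not candidates:
            break
        # Prefer the volume that comes closest to closing half the gap.
        vol = min(candidates,
                  key=lambda v: abs(gap / 2 - score(v, activity_weight)))
        layout[src].remove(vol)
        layout[dst].append(vol)
        moves.append((vol.name, src, dst))
    return moves
```

With activity_weight set to 0 this degenerates to balancing on space alone, which is the behavior the changes were meant to improve on.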

Mail Transport: mxout

This entry has been moved.

NIS/YP Maps

This page has been moved.

Mail problems and other interests

Today we experienced a mail delivery problem that has gotten us before. Basically, a machine outside our purview had been sitting with a metric ****ton of queued messages for one person on our system. The administrator of this machine "fixed" the problem, and its mail server happily began to deliver these into our system. Now, the MTA will accept bunches of messages for someone and fork off MDA processes (procmail) to deliver to the local addresses; each MDA will wait for a lockfile, deliver the message, clear the lockfile, etc. Of course, if you've got a TON of messages being delivered, there are probably a corresponding TON of procmail processes waiting for their lockfiles... After a while, things begin to break down, as all of the available sendmail child processes are waiting for their respective MDAs to deliver messages to this one address... WAIT, what's that locking thing???
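
As a simplified picture of that locking thing, here is a Python sketch of lockfile-style mutual exclusion: whoever manages to create the lockfile exclusively gets to deliver, and everyone else has to wait and retry. This is an illustration of the general technique only; procmail's real locking differs in its details, and the function names are made up.

```python
import os


def try_lock(lock_path):
    """Try to take the lock by creating the lockfile exclusively.

    Returns True if we created the file (we hold the lock), or False
    if it already exists (someone else is delivering).
    """
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def unlock(lock_path):
    """Release the lock by removing the lockfile."""
    os.unlink(lock_path)
```

Since only one process at a time can hold the lock for a given mailbox, a flood of messages for one address turns into a long line of delivery processes, each waiting its turn.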

May 27, 2005

hfs11 problems (continued)

hfs11's system was replaced last night (5/26) with that of hfs12. If it keeps crashing now, something is really wrong.

May 31, 2005

Continuing Saga of Ftp1

ftp1.umbc.edu's disk finally died today. Died completely; it wouldn't even show up on a SCSI bus.

AFS Server History

This entry has been moved.

About May 2005

This page contains all entries posted to OIT SysCore in May 2005. They are listed from oldest to newest.
