April 2005 Archives

April 12, 2005

bb6.umbc.edu behavior change

BB6.umbc.edu now points to the virthost webserver, which generates a redirect to bb-app6.umbc.edu. This replaces its previous behaviour, as browsers have problems setting cookies...

Mail delivery change

The mail delivery system has been altered to "bounce" messages sent to accounts that are inactive, deactivated, or scheduled to be deleted. This change has been made both because users requested it and because we want to stop filling up our disks with spam that will never be read.

Continue reading "Mail delivery change" »

April 13, 2005

Mail Delivery Change, revisited

While examining the impact of the mail delivery changes made yesterday, it became clear that we could do less work. Because the determination of an account's status was being made by the MDA at delivery time, a bounce message had to be generated and sent by our own mail servers. This determination is better made at the MTA level, where the account's state can be determined at address resolution time and returned as an error to the sending MTA during the SMTP transaction, leaving that MTA with the job of notifying the sending user...

Continue reading "Mail Delivery Change, revisited" »
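To illustrate the difference, here is a minimal Python sketch of the MTA-side idea: decide at RCPT TO time and let the sending MTA produce the bounce. It is not our actual MTA configuration. The state names, the `rcpt_reply` function, the unknown-user case, and the reply texts are all assumptions made for the example; only the 550 reply class and the RFC 3463 status codes are standard.

```python
# Hypothetical account states. The post names inactive, deactivated, and
# scheduled-for-deletion accounts, but not how those states are stored.
DISABLED_STATES = {"inactive", "deactivated", "deleted"}


def rcpt_reply(local_part, account_states):
    """Return the SMTP reply an MTA might give to RCPT TO for this user."""
    state = account_states.get(local_part)
    if state is None:
        # Unknown user (not covered in the post, included for completeness).
        return "550 5.1.1 No such user"
    if state in DISABLED_STATES:
        # Rejecting inside the SMTP transaction means we never accept the
        # message, so the sending MTA has to notify the sender.
        return "550 5.2.1 Mailbox disabled, not accepting messages"
    return "250 2.1.5 Recipient OK"


accounts = {"alice": "active", "bob": "deactivated"}
for user in ("alice", "bob", "carol"):
    print(user, "->", rcpt_reply(user, accounts))
```

With the MDA approach, the message for "bob" would have been accepted first and a bounce generated afterwards by our servers; here nothing is ever queued locally.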

AFS Fiber Channel Network Test Plan

As implementation of UMBC's AFS home directory fiber channel network carries on, I have written a test plan to pursue once all the hardware is in place.

AFS Fiber Channel Network Test Plan

April 14, 2005

Log Administrivia

We've enabled "TypeKey" authentication for comments on this site. TypeKey-authenticated users will have their comments auto-approved. Discussion is a good thing.

We'd like to webauth-enable the comments portion of MovableType, but it might take some work. One thing at a time :)

April 15, 2005

First PRODUCTION Solaris 10 x86 box

Our first production Solaris x86 server is online, and it's not a fileserver as we'd originally planned.

"mr4.umbc.edu", a 2x Xeon Dell 2650 with 3G of RAM, was jumpstarted to Solaris 10 yesterday afternoon, and is currently serving over 400 users as part of the imap/pop service cluster. The other machines in the cluster (mr5 - 8), which have similar hardware, are currently running Linux.

So, if you're reading mail right now, there's a 1/5 chance you're using it...

Continue reading "First PRODUCTION Solaris 10 x86 box" »

April 18, 2005

userpages.umbc.edu disabled account handling

The content for disabled accounts will no longer display on userpages.umbc.edu. This means that if your account is disabled, so is your free web hosting.

A very simple mod_perl module does this:

package UMBC::HomePageAccess;

use strict;
use Apache ();
use Apache::Constants qw(:common);

# Refuse to serve /~username/ pages when the account is disabled.
sub handler {
    my $r   = shift;
    my $uri = $r->uri;

    # Match user directory requests such as /~username/index.html
    if ( $uri =~ m{^/~([^/]+)/} ) {
        # The second field returned by getpwnam is the password entry
        my $passwd = ( getpwnam($1) )[1];

        # Disabled accounts carry a marker in place of a password hash
        if ( defined $passwd
            && $passwd =~ /^\*(DEACTIVATED|INACTIVE|DELETED)\*$/ ) {
            $r->custom_response(NOT_FOUND, "/error_disabled.html");
            return NOT_FOUND;
        }
    }

    # Everything else is left to the normal handlers
    return DECLINED;
}

1;
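For anyone who wants to poke at the matching logic without an Apache handy, here is the same check sketched in Python. This is only an illustration of the two patterns in the module above; the `is_blocked` name and the `fake_passwd` table are made up for the example, and the lookup is passed in rather than calling the real getpwnam.

```python
import re

# Same two patterns as the mod_perl module above.
USER_DIR = re.compile(r"^/~([^/]+)/")
DISABLED = re.compile(r"^\*(DEACTIVATED|INACTIVE|DELETED)\*$")


def is_blocked(uri, passwd_lookup):
    """Return True when uri is a user page whose account is marked disabled.

    passwd_lookup stands in for getpwnam: it maps a username to the
    password field, or returns None for unknown users.
    """
    match = USER_DIR.match(uri)
    if not match:
        return False
    passwd = passwd_lookup(match.group(1))
    return passwd is not None and DISABLED.match(passwd) is not None


# Hypothetical password fields, for illustration only.
fake_passwd = {"alice": "x", "bob": "*DEACTIVATED*"}
for uri in ("/~alice/index.html", "/~bob/index.html", "/~bob", "/index.html"):
    print(uri, is_blocked(uri, fake_passwd.get))
```

Note that the pattern requires a slash after the username, so a bare "/~bob" is not matched by this check.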

April 19, 2005

Using WebAuth in the central UMBC Web Environment

This is a brief overview of how to WebAuth-enable, or protect, applications running in the central UMBC Web environment. WebAuth is available as an authentication method in the Apache webserver that runs the central UMBC web service, including most virtual-hosted sites. Using WebAuth in your applications only requires editing a ".htaccess" file. Detailed syntax and options are covered in the Apache 1.3 documentation.

Continue reading "Using WebAuth in the central UMBC Web Environment" »

Blackboard reboot

The Blackboard servers will be rebooted late tonight 4/20 to apply security updates. Folks who are active at the time of the reboot could notice a pause in service of up to 2 minutes. Most folks will not notice anything.

KVM switches down

The KVM switches in racks A3, C1, E3, and E4 are currently not functioning. This mostly affects Windows servers. There are problems with the pins in some of the cables. New cables have been ordered and the firmware on the master switch has been upgraded. The cables should arrive by the end of this week.

April 20, 2005

Dell Remote Management

Today I spent some time playing with/learning about the Dell Remote Access Controller that our 2650s, and other similar boxen, are configured with.

The DRAC provides functionality for console redirection, server management (power cycling!), hardware status monitoring, and other things. The web interface is slow and horrible; however, there IS a nice command-line tool for both Linux & Windoze to query, configure, and access the remote management functionality.

Continue reading "Dell Remote Management" »

Live server monitoring roll out

As a side project to the new AFS server upgrade project, I have implemented an SNMP-based server monitoring site using the freely available, PHP-based Cacti.

Right now, statistics for each server are basic, covering CPU usage, load average, disk space consumption, and network interface traffic. In time, I will develop SNMP agents and queries that dig deeper into what's happening on our servers, such as process counts (e.g., the number of httpd processes on web servers, imapd/pop3d processes on the mail readers, and so on). I also plan to monitor all ports on our fiber channel switches for traffic and errors.
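As a rough idea of what a process-count query could look like, here is a small Python sketch that counts processes by command name from `ps -e -o comm=` style output. It is a guess at the shape of such a check, not the agent we will deploy; the `count_processes` name and the sample listing are invented for the example.

```python
from collections import Counter


def count_processes(ps_output, names):
    """Count processes per command name in `ps -e -o comm=` style output."""
    counts = Counter(
        line.strip() for line in ps_output.splitlines() if line.strip()
    )
    # Report a zero for names that are not running at all.
    return {name: counts.get(name, 0) for name in names}


# Made-up process listing, one command name per line.
sample = "init\nhttpd\nhttpd\nimapd\nsshd\nhttpd\n"
print(count_processes(sample, ["httpd", "imapd", "pop3d"]))
```

A script along these lines could feed its numbers to the SNMP agent so that Cacti graphs them alongside the existing CPU and disk statistics.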

Since every server will be running NET-SNMP, we can also utilize this as a (potential) new way to monitor servers using SNMP traps. This, however, is another project for another day.

The ultimate goal, though, is to use this as a capacity planning tool.

You can view the statistics we are collecting by clicking here.

April 21, 2005

mr4 outage

Mr4 was down overnight due to an apparent disk problem:

Apr 19 17:07:52 mr4.umbc.edu cadp160: WARNING: Timeout on target 0 lun 0. Initiating recovery.

The disk has been replaced. Users should not have noticed any downtime.

April 23, 2005

Bad Saturday.

There were some system problems (some noticeable, and some not) on Saturday afternoon and into the night affecting some of the core infrastructure. Later in the evening (around 9:30pm) they got a bit worse, and at that time most of the AFS fileservers (home space, data space, you name it) had their fileserver processes restarted. A few less cooperative client machines (most of the imap/pop servers, mail delivery systems, irix2 and linux2) were hard rebooted or power-cycled at around 10:45, and most everything was back in service by 11pm ('cept hfs5, read on).

Continue reading "Bad Saturday." »

April 25, 2005

Last week's myUMBC woes

Here it is! A slightly-belated, much anticipated explanation of myUMBC's issues last week. Short story: myUMBC didn't exactly "break" per se, but it was acting a little flaky due to a couple of separate, unrelated circumstances. The symptoms were: sporadic session timeouts, where users would click on a link and get reprompted for username/password, even if they had not been idle; and also on Friday, there was some weirdness with registration, where myUMBC was not allowing students to register even though their appointments had already passed, making them allegedly eligible.

Both of these issues were caused by time inconsistencies on separate back-end systems. The first issue was due to the Oracle database server time being off by 1 hour (not accounting for daylight saving time). The second was the fault of the HP 3000 mainframe, our back-end system of record for all student data. The time on this server was off by 22 minutes, causing the HP to report incorrect student registration appointment times.
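Here is a small Python sketch of how clock skew can produce both symptoms. It is an illustration only: the 30-minute idle limit, the function names, the dates, and the assumption that both clocks were running behind are mine, and the post does not describe how myUMBC actually tracks sessions or appointments.

```python
from datetime import datetime, timedelta

IDLE_LIMIT = timedelta(minutes=30)  # hypothetical idle timeout


def session_expired(last_activity, now):
    """True when the session has been idle longer than IDLE_LIMIT."""
    return now - last_activity > IDLE_LIMIT


def may_register(appointment, now):
    """True once the student's registration appointment has passed."""
    return now >= appointment


real_now = datetime(2005, 4, 22, 10, 5)  # illustrative date and time

# Incident 1: activity stamped by a clock that is 1 hour behind, then
# checked against the correct time. The user looks idle for an hour.
stamped = real_now - timedelta(hours=1)
print(session_expired(stamped, real_now))

# Incident 2: an appointment at 10:00 checked by a clock 22 minutes behind.
appointment = datetime(2005, 4, 22, 10, 0)
print(may_register(appointment, real_now))
print(may_register(appointment, real_now - timedelta(minutes=22)))
```

The first check reports an expired session even though the user was just active, and the last one refuses registration five minutes after the appointment has passed.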

There was no correlation between these two separate incidents. Both should now be resolved. I'm writing it off to those wacky leprechauns who live under the floor in the computer room.

Some users also noticed flakiness with the Degree Navigation application last week. The app was giving sporadic "internal server error" pages. This was related to incident #1 (the time discrepancy on the Oracle database server). The Degree Navigation code had a bug which caused it to barf out when it encountered a session timeout condition. This has now been fixed as well.

Hope that all made sense.. It's Monday and I'm still half awake!

bfs4 unintended downtime

bfs4, a fileserver that primarily serves web resources and the like, was unintentionally "paused" during the console server work yesterday. Once this was noticed, it was "unpaused", and everything seemed to recover fine. This outage affected parts of the MyUMBC environment, including webadmin and class schedules.

Continue reading "bfs4 unintended downtime" »

console server aborted upgrade

Today we tried to upgrade our serial console server from the home-built rack mount machine running RedHat 6.1 that we put together something like 5 years ago to a reliable Dell 2450 we had around (old, but reliable) running RH Enterprise. This was "aborted", well, because it didn't really work out.

As it turns out, our Cyclades Y-series cards seemed to have an issue running in the 2450. We have two different "styles" of card, you see: the old-style ones are full-height PCI cards, and the newer ones are a bit smaller. While they are supposed to be the same card, the "large" cards do not seem to work well in the presence of the "small" ones in the 2450. We ended up going back to the old setup, and will see what Cyclades has to say about it.

April 26, 2005

Who ever thought NIS would be resource intensive?

And who ever thought that some stray ypserv processes would eat your AFS cell for lunch?

We experienced some problems with our AFS cell today, not too dissimilar to those we experienced on Saturday night. ...

Continue reading "Who ever thought NIS would be resource intensive?" »

April 27, 2005

Blackboard apppak3

Weblock is now running Blackboard 6.3.something. It was also taken apart, blown out, and had a magnet run across its motherboard. This was an attempt to remove the dirt and metal filings inside the server (from the contractors that put our trays in). Hopefully it will stop crashing now.

backed up mail

Email that passed through mx9in between the system problems yesterday and this morning has been held up in its local queue. It's clearing out now (it has a half-gig of mail to deliver!), so folks may see some delayed mail coming in from yesterday. All of the other mail delivery servers survived just fine -- there always has to be one...

rack outage

We lost power to a rack this evening at around 5pm; this affected users on hfs9.afs.umbc.edu, and also took down solaris1.gl.

Power was back on quickly afterwards; however, hfs9 didn't finish "salvaging" its filesystems until after 6.

About April 2005

This page contains all entries posted to OIT SysCore in April 2005. They are listed from oldest to newest.
