And so we begin: The AFS server overhaul.

For the past three to four years or so, the AFS servers which house and serve all UMBC users' UNIX home directories, email, and personal web pages have been operating on a mix of Sun UltraSPARC and x86-based hardware platforms running Solaris 8 and Linux. Each server has its own direct-attached SCSI-based storage array, each containing 8 drives of various sizes. Host-based software RAID has kept drive failures from becoming a visible issue to the users whose home volumes are stored on a particular server.

Four years is a fairly respectable period of time for a service to be running non-stop like that. Naturally, though, technology and physical wear and tear progress to the point where it becomes advantageous to seek out an upgrade before the old gets too old and catastrophic failure becomes a distinct possibility and resident boogeyman.

Over the past several months, Rob and I have been scouting for such an upgrade. We've spent time eyeing both existing and emerging technologies and products with the purpose of building the most cost-effective, scalable and fault-tolerant AFS server network we can. We have settled on the following:

  • Sun V20z AMD Opteron-based servers
  • Qlogic fiber channel network (switches and host cards)
  • Apple Xserve RAID arrays

Some detail on these three items:

Sun V20z server
It's no secret that both Rob and I have a strong Sun/Solaris-based background. Over the years, we have both come to appreciate the many positive aspects and properties of Sun's SPARC-based hardware and the Solaris OS. It was a small surprise, though, when Sun released their first modern x86-based servers in early 2004, especially given Sun's tenuous relationship with Microsoft and Red Hat (whose OSes these servers are certified to run) and the fact that Sun very nearly killed their own Solaris x86 OS product in 2003. But as 2004 barreled on, we began to see Sun's hand unfold - they really had something big and good planned. This was the release of Solaris 10.

Big deal, right? Solaris 10 would probably be another ho-hum rev to an OS with shrinking market share, one could (and many did) figure. Well, it turned out not to be that way... the folks in Mountain View and Cambridge were busier than we imagined, turning out tools such as DTrace and a completely redesigned and optimized network stack, among other large and complicated things. No small feat. So then it was obvious: Sun sent Solaris to the drydock for a complete overhaul, and then they were going to mate it up with the fastest x86 hardware out there - AMD's 64-bit Opteron CPU. Brilliant.

Naturally, this made Rob and me quite giddy. Reviews of both Solaris 10 and the V20z server started to come in from respectable sources and the combo was declared A Very Good Thing. We ordered three, and they arrived today. They will be fitted with Qlogic FC cards and dual-homed to the FC network.

Apple's Xserve RAID (Xraid) arrays
Just like Sun did with Opteron and Solaris 10, our friends at Apple also made a surprising yet welcome move a few years back with the release of their Xraid hardware RAID array. Internally, this machine has matured over time and can very well be considered the best value in the industry when it comes to price/performance, topped off with Apple's renowned quality.

Dual RAID controllers, one gigabyte of cache (512MB per RAID controller), each internal ATA hard drive on its own bus controller, and 2Gb fiber channel to front-end all of that to PTP or SAN fiber channel. The two we have ordered put 2.0TB of storage in only 6 total rack units of space (one rack unit is 1.75 inches). Amazing.

Fiber for the diet
Tying all of the above gear together will be a fiber channel SAN. It will consist of two Qlogic SANbox 5200 switches, both linked together. Our main AFS servers (the Sun V20z's) will have one connection to each switch. The Apple Xraids will also have one connection to each switch. This will provide us with two things:

  • Redundancy on the switch level. If one switch were to fail or lose power, the Xraids and V20z's could still talk with each other via the second switch.
  • Redundancy on the fiber level, between both switches and dual-homed hosts. Fiber cables, as wonderful as they are, are still fragile things. They fail after a while or because of an accident. This design will mitigate that.
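
For the curious, the two redundancy claims above can be checked with a small model. What follows is only a rough sketch in Python, not anything we actually run: the host, array and switch names are made up for illustration, and the model treats a switch as a plain pass-through, ignoring zoning and host-side multipathing. It builds the planned links (three V20z's and two Xraids, each cabled to both switches, plus the link between the switches) and then confirms that every server can still reach every array after any single switch or any single cable is removed.

```python
from itertools import product

# Hypothetical names for illustration only; the counts follow the post
# (three V20z servers, two Xraid arrays, two SANbox switches).
SERVERS = ["v20z-1", "v20z-2", "v20z-3"]
ARRAYS = ["xraid-1", "xraid-2"]
SWITCHES = ["sanbox-a", "sanbox-b"]


def build_links():
    """Return the set of cables: every server and array to both switches,
    plus one inter-switch link."""
    links = {frozenset(pair) for pair in product(SERVERS + ARRAYS, SWITCHES)}
    links.add(frozenset(SWITCHES))
    return links


def reachable(start, links, dead_nodes=frozenset()):
    """Depth-first search over the cables, skipping any failed devices."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for link in links:
            if node in link:
                (other,) = link - {node}
                if other not in seen and other not in dead_nodes:
                    seen.add(other)
                    stack.append(other)
    return seen


def all_servers_see_all_arrays(links, dead_nodes=frozenset()):
    """True if every server still has some path to every array."""
    return all(
        set(ARRAYS) <= reachable(server, links, dead_nodes)
        for server in SERVERS
    )


def survives_any_single_failure():
    """Try each single switch failure and each single cable failure."""
    links = build_links()
    switch_ok = all(
        all_servers_see_all_arrays(links, frozenset({switch}))
        for switch in SWITCHES
    )
    cable_ok = all(
        all_servers_see_all_arrays(links - {cable}) for cable in links
    )
    return switch_ok and cable_ok


if __name__ == "__main__":
    print("Survives any single failure:", survives_any_single_failure())
```

The same helper functions can be pointed at a modified link set, for example one with a host cabled to only one switch, to see which single failures would then cut that host off.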

Eventually, the fiber channel network will be expanded beyond the ECS building. Over time, a partial mirror of the data center in ECS will be built in the Public Policy building. This fiber channel network design will accommodate that nicely.

--

So there you have it. That's our plan to make sure you continue to have a reliable place to house your email, web pages, and home directories. Any questions?

About

This page contains a single entry from the blog posted on March 23, 2005 8:25 PM.