We had an outage of hfs10, hfs11, and hfs12 during the morning of 10/25. All three fileservers hit the "thread starvation" problem, which causes every client accessing them to "hang". We identified the problem quickly and forcibly restarted the fileserver processes. Unfortunately, a forced restart means the salvager has to run on all of the volumes, which took 30 to 45 minutes per server. Service to the affected volumes was restored by 12:45pm.
All three servers were running OpenAFS 1.4.0-rc4. Two had been running since 11/17, and the third was last restarted on 11/9. Because servers with different uptimes failed at the same time, this rules out the "I've been running for this long and now I'm going to die" theory.
The rxdebug output from one of the servers while it was in its "hung" state was uninteresting. However, I've forwarded the output to the OpenAFS developers list to see whether anyone there can make sense of it.
Since waiting for the salvager to finish was otherwise unproductive time, we used it to upgrade the fileserver and volserver binaries on these machines to 1.4.0-rc8.