Long Running Processes and AFS
From Syscore
As explained in the AFS documentation, and as many of you have probably found out from experience, AFS uses "tokens" to authenticate you to the filesystems. These tokens are associated with a "process authentication group" (PAG), and have a limited lifetime (usually 25 hours).
Since most background jobs take less than 25 hours, the token lifetime isn't usually the biggest problem. Over the past few weeks, however, it has come to light that the allocation of PAGs is becoming a problem on our multi-user systems. A typical UCE multi-user system handles over 5000 logins a day. Each login allocates a PAG which, unless specifically destroyed, exists in kernel memory for "at least" the lifetime of the tokens that correspond to it, typically 25 hours. We've noticed lately that after a machine has been up for quite a long time, allocating a PAG becomes unbearably slow.
How does this relate to PAGs?
In our first semester of using AFS in production, we noticed a similar problem on our IMAP/POP server. This machine handles almost THIRTY THOUSAND logins a day (each POP access or IMAP connection is considered a login and allocates a PAG). We found that after the machine had been up for a certain period of time, around 20 hours, things would begin to get really slow, to the point that we couldn't even log in to the machine to fix it. I added a function to each POP and IMAP process that deallocated the PAG created for that process on logout, and the problem mysteriously went away...
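To get a feel for the scale of the login figures above, here is a rough back-of-the-envelope sketch. It assumes logins are spread evenly over the day and that each PAG lingers for exactly the 25-hour token lifetime (the text only says "at least"), so treat the results as order-of-magnitude estimates. The helper name live_pags is made up for illustration.

```shell
#!/bin/sh
# live_pags is a hypothetical helper, not part of AFS.
# Estimate of PAGs held in kernel memory at steady state:
#   logins per day * PAG lifetime in hours / 24
# Integer arithmetic rounds the result down.
live_pags() {
    logins_per_day=$1
    lifetime_hours=$2
    echo $(( $logins_per_day * $lifetime_hours / 24 ))
}

live_pags 5000 25     # multi-user system: prints 5208
live_pags 30000 25    # IMAP/POP server:   prints 31250
```

Under the same assumptions, the IMAP/POP server would already have allocated about 25,000 PAGs by the 20-hour mark at which it started to slow down.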
How does this relate to background jobs?
Since the symptoms our multi-user machines are showing are the same as those the IMAP/POP server had, we now destroy the PAG associated with a login session on logout. This *only* destroys the PAG that was created on login; if other PAGs are created within the session, they are left unscathed. We hope this will cut down on our need to periodically reboot these machines, leaving work to go on undisrupted.
What this does mean, however, is that if you expect a background job to need access to your files after you log out, it needs to be run under a process authentication group that is not your login session's. This is actually two steps: first allocate a new PAG, then give that PAG your AFS authentication tokens. Fortunately, there is one easy command that takes care of both for you.
The command is aklog (it lives in /usr/k5/bin). Given the argument "-setpag", it creates a new process authentication group and gives it the authentication tokens associated with the Kerberos 5 tickets that you obtain on login or via "kinit". After this, any commands you run will be running in this new PAG, which will not be destroyed on logout, letting the jobs continue to run with the appropriate file system authentication.
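As a minimal sketch of how this might look in practice, the function below runs aklog with "-setpag" and then starts a job in the background. Only the aklog path and the "-setpag" argument come from the text above. The helper name start_detached_job, the AKLOG and JOB_LOG variables, and the use of nohup are assumptions made for illustration. It also assumes that aklog behaves inside a script the same way the text describes for a login shell.

```shell
#!/bin/sh
# Path to aklog as given in the text; override AKLOG if it lives elsewhere.
AKLOG="${AKLOG:-/usr/k5/bin/aklog}"

# start_detached_job is a hypothetical helper, not part of AFS.
# Usage: start_detached_job command [args...]
start_detached_job() {
    # Create a new PAG and give it tokens derived from the current
    # Kerberos 5 tickets. Give up if that fails (for example, no tickets).
    "$AKLOG" -setpag || return 1

    # Start the job in the background, protected from the hangup signal
    # sent on logout. JOB_LOG is a hypothetical variable naming a file
    # for the job's output; output is discarded if it is unset.
    nohup "$@" >"${JOB_LOG:-/dev/null}" 2>&1 </dev/null &
}

# Example (my_long_job is a placeholder for your own program):
#   JOB_LOG=$HOME/my_long_job.log start_detached_job ./my_long_job
```

Keep in mind that the tokens placed in the new PAG still have the usual limited lifetime, so a job that runs longer than that will lose its file system access when they expire.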
