Red Hat Bugzilla – Bug 208585
panic when freeing nfs locks of orphaned processes
Last modified: 2008-06-18 11:51:02 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:126.96.36.199) Gecko/20060909 Firefox/188.8.131.52
Description of problem:
SAS 8 has a habit of not dying nicely all the time and leaves orphaned processes hanging around that have nfs locks open. When these processes are killed, the kernel panics with the following (see attachment). This sometimes occurs before the process is fully orphaned as well...(or the crash happens before I can run ps and see the task again.)
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Start SAS 8 (displayed remotely over ssh tunneled X)
2. kill -TERM <pid of sshd process>
3. kill -TERM <pid of orphaned process>
Actual Results:
See attached log file (kernel panic BUG() trace).
Expected Results:
Process should be terminated cleanly and all be right with the world.
Additional info:
At first I thought that this was only triggered if the process was orphaned completely. I have seen certain instances of this panic where the process is not fully orphaned before the system goes down. It happens regularly while people are using SAS.
For now, I've applied some patches to SAS that are supposed to help it clean up better if the X server dies beneath it.
I believe that this bug is related to the tgpid being -1 when the nfs locks are released.
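The lock-holder-gets-killed part of the steps above can be sketched without SAS. This is a minimal sketch, not a confirmed reproducer for this panic: it assumes the locks involved are POSIX fcntl locks, it skips the sshd/orphaning step entirely, and the helper names hold_lock_then_kill and lock_is_free are made up for illustration. To exercise the NFS client lock code, the path would have to point at a file on an NFS mount.

```python
import fcntl
import os
import signal
import time


def hold_lock_then_kill(path):
    """Fork a child that takes a POSIX lock on `path`, then SIGTERM it.

    Returns the number of the signal that terminated the child, or None
    if the child exited on its own (for example, it could not open `path`).
    """
    ready_r, ready_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: take an exclusive fcntl lock and idle while holding it.
        try:
            os.close(ready_r)
            fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
            fcntl.lockf(fd, fcntl.LOCK_EX)
            os.write(ready_w, b"1")  # tell the parent the lock is held
            while True:
                time.sleep(60)
        finally:
            os._exit(1)  # never fall back into the parent's code path
    # Parent: wait until the child holds the lock, then terminate it.
    os.close(ready_w)
    os.read(ready_r, 1)
    os.close(ready_r)
    os.kill(pid, signal.SIGTERM)
    _, status = os.waitpid(pid, 0)
    return os.WTERMSIG(status) if os.WIFSIGNALED(status) else None


def lock_is_free(path):
    """Return True if an exclusive lock on `path` can be taken right now."""
    fd = os.open(path, os.O_RDWR)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError:
        return False
    finally:
        os.close(fd)
```

On a healthy system, hold_lock_then_kill("/path/on/nfs/mount/testfile") (hypothetical path) should report that the child died from SIGTERM, and lock_is_free on the same path should then succeed, since the kernel releases the dead process's locks during exit. That release is the step this report is about.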
Created attachment 137396 [details]
logged output after this BUG() is triggered.
Created attachment 137590 [details]
logged output from kernel panic (with sas 8 patches applied)
I thought that the sas8 patches had at least bandaided things well enough to
prevent this issue from happening, but got a lovely surprise this afternoon.
I hope it helps.
I could also add that this server was recently 'upgraded' from fc4 to as4.
Prior to the upgrade, sas still left its processes lying around, but this bug
was not triggered. The nfs server is running fc4.
Just a "me too". We just experienced this bug on our general login machine (~180
simultaneous users, lots of NFS homedirs mounted). We were also running the
I also experienced the same behaviour after moving to kernel 2.6.9-42.0.3.ELsmp.
I've had to return to the original fc4 setup as this is a high demand server
and I couldn't have this much downtime. I have the as4 server live still in a
testing area and would be willing to run any tests necessary on it to help
resolve the issue.
I believe that this is a duplicate of bz 218777.
If someone is still seeing this problem, would they mind trying a
2.6.9-52 kernel or newer? If the problem is still occurring in
the newer kernels, then please reopen this bugzilla.
*** This bug has been marked as a duplicate of 218777 ***