
Bug 208585

Summary: panic when freeing nfs locks of orphaned processes
Product: Red Hat Enterprise Linux 4
Reporter: Ben Walton <bwalton>
Component: kernel
Assignee: Peter Staubach <staubach>
Status: CLOSED DUPLICATE
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 4.4
CC: dzickus, jbaron, steved, trondham
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2008-06-18 15:51:02 UTC
Attachments:
Description | Flags
logged output after this BUG() is triggered. | none
logged output from kernel panic (with sas 8 patches applied) | none

Description Ben Walton 2006-09-29 14:48:55 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7

Description of problem:
SAS 8 does not always exit cleanly and leaves orphaned processes behind that still hold NFS locks.  When these processes are killed, the kernel panics (see the attached log).  The panic sometimes occurs before the process is fully orphaned as well (or the crash happens before I can run ps and see the task again).


Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-42.0.2

How reproducible:
Sometimes


Steps to Reproduce:
1. Start SAS 8 (displayed remotely over ssh tunneled X)
2. kill -TERM <pid of sshd process>
3. kill -TERM <pid of orphaned process>
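
For anyone without SAS 8 at hand, a small stand-in process that holds a lock on an NFS-mounted file may be useful when trying the steps above. The sketch below is only an assumption about what matters here: it takes an exclusive POSIX byte-range lock via fcntl, which the report does not confirm is the kind of lock SAS uses, and the helper names (hold_lock, release_lock, hold_until_killed) are hypothetical. It is not a confirmed reproducer for this panic.

```python
import fcntl
import os
import signal


def hold_lock(path):
    """Open (or create) path and take an exclusive, non-blocking fcntl lock.

    Returns the open file descriptor; the lock is held for as long as the
    descriptor stays open in this process.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    # lockf() wraps fcntl() byte-range locking; with no length given it
    # covers the whole file. Raises OSError if another process holds it.
    fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    return fd


def release_lock(fd):
    """Drop the lock explicitly and close the descriptor."""
    fcntl.lockf(fd, fcntl.LOCK_UN)
    os.close(fd)


def hold_until_killed(path):
    """Hold the lock and sleep until a signal arrives (e.g. kill -TERM).

    Assumption: path lives on an NFS mount with locking enabled, so the
    kernel has to release the lock when the process is terminated.
    """
    hold_lock(path)
    signal.pause()
```

To use it in place of step 1, call hold_until_killed() on a file under the NFS mount from a shell inside the ssh session, then continue with steps 2 and 3.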

Actual Results:
See attached log file (kernel panic BUG() trace).

Expected Results:
Process should be terminated cleanly and all be right with the world.

Additional info:
At first I thought that this was only triggered if the process was orphaned completely, but I have seen instances of this panic where the process is not fully orphaned before the system goes down.  It happens regularly while people are using SAS.

For now, I've applied some patches to SAS that are supposed to help it clean up better if the X server dies beneath it.

I believe that this bug is related to the tgpid being -1 when the nfs locks are released.

Comment 1 Ben Walton 2006-09-29 14:50:23 UTC
Created attachment 137396 [details]
logged output after this BUG() is triggered.

Comment 2 Ben Walton 2006-10-02 19:41:39 UTC
Created attachment 137590 [details]
logged output from kernel panic (with sas 8 patches applied)

I thought that the sas8 patches had at least bandaided things well enough to
prevent this issue from happening, but got a lovely surprise this afternoon.

I hope it helps.

-Ben

Comment 3 Ben Walton 2006-10-02 19:46:15 UTC
I could also add that this server was recently 'upgraded' from fc4 to as4.
Prior to the upgrade, sas still left its processes lying around, but this bug
was not triggered.  The nfs server is running fc4.

-Ben

Comment 4 Trond H. Amundsen 2006-10-30 14:56:41 UTC
Just a "me too". We just experienced this bug on our general login machine (~180
simultaneous users, lots of NFS homedirs mounted). We were also running the
2.6.9-42.0.2.ELsmp kernel.

-trond

Comment 5 Ben Walton 2006-10-30 15:51:39 UTC
I experienced the same behaviour after moving to kernel 2.6.9-42.0.3.ELsmp also.
I've had to return to the original fc4 setup as this is a high demand server
and I couldn't have this much downtime.  I have the as4 server live still in a
testing area and would be willing to run any tests necessary on it to help
resolve the issue.

-Ben

Comment 6 Peter Staubach 2006-12-08 14:30:28 UTC
I believe that this is a duplicate of bz 218777.


Comment 7 Peter Staubach 2008-06-18 15:51:02 UTC
If someone is still seeing this problem, would they mind trying a
2.6.9-52 kernel or newer, and if the problem is still occurring in
the newer kernels, then please reopen this bugzilla.

*** This bug has been marked as a duplicate of 218777 ***