
Bug 165993

Summary: NFS deadlock when multiple processes creating/deleting a file
Product: Red Hat Enterprise Linux 3
Reporter: Neil Horman <nhorman>
Component: kernel
Assignee: Steve Dickson <steved>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.0
CC: dff, jturner, kanderso, lwang, mjenner, peterm, petrides, rajeev, tao, tburke, tkincaid
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: RHSA-2005-663
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2005-09-28 15:33:18 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 156320
Attachments:
Description                                   Flags
script to reproduce deadlock problem          none
Proposed Patch                                none
Crash dump with the nfs hang patch applied    none
crash log on test kernel IT 75445             none

Description Neil Horman 2005-08-15 15:42:50 UTC
Description of problem:
When running the attached reproducer script, one or more NFS client nodes can
become deadlocked.

Version-Release number of selected component (if applicable):
tested U5 onward

How reproducible:
always

Steps to Reproduce:
1. Mount an NFS share from two separate nodes.
2. Run the attached reproducer script on each node, pointing the script to the
NFS share mount point on each node.
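
For readers without access to the attachment, here is a minimal sketch of the
kind of workload the summary describes (multiple processes creating and
deleting the same file). This is NOT the attached reproducer script; the
function names, the file name nfs_churn_testfile, the worker count and the
iteration count are all assumptions made for illustration.

```python
import errno
import multiprocessing
import os
import sys


def churn(path, iterations):
    """Create, stat and delete one file repeatedly; return iterations attempted."""
    attempted = 0
    for _ in range(iterations):
        try:
            fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
            try:
                os.write(fd, b"x")
            finally:
                os.close(fd)
            os.stat(path)    # attribute lookup on the shared file
            os.unlink(path)  # another process may already have removed it
        except OSError as exc:
            # Losing the create/delete race is expected; anything else is not.
            if exc.errno not in (errno.ENOENT, errno.ESTALE):
                raise
        attempted += 1
    return attempted


def run_workers(directory, workers=4, iterations=1000):
    """Run several processes that all churn the same (hypothetical) file name."""
    path = os.path.join(directory, "nfs_churn_testfile")
    with multiprocessing.Pool(workers) as pool:
        return pool.starmap(churn, [(path, iterations)] * workers)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (assumed): python churn.py /path/to/nfs/mountpoint
    print(run_workers(sys.argv[1]))
```

Run one copy on each client against the same mount point. Whether this
particular sketch triggers the hang on an affected kernel has not been
verified; it only illustrates the access pattern.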

  
Actual results:
One or both NFS clients will deadlock

Expected results:
Systems will run without deadlock.

Additional info:

Comment 1 Neil Horman 2005-08-15 15:42:50 UTC
Created attachment 117759 [details]
script to reproduce deadlock problem

Comment 6 Steve Dickson 2005-08-24 18:30:38 UTC
A quick status... I am able to reproduce this and
it appears I'm seeing the same thing Neil was seeing... 

Comment 7 Steve Dickson 2005-08-25 10:25:00 UTC
Created attachment 118104 [details]
Proposed Patch

Please give this patch a try. It stops an inode from being unhashed when
an ESTALE is returned on a getattr. This in turn stops the sync from
going into an infinite loop, which causes the machine to hang.

I was able to run the above reproducer continuously for
a 12 hour period without either RHEL3 client hanging.

Comment 15 Ernie Petrides 2005-08-30 23:33:43 UTC
TomK/JayT, has Q/A approved of a fix for this bug being taken
into the final RHEL3 U6 kernel respin?  If so, could we please
get the Q/A management ack and the bug moved to the CanFix list?

SteveD, should committing your fix be gated on successful testing
by the bug reporter?  (This bug is still in NEEDINFO_REPORTER.)

Removing block against RHEL4 bug 166772.


Comment 16 Steve Dickson 2005-08-31 08:42:07 UTC
yes

Comment 19 Imed Chihi 2005-08-31 16:10:27 UTC
Created attachment 118308 [details]
Crash dump with the nfs hang patch applied

Comment 22 Chris Williams 2005-09-01 14:58:31 UTC
Created attachment 118346 [details]
crash log on test kernel IT 75445

Comment 25 Ernie Petrides 2005-09-01 19:49:29 UTC
Neil, we're waiting for you to confirm that SteveD's patch in
comment #7 resolves the problem at your customer's site.  We
need this answer by the end of the day today!  Thanks.


Comment 26 Neil Horman 2005-09-02 00:31:39 UTC
Steve, Ernie, Sorry for the delay.  I can confirm that the attached patch fixes
the reported problem.  Now I think you said we just need a QA ACK to move this
along.

Comment 27 Ernie Petrides 2005-09-02 00:42:29 UTC
Thanks, Neil.

TomK/JayT, could you please do the final honors (QA ack and list move)?  Thanks.

Comment 29 Ernie Petrides 2005-09-02 19:18:14 UTC
A fix for this problem has just been committed to the RHEL3 U6
patch pool this afternoon (in kernel version 2.4.21-36.EL).


Comment 32 Red Hat Bugzilla 2005-09-28 15:33:19 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-663.html