Bug 165993 - NFS deadlock when multiple processes creating/deleting a file
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Hardware: All Linux
Priority: medium Severity: medium
Assigned To: Steve Dickson
QA Contact: Brian Brock
Depends On:
Blocks: 156320
Reported: 2005-08-15 11:42 EDT by Neil Horman
Modified: 2007-11-30 17:07 EST
Fixed In Version: RHSA-2005-663
Doc Type: Bug Fix

Attachments
script to reproduce deadlock problem (585 bytes, text/plain)
2005-08-15 11:42 EDT, Neil Horman
Proposed Patch (558 bytes, patch)
2005-08-25 06:25 EDT, Steve Dickson
Crash dump with the nfs hang patch applied (36.80 KB, patch)
2005-08-31 12:10 EDT, Imed Chihi
crash log on test kernel IT 75445 (7.32 KB, text/x-log)
2005-09-01 10:58 EDT, Chris Williams

Description Neil Horman 2005-08-15 11:42:50 EDT
Description of problem:
When running the attached reproducer script, one or more NFS nodes can become

Version-Release number of selected component (if applicable):
tested U5 onward

How reproducible:

Steps to Reproduce:
1. Mount an NFS share from two separate nodes.
2. Run the attached reproducer script on each node, pointing the script to the
NFS share mount point on each node.
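
The attached script itself is not shown in this report. As a rough
illustration only, the kind of load described in the summary (multiple
processes creating and deleting the same file) might look like the sketch
below. The function names, the file name, and the worker and iteration counts
are all assumptions made for this example and are not taken from the
attachment.

```python
import errno
import os


def churn(path, iterations):
    """Create and delete one file repeatedly, tolerating races with other workers."""
    completed = 0
    for _ in range(iterations):
        try:
            fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
            try:
                os.write(fd, b"x")
            finally:
                os.close(fd)
            os.unlink(path)
        except OSError as exc:
            # Another process (or another NFS client) may already have removed
            # the file; ENOENT and ESTALE are expected under this contention.
            if exc.errno not in (errno.ENOENT, errno.ESTALE):
                raise
        completed += 1
    return completed


def run(directory, workers=4, iterations=1000):
    """Fork `workers` children that all churn the same file in `directory`.

    Returns the number of children that exited cleanly.
    """
    # Hypothetical file name, chosen only for this sketch.
    target = os.path.join(directory, "churn_test_file")
    pids = []
    for _ in range(workers):
        pid = os.fork()
        if pid == 0:
            # Child: do the work, then exit without returning to the caller.
            status = 1
            try:
                churn(target, iterations)
                status = 0
            finally:
                os._exit(status)
        pids.append(pid)

    clean = 0
    for pid in pids:
        _, wait_status = os.waitpid(pid, 0)
        if os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0:
            clean += 1
    return clean
```

To mirror the steps above, something like run("/path/to/nfs/mount") would be
started on each of the two clients at the same time, with the path replaced
by the local mount point of the shared export.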

Actual results:
One or both NFS clients will deadlock

Expected results:
Systems will run without deadlock.

Additional info:
Comment 1 Neil Horman 2005-08-15 11:42:50 EDT
Created attachment 117759
script to reproduce deadlock problem
Comment 6 Steve Dickson 2005-08-24 14:30:38 EDT
A quick status... I am able to reproduce this and
it appears I'm seeing the same thing Neil was seeing... 
Comment 7 Steve Dickson 2005-08-25 06:25:00 EDT
Created attachment 118104
Proposed Patch

Please give this patch a try. It stops an inode from being unhashed when
an ESTALE is returned on a getattr. This in turn stops the sync from
going into an infinite loop, which causes the machine to hang.

I was able to continuously run the above reproducer for
a 12 hour period without either RHEL3 client hanging.
Comment 15 Ernie Petrides 2005-08-30 19:33:43 EDT
TomK/JayT, has Q/A approved of a fix for this bug being taken
into the final RHEL3 U6 kernel respin?  If so, could we please
get the Q/A management ack and the bug moved to the CanFix list?

SteveD, should committing your fix be gated on successful testing
by the bug reporter?  (This bug is still in NEEDINFO_REPORTER.)

Removing block against RHEL4 bug 166772.
Comment 16 Steve Dickson 2005-08-31 04:42:07 EDT
Comment 19 Imed Chihi 2005-08-31 12:10:27 EDT
Created attachment 118308
Crash dump with the nfs hang patch applied
Comment 22 Chris Williams 2005-09-01 10:58:31 EDT
Created attachment 118346
crash log on test kernel IT 75445
Comment 25 Ernie Petrides 2005-09-01 15:49:29 EDT
Neil, we're waiting for you to confirm that SteveD's patch in
comment #7 resolves the problem at your customer's site.  We
need this answer by the end of the day today!  Thanks.
Comment 26 Neil Horman 2005-09-01 20:31:39 EDT
Steve, Ernie, Sorry for the delay.  I can confirm that the attached patch fixes
the reported problem.  Now I think you said we just need a QA ACK to move this
Comment 27 Ernie Petrides 2005-09-01 20:42:29 EDT
Thanks, Neil.

TomK/JayT, could you please do the final honors (QA ack and list move)?  Thanks.
Comment 29 Ernie Petrides 2005-09-02 15:18:14 EDT
A fix for this problem has just been committed to the RHEL3 U6
patch pool this afternoon (in kernel version 2.4.21-36.EL).
Comment 32 Red Hat Bugzilla 2005-09-28 11:33:19 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

