Bug 165993 - NFS deadlock when multiple processes creating/deleting a file
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Steve Dickson
QA Contact: Brian Brock
Depends On:
Blocks: 156320
Reported: 2005-08-15 11:42 EDT by Neil Horman
Modified: 2007-11-30 17:07 EST (History)

See Also:
Fixed In Version: RHSA-2005-663
Doc Type: Bug Fix
Last Closed: 2005-09-28 11:33:18 EDT

Attachments
script to reproduce deadlock problem (585 bytes, text/plain)
2005-08-15 11:42 EDT, Neil Horman
Proposed Patch (558 bytes, patch)
2005-08-25 06:25 EDT, Steve Dickson
Crash dump with the nfs hang patch applied (36.80 KB, patch)
2005-08-31 12:10 EDT, Imed Chihi
crash log on test kernel IT 75445 (7.32 KB, text/x-log)
2005-09-01 10:58 EDT, Chris Williams

Description Neil Horman 2005-08-15 11:42:50 EDT
Description of problem:
When running the attached reproducer script, one or more NFS nodes can become
deadlocked.

Version-Release number of selected component (if applicable):
tested U5 onward

How reproducible:
always

Steps to Reproduce:
1. Mount an NFS share from two separate nodes.
2. Run the attached reproducer script on each node, pointing the script to the
NFS share mount point on that node.
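
The attached script itself is not reproduced in this report. As a rough illustration only, the kind of load described by the bug title (multiple processes creating and deleting a file) could be sketched as below. The function names, the file name, and the worker and iteration counts are all assumptions made for the example, not details taken from the attachment.

```python
import os
from multiprocessing import Process


def churn(path, iterations):
    """Repeatedly create and delete `path`, tolerating races with other workers."""
    for _ in range(iterations):
        try:
            # Create the file if it does not exist, then close it immediately.
            fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
            os.close(fd)
        except OSError:
            pass  # e.g. a stale handle or a missing directory; just try again
        try:
            os.unlink(path)
        except OSError:
            pass  # another process (or the other node) deleted it first


def run(directory, workers=4, iterations=1000, name="deadlock-test"):
    """Start `workers` processes that all churn the same file in `directory`."""
    path = os.path.join(directory, name)
    procs = [Process(target=churn, args=(path, iterations)) for _ in range(workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return [p.exitcode for p in procs]
```

To follow the steps above, one would call `run()` with the NFS mount point as `directory` on each of the two nodes at the same time (from under an `if __name__ == "__main__":` guard if run as a script). Whether this particular loop triggers the hang has not been verified here.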

  
Actual results:
One or both NFS clients will deadlock

Expected results:
Systems will run without deadlock.

Additional info:
Comment 1 Neil Horman 2005-08-15 11:42:50 EDT
Created attachment 117759
script to reproduce deadlock problem
Comment 6 Steve Dickson 2005-08-24 14:30:38 EDT
A quick status... I am able to reproduce this and
it appears I'm seeing the same thing Neil was seeing... 
Comment 7 Steve Dickson 2005-08-25 06:25:00 EDT
Created attachment 118104
Proposed Patch

Please give this patch a try. It stops an inode from being unhashed when
an ESTALE is returned on a getattr. This in turn stops the sync from
going into an infinite loop, which is what causes the machine to hang.

I was able to run the above reproducer continuously for
a 12 hour period without either RHEL3 client hanging.
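
The patch itself is only attached, not quoted in this report. The behaviour change it describes can be pictured with a toy model such as the one below. This is not kernel code: `ToyInodeCache`, its `stale` set, and the `unhash_on_estale` switch are hypothetical names introduced for illustration, and the sync loop is deliberately not modelled because the report does not describe its mechanism.

```python
import errno


class ToyInodeCache:
    """Toy stand-in for an inode hash table (illustration only, not kernel code)."""

    def __init__(self):
        self.hashed = {}    # inode number -> inode record
        self.stale = set()  # inode numbers flagged stale (hypothetical bookkeeping)

    def add(self, ino):
        self.hashed[ino] = {"ino": ino}

    def on_getattr_result(self, ino, status, unhash_on_estale):
        """Apply a getattr status; optionally drop the inode from the hash on ESTALE."""
        if status == -errno.ESTALE:
            self.stale.add(ino)
            if unhash_on_estale:
                # Behaviour described as the cause of the problem.
                self.hashed.pop(ino, None)
        return status


def demo():
    """Return whether inode 42 stays hashed with the old and the new behaviour."""
    old = ToyInodeCache()
    old.add(42)
    old.on_getattr_result(42, -errno.ESTALE, unhash_on_estale=True)

    new = ToyInodeCache()
    new.add(42)
    new.on_getattr_result(42, -errno.ESTALE, unhash_on_estale=False)

    return (42 in old.hashed, 42 in new.hashed)
```

In this model the only difference between the two cases is whether the inode remains findable through the hash after the ESTALE; per the comment above, keeping it hashed is what stops the sync from looping.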
Comment 15 Ernie Petrides 2005-08-30 19:33:43 EDT
TomK/JayT, has Q/A approved of a fix for this bug being taken
into the final RHEL3 U6 kernel respin?  If so, could we please
get the Q/A management ack and the bug moved to the CanFix list?

SteveD, should committing your fix be gated on successful testing
by the bug reporter?  (This bug is still in NEEDINFO_REPORTER.)

Removing block against RHEL4 bug 166772.
Comment 16 Steve Dickson 2005-08-31 04:42:07 EDT
yes
Comment 19 Imed Chihi 2005-08-31 12:10:27 EDT
Created attachment 118308
Crash dump with the nfs hang patch applied
Comment 22 Chris Williams 2005-09-01 10:58:31 EDT
Created attachment 118346
crash log on test kernel IT 75445
Comment 25 Ernie Petrides 2005-09-01 15:49:29 EDT
Neil, we're waiting for you to confirm that SteveD's patch in
comment #7 resolves the problem at your customer's site.  We
need this answer by the end of the day today!  Thanks.
Comment 26 Neil Horman 2005-09-01 20:31:39 EDT
Steve, Ernie, sorry for the delay.  I can confirm that the attached patch fixes
the reported problem.  I think you said we now just need a QA ACK to move this
along.
Comment 27 Ernie Petrides 2005-09-01 20:42:29 EDT
Thanks, Neil.

TomK/JayT, could you please do the final honors (QA ack and list move)?  Thanks.
Comment 29 Ernie Petrides 2005-09-02 15:18:14 EDT
A fix for this problem has just been committed to the RHEL3 U6
patch pool this afternoon (in kernel version 2.4.21-36.EL).
Comment 32 Red Hat Bugzilla 2005-09-28 11:33:19 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-663.html
