Bug 165993 - NFS deadlock when multiple processes creating/deleting a file
Summary: NFS deadlock when multiple processes creating/deleting a file
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Steve Dickson
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 156320
Reported: 2005-08-15 15:42 UTC by Neil Horman
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version: RHSA-2005-663
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-09-28 15:33:18 UTC
Target Upstream Version:
Embargoed:


Attachments
script to reproduce deadlock problem (585 bytes, text/plain)
2005-08-15 15:42 UTC, Neil Horman
Proposed Patch (558 bytes, patch)
2005-08-25 10:25 UTC, Steve Dickson
Crash dump with the nfs hang patch applied (36.80 KB, patch)
2005-08-31 16:10 UTC, Imed Chihi
crash log on test kernel IT 75445 (7.32 KB, text/x-log)
2005-09-01 14:58 UTC, Chris Williams


Links
System: Red Hat Product Errata
ID: RHSA-2005:663
Private: 0
Priority: qe-ready
Status: SHIPPED_LIVE
Summary: Important: Updated kernel packages available for Red Hat Enterprise Linux 3 Update 6
Last Updated: 2005-09-28 04:00:00 UTC

Description Neil Horman 2005-08-15 15:42:50 UTC
Description of problem:
When running the attached reproducer script, one or more NFS nodes can become
deadlocked.

Version-Release number of selected component (if applicable):
tested U5 onward

How reproducible:
always

Steps to Reproduce:
1. Mount an NFS share on two separate nodes.
2. Run the attached reproducer script on each node, pointing the script at the
NFS share mount point on that node.
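
The attached script itself is not reproduced in this report. As a rough illustration of the kind of workload the summary describes (multiple processes creating and deleting the same file), here is a minimal Python sketch. The file name `deadlock-test`, the worker count, and the iteration count are assumptions made for illustration and are not taken from the attachment.

```python
import errno
import multiprocessing
import os
import sys


def churn(path, iterations):
    """Create and delete one file repeatedly; return how many races were tolerated."""
    races = 0
    for _ in range(iterations):
        try:
            fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
            try:
                os.write(fd, b"x")
            finally:
                os.close(fd)
            os.unlink(path)
        except OSError as exc:
            # Another process (or the other NFS client) may have removed the
            # file first; ENOENT and ESTALE are expected under this load.
            if exc.errno not in (errno.ENOENT, errno.ESTALE):
                raise
            races += 1
    return races


def run(directory, workers=4, iterations=1000):
    """Start several processes that all churn the same file in `directory`."""
    path = os.path.join(directory, "deadlock-test")  # hypothetical file name
    with multiprocessing.Pool(workers) as pool:
        return sum(pool.starmap(churn, [(path, iterations)] * workers))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: pass the NFS mount point as the only argument.
    print("tolerated races:", run(sys.argv[1]))
```

Saved under a name of your choice and started on each client with that client's NFS mount point as its argument, this makes both nodes contend on the same file. It approximates the load only and may not trigger the deadlock as reliably as the attached script.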

  
Actual results:
One or both NFS clients will deadlock

Expected results:
Systems will run without deadlock.

Additional info:

Comment 1 Neil Horman 2005-08-15 15:42:50 UTC
Created attachment 117759
script to reproduce deadlock problem

Comment 6 Steve Dickson 2005-08-24 18:30:38 UTC
A quick status... I am able to reproduce this and
it appears I'm seeing the same thing Neil was seeing... 

Comment 7 Steve Dickson 2005-08-25 10:25:00 UTC
Created attachment 118104
Proposed Patch

Please give this patch a try. It stops an inode from being unhashed when
an ESTALE is returned on a getattr. This in turn stops the sync from
going into an infinite loop, which is what causes the machine to hang.

I was able to run the above reproducer continuously for
a 12-hour period without either RHEL3 client hanging.

Comment 15 Ernie Petrides 2005-08-30 23:33:43 UTC
TomK/JayT, has Q/A approved of a fix for this bug being taken
into the final RHEL3 U6 kernel respin?  If so, could we please
get the Q/A management ack and the bug moved to the CanFix list?

SteveD, should committing your fix be gated on successful testing
by the bug reporter?  (This bug is still in NEEDINFO_REPORTER.)

Removing block against RHEL4 bug 166772.


Comment 16 Steve Dickson 2005-08-31 08:42:07 UTC
yes

Comment 19 Imed Chihi 2005-08-31 16:10:27 UTC
Created attachment 118308
Crash dump with the nfs hang patch applied

Comment 22 Chris Williams 2005-09-01 14:58:31 UTC
Created attachment 118346
crash log on test kernel IT 75445

Comment 25 Ernie Petrides 2005-09-01 19:49:29 UTC
Neil, we're waiting for you to confirm that SteveD's patch in
comment #7 resolves the problem at your customer's site.  We
need this answer by the end of the day today!  Thanks.


Comment 26 Neil Horman 2005-09-02 00:31:39 UTC
Steve, Ernie, Sorry for the delay.  I can confirm that the attached patch fixes
the reported problem.  Now I think you said we just need a QA ACK to move this
along.

Comment 27 Ernie Petrides 2005-09-02 00:42:29 UTC
Thanks, Neil.

TomK/JayT, could you please do the final honors (QA ack and list move)?  Thanks.

Comment 29 Ernie Petrides 2005-09-02 19:18:14 UTC
A fix for this problem has just been committed to the RHEL3 U6
patch pool this afternoon (in kernel version 2.4.21-36.EL).


Comment 32 Red Hat Bugzilla 2005-09-28 15:33:19 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-663.html


