Red Hat Bugzilla – Bug 165993
NFS deadlock when multiple processes creating/deleting a file
Last modified: 2007-11-30 17:07:08 EST
Description of problem:
When running the attached reproducer script, one or more NFS nodes can become
Version-Release number of selected component (if applicable):
tested U5 onward
Steps to Reproduce:
1. Mount an NFS share from two separate nodes.
2. Run the attached reproducer script on each node, pointing the script to the
NFS share mount point on each node.
Actual results:
One or both NFS clients will deadlock.
Expected results:
Systems will run without deadlock.
Created attachment 117759 [details]
script to reproduce deadlock problem
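The attachment itself is not included in this text. As a rough illustration only, the kind of workload the bug title describes (multiple processes creating and deleting the same file) might look like the sketch below. This is not the attached script: the names churn, run and contended_file, and the worker and iteration counts, are all hypothetical, and it assumes a Linux client with Python available.

```python
import os
import sys


def churn(path, iterations):
    """Repeatedly create and delete one file; return the iterations attempted."""
    done = 0
    for _ in range(iterations):
        try:
            fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
            os.close(fd)
            os.unlink(path)
        except OSError:
            # Another process (or another node) may have removed or recreated
            # the file first; errors such as ENOENT or ESTALE are expected
            # under this kind of race, so they are ignored here.
            pass
        done += 1
    return done


def run(directory, workers=4, iterations=1000):
    """Fork several processes that all contend for the same file name.

    Returns the list of wait statuses, one per child (0 means a clean exit).
    """
    target = os.path.join(directory, "contended_file")  # hypothetical name
    pids = []
    for _ in range(workers):
        pid = os.fork()
        if pid == 0:
            # Child: do the work, then exit without returning to the caller.
            status = 1
            try:
                churn(target, iterations)
                status = 0
            finally:
                os._exit(status)
        pids.append(pid)
    return [os.waitpid(pid, 0)[1] for pid in pids]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python churn.py /path/to/nfs/mount
    print(run(sys.argv[1]))
```

Following the steps above, a script of this shape would be started on each of the two nodes at the same time, with the directory argument pointing at that node's mount of the shared export. Whether this particular sketch triggers the hang has not been verified.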
A quick status... I am able to reproduce this and
it appears I'm seeing the same thing Neil was seeing...
Created attachment 118104 [details]
Please give this patch a try. It stops an inode from being unhashed when
an ESTALE is returned on a getattr. This in turn stops the sync from
going into an infinite loop, which causes the machine to hang.
I was able to run the above reproducer continuously for
a 12-hour period without either RHEL3 client hanging.
TomK/JayT, has Q/A approved of a fix for this bug being taken
into the final RHEL3 U6 kernel respin? If so, could we please
get the Q/A management ack and the bug moved to the CanFix list?
SteveD, should committing your fix be gated on successful testing
by the bug reporter? (This bug is still in NEEDINFO_REPORTER.)
Removing block against RHEL4 bug 166772.
Created attachment 118308 [details]
Crash dump with the nfs hang patch applied
Created attachment 118346 [details]
crash log on test kernel IT 75445
Neil, we're waiting for you to confirm that SteveD's patch in
comment #7 resolves the problem at your customer's site. We
need this answer by the end of the day today! Thanks.
Steve, Ernie, sorry for the delay. I can confirm that the attached patch fixes
the reported problem. Now I think you said we just need a QA ACK to move this
TomK/JayT, could you please do the final honors (QA ack and list move)? Thanks.
A fix for this problem has just been committed to the RHEL3 U6
patch pool this afternoon (in kernel version 2.4.21-36.EL).
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.