Bug 250259 - NFS locking problem
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: kernel
Version: 7
Hardware: All Linux
Priority: low
Severity: high
Assigned To: Steve Dickson
QA Contact: Fedora Extras Quality Assurance
Blocks: 250345

Reported: 2007-07-31 10:02 EDT by Christian Krafft
Modified: 2008-01-11 16:32 EST
Fixed In Version: kernel-2.6.23.13-59.fc7
Doc Type: Bug Fix
Last Closed: 2008-01-11 16:32:34 EST

Attachments
write_back.sh (154 bytes, text/x-sh), 2007-07-31 10:03 EDT, Christian Krafft
read.sh (250 bytes, text/x-sh), 2007-07-31 10:04 EDT, Christian Krafft
this patch fixes the problem (1.06 KB, patch), 2007-07-31 10:07 EDT, Christian Krafft

Description Christian Krafft 2007-07-31 10:02:32 EDT
Description of problem:
Under heavy load, the NFS client runs into a deadlock.

Version-Release number of selected component (if applicable):
fc7:
2.6.21-1.3194.fc7
2.6.21-1.3228.fc7
vanilla kernel:
2.6.18 up to 2.6.23-rc1

How reproducible:
Run the attached scripts for a few minutes.

Steps to Reproduce:
1. Run write_back.sh six times concurrently.
2. Run read.sh at the same time.
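
The contents of the attached scripts are not reproduced in this report, so the following is only a minimal sketch of how the two steps could be driven from a single shell, assuming write_back.sh and read.sh are available locally. The helper name start_load and its output format are hypothetical and are not part of the attachments.

```shell
#!/bin/sh
# Hypothetical driver (not part of the attachments): start COUNT copies
# of a writer script plus one reader script, all in the background, and
# print one line per started process.
# Usage: start_load WRITER READER [COUNT]
start_load() {
    writer=$1
    reader=$2
    count=${3:-6}            # step 1 asks for six concurrent writers
    i=0
    while [ "$i" -lt "$count" ]; do
        sh "$writer" &       # step 1: one background writer per iteration
        echo "writer $!"
        i=$((i + 1))
    done
    sh "$reader" &           # step 2: a single reader alongside the writers
    echo "reader $!"
}
```

It could be invoked as `start_load ./write_back.sh ./read.sh 6`. The printed PIDs make it easier to check later which processes are still alive.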
  
Actual results:
After a few minutes the scripts hang.
'ps -ef' hangs, too.
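
Because 'ps -ef' itself blocks once the hang occurs, a check that runs in the foreground would get stuck as well. The following is a rough sketch of a probe that starts a command in the background and reports whether it finished within a time limit. The function name probe and its return codes are hypothetical, and on an NFS root the probe may itself block if it has to touch the hung mount.

```shell
#!/bin/sh
# Hypothetical helper: run a command in the background and report
# whether it finished within LIMIT seconds.
# Usage: probe LIMIT COMMAND [ARGS...]
# Returns 0 if the command finished in time, 1 if it is still running.
probe() {
    limit=$1
    shift
    "$@" >/dev/null 2>&1 &   # never wait in the foreground on the command
    pid=$!
    waited=0
    while [ "$waited" -lt "$limit" ]; do
        # kill -0 sends no signal; it only tests whether the PID exists
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        waited=$((waited + 1))
    done
    kill -0 "$pid" 2>/dev/null || return 0
    return 1                 # still running: possibly blocked
}
```

For example, `probe 10 ls /var || echo "ls /var did not finish within 10 seconds"`. The probe does not try to kill the blocked command, since a process stuck in the kernel may not react to signals.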


Expected results:
The scripts keep running indefinitely and 'ps -ef' does not hang.

Additional info:
It's a two-way machine. With the mount options tcp,nolock the problem
does not occur (probably due to different timing).
The bug is also present in the RHEL5.1 code base, but cannot be reproduced
there either (probably due to different timing, too).

I will attach both scripts and the patch that fixes the problem.
Comment 1 Christian Krafft 2007-07-31 10:03:46 EDT
Created attachment 160321
write_back.sh

write_back.sh - puts load on the NFS write path
Comment 2 Christian Krafft 2007-07-31 10:04:30 EDT
Created attachment 160322
read.sh

read.sh - puts stress on the NFS read path
Comment 3 Christian Krafft 2007-07-31 10:07:44 EDT
Created attachment 160323
this patch fixes the problem

The patch fixes a concurrency issue in put_nfs_open_context.
Comment 4 Christian Krafft 2007-07-31 10:10:34 EDT
Forgot to mention that the scripts put load on /tmp and on /var/, so you need
an NFS root to reproduce.
Comment 5 W. Michael Petullo 2007-09-09 19:28:07 EDT
I have the same problem.  I am using Fedora 7 with kernel 2.6.22.4-65.fc7.  I
modified the scripts above to read and write /home/user, as that is what is NFS
mounted on my system.  The scripts hung after approximately five minutes of
running.  Once hung, I could not "ls /home/user", as this process would also hang.

In normal use, spamassassin seems to cause a hang when accessing the file
/home/user/.spamassassin/bayes_toks.
Comment 6 W. Michael Petullo 2007-09-11 20:23:49 EDT
I tried the patch in comment #3.

I ran Christian's scripts for one hour and thirty minutes and did not see a hang.

However, I then ran the scripts while spamassassin was processing approximately
1,000 emails.  In this case, NFS access hung as described in the previous
comment after 52 minutes.

I will continue to experiment and will report what I find.
Comment 7 W. Michael Petullo 2007-09-15 14:29:03 EDT
I also have this problem when using Fedora 8 Test 2.
Comment 8 Thomas J. Baker 2007-10-14 10:33:53 EDT
I believe I'm experiencing this problem in normal usage. My F7 server works for
a few days, serving home directories and other data, but eventually the F7 and
F8Test clients start reporting that the server lockd is not responding. The only
fix I've found is to reboot the server. I did not have this problem when the
server was running FC5.
Comment 9 Christopher Brown 2008-01-10 14:08:18 EST
Hello,

I'm reviewing this bug as part of the kernel bug triage project, an attempt to
isolate current bugs in the Fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

I am CC'ing myself to this bug and will try to assist you in resolving it if I can.

There hasn't been much activity on this bug for a while. Could you tell me if
you are still having problems with the latest kernel?

If the problem no longer exists then please close this bug or I'll do so in a
few days if there is no additional information lodged.
Comment 10 Thomas J. Baker 2008-01-10 15:31:24 EST
I have not had this problem in some time but I am not the original reporter.
Comment 11 Steve Dickson 2008-01-11 16:32:34 EST
The patch in Comment #3 is in both f8 and f7 at this point, which is probably
the reason you are no longer seeing this problem.
