Bug 250259 - NFS locking problem
Summary: NFS locking problem
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel   
Version: 7
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: Steve Dickson
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Keywords:
Depends On:
Blocks: 250345
 
Reported: 2007-07-31 14:02 UTC by Christian Krafft
Modified: 2008-01-11 21:32 UTC
CC List: 10 users

Fixed In Version: kernel-2.6.23.13-59.fc7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2008-01-11 21:32:34 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
write_back.sh (154 bytes, text/x-sh)
2007-07-31 14:03 UTC, Christian Krafft
read.sh (250 bytes, text/x-sh)
2007-07-31 14:04 UTC, Christian Krafft
this patch fixes the problem (1.06 KB, patch)
2007-07-31 14:07 UTC, Christian Krafft

Description Christian Krafft 2007-07-31 14:02:32 UTC
Description of problem:
Under heavy load, the NFS client deadlocks.

Version-Release number of selected component (if applicable):
fc7:
2.6.21-1.3194.fc7
2.6.21-1.3228.fc7
vanilla kernel:
2.6.18 up to 2.6.23-rc1

How reproducible:
Run the attached scripts for a few minutes.

Steps to Reproduce:
1. Run six instances of write_back.sh concurrently.
2. At the same time, also run read.sh.
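The attached scripts are not reproduced in this report. Purely as a hypothetical approximation of the steps above, the following Python sketch runs six concurrent writer threads plus one reader thread against a directory assumed to sit on the NFS mount under test. The file names, the 64 KiB payload, the fsync-based write loop, and the function names `writer`, `reader`, and `run_stress` are all assumptions, not the contents of write_back.sh or read.sh; threads stand in for the separate script processes.

```python
import os
import threading
import time

# Hypothetical stress sketch; NOT the attached write_back.sh / read.sh.
# Point `directory` at a path on the NFS mount under test.


def writer(directory: str, idx: int, stop: threading.Event, counts: list) -> None:
    """Write-path load: write, fsync and delete a private file in a loop."""
    path = os.path.join(directory, f"stress_write_{idx}.dat")
    payload = b"x" * (64 * 1024)  # assumed payload size
    while not stop.is_set():
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # force the data out to the server
        os.remove(path)
        counts[idx] += 1


def reader(directory: str, stop: threading.Event, counts: list, slot: int) -> None:
    """Read-path load: list the directory and read every file found."""
    while not stop.is_set():
        for name in os.listdir(directory):
            try:
                with open(os.path.join(directory, name), "rb") as f:
                    f.read()
            except OSError:
                pass  # file vanished, is a directory, or is unreadable
        counts[slot] += 1


def run_stress(directory: str, writers: int = 6, seconds: float = 60.0,
               join_timeout: float = 30.0) -> list:
    """Run `writers` writer threads plus one reader; return per-thread loop counts.

    Raises RuntimeError if any thread is still alive after `join_timeout`,
    which is how a hang like the one in this report would surface.
    """
    stop = threading.Event()
    counts = [0] * (writers + 1)  # last slot belongs to the reader
    threads = [
        threading.Thread(target=writer, args=(directory, i, stop, counts),
                         name=f"writer-{i}", daemon=True)
        for i in range(writers)
    ]
    threads.append(threading.Thread(target=reader,
                                    args=(directory, stop, counts, writers),
                                    name="reader", daemon=True))
    for t in threads:
        t.start()
    time.sleep(seconds)
    stop.set()
    for t in threads:
        t.join(timeout=join_timeout)
    hung = [t.name for t in threads if t.is_alive()]
    if hung:
        raise RuntimeError(f"threads did not finish (possible hang): {hung}")
    return counts
```

Daemon threads are used so that a thread stuck in an NFS wait does not keep the interpreter alive after `run_stress` reports it.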
  
Actual results:
After a few minutes the scripts hang;
'ps -ef' hangs, too.


Expected results:
The scripts keep running indefinitely and 'ps -ef' does not hang.

Additional info:
It's a two-way machine. With the mount options tcp,nolock the problem
does not occur (probably due to different timing).
The bug is also present in the RHEL 5.1 code base, but cannot be reproduced
there either (probably also due to different timing).

I will attach both scripts and the patch that fixes the problem.

Comment 1 Christian Krafft 2007-07-31 14:03:46 UTC
Created attachment 160321 [details]
write_back.sh

write_back.sh - puts load on the NFS write path

Comment 2 Christian Krafft 2007-07-31 14:04:30 UTC
Created attachment 160322 [details]
read.sh

read.sh - puts stress on the NFS read path

Comment 3 Christian Krafft 2007-07-31 14:07:44 UTC
Created attachment 160323 [details]
this patch fixes the problem

The patch fixes a concurrency issue in put_nfs_open_context.
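The report identifies the bug only as a concurrency issue in put_nfs_open_context, and the patch text is not reproduced here. Purely as a hypothetical userland illustration of one common shape of this class of race, and not a transcription of attachment 160323, the Python sketch below models a reference-counted context kept on a per-inode list. If the final reference drop happens outside the list lock and the removal from the list happens afterwards, a concurrent lookup holding the lock can still find the object and take a new reference to something about to be freed. Doing the last decrement and the list removal under the same lock (a "decrement and lock" pattern) closes that window. All class and method names here are invented for illustration.

```python
import threading


class OpenContext:
    """Hypothetical stand-in for a reference-counted per-open context."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.count = 1  # the creator holds the first reference


class Inode:
    """Hypothetical stand-in for an inode that keeps a list of open contexts."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.contexts: list = []

    def add(self, ctx: OpenContext) -> None:
        with self.lock:
            self.contexts.append(ctx)

    def find_and_get(self, name: str):
        """Lookup path: take a new reference only while holding the list lock."""
        with self.lock:
            for ctx in self.contexts:
                if ctx.name == name:
                    ctx.count += 1
                    return ctx
        return None

    def put(self, ctx: OpenContext) -> bool:
        """Release path in 'decrement and lock' style.

        The final decrement and the removal from the list happen under the
        same lock that find_and_get() holds, so a lookup can never revive a
        context whose count already reached zero. Returns True if this call
        dropped the last reference.
        """
        with self.lock:
            ctx.count -= 1
            if ctx.count > 0:
                return False
            self.contexts.remove(ctx)
        # Outside the lock the context is unreachable and safe to tear down.
        return True
```

In the racy variant, `put` would decrement first and only then take `self.lock` to remove the context, leaving a window in which `find_and_get` can bump a zero count back to one.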

Comment 4 Christian Krafft 2007-07-31 14:10:34 UTC
I forgot to mention that the scripts put load on /tmp and /var/, so you need
an NFS root to reproduce the problem.

Comment 5 W. Michael Petullo 2007-09-09 23:28:07 UTC
I have the same problem.  I am using Fedora 7 with kernel 2.6.22.4-65.fc7.  I
modified the scripts above to read and write /home/user, as that is what is NFS
mounted on my system.  The scripts hung after approximately five minutes of
running.  Once the scripts hung, I could not run "ls /home/user", as that command would also hang.

In normal use, spamassassin seems to cause a hang when accessing the file
/home/user/.spamassassin/bayes_toks.

Comment 6 W. Michael Petullo 2007-09-12 00:23:49 UTC
I tried the patch in comment #3.

I ran Christian's scripts for one hour and thirty minutes and did not see a hang.

However, I then ran the scripts while spamassassin was processing approximately
1,000 emails.  In this case, NFS access hung as described in the previous
comment after 52 minutes.

I will continue to experiment and will report what I find.

Comment 7 W. Michael Petullo 2007-09-15 18:29:03 UTC
I also have this problem when using Fedora 8 Test 2.

Comment 8 Thomas J. Baker 2007-10-14 14:33:53 UTC
I believe I'm experiencing this problem in normal usage. My F7 server works for
a few days, serving home directories and other data, but eventually the F7 and
F8 Test clients start reporting that the server's lockd is not responding. The only
fix I've found is to reboot the server. I did not have this problem when the
server was running FC5.

Comment 9 Christopher Brown 2008-01-10 19:08:18 UTC
Hello,

I'm reviewing this bug as part of the kernel bug triage project, an attempt to
isolate current bugs in the Fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

I am CC'ing myself to this bug and will try to assist you in resolving it if I can.

There hasn't been much activity on this bug for a while. Could you tell me if
you are still having problems with the latest kernel?

If the problem no longer exists then please close this bug or I'll do so in a
few days if there is no additional information lodged.

Comment 10 Thomas J. Baker 2008-01-10 20:31:24 UTC
I have not had this problem in some time but I am not the original reporter.

Comment 11 Steve Dickson 2008-01-11 21:32:34 UTC
The patch in Comment #3 is in both f8 and f7 at this point which is probably
the reason you are no longer seeing this problem. 

