Bug 70561 - [NFS] Client and server hang with NFS accesses
Product: Red Hat Raw Hide
Classification: Retired
Component: kernel
Hardware: i386 Linux
Priority: medium  Severity: high
Assigned To: Dave Jones
Brian Brock
Reported: 2002-08-02 11:01 EDT by Jeremy Sanders
Modified: 2015-01-04 17:01 EST
CC List: 4 users

Doc Type: Bug Fix
Last Closed: 2004-10-29 23:32:47 EDT

Attachments
most of traces on system (15.12 KB, text/plain)
2002-08-09 06:44 EDT, Jeremy Sanders
tcpdump -vv -s0 on traffic when trying to mount export (3.13 KB, text/plain)
2002-08-09 06:51 EDT, Jeremy Sanders

Description Jeremy Sanders 2002-08-02 11:01:04 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1b) Gecko/20020722

Description of problem:
With the Red Hat 7.3 2.4.18-5 kernel on an i686 client mounting from an i686
server running the Rawhide 2.4.18-7.80 kernel, NFS accesses lock up the nfsd
daemons on the server and the process accessing the NFS mount on the client.
The nfsd processes on the server get stuck in the "D" (uninterruptible sleep)
state, and so does the client process.

The server cannot be shut down properly because the stuck nfsd processes cannot
be killed by the shutdown scripts. Therefore the bug is marked as "severe".
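A process in the "D" state is in uninterruptible sleep inside the kernel, so signals (including the SIGKILL a shutdown script sends) are not acted on until the kernel wakes it. As a minimal sketch for spotting such processes, assuming a Linux /proc filesystem with the usual /proc/<pid>/stat layout ("pid (comm) state ..."), the Python below lists every process whose state field is D. The helper names parse_stat and d_state_processes are hypothetical and not part of any tool mentioned in this report; from a shell, `ps -eo pid,stat,comm` shows the same state column.

```python
import os
from typing import List, Tuple


def parse_stat(line: str) -> Tuple[int, str, str]:
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the *last* ')' rather than on whitespace."""
    left, _, right = line.rpartition(")")
    pid_str, _, comm = left.partition(" (")
    state = right.split()[0]
    return int(pid_str), comm, state


def d_state_processes(proc_root: str = "/proc") -> List[Tuple[int, str]]:
    """Return (pid, comm) for every process in uninterruptible sleep ('D')."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # process exited while we were scanning
        if state == "D":
            stuck.append((pid, comm))
    return stuck
```

On an affected server, calling d_state_processes() during the hang would be expected to report the stuck nfsd threads; on the client it would report the process that was accessing the mount.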

Using 2.4.18-5 on the server works fine, as does the standard 2.4.19-rc2
(Linus) kernel.

We also tried 2.4.18-7.86 on the server, but it failed to install (depmod
dependency problems).

Both machines use eepro100 cards on 100 Mbps Ethernet.
Mount options are:

(rsize=8192,wsize=8192 was also tried)

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Run 2.4.18-7.80 on a machine exporting nfs mounts
2. Mount export on client running 2.4.18-5
3. ls /mnt/nfs

Actual Results:  Client process hangs. Server processes hang.

Expected Results:  The client should be able to access the NFS export.

Additional info:
Comment 1 Jeremy Sanders 2002-08-08 10:12:41 EDT
This is also broken with Red Hat 2.4.18-7.93 (I updated modutils to install it).
Comment 2 Jeremy Sanders 2002-08-08 11:07:23 EDT
Sorry for the extra email, but this is also broken with 2.4.18-7.94, which
includes most of the NFS patch set.
Comment 3 Ben LaHaise 2002-08-08 14:16:23 EDT
Could you obtain a backtrace of the stuck processes via Ctrl-Scroll Lock on the
console?  That will give us an idea where the processes are getting stuck to
help track down the problem.
Comment 4 Jeremy Sanders 2002-08-09 06:43:17 EDT
Okay, I'll attach the traces I managed to grab from dmesg after pressing
Ctrl+Scroll Lock. Basically, the stuck nfsd processes look like this:

nfsd          D F68108E0  5976  1141      1          1140  1142 (L-TLB)
Call Trace: [<c0107f7a>] __down [kernel] 0x6a (0xf6afbde4))
[<c01080d4>] __down_failed [kernel] 0x8 (0xf6afbe08))
[<f8824aa0>] ext3_readdir [ext3] 0x0 (0xf6afbe10))
[<c014fdce>] .text.lock.readdir [kernel] 0x5 (0xf6afbe18))
[<f89997e3>] nfsd_readdir [nfsd] 0xc3 (0xf6afbe38))
[<f89a1070>] nfs3svc_encode_entry_plus [nfsd] 0x0 (0xf6afbe40))
[<f8837ca0>] ext3_dir_operations [ext3] 0x0 (0xf6afbe80))
[<f899ee6e>] nfsd3_proc_readdirplus [nfsd] 0xde (0xf6afbef0))
[<f89a1070>] nfs3svc_encode_entry_plus [nfsd] 0x0 (0xf6afbf04))
[<f89a61c4>] nfsd_procedures3 [nfsd] 0x264 (0xf6afbf24))
[<f89935c0>] nfsd_dispatch [nfsd] 0xd0 (0xf6afbf30))
[<f89a5898>] nfsd_version3 [nfsd] 0x0 (0xf6afbf44))
[<f89754cc>] svc_process_R6eda96b1 [sunrpc] 0x43c (0xf6afbf50))
[<f89a61c4>] nfsd_procedures3 [nfsd] 0x264 (0xf6afbf78))
[<f89a58b8>] nfsd_program [nfsd] 0x0 (0xf6afbf7c))
[<f89933b0>] nfsd [nfsd] 0x1d0 (0xf6afbf98))
[<c010765e>] kernel_thread [kernel] 0x2e (0xf6afbff0))
[<f89931e0>] nfsd [nfsd] 0x0 (0xf6afbff8))
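Reading a 2.4 call trace like this takes some care: the i386 kernel appears to print every stack word that looks like an address inside the kernel or a loaded module, so not every entry is a real return address. As a rough sketch (the helper names parse_trace and likely_call_chain are hypothetical), the Python below extracts the address, symbol, module and offset from each entry in the format shown above and applies one heuristic: a genuine return address points just past a call instruction, so entries at offset 0x0 (ext3_readdir, ext3_dir_operations, nfs3svc_encode_entry_plus, nfsd) are assumed to be function or table pointers left on the stack. This is an assumption about how to read the trace, not a statement of the eventual fix.

```python
import re
from typing import List, NamedTuple

# One 2.4-style trace entry, e.g.
#   [<c0107f7a>] __down [kernel] 0x6a (0xf6afbde4))
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<sym>\S+)\s+"
    r"\[(?P<mod>[^\]]+)\]\s+0x(?P<off>[0-9a-f]+)"
)


class Frame(NamedTuple):
    addr: int     # raw stack word that looked like a text address
    symbol: str   # symbol the kernel resolved that address to
    module: str   # "kernel" or the module name
    offset: int   # distance from the start of the symbol


def parse_trace(text: str) -> List[Frame]:
    """Return every trace entry found in a block of dmesg text, in printed order."""
    return [
        Frame(int(m["addr"], 16), m["sym"], m["mod"], int(m["off"], 16))
        for m in FRAME_RE.finditer(text)
    ]


def likely_call_chain(frames: List[Frame]) -> List[str]:
    """Rough heuristic: a return address points past a call instruction, so an
    entry at offset 0x0 is treated as a pointer argument and skipped. Symbols
    already listed are also skipped."""
    chain: List[str] = []
    for f in frames:
        if f.offset == 0:
            continue
        name = f"{f.module}:{f.symbol}"
        if name not in chain:
            chain.append(name)
    return chain


# Usage on the first few entries of the nfsd trace above.
sample = """\
Call Trace: [<c0107f7a>] __down [kernel] 0x6a (0xf6afbde4))
[<c01080d4>] __down_failed [kernel] 0x8 (0xf6afbe08))
[<f8824aa0>] ext3_readdir [ext3] 0x0 (0xf6afbe10))
[<c014fdce>] .text.lock.readdir [kernel] 0x5 (0xf6afbe18))
[<f89997e3>] nfsd_readdir [nfsd] 0xc3 (0xf6afbe38))
"""
for name in likely_call_chain(parse_trace(sample)):
    print(name)
```

On this sample the loop prints kernel:__down, kernel:__down_failed, kernel:.text.lock.readdir and nfsd:nfsd_readdir, one per line. Read innermost first, the full trace suggests the nfsd thread was serving a READDIRPLUS request (nfsd3_proc_readdirplus, then nfsd_readdir) and went to sleep on a semaphore in __down via the out-of-line lock path of the kernel's readdir code (.text.lock.readdir). The heuristic is imperfect: nfsd_procedures3 0x264 points into nfsd's procedure table, yet its non-zero offset lets it through the filter.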

Comment 5 Jeremy Sanders 2002-08-09 06:44:32 EDT
Created attachment 69705
most of traces on system
Comment 6 Jeremy Sanders 2002-08-09 06:51:46 EDT
Created attachment 69706
tcpdump -vv -s0 on traffic when trying to mount export
Comment 7 Jeremy Sanders 2002-08-09 06:54:09 EDT
The above tcpdump output is from trying to mount the export on a 2.4.18-5 system
(server running Rawhide), captured with tcpdump -vv -s0. This time the mount
command fails with:

[root@xpc1 jss]# mount -t nfs xserv1.ast.cam.ac.uk:/soft3 /mnt/nfs
mount: RPC: Timed out
Comment 8 Jeremy Sanders 2002-08-12 04:46:17 EDT
As the trace suggests, this looks like an ext3/NFS interaction bug. I remounted
the exported partition as ext2, restarted the NFS daemon, and it worked fine.
Remounting it as ext3 provoked the bug again.
Comment 9 Arjan van de Ven 2002-08-13 05:31:51 EDT
Reproduced; workaround found, now for the real bug fix.
Comment 10 Jeremy Sanders 2002-08-15 10:09:12 EDT
Workaround or fixes in kernel-2.4.18-10.98 solve the problem here.
