Red Hat Bugzilla – Bug 70561
[NFS] Client and server hang with NFS accesses
Last modified: 2015-01-04 17:01:52 EST
Description of problem:
Using the Red Hat 7.3 2.4.18-5 kernel on an i686 client, and mounting from a
server running Rawhide 2.4.18-7.80 on an i686, NFS accesses lock up the NFS
daemon on the server and the process accessing the NFS mount on the client. The
NFS server processes get stuck in the "D" state (uninterruptible sleep), and so
does the client process. The server cannot be shut down cleanly because the
nfsd processes are stuck and cannot be killed by the shutdown scripts, which is
why this bug is marked "severe".
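The "D" state is what Linux reports in field 3 of /proc/<pid>/stat for a process in uninterruptible sleep; such processes ignore signals, which is why the shutdown scripts cannot kill the nfsd threads. As a minimal sketch, assuming a Linux /proc filesystem and Python 3.9+, the stuck processes can be listed like this. The helper names `parse_stat` and `d_state_processes` are hypothetical, not part of any existing tool.

```python
import os


def parse_stat(line: str) -> tuple[int, str, str]:
    """Return (pid, comm, state) from one /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the *last* ')' instead of on
    whitespace.
    """
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren].strip())
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 2]  # state follows ") " directly
    return pid, comm, state


def d_state_processes(proc_root: str = "/proc") -> list[tuple[int, str]]:
    """List (pid, comm) for every process in uninterruptible sleep ("D")."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # process exited between listdir() and open()
        if state == "D":
            stuck.append((pid, comm))
    return stuck
```

For example, `for pid, comm in d_state_processes(): print(pid, comm)` on the affected server would be expected to list the hung nfsd threads.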
Using 2.4.18-5 on the server works fine, as does running a standard 2.4.19-rc2.
We also tried 2.4.18-7.86 on the server, but this failed to install (depmod
Both machines use eepro100 cards on 100 Mbps Ethernet.
Mount options are:
(rsize,wsize=8192 was also tried)
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Run 2.4.18-7.80 on a machine exporting NFS mounts
2. Mount export on client running 2.4.18-5
3. ls /mnt/nfs
Actual Results: Client process hangs. Server processes hang.
Expected Results: The client should be able to access the NFS export.
This is also broken with Red Hat 2.4.18-7.93 (modutils had to be updated to install it).
Sorry for the extra email, but this is also broken with 2.4.18-7.94, which
includes most of the NFS patch set.
Could you obtain a backtrace of the stuck processes via Ctrl-Scroll Lock on the
console? That will give us an idea where the processes are getting stuck to
help track down the problem.
Okay, I'll attach the trace I managed to grab from dmesg after pressing
Ctrl+Scroll Lock. Basically, the stuck nfsd processes look like this:
nfsd D F68108E0 5976 1141 1 1140 1142 (L-TLB)
Call Trace: [<c0107f7a>] __down [kernel] 0x6a (0xf6afbde4))
[<c01080d4>] __down_failed [kernel] 0x8 (0xf6afbe08))
[<f8824aa0>] ext3_readdir [ext3] 0x0 (0xf6afbe10))
[<c014fdce>] .text.lock.readdir [kernel] 0x5 (0xf6afbe18))
[<f89997e3>] nfsd_readdir [nfsd] 0xc3 (0xf6afbe38))
[<f89a1070>] nfs3svc_encode_entry_plus [nfsd] 0x0 (0xf6afbe40))
[<f8837ca0>] ext3_dir_operations [ext3] 0x0 (0xf6afbe80))
[<f899ee6e>] nfsd3_proc_readdirplus [nfsd] 0xde (0xf6afbef0))
[<f89a1070>] nfs3svc_encode_entry_plus [nfsd] 0x0 (0xf6afbf04))
[<f89a61c4>] nfsd_procedures3 [nfsd] 0x264 (0xf6afbf24))
[<f89935c0>] nfsd_dispatch [nfsd] 0xd0 (0xf6afbf30))
[<f89a5898>] nfsd_version3 [nfsd] 0x0 (0xf6afbf44))
[<f89754cc>] svc_process_R6eda96b1 [sunrpc] 0x43c (0xf6afbf50))
[<f89a61c4>] nfsd_procedures3 [nfsd] 0x264 (0xf6afbf78))
[<f89a58b8>] nfsd_program [nfsd] 0x0 (0xf6afbf7c))
[<f89933b0>] nfsd [nfsd] 0x1d0 (0xf6afbf98))
[<c010765e>] kernel_thread [kernel] 0x2e (0xf6afbff0))
[<f89931e0>] nfsd [nfsd] 0x0 (0xf6afbff8))
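Each entry in the trace above has the form `[<address>] symbol [module] 0xoffset (stack address)`, innermost call first. A minimal sketch for pulling out the symbol, module and offset so the chain can be read at a glance follows; the regex is inferred only from the lines in this report, and the helper names are hypothetical. Dropping entries at offset 0x0 is a rough heuristic: a stack value equal to a symbol's start (for example `ext3_dir_operations`, which is a data structure) is usually a pointer left on the stack rather than a return address.

```python
import re

# Matches lines like:
#   [<c0107f7a>] __down [kernel] 0x6a (0xf6afbde4))
# Format inferred from the trace in this report; other kernels print differently.
TRACE_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<symbol>\S+)\s+"
    r"\[(?P<module>[^\]]+)\]\s+(?P<offset>0x[0-9a-f]+)"
)


def parse_trace(text: str) -> list[tuple[str, str, int]]:
    """Return (symbol, module, offset) for each trace entry, innermost first."""
    return [
        (m.group("symbol"), m.group("module"), int(m.group("offset"), 16))
        for m in TRACE_RE.finditer(text)
    ]


def likely_frames(entries: list[tuple[str, str, int]]) -> list[tuple[str, str, int]]:
    """Heuristically drop offset-0x0 entries (stale function/data pointers)."""
    return [e for e in entries if e[2] != 0]


def modules_involved(entries: list[tuple[str, str, int]]) -> list[str]:
    """Distinct modules in first-seen order."""
    seen: list[str] = []
    for _, module, _ in entries:
        if module not in seen:
            seen.append(module)
    return seen
```

Applied to the nfsd trace, the remaining frames read as `nfsd` → `nfsd_dispatch` → `nfsd3_proc_readdirplus` → `nfsd_readdir` → `.text.lock.readdir` → `__down`, which places the thread asleep on a semaphore in the readdir path.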
Created attachment 69705: most of the traces on the system
Created attachment 69706: tcpdump -vv -s0 of the traffic when trying to mount the export
The tcpdump output above was captured with tcpdump -vv -s0 while trying to mount
the export on a 2.4.18-5 client (server running Rawhide). This time the mount
command fails with:
[root@xpc1 jss]# mount -t nfs xserv1.ast.cam.ac.uk:/soft3 /mnt/nfs
mount: RPC: Timed out
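As the trace suggests (the nfsd threads are sleeping in __down, waiting on a
semaphore, under nfsd_readdir on an ext3 directory), this looks like an
ext3/NFS interaction bug. I remounted the exported partition as ext2, restarted
the NFS daemon, and it worked fine. Remounting it as ext3 provoked the bug again.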
Reproduced; workaround found, now for the real bug fix.
The workaround or fixes in kernel-2.4.18-10.98 solve the problem here.