70561 – [NFS] Client and server hang with NFS accesses

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Raw Hide
Classification:	Retired
Component:	kernel
Sub Component:
Version:	1.0
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Dave Jones
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2002-08-02 15:01 UTC by Jeremy Sanders
Modified:	2015-01-04 22:01 UTC
CC List:	4 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2004-10-30 03:32:47 UTC
Embargoed:

Attachments
most of traces on system (15.12 KB, text/plain) 2002-08-09 10:44 UTC, Jeremy Sanders
tcpdump -vv -s0 on traffic when trying to mount export (3.13 KB, text/plain) 2002-08-09 10:51 UTC, Jeremy Sanders

Description Jeremy Sanders 2002-08-02 15:01:04 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1b) Gecko/20020722

Description of problem:
Using the Red Hat 7.3 2.4.18-5 kernel on an i686 client, and mounting from a
server running Raw Hide 2.4.18-7.80 on an i686, NFS accesses lock up the NFS
daemon on the server and the process accessing the NFS mount on the client. The
NFS server processes get stuck in the "D" state, and so does the client process.
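
For anyone trying to confirm the same symptom, the following is a minimal sketch (not part of the original report) of listing processes in the "D" state by reading /proc/<pid>/stat. The function names parse_stat and blocked_processes are made up for illustration.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = stat_text.index("(")
    right = stat_text.rindex(")")
    pid = int(stat_text[:left])
    comm = stat_text[left + 1:right]
    state = stat_text[right + 1:].split()[0]
    return pid, comm, state


def blocked_processes(proc_root="/proc"):
    """List (pid, comm) for every process whose state field is 'D'."""
    blocked = []
    if not os.path.isdir(proc_root):
        return blocked
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited or the entry was unreadable
        if state == "D":
            blocked.append((pid, comm))
    return blocked


if __name__ == "__main__":
    for pid, comm in blocked_processes():
        print(pid, comm)
```

On the server described here, one would expect the stuck nfsd threads to show up in this list; on the client, the process touching the mount.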

The server cannot be shut down properly as the nfsd processes get stuck and
can't be killed by the shutdown scripts. Therefore the bug is marked as "severe".

Using 2.4.18-5 on the server works fine, as does running standard 2.4.19-rc2
(Linus kernel).

We also tried 2.4.18-7.86 on the server, but this failed to install (depmod
dependency problems).

Both machines use eepro100 cards on 100 Mbps Ethernet.
Mount options are:

rw,rsize=4096,wsize=4096,hard,intr
(also tried rsize=8192,wsize=8192)

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Run 2.4.18-7.80 on a machine exporting nfs mounts
2. Mount export on client running 2.4.18-5
3. ls /mnt/nfs
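
The client-side steps can be sketched as a small script. This is only an illustration under assumptions: the export name "server:/export" is a placeholder, the helper names are hypothetical, and the mount options are the ones quoted in the description. It defaults to a dry run that only prints the commands, because on an affected setup the ls is expected to hang on a hard mount.

```python
import subprocess


def build_mount_argv(export, mountpoint, rsize=4096, wsize=4096):
    """Build the mount command line using the options from the report."""
    options = "rw,rsize=%d,wsize=%d,hard,intr" % (rsize, wsize)
    return ["mount", "-t", "nfs", "-o", options, export, mountpoint]


def reproduce(export, mountpoint, dry_run=True):
    """Mount the export and list it; with dry_run=True only print the commands."""
    commands = [build_mount_argv(export, mountpoint), ["ls", mountpoint]]
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.call(argv)  # needs root for mount; may block on a hung server
    return commands


if __name__ == "__main__":
    # "server:/export" is a placeholder, not a name taken from the report.
    reproduce("server:/export", "/mnt/nfs", dry_run=True)
```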
	

Actual Results:  Client process hangs. Server processes hang.

Expected Results:  Client should be able to access nfs export.

Additional info:

Comment 1 Jeremy Sanders 2002-08-08 14:12:41 UTC

This is also broken with Red Hat 2.4.18-7.93 (updated modutils to install this).

Comment 2 Jeremy Sanders 2002-08-08 15:07:23 UTC

Sorry for more email, but this is also broken with 2.4.18-7.94, which has most
of the nfs patchset in.

Comment 3 Ben LaHaise 2002-08-08 18:16:23 UTC

Could you obtain a backtrace of the stuck processes via Ctrl-Scroll Lock on the
console?  That will give us an idea where the processes are getting stuck to
help track down the problem.

Comment 4 Jeremy Sanders 2002-08-09 10:43:17 UTC

Okay, I'll attach the trace I managed to grab from dmesg after pressing
Ctrl-Scroll Lock. Basically, the stuck nfsd processes look like this:

nfsd          D F68108E0  5976  1141      1          1140  1142 (L-TLB)
Call Trace: [<c0107f7a>] __down [kernel] 0x6a (0xf6afbde4))
[<c01080d4>] __down_failed [kernel] 0x8 (0xf6afbe08))
[<f8824aa0>] ext3_readdir [ext3] 0x0 (0xf6afbe10))
[<c014fdce>] .text.lock.readdir [kernel] 0x5 (0xf6afbe18))
[<f89997e3>] nfsd_readdir [nfsd] 0xc3 (0xf6afbe38))
[<f89a1070>] nfs3svc_encode_entry_plus [nfsd] 0x0 (0xf6afbe40))
[<f8837ca0>] ext3_dir_operations [ext3] 0x0 (0xf6afbe80))
[<f899ee6e>] nfsd3_proc_readdirplus [nfsd] 0xde (0xf6afbef0))
[<f89a1070>] nfs3svc_encode_entry_plus [nfsd] 0x0 (0xf6afbf04))
[<f89a61c4>] nfsd_procedures3 [nfsd] 0x264 (0xf6afbf24))
[<f89935c0>] nfsd_dispatch [nfsd] 0xd0 (0xf6afbf30))
[<f89a5898>] nfsd_version3 [nfsd] 0x0 (0xf6afbf44))
[<f89754cc>] svc_process_R6eda96b1 [sunrpc] 0x43c (0xf6afbf50))
[<f89a61c4>] nfsd_procedures3 [nfsd] 0x264 (0xf6afbf78))
[<f89a58b8>] nfsd_program [nfsd] 0x0 (0xf6afbf7c))
[<f89933b0>] nfsd [nfsd] 0x1d0 (0xf6afbf98))
[<c010765e>] kernel_thread [kernel] 0x2e (0xf6afbff0))
[<f89931e0>] nfsd [nfsd] 0x0 (0xf6afbff8))
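
To see at a glance which modules appear in a trace like this, the lines can be split into address, symbol, module and offset. The following is a rough sketch, assuming the line format shown above; the names FRAME_RE, parse_frames and modules_in_order are made up for illustration.

```python
import re

# Matches entries such as: [<c0107f7a>] __down [kernel] 0x6a (0xf6afbde4))
FRAME_RE = re.compile(r"\[<([0-9a-f]+)>\]\s+(\S+)\s+\[(\w+)\]\s+0x([0-9a-f]+)")


def parse_frames(trace_text):
    """Return a list of (address, symbol, module, offset) tuples."""
    return [
        (int(addr, 16), symbol, module, int(offset, 16))
        for addr, symbol, module, offset in FRAME_RE.findall(trace_text)
    ]


def modules_in_order(frames):
    """Return module names in order of first appearance, innermost first."""
    seen = []
    for _, _, module, _ in frames:
        if module not in seen:
            seen.append(module)
    return seen


if __name__ == "__main__":
    sample = """Call Trace: [<c0107f7a>] __down [kernel] 0x6a (0xf6afbde4))
[<c01080d4>] __down_failed [kernel] 0x8 (0xf6afbe08))
[<f8824aa0>] ext3_readdir [ext3] 0x0 (0xf6afbe10))
[<c014fdce>] .text.lock.readdir [kernel] 0x5 (0xf6afbe18))
[<f89997e3>] nfsd_readdir [nfsd] 0xc3 (0xf6afbe38))"""
    for frame in parse_frames(sample):
        print(frame)
    print(modules_in_order(parse_frames(sample)))
```

Applied to the trace above, this shows the thread sleeping in __down with ext3_readdir and nfsd_readdir directly below it.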

Comment 5 Jeremy Sanders 2002-08-09 10:44:32 UTC

Created attachment 69705
most of traces on system

Comment 6 Jeremy Sanders 2002-08-09 10:51:46 UTC

Created attachment 69706
tcpdump -vv -s0 on traffic when trying to mount export

Comment 7 Jeremy Sanders 2002-08-09 10:54:09 UTC

The above tcpdump output is from trying to mount the export on a 2.4.18-5 system
(server running Raw Hide), using tcpdump -vv -s0. The mount command fails this time with:

[root@xpc1 jss]# mount -t nfs xserv1.ast.cam.ac.uk:/soft3 /mnt/nfs
mount: RPC: Timed out

Comment 8 Jeremy Sanders 2002-08-12 08:46:17 UTC

As the trace suggests, this looks like an ext3-NFS interaction bug. I remounted
the exported partition as ext2, restarted the NFS daemon, and it worked fine.
Remounting as ext3 provoked the bug again.
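
When repeating this comparison, it helps to confirm which filesystem type the exported partition is actually mounted with. The following is a minimal sketch that reads /proc/mounts; the helper names are hypothetical, and the device /dev/hda3 in the example is an assumption, not taken from this report.

```python
def parse_mounts(mounts_text):
    """Map each mount point to its filesystem type from /proc/mounts text."""
    table = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        _device, mountpoint, fstype = fields[:3]
        table[mountpoint] = fstype  # a later mount on the same point wins
    return table


def fstype_of(mountpoint, mounts_path="/proc/mounts"):
    """Return the filesystem type mounted at mountpoint, or None if not mounted."""
    with open(mounts_path) as f:
        return parse_mounts(f.read()).get(mountpoint)


if __name__ == "__main__":
    # Hypothetical example line; the device name is an assumption.
    example = "/dev/hda3 /soft3 ext3 rw 0 0\n"
    print(parse_mounts(example))
```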

Comment 9 Arjan van de Ven 2002-08-13 09:31:51 UTC

Reproduced; workaround found, now for the real bugfix.

Comment 10 Jeremy Sanders 2002-08-15 14:09:12 UTC

Workaround or fixes in kernel-2.4.18-10.98 solve the problem here.
