Red Hat Bugzilla – Bug 168897
NFS clients hang with a RHEL3 U5 nfs server
Last modified: 2007-11-30 17:07:08 EST
Description of problem:
NFS clients (I tried Solaris 5.8 and Red Hat EL3 U4) hang
during performance tests with iozone; iozone never finishes.
The NFS server is running Red Hat EL3 U5.
Everything worked fine when the server ran EL3 U4.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Mount NFS with an rsize/wsize of 32 KB or 16 KB.
2. Run the iozone benchmark.
3. Wait a few hours.
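The steps above can be sketched as shell commands; the server name, export path, and mount point below are placeholders, not taken from this report:

```shell
# Mount the export with a large transfer size (32 KB shown; 16 KB also hangs).
# nfsserver:/export and /mnt/nfs are hypothetical names.
mount -t nfs -o rsize=32768,wsize=32768 nfsserver:/export /mnt/nfs

# Run the benchmark against the mount and wait a few hours.
cd /mnt/nfs
/opt/iozone/bin/iozone -a -i0 -i1 -i2 -s 16g -r 64k -f fileA
```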
Actual Results: iozone does not finish.
The 'df' command hangs and never returns.
vmstat shows the server CPU is 100% idle,
while the EL3 U4 client's CPU is 100% "wait".
tcpdump shows no traffic between the client and the server.
"rpcinfo -p" from the nfs client and from other machines sometime shows
nfs protocols(mountd, nfsd...), sometime not(only portmaper, ypbind, etc).
After I restart the portmapper by '/etc/init.d/portmap restart' on the nfs
server, it answers correctly.
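The check-and-restart sequence described above looks roughly like this (a hypothetical session; the server name is a placeholder):

```shell
# On a healthy server this lists portmapper, mountd, nfsd, etc.;
# during the hang it sometimes shows only portmapper, ypbind, ...
rpcinfo -p nfsserver

# Restarting the portmapper on the NFS server makes rpcinfo answer
# correctly again.
/etc/init.d/portmap restart
```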
The nfs server can be mounted from other machines, and you can see the files.
Expected Results: iozone finishes normally.
The iozone commands are something like:
/opt/iozone/bin/iozone -a -i0 -i1 -i2 -s 16g -r 64k -f fileA
/opt/iozone/bin/iozone -i0 -i1 -i2 -s 2g -r 64k -t 8 -F file1 file2 ... file8
The NFS mount options are:
In both cases below (32 KB and 16 KB), it hangs easily, within a few hours.
With the default 8192, I am not so sure. I once hit a hang, but it was not easy to reproduce.
I had changed NFSSVC_MAXBLKSIZE (include/linux/nfsd/const.h) to 32 KB.
The NFS server is a Red Hat EL3 U5 machine.
It is a single-CPU machine but runs the SMP kernel. I have not tried the
UP kernel, though I could.
The number of nfs daemons is set to 168.
Could you please post a SysRq-T system backtrace
of the server when this hang occurs?
Created attachment 119244 [details]
The processes whose names start with CL are clustering software.
xlan is something like the bonding driver. I am not sure
whether they are involved in the NFS hang, though they worked
fine with Red Hat EL3 U4.
It looks like some processes are sleeping in __alloc_pages.
I wonder how they were called from schedule().
How much memory is on this server? By increasing the max block size to 32K
and increasing the number of nfsd threads, you can easily run the machine out of
memory, since each nfsd will allocate 32K as a place for incoming messages...
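A rough back-of-envelope check of that concern; the 32K buffer size and the nfsd count are from this report, but the calculation itself is only illustrative:

```shell
# Each nfsd thread preallocates one receive buffer of the max block size.
NFSD_COUNT=168
BUF_KB=32
TOTAL_KB=$((NFSD_COUNT * BUF_KB))
echo "${TOTAL_KB} KB of nfsd receive buffers"   # 5376 KB
# The raw total is modest, but on a 32-bit kernel these buffers come out
# of lowmem, which is far smaller than total RAM, so many large
# allocations can still exhaust or fragment it.
```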
Please post the output of a SysRq-M, which will show the state of memory.
It has 2 gigabytes.
             total       used       free     shared    buffers     cached
Mem:       2055444     486108    1569336          0     146644      94768
-/+ buffers/cache:     244696    1810748
Swap:      2040244          0    2040244
Zone:DMA freepages: 2892 min: 0 low: 0 high: 0
Zone:Normal freepages: 1922 min: 1278 low: 4543 high: 6303
Zone:HighMem freepages: 1236 min: 255 low: 4606 high: 6909
Free pages: 6050 ( 1236 HighMem)
( Active: 23097/354529, inactive_laundry: 97873, inactive_clean: 8512, free: 6050 )
aa:0 ac:0 id:0 il:0 ic:0 fr:2892
aa:0 ac:14036 id:136504 il:36767 ic:4190 fr:1922
aa:3742 ac:5319 id:218025 il:61106 ic:4322 fr:1236
( 2*4kB 3*8kB 5*16kB 4*32kB 3*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 2*4096kB = 11568kB)
( 46*4kB 84*8kB 75*16kB 14*32kB 3*64kB 3*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 7688kB)
( 32*4kB 26*8kB 16*16kB 24*32kB 26*64kB 7*128kB 2*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 4944kB)
Swap cache: add 0, delete 0, find 0/0, race 0+0
20448 pages of slabcache
482 pages of kernel stacks
0 lowmem pagetables, 428 highmem pagetables
32 bounce buffer pages, 32 are on the emergency list
Free swap: 2040244kB
524272 pages of RAM
294896 pages of HIGHMEM
10411 reserved pages
384100 pages shared
0 pages swap cached
I think the problem was caused by the switching hub.
With a new hub, NFS works nicely.
Thank you, Steve. I appreciate your help.
I will use TCP rather than UDP with the 32 KB buffer, anyway.
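For reference, switching the mount to TCP with 32 KB transfers looks something like this; the server name and mount point are placeholders:

```shell
# proto=tcp avoids the UDP fragmentation and retransmission problems that
# large rsize/wsize transfers can hit through a flaky switch.
# nfsserver:/export and /mnt/nfs are hypothetical names.
mount -t nfs -o proto=tcp,rsize=32768,wsize=32768 nfsserver:/export /mnt/nfs
```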
Hello, Yoshihiro. Should this bug report be closed as NOTABUG?
Hi, Ernie. I am going to post the status change with this message.