Red Hat Bugzilla – Bug 168897
NFS clients hang with a RHEL3 U5 nfs server
Last modified: 2007-11-30 17:07:08 EST
Description of problem:
NFS clients (I tried Solaris 5.8 and Red Hat EL3 U4) hang
during performance tests with iozone; iozone never finishes.
The NFS server is running Red Hat EL3 U5.
Everything worked fine when the server ran EL3 U4.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Mount NFS with an rsize/wsize of 32 KB or 16 KB.
2. Run the iozone benchmark.
3. Wait a few hours.
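The steps above can be sketched as shell commands; the server name, export path, and mount point below are placeholders, not taken from this report:

```shell
# Mount the export with a large transfer size (32 KB shown; 16 KB also hangs).
# nfsserver:/export and /mnt/nfs are hypothetical names.
mount -t nfs -o rsize=32768,wsize=32768 nfsserver:/export /mnt/nfs

# Run the benchmark against the mount and wait a few hours.
cd /mnt/nfs
/opt/iozone/bin/iozone -a -i0 -i1 -i2 -s 16g -r 64k -f fileA
```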
Actual Results: iozone does not finish.
The 'df' command hangs and never returns.
vmstat shows the server CPU is 100% idle,
while the EL3 U4 client's CPU is 100% "wait".
tcpdump shows no traffic between the client and the server.
"rpcinfo -p" from the nfs client and from other machines sometime shows
nfs protocols(mountd, nfsd...), sometime not(only portmaper, ypbind, etc).
After I restart the portmapper by '/etc/init.d/portmap restart' on the nfs
server, it answers correctly.
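The check-and-restart sequence described above looks roughly like this (a hypothetical session; the server name is a placeholder):

```shell
# On a healthy server this lists portmapper, mountd, nfsd, etc.;
# during the hang it sometimes shows only portmapper, ypbind, ...
rpcinfo -p nfsserver

# Restarting the portmapper on the NFS server makes rpcinfo answer
# correctly again.
/etc/init.d/portmap restart
```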
The nfs server can be mounted from other machines, and you can see the files.
Expected Results: iozone finishes normally.
The iozone commands are something like:
/opt/iozone/bin/iozone -a -i0 -i1 -i2 -s 16g -r 64k -f fileA
/opt/iozone/bin/iozone -i0 -i1 -i2 -s 2g -r 64k -t 8 -F file1 file2 ... file8
The NFS mount options are:
In both cases below (32 KB and 16 KB), it hangs easily, within a few hours.
With the default 8192, I am not so sure. I once hit a hang, but it was not easy to reproduce.
I had changed NFSSVC_MAXBLKSIZE (include/linux/nfsd/const.h) to 32 KB.
The NFS server is a Red Hat EL3 U5 machine.
It is a single-CPU machine but runs the SMP kernel. I have not tried the
UP kernel, though I could.
The number of nfs daemons is set to 168.
Could you please post a SysRq-T system backtrace
of the server when this hang occurs?
Created attachment 119244 [details]
The processes whose names start with CL are clustering software.
xlan is something like the bonding driver. I am not sure
whether they are involved in the NFS hang, though they worked
fine with Red Hat EL3 U4.
It looks like some processes are sleeping in __alloc_pages.
I wonder how they were called from schedule().
How much memory is on this server? By increasing the max block size to 32K
and increasing the number of nfsd threads, you can easily run the machine out of
memory, since each nfsd will allocate 32K as a place for incoming messages...
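A rough back-of-envelope check of that concern; the 32K buffer size and the nfsd count are from this report, but the calculation itself is only illustrative:

```shell
# Each nfsd thread preallocates one receive buffer of the max block size.
NFSD_COUNT=168
BUF_KB=32
TOTAL_KB=$((NFSD_COUNT * BUF_KB))
echo "${TOTAL_KB} KB of nfsd receive buffers"   # 5376 KB
# The raw total is modest, but on a 32-bit kernel these buffers come out
# of lowmem, which is far smaller than total RAM, so many large
# allocations can still exhaust or fragment it.
```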
Please post the output of a SysRq-M, which will show the state of memory.
It has 2 gigabytes.
             total       used       free     shared    buffers     cached
Mem:       2055444     486108    1569336          0     146644      94768
-/+ buffers/cache:     244696    1810748
Swap:      2040244          0    2040244
Zone:DMA freepages: 2892 min: 0 low: 0 high: 0
Zone:Normal freepages: 1922 min: 1278 low: 4543 high: 6303
Zone:HighMem freepages: 1236 min: 255 low: 4606 high: 6909
Free pages: 6050 ( 1236 HighMem)
( Active: 23097/354529, inactive_laundry: 97873, inactive_clean: 8512, free: 6050 )
aa:0 ac:0 id:0 il:0 ic:0 fr:2892
aa:0 ac:14036 id:136504 il:36767 ic:4190 fr:1922
aa:3742 ac:5319 id:218025 il:61106 ic:4322 fr:1236
( 2*4kB 3*8kB 5*16kB 4*32kB 3*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 2*4096kB = 11568kB)
( 46*4kB 84*8kB 75*16kB 14*32kB 3*64kB 3*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 7688kB)
( 32*4kB 26*8kB 16*16kB 24*32kB 26*64kB 7*128kB 2*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 4944kB)
Swap cache: add 0, delete 0, find 0/0, race 0+0
20448 pages of slabcache
482 pages of kernel stacks
0 lowmem pagetables, 428 highmem pagetables
32 bounce buffer pages, 32 are on the emergency list
Free swap: 2040244kB
524272 pages of RAM
294896 pages of HIGHMEM
10411 reserved pages
384100 pages shared
0 pages swap cached
I think the problem was caused by the switching hub.
With a new hub, NFS works nicely.
Thank you, Steve. I appreciate your help.
I will use TCP rather than UDP with the 32 KB buffer, anyway.
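For reference, switching the mount to TCP with 32 KB transfers looks something like this; the server name and mount point are placeholders:

```shell
# proto=tcp avoids the UDP fragmentation and retransmission problems that
# large rsize/wsize transfers can hit through a flaky switch.
# nfsserver:/export and /mnt/nfs are hypothetical names.
mount -t nfs -o proto=tcp,rsize=32768,wsize=32768 nfsserver:/export /mnt/nfs
```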
Hello, Yoshihiro. Should this bug report be closed as NOTABUG?
Hi, Ernie. I am going to post the status change with this message.