I conducted iozone tests between 192.168.1.73 (running the gluster NFS
server) and 192.168.1.93 (running CentOS). There were some interesting
findings.

Once in a while, the CPU usage on the server went close to 100%, with
system time close to 60% and user time around 40%.

bash-3.00# mpstat 2
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw  syscl usr sys  wt idl
  0   11   0    0   153   98  163   22    0    1    0   7482   1  88   0  11
  0   16   0    0  1386  959 1405  663    0    7    0 341906  35  59   0   5
  0    0   0    0  1273  840 1285  615    0    6    0 355732  36  59   0   5
  0    0   0    0  1273  842 1341  640    0    6    0 347742  35  58   0   7
  0    1   0    0  1479  974 1330  284    0    3    0   5021   2   9   0  89
  0    0   0    0  1304  790  988   60    0    7    0   4116   2   7   0  91

A huge number of syscalls were being made during that time. The vast
majority of the syscalls recorded above are write calls, and these happen
to be writes to the socket:

  write
              libc.so.1`_write+0xa
              server.so.0.0.0`nfs_rpcsvc_socket_write+0x48
              server.so.0.0.0`__nfs_rpcsvc_conn_data_poll_out+0x11b
              server.so.0.0.0`nfs_rpcsvc_conn_data_poll_out+0x36
              server.so.0.0.0`nfs_rpcsvc_conn_data_handler+0x54
              libglusterfs.so.0.0.0`event_dispatch_poll_handler+0x1c9
              libglusterfs.so.0.0.0`event_dispatch_poll+0x11c
              libglusterfs.so.0.0.0`event_dispatch+0x82
              server.so.0.0.0`nfs_rpcsvc_stage_proc+0x32
              libc.so.1`_thr_setup+0x5b
              libc.so.1`_lwp_start
           666772

This means it is a read/re-read workload on the client that is causing the
flood of socket writes on the server. But netstat at the server:

bash-3.00# netstat -i -I e1000g0 2
    input   e1000g    output         input  (Total)    output
 packets errs  packets errs  colls   packets errs  packets errs  colls
16049592 0    12655957 0     0      16049746 0    12656111 0     0
    9023 0       17338 0     0          9023 0       17338 0     0
    9057 0       17241 0     0          9057 0       17241 0     0
    9064 0       17322 0     0          9064 0       17322 0     0
    9093 0       17258 0     0          9093 0       17258 0     0
    9016 0       17347 0     0          9016 0       17347 0     0
    9013 0       17305 0     0          9013 0       17305 0     0
    9066 0       17248 0     0          9066 0       17248 0     0
    9038 0       17320 0     0          9038 0       17320 0     0
    9061 0       17247 0     0          9061 0       17247 0     0
    8192 0       15702 0     0          8192 0       15702 0     0
    9025 0       17112 0     0          9025 0       17112 0     0

The number of packets is much, much lower than the number of writes. Why?
Because most of these writes are failing:

bash-3.00# dtrace -n 'syscall::write:return/execname=="glusterfsd"/{@[arg1, ustack()] = count();}'
dtrace: description 'syscall::write:return' matched 1 probe
...
...
       -1
              libc.so.1`_write+0xa
              server.so.0.0.0`nfs_rpcsvc_socket_write+0x48
              server.so.0.0.0`__nfs_rpcsvc_conn_data_poll_out+0x11b
              server.so.0.0.0`nfs_rpcsvc_conn_data_poll_out+0x36
              server.so.0.0.0`nfs_rpcsvc_conn_data_handler+0x54
              libglusterfs.so.0.0.0`event_dispatch_poll_handler+0x1c9
              libglusterfs.so.0.0.0`event_dispatch_poll+0x11c
              libglusterfs.so.0.0.0`event_dispatch+0x82
              server.so.0.0.0`nfs_rpcsvc_stage_proc+0x32
              libc.so.1`_thr_setup+0x5b
              libc.so.1`_lwp_start
          1733319

Call failures were seen across different write sizes. In the output below,
the first column is the size passed to the write syscall, the second column
is the write call's return value, and the third column is the count of
write calls with that size and return value.
bash-3.00# dtrace -n 'syscall::write:entry/execname=="glusterfsd"/{self->arg2 = arg2;}' -n 'syscall::write:return/self->arg2/{@[self->arg2, arg1] = count();}'
dtrace: description 'syscall::write:entry' matched 1 probe
dtrace: description 'syscall::write:return' matched 1 probe
^C
       88       88        1
    16384    16384        1
    49152    49152        1
    51056    51056      563
    65536    14480      563
    36576    36576      569
    65536    28960      569
    22096    22096      601
    65536    43440      601
     7616     7616      635
    65536    57920      636
    65536    65536     2293
      104      104     4664
       28       28     4665
       28       -1   254745
    36576       -1  1068406
    51056       -1  1073457
    22096       -1  1152251
     7616       -1  1209377

This time, also printing the errno (third column below):

bash-3.00# dtrace -n 'syscall::write:entry/execname=="glusterfsd"/{self->arg2 = arg2;}' -n 'syscall::write:return/self->arg2/{@[self->arg2, arg1, errno] = count();}'
dtrace: description 'syscall::write:entry' matched 1 probe
dtrace: description 'syscall::write:return' matched 1 probe
^C
    20192    20192    0        1
    49152    28960    0        1
        8        8    0        2
       16       16    0        2
       96       96    0        2
      256      256    0        2
       88       88    0        3
      120      120    0        3
    65536    43440    0      409
    22096    22096    0      410
    36576    36576    0      442
    65536    28960    0      442
     7616     7616    0      449
    51056    51056    0      449
    65536    14480    0      449
    65536    57920    0      449
    20192       -1   11     1771
    65536    65536    0     1789
      104      104    0     3539
       28       28    0     3551
       28       -1   11   158961
    22096       -1   11   741846
    36576       -1   11   814891
    51056       -1   11   823120
     7616       -1   11   845557
bash-3.00#

errno 11 is EAGAIN. The following piece of (pseudo) code in
[xlators/nfs/lib/src/rpcsvc.c] has the problem:

__nfs_rpcsvc_conn_data_poll_out()
{
tx_remaining:
        err = nfs_rpcsvc_socket_write(...);
        if (err == -1)
                break;            /* bail out on error */
        else
                goto tx_remaining;
}

nfs_rpcsvc_socket_write() returns 0, not -1, when the underlying write()
fails with EAGAIN. The error check therefore never fires, control jumps
back to tx_remaining, and the server spins in a never-ending syscall loop
that consumes CPU whenever the connection gets flow-controlled.
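For reference, the fix amounts to the standard non-blocking-socket
pattern: treat EAGAIN as "transmit buffer full, wait for the next POLLOUT
event" instead of retrying immediately. Below is a minimal sketch of that
pattern; it is not the actual GlusterFS code, and the conn_tx structure
and conn_flush_tx() name are made up for illustration:

#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-connection transmit state; the real rpcsvc.c
 * structures differ. */
struct conn_tx {
        int    sockfd;    /* non-blocking socket */
        char  *buf;       /* data still to be sent */
        size_t remaining;
};

/* Flush as much of the pending buffer as the socket will accept.
 * Returns 1 when fully flushed, 0 when the socket is full (caller
 * should wait for the next POLLOUT event), -1 on a real error. */
static int
conn_flush_tx(struct conn_tx *tx)
{
        while (tx->remaining > 0) {
                ssize_t n = write(tx->sockfd, tx->buf, tx->remaining);
                if (n > 0) {
                        tx->buf       += n;
                        tx->remaining -= (size_t)n;
                        continue;
                }
                if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
                        return 0;   /* flow-controlled: do NOT retry now */
                if (n == -1 && errno == EINTR)
                        continue;   /* interrupted: safe to retry */
                return -1;          /* genuine error: tear down connection */
        }
        return 1;                   /* buffer fully written */
}

With this shape, a flow-controlled connection costs one failed write() per
POLLOUT wakeup instead of millions of write() calls in a tight loop.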
PATCH: http://patches.gluster.com/patch/6247 in master (Eliminate syscall tight loop when handling EAGAIN in NFS.)
PATCH: http://patches.gluster.com/patch/6333 in master (Make sure we are looking at the right errno in the fix for bug 764184.)
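The second patch's title points at a classic C pitfall: errno is only
meaningful immediately after a failing call, and any function that runs in
between (logging, for instance) can overwrite it. The following sketch of
the safe pattern is an assumption based on the patch title, not the actual
change; log_error() is a hypothetical stand-in for whatever runs between
the write() and the errno check:

#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical logging helper; anything that makes syscalls here can
 * clobber errno. */
static void
log_error(const char *msg)
{
        fprintf(stderr, "rpcsvc: %s\n", msg);
}

static ssize_t
send_chunk(int sockfd, const void *buf, size_t len)
{
        ssize_t ret = write(sockfd, buf, len);

        if (ret == -1) {
                int saved = errno;  /* capture before calling anything else */

                log_error("write failed");

                if (saved == EAGAIN)  /* testing errno here instead could
                                       * examine the wrong value */
                        return 0;
        }
        return ret;
}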
The fix for this bug exposed another bug in the NFS server; refer to bug
2491. Shehjar has included this fix in 2491. Once that change is integrated
into the gate, we can close this bug as a duplicate of the former, or have
both bugs mentioned in the description of the commit.
*** This bug has been marked as a duplicate of bug 2491 ***