Bug 764184 (GLUSTER-2452) - Excessive CPU usage /very low throughput while using NFS mounts
Summary: Excessive CPU usage /very low throughput while using NFS mounts
Keywords:
Status: CLOSED DUPLICATE of bug 764223
Alias: GLUSTER-2452
Product: GlusterFS
Classification: Community
Component: nfs
Version: mainline
Hardware: All
OS: All
medium
medium
Target Milestone: ---
Assignee: tcp
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2011-02-23 10:56 UTC by tcp
Modified: 2015-12-01 16:45 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: RTP
Mount Type: nfs
Documentation: DNR
CRM:
Verified Versions:
Embargoed:


Attachments (Terms of Use)

Description tcp 2011-02-23 10:56:27 UTC
I conducted iozone tests between 192.168.1.73 (running the gluster NFS server) and 192.168.1.93 (running CentOS). There were some interesting findings:

Once in a while, the CPU usage on the server went close to a 100% with the systime close to 60% and the user time ~40%.

bash-3.00# mpstat 2
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0   11   0    0   153   98  163   22    0    1    0  7482    1  88 0  11
  0   16   0    0  1386  959 1405  663    0    7    0 341906   35  59 0   5
  0    0   0    0  1273  840 1285  615    0    6    0 355732   36  59 0   5
  0    0   0    0  1273  842 1341  640    0    6    0 347742   35  58 0   7
  0    1   0    0  1479  974 1330  284    0    3    0  5021    2   9 0  89
  0    0   0    0  1304  790  988   60    0    7    0  4116    2   7 0  91

A huge number of syscalls were being made during that time. The vast majority of the syscalls recorded above are write calls.

And these happens to be writes to the socket -
  write
              libc.so.1`_write+0xa
              server.so.0.0.0`nfs_rpcsvc_socket_write+0x48
              server.so.0.0.0`__nfs_rpcsvc_conn_data_poll_out+0x11b
              server.so.0.0.0`nfs_rpcsvc_conn_data_poll_out+0x36
              server.so.0.0.0`nfs_rpcsvc_conn_data_handler+0x54
              libglusterfs.so.0.0.0`event_dispatch_poll_handler+0x1c9
              libglusterfs.so.0.0.0`event_dispatch_poll+0x11c
              libglusterfs.so.0.0.0`event_dispatch+0x82
              server.so.0.0.0`nfs_rpcsvc_stage_proc+0x32
              libc.so.1`_thr_setup+0x5b
              libc.so.1`_lwp_start
           666772

which means it is a read/re-read on the client that is causing the huge bunch of socket writes on the server. But,

Netstat at server:

bash-3.00# netstat -i -I e1000g0 2
    input   e1000g    output       input  (Total)    output
packets errs  packets errs  colls  packets errs  packets errs  colls
16049592 0     12655957 0     0      16049746 0     12656111 0     0
9023    0     17338   0     0      9023    0     17338   0     0
9057    0     17241   0     0      9057    0     17241   0     0
9064    0     17322   0     0      9064    0     17322   0     0
9093    0     17258   0     0      9093    0     17258   0     0
9016    0     17347   0     0      9016    0     17347   0     0
9013    0     17305   0     0      9013    0     17305   0     0
9066    0     17248   0     0      9066    0     17248   0     0
9038    0     17320   0     0      9038    0     17320   0     0
9061    0     17247   0     0      9061    0     17247   0     0
8192    0     15702   0     0      8192    0     15702   0     0
9025    0     17112   0     0      9025    0     17112   0     0

The number of packets are much much lower compared to the writes.
Why?

Because most of these writes are failing:

bash-3.00# dtrace -n 'syscall::write:return/execname=="glusterfsd"/{@[arg1, ustack()] = count();}'
dtrace: description 'syscall::write:return' matched 1 probe
...
...
               -1
              libc.so.1`_write+0xa
              server.so.0.0.0`nfs_rpcsvc_socket_write+0x48
              server.so.0.0.0`__nfs_rpcsvc_conn_data_poll_out+0x11b
              server.so.0.0.0`nfs_rpcsvc_conn_data_poll_out+0x36
              server.so.0.0.0`nfs_rpcsvc_conn_data_handler+0x54
              libglusterfs.so.0.0.0`event_dispatch_poll_handler+0x1c9
              libglusterfs.so.0.0.0`event_dispatch_poll+0x11c
              libglusterfs.so.0.0.0`event_dispatch+0x82
              server.so.0.0.0`nfs_rpcsvc_stage_proc+0x32
              libc.so.1`_thr_setup+0x5b
              libc.so.1`_lwp_start
          1733319


Call failures seen on different write sizes -

The first column is the size passed to the write syscall. The second column is the write call return value, and the third column is the count of number of write calls with a given size and the corresponding return value.

bash-3.00# dtrace -n 'syscall::write:entry/execname=="glusterfsd"/{self->arg2 = arg2;}' -n 'syscall::write:return/self->arg2/{@[self->arg2, arg1] = count();}'
dtrace: description 'syscall::write:entry' matched 1 probe
dtrace: description 'syscall::write:return' matched 1 probe
^C

               88               88                1
            16384            16384                1
            49152            49152                1
            51056            51056              563
            65536            14480              563
            36576            36576              569
            65536            28960              569
            22096            22096              601
            65536            43440              601
             7616             7616              635
            65536            57920              636
            65536            65536             2293
              104              104             4664
               28               28             4665
               28               -1           254745
            36576               -1          1068406
            51056               -1          1073457
            22096               -1          1152251
             7616               -1          1209377


This time - printing the ERRNO (Column 3 below):

bash-3.00# dtrace -n 'syscall::write:entry/execname=="glusterfsd"/{self->arg2 = arg2;}' -n 'syscall::write:return/self->arg2/{@[self->arg2, arg1, errno] = count();}'
dtrace: description 'syscall::write:entry' matched 1 probe
dtrace: description 'syscall::write:return' matched 1 probe
^C

            20192            20192        0                1
            49152            28960        0                1
                8                8        0                2
               16               16        0                2
               96               96        0                2
              256              256        0                2
               88               88        0                3
              120              120        0                3
            65536            43440        0              409
            22096            22096        0              410
            36576            36576        0              442
            65536            28960        0              442
             7616             7616        0              449
            51056            51056        0              449
            65536            14480        0              449
            65536            57920        0              449
            20192               -1       11             1771
            65536            65536        0             1789
              104              104        0             3539
               28               28        0             3551
               28               -1       11           158961
            22096               -1       11           741846
            36576               -1       11           814891
            51056               -1       11           823120
             7616               -1       11           845557
bash-3.00#

11 is EAGAIN.

The following piece of (pseudo) code seems to have a problem:
[xlators/nfs/lib/src/rpcsvc.c]

__nfs_rpcsvc_conn_data_poll_out()
{
tx_remaining:
        err = nfs_rpcsvc_socket_write()
        if (err == -1)
            break;
        else
            goto tx_remaining;
}

nfs_rpcsvc_socket_write() returns 0 when write (called by the former) return EAGAIN. So, there is a never-ending syscall loop which consumes CPU whenever the system gets flow-controlled.

Comment 1 Anand Avati 2011-03-03 23:26:48 UTC
PATCH: http://patches.gluster.com/patch/6247 in master (Eliminate syscall tight loop when handling EAGAIN in NFS.)

Comment 2 Anand Avati 2011-03-04 04:30:13 UTC
PATCH: http://patches.gluster.com/patch/6333 in master (Make sure we are looking at the right errno in the fix for bug 764184.)

Comment 3 tcp 2011-03-16 07:29:53 UTC
Fix to this bug exposed another bug in the NFS server. Refer 2491.
Shehjar has included this fix in 2491. Once it gets integrated to the gate, we can close this bug as a duplicate of the former, or have both included in the description of the commit.

Comment 4 Amar Tumballi 2011-03-16 08:33:50 UTC

*** This bug has been marked as a duplicate of bug 2491 ***


Note You need to log in before you can comment on or make changes to this bug.