Bug 1166862 - rmtab file is a bottleneck when lot of clients are accessing a volume through NFS
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: nfs
Version: 3.5.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Niels de Vos
QA Contact:
URL:
Whiteboard:
Depends On: 1169317
Blocks: glusterfs-3.5.5
Reported: 2014-11-21 18:53 UTC by Cyril Peponnet
Modified: 2016-06-16 12:40 UTC

Fixed In Version: glusterfs-3.8rc2
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1169317 1169320
Environment:
Last Closed: 2016-06-16 12:40:24 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
Load average (113.56 KB, image/png)
2014-11-21 18:53 UTC, Cyril Peponnet
Disk usage (118.04 KB, image/png)
2014-11-21 18:53 UTC, Cyril Peponnet

Description Cyril Peponnet 2014-11-21 18:53:21 UTC
Created attachment 959917
Load average

Description of problem:

The feature introduced by http://review.gluster.org/#/c/4430/ creates a bottleneck when many clients access an NFS volume.

On our setup:

GlusterFS 3.5.2 on CentOS 7.

Hardware:

	dual Xeon® CPU E5-2640
	64GB RAM
	SSD for rootfs
	10Gb NIC

Context:

	Around 700 NFS clients, accessing small files or VM images.


Version-Release number of selected component (if applicable):

3.5.2

How reproducible:

Always, as long as there are enough NFS clients.

Steps to Reproduce:
1. Create a volume that is accessible through Gluster NFS.
2. Make it accessible to about 700 clients.
3. Observe the intermittent hangs.

Actual results:

NFS clients hang intermittently (every minute, for about 10 s each time). Even "rpcinfo -t server nfs 3" hangs.

The Gluster NFS process saturates the server's CPU.

Expected results:

No hanging

Additional info:

The cause:

The rmtab file located in /var/lib/glusterd/nfs/ is flushed from memory to /var/lib/glusterd/nfs/rmtab.tmp. While this flush is in progress, the NFS server hangs.
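To illustrate why such a flush can get expensive with many clients, here is a toy model in shell. It assumes that every change to the client list rewrites the complete table to a temporary file and then renames it into place; that is an assumption for illustration, not the GlusterFS implementation, and the paths, entry format and function name are hypothetical.

```shell
#!/bin/sh
# Toy model only: "rewrite the whole table to rmtab.tmp, then rename".
# Not GlusterFS code; the entry format and names are made up.
workdir=$(mktemp -d)
rmtab="$workdir/rmtab"
entries=""
lines_written=0

add_client() {
    # Add the new client to the in-memory list (one entry per line).
    entries="${entries}$1
"
    # Flush the complete list to a temporary file, then rename it in place.
    printf '%s' "$entries" > "$rmtab.tmp"
    mv "$rmtab.tmp" "$rmtab"
    # Record how many lines this single flush had to write.
    n=$(wc -l < "$rmtab" | tr -d ' ')
    lines_written=$((lines_written + n))
}

i=1
while [ "$i" -le 700 ]; do
    add_client "client-$i:/volume"
    i=$((i + 1))
done

final_entries=$(wc -l < "$rmtab" | tr -d ' ')
echo "entries in table: $final_entries"
echo "lines written in total: $lines_written"
rm -rf "$workdir"
```

In this model the amount of data written grows quadratically with the number of clients, which is one plausible reason a slow backing store makes each flush noticeable.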

Workaround:

Move the file into memory-backed storage (/dev/shm) for faster I/O by setting this option:

set nfs.mount-rmtab: /dev/shm/glusterfs.rmtab

Result:

We still see some hangs, but they now last only ~300 ms, and the load average of the server is much lower.
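For reference, a minimal sketch of how the workaround might be applied with the gluster CLI. The volume name "myvol" is a placeholder and does not come from this report. Keep in mind that files under /dev/shm do not survive a reboot.

```shell
#!/bin/sh
# Placeholder volume name (hypothetical); replace it with the real one.
VOLNAME=myvol
# Path the reporter used for the memory-backed rmtab.
RMTAB_PATH=/dev/shm/glusterfs.rmtab

if command -v gluster >/dev/null 2>&1; then
    # Point the rmtab of the Gluster NFS server at shared memory
    # instead of the default location under /var/lib/glusterd/nfs/.
    gluster volume set "$VOLNAME" nfs.mount-rmtab "$RMTAB_PATH"
else
    echo "gluster CLI not found; run this on a storage server" >&2
fi
```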

Personal thought:

This feature is not usable and should be disabled by default.

The attached graphs show the load average and disk usage before and after moving rmtab to SHM.

Comment 1 Cyril Peponnet 2014-11-21 18:53:50 UTC
Created attachment 959918
Disk usage

Comment 2 Anand Avati 2014-12-01 10:29:51 UTC
REVIEW: http://review.gluster.org/9223 (nfs: make it possible to disable nfs.mount-rmtab) posted (#1) for review on master by Niels de Vos (ndevos)

Comment 3 Niels de Vos 2014-12-01 10:34:37 UTC
Note that the proposed change in comment #2 is actually for the master branch, not for release-3.5. When it has been merged in the master branch, we can backport the fix to release-3.5 and release-3.6.

Comment 4 Anand Avati 2014-12-01 13:37:50 UTC
REVIEW: http://review.gluster.org/9223 (nfs: make it possible to disable nfs.mount-rmtab) posted (#3) for review on master by Niels de Vos (ndevos)

Comment 5 Anand Avati 2015-04-28 09:56:53 UTC
REVIEW: http://review.gluster.org/10419 (nfs: fix spurious failure in bug-1166862.t) posted (#1) for review on master by Niels de Vos (ndevos)

Comment 6 Anand Avati 2015-04-28 18:10:34 UTC
REVIEW: http://review.gluster.org/10419 (nfs: fix spurious failure in bug-1166862.t) posted (#2) for review on master by Niels de Vos (ndevos)

Comment 7 Anand Avati 2015-05-01 04:12:38 UTC
COMMIT: http://review.gluster.org/10419 committed in master by Vijay Bellur (vbellur) 
------
commit ee9b35a780607daddc2832b9af5ed6bf414aebc0
Author: Niels de Vos <ndevos>
Date:   Tue Apr 28 11:53:33 2015 +0200

    nfs: fix spurious failure in bug-1166862.t
    
    In some environments, "showmount" could return an NFS-client that does
    not start with "1". This would cause the test-case to fail. The check is
    incorrect, the number of lines should get counted instead.
    
    Also moving the test-case to the .../nfs/... subdirectory.
    
    BUG: 1166862
    Change-Id: Ic03aa8145ca57d78aea01564466e924b03bb302a
    Signed-off-by: Niels de Vos <ndevos>
    Reviewed-on: http://review.gluster.org/10419
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>
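The difference between the two checks described in this commit message can be sketched as follows. The sample client list is invented, and the two pipelines are only an approximation of what a test could do; they are not copied from bug-1166862.t.

```shell
#!/bin/sh
# Invented sample of a client list, one NFS client per line
# (the kind of list "showmount --no-headers <server>" prints).
sample_output='192.168.0.10
client.example.com'

# Fragile check: only counts clients whose name or address starts with "1".
fragile=$(printf '%s\n' "$sample_output" | grep -c '^1')

# Robust check: counts every line, whatever the client is called.
robust=$(printf '%s\n' "$sample_output" | wc -l | tr -d ' ')

echo "fragile=$fragile robust=$robust"
```

With a client that is listed by hostname, the pattern-based count misses it, while the line count still reports both clients.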

Comment 8 Anand Avati 2015-06-21 10:29:14 UTC
REVIEW: http://review.gluster.org/11336 (nfs: make it possible to disable nfs.mount-rmtab) posted (#1) for review on release-3.5 by Niels de Vos (ndevos)

Comment 9 Anand Avati 2015-06-21 13:10:27 UTC
REVIEW: http://review.gluster.org/11336 (nfs: make it possible to disable nfs.mount-rmtab) posted (#2) for review on release-3.5 by Niels de Vos (ndevos)

Comment 10 Anand Avati 2015-06-28 19:59:35 UTC
REVIEW: http://review.gluster.org/11336 (nfs: make it possible to disable nfs.mount-rmtab) posted (#3) for review on release-3.5 by Niels de Vos (ndevos)

Comment 11 Anand Avati 2015-07-05 21:46:40 UTC
REVIEW: http://review.gluster.org/11336 (nfs: make it possible to disable nfs.mount-rmtab) posted (#4) for review on release-3.5 by Niels de Vos (ndevos)

Comment 12 Anand Avati 2015-07-07 16:12:04 UTC
COMMIT: http://review.gluster.org/11336 committed in release-3.5 by Niels de Vos (ndevos) 
------
commit 3cf776c49bc60b7f616a4c503a8b10b2d19ad04b
Author: Niels de Vos <ndevos>
Date:   Sun Jun 21 15:07:58 2015 +0200

    nfs: make it possible to disable nfs.mount-rmtab
    
    When there are many NFS-clients doing very often mount/unmount actions,
    the updating of the 'rmtab' can become a bottleneck and cause delays. In
    these situations, the output of 'showmount' may be less important than
    the responsiveness of the (un)mounting.
    
    By setting 'nfs.mount-rmtab' to the value "/-", the cache file is not
    updated anymore, and the entries are only kept in memory.
    
    Cherry picked from commit 40407afb529f6e5fa2f79e9778c2f527122d75eb:
    > Cherry picked from commit 331ef6e1a86bfc0a93f8a9dec6ad35c417873849:
    >> BUG: 1169317
    >> Change-Id: I40c4d8d754932f86fb2b1b2588843390464c773d
    >> Reported-by: Cyril Peponnet <cyril>
    >> Signed-off-by: Niels de Vos <ndevos>
    >> Reviewed-on: http://review.gluster.org/9223
    >> Tested-by: Gluster Build System <jenkins.com>
    >> Reviewed-by: soumya k <skoduri>
    >> Reviewed-by: jiffin tony Thottan <jthottan>
    >> Reviewed-by: Kaleb KEITHLEY <kkeithle>
    >
    > This change also contains the fixes to the test-case from:
    >>
    >> nfs: fix spurious failure in bug-1166862.t
    >>
    >> In some environments, "showmount" could return an NFS-client that does
    >> not start with "1". This would cause the test-case to fail. The check is
    >> incorrect, the number of lines should get counted instead.
    >>
    >> Also moving the test-case to the .../nfs/... subdirectory.
    >>
    >> Cherry picked from commit ee9b35a780607daddc2832b9af5ed6bf414aebc0:
    >> BUG: 1166862
    >> Change-Id: Ic03aa8145ca57d78aea01564466e924b03bb302a
    >> Signed-off-by: Niels de Vos <ndevos>
    >> Reviewed-on: http://review.gluster.org/10419
    >> Tested-by: Gluster Build System <jenkins.com>
    >> Reviewed-by: Vijay Bellur <vbellur>
    >>
    >
    > Change-Id: I40c4d8d754932f86fb2b1b2588843390464c773d
    > BUG: 1215385
    > Signed-off-by: Niels de Vos <ndevos>
    > Reviewed-on: http://review.gluster.org/10379
    > Tested-by: NetBSD Build System
    > Tested-by: Gluster Build System <jenkins.com>
    > Reviewed-by: Vijay Bellur <vbellur>
    
    GLUSTERD_WORKDIR has been added to tests/include.rc and is not
    configurable through ./configure like on newer branches. It is not
    suitable to change the GlusterD working directory in an update for a
    stable release.
    
    Change-Id: I40c4d8d754932f86fb2b1b2588843390464c773d
    BUG: 1166862
    Signed-off-by: Niels de Vos <ndevos>
    Reviewed-on: http://review.gluster.org/11336
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: jiffin tony Thottan <jthottan>
    Reviewed-by: Kaleb KEITHLEY <kkeithle>
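A minimal sketch of how the option described in this commit message might be set, assuming a release that contains the change. The volume name "myvol" is a placeholder and not part of this report.

```shell
#!/bin/sh
# Placeholder volume name (hypothetical); replace it with the real one.
VOLNAME=myvol
# Special value from the commit message: do not update the cache file,
# keep the entries in memory only.
RMTAB_VALUE=/-

if command -v gluster >/dev/null 2>&1; then
    gluster volume set "$VOLNAME" nfs.mount-rmtab "$RMTAB_VALUE"
    # To go back to the default location later:
    # gluster volume reset "$VOLNAME" nfs.mount-rmtab
else
    echo "gluster CLI not found; run this on a storage server" >&2
fi
```

As the commit message notes, this trades the usefulness of the 'showmount' output for faster (un)mounting.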

Comment 13 Niels de Vos 2015-07-19 19:30:40 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.5.5, please reopen this bug report.

glusterfs-3.5.5 has been announced on the Gluster Packaging mailing list [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://article.gmane.org/gmane.comp.file-systems.gluster.devel/11986
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

Comment 14 Niels de Vos 2016-06-16 12:40:24 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

