Bug 1449782 - quota: limit-usage command failed with error " Failed to start aux mount"
Summary: quota: limit-usage command failed with error " Failed to start aux mount"
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: quota
Version: 3.8
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Sanoj Unnikrishnan
QA Contact:
URL:
Whiteboard:
Depends On: glusterfs-3.8.13
Blocks:
 
Reported: 2017-05-10 16:29 UTC by Sanoj Unnikrishnan
Modified: 2017-06-29 09:54 UTC
CC List: 8 users

Fixed In Version: glusterfs-3.8.13
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1433906
Environment:
Last Closed: 2017-06-29 09:54:50 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Sanoj Unnikrishnan 2017-05-10 16:29:16 UTC
+++ This bug was initially created as a clone of Bug #1433906 +++

Description of problem:

While running di-staf automation tests, after enabling quota on a volume, the limit-usage command failed with the error "Failed to start aux mount".

How reproducible:

Intermittent 

Steps to Reproduce:
1. Create a 6x2 distributed-replicate volume
2. Start the volume
3. Enable quota
4. Set limit-usage (see the command sketch below)
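
A minimal sketch of these commands; the server names, brick paths, and limit value are placeholders, not taken from the report:

# 12 bricks in 6 replica-2 pairs gives a 6x2 distributed-replicate volume
gluster volume create testvol0 replica 2 \
    server{1..6}:/bricks/b1 server{1..6}:/bricks/b2
gluster volume start testvol0
gluster volume quota testvol0 enable
gluster volume quota testvol0 limit-usage / 10GB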

Actual results:

The limit-usage command failed with the error "Failed to start aux mount".

Expected results:

The limit-usage command should not fail.

logs from glusterd
====================================
.so.0(runner_log+0x115) [0x7f3fc890e8d5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh --volname=testvol0 --first=yes --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
[2017-01-19 07:42:41.402803] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: quotad already stopped
[2017-01-19 07:42:41.402923] I [MSGID: 106568] [glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: quotad service is stopped
[2017-01-19 07:42:41.402952] I [MSGID: 106567] [glusterd-svc-mgmt.c:196:glusterd_svc_start] 0-management: Starting quotad service
[2017-01-19 07:43:34.720648] E [MSGID: 106176] [glusterd-quota.c:1929:glusterd_create_quota_auxiliary_mount] 0-management: Failed to mount glusterfs client. Please check the log file /var/log/glusterfs/quota-mount-testvol0.log for more details [File exists]
[2017-01-19 07:43:34.720703] E [MSGID: 106528] [glusterd-quota.c:2107:glusterd_op_stage_quota] 0-management: Failed to start aux mount
[2017-01-19 07:43:34.720715] E [MSGID: 106301] [glusterd-syncop.c:1302:gd_stage_op_phase] 0-management: Staging of operation 'Volume Quota' failed on localhost : Failed to start aux mount
[2017-01-19 07:43:52.293410] W [socket.c:590:__socket_rwv] 0-management: readv on 10.70.36.4:24007 failed (No data available)

====================================================

[2017-01-19 07:43:34.719329] I [MSGID: 100030] [glusterfsd.c:2412:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.8.4 (args: /usr/sbin/glusterfs --volfile-server localhost --volfile-id testvol0 -l /var/log/glusterfs/quota-mount-testvol0.log -p /var/run/gluster/testvol0.pid --client-pid -5 /var/run/gluster/testvol0/)
[2017-01-19 07:43:34.719758] E [fuse-bridge.c:5518:init] 0-fuse: Mountpoint /var/run/gluster/testvol0/ seems to have a stale mount, run 'umount /var/run/gluster/testvol0/' and try again.
[2017-01-19 07:43:34.719775] E [MSGID: 101019] [xlator.c:433:xlator_init] 0-fuse: Initialization of volume 'fuse' failed, review your volfile again



--- Additional comment from Sanoj Unnikrishnan on 2017-01-19 09:17:32 EST ---

From the logs, the aux mount location /var/run/gluster/testvol0/ was not cleanly unmounted after the previous run.

We can also infer that no process is currently serving a mount on the aux mount path.

The aux mount is created on the first limit/remove_limit/list command on the volume, and it remains until the volume is stopped or deleted, or quota is disabled on the volume (at which point we do a lazy unmount).

A lazy unmount would have immediately removed the path to the mount point; since the path still exists, we can rule out a lazy unmount on the path.

Hence it looks like the process serving the aux mount was terminated uncleanly, which is why no lazy unmount was performed.



Create volume, start, mount, enable quota:

[root@localhost mnt]# mount | grep gluster
localhost:v1 on /run/gluster/v1 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
10.70.1.217:v1 on /mnt type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

[root@localhost mnt]# kill -9 9317

>> notice the stale mount on /run/gluster/v1
[root@localhost mnt]# mount | grep gluster
localhost:v1 on /run/gluster/v1 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
10.70.1.217:v1 on /mnt type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

>> the stale mount path is no longer accessible:
[root@localhost mnt]# ls /run/gluster/v1
ls: cannot access '/run/gluster/v1': Transport endpoint is not connected
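
At this point a manual workaround (a sketch only, not part of the fix; the limit value is arbitrary) would be to lazily unmount the dead aux mount and re-run the quota command:

[root@localhost mnt]# umount -l /run/gluster/v1
[root@localhost mnt]# gluster volume quota v1 limit-usage / 10GB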

--- Additional comment from Sanoj Unnikrishnan on 2017-01-23 04:09:20 EST ---

While the scenario leading to the stale mount hasn't been root-caused, one plausible approach to avoid the issue would be to have every command (limit/remove_limit/list) unmount the aux path before it finishes.

The reason this was not done in the first place was to avoid re-mounting on every subsequent command; that is a tiny performance optimization we can do away with.

Another risk with keeping the aux mount around for too long is that if the user inadvertently runs an 'rm' over /var/run, it could delete persistent filesystem data through the mount.

While clearing /var/run is not an expected operation, it should not have such a side effect, /var/run being a temporary directory.
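
A conceptual shell sketch of that approach (the actual change lives inside glusterd, not in a script; the glusterfs command line below is taken from the log excerpt above):

# 1. clear any stale aux mount left behind by a crashed client
umount -l /var/run/gluster/testvol0/ 2>/dev/null
# 2. create the aux mount only for the duration of the command
/usr/sbin/glusterfs --volfile-server localhost --volfile-id testvol0 \
    -l /var/log/glusterfs/quota-mount-testvol0.log \
    -p /var/run/gluster/testvol0.pid --client-pid -5 /var/run/gluster/testvol0/
# 3. ... perform the limit/remove_limit/list operation ...
# 4. unmount again so no long-lived aux mount is left around under /var/run
umount -l /var/run/gluster/testvol0/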

Comment 1 Worker Ant 2017-05-10 16:32:53 UTC
REVIEW: https://review.gluster.org/17242 (Fixes quota aux mount failure) posted (#1) for review on release-3.8 by sanoj-unnikrishnan (sunnikri)

Comment 2 Worker Ant 2017-06-19 04:56:47 UTC
COMMIT: https://review.gluster.org/17242 committed in release-3.8 by jiffin tony Thottan (jthottan) 
------
commit 2dcb19813e7dbb2afd2f482ed9a3401371325b1d
Author: Sanoj Unnikrishnan <sunnikri>
Date:   Wed Mar 22 15:02:12 2017 +0530

    Fixes quota aux mount failure
    
    The aux mount is created on the first limit/remove_limit/list command
    and it remains until volume is stopped / deleted / (quota is disabled)
    , where we do a lazy unmount. If the process is uncleanly terminated,
    then the mount entry remains and we get (Transport disconnected) error
    on subsequent attempts to run quota list/limit-usage/remove commands.
    
    Second issue, There is also a risk of inadvertent rm -rf on the
    /var/run/gluster causing data loss for the user. Ideally, /var/run is
    a temp path for application use and should not cause any data loss to
    persistent storage.
    
    Solution:
    1) unmount the aux mount after each use.
    2) clean stale mount before mounting, if any.
    
    One caveat with doing mount/unmount on each command is that we cannot
    use same mount point for both list and limit commands.
    The reason for this is that list command needs mount to be accessible
    in cli after response from glusterd, So it could be unmounted by a
    limit command if executed in parallel (had we used same mount point)
    Hence we use separate mount points for list and limit commands.
    
    > Reviewed-on: https://review.gluster.org/16938
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > Smoke: Gluster Build System <jenkins.org>
    > Reviewed-by: Manikandan Selvaganesh <manikandancs333>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: Raghavendra G <rgowdapp>
    > Reviewed-by: Atin Mukherjee <amukherj>
    > (cherry picked from commit 2ae4b4058691b324535d802f4e6d24cce89a10e5)
    
    Change-Id: I4f9e39da2ac2b65941399bffb6440db8a6ba59d0
    BUG: 1449782
    Signed-off-by: Sanoj Unnikrishnan <sunnikri>
    Reviewed-on: https://review.gluster.org/17242
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Raghavendra G <rgowdapp>
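
To illustrate the caveat above: with separate per-command mount points, a list and a limit-usage issued in parallel no longer share an aux mount, so one cannot unmount it from under the other. The volume name, path, and size below are placeholders:

gluster volume quota testvol0 list &
gluster volume quota testvol0 limit-usage /dir1 5GB
wait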

Comment 3 Niels de Vos 2017-06-29 09:54:50 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.13, please open a new bug report.

glusterfs-3.8.13 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2017-June/000075.html
[2] https://www.gluster.org/pipermail/gluster-users/

