Bug 1414758

Summary: quota: limit-usage command failed with error "Failed to start aux mount"

Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Anil Shah <ashah>
Component: quota
Assignee: Sanoj Unnikrishnan <sunnikri>
Status: CLOSED ERRATA
QA Contact: Anil Shah <ashah>
Severity: high
Docs Contact:
Priority: unspecified
Version: rhgs-3.2
CC: amukherj, asrivast, rhinduja, rhs-bugs, storage-qa-internal, sunnikri
Target Milestone: ---
Target Release: RHGS 3.3.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: glusterfs-3.8.4-25
Doc Type: Bug Fix
Doc Text: Quota list and limit commands now create and destroy aux mounts each time they are run to reduce the risk of encountering stale mount points. Additionally, a separate mount point is now used for list and limit commands in order to avoid issues when these commands are run in parallel.
Story Points: ---
Clone Of:
: 1433906 (view as bug list)
Environment:
Last Closed: 2017-09-21 04:30:55 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1433906
Bug Blocks: 1417147

Description Anil Shah 2017-01-19 11:36:52 UTC
Description of problem:

While running di-staf automation tests, after enabling quota on the volume, the limit-usage command failed with the error "Failed to start aux mount".

Version-Release number of selected component (if applicable):

glusterfs-3.8.4-12.el7rhgs.x86_64

How reproducible:

Intermittent 

Steps to Reproduce:
1. Create a 6x2 distributed-replicate volume
2. Start the volume
3. Enable quota
4. Set limit-usage (see the CLI sketch below)
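
For reference, a CLI sequence matching these steps (hostnames, brick paths, and the limit value are illustrative placeholders, not from the original run):

# 6x2 distributed-replicate: 12 bricks, each replica pair spanning two servers
gluster volume create testvol0 replica 2 \
    server1:/bricks/b1 server2:/bricks/b1 \
    server1:/bricks/b2 server2:/bricks/b2 \
    server1:/bricks/b3 server2:/bricks/b3 \
    server1:/bricks/b4 server2:/bricks/b4 \
    server1:/bricks/b5 server2:/bricks/b5 \
    server1:/bricks/b6 server2:/bricks/b6
gluster volume start testvol0
gluster volume quota testvol0 enable
gluster volume quota testvol0 limit-usage / 10GB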

Actual results:

limit-usage command failed with error "Failed to start aux mount"

Expected results:

The limit-usage command should not fail.

logs from glusterd
====================================
.so.0(runner_log+0x115) [0x7f3fc890e8d5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh --volname=testvol0 --first=yes --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
[2017-01-19 07:42:41.402803] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: quotad already stopped
[2017-01-19 07:42:41.402923] I [MSGID: 106568] [glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: quotad service is stopped
[2017-01-19 07:42:41.402952] I [MSGID: 106567] [glusterd-svc-mgmt.c:196:glusterd_svc_start] 0-management: Starting quotad service
[2017-01-19 07:43:34.720648] E [MSGID: 106176] [glusterd-quota.c:1929:glusterd_create_quota_auxiliary_mount] 0-management: Failed to mount glusterfs client. Please check the log file /var/log/glusterfs/quota-mount-testvol0.log for more details [File exists]
[2017-01-19 07:43:34.720703] E [MSGID: 106528] [glusterd-quota.c:2107:glusterd_op_stage_quota] 0-management: Failed to start aux mount
[2017-01-19 07:43:34.720715] E [MSGID: 106301] [glusterd-syncop.c:1302:gd_stage_op_phase] 0-management: Staging of operation 'Volume Quota' failed on localhost : Failed to start aux mount
[2017-01-19 07:43:52.293410] W [socket.c:590:__socket_rwv] 0-management: readv on 10.70.36.4:24007 failed (No data available)

====================================================

logs from /var/log/glusterfs/quota-mount-testvol0.log

[2017-01-19 07:43:34.719329] I [MSGID: 100030] [glusterfsd.c:2412:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.8.4 (args: /usr/sbin/glusterfs --volfile-server localhost --volfile-id testvol0 -l /var/log/glusterfs/quota-mount-testvol0.log -p /var/run/gluster/testvol0.pid --client-pid -5 /var/run/gluster/testvol0/)
[2017-01-19 07:43:34.719758] E [fuse-bridge.c:5518:init] 0-fuse: Mountpoint /var/run/gluster/testvol0/ seems to have a stale mount, run 'umount /var/run/gluster/testvol0/' and try again.
[2017-01-19 07:43:34.719775] E [MSGID: 101019] [xlator.c:433:xlator_init] 0-fuse: Initialization of volume 'fuse' failed, review your volfile again
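
Until a fix is in place, the workaround matching the hint in the fuse log above is to detach the stale aux mount by hand and retry (the path and volume name are from the log; the limit value is a placeholder):

umount -l /var/run/gluster/testvol0/
gluster volume quota testvol0 limit-usage / 10GB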


Additional info:

Comment 3 Sanoj Unnikrishnan 2017-01-19 14:17:32 UTC
From the logs, the aux mount location /var/run/gluster/testvol0/ was not cleanly unmounted after the previous run.

We can also infer that no process was serving the aux mount path.

The aux mount is created on the first limit/remove_limit/list command on the volume, and it remains until the volume is stopped or deleted, or quota is disabled on it (at which point we do a lazy unmount).

A lazy unmount would have instantaneously removed the path to the mount point; since the path still exists, we can rule out a lazy unmount on the path.

Hence it looks like the process serving the aux mount was terminated uncleanly, and so the lazy unmount never happened.
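
For context, the cleanup on volume stop/delete/quota disable does roughly the following (a sketch, not the actual glusterd code):

# a plain umount fails with EBUSY while the mount is in use;
# the lazy variant detaches the path immediately and defers the real cleanup:
umount -l /var/run/gluster/testvol0/ && rmdir /var/run/gluster/testvol0/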

Need to check whether the cleanup script in di-staf could be doing this.


Reproduced locally: create volume, start, mount, enable quota, then kill the glusterfs client process serving the aux mount (PID 9317 below):

[root@localhost mnt]# mount | grep gluster
localhost:v1 on /run/gluster/v1 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
10.70.1.217:v1 on /mnt type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

[root@localhost mnt]# kill -9 9317

>> notice the stale mount on /run/gluster/v1
[root@localhost mnt]# mount | grep gluster
localhost:v1 on /run/gluster/v1 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
10.70.1.217:v1 on /mnt type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

>> accessing the stale mount fails:
[root@localhost mnt]# ls /run/gluster/v1
ls: cannot access '/run/gluster/v1': Transport endpoint is not connected
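
A stale FUSE endpoint like this stays in the mount table but fails on access, so a cleanup script can detect and detach it with something like this (an illustrative sketch, not part of di-staf or gluster):

# a healthy mount stats fine; a stale one returns
# "Transport endpoint is not connected"
if ! stat /run/gluster/v1 >/dev/null 2>&1; then
    umount -l /run/gluster/v1
fi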

Comment 4 Sanoj Unnikrishnan 2017-01-23 09:09:20 UTC
While the scenario leading to the stale mount hasn't been root-caused yet, one plausible way to avoid the issue is to have every command (limit/remove_limit/list) unmount the aux path before it finishes.

The reason this was not done in the first place was to avoid a fresh mount on each subsequent command; that is a tiny performance optimization we can do away with.

Another risk of keeping the aux mount around for so long is that if a user inadvertently ran a recursive 'rm' over /var/run, it could delete persistent filesystem data through the mount. While clearing /var/run is not an expected operation, it shouldn't have such a side effect, /var/run being a temporary directory.
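
A sketch of the per-command lifecycle proposed above (illustrative, not the actual glusterd code; the separate mount point name is hypothetical, while the glusterfs arguments mirror the log in the description):

mkdir -p /var/run/gluster/testvol0_quota_limit/
glusterfs --volfile-server localhost --volfile-id testvol0 \
    -l /var/log/glusterfs/quota-mount-testvol0.log \
    --client-pid -5 /var/run/gluster/testvol0_quota_limit/
# ... perform the limit/remove_limit/list operation through the mount ...
umount -l /var/run/gluster/testvol0_quota_limit/   # tear down before the command returns
rmdir /var/run/gluster/testvol0_quota_limit/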

Comment 7 Atin Mukherjee 2017-04-21 03:59:15 UTC
upstream patch : https://review.gluster.org/16938

Comment 8 Atin Mukherjee 2017-05-08 11:20:41 UTC
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/105515/

Comment 10 Anil Shah 2017-06-15 07:34:34 UTC
Observed that the aux mount is not created at path /var/run/gluster/.

Explicitly mounted another volume on the aux mount path; the quota list command still gives the correct output.

Bug verified on build glusterfs-3.8.4-27.el7rhgs.x86_64

Comment 16 errata-xmlrpc 2017-09-21 04:30:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774
