Description of problem:
While running di-staf automation tests, after enabling quota on a volume, the limit-usage command failed with the error "Failed to start aux mount".

Version-Release number of selected component (if applicable):
glusterfs-3.8.4-12.el7rhgs.x86_64

How reproducible:
Intermittent

Steps to Reproduce:
1. Create a 6*2 distribute-replicate volume
2. Start the volume
3. Enable quota
4. Set limit-usage

Actual results:
The limit-usage command failed with the error "Failed to start aux mount".

Expected results:
The limit-usage command should not fail.

logs from glusterd
====================================
.so.0(runner_log+0x115) [0x7f3fc890e8d5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh --volname=testvol0 --first=yes --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
[2017-01-19 07:42:41.402803] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: quotad already stopped
[2017-01-19 07:42:41.402923] I [MSGID: 106568] [glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: quotad service is stopped
[2017-01-19 07:42:41.402952] I [MSGID: 106567] [glusterd-svc-mgmt.c:196:glusterd_svc_start] 0-management: Starting quotad service
[2017-01-19 07:43:34.720648] E [MSGID: 106176] [glusterd-quota.c:1929:glusterd_create_quota_auxiliary_mount] 0-management: Failed to mount glusterfs client. Please check the log file /var/log/glusterfs/quota-mount-testvol0.log for more details [File exists]
[2017-01-19 07:43:34.720703] E [MSGID: 106528] [glusterd-quota.c:2107:glusterd_op_stage_quota] 0-management: Failed to start aux mount
[2017-01-19 07:43:34.720715] E [MSGID: 106301] [glusterd-syncop.c:1302:gd_stage_op_phase] 0-management: Staging of operation 'Volume Quota' failed on localhost : Failed to start aux mount
[2017-01-19 07:43:52.293410] W [socket.c:590:__socket_rwv] 0-management: readv on 10.70.36.4:24007 failed (No data available)
====================================================
[2017-01-19 07:43:34.719329] I [MSGID: 100030] [glusterfsd.c:2412:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.8.4 (args: /usr/sbin/glusterfs --volfile-server localhost --volfile-id testvol0 -l /var/log/glusterfs/quota-mount-testvol0.log -p /var/run/gluster/testvol0.pid --client-pid -5 /var/run/gluster/testvol0/)
[2017-01-19 07:43:34.719758] E [fuse-bridge.c:5518:init] 0-fuse: Mountpoint /var/run/gluster/testvol0/ seems to have a stale mount, run 'umount /var/run/gluster/testvol0/' and try again.
[2017-01-19 07:43:34.719775] E [MSGID: 101019] [xlator.c:433:xlator_init] 0-fuse: Initialization of volume 'fuse' failed, review your volfile again

Additional info:
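The fuse-bridge error above also suggests the immediate manual workaround: detect and clear the stale aux mount before retrying the quota command. A minimal sketch, using the mount point path from the logs above (the cleanup step itself is hypothetical, not part of any shipped script):

# Hypothetical cleanup: if the aux mount path is still in the mount table
# but can no longer be stat'd, the fuse client behind it died uncleanly
# and the mount is stale; a lazy unmount clears it.
AUX=/var/run/gluster/testvol0   # /var/run is usually a symlink to /run
if grep -q "$AUX" /proc/mounts || grep -q "${AUX#/var}" /proc/mounts; then
    if ! stat "$AUX" >/dev/null 2>&1; then
        umount -l "$AUX"   # matches the 'umount ... and try again' hint above
    fi
fi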
From the logs, the aux mount location /var/run/gluster/testvol0/ was not cleanly unmounted in a previous run, and no process was still serving the aux mount path. The aux mount is created on the first limit/remove_limit/list command on the volume and remains until the volume is stopped or deleted, or quota is disabled on the volume (at which point we do a lazy unmount). A lazy unmount would have instantaneously removed the path to the mount point; since the path still exists, we can rule out a lazy unmount on the path. Hence it looks like the (aux-)mounting process was terminated uncleanly, so the lazy unmount never happened. Need to see if the cleanup script in di-staf could be doing this.

Reproducing the stale mount (create volume, start, mount, enable quota; see the sketch after this transcript):

[root@localhost mnt]# mount | grep gluster
localhost:v1 on /run/gluster/v1 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
10.70.1.217:v1 on /mnt type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
[root@localhost mnt]# kill -9 9317

>> notice the stale mount on /run/gluster/v1
[root@localhost mnt]# mount | grep gluster
localhost:v1 on /run/gluster/v1 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
10.70.1.217:v1 on /mnt type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

>> [root@localhost mnt]# ls /run/gluster/v1
ls: cannot access '/run/gluster/v1': Transport endpoint is not connected
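To confirm the "uncleanly terminated client" theory on a live system, one can check whether any process is still behind the mount. A rough sketch, assuming the pid file location taken from the glusterfs args in the log above:

# If the pid recorded for the aux mount client is gone while the mount is
# still in the mount table, we are in the kill -9 scenario shown above.
PIDFILE=/var/run/gluster/testvol0.pid
if [ -r "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
    echo "aux mount client is still alive"
else
    echo "no client process behind the mount -> stale, needs umount -l"
fi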
While the scenario leading to the stale mount hasn't been root-caused, one plausible approach to avoid the issue is to have every quota command (limit/remove_limit/list) unmount the aux path before it finishes. The reason this was not done in the first place was to avoid a fresh mount on each subsequent command; that is a tiny performance optimization we can do away with. Another risk of keeping the aux mount around for too long is that if a user inadvertently ran 'rm' over /var/run, it could delete the persistent filesystem data through the mount. While clearing /var/run is not expected, doing so shouldn't have such a side effect, /var/run being a temporary directory. A sketch of the intended behavior follows.
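A rough sketch of the behavior the fix aims for, using the volume name from this report (the actual change lives inside glusterd; see the patches below):

# After the fix, the aux mount only lives for the duration of a single
# quota command instead of until volume stop/delete/quota disable.
gluster volume quota testvol0 limit-usage / 10GB
# ... glusterd now tears down the aux mount itself once the command
# completes, roughly equivalent to:
umount -l /var/run/gluster/testvol0 2>/dev/null || true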
upstream patch : https://review.gluster.org/16938
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/105515/
Observed that the aux mount is no longer left behind at /var/run/gluster/. Explicitly mounted another volume on the aux mount path; the quota list command still gives the correct output. Bug verified on build glusterfs-3.8.4-27.el7rhgs.x86_64. A sketch of the verification steps follows.
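A rough sketch of those verification steps; testvol0 is the volume name used earlier in this report, and othervol is a hypothetical second volume:

# 1. Run a quota command and confirm no aux mount lingers afterwards
gluster volume quota testvol0 limit-usage / 10GB
mount | grep /run/gluster        # expect no aux mount for testvol0

# 2. Occupy the aux mount path with an unrelated volume (othervol is
#    hypothetical) and confirm quota list still works correctly
mkdir -p /var/run/gluster/testvol0
mount -t glusterfs localhost:othervol /var/run/gluster/testvol0
gluster volume quota testvol0 list   # should still print the correct limits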
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2774