Description of problem:
A volume with quota enabled fails to start after being stopped.

Version-Release number of selected component (if applicable):
glusterfs-fuse-3.4.0.20rhsquota5-1.el6rhs.x86_64
glusterfs-libs-3.4.0.20rhsquota5-1.el6rhs.x86_64
glusterfs-api-3.4.0.20rhsquota5-1.el6rhs.x86_64
glusterfs-geo-replication-3.4.0.20rhsquota5-1.el6rhs.x86_64
glusterfs-server-3.4.0.20rhsquota5-1.el6rhs.x86_64
glusterfs-3.4.0.20rhsquota5-1.el6rhs.x86_64
glusterfs-rdma-3.4.0.20rhsquota5-1.el6rhs.x86_64

How reproducible:
always

Steps to Reproduce:
1. Create a 6x2 volume and start it.
2. Enable quota.
3. Mount the volume over NFS.
4. Create some directories.
5. Set quota limits on the directories and on the root of the volume.
6. Stop the volume.
7. Start the volume.

After these steps:
a. quota list --- this is a pass
b. gluster volume info <volume name> --- this shows the <volume name> volume is started
c. gluster volume status <volume name> --- shows this response string: "Staging failed on 10.70.37.7. Error: Volume <volume-name> is not started"

Actual results:
Output of steps 6 and 7:
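The steps above can be sketched as the following command sequence. This is a hedged sketch, not the exact commands used in this run: the hostnames, brick paths, mount point, directory name, and quota limits are placeholders, it needs a live RHS/gluster cluster, and only the volume name dist-rep3 is taken from the report.

```
# Step 1: create a 6x2 distributed-replicate volume and start it
# (12 bricks in total; only the first pair is shown, the rest are analogous).
gluster volume create dist-rep3 replica 2 \
    host1:/rhs/bricks/d1r1 host2:/rhs/bricks/d1r2 \
    # ... 10 more bricks for the 6x2 layout ...
gluster volume start dist-rep3

# Step 2: enable quota on the volume.
gluster volume quota dist-rep3 enable

# Steps 3-4: mount over NFS and create a directory.
mount -t nfs host1:/dist-rep3 /mnt/dist-rep3
mkdir /mnt/dist-rep3/dir1

# Step 5: set quota limits on the directory and on the volume root
# (limit values are illustrative).
gluster volume quota dist-rep3 limit-usage /dir1 1GB
gluster volume quota dist-rep3 limit-usage / 10GB

# Steps 6-7: stop and restart the volume; the restart is what fails here.
gluster volume stop dist-rep3
gluster volume start dist-rep3
```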
-------------------------
[root@rhsauto034 ~]# gluster volume stop --mode=script dist-rep3
volume stop: dist-rep3: success
[root@rhsauto034 ~]# gluster volume info dist-rep3

Volume Name: dist-rep3
Type: Distributed-Replicate
Volume ID: b305d605-3b96-4278-9005-e8249e4bb7f7
Status: Stopped
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: rhsauto032.lab.eng.blr.redhat.com:/rhs/bricks/d1r1-3
Brick2: rhsauto033.lab.eng.blr.redhat.com:/rhs/bricks/d1r2-3
Brick3: rhsauto034.lab.eng.blr.redhat.com:/rhs/bricks/d2r1-3
Brick4: rhsauto035.lab.eng.blr.redhat.com:/rhs/bricks/d2r2-3
Brick5: rhsauto032.lab.eng.blr.redhat.com:/rhs/bricks/d3r1-3
Brick6: rhsauto033.lab.eng.blr.redhat.com:/rhs/bricks/d3r2-3
Brick7: rhsauto034.lab.eng.blr.redhat.com:/rhs/bricks/d4r1-3
Brick8: rhsauto035.lab.eng.blr.redhat.com:/rhs/bricks/d4r2-3
Brick9: rhsauto032.lab.eng.blr.redhat.com:/rhs/bricks/d5r1-3
Brick10: rhsauto033.lab.eng.blr.redhat.com:/rhs/bricks/d5r2-3
Brick11: rhsauto034.lab.eng.blr.redhat.com:/rhs/bricks/d6r1-3
Brick12: rhsauto035.lab.eng.blr.redhat.com:/rhs/bricks/d6r2-3
Options Reconfigured:
features.quota: on

[root@rhsauto034 ~]# ps -eaf | grep quotad
root  525      1  0 06:04 ?     00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/quotad -p /var/lib/glusterd/quotad/run/quotad.pid -l /var/log/glusterfs/quotad.log -S /var/run/56f694ad321d4c09fd535f813a2aa43a.socket --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off
root  552  15055  0 06:04 pts/2 00:00:00 grep quotad

[root@rhsauto034 ~]# gluster volume start dist-rep3
volume start: dist-rep3: failed: Commit failed on 10.70.37.7. Please check log file for details.
-------------------------

Expected results:
start should not fail

Additional info:
The quotad process is running between the stop and the start because there was one more volume with quota enabled.
The problem here is not with quota. rhsauto032 ran out of privileged ports (which may or may not have been due to quota; most likely it was because of running gluster commands). The brick process on rhsauto032 connected to glusterd to fetch the brick volfile using an insecure port (>1024). Currently glusterd (and gluster as a whole) rejects incoming requests from insecure ports. Since the brick process couldn't get its volfile, it failed to start, and this led to the inconsistent state observed in the bug report.

The current workaround for this issue is to set the option 'management.rpc-auth-allow-insecure on' in /etc/glusterfs/glusterd.vol and restart glusterd. Setting this option allows requests from insecure ports.

There have been patches posted upstream for the following downstream bugs, which track the unprivileged-ports issue:
1. https://bugzilla.redhat.com/show_bug.cgi?id=979926 -> upstream bug https://bugzilla.redhat.com/show_bug.cgi?id=980746
2. https://bugzilla.redhat.com/show_bug.cgi?id=979861 -> upstream bug https://bugzilla.redhat.com/show_bug.cgi?id=980754

The upstream patches haven't been accepted yet because of some regression failures. Once those patches are accepted, they can be backported downstream to the U1 branch.
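As a sketch of the workaround above: in the volfile itself the option is written without the 'management.' prefix (that prefix is just the name of the management volume the option lives in), so the edited /etc/glusterfs/glusterd.vol would look roughly like the following. The surrounding option lines vary by installation and are shown here only as typical context, not as required values.

```
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket,rdma
    # Workaround: accept RPC requests originating from unprivileged
    # (insecure, >1024) source ports.
    option rpc-auth-allow-insecure on
end-volume
```

After saving the file, restart glusterd for the option to take effect (e.g. 'service glusterd restart' on RHS/EL6).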
Want to let the patches soak in for U2. Removing from the U1 list.
The fix for 979861 also addresses this. Moving to ON_QA.
Didn't see the same problem again; tried out several times on glusterfs-3.4.0.49rhs.
Can you please verify this doc text for technical accuracy?
Closing this bug as a duplicate of 979861, as this bug is just a specific incarnation of it.

*** This bug has been marked as a duplicate of bug 979861 ***