Bug 1001577

Summary: quota build 3: volume start failed
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Saurabh <saujain>
Component: glusterd
Assignee: Kaushal <kaushal>
Status: CLOSED DUPLICATE
QA Contact: Saurabh <saujain>
Severity: high
Docs Contact:
Priority: medium
Version: 2.1
CC: kaushal, kparthas, mzywusko, psriniva, rhs-bugs, vagarwal, vbellur
Target Milestone: ---
Keywords: ZStream
Target Release: RHGS 2.1.2
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: glusterfs-3.4.0.49rhs
Doc Type: Bug Fix
Doc Text:
Previously, glusterd listened on port 24007 for CLI requests, so there was a possibility of glusterd rejecting CLI requests that arrived from unprivileged ports (>1024), leading to CLI command execution failures. With this fix, glusterd accepts CLI requests through a socket file (/var/run/glusterd.sock), preventing these CLI command execution failures.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-01-03 07:31:32 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Saurabh 2013-08-27 10:35:26 UTC
Description of problem:
a volume has quota enabled
the volume is stopped while quota is still enabled
starting the volume again fails

Version-Release number of selected component (if applicable):
glusterfs-fuse-3.4.0.20rhsquota5-1.el6rhs.x86_64
glusterfs-libs-3.4.0.20rhsquota5-1.el6rhs.x86_64
glusterfs-api-3.4.0.20rhsquota5-1.el6rhs.x86_64
glusterfs-geo-replication-3.4.0.20rhsquota5-1.el6rhs.x86_64
glusterfs-server-3.4.0.20rhsquota5-1.el6rhs.x86_64
glusterfs-3.4.0.20rhsquota5-1.el6rhs.x86_64
glusterfs-rdma-3.4.0.20rhsquota5-1.el6rhs.x86_64


How reproducible:
always

Steps to Reproduce:
1. create a volume of 6x2 and start it
2. enable quota
3. mount the volume over NFS
4. create some directories
5. set quota limits on the directories and on the root of the volume
6. stop the volume
7. start the volume
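
For reference, a minimal command sketch of the steps above (the volume name, hostnames, brick paths, mount point, directory names and quota limits are illustrative, not taken from the original setup):

# 1. create a 6x2 distributed-replicate volume and start it
gluster volume create dist-rep3 replica 2 \
    host1:/rhs/bricks/d1r1 host2:/rhs/bricks/d1r2 \
    host3:/rhs/bricks/d2r1 host4:/rhs/bricks/d2r2 \
    host1:/rhs/bricks/d3r1 host2:/rhs/bricks/d3r2 \
    host3:/rhs/bricks/d4r1 host4:/rhs/bricks/d4r2 \
    host1:/rhs/bricks/d5r1 host2:/rhs/bricks/d5r2 \
    host3:/rhs/bricks/d6r1 host4:/rhs/bricks/d6r2
gluster volume start dist-rep3

# 2. enable quota
gluster volume quota dist-rep3 enable

# 3. mount over NFS (gluster NFS serves NFSv3)
mount -t nfs -o vers=3 host1:/dist-rep3 /mnt/dist-rep3

# 4. create some directories
mkdir /mnt/dist-rep3/dir1 /mnt/dist-rep3/dir2

# 5. set quota limits on the directories and on the volume root
gluster volume quota dist-rep3 limit-usage /dir1 1GB
gluster volume quota dist-rep3 limit-usage /dir2 1GB
gluster volume quota dist-rep3 limit-usage / 10GB

# 6. stop the volume; 7. start it again (this is the start that fails)
gluster volume stop dist-rep3
gluster volume start dist-rep3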

Now, after step 7, these commands behave as follows:
a. quota list --- passes
b. gluster volume info <volume name> --- shows the volume as started
c. gluster volume status <volume name> --- fails with the response string:
   "Staging failed on 10.70.37.7. Error: Volume <volume-name> is not started"

Actual results:
Result of steps 6 and 7:
-------------------------
[root@rhsauto034 ~]# gluster volume stop --mode=script dist-rep3
volume stop: dist-rep3: success
[root@rhsauto034 ~]# gluster volume info dist-rep3

Volume Name: dist-rep3
Type: Distributed-Replicate
Volume ID: b305d605-3b96-4278-9005-e8249e4bb7f7
Status: Stopped
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: rhsauto032.lab.eng.blr.redhat.com:/rhs/bricks/d1r1-3
Brick2: rhsauto033.lab.eng.blr.redhat.com:/rhs/bricks/d1r2-3
Brick3: rhsauto034.lab.eng.blr.redhat.com:/rhs/bricks/d2r1-3
Brick4: rhsauto035.lab.eng.blr.redhat.com:/rhs/bricks/d2r2-3
Brick5: rhsauto032.lab.eng.blr.redhat.com:/rhs/bricks/d3r1-3
Brick6: rhsauto033.lab.eng.blr.redhat.com:/rhs/bricks/d3r2-3
Brick7: rhsauto034.lab.eng.blr.redhat.com:/rhs/bricks/d4r1-3
Brick8: rhsauto035.lab.eng.blr.redhat.com:/rhs/bricks/d4r2-3
Brick9: rhsauto032.lab.eng.blr.redhat.com:/rhs/bricks/d5r1-3
Brick10: rhsauto033.lab.eng.blr.redhat.com:/rhs/bricks/d5r2-3
Brick11: rhsauto034.lab.eng.blr.redhat.com:/rhs/bricks/d6r1-3
Brick12: rhsauto035.lab.eng.blr.redhat.com:/rhs/bricks/d6r2-3
Options Reconfigured:
features.quota: on
[root@rhsauto034 ~]# ps -eaf | grep quotad
root       525     1  0 06:04 ?        00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/quotad -p /var/lib/glusterd/quotad/run/quotad.pid -l /var/log/glusterfs/quotad.log -S /var/run/56f694ad321d4c09fd535f813a2aa43a.socket --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off
root       552 15055  0 06:04 pts/2    00:00:00 grep quotad
[root@rhsauto034 ~]# gluster volume start dist-rep3
volume start: dist-rep3: failed: Commit failed on 10.70.37.7. Please check log file for details.




Expected results:
start should not fail

Additional info:

The quotad process is still running between the stop and the start because there is one more volume with quota enabled.

Comment 3 Kaushal 2013-09-30 11:47:11 UTC
The problem here is not with quota. rhsauto032 ran out of privileged ports (which may or may not have been due to quota; most likely it was because of running gluster commands). The brick process on rhsauto032 therefore connected to glusterd to fetch its brick volfile from an insecure port (>1024). Currently glusterd (and gluster as a whole) rejects incoming requests from insecure ports. Since the brick process couldn't get its volfile, it failed to start, and this led to the inconsistent state observed in the bug report.
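
A rough way to check for both conditions on the affected node (the glusterd log path and the wording of the rejection message are assumptions about a standard RHS install, not quotes from this system):

# count established TCP connections whose local (source) port is privileged (<1024);
# a count approaching ~1000 suggests the node is running out of privileged ports
ss -tn | awk 'NR > 1 { n = split($4, a, ":"); if (a[n] + 0 < 1024) c++ } END { print c + 0 }'

# look for glusterd rejecting requests that came from non-privileged/insecure ports
grep -i "privileged port" /var/log/glusterfs/etc-glusterfs-glusterd.vol.log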

The current workaround for this issue is to set the option 'management.rpc-auth-allow-insecure on' in /etc/glusterfs/glusterd.vol and restart glusterd. Setting this option allows requests from insecure ports.
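
A sketch of that workaround, assuming the stock /etc/glusterfs/glusterd.vol layout (existing options in the 'volume management' block stay as they are; only the rpc-auth-allow-insecure line is added):

volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option rpc-auth-allow-insecure on
end-volume

# restart glusterd so the new option takes effect
service glusterd restart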

There have been patches posted upstream for the following downstream bugs which track the unprivileged ports issue.
1. https://bugzilla.redhat.com/show_bug.cgi?id=979926 -> upstream bug https://bugzilla.redhat.com/show_bug.cgi?id=980746
2. https://bugzilla.redhat.com/show_bug.cgi?id=979861 -> upstream bug https://bugzilla.redhat.com/show_bug.cgi?id=980754

The upstream patches haven't been accepted yet because of some regression failures. Once those patches are accepted, they can be backported downstream to the u1 branch.

Comment 4 Vivek Agarwal 2013-10-18 14:24:56 UTC
We want to let the patches soak in for U2. Removing this from the U1 list.

Comment 5 Kaushal 2013-12-13 07:29:37 UTC
The fix for 979861 also addresses this. Moving to ON_QA.

Comment 6 Saurabh 2013-12-19 05:08:29 UTC
Didn't hit the same problem again; tried several times on glusterfs-3.4.0.49rhs.

Comment 7 Pavithra 2014-01-03 06:06:15 UTC
Can you please verify this doc text for technical accuracy?

Comment 8 Kaushal 2014-01-03 07:31:32 UTC
Closing this bug as dup of 979861, as this bug is just a specific incarnation of it.

*** This bug has been marked as a duplicate of bug 979861 ***