Bug 1020331

Summary: The output of "gluster volume status all tasks --xml" and "gluster volume remove-brick <vol> <brick> status --xml" are not in agreement
Product: Red Hat Gluster Storage Reporter: Shubhendu Tripathi <shtripat>
Component: glusterfs Assignee: Kaushal <kaushal>
Status: CLOSED ERRATA QA Contact: SATHEESARAN <sasundar>
Severity: high Docs Contact:
Priority: urgent    
Version: 2.1CC: dpati, grajaiya, kaushal, psriniva, rnachimu, vbellur
Target Milestone: --- Keywords: ZStream
Target Release: RHGS 2.1.2   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.4.0.42.1u2rhs-1 Doc Type: Bug Fix
Doc Text:
Previously, the remove-brick status displayed by the volume status command was inconsistent on different peers, whereas the remove-brick status command displayed a consistent output. With this fix, the status displayed by the volume status command and the remove-brick status command is consistent across the cluster.
Story Points: ---
Clone Of:
: 1027094 (view as bug list) Environment:
Last Closed: 2014-02-25 07:54:34 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On: 1027094    
Bug Blocks: 1015659, 1020189, 1020325, 1021816, 1022511    

Description Shubhendu Tripathi 2013-10-17 13:05:02 UTC
Description of problem:
VDSM has a verb to get the list of tasks on a volume. Internally, it:

1. Uses the command "gluster volume remove-brick <vol> <brick> status --xml" to get the status of remove brick action on a brick.
2. Uses the command "gluster volume status all tasks --xml" to get the overall status of the tasks.

If there are two hosts in a cluster and a remove-brick operation is in progress when the overall task status is queried, the output of "gluster volume status all tasks --xml" differs between the two hosts.
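The inconsistency amounts to the <status> element of the same <task> carrying different values on different peers. A minimal sketch of such a comparison in Python, using illustrative sample XML (the element layout only approximates gluster's cliOutput schema, and the task id and status codes are hypothetical):

```python
import xml.etree.ElementTree as ET

def task_statuses(xml_text):
    """Map task id -> status code, parsed from '--xml' task output."""
    root = ET.fromstring(xml_text)
    return {t.findtext("id"): t.findtext("status") for t in root.iter("task")}

# Illustrative outputs for the same remove-brick task as reported by two
# peers; not real command output.
host1_xml = """<cliOutput><volStatus><volumes><volume>
  <volName>vol1</volName>
  <tasks><task><type>Remove brick</type>
    <id>task-uuid-1</id><status>1</status></task></tasks>
</volume></volumes></volStatus></cliOutput>"""
host2_xml = host1_xml.replace("<status>1</status>", "<status>3</status>")

# The bug: both hosts describe the same task, yet the parsed statuses differ.
print(task_statuses(host1_xml) == task_statuses(host2_xml))  # False
```

A consumer such as VDSM that polls both commands on multiple hosts has no way to reconcile these conflicting answers, which is why the outputs need to agree cluster-wide.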

Version-Release number of selected component (if applicable):


How reproducible:
Almost always

Steps to Reproduce:
1. Make sure two hosts are present in the peer group
2. Create a distributed volume with 2 bricks (brick directories from server-1 only)
3. Populate the volume with data
4. Start remove-brick for one of the bricks on the volume
5. Run the command "gluster volume status all tasks --xml" on each host individually

Actual results:
The status values returned on the hosts differ

Expected results:
Both hosts should return the same status value

Additional info:

Comment 2 Dusmant 2013-10-18 05:02:48 UTC
This bug is blocking the RHSC remove-brick feature, which is showing inconsistent information because of this issue. We need a fix for this ASAP.

Thanks,
-Dusmant

Comment 3 Ramesh N 2013-10-21 11:39:15 UTC
The same scenario applies to the volume rebalance task.

Comment 5 SATHEESARAN 2013-12-23 07:11:00 UTC
Verified with glusterfs-3.4.0.51rhs.el6rhs

Now the remove-brick and rebalance status information obtained using "gluster volume status all --xml" is uniform across all RHSS nodes in the trusted storage pool.

Performed the following steps to verify this bug:
1. Created a trusted storage pool of 4 RHSS nodes
(i.e) gluster peer probe <RHSS-NODE-IP>

2. Created a distribute-replicate volume of 6 bricks (3x2)
(i.e) gluster volume create <vol-name> replica 2 <brick1> .. <brick6>

3. Started the volume
(i.e) gluster volume start <vol-name>

4. Fuse-mounted the volume
(i.e) mount.glusterfs <RHSS-NODE>:<vol-name> <mount-point>

5. Created some files on the mount point
(i.e) for i in {1..200}; do dd if=/dev/urandom of=<mount-point>/file$i bs=4k count=1000; done

6. Added a pair of bricks to the volume
(i.e) gluster volume add-brick <vol-name> <brick1> <brick2>

7. Started rebalance on the volume
(i.e) gluster volume rebalance <vol-name> start

8. Got the status of all volumes using --xml
(i.e) gluster volume status all --xml

9. Repeated step 8 on all RHSS nodes

Observation: rebalance status was consistent across all the nodes

10. Removed a pair of bricks from the volume
(i.e) gluster volume remove-brick <vol-name> <brick1> <brick2> start

11. Repeated steps 8 and 9

Observation: remove-brick status was consistent across all the RHSS nodes
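The per-node check above amounts to asserting that every node in the pool reports a single, identical status for a given task. A hedged sketch of that check, with an inlined illustrative sample standing in for the per-node output of "gluster volume status all --xml" (the element layout and status code are assumptions, not captured output):

```python
import xml.etree.ElementTree as ET

def task_status(xml_text, task_type):
    """Return the <status> of the first task of the given type, or None."""
    root = ET.fromstring(xml_text)
    for task in root.iter("task"):
        if task.findtext("type") == task_type:
            return task.findtext("status")
    return None

# In practice each list entry would be the output of
# 'gluster volume status all --xml' collected from one RHSS node;
# here a single illustrative sample is reused for all four nodes.
sample = """<cliOutput><volStatus><volumes><volume>
  <tasks><task><type>Rebalance</type><status>3</status></task></tasks>
</volume></volumes></volStatus></cliOutput>"""
per_node_xml = [sample] * 4  # four nodes in the trusted storage pool

statuses = {task_status(x, "Rebalance") for x in per_node_xml}
print(len(statuses) == 1)  # True: the status is uniform across the pool
```

With the fix in place, the set of distinct statuses collapses to one value for both the rebalance and remove-brick tasks, matching the observations above.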

Comment 6 Pavithra 2014-01-03 11:04:10 UTC
Kaushal, I've made minor changes. Please verify.

Comment 7 Kaushal 2014-01-16 11:02:29 UTC
Doc text looks fine.

Comment 9 errata-xmlrpc 2014-02-25 07:54:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0208.html