Description of problem:
VDSM has a verb to get the list of tasks on a volume. Internally it:
1. Uses the command "gluster volume remove-brick <vol> <brick> status --xml" to get the status of a remove-brick action on a brick.
2. Uses the command "gluster volume status all tasks --xml" to get the overall status of the tasks.

If there are two hosts in a cluster and the overall task status is queried while a remove-brick action is in progress, the output of the command "gluster volume status all tasks --xml" differs between the two hosts.

Version-Release number of selected component (if applicable):

How reproducible:
Almost always

Steps to Reproduce:
1. Make sure two hosts are present in the peer group.
2. Create a distributed volume with 2 bricks (brick directories from server-1 only).
3. Populate the volume with data.
4. Start remove-brick for one of the bricks on the volume.
5. Individually run the command "gluster volume status all tasks --xml" on each host.

Actual results:
The status values returned on the two hosts differ.

Expected results:
Both hosts should return the same status value.

Additional info:
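For reference, the discrepancy can be seen by parsing the per-task status out of the --xml output on each host and comparing the results, which is essentially what VDSM does. The following is a minimal sketch of such a check, not VDSM's actual code; the XML element names used (task, id, type, statusStr) are an approximation of the gluster CLI's XML layout and may differ between gluster versions.

import subprocess
import xml.etree.ElementTree as ET

def get_task_statuses():
    """Run 'gluster volume status all tasks --xml' locally and return
    a mapping of task id -> (task type, status string).

    Element names below are assumptions about the gluster CLI XML
    output and may need adjusting for a given gluster version.
    """
    out = subprocess.check_output(
        ["gluster", "volume", "status", "all", "tasks", "--xml"])
    root = ET.fromstring(out)
    statuses = {}
    for task in root.iter("task"):
        statuses[task.findtext("id")] = (task.findtext("type"),
                                         task.findtext("statusStr"))
    return statuses

if __name__ == "__main__":
    # Run this on each host while a remove-brick is in progress and compare
    # the printed dictionaries; the mismatch is the inconsistency reported here.
    print(get_task_statuses())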
This bug is blocking the RHSC remove-brick feature, which is reporting inconsistent information because of this issue. We need a fix for this ASAP. Thanks, -Dusmant
The same scenario applies to the volume rebalance task as well.
Verified with glusterfs-3.4.0.51rhs.el6rhs.

Now the "remove-brick" and "rebalance" status information obtained using "gluster volume status all --xml" is uniform across all RHSS Nodes in the Trusted Storage Pool.

Performed the following steps to verify this bug:
1. Created a trusted storage pool of 4 RHSS Nodes
   (i.e.) gluster peer probe <RHSS-NODE-IP>
2. Created a distribute-replicate volume of 6 bricks (3x2)
   (i.e.) gluster volume create <vol-name> replica 2 <brick1>..<brick6>
3. Started the volume
   (i.e.) gluster volume start <vol-name>
4. Fuse mounted the volume
   (i.e.) mount.glusterfs <RHSS-NODE>:<vol-name> <mount-point>
5. Created some files on the mount point
   (i.e.) for i in {1..200}; do dd if=/dev/urandom of=<mount-point>/file$i bs=4k count=1000; done
6. Added a pair of bricks to the volume
   (i.e.) gluster volume add-brick <vol-name> <brick1> <brick2>
7. Started rebalance on the volume
   (i.e.) gluster volume rebalance <vol-name> start
8. Got the status of all volumes using --xml
   (i.e.) gluster volume status all --xml
9. Got the status on all RHSS Nodes
   (i.e.) repeated step 8 on all RHSS Nodes

Observation: rebalance status was consistent across all the nodes.

10. Removed a pair of bricks from the volume
    (i.e.) gluster volume remove-brick <vol-name> <brick1> <brick2> start
11. Repeated steps 8 and 9.

Observation: remove-brick status was consistent across all the RHSS Nodes.
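The cross-node comparison in steps 8, 9 and 11 above can also be automated. The sketch below is only illustrative: the node hostnames are hypothetical placeholders, it assumes password-less ssh to each node, and the XML element names (task, id, statusStr) are assumptions about the gluster CLI XML output.

import subprocess
import xml.etree.ElementTree as ET

# Hypothetical node addresses; replace with the actual RHSS node hostnames/IPs.
NODES = ["rhss-node1", "rhss-node2", "rhss-node3", "rhss-node4"]

def task_statuses(node):
    """Fetch 'gluster volume status all --xml' from one node over ssh and
    return {task id: statusStr} for every task element found."""
    out = subprocess.check_output(
        ["ssh", node, "gluster", "volume", "status", "all", "--xml"])
    root = ET.fromstring(out)
    return {t.findtext("id"): t.findtext("statusStr") for t in root.iter("task")}

def check_consistency():
    """Compare the task statuses reported by every node against the first one."""
    per_node = {node: task_statuses(node) for node in NODES}
    reference = per_node[NODES[0]]
    for node, statuses in per_node.items():
        if statuses != reference:
            print("MISMATCH on %s: %s (expected %s)" % (node, statuses, reference))
        else:
            print("OK on %s: %s" % (node, statuses))

if __name__ == "__main__":
    check_consistency()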
Kaushal, I've made minor changes. Please verify.
Doc text looks fine.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0208.html