Bug 1020331

Summary: The output of "gluster volume status all tasks --xml" and "gluster volume remove-brick <vol> <brick> status --xml" are not in agreement
Product: Red Hat Gluster Storage Reporter: Shubhendu Tripathi <shtripat>
Component: glusterfs Assignee: Kaushal <kaushal>
Status: CLOSED ERRATA QA Contact: SATHEESARAN <sasundar>
Severity: high Docs Contact:
Priority: urgent    
Version: 2.1CC: dpati, grajaiya, kaushal, psriniva, rnachimu, vbellur
Target Milestone: --- Keywords: ZStream
Target Release: RHGS 2.1.2   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.4.0.42.1u2rhs-1 Doc Type: Bug Fix
Doc Text:
Previously, the remove-brick status displayed by the volume status command was inconsistent on different peers, whereas the remove-brick status command displayed a consistent output. With this fix, the status displayed by the volume status command and the remove-brick status command is consistent across the cluster.
Story Points: ---
Clone Of:
: 1027094 (view as bug list) Environment:
Last Closed: 2014-02-25 07:54:34 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On: 1027094    
Bug Blocks: 1015659, 1020189, 1020325, 1021816, 1022511    

Description Shubhendu Tripathi 2013-10-17 13:05:02 UTC
Description of problem:
VDSM has a verb to get the list of tasks on a volume. Internally, it:

1. Uses the command "gluster volume remove-brick <vol> <brick> status --xml" to get the status of remove brick action on a brick.
2. Uses the command "gluster volume status all tasks --xml" to get the overall status of the tasks.

If there are two hosts in a cluster and a remove-brick operation is in progress when the overall task status is queried, the output of "gluster volume status all tasks --xml" differs between the two hosts.
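The inconsistency amounts to the <status> element of the same <task> carrying different values on different peers. A minimal sketch of such a comparison in Python, using illustrative sample XML (the element layout only approximates gluster's cliOutput schema, and the task id and status codes are hypothetical):

```python
import xml.etree.ElementTree as ET

def task_statuses(xml_text):
    """Map task id -> status code, parsed from '--xml' task output."""
    root = ET.fromstring(xml_text)
    return {t.findtext("id"): t.findtext("status") for t in root.iter("task")}

# Illustrative outputs for the same remove-brick task as reported by two
# peers; not real command output.
host1_xml = """<cliOutput><volStatus><volumes><volume>
  <volName>vol1</volName>
  <tasks><task><type>Remove brick</type>
    <id>task-uuid-1</id><status>1</status></task></tasks>
</volume></volumes></volStatus></cliOutput>"""
host2_xml = host1_xml.replace("<status>1</status>", "<status>3</status>")

# The bug: both hosts describe the same task, yet the parsed statuses differ.
print(task_statuses(host1_xml) == task_statuses(host2_xml))  # False
```

A consumer such as VDSM that polls both commands on multiple hosts has no way to reconcile these conflicting answers, which is why the outputs need to agree cluster-wide.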

Version-Release number of selected component (if applicable):


How reproducible:
Almost always

Steps to Reproduce:
1. Make sure two hosts are present in the peer group
2. Create a distributed volume with 2 bricks (brick directories from server-1 only)
3. Populate the volume with data
4. Start remove-brick for one of the bricks on the volume
5. Run the command "gluster volume status all tasks --xml" on each host individually

Actual results:
The status values returned on the hosts differ

Expected results:
Both hosts should return the same status value

Additional info:

Comment 2 Dusmant 2013-10-18 05:02:48 UTC
This bug is blocking the RHSC remove-brick feature, which is showing inconsistent information because of this issue. We need a fix for this ASAP.

Thanks,
-Dusmant

Comment 3 Ramesh N 2013-10-21 11:39:15 UTC
The same scenario applies to the volume rebalance task.

Comment 5 SATHEESARAN 2013-12-23 07:11:00 UTC
Verified with glusterfs-3.4.0.51rhs.el6rhs

Now the remove-brick and rebalance status information obtained using "gluster volume status all --xml" is uniform across all RHSS nodes in the trusted storage pool.

Performed the following steps to verify this bug:
1. Created a trusted storage pool of 4 RHSS nodes
(i.e) gluster peer probe <RHSS-NODE-IP>

2. Created a distribute-replicate volume of 6 bricks (3x2)
(i.e) gluster volume create <vol-name> replica 2 <brick1> .. <brick6>

3. Started the volume
(i.e) gluster volume start <vol-name>

4. Fuse-mounted the volume
(i.e) mount.glusterfs <RHSS-NODE>:<vol-name> <mount-point>

5. Created some files on the mount point
(i.e) for i in {1..200}; do dd if=/dev/urandom of=<mount-point>/file$i bs=4k count=1000; done

6. Added a pair of bricks to the volume
(i.e) gluster volume add-brick <vol-name> <brick1> <brick2>

7. Started rebalance on the volume
(i.e) gluster volume rebalance <vol-name> start

8. Got the status of all volumes using --xml
(i.e) gluster volume status all --xml

9. Repeated step 8 on all RHSS nodes

Observation: rebalance status was consistent across all the nodes

10. Removed a pair of bricks from the volume
(i.e) gluster volume remove-brick <vol-name> <brick1> <brick2> start

11. Repeated steps 8 and 9

Observation: remove-brick status was consistent across all the RHSS nodes
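The per-node check above amounts to asserting that every node in the pool reports a single, identical status for a given task. A hedged sketch of that check, with an inlined illustrative sample standing in for the per-node output of "gluster volume status all --xml" (the element layout and status code are assumptions, not captured output):

```python
import xml.etree.ElementTree as ET

def task_status(xml_text, task_type):
    """Return the <status> of the first task of the given type, or None."""
    root = ET.fromstring(xml_text)
    for task in root.iter("task"):
        if task.findtext("type") == task_type:
            return task.findtext("status")
    return None

# In practice each list entry would be the output of
# 'gluster volume status all --xml' collected from one RHSS node;
# here a single illustrative sample is reused for all four nodes.
sample = """<cliOutput><volStatus><volumes><volume>
  <tasks><task><type>Rebalance</type><status>3</status></task></tasks>
</volume></volumes></volStatus></cliOutput>"""
per_node_xml = [sample] * 4  # four nodes in the trusted storage pool

statuses = {task_status(x, "Rebalance") for x in per_node_xml}
print(len(statuses) == 1)  # True: the status is uniform across the pool
```

With the fix in place, the set of distinct statuses collapses to one value for both the rebalance and remove-brick tasks, matching the observations above.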

Comment 6 Pavithra 2014-01-03 11:04:10 UTC
Kaushal, I've made minor changes. Please verify.

Comment 7 Kaushal 2014-01-16 11:02:29 UTC
Doc text looks fine.

Comment 9 errata-xmlrpc 2014-02-25 07:54:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0208.html