Description of problem:
VDSM has a verb to get the list of tasks on a volume. Internally it:
1. Uses the command "gluster volume remove-brick <vol> <brick> status --xml" to get the status of a remove-brick action on a brick.
2. Uses the command "gluster volume status all tasks --xml" to get the overall status of the tasks.

If there are two hosts in a cluster and the overall task status is queried while a remove-brick action is in progress, the output of the command "gluster volume status all tasks --xml" differs between the two hosts.

Version-Release number of selected component (if applicable):

How reproducible:
Almost always

Steps to Reproduce:
1. Make sure two hosts are present in the peer group.
2. Create a distributed volume with 2 bricks (brick directories from server-1 only).
3. Populate the volume with data.
4. Start remove-brick for one of the bricks on the volume.
5. Individually run the command "gluster volume status all tasks --xml" on each host.

Actual results:
The status values returned on the two hosts differ.

Expected results:
Both hosts should return the same status value.

Additional info:
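For reference, the discrepancy can be seen by parsing the per-task status out of the --xml output on each host and comparing the results, which is essentially what VDSM does. The following is a minimal sketch of such a check, not VDSM's actual code; the XML element names used (task, id, type, statusStr) are an approximation of the gluster CLI's XML layout and may differ between gluster versions.

import subprocess
import xml.etree.ElementTree as ET

def get_task_statuses():
    """Run 'gluster volume status all tasks --xml' locally and return
    a mapping of task id -> (task type, status string).

    Element names below are assumptions about the gluster CLI XML
    output and may need adjusting for a given gluster version.
    """
    out = subprocess.check_output(
        ["gluster", "volume", "status", "all", "tasks", "--xml"])
    root = ET.fromstring(out)
    statuses = {}
    for task in root.iter("task"):
        statuses[task.findtext("id")] = (task.findtext("type"),
                                         task.findtext("statusStr"))
    return statuses

if __name__ == "__main__":
    # Run this on each host while a remove-brick is in progress and compare
    # the printed dictionaries; the mismatch is the inconsistency reported here.
    print(get_task_statuses())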
This bug is blocking the RHSC remove-brick feature, which is reporting inconsistent information because of this issue. We need a fix for this ASAP. Thanks, -Dusmant
The same scenario applies to the volume rebalance task as well.
Verified with glusterfs-3.4.0.51rhs.el6rhs.

Now the "remove-brick" and "rebalance" status information obtained using "gluster volume status all --xml" is uniform across all RHSS Nodes in the Trusted Storage Pool.

Performed the following steps to verify this bug:
1. Created a trusted storage pool of 4 RHSS Nodes
   (i.e.) gluster peer probe <RHSS-NODE-IP>
2. Created a distribute-replicate volume of 6 bricks (3x2)
   (i.e.) gluster volume create <vol-name> replica 2 <brick1>..<brick6>
3. Started the volume
   (i.e.) gluster volume start <vol-name>
4. Fuse mounted the volume
   (i.e.) mount.glusterfs <RHSS-NODE>:<vol-name> <mount-point>
5. Created some files on the mount point
   (i.e.) for i in {1..200}; do dd if=/dev/urandom of=<mount-point>/file$i bs=4k count=1000; done
6. Added a pair of bricks to the volume
   (i.e.) gluster volume add-brick <vol-name> <brick1> <brick2>
7. Started rebalance on the volume
   (i.e.) gluster volume rebalance <vol-name> start
8. Got the status of all volumes using --xml
   (i.e.) gluster volume status all --xml
9. Got the status on all RHSS Nodes
   (i.e.) repeated step 8 on all RHSS Nodes

Observation: rebalance status was consistent across all the nodes.

10. Removed a pair of bricks from the volume
    (i.e.) gluster volume remove-brick <vol-name> <brick1> <brick2> start
11. Repeated steps 8 and 9.

Observation: remove-brick status was consistent across all the RHSS Nodes.
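The cross-node comparison in steps 8, 9 and 11 above can also be automated. The sketch below is only illustrative: the node hostnames are hypothetical placeholders, it assumes password-less ssh to each node, and the XML element names (task, id, statusStr) are assumptions about the gluster CLI XML output.

import subprocess
import xml.etree.ElementTree as ET

# Hypothetical node addresses; replace with the actual RHSS node hostnames/IPs.
NODES = ["rhss-node1", "rhss-node2", "rhss-node3", "rhss-node4"]

def task_statuses(node):
    """Fetch 'gluster volume status all --xml' from one node over ssh and
    return {task id: statusStr} for every task element found."""
    out = subprocess.check_output(
        ["ssh", node, "gluster", "volume", "status", "all", "--xml"])
    root = ET.fromstring(out)
    return {t.findtext("id"): t.findtext("statusStr") for t in root.iter("task")}

def check_consistency():
    """Compare the task statuses reported by every node against the first one."""
    per_node = {node: task_statuses(node) for node in NODES}
    reference = per_node[NODES[0]]
    for node, statuses in per_node.items():
        if statuses != reference:
            print("MISMATCH on %s: %s (expected %s)" % (node, statuses, reference))
        else:
            print("OK on %s: %s" % (node, statuses))

if __name__ == "__main__":
    check_consistency()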
Kaushal, I've made minor changes. Please verify.
Doc text looks fine.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0208.html