Description of problem:
===========================
The rebalance status command shows an incorrect status for a node that was rebooted while rebalance was in progress.

Version-Release number of selected component (if applicable):
===========================================================
3.4.0.14rhs-1.el6rhs.x86_64.rpm

How reproducible:
==================
Always

Steps to Reproduce:
====================
1. Create a 2x2 distributed-replicate volume.

2. FUSE mount the volume and create some files:

for i in {1..300} ; do dd if=/dev/urandom of=f"$i" bs=10M count=1; done

3. Add 2 bricks to the volume and start rebalance.

4. Some failures are reported in rebalance status due to a space issue. Execute the rebalance start force command and check the status.

gluster v rebalance dist_repl status
Node         Rebalanced-files  size     scanned  failures  status       run time in secs
-----        ----------------  ----     -------  --------  ------       ----------------
localhost    17                170.0MB  80       0         in progress  5.00
10.70.34.88  0                 0Bytes   304      0         completed    1.00
10.70.34.86  0                 0Bytes   304      0         completed    0.00
10.70.34.87  14                140.0MB  176      0         in progress  5.00

5. Reboot one of the nodes: 10.70.34.87

6. Check rebalance status.

gluster v rebalance dist_repl status
Node         Rebalanced-files  size     scanned  failures  status       run time in secs
-----        ----------------  ----     -------  --------  ------       ----------------
localhost    36                360.0MB  336      0         completed    5.00
10.70.34.88  0                 0Bytes   304      0         completed    1.00
10.70.34.86  0                 0Bytes   304      0         completed    0.00
volume rebalance: dist_repl: success:

7. After the node comes back online, check the status again. The status of the rebooted node is shown as 'not started'.

Node         Rebalanced-files  size     scanned  failures  status       run time in secs
-----        ----------------  ----     -------  --------  ------       ----------------
localhost    36                360.0MB  336      0         completed    15.00
10.70.34.88  0                 0Bytes   304      0         completed    0.00
10.70.34.86  0                 0Bytes   304      0         completed    0.00
10.70.34.87  0                 0Bytes   0        0         not started  0.00
volume rebalance: dist_repl: success:

Actual results:
=================
Rebalance status shows 'not started' for the rebooted node after it comes back online.

Expected results:
=================
Rebalance status should show 'completed' if the rebalance process has completed. If the rebalance process was in progress when the node went down, it should be started again when the node comes back online.

Additional info:
=================
[root@boost tmp]# gluster v i dist_repl

Volume Name: dist_repl
Type: Distributed-Replicate
Volume ID: 767665be-5dba-4a74-8b1e-251bb9d91f50
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.34.85:/rhs/brick1/ab1
Brick2: 10.70.34.86:/rhs/brick1/ab2
Brick3: 10.70.34.87:/rhs/brick1/ab3
Brick4: 10.70.34.88:/rhs/brick1/ab4
Brick5: 10.70.34.86:/rhs/brick1/ab5
Brick6: 10.70.34.87:/rhs/brick1/ab6

sosreports : http://rhsqe-repo.lab.eng.blr.redhat.com/bugs_necessary_info/991025/
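
The comparison between the status output before the reboot (step 4) and after the node rejoins (step 7) can be checked mechanically. The following is a minimal sketch, assuming the status output keeps the whitespace-separated seven-column layout shown in this report. The helper names `parse_rebalance_status` and `regressed_nodes` are hypothetical and are not part of gluster; the sample text is copied from the outputs above.

```python
from typing import List, NamedTuple


class NodeStatus(NamedTuple):
    node: str
    rebalanced_files: int
    size: str
    scanned: int
    failures: int
    status: str
    run_time_secs: float


def parse_rebalance_status(text: str) -> List[NodeStatus]:
    """Parse `gluster v rebalance <vol> status` output into rows.

    Assumption: each data row is
    node, files, size, scanned, failures, status (1+ words), run time.
    Header, separator, and summary lines fail the numeric checks and are skipped.
    """
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) < 7:
            continue
        try:
            files = int(tokens[1])
            scanned = int(tokens[3])
            failures = int(tokens[4])
            run_time = float(tokens[-1])
        except ValueError:
            continue
        # The status may span several words, e.g. "in progress", "not started".
        status = " ".join(tokens[5:-1])
        rows.append(NodeStatus(tokens[0], files, tokens[2], scanned,
                               failures, status, run_time))
    return rows


def regressed_nodes(before: str, after: str) -> List[str]:
    """Nodes that were 'in progress' before and report 'not started' after."""
    was_running = {r.node for r in parse_rebalance_status(before)
                   if r.status == "in progress"}
    return [r.node for r in parse_rebalance_status(after)
            if r.node in was_running and r.status == "not started"]


# Sample outputs taken from steps 4 and 7 of this report.
BEFORE = """\
Node Rebalanced-files size scanned failures status run time in secs
----- ---------------- ---- -------- -------- ------- -----------------
localhost 17 170.0MB 80 0 in progress 5.00
10.70.34.88 0 0Bytes 304 0 completed 1.00
10.70.34.86 0 0Bytes 304 0 completed 0.00
10.70.34.87 14 140.0MB 176 0 in progress 5.00
"""

AFTER = """\
Node Rebalanced-files size scanned failures status run time in secs
----- ---------------- ---- -------- -------- ------- -----------------
localhost 36 360.0MB 336 0 completed 15.00
10.70.34.88 0 0Bytes 304 0 completed 0.00
10.70.34.86 0 0Bytes 304 0 completed 0.00
10.70.34.87 0 0Bytes 0 0 not started 0.00
volume rebalance: dist_repl: success:
"""

if __name__ == "__main__":
    print(regressed_nodes(BEFORE, AFTER))
```

On the sample data, only the rebooted node is flagged, since its counters were reset to zero and its status went from 'in progress' to 'not started'. In a live test the two strings would instead be captured from the CLI before the reboot and after the node rejoins.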