Bug 1035698
| Field | Value |
|---|---|
| Summary | "rebalance status" not showing any/some of the online storage nodes when some of the nodes in the cluster are offline |
| Product | [Red Hat Storage] Red Hat Gluster Storage |
| Reporter | spandura |
| Component | distribute |
| Assignee | Nithya Balachandran <nbalacha> |
| Status | CLOSED DEFERRED |
| QA Contact | storage-qa-internal <storage-qa-internal> |
| Severity | high |
| Priority | unspecified |
| Version | 2.1 |
| CC | grajaiya, kaushal, spalai, vbellur |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | Bug Fix |
| Clones | 1286160 (view as bug list) |
| Last Closed | 2015-11-27 12:11:07 UTC |
| Type | Bug |
| Bug Blocks | 1286160 |
Description
spandura
2013-11-28 10:12:57 UTC
Once again able to recreate the issue with the following case on the build glusterfs 3.4.0.57rhs, built on Jan 13 2014 06:59:05. The following case was executed on AWS-RHS instances.

1) Create 2 x 3 distribute-replicate volumes (3 volumes: exporter, importer, ftp; ftp doesn't have any data, only exporter and importer have data)
2) Filled each brick with 320GB of data
3) Terminated an instance (NODE2)
4) Replaced the terminated instance
5) Started heal full (heal completed successfully)
6) Brick disks got almost full; 20GB remained free out of 840GB
7) Added 3 nodes to the pool
8) Added 3 bricks to the volumes (exporter, importer)
9) Started rebalance on exporter and importer
10) While rebalance was in progress, terminated NODE5 and NODE9

Rebalance status while all nodes were up:

```
root@domU-12-31-39-0A-99-B2 [Jan-21-2014- 7:24:48] >gluster v rebalance importer status
Node                                       Rebalanced-files  size    scanned  failures  skipped  status       run time in secs
localhost                                  6540              35.3GB  43532    0         592      in progress  13061.00
ip-10-234-21-235.ec2.internal              0                 0Bytes  157780   0         0        completed    10310.00
ip-10-2-34-53.ec2.internal                 6514              35.4GB  61381    0         3092     in progress  13060.00
ip-10-114-195-155.ec2.internal             0                 0Bytes  157781   0         0        completed    10310.00
ip-10-159-26-108.ec2.internal              0                 0Bytes  157780   0         0        completed    10310.00
ip-10-194-111-63.ec2.internal              0                 0Bytes  157781   0         0        completed    10310.00
domU-12-31-39-07-74-A5.compute-1.internal  0                 0Bytes  157781   0         4506     completed    10680.00
ip-10-62-118-194.ec2.internal              0                 0Bytes  157781   0         0        completed    10310.00
ip-10-182-195-170.ec2.internal             0                 0Bytes  157781   0         0        completed    10309.00
volume rebalance: importer: success:

root@domU-12-31-39-0A-99-B2 [Jan-21-2014- 7:24:53] >gluster v rebalance exporter status
Node                                       Rebalanced-files  size    scanned  failures  skipped  status       run time in secs
localhost                                  4914              26.3GB  40222    0         295      in progress  13180.00
ip-10-234-21-235.ec2.internal              0                 0Bytes  159575   0         0        completed    11259.00
ip-10-2-34-53.ec2.internal                 5338              29.0GB  53529    0         2377     in progress  13180.00
ip-10-114-195-155.ec2.internal             0                 0Bytes  158415   0         0        completed    11636.00
ip-10-159-26-108.ec2.internal              0                 0Bytes  158406   0         0        completed    11259.00
ip-10-194-111-63.ec2.internal              0                 0Bytes  158417   0         0        completed    11633.00
domU-12-31-39-07-74-A5.compute-1.internal  0                 0Bytes  158421   0         1281     completed    11671.00
ip-10-62-118-194.ec2.internal              0                 0Bytes  158416   0         0        completed    11427.00
ip-10-182-195-170.ec2.internal             0                 0Bytes  158405   0         0        completed    11260.00
volume rebalance: exporter: success:

root@domU-12-31-39-0A-99-B2 [Jan-21-2014- 7:25:10] >df -h
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/xvda1  99G   1.8G  96G    2%    /
none        3.7G  0     3.7G   0%    /dev/shm
/dev/md0    840G  771G  69G    92%   /rhs/bricks
```

Volume status after NODE5 and NODE9 were terminated:

```
root@domU-12-31-39-0A-99-B2 [Jan-21-2014- 7:47:03] >gluster v status
Status of volume: exporter
Gluster process                                                       Port   Online  Pid
Brick domU-12-31-39-0A-99-B2.compute-1.internal:/rhs/bricks/exporter  49152  Y       19405
Brick ip-10-194-111-63.ec2.internal:/rhs/bricks/exporter              49152  Y       6496
Brick ip-10-234-21-235.ec2.internal:/rhs/bricks/exporter              49152  Y       20226
Brick ip-10-2-34-53.ec2.internal:/rhs/bricks/exporter                 49152  Y       20910
Brick ip-10-159-26-108.ec2.internal:/rhs/bricks/exporter              49152  Y       20196
Brick domU-12-31-39-07-74-A5.compute-1.internal:/rhs/bricks/exporter  49152  Y       6553
Brick ip-10-62-118-194.ec2.internal:/rhs/bricks/exporter              49152  Y       6391
NFS Server on localhost                                               2049   Y       30972
Self-heal Daemon on localhost                                         N/A    Y       30985
NFS Server on ip-10-234-21-235.ec2.internal                           2049   Y       30866
Self-heal Daemon on ip-10-234-21-235.ec2.internal                     N/A    Y       30873
NFS Server on ip-10-2-34-53.ec2.internal                              2049   Y       1260
Self-heal Daemon on ip-10-2-34-53.ec2.internal                        N/A    Y       1267
NFS Server on ip-10-159-26-108.ec2.internal                           2049   Y       3153
Self-heal Daemon on ip-10-159-26-108.ec2.internal                     N/A    Y       3160
NFS Server on ip-10-194-111-63.ec2.internal                           2049   Y       16623
Self-heal Daemon on ip-10-194-111-63.ec2.internal                     N/A    Y       16630
NFS Server on ip-10-62-118-194.ec2.internal                           2049   Y       6498
Self-heal Daemon on ip-10-62-118-194.ec2.internal                     N/A    Y       6505
NFS Server on domU-12-31-39-07-74-A5.compute-1.internal               2049   Y       6658
Self-heal Daemon on domU-12-31-39-07-74-A5.compute-1.internal         N/A    Y       6665

Task Status of Volume exporter
Task   : Rebalance
ID     : c04d1bda-2482-4930-a69c-d6e0c76a1660
Status : in progress

Status of volume: ftp
Gluster process                                                  Port   Online  Pid
Brick domU-12-31-39-0A-99-B2.compute-1.internal:/rhs/bricks/ftp  49154  Y       19541
Brick ip-10-194-111-63.ec2.internal:/rhs/bricks/ftp              49154  Y       6659
Brick ip-10-234-21-235.ec2.internal:/rhs/bricks/ftp              49154  Y       20333
Brick ip-10-2-34-53.ec2.internal:/rhs/bricks/ftp                 49154  Y       21023
Brick ip-10-159-26-108.ec2.internal:/rhs/bricks/ftp              49154  Y       20301
NFS Server on localhost                                          2049   Y       30972
Self-heal Daemon on localhost                                    N/A    Y       30985
NFS Server on ip-10-234-21-235.ec2.internal                      2049   Y       30866
Self-heal Daemon on ip-10-234-21-235.ec2.internal                N/A    Y       30873
NFS Server on ip-10-159-26-108.ec2.internal                      2049   Y       3153
Self-heal Daemon on ip-10-159-26-108.ec2.internal                N/A    Y       3160
NFS Server on ip-10-2-34-53.ec2.internal                         2049   Y       1260
Self-heal Daemon on ip-10-2-34-53.ec2.internal                   N/A    Y       1267
NFS Server on ip-10-194-111-63.ec2.internal                      2049   Y       16623
Self-heal Daemon on ip-10-194-111-63.ec2.internal                N/A    Y       16630
NFS Server on domU-12-31-39-07-74-A5.compute-1.internal          2049   Y       6658
Self-heal Daemon on domU-12-31-39-07-74-A5.compute-1.internal    N/A    Y       6665
NFS Server on ip-10-62-118-194.ec2.internal                      2049   Y       6498
Self-heal Daemon on ip-10-62-118-194.ec2.internal                N/A    Y       6505

Task Status of Volume ftp
There are no active volume tasks

Status of volume: importer
Gluster process                                                       Port   Online  Pid
Brick domU-12-31-39-0A-99-B2.compute-1.internal:/rhs/bricks/importer  49153  Y       19470
Brick ip-10-194-111-63.ec2.internal:/rhs/bricks/importer              49153  Y       6615
Brick ip-10-234-21-235.ec2.internal:/rhs/bricks/importer              49153  Y       20275
Brick ip-10-2-34-53.ec2.internal:/rhs/bricks/importer                 49153  Y       20960
Brick ip-10-159-26-108.ec2.internal:/rhs/bricks/importer              49153  Y       20245
Brick domU-12-31-39-07-74-A5.compute-1.internal:/rhs/bricks/importer  49153  Y       6646
Brick ip-10-62-118-194.ec2.internal:/rhs/bricks/importer              49153  Y       6486
NFS Server on localhost                                               2049   Y       30972
Self-heal Daemon on localhost                                         N/A    Y       30985
NFS Server on ip-10-234-21-235.ec2.internal                           2049   Y       30866
Self-heal Daemon on ip-10-234-21-235.ec2.internal                     N/A    Y       30873
NFS Server on domU-12-31-39-07-74-A5.compute-1.internal               2049   Y       6658
Self-heal Daemon on domU-12-31-39-07-74-A5.compute-1.internal         N/A    Y       6665
NFS Server on ip-10-194-111-63.ec2.internal                           2049   Y       16623
Self-heal Daemon on ip-10-194-111-63.ec2.internal                     N/A    Y       16630
NFS Server on ip-10-159-26-108.ec2.internal                           2049   Y       3153
Self-heal Daemon on ip-10-159-26-108.ec2.internal                     N/A    Y       3160
NFS Server on ip-10-2-34-53.ec2.internal                              2049   Y       1260
Self-heal Daemon on ip-10-2-34-53.ec2.internal                        N/A    Y       1267
NFS Server on ip-10-62-118-194.ec2.internal                           2049   Y       6498
Self-heal Daemon on ip-10-62-118-194.ec2.internal                     N/A    Y       6505

Task Status of Volume importer
Task   : Rebalance
ID     : 6b5030f9-fdc1-41de-9f05-ea9292a96955
Status : in progress
```

Rebalance status after the termination — only 3 of the online nodes are listed:

```
root@domU-12-31-39-0A-99-B2 [Jan-21-2014- 7:47:07] >gluster v rebalance exporter status
Node                           Rebalanced-files  size    scanned  failures  skipped  status       run time in secs
localhost                      5738              30.8GB  47148    0         295      in progress  14512.00
ip-10-234-21-235.ec2.internal  0                 0Bytes  159575   0         0        completed    11259.00
ip-10-2-34-53.ec2.internal     6148              33.6GB  61869    0         2707     in progress  14512.00
volume rebalance: exporter: success:

root@domU-12-31-39-0A-99-B2 [Jan-21-2014- 7:47:12] >gluster v rebalance importer status
Node                           Rebalanced-files  size    scanned  failures  skipped  status       run time in secs
localhost                      7473              40.2GB  51338    0         754      in progress  14413.00
ip-10-234-21-235.ec2.internal  0                 0Bytes  157780   0         0        completed    10310.00
ip-10-2-34-53.ec2.internal     7386              40.1GB  70773    0         3270     in progress  14412.00
volume rebalance: importer: success:
```

Following are the cases to recreate the issue:

Case 1:
1) Create a 2 x 3 distribute-replicate volume
2) Create files/dirs from the mount point
3) Add 3 more bricks to the volume. Start rebalance. Check the rebalance status
4) Power off a storage node
5) Check the rebalance status

Result:
Incomplete status.
It doesn't show the rebalance status for some of the online nodes.

Case 2:
1) Create a 2 x 3 distribute-replicate volume
2) Create files/dirs from the mount point
3) Add 3 more bricks to the volume. Start rebalance. Check the rebalance status
4) Power off a storage node
5) Check the rebalance status
6) Replace the brick on the powered-off node with a new node (commit force)
7) Check the rebalance status

Actual result:
Even though the powered-off storage node is no longer part of any volume, the rebalance status is incomplete. It doesn't show the rebalance status for all of the online nodes.

8) peer detach the powered-off node and then execute "rebalance status"

Result:
Shows the rebalance status for all of the online nodes.

This happens due to changes made to show the rebalance status output in a consistent sequence. Earlier, the status information returned by different peers was indexed based on the order in which the peers responded, which led to inconsistent ordering of the rebalance status output. To get consistent ordering, each peer is now given a fixed index based on its position in the peer list. With this change, it is now possible to have holes in the indexed status information if a peer is down or unreachable. But the cli output code hasn't been updated to account for these holes: it stops abruptly whenever it hits a hole, even if further status information is available. This wasn't a problem earlier, as holes in the indices couldn't occur.

(In reply to Kaushal from comment #4)
> This happens due to changes done to show the rebalance status output in a
> consistent sequence.

Can you please provide the RFE bug on why this change was introduced?

The change was introduced as a fix for bug 888390.

Test Case:
1. Create a 2 x 2 distribute-replicate volume. Start the volume (4 storage nodes)
2. Create a fuse mount. Create files/dirs from the mount point
3. Add bricks to the volume
4. Start rebalance. While rebalance is in progress, reboot node1 and node3
5. Check the rebalance status from a node which is part of the cluster but not part of the volume

Output:

rebalance status before node reboot:

```
root@mia [Jul-04-2014-11:58:08] >gluster v rebalance rep status
Node          Rebalanced-files  size    scanned  failures  skipped  status       run time in secs
10.70.36.35   587               11.4MB  2927     1         55       in progress  91.00
rhs-client12  0                 0Bytes  6104     0         0        in progress  91.00
rhs-client14  0                 0Bytes  6104     0         0        in progress  91.00
rhs-client13  628               13.1MB  2744     0         0        in progress  91.00
volume rebalance: rep: success:
```

rebalance status when the nodes are down:

```
root@mia [Jul-04-2014-12:00:31] >gluster v rebalance rep status
Node          Rebalanced-files  size    scanned  failures  skipped  status       run time in secs
volume rebalance: rep: success:
```

Cloning this to 3.1, to be fixed in a future release.
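The failure mode described in comment 4 — the CLI rendering loop aborting at the first missing peer index instead of skipping it — can be illustrated with a minimal sketch. This is a hypothetical Python model (the real code is the C-based glusterfs CLI); the `statuses` dict, `render_buggy`, and `render_fixed` are invented names for illustration only.

```python
# Hypothetical model of the indexed rebalance-status data described above.
# Each peer gets a fixed index from its position in the peer list, so a
# down or unreachable peer leaves a hole (here, indices 4 and 5).
statuses = {
    1: ("localhost", "in progress"),
    2: ("ip-10-234-21-235.ec2.internal", "completed"),
    3: ("ip-10-2-34-53.ec2.internal", "in progress"),
    6: ("ip-10-194-111-63.ec2.internal", "completed"),
}
peer_count = 6

def render_buggy(statuses, peer_count):
    """Models the reported bug: the loop stops at the first hole,
    hiding every online node whose index comes after it."""
    rows = []
    for i in range(1, peer_count + 1):
        if i not in statuses:
            break  # a hole aborts the whole listing
        rows.append(statuses[i])
    return rows

def render_fixed(statuses, peer_count):
    """Models the expected behaviour: holes are skipped, so every
    node that actually responded is still reported."""
    rows = []
    for i in range(1, peer_count + 1):
        if i not in statuses:
            continue  # tolerate holes left by down peers
        rows.append(statuses[i])
    return rows
```

With the sample data, the buggy loop reports only the first three peers and silently drops the online peer at index 6, matching the truncated `gluster v rebalance ... status` output shown above, while the hole-tolerant loop reports all four responding peers.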