Description of problem:
============================
On AWS, we had a 4 x 3 distributed-replicate volume (12 nodes, 1 brick per node). On these nodes the SSH, glusterd (24007) and glusterfsd (49152-49200) ports were open. Disk usage of the volume was almost 99%, so 3 more EC2 instances were added to the storage pool. On these 3 instances only the SSH and glusterd (24007) ports were open; the glusterfsd ports (49152-49200) were blocked at the time the instances were created. Three bricks, one from each of these nodes, were added to the volume and rebalance was started.

After some time the rebalance failed on all the bricks of replicate-0, replicate-1, replicate-2 and replicate-3. Rebalance completed on replicate-4, but no data was migrated; only link files were created on the bricks of the replicate-4 subvolume.

Since the bricks in replicate-0, replicate-1, replicate-2 and replicate-3 cannot reach the bricks in replicate-4, rebalance fails on all bricks of subvolumes 0-3. The rebalance processes on client-0, client-1 and client-2 of the replicate-4 subvolume perform lookups and create linkto files on their own bricks for all files that hash to their own subvolume.

Also, since each node in the subvolume cannot reach the other bricks in the subvolume, each brick marks the AFR extended attributes for data and metadata self-heal against the other bricks, leaving the files in a split-brain state.

The number of files in split-brain on client-0 and client-1 is the same, but client-2 has fewer files in split-brain. When lookups happen on the files and rebalance creates linkto files, the same number of files is expected on all 3 bricks of the subvolume.

Version-Release number of selected component (if applicable):
==================================================================
glusterfs 3.4.0.58rhs built on Jan 25 2014 07:04:08

How reproducible:
====================

Steps to Reproduce:
=======================
1.
Create a 4 x 3 distributed-replicate volume and start it. Create a large set of files and directories from a FUSE mount.
2. Add 3 more nodes to the storage pool. Block the INBOUND glusterfsd ports on these 3 nodes, i.e. all incoming requests on the brick ports should be blocked. A process running on any of the 3 nodes can then access only its own local bricks, while the 3 nodes can still access all bricks in the replicate-0, replicate-1, replicate-2 and replicate-3 subvolumes.
3. Add 3 bricks, one from each of the newly added nodes, to the volume, changing the volume type to 5 x 3.
4. Start rebalance on the volume.

Actual results:
==================
root@domU-12-31-39-0A-99-B2 [Jan-24-2014-12:56:30] >gluster v add-brick exporter replica 3 domU-12-31-39-14-3E-21.compute-1.internal:/rhs/bricks/exporter ip-10-38-175-12.ec2.internal:/rhs/bricks/exporter ip-10-182-160-197.ec2.internal:/rhs/bricks/exporter
volume add-brick: success

root@domU-12-31-39-0A-99-B2 [Jan-24-2014-12:56:53] >gluster v status
Status of volume: exporter
Gluster process                                                       Port   Online  Pid
------------------------------------------------------------------------------
Brick domU-12-31-39-0A-99-B2.compute-1.internal:/rhs/bricks/exporter  49152  Y  19405
Brick ip-10-194-111-63.ec2.internal:/rhs/bricks/exporter              49152  Y  3812
Brick ip-10-182-165-181.ec2.internal:/rhs/bricks/exporter             49152  Y  3954
Brick ip-10-46-226-179.ec2.internal:/rhs/bricks/exporter              49152  Y  3933
Brick ip-10-83-5-197.ec2.internal:/rhs/bricks/exporter                49152  Y  8705
Brick ip-10-159-26-108.ec2.internal:/rhs/bricks/exporter              49152  Y  20196
Brick domU-12-31-39-07-74-A5.compute-1.internal:/rhs/bricks/exporter  49152  Y  6553
Brick ip-10-80-109-233.ec2.internal:/rhs/bricks/exporter              49152  Y  6450
Brick ip-10-181-128-26.ec2.internal:/rhs/bricks/exporter              49152  Y  8569
Brick domU-12-31-39-0B-DC-01.compute-1.internal:/rhs/bricks/exporter  49152  Y  7145
Brick ip-10-34-105-112.ec2.internal:/rhs/bricks/exporter              49152  Y  7123
Brick ip-10-232-7-75.ec2.internal:/rhs/bricks/exporter                49152  Y  3935
Brick domU-12-31-39-14-3E-21.compute-1.internal:/rhs/bricks/exporter  49152  Y  9540
Brick ip-10-38-175-12.ec2.internal:/rhs/bricks/exporter               49152  Y  9084
Brick ip-10-182-160-197.ec2.internal:/rhs/bricks/exporter             49152  Y  9075
NFS Server on localhost                                               2049   Y  7543
Self-heal Daemon on localhost                                         N/A    Y  7550
NFS Server on domU-12-31-39-0B-DC-01.compute-1.internal               2049   Y  1750
Self-heal Daemon on domU-12-31-39-0B-DC-01.compute-1.internal         N/A    Y  1757
NFS Server on domU-12-31-39-14-3E-21.compute-1.internal               2049   Y  9592
Self-heal Daemon on domU-12-31-39-14-3E-21.compute-1.internal         N/A    Y  9599
NFS Server on ip-10-181-128-26.ec2.internal                           2049   Y  8401
Self-heal Daemon on ip-10-181-128-26.ec2.internal                     N/A    Y  8408
NFS Server on ip-10-182-165-181.ec2.internal                          2049   Y  10799
Self-heal Daemon on ip-10-182-165-181.ec2.internal                    N/A    Y  10806
NFS Server on ip-10-182-160-197.ec2.internal                          2049   Y  9129
Self-heal Daemon on ip-10-182-160-197.ec2.internal                    N/A    Y  9136
NFS Server on ip-10-232-7-75.ec2.internal                             2049   Y  11144
Self-heal Daemon on ip-10-232-7-75.ec2.internal                       N/A    Y  11151
NFS Server on ip-10-194-111-63.ec2.internal                           2049   Y  27361
Self-heal Daemon on ip-10-194-111-63.ec2.internal                     N/A    Y  27368
NFS Server on ip-10-34-105-112.ec2.internal                           2049   Y  1447
Self-heal Daemon on ip-10-34-105-112.ec2.internal                     N/A    Y  1454
NFS Server on ip-10-159-26-108.ec2.internal                           2049   Y  11936
Self-heal Daemon on ip-10-159-26-108.ec2.internal                     N/A    Y  11943
NFS Server on ip-10-80-109-233.ec2.internal                           2049   Y  10757
Self-heal Daemon on ip-10-80-109-233.ec2.internal                     N/A    Y  10764
NFS Server on ip-10-38-175-12.ec2.internal                            2049   Y  9144
Self-heal Daemon on ip-10-38-175-12.ec2.internal                      N/A    Y  9151
NFS Server on ip-10-83-5-197.ec2.internal                             2049   Y  12427
Self-heal Daemon on ip-10-83-5-197.ec2.internal                       N/A    Y  12434
NFS Server on ip-10-46-226-179.ec2.internal                           2049   Y  12138
Self-heal Daemon on ip-10-46-226-179.ec2.internal                     N/A    Y  12145
NFS Server on domU-12-31-39-07-74-A5.compute-1.internal               2049   Y  10208
Self-heal Daemon on domU-12-31-39-07-74-A5.compute-1.internal         N/A    Y  10215

Task Status of Volume exporter
------------------------------------------------------------------------------
There are no active volume tasks

root@domU-12-31-39-0A-99-B2 [Jan-24-2014-12:56:57] >gluster v rebalance exporter start
volume rebalance: exporter: success: Starting rebalance on volume exporter has been successful.
ID: 26d198ec-658e-4b4c-80aa-51b225eccee2

root@domU-12-31-39-0A-99-B2 [Jan-28-2014- 2:43:06] >gluster v rebalance exporter status
Node                                       Rebalanced-files  size    scanned  failures  skipped  status     run time in secs
localhost                                  0                 0Bytes  0        1         0        failed     0.00
ip-10-159-26-108.ec2.internal              0                 0Bytes  0        1         0        failed     0.00
ip-10-194-111-63.ec2.internal              0                 0Bytes  0        1         0        failed     0.00
domU-12-31-39-07-74-A5.compute-1.internal  0                 0Bytes  0        1         0        failed     0.00
ip-10-83-5-197.ec2.internal                0                 0Bytes  0        1         0        failed     0.00
ip-10-181-128-26.ec2.internal              0                 0Bytes  0        1         0        failed     0.00
domU-12-31-39-0B-DC-01.compute-1.internal  0                 0Bytes  0        1         0        failed     0.00
ip-10-34-105-112.ec2.internal              0                 0Bytes  0        1         0        failed     0.00
ip-10-182-165-181.ec2.internal             0                 0Bytes  0        1         0        failed     0.00
ip-10-46-226-179.ec2.internal              0                 0Bytes  0        1         0        failed     0.00
ip-10-80-109-233.ec2.internal              0                 0Bytes  0        1         0        failed     0.00
ip-10-232-7-75.ec2.internal                0                 0Bytes  0        1         0        failed     0.00
domU-12-31-39-14-3E-21.compute-1.internal  0                 0Bytes  309739   0         0        completed  33302.00
ip-10-38-175-12.ec2.internal               0                 0Bytes  309735   0         0        completed  33302.00
ip-10-182-160-197.ec2.internal             0                 0Bytes  310382   0         0        completed  28729.00
volume rebalance: exporter: success:
root@domU-12-31-39-0A-99-B2 [Jan-28-2014- 2:43:13] >

root@domU-12-31-39-0A-99-B2 [Jan-29-2014- 6:56:02] >gluster v heal exporter info | grep entries
Number of entries: 0
Number of entries: 0
Number of entries: 0
Number of entries: 0
Number of entries: 0
Number of entries: 0
Number of entries: 0
Number of entries: 0
Number of entries: 0
Number of entries: 0
Number of entries: 0
Number of entries: 0
Number of entries: 92755
Number of entries: 92755
Number of entries: 48120
root@domU-12-31-39-0A-99-B2 [Jan-29-2014- 8:38:01] >

client-0 of replicate-4:-
=========================
root@domU-12-31-39-14-3E-21 [Jan-30-2014- 5:36:36] >ls /rhs/bricks/exporter/.glusterfs/indices/xattrop/ | wc
  92756   92756 3431980
root@domU-12-31-39-14-3E-21 [Jan-30-2014- 5:36:48] >

client-1 of replicate-4:-
=========================
root@ip-10-38-175-12 [Jan-30-2014- 5:36:36] >ls /rhs/bricks/exporter/.glusterfs/indices/xattrop/ | wc
  92756   92756 3431980
root@ip-10-38-175-12 [Jan-30-2014- 5:36:48] >

client-2 of replicate-4:-
=========================
root@ip-10-182-160-197 [Jan-30-2014- 5:36:36] >ls /rhs/bricks/exporter/.glusterfs/indices/xattrop/ | wc
  48121   48121 1780485
root@ip-10-182-160-197 [Jan-30-2014- 5:36:48] >

Expected results:
======================
TBD

Additional info:
=======================
root@domU-12-31-39-0A-99-B2 [Jan-30-2014- 6:59:37] >gluster v info

Volume Name: exporter
Type: Distributed-Replicate
Volume ID: 31e01742-36c4-4fbf-bffb-bc9ae98920a7
Status: Started
Number of Bricks: 5 x 3 = 15
Transport-type: tcp
Bricks:
Brick1: domU-12-31-39-0A-99-B2.compute-1.internal:/rhs/bricks/exporter
Brick2: ip-10-194-111-63.ec2.internal:/rhs/bricks/exporter
Brick3: ip-10-182-165-181.ec2.internal:/rhs/bricks/exporter
Brick4: ip-10-46-226-179.ec2.internal:/rhs/bricks/exporter
Brick5: ip-10-83-5-197.ec2.internal:/rhs/bricks/exporter
Brick6: ip-10-159-26-108.ec2.internal:/rhs/bricks/exporter
Brick7: domU-12-31-39-07-74-A5.compute-1.internal:/rhs/bricks/exporter
Brick8: ip-10-80-109-233.ec2.internal:/rhs/bricks/exporter
Brick9: ip-10-181-128-26.ec2.internal:/rhs/bricks/exporter
Brick10: domU-12-31-39-0B-DC-01.compute-1.internal:/rhs/bricks/exporter
Brick11: ip-10-34-105-112.ec2.internal:/rhs/bricks/exporter
Brick12: ip-10-232-7-75.ec2.internal:/rhs/bricks/exporter
Brick13: domU-12-31-39-14-3E-21.compute-1.internal:/rhs/bricks/exporter
Brick14: ip-10-38-175-12.ec2.internal:/rhs/bricks/exporter
Brick15: ip-10-182-160-197.ec2.internal:/rhs/bricks/exporter

root@domU-12-31-39-0A-99-B2 [Jan-28-2014- 2:42:54] >gluster v status
Status of volume: exporter
Gluster process                                                       Port   Online  Pid
------------------------------------------------------------------------------
Brick domU-12-31-39-0A-99-B2.compute-1.internal:/rhs/bricks/exporter  49152  Y  19405
Brick ip-10-194-111-63.ec2.internal:/rhs/bricks/exporter              49152  Y  3812
Brick ip-10-182-165-181.ec2.internal:/rhs/bricks/exporter             49152  Y  3954
Brick ip-10-46-226-179.ec2.internal:/rhs/bricks/exporter              49152  Y  3933
Brick ip-10-83-5-197.ec2.internal:/rhs/bricks/exporter                49152  Y  8705
Brick ip-10-159-26-108.ec2.internal:/rhs/bricks/exporter              49152  Y  20196
Brick domU-12-31-39-07-74-A5.compute-1.internal:/rhs/bricks/exporter  49152  Y  6553
Brick ip-10-80-109-233.ec2.internal:/rhs/bricks/exporter              49152  Y  6450
Brick ip-10-181-128-26.ec2.internal:/rhs/bricks/exporter              49152  Y  8569
Brick domU-12-31-39-0B-DC-01.compute-1.internal:/rhs/bricks/exporter  49152  Y  7145
Brick ip-10-34-105-112.ec2.internal:/rhs/bricks/exporter              49152  Y  7123
Brick ip-10-232-7-75.ec2.internal:/rhs/bricks/exporter                49152  Y  3935
Brick domU-12-31-39-14-3E-21.compute-1.internal:/rhs/bricks/exporter  49152  Y  9540
Brick ip-10-38-175-12.ec2.internal:/rhs/bricks/exporter               49152  Y  9084
Brick ip-10-182-160-197.ec2.internal:/rhs/bricks/exporter             49152  Y  9075
NFS Server on localhost                                               2049   Y  7543
Self-heal Daemon on localhost                                         N/A    Y  7550
NFS Server on domU-12-31-39-0B-DC-01.compute-1.internal               2049   Y  1750
Self-heal Daemon on domU-12-31-39-0B-DC-01.compute-1.internal         N/A    Y  1757
NFS Server on ip-10-181-128-26.ec2.internal                           2049   Y  8401
Self-heal Daemon on ip-10-181-128-26.ec2.internal                     N/A    Y  8408
NFS Server on ip-10-232-7-75.ec2.internal                             2049   Y  11144
Self-heal Daemon on ip-10-232-7-75.ec2.internal                       N/A    Y  11151
NFS Server on ip-10-182-165-181.ec2.internal                          2049   Y  10799
Self-heal Daemon on ip-10-182-165-181.ec2.internal                    N/A    Y  10806
NFS Server on ip-10-182-160-197.ec2.internal                          2049   Y  9129
Self-heal Daemon on ip-10-182-160-197.ec2.internal                    N/A    Y  9136
NFS Server on ip-10-194-111-63.ec2.internal                           2049   Y  27361
Self-heal Daemon on ip-10-194-111-63.ec2.internal                     N/A    Y  27368
NFS Server on ip-10-159-26-108.ec2.internal                           2049   Y  11936
Self-heal Daemon on ip-10-159-26-108.ec2.internal                     N/A    Y  11943
NFS Server on ip-10-46-226-179.ec2.internal                           2049   Y  12138
Self-heal Daemon on ip-10-46-226-179.ec2.internal                     N/A    Y  12145
NFS Server on ip-10-83-5-197.ec2.internal                             2049   Y  12427
Self-heal Daemon on ip-10-83-5-197.ec2.internal                       N/A    Y  12434
NFS Server on domU-12-31-39-14-3E-21.compute-1.internal               2049   Y  9592
Self-heal Daemon on domU-12-31-39-14-3E-21.compute-1.internal         N/A    Y  9599
NFS Server on ip-10-34-105-112.ec2.internal                           2049   Y  1447
Self-heal Daemon on ip-10-34-105-112.ec2.internal                     N/A    Y  1454
NFS Server on ip-10-38-175-12.ec2.internal                            2049   Y  9144
Self-heal Daemon on ip-10-38-175-12.ec2.internal                      N/A    Y  9151
NFS Server on domU-12-31-39-07-74-A5.compute-1.internal               2049   Y  10208
Self-heal Daemon on domU-12-31-39-07-74-A5.compute-1.internal         N/A    Y  10215
NFS Server on ip-10-80-109-233.ec2.internal                           2049   Y  10757
Self-heal Daemon on ip-10-80-109-233.ec2.internal                     N/A    Y  10764

Task Status of Volume exporter
------------------------------------------------------------------------------
Task   : Rebalance
ID     : 26d198ec-658e-4b4c-80aa-51b225eccee2
Status : failed
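The blocked-port condition from step 2 of the reproduction (brick ports unreachable, glusterd still reachable) can be sketched with iptables. This is only an illustrative sketch of the firewall state on the 3 new nodes, not the exact rules used; on EC2 the same effect can come from the instance's security group:

```shell
# Illustrative only: allow SSH and glusterd, drop all inbound
# connections to the glusterfsd brick port range (49152-49200).
iptables -A INPUT -p tcp --dport 22 -j ACCEPT            # SSH
iptables -A INPUT -p tcp --dport 24007 -j ACCEPT         # glusterd
iptables -A INPUT -p tcp --dport 49152:49200 -j DROP     # glusterfsd bricks blocked
```

Outbound traffic is untouched, which matches the report: processes on the new nodes can still connect to bricks in replicate-0 through replicate-3, but nothing can connect to the new nodes' bricks.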
SOS Reports : http://rhsqe-repo.lab.eng.blr.redhat.com/bugs_necessary_info/1059551/
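The xattrop-index checks above (`ls .../indices/xattrop | wc`) can be repeated per brick with a small helper; this is a hypothetical sketch (the `count_xattrop` function name is ours, the index path is the one shown in the report), useful for spotting the count mismatch between client-2 and the other two replicas:

```shell
#!/usr/bin/env bash
# Count pending self-heal entries in a brick's xattrop index so the
# counts on the three replica bricks can be compared.
count_xattrop() {
    # Prints 0 if the path does not exist on this host.
    ls "$1"/.glusterfs/indices/xattrop 2>/dev/null | wc -l
}

# Example with the brick path from the report; run on each replica-4 node.
count_xattrop /rhs/bricks/exporter
```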
Component is gluster-afr, so removing zteam from devel-whiteboard.
Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release you requested us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/rhs/ If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.