Description of problem:
=========================
On a 2 x 2 distribute-replicate volume, all the bricks were 100% full, so new bricks were added to the volume, changing the volume type to 3 x 2. Rebalance was then started on the volume to migrate files from the existing bricks to the newly added bricks. File migration was skipped, and only directories were created on the newly added bricks.

Output from "rebalance status" command:
=======================================
root@ip-10-64-69-235 [Nov-27-2013- 9:53:47] >gluster v rebalance vol_rep status
Node            Rebalanced-files  size    scanned  failures  skipped  status       run time in secs
--------------  ----------------  ------  -------  --------  -------  -----------  ----------------
localhost       0                 0Bytes  1753     0         273      completed    51.00
10.202.206.127  0                 0Bytes  1753     0         0        completed    40.00
10.111.67.22    1                 3.8KB   1754     0         260      completed    51.00
10.101.31.43    0                 0Bytes  1753     0         0        completed    46.00
10.235.46.241   0                 0Bytes  1753     0         0        completed    39.00
10.29.187.33    0                 0Bytes  1753     0         0        completed    39.00

Rebalance Log messages:
==========================
[2013-11-27 09:51:09.478324] I [dht-rebalance.c:672:dht_migrate_file] 0-vol_rep-dht: /user1/TestDir1/file1: attempting to move from vol_rep-replicate-0 to vol_rep-replicate-2
[2013-11-27 09:51:09.567092] W [dht-rebalance.c:374:__dht_check_free_space] 0-vol_rep-dht: data movement attempted from node (vol_rep-replicate-0) with higher disk space to a node (vol_rep-replicate-2) with lesser disk space (/user1/TestDir1/file1)
[2013-11-27 09:51:09.578221] I [dht-rebalance.c:672:dht_migrate_file] 0-vol_rep-dht: /user1/TestDir1/file2: attempting to move from vol_rep-replicate-0 to vol_rep-replicate-2
[2013-11-27 09:51:09.596087] W [dht-rebalance.c:374:__dht_check_free_space] 0-vol_rep-dht: data movement attempted from node (vol_rep-replicate-0) with higher disk space to a node (vol_rep-replicate-2) with lesser disk space (/user1/TestDir1/file2)

Actual results:
=================
1) Data movement was attempted, but the files were not migrated.
2) The warning message reports (vol_rep-replicate-0) as having higher disk space and (vol_rep-replicate-2) as having lesser disk space. In our case, (vol_rep-replicate-2) is the newly added brick and has 100% of its disk free, whereas (vol_rep-replicate-0) is 100% full.

Additional Info:
===================
1) Each of the bricks in the volume had 840GB of space.
2) When "gluster volume rebalance <volume_name> force" was executed, the migration of data started and completed successfully.
3) The test was performed on RHS-AWS-AMIs.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs 3.4.0.44.1u2rhs built on Nov 25 2013 08:17:39

How reproducible:
=================

Steps to Reproduce:
======================
1. Create a 2 x 2 distribute-replicate volume with 4 storage nodes and 1 brick per storage node.
2. Create a fuse mount. Fill the volume by creating directories and files.
3. Once the volume is filled, add 2 new servers to the cluster.
4. Add bricks from the 2 new servers to the volume.
5. Start rebalance ("gluster volume rebalance <volume_name> start").

Expected results:
==================
Some files should have migrated to the newly added subvolume.
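The warning text above implies a free-space comparison between the source and destination subvolumes. A minimal sketch of that kind of guard follows, to make the expected behaviour concrete. It is an illustrative model only, not the code in dht-rebalance.c: the function name, the comparison used, and the 512-byte block unit are all assumptions made for this example.

```python
"""Illustrative model of a rebalance free-space guard (NOT GlusterFS source)."""


def should_skip_migration(src_free_blocks: int, dst_free_blocks: int, file_blocks: int) -> bool:
    """Hypothetical check: skip the move when the destination has fewer free
    blocks than the source would have after giving up the file's blocks."""
    return dst_free_blocks < src_free_blocks - file_blocks


# Scenario from this report: the source subvolume is 100% full and the
# destination is a fresh 840GB brick (assumed here to be counted in
# 512-byte blocks). A guard like this should allow the migration.
empty_brick_blocks = 840 * 2**30 // 512
print(should_skip_migration(0, empty_brick_blocks, 8))

# Scenario where skipping is reasonable: the destination really does have
# much less free space than the source.
print(should_skip_migration(389200, 143432, 20480))
```

Under this model, a full source and an empty destination must never produce the "higher disk space ... lesser disk space" warning, which is why the skipped files in this report look like a defect rather than intended behaviour.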
Additional Info:
================
root@ip-10-64-69-235 [Nov-27-2013- 9:49:54] >gluster v add-brick vol_rep replica 2 10.235.46.241:/rhs/bricks/b3 10.29.187.33:/rhs/bricks/b3_rep1
volume add-brick: success

root@ip-10-64-69-235 [Nov-27-2013- 9:50:12] >gluster v info

Volume Name: vol_rep
Type: Distributed-Replicate
Volume ID: 02b066e9-4800-43ca-9556-2b06973d9cdf
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: 10.64.69.235:/rhs/bricks/b1
Brick2: 10.202.206.127:/rhs/bricks/b1_rep1
Brick3: 10.111.67.22:/rhs/bricks/b2
Brick4: 10.101.31.43:/rhs/bricks/b2_rep1
Brick5: 10.235.46.241:/rhs/bricks/b3
Brick6: 10.29.187.33:/rhs/bricks/b3_rep1

root@ip-10-64-69-235 [Nov-27-2013- 9:50:17] >gluster v status
Status of volume: vol_rep
Gluster process                           Port   Online  Pid
------------------------------------------------------------------------------
Brick 10.64.69.235:/rhs/bricks/b1         49152  Y       6466
Brick 10.202.206.127:/rhs/bricks/b1_rep1  49152  Y       6310
Brick 10.111.67.22:/rhs/bricks/b2         49152  Y       6292
Brick 10.101.31.43:/rhs/bricks/b2_rep1    49152  Y       6286
Brick 10.235.46.241:/rhs/bricks/b3        49152  Y       15112
Brick 10.29.187.33:/rhs/bricks/b3_rep1    49152  Y       15224
NFS Server on localhost                   2049   Y       16350
Self-heal Daemon on localhost             N/A    Y       16357
NFS Server on 10.235.46.241               2049   Y       15124
Self-heal Daemon on 10.235.46.241         N/A    Y       15131
NFS Server on 10.29.187.33                2049   Y       15236
Self-heal Daemon on 10.29.187.33          N/A    Y       15243
NFS Server on 10.202.206.127              2049   Y       16308
Self-heal Daemon on 10.202.206.127        N/A    Y       16315
NFS Server on 10.101.31.43                2049   Y       27670
Self-heal Daemon on 10.101.31.43          N/A    Y       27677
NFS Server on 10.111.67.22                2049   Y       15770
Self-heal Daemon on 10.111.67.22          N/A    Y       15777

Task Status of Volume vol_rep
------------------------------------------------------------------------------
There are no active volume tasks

root@ip-10-64-69-235 [Nov-27-2013- 9:50:24] >
root@ip-10-64-69-235 [Nov-27-2013- 9:50:25] >
root@ip-10-64-69-235 [Nov-27-2013- 9:50:25] >gluster v rebalance vol_rep start
volume rebalance: vol_rep: success: Starting rebalance on volume vol_rep has been successful.
ID: f928d2ea-f98e-41f8-b275-f1043b149f94

root@ip-10-64-69-235 [Nov-27-2013- 9:51:08] >gluster v rebalance vol_rep status
Node            Rebalanced-files  size    scanned  failures  skipped  status       run time in secs
--------------  ----------------  ------  -------  --------  -------  -----------  ----------------
localhost       0                 0Bytes  174      0         18       in progress  8.00
10.202.206.127  0                 0Bytes  225      0         0        in progress  8.00
10.111.67.22    1                 3.8KB   146      0         28       in progress  8.00
10.101.31.43    0                 0Bytes  185      0         0        in progress  8.00
10.235.46.241   0                 0Bytes  234      0         0        in progress  8.00
10.29.187.33    0                 0Bytes  234      0         0        in progress  8.00
volume rebalance: vol_rep: success:

root@ip-10-64-69-235 [Nov-27-2013- 9:51:16] >
root@ip-10-64-69-235 [Nov-27-2013- 9:51:18] >
root@ip-10-64-69-235 [Nov-27-2013- 9:53:47] >gluster v rebalance vol_rep status
Node            Rebalanced-files  size    scanned  failures  skipped  status       run time in secs
--------------  ----------------  ------  -------  --------  -------  -----------  ----------------
localhost       0                 0Bytes  1753     0         273      completed    51.00
10.202.206.127  0                 0Bytes  1753     0         0        completed    40.00
10.111.67.22    1                 3.8KB   1754     0         260      completed    51.00
10.101.31.43    0                 0Bytes  1753     0         0        completed    46.00
10.235.46.241   0                 0Bytes  1753     0         0        completed    39.00
10.29.187.33    0                 0Bytes  1753     0         0        completed    39.00
volume rebalance: vol_rep: success:
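When reproducing this, it can help to check the "rebalance status" output mechanically rather than by eye. The sketch below parses the final status rows captured above and lists nodes that skipped files without migrating any. The parser is a hypothetical helper written for this report; it assumes the whitespace-separated column order shown in the output and is not part of the gluster CLI.

```python
from typing import List, NamedTuple


class StatusRow(NamedTuple):
    node: str
    rebalanced: int
    size: str
    scanned: int
    failures: int
    skipped: int
    status: str


# Final "rebalance status" rows copied from the output in this report.
STATUS_OUTPUT = """\
Node Rebalanced-files size scanned failures skipped status run time in secs
localhost 0 0Bytes 1753 0 273 completed 51.00
10.202.206.127 0 0Bytes 1753 0 0 completed 40.00
10.111.67.22 1 3.8KB 1754 0 260 completed 51.00
10.101.31.43 0 0Bytes 1753 0 0 completed 46.00
10.235.46.241 0 0Bytes 1753 0 0 completed 39.00
10.29.187.33 0 0Bytes 1753 0 0 completed 39.00
volume rebalance: vol_rep: success:
"""


def parse_rebalance_status(text: str) -> List[StatusRow]:
    """Keep only data rows; headers, separators and prompts are ignored."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        counters = (parts[1], parts[3], parts[4], parts[5]) if len(parts) >= 8 else ()
        if not counters or not all(p.isdigit() for p in counters):
            continue
        # The status may be two words ("in progress"); the last field is run time.
        status = " ".join(parts[6:-1])
        rows.append(StatusRow(parts[0], int(parts[1]), parts[2], int(parts[3]),
                              int(parts[4]), int(parts[5]), status))
    return rows


rows = parse_rebalance_status(STATUS_OUTPUT)
only_skipped = [r.node for r in rows if r.rebalanced == 0 and r.skipped > 0]
print("total rebalanced:", sum(r.rebalanced for r in rows))
print("total skipped:", sum(r.skipped for r in rows))
print("nodes that skipped files but migrated none:", only_skipped)
```

A high skipped count next to a near-zero rebalanced count, as seen here, is the signature of this bug.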
Based on discussion with the developers, this appears to be fixed, so it is being marked for verification in 3.0.4. It can be removed from the 3.0.4 list if verification fails.
This bug fix has been verified; no issues were found.

Steps Followed:
1. Mounted the 2x2 volume and filled the bricks.
2. Added a new brick and started rebalance.
3. Rebalance was successful.

Result:
The test case was completed by checking the log messages, which show that rebalance happened successfully and files moved to the newly added brick.

Output:
[root@rhsauto032 ~]# gluster v rebalance small status
Node                               Rebalanced-files  size     scanned  failures  skipped  status       run time in secs
---------------------------------  ----------------  -------  -------  --------  -------  -----------  ----------------
localhost                          21                201.4MB  121      0         3        completed    7.00
rhsauto034.lab.eng.blr.redhat.com  18                171.3MB  139      0         0        completed    12.00
volume rebalance: small: success:
[root@rhsauto032 ~]
[root@rhsauto032 ~]# gluster v info small

Volume Name: small
Type: Distribute
Volume ID: 991c8931-264d-4d5d-8652-ce5343cdaa1f
Status: Started
Snap Volume: no
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: rhsauto032.lab.eng.blr.redhat.com:/smallbrick1/s0
Brick2: rhsauto034.lab.eng.blr.redhat.com:/smallbrick1/s1
Brick3: rhsauto032:/rhs/brick4/s2
Options Reconfigured:
server.allow-insecure: on
cluster.min-free-disk: 10
features.quota-deem-statfs: on
features.quota: on
performance.readdir-ahead: on
auto-delete: disable
snap-max-soft-limit: 90
snap-max-hard-limit: 256
[root@rhsauto032 ~]#
[root@rhsauto032 ~]# gluster v status small
Status of volume: small
Gluster process                                          Port   Online  Pid
------------------------------------------------------------------------------
Brick rhsauto032.lab.eng.blr.redhat.com:/smallbrick1/s0  49163  Y       26855
Brick rhsauto034.lab.eng.blr.redhat.com:/smallbrick1/s1  49163  Y       26297
Brick rhsauto032:/rhs/brick4/s2                          49164  Y       27450
NFS Server on localhost                                  2049   Y       27463
Quota Daemon on localhost                                N/A    Y       27480
NFS Server on rhsauto040.lab.eng.blr.redhat.com          2049   Y       15943
Quota Daemon on rhsauto040.lab.eng.blr.redhat.com        N/A    Y       15957
NFS Server on rhsauto034.lab.eng.blr.redhat.com          2049   Y       26655
Quota Daemon on rhsauto034.lab.eng.blr.redhat.com        N/A    Y       26663

Task Status of Volume small
------------------------------------------------------------------------------
Task   : Rebalance
ID     : 9e55c5dc-d284-4093-859e-2d069a75a1d2
Status : completed

[root@rhsauto032 ~]#

Log messages:
[2015-02-19 20:35:00.377480] I [dht-common.c:3250:dht_setxattr] 0-small-dht: fixing the layout of /
[2015-02-19 20:35:00.381496] I [dht-rebalance.c:1430:gf_defrag_migrate_data] 0-small-dht: migrate data called on /
[2015-02-19 20:35:00.412380] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file11: attempting to move from small-client-0 to small-client-2
[2015-02-19 20:35:00.760704] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file11 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:00.766851] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file16: attempting to move from small-client-0 to small-client-2
[2015-02-19 20:35:01.050157] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file16 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:01.056407] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file17: attempting to move from small-client-0 to small-client-2
[2015-02-19 20:35:01.219526] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file17 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:01.225594] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file19: attempting to move from small-client-0 to small-client-2
[2015-02-19 20:35:01.474963] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file19 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:01.483279] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file24: attempting to move from small-client-0 to small-client-2
[2015-02-19 20:35:01.737746] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file24 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:01.754422] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file25: attempting to move from small-client-0 to small-client-2
[2015-02-19 20:35:02.165525] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file25 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:02.178419] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file31: attempting to move from small-client-0 to small-client-2
[2015-02-19 20:35:02.513237] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file31 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:02.526518] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file38: attempting to move from small-client-0 to small-client-2
[2015-02-19 20:35:02.839475] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file38 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:02.855769] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file41: attempting to move from small-client-0 to small-client-2
[2015-02-19 20:35:03.107426] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file41 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:03.115580] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file42: attempting to move from small-client-0 to small-client-2
[2015-02-19 20:35:03.361683] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file42 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:03.368999] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file44: attempting to move from small-client-0 to small-client-2
[2015-02-19 20:35:03.621624] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file44 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:04.100482] I [MSGID: 109028] [dht-rebalance.c:2139:gf_defrag_status_get] 0-glusterfs: Files migrated: 12, size: 125829120, lookups: 33, failures: 0, skipped: 0
[2015-02-19 20:35:04.171980] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file62 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:04.175992] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file63: attempting to move from small-client-0 to small-client-2
[2015-02-19 20:35:04.456370] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file63 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:04.464592] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file69: attempting to move from small-client-0 to small-client-2
[2015-02-19 20:35:04.801834] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file69 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:04.806772] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file73: attempting to move from small-client-0 to small-client-2
[2015-02-19 20:35:05.068680] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file73 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:05.074675] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file80: attempting to move from small-client-0 to small-client-2
[2015-02-19 20:35:05.435411] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file80 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:05.450156] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file87: attempting to move from small-client-0 to small-client-2
[2015-02-19 20:35:05.789593] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file87 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:05.795245] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file88: attempting to move from small-client-0 to small-client-2
[2015-02-19 20:35:06.074558] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file88 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:06.079392] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file89: attempting to move from small-client-0 to small-client-1
[2015-02-19 20:35:06.094846] W [MSGID: 109023] [dht-rebalance.c:568:__dht_check_free_space] 0-small-dht: data movement attempted from node (small-client-0:389200) with higher disk space to a node (small-client-1:143432) with lesser disk space, file { blocks:20480, name:(/file89) }
[2015-02-19 20:35:06.103482] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file90: attempting to move from small-client-0 to small-client-2
[2015-02-19 20:35:06.446711] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file90 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:06.453115] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file92: attempting to move from small-client-0 to small-client-1
[2015-02-19 20:35:06.468046] W [MSGID: 109023] [dht-rebalance.c:568:__dht_check_free_space] 0-small-dht: data movement attempted from node (small-client-0:409680) with higher disk space to a node (small-client-1:163912) with lesser disk space, file { blocks:20480, name:(/file92) }
[2015-02-19 20:35:06.482520] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file96: attempting to move from small-client-0 to small-client-1
[2015-02-19 20:35:06.500995] W [MSGID: 109023] [dht-rebalance.c:568:__dht_check_free_space] 0-small-dht: data movement attempted from node (small-client-0:409680) with higher disk space to a node (small-client-1:163912) with lesser disk space, file { blocks:20480, name:(/file96) }
[2015-02-19 20:35:06.507909] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file97: attempting to move from small-client-0 to small-client-2
[2015-02-19 20:35:06.623404] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file97 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:06.883261] I [dht-common.c:1563:dht_lookup_everywhere_cbk] 0-small-dht: attempting deletion of stale linkfile /file93 on small-client-0 (hashed subvol is small-client-2)
[2015-02-19 20:35:06.883695] I [dht-common.c:892:dht_lookup_unlink_cbk] 0-small-dht: lookup_unlink returned with op_ret -> 0 and op-errno -> 0 for /file93
[2015-02-19 20:35:06.940133] I [dht-rebalance.c:1673:gf_defrag_migrate_data] 0-small-dht: Migration operation on dir / took 6.56 secs
[2015-02-19 20:35:07.017437] I [MSGID: 109028] [dht-rebalance.c:2135:gf_defrag_status_get] 0-glusterfs: Rebalance is completed. Time taken is 7.00 secs
[2015-02-19 20:35:07.017482] I [MSGID: 109028] [dht-rebalance.c:2139:gf_defrag_status_get] 0-glusterfs: Files migrated: 21, size: 211161088, lookups: 121, failures: 0, skipped: 3
[2015-02-19 20:35:07.017909] W [glusterfsd.c:1183:cleanup_and_exit] (--> 0-: received signum (15), shutting down
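For anyone re-checking a rebalance log like the one above, the migrated and skipped files can be tallied with a short script. This is a rough sketch, assuming the message wording seen in this 2015 log (the 2013 warning earlier in the report carries no free-space numbers and would not match the second pattern). The regular expressions and the `summarize` helper are written for this example and are not part of GlusterFS.

```python
import re
from typing import Dict, List

# Patterns based on the message text in the log excerpt above.
COMPLETED = re.compile(
    r"completed migration of (?P<name>\S+) from subvolume (?P<src>\S+) to (?P<dst>\S+)"
)
FREE_SPACE_WARNING = re.compile(
    r"from node \((?P<src>[^:]+):(?P<src_free>\d+)\) with higher disk space "
    r"to a node \((?P<dst>[^:]+):(?P<dst_free>\d+)\) with lesser disk space, "
    r"file \{ blocks:(?P<blocks>\d+), name:\((?P<name>[^)]*)\) \}"
)

# A few lines copied from the verification log in this report.
LOG_SAMPLE = """\
[2015-02-19 20:35:06.074558] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file88 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:06.079392] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file89: attempting to move from small-client-0 to small-client-1
[2015-02-19 20:35:06.094846] W [MSGID: 109023] [dht-rebalance.c:568:__dht_check_free_space] 0-small-dht: data movement attempted from node (small-client-0:389200) with higher disk space to a node (small-client-1:143432) with lesser disk space, file { blocks:20480, name:(/file89) }
[2015-02-19 20:35:06.446711] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file90 from subvolume small-client-0 to small-client-2
"""


def summarize(log_text: str) -> Dict[str, List]:
    """Collect migrated file names and free-space warnings from log text."""
    migrated, skipped = [], []
    for line in log_text.splitlines():
        done = COMPLETED.search(line)
        if done:
            migrated.append(done.group("name"))
            continue
        warn = FREE_SPACE_WARNING.search(line)
        if warn:
            skipped.append({
                "name": warn.group("name"),
                "src_free": int(warn.group("src_free")),
                "dst_free": int(warn.group("dst_free")),
                "blocks": int(warn.group("blocks")),
            })
    return {"migrated": migrated, "skipped": skipped}


summary = summarize(LOG_SAMPLE)
print("migrated:", summary["migrated"])
for item in summary["skipped"]:
    print("skipped:", item["name"], "src free", item["src_free"], "dst free", item["dst_free"])
```

In the verification run, the three skipped files were all aimed at small-client-1, which the log reports as having less free space than small-client-0, so those skips are consistent with the check working as intended.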
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-0682.html