Bug 1035647 - Migration of files to the newly added bricks was skipped from bricks that were 100% full.
Summary: Migration of files to the newly added bricks was skipped from bricks that were 100% full.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterfs
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 3.0.4
Assignee: Nithya Balachandran
QA Contact: Triveni Rao
URL:
Whiteboard:
Depends On:
Blocks: 1182947
 
Reported: 2013-11-28 09:11 UTC by spandura
Modified: 2016-05-16 04:38 UTC
CC List: 5 users

Fixed In Version: glusterfs-3.6.0.44-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-03-26 06:33:59 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2015:0682 0 normal SHIPPED_LIVE Red Hat Storage 3.0 enhancement and bug fix update #4 2015-03-26 10:32:55 UTC

Description spandura 2013-11-28 09:11:30 UTC
Description of problem:
=========================
On a 2 x 2 distribute-replicate volume, all the bricks were 100% full. Hence, new bricks were added to the volume, changing the volume type to 3 x 2, and rebalance was started on the volume to migrate files from the existing bricks to the newly added bricks.

Migration of files was skipped, and only directories were created on the newly added bricks.

Output from "rebalance status" command:
=======================================
root@ip-10-64-69-235 [Nov-27-2013- 9:53:47] >gluster v rebalance vol_rep status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes          1753             0           273            completed              51.00
                          10.202.206.127                0        0Bytes          1753             0             0            completed              40.00
                            10.111.67.22                1         3.8KB          1754             0           260            completed              51.00
                            10.101.31.43                0        0Bytes          1753             0             0            completed              46.00
                           10.235.46.241                0        0Bytes          1753             0             0            completed              39.00
                            10.29.187.33                0        0Bytes          1753             0             0            completed              39.00

Rebalance Log messages: 
==========================
[2013-11-27 09:51:09.478324] I [dht-rebalance.c:672:dht_migrate_file] 0-vol_rep-dht: /user1/TestDir1/file1: attempting to move from vol_rep-replicate-0 to vol_rep-replicate-2
[2013-11-27 09:51:09.567092] W [dht-rebalance.c:374:__dht_check_free_space] 0-vol_rep-dht: data movement attempted from node (vol_rep-replicate-0) with higher disk space to a node (vol_rep-replicate-2) with lesser disk space (/user1/TestDir1/file1)

[2013-11-27 09:51:09.578221] I [dht-rebalance.c:672:dht_migrate_file] 0-vol_rep-dht: /user1/TestDir1/file2: attempting to move from vol_rep-replicate-0 to vol_rep-replicate-2
[2013-11-27 09:51:09.596087] W [dht-rebalance.c:374:__dht_check_free_space] 0-vol_rep-dht: data movement attempted from node (vol_rep-replicate-0) with higher disk space to a node (vol_rep-replicate-2) with lesser disk space (/user1/TestDir1/file2)

Actual results:
=================
1) The data movement was attempted, but the files were not migrated.

2) The warning message reports (vol_rep-replicate-0) as the node with higher disk space and (vol_rep-replicate-2) as the node with lesser disk space. In our case, (vol_rep-replicate-2) is the newly added replica pair with 100% of its disk free, whereas (vol_rep-replicate-0) is 100% full.
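
These warnings come from __dht_check_free_space() in dht-rebalance.c, whose intent is to skip a migration only when it would leave the destination subvolume with less free space than the source has. Below is a minimal sketch of that intent, assuming a local statvfs()-based comparison; the helper name ok_to_migrate and the standalone harness are illustrative, not the actual glusterfs code.

#include <stdio.h>
#include <stdint.h>
#include <sys/statvfs.h>

/* Simplified model of the rebalance free-space check: migrate only if
 * the destination, after receiving the file, would still have at least
 * as much free space as the source. The real code obtains statfs
 * results through the cluster translators, not via local statvfs(). */
static int
ok_to_migrate(const struct statvfs *src, const struct statvfs *dst,
              uint64_t file_size)
{
    uint64_t src_free = (uint64_t)src->f_bavail * src->f_frsize;
    uint64_t dst_free = (uint64_t)dst->f_bavail * dst->f_frsize;

    if (dst_free < file_size)
        return 0;   /* destination cannot hold the file at all */
    if (dst_free - file_size < src_free)
        return 0;   /* move would leave dst poorer than src: skip */
    return 1;
}

int
main(int argc, char *argv[])
{
    struct statvfs src, dst;

    if (argc != 3 ||
        statvfs(argv[1], &src) != 0 || statvfs(argv[2], &dst) != 0) {
        fprintf(stderr, "usage: %s <src-brick-path> <dst-brick-path>\n",
                argv[0]);
        return 1;
    }

    /* check with a nominal 1 MiB file */
    printf("migrate 1 MiB file: %s\n",
           ok_to_migrate(&src, &dst, (uint64_t)1 << 20) ? "yes" : "skipped");
    return 0;
}

With a 100% full source and an empty destination, such a check should always pass; the warnings above therefore suggest the comparison was being fed wrong free-space values, not that the skip policy itself is wrong.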

Additional Info:
===================
1) Each brick in the volume had 840 GB of space.

2) When "gluster volume rebalance <volume_name> start force" was executed, the migration of data started and completed successfully (force bypasses the free-space check that was skipping the files).

3) The test was performed on RHS AWS AMIs.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs 3.4.0.44.1u2rhs built on Nov 25 2013 08:17:39

How reproducible:
=================

Steps to Reproduce:
======================
1. Create a 2 x 2 distribute-replicate volume with 4 storage nodes and 1 brick per storage node. 

2. Create fuse mount. Fill the volume by creating directories and files. 

3. Once the volume is filled, add 2 new servers to the cluster.

4. Add bricks from the 2 new servers to the volume. 

5. Start rebalance (gluster volume rebalance <volume_name> start).

Expected results:
==================
Some files should have migrated to the newly added subvolume. 
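
Only the files whose name-hash lands in the new subvolume's layout range are expected to move; the rest correctly stay where they are. A toy sketch of DHT-style placement under that assumption (the hash function and the equal three-way layout below are illustrative; glusterfs actually uses the Davies-Meyer hash and per-directory layouts):

#include <stdint.h>
#include <stdio.h>

/* Each directory's layout splits the 32-bit hash space across its
 * subvolumes; a file belongs to the subvolume whose range contains the
 * hash of its basename. After add-brick, rebalance rewrites the layout
 * to include the new subvolume and migrates the files whose hash now
 * falls in the new range. */

struct range { uint32_t start; uint32_t stop; };

static uint32_t
toy_hash(const char *name)   /* stand-in, not glusterfs's gf_dm_hashfn() */
{
    uint32_t h = 5381;
    while (*name)
        h = h * 33u + (unsigned char)*name++;
    return h;
}

int
main(void)
{
    /* three subvolumes with equal ranges, as in the 3 x 2 volume above */
    struct range layout[3] = {
        { 0x00000000u, 0x55555554u },
        { 0x55555555u, 0xaaaaaaa9u },
        { 0xaaaaaaaau, 0xffffffffu },
    };
    const char *files[] = { "file1", "file2", "file3", "file4" };

    for (int i = 0; i < 4; i++) {
        uint32_t h = toy_hash(files[i]);
        for (int j = 0; j < 3; j++)
            if (layout[j].start <= h && h <= layout[j].stop)
                printf("%s -> subvolume %d\n", files[i], j);
    }
    return 0;
}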

Additional Info:
================
root@ip-10-64-69-235 [Nov-27-2013- 9:49:54] >gluster v add-brick vol_rep replica 2 10.235.46.241:/rhs/bricks/b3 10.29.187.33:/rhs/bricks/b3_rep1
volume add-brick: success
root@ip-10-64-69-235 [Nov-27-2013- 9:50:12] >gluster v info
 
Volume Name: vol_rep
Type: Distributed-Replicate
Volume ID: 02b066e9-4800-43ca-9556-2b06973d9cdf
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: 10.64.69.235:/rhs/bricks/b1
Brick2: 10.202.206.127:/rhs/bricks/b1_rep1
Brick3: 10.111.67.22:/rhs/bricks/b2
Brick4: 10.101.31.43:/rhs/bricks/b2_rep1
Brick5: 10.235.46.241:/rhs/bricks/b3
Brick6: 10.29.187.33:/rhs/bricks/b3_rep1
root@ip-10-64-69-235 [Nov-27-2013- 9:50:17] >gluster v status
Status of volume: vol_rep
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick 10.64.69.235:/rhs/bricks/b1			49152	Y	6466
Brick 10.202.206.127:/rhs/bricks/b1_rep1		49152	Y	6310
Brick 10.111.67.22:/rhs/bricks/b2			49152	Y	6292
Brick 10.101.31.43:/rhs/bricks/b2_rep1			49152	Y	6286
Brick 10.235.46.241:/rhs/bricks/b3			49152	Y	15112
Brick 10.29.187.33:/rhs/bricks/b3_rep1			49152	Y	15224
NFS Server on localhost					2049	Y	16350
Self-heal Daemon on localhost				N/A	Y	16357
NFS Server on 10.235.46.241				2049	Y	15124
Self-heal Daemon on 10.235.46.241			N/A	Y	15131
NFS Server on 10.29.187.33				2049	Y	15236
Self-heal Daemon on 10.29.187.33			N/A	Y	15243
NFS Server on 10.202.206.127				2049	Y	16308
Self-heal Daemon on 10.202.206.127			N/A	Y	16315
NFS Server on 10.101.31.43				2049	Y	27670
Self-heal Daemon on 10.101.31.43			N/A	Y	27677
NFS Server on 10.111.67.22				2049	Y	15770
Self-heal Daemon on 10.111.67.22			N/A	Y	15777
 
Task Status of Volume vol_rep
------------------------------------------------------------------------------
There are no active volume tasks
 
root@ip-10-64-69-235 [Nov-27-2013- 9:50:24] >
root@ip-10-64-69-235 [Nov-27-2013- 9:50:25] >
root@ip-10-64-69-235 [Nov-27-2013- 9:50:25] >gluster v rebalance vol_rep start
volume rebalance: vol_rep: success: Starting rebalance on volume vol_rep has been successful.
ID: f928d2ea-f98e-41f8-b275-f1043b149f94
root@ip-10-64-69-235 [Nov-27-2013- 9:51:08] >gluster v rebalance vol_rep status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes           174             0            18          in progress               8.00
                          10.202.206.127                0        0Bytes           225             0             0          in progress               8.00
                            10.111.67.22                1         3.8KB           146             0            28          in progress               8.00
                            10.101.31.43                0        0Bytes           185             0             0          in progress               8.00
                           10.235.46.241                0        0Bytes           234             0             0          in progress               8.00
                            10.29.187.33                0        0Bytes           234             0             0          in progress               8.00
volume rebalance: vol_rep: success: 
root@ip-10-64-69-235 [Nov-27-2013- 9:51:16] >
root@ip-10-64-69-235 [Nov-27-2013- 9:51:18] >
root@ip-10-64-69-235 [Nov-27-2013- 9:53:47] >gluster v rebalance vol_rep status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost                0        0Bytes          1753             0           273            completed              51.00
                          10.202.206.127                0        0Bytes          1753             0             0            completed              40.00
                            10.111.67.22                1         3.8KB          1754             0           260            completed              51.00
                            10.101.31.43                0        0Bytes          1753             0             0            completed              46.00
                           10.235.46.241                0        0Bytes          1753             0             0            completed              39.00
                            10.29.187.33                0        0Bytes          1753             0             0            completed              39.00
volume rebalance: vol_rep: success:

Comment 2 Vivek Agarwal 2015-02-06 12:51:58 UTC
Based on discussion with the developers, this appears to be fixed. Hence, marking it for verification in 3.0.4. It can be removed from the 3.0.4 list if verification fails.

Comment 3 Triveni Rao 2015-02-20 09:29:17 UTC
This bug fix has been verified; no issues were found.

Steps Followed:
1. Mounted the 2x2 volume and filled the bricks.
2. Added a new brick and started rebalance.
3. Rebalance completed successfully.


Result: Verified from the log messages that rebalance completed successfully and that files were migrated to the newly added brick.


Output:

[root@rhsauto032 ~]# gluster v rebalance small status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost               21       201.4MB           121             0             3            completed               7.00
       rhsauto034.lab.eng.blr.redhat.com               18       171.3MB           139             0             0            completed              12.00
volume rebalance: small: success: 
[root@rhsauto032 ~]

[root@rhsauto032 ~]# gluster v info small
 
Volume Name: small
Type: Distribute
Volume ID: 991c8931-264d-4d5d-8652-ce5343cdaa1f
Status: Started
Snap Volume: no
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: rhsauto032.lab.eng.blr.redhat.com:/smallbrick1/s0
Brick2: rhsauto034.lab.eng.blr.redhat.com:/smallbrick1/s1
Brick3: rhsauto032:/rhs/brick4/s2
Options Reconfigured:
server.allow-insecure: on
cluster.min-free-disk: 10
features.quota-deem-statfs: on
features.quota: on
performance.readdir-ahead: on
auto-delete: disable
snap-max-soft-limit: 90
snap-max-hard-limit: 256
[root@rhsauto032 ~]# 


[root@rhsauto032 ~]# gluster v status small
Status of volume: small
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick rhsauto032.lab.eng.blr.redhat.com:/smallbrick1/s0 49163   Y       26855
Brick rhsauto034.lab.eng.blr.redhat.com:/smallbrick1/s1 49163   Y       26297
Brick rhsauto032:/rhs/brick4/s2                         49164   Y       27450
NFS Server on localhost                                 2049    Y       27463
Quota Daemon on localhost                               N/A     Y       27480
NFS Server on rhsauto040.lab.eng.blr.redhat.com         2049    Y       15943
Quota Daemon on rhsauto040.lab.eng.blr.redhat.com       N/A     Y       15957
NFS Server on rhsauto034.lab.eng.blr.redhat.com         2049    Y       26655
Quota Daemon on rhsauto034.lab.eng.blr.redhat.com       N/A     Y       26663
 
Task Status of Volume small
------------------------------------------------------------------------------
Task                 : Rebalance           
ID                   : 9e55c5dc-d284-4093-859e-2d069a75a1d2
Status               : completed           
 
[root@rhsauto032 ~]# 


Log messages:

[2015-02-19 20:35:00.377480] I [dht-common.c:3250:dht_setxattr] 0-small-dht: fixing the layout of /
[2015-02-19 20:35:00.381496] I [dht-rebalance.c:1430:gf_defrag_migrate_data] 0-small-dht: migrate data called on /
[2015-02-19 20:35:00.412380] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file11: attempting to move from small-client-0 to small-client-2
[2015-02-19 20:35:00.760704] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file11 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:00.766851] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file16: attempting to move from small-client-0 to small-client-2
[2015-02-19 20:35:01.050157] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file16 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:01.056407] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file17: attempting to move from small-client-0 to small-client-2
[2015-02-19 20:35:01.219526] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file17 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:01.225594] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file19: attempting to move from small-client-0 to small-client-2
[2015-02-19 20:35:01.474963] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file19 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:01.483279] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file24: attempting to move from small-client-0 to small-client-2
[2015-02-19 20:35:01.737746] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file24 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:01.754422] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file25: attempting to move from small-client-0 to small-client-2
[2015-02-19 20:35:02.165525] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file25 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:02.178419] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file31: attempting to move from small-client-0 to small-client-2
[2015-02-19 20:35:02.513237] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file31 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:02.526518] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file38: attempting to move from small-client-0 to small-client-2
[2015-02-19 20:35:02.839475] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file38 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:02.855769] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file41: attempting to move from small-client-0 to small-client-2
[2015-02-19 20:35:03.107426] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file41 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:03.115580] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file42: attempting to move from small-client-0 to small-client-2
[2015-02-19 20:35:03.361683] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file42 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:03.368999] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file44: attempting to move from small-client-0 to small-client-2
[2015-02-19 20:35:03.621624] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file44 from subvolume small-client-0 to small-client-2

[2015-02-19 20:35:04.100482] I [MSGID: 109028] [dht-rebalance.c:2139:gf_defrag_status_get] 0-glusterfs: Files migrated: 12, size: 125829120, lookups: 33, failures: 0, skipped: 0
[2015-02-19 20:35:04.171980] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file62 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:04.175992] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file63: attempting to move from small-client-0 to small-client-2
[2015-02-19 20:35:04.456370] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file63 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:04.464592] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file69: attempting to move from small-client-0 to small-client-2
[2015-02-19 20:35:04.801834] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file69 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:04.806772] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file73: attempting to move from small-client-0 to small-client-2
[2015-02-19 20:35:05.068680] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file73 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:05.074675] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file80: attempting to move from small-client-0 to small-client-2
[2015-02-19 20:35:05.435411] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file80 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:05.450156] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file87: attempting to move from small-client-0 to small-client-2
[2015-02-19 20:35:05.789593] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file87 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:05.795245] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file88: attempting to move from small-client-0 to small-client-2
[2015-02-19 20:35:06.074558] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file88 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:06.079392] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file89: attempting to move from small-client-0 to small-client-1
[2015-02-19 20:35:06.094846] W [MSGID: 109023] [dht-rebalance.c:568:__dht_check_free_space] 0-small-dht: data movement attempted from node (small-client-0:389200) with higher disk space to a node (small-client-1:143432) with lesser disk space, file { blocks:20480, name:(/file89) }
[2015-02-19 20:35:06.103482] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file90: attempting to move from small-client-0 to small-client-2
[2015-02-19 20:35:06.446711] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file90 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:06.453115] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file92: attempting to move from small-client-0 to small-client-1
[2015-02-19 20:35:06.468046] W [MSGID: 109023] [dht-rebalance.c:568:__dht_check_free_space] 0-small-dht: data movement attempted from node (small-client-0:409680) with higher disk space to a node (small-client-1:163912) with lesser disk space, file { blocks:20480, name:(/file92) }
[2015-02-19 20:35:06.482520] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file96: attempting to move from small-client-0 to small-client-1
[2015-02-19 20:35:06.500995] W [MSGID: 109023] [dht-rebalance.c:568:__dht_check_free_space] 0-small-dht: data movement attempted from node (small-client-0:409680) with higher disk space to a node (small-client-1:163912) with lesser disk space, file { blocks:20480, name:(/file96) }
[2015-02-19 20:35:06.507909] I [dht-rebalance.c:902:dht_migrate_file] 0-small-dht: /file97: attempting to move from small-client-0 to small-client-2
[2015-02-19 20:35:06.623404] I [MSGID: 109022] [dht-rebalance.c:1180:dht_migrate_file] 0-small-dht: completed migration of /file97 from subvolume small-client-0 to small-client-2
[2015-02-19 20:35:06.883261] I [dht-common.c:1563:dht_lookup_everywhere_cbk] 0-small-dht: attempting deletion of stale linkfile /file93 on small-client-0 (hashed subvol is small-client-2)
[2015-02-19 20:35:06.883695] I [dht-common.c:892:dht_lookup_unlink_cbk] 0-small-dht: lookup_unlink returned with op_ret -> 0 and op-errno -> 0 for /file93
[2015-02-19 20:35:06.940133] I [dht-rebalance.c:1673:gf_defrag_migrate_data] 0-small-dht: Migration operation on dir / took 6.56 secs
[2015-02-19 20:35:07.017437] I [MSGID: 109028] [dht-rebalance.c:2135:gf_defrag_status_get] 0-glusterfs: Rebalance is completed. Time taken is 7.00 secs
[2015-02-19 20:35:07.017482] I [MSGID: 109028] [dht-rebalance.c:2139:gf_defrag_status_get] 0-glusterfs: Files migrated: 21, size: 211161088, lookups: 121, failures: 0, skipped: 3
[2015-02-19 20:35:07.017909] W [glusterfsd.c:1183:cleanup_and_exit] (--> 0-: received signum (15), shutting down
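
Note that the fixed build's MSGID 109023 warning now logs the computed free-space figure for each subvolume together with the file's block count (for example, small-client-0:389200 versus small-client-1:143432 for /file89). Taking those figures at face value, small-client-1 genuinely has less free space than small-client-0, so those three skips (/file89, /file92, /file96) are correct behavior and account for the "skipped: 3" in the final status line, while every migration to the newly added small-client-2 completed.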

Comment 5 errata-xmlrpc 2015-03-26 06:33:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0682.html

