Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1054816

Summary: Rebalance reporting failure for file migrations
Product: [Community] GlusterFS
Reporter: Lalatendu Mohanty <lmohanty>
Component: core
Assignee: Ric Wheeler <rwheeler>
Status: CLOSED CURRENTRELEASE
QA Contact: Lalatendu Mohanty <lmohanty>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 3.5.0
CC: gluster-bugs
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.5.1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-07-11 19:17:40 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description Flags
glusterd logs none
Rebalance Log none

Description Lalatendu Mohanty 2014-01-17 14:21:27 UTC
Description of problem:

The rebalance status command reports failures, and the rebalance log file contains many errors related to file migration.


Version-Release number of selected component (if applicable):

rpm -qa | grep glusterfs
glusterfs-libs-3.5.0-0.1.beta1.el6.x86_64
glusterfs-server-3.5.0-0.1.beta1.el6.x86_64
glusterfs-3.5.0-0.1.beta1.el6.x86_64
glusterfs-cli-3.5.0-0.1.beta1.el6.x86_64
glusterfs-devel-3.5.0-0.1.beta1.el6.x86_64
glusterfs-fuse-3.5.0-0.1.beta1.el6.x86_64
glusterfs-debuginfo-3.5.0-0.1.beta1.el6.x86_64

How reproducible:

Reproduced on the first attempt.

Steps to Reproduce:
1. Created a DHT gluster volume using two bricks.
2. Mounted it using FUSE and NFS on different clients.
3. Started I/O on the FUSE and NFS mounts in different directories (i.e. FUSE I/O on one directory and NFS I/O on a different directory).
4. Added a new brick from the existing node, using gluster v add-brick volume2 <IP>:<new brick>
5. Started the rebalance: gluster v rebalance volume2 start
6. Checked the rebalance status: gluster v rebalance volume2 status

Actual results:

gluster v rebalance volume2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            17353        46.2MB        122527          4849             0            completed            1483.00
                            10.16.159.83            11805        37.7MB        116963          9337             1            completed            1472.00
volume rebalance: volume2: success: 
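
To make the failure counts above easier to compare between runs, the status table can be parsed with a small script. This is only a minimal sketch: it assumes the column order shown in the output above (node, rebalanced files, size, scanned, failures, skipped, status, run time), and the names parse_rebalance_status and NodeStatus are made up for illustration; they are not part of GlusterFS.

```python
from typing import List, NamedTuple


class NodeStatus(NamedTuple):
    """One data row of the `gluster v rebalance <vol> status` table."""
    node: str
    rebalanced_files: int
    size: str
    scanned: int
    failures: int
    skipped: int
    status: str
    run_time_secs: float


def parse_rebalance_status(text: str) -> List[NodeStatus]:
    """Return the per-node rows, skipping the header, ruler and summary lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # A data row has at least 8 whitespace-separated fields.
        if len(fields) < 8:
            continue
        # Columns 2, 4, 5 and 6 must be plain integers (this rejects the header).
        if not (fields[1].isdigit() and all(f.isdigit() for f in fields[3:6])):
            continue
        try:
            run_time = float(fields[-1])
        except ValueError:
            continue
        rows.append(NodeStatus(
            node=fields[0],
            rebalanced_files=int(fields[1]),
            size=fields[2],
            scanned=int(fields[3]),
            failures=int(fields[4]),
            skipped=int(fields[5]),
            # The status may contain spaces, e.g. "in progress".
            status=" ".join(fields[6:-1]),
            run_time_secs=run_time,
        ))
    return rows


# Status output copied from this report.
SAMPLE_STATUS = """\
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost            17353        46.2MB        122527          4849             0            completed            1483.00
                            10.16.159.83            11805        37.7MB        116963          9337             1            completed            1472.00
volume rebalance: volume2: success:
"""

if __name__ == "__main__":
    for row in parse_rebalance_status(SAMPLE_STATUS):
        print(row.node, row.status, "failures:", row.failures)
```

Note that both nodes report "completed" and the command ends with "success" even though the failures column is non-zero, so a check based only on the final line would miss the problem.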


Expected results:


Additional info:

From volume2-rebalance.log:

[2014-01-17 06:39:00.701252] E [dht-rebalance.c:1289:gf_defrag_migrate_data] 0-volume2-dht: /io/TestDir0/TestDir0/TestDir0/TestDir0/TestDir0/TestDir1/TestDir6/TestDir1/TestDir3/TestDir4/a0: failed to get trusted.distribute.linkinfo key - Success
[2014-01-17 06:39:00.702088] E [dht-rebalance.c:1289:gf_defrag_migrate_data] 0-volume2-dht: /io/TestDir0/TestDir0/TestDir0/TestDir0/TestDir0/TestDir1/TestDir6/TestDir1/TestDir3/TestDir4/a1: failed to get trusted.distribute.linkinfo key - Invalid argument
[2014-01-17 06:39:00.756721] E [dht-rebalance.c:1289:gf_defrag_migrate_data] 0-volume2-dht: /io/TestDir0/TestDir0/TestDir0/TestDir0/TestDir0/TestDir1/TestDir6/TestDir1/TestDir3/TestDir7/a0: failed to get trusted.distribute.linkinfo key - Success
[2014-01-17 06:39:00.757736] E [dht-rebalance.c:1289:gf_defrag_migrate_data] 0-volume2-dht: /io/TestDir0/TestDir0/TestDir0/TestDir0/TestDir0/TestDir1/TestDir6/TestDir1/TestDir3/TestDir7/a1: failed to get trusted.distribute.linkinfo key - Invalid argument
[2014-01-17 06:39:01.025917] E [dht-rebalance.c:1289:gf_defrag_migrate_data] 0-volume2-dht: /io/TestDir0/TestDir0/TestDir0/TestDir0/TestDir0/TestDir1/TestDir6/TestDir1/TestDir4/TestDir2/a1: failed to get trusted.distribute.linkinfo key - Invalid argument
[2014-01-17 06:39:02.306725] E [dht-rebalance.c:1289:gf_defrag_migrate_data] 0-volume2-dht: /io/TestDir0/TestDir0/TestDir0/TestDir0/TestDir0/TestDir1/TestDir6/TestDir1/TestDir6/TestDir7/a0: failed to get trusted.distribute.linkinfo key - Invalid argument
[2014-01-17 06:39:02.307997] E [dht-rebalance.c:1289:gf_defrag_migrate_data] 0-volume2-dht: /io/TestDir0/TestDir0/TestDir0/TestDir0/TestDir0/TestDir1/TestDir6/TestDir1/TestDir6/TestDir7/a1: failed to get trusted.distribute.linkinfo key - Invalid argument
[2014-01-17 06:39:02.630009] E [dht-rebalance.c:1289:gf_defrag_migrate_data] 0-volume2-dht: /io/TestDir0/TestDir0/TestDir0/TestDir0/TestDir0/TestDir1/TestDir6/TestDir1/TestDir7/TestDir5/a0: failed to get trusted.distribute.linkinfo key - Success
[2014-01-17 06:39:02.631110] E [dht-rebalance.c:1289:gf_defrag_migrate_data] 0-volume2-dht: /io/TestDir0/TestDir0/TestDir0/TestDir0/TestDir0/TestDir1/TestDir6/TestDir1/TestDir7/TestDir5/a1: failed to get trusted.distribute.linkinfo key - Invalid argument
[2014-01-17 06:39:03.998813] E [dht-rebalance.c:1289:gf_defrag_migrate_data] 0-volume2-dht: /io/TestDir0/TestDir0/TestDir0/TestDir0/TestDir0/TestDir1/TestDir6/TestDir2/TestDir0/TestDir0/a0: failed to get trusted.distribute.linkinfo key - Success
[2014-01-17 06:39:03.999937] E [dht-rebalance.c:1289:gf_defrag_migrate_data] 0-volume2-dht: /io/TestDir0/TestDir0/TestDir0/TestDir0/TestDir0/TestDir1/TestDir6/TestDir2/TestDir0/TestDir0/a1: failed to get trusted.distribute.linkinfo key - Invalid argument
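
Since the full rebalance log is large, it may help to tally the error lines by the reason text after the final " - ". The following is a rough sketch, assuming the log line layout seen in the excerpt above ([timestamp] level [file:line:function] message); the function name tally_error_reasons is hypothetical. The leading "[" is treated as optional so that a line pasted without it is still counted.

```python
import re
from collections import Counter

# Layout assumed from the excerpt: [timestamp] LEVEL [file:line:function] message
LOG_LINE = re.compile(
    r"^\[?(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] "
    r"(?P<level>[A-Z]) "
    r"\[(?P<source>[^\]]+)\] "
    r"(?P<message>.*)$"
)


def tally_error_reasons(lines):
    """Count E-level log lines by the text after the last ' - ' in the message."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None or match.group("level") != "E":
            continue
        message = match.group("message")
        if " - " not in message:
            continue  # no trailing reason on this line
        reason = message.rsplit(" - ", 1)[1].strip()
        counts[reason] += 1
    return counts


# Two lines copied verbatim from the log excerpt in this report.
SAMPLE_LINES = [
    "[2014-01-17 06:39:00.702088] E [dht-rebalance.c:1289:gf_defrag_migrate_data] 0-volume2-dht: /io/TestDir0/TestDir0/TestDir0/TestDir0/TestDir0/TestDir1/TestDir6/TestDir1/TestDir3/TestDir4/a1: failed to get trusted.distribute.linkinfo key - Invalid argument",
    "[2014-01-17 06:39:00.756721] E [dht-rebalance.c:1289:gf_defrag_migrate_data] 0-volume2-dht: /io/TestDir0/TestDir0/TestDir0/TestDir0/TestDir0/TestDir1/TestDir6/TestDir1/TestDir3/TestDir7/a0: failed to get trusted.distribute.linkinfo key - Success",
]

if __name__ == "__main__":
    for reason, count in tally_error_reasons(SAMPLE_LINES).most_common():
        print(f"{count:6d}  {reason}")
```

To run it over the real log, pass an open file object instead of SAMPLE_LINES. The "Success" reason is probably what strerror prints for an errno of 0, which would suggest the error code was not set when the message was logged; that is an inference from the text, not something confirmed here.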

############################################################################

grep '\] E \[' etc-glusterfs-glusterd.vol.log

[2014-01-17 06:24:30.686127] E [glusterd-utils.c:4112:glusterd_nodesvc_unlink_socket_file] 0-management: Failed to remove /var/run/0473c76ba29d6ca04d7410d1d12afc68.socket error: Permission denied
[2014-01-17 06:25:10.414734] E [glusterd-utils.c:8018:glusterd_volume_rebalance_use_rsp_dict] 0-: failed to get index

Comment 1 Lalatendu Mohanty 2014-01-17 14:28:51 UTC
Created attachment 851643 [details]
glusterd logs

Comment 2 Lalatendu Mohanty 2014-01-17 14:29:55 UTC
The rebalance logs are more than 30 MB, so I was not able to attach them to Bugzilla.

Comment 3 Lalatendu Mohanty 2014-01-17 14:36:25 UTC
I also tried the force option to see whether it resolves the issue, but it also reports failures.
gluster v rebalance volume2 start force
gluster v rebalance volume2 status
[root@dhcpxxx-xxx glusterfs]# gluster v rebalance volume2 status
                                    Node Rebalanced-files          size       scanned      failures       skipped               status   run time in secs
                               ---------      -----------   -----------   -----------   -----------   -----------         ------------     --------------
                               localhost               22        45.0KB         61067          2012             0          in progress             505.00
                            10.16.159.83                8        18.0KB         65570          6753             0          in progress             504.00
volume rebalance: volume2: success: 


From rebalance logs:

[2014-01-17 08:30:42.251998] E [dht-rebalance.c:1289:gf_defrag_migrate_data] 0-volume2-dht: /io/TestDir0/TestDir0/TestDir0/TestDir0/TestDir0/TestDir2/TestDir0/TestDir4/TestDir5/TestDir6/a0: failed to get trusted.distribute.linkinfo key - Success
[2014-01-17 08:30:42.252854] E [dht-rebalance.c:1289:gf_defrag_migrate_data] 0-volume2-dht: /io/TestDir0/TestDir0/TestDir0/TestDir0/TestDir0/TestDir2/TestDir0/TestDir4/TestDir5/TestDir6/a1: failed to get trusted.distribute.linkinfo key - Success

Comment 4 Lalatendu Mohanty 2014-01-17 14:38:34 UTC
Created attachment 851645 [details]
Rebalance Log

Comment 5 Lalatendu Mohanty 2014-01-17 14:52:50 UTC
Regarding the I/O, I was running the script below.

https://github.com/LalatenduMohanty/utility_scripts/blob/master/CreateDirAndFileTree.pl

1. On FUSE, I was creating files of size 1 KB to 4 KB,
  i.e. perl CreateDirAndFileTree.pl /mnt/fuse/io 3 1 4 10 10

2. On NFS, ./CreateDirAndFileTree.pl /mnt/nfs/nfs-io/ 3 10000 15000 5 5 
   and later /CreateDirAndFileTree.pl /mnt/nfs-local/nfs-io/ 100 10 15 5 5

Comment 6 Lalatendu Mohanty 2014-02-03 17:56:28 UTC
Verified on glusterfs-3.5.0-0.4.beta2.el6.x86_64; the issue is no longer seen. Hence closing the bug.