Bug 1054816
| Summary: | Rebalance reporting failure for file migrations | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Lalatendu Mohanty <lmohanty> |
| Component: | core | Assignee: | Ric Wheeler <rwheeler> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Lalatendu Mohanty <lmohanty> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.5.0 | CC: | gluster-bugs |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.5.1 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-07-11 19:17:40 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Created attachment 851643
glusterd logs
The rebalance logs are larger than 30 MB, so I could not attach them to Bugzilla. I also tried the force option to see whether it resolves the issue, but it also returns failures.
gluster v rebalance volume2 start force
gluster v rebalance volume2 status
[root@dhcpxxx-xxx glusterfs]# gluster v rebalance volume2 status
Node Rebalanced-files size scanned failures skipped status run time in secs
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 22 45.0KB 61067 2012 0 in progress 505.00
10.16.159.83 8 18.0KB 65570 6753 0 in progress 504.00
volume rebalance: volume2: success:
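The status table above can also be checked mechanically, which helps when the failure counts need to be tracked across repeated runs. The following is a minimal Python sketch, assuming the column layout pasted in this report (node, rebalanced files, size, scanned, failures, skipped, status, run time in seconds). `NodeStatus` and `parse_rebalance_status` are hypothetical names introduced for illustration, and other GlusterFS versions may print different columns.

```python
import re
from typing import List, NamedTuple

# One data row of the status table as pasted in this report.
# Header, separator, and "volume rebalance: ..." lines do not match
# because their second field is not a plain integer.
ROW = re.compile(
    r"^\s*(?P<node>\S+)\s+(?P<files>\d+)\s+(?P<size>\S+)\s+(?P<scanned>\d+)"
    r"\s+(?P<failures>\d+)\s+(?P<skipped>\d+)\s+(?P<status>.+?)"
    r"\s+(?P<secs>\d+(?:\.\d+)?)\s*$"
)


class NodeStatus(NamedTuple):
    """Hypothetical container for one row of the status table."""
    node: str
    files: int
    size: str
    scanned: int
    failures: int
    skipped: int
    status: str
    secs: float


def parse_rebalance_status(text: str) -> List[NodeStatus]:
    """Return one NodeStatus per data row found in the pasted output."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append(NodeStatus(
                m.group("node"), int(m.group("files")), m.group("size"),
                int(m.group("scanned")), int(m.group("failures")),
                int(m.group("skipped")), m.group("status"),
                float(m.group("secs")),
            ))
    return rows


# Output copied from this report.
SAMPLE = """\
Node Rebalanced-files size scanned failures skipped status run time in secs
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 22 45.0KB 61067 2012 0 in progress 505.00
10.16.159.83 8 18.0KB 65570 6753 0 in progress 504.00
volume rebalance: volume2: success:
"""

if __name__ == "__main__":
    for row in parse_rebalance_status(SAMPLE):
        if row.failures:
            print(f"{row.node}: {row.failures} failures ({row.status})")
```

Note that the final `volume rebalance: volume2: success:` line only reports that the status command itself succeeded, so the per-node failures column is the value to watch.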
From rebalance logs:
[2014-01-17 08:30:42.251998] E [dht-rebalance.c:1289:gf_defrag_migrate_data] 0-volume2-dht: /io/TestDir0/TestDir0/TestDir0/TestDir0/TestDir0/TestDir2/TestDir0/TestDir4/TestDir5/TestDir6/a0: failed to get trusted.distribute.linkinfo key - Success
[2014-01-17 08:30:42.252854] E [dht-rebalance.c:1289:gf_defrag_migrate_data] 0-volume2-dht: /io/TestDir0/TestDir0/TestDir0/TestDir0/TestDir0/TestDir2/TestDir0/TestDir4/TestDir5/TestDir6/a1: failed to get trusted.distribute.linkinfo key - Success
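Because the rebalance log is too large to read by hand, it can help to tally these messages by the text after the final hyphen (`Success`, `Invalid argument`). The sketch below is one way to do that in Python. It assumes the message text is exactly as pasted in this report, and `tally_linkinfo_failures` is a hypothetical helper name. The sample entries are copied from this report.

```python
import re
from collections import Counter

# Matches only the tail of the message, so it works whether the log entry
# is on one line or wrapped after the file path.
LINKINFO_ERR = re.compile(
    r"failed to get trusted\.distribute\.linkinfo key - (?P<reason>.+?)\s*$",
    re.MULTILINE,
)


def tally_linkinfo_failures(log_text: str) -> Counter:
    """Count linkinfo failures grouped by the trailing reason string."""
    return Counter(m.group("reason") for m in LINKINFO_ERR.finditer(log_text))


# Entries copied from this report (two from 08:30, one from 06:39).
SAMPLE = """\
[2014-01-17 08:30:42.251998] E [dht-rebalance.c:1289:gf_defrag_migrate_data] 0-volume2-dht: /io/TestDir0/TestDir0/TestDir0/TestDir0/TestDir0/TestDir2/TestDir0/TestDir4/TestDir5/TestDir6/a0: failed to get trusted.distribute.linkinfo key - Success
[2014-01-17 08:30:42.252854] E [dht-rebalance.c:1289:gf_defrag_migrate_data] 0-volume2-dht: /io/TestDir0/TestDir0/TestDir0/TestDir0/TestDir0/TestDir2/TestDir0/TestDir4/TestDir5/TestDir6/a1: failed to get trusted.distribute.linkinfo key - Success
[2014-01-17 06:39:00.702088] E [dht-rebalance.c:1289:gf_defrag_migrate_data] 0-volume2-dht: /io/TestDir0/TestDir0/TestDir0/TestDir0/TestDir0/TestDir1/TestDir6/TestDir1/TestDir3/TestDir4/a1: failed to get trusted.distribute.linkinfo key - Invalid argument
"""

if __name__ == "__main__":
    for reason, count in tally_linkinfo_failures(SAMPLE).most_common():
        print(f"{count:6d}  {reason}")
```

To run it against a real log, read the file contents and pass them to `tally_linkinfo_failures` in place of `SAMPLE`.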
Created attachment 851645
Rebalance Log
Regarding the I/O, I was running the script below:
https://github.com/LalatenduMohanty/utility_scripts/blob/master/CreateDirAndFileTree.pl
1. On FUSE, I was creating files of size 1 KB to 4 KB, i.e.
perl CreateDirAndFileTree.pl /mnt/fuse/io 3 1 4 10 10
2. On NFS,
./CreateDirAndFileTree.pl /mnt/nfs/nfs-io/ 3 10000 15000 5 5
and later
/CreateDirAndFileTree.pl /mnt/nfs-local/nfs-io/ 100 10 15 5 5

Verified on glusterfs-3.5.0-0.4.beta2.el6.x86_64; the issue was not found, hence closing the bug.
Description of problem:
The rebalance status command reports failures, and the rebalance log file contains many errors related to file migration.

Version-Release number of selected component (if applicable):
rpm -qa | grep glusterfs
glusterfs-libs-3.5.0-0.1.beta1.el6.x86_64
glusterfs-server-3.5.0-0.1.beta1.el6.x86_64
glusterfs-3.5.0-0.1.beta1.el6.x86_64
glusterfs-cli-3.5.0-0.1.beta1.el6.x86_64
glusterfs-devel-3.5.0-0.1.beta1.el6.x86_64
glusterfs-fuse-3.5.0-0.1.beta1.el6.x86_64
glusterfs-debuginfo-3.5.0-0.1.beta1.el6.x86_64

How reproducible:
Tried once; it failed on the first attempt.

Steps to Reproduce:
1. Created a DHT gluster volume using two bricks.
2. Mounted it using FUSE and NFS on different clients.
3. Started I/O on the FUSE and NFS mounts in different directories (i.e. FUSE I/O on one directory and NFS I/O on a different directory).
4. Added a new brick from the existing node and started a rebalance, using:
gluster v add-brick volume2 <IP>:<new brick>
gluster v rebalance volume2 start
gluster v rebalance volume2 status

Actual results:
gluster v rebalance volume2 status
Node Rebalanced-files size scanned failures skipped status run time in secs
--------- ----------- ----------- ----------- ----------- ----------- ------------ --------------
localhost 17353 46.2MB 122527 4849 0 completed 1483.00
10.16.159.83 11805 37.7MB 116963 9337 1 completed 1472.00
volume rebalance: volume2: success:

Expected results:

Additional info:
From volume2-rebalance.log
[2014-01-17 06:39:00.701252] E [dht-rebalance.c:1289:gf_defrag_migrate_data] 0-volume2-dht: /io/TestDir0/TestDir0/TestDir0/TestDir0/TestDir0/TestDir1/TestDir6/TestDir1/TestDir3/TestDir4/a0: failed to get trusted.distribute.linkinfo key - Success
[2014-01-17 06:39:00.702088] E [dht-rebalance.c:1289:gf_defrag_migrate_data] 0-volume2-dht: /io/TestDir0/TestDir0/TestDir0/TestDir0/TestDir0/TestDir1/TestDir6/TestDir1/TestDir3/TestDir4/a1: failed to get trusted.distribute.linkinfo key - Invalid argument
[2014-01-17 06:39:00.756721] E [dht-rebalance.c:1289:gf_defrag_migrate_data] 0-volume2-dht: /io/TestDir0/TestDir0/TestDir0/TestDir0/TestDir0/TestDir1/TestDir6/TestDir1/TestDir3/TestDir7/a0: failed to get trusted.distribute.linkinfo key - Success
[2014-01-17 06:39:00.757736] E [dht-rebalance.c:1289:gf_defrag_migrate_data] 0-volume2-dht: /io/TestDir0/TestDir0/TestDir0/TestDir0/TestDir0/TestDir1/TestDir6/TestDir1/TestDir3/TestDir7/a1: failed to get trusted.distribute.linkinfo key - Invalid argument
[2014-01-17 06:39:01.025917] E [dht-rebalance.c:1289:gf_defrag_migrate_data] 0-volume2-dht: /io/TestDir0/TestDir0/TestDir0/TestDir0/TestDir0/TestDir1/TestDir6/TestDir1/TestDir4/TestDir2/a1: failed to get trusted.distribute.linkinfo key - Invalid argument
[2014-01-17 06:39:02.306725] E [dht-rebalance.c:1289:gf_defrag_migrate_data] 0-volume2-dht: /io/TestDir0/TestDir0/TestDir0/TestDir0/TestDir0/TestDir1/TestDir6/TestDir1/TestDir6/TestDir7/a0: failed to get trusted.distribute.linkinfo key - Invalid argument
[2014-01-17 06:39:02.307997] E [dht-rebalance.c:1289:gf_defrag_migrate_data] 0-volume2-dht: /io/TestDir0/TestDir0/TestDir0/TestDir0/TestDir0/TestDir1/TestDir6/TestDir1/TestDir6/TestDir7/a1: failed to get trusted.distribute.linkinfo key - Invalid argument
[2014-01-17 06:39:02.630009] E [dht-rebalance.c:1289:gf_defrag_migrate_data] 0-volume2-dht: /io/TestDir0/TestDir0/TestDir0/TestDir0/TestDir0/TestDir1/TestDir6/TestDir1/TestDir7/TestDir5/a0: failed to get trusted.distribute.linkinfo key - Success
[2014-01-17 06:39:02.631110] E [dht-rebalance.c:1289:gf_defrag_migrate_data] 0-volume2-dht: /io/TestDir0/TestDir0/TestDir0/TestDir0/TestDir0/TestDir1/TestDir6/TestDir1/TestDir7/TestDir5/a1: failed to get trusted.distribute.linkinfo key - Invalid argument
[2014-01-17 06:39:03.998813] E [dht-rebalance.c:1289:gf_defrag_migrate_data] 0-volume2-dht: /io/TestDir0/TestDir0/TestDir0/TestDir0/TestDir0/TestDir1/TestDir6/TestDir2/TestDir0/TestDir0/a0: failed to get trusted.distribute.linkinfo key - Success
[2014-01-17 06:39:03.999937] E [dht-rebalance.c:1289:gf_defrag_migrate_data] 0-volume2-dht: /io/TestDir0/TestDir0/TestDir0/TestDir0/TestDir0/TestDir1/TestDir6/TestDir2/TestDir0/TestDir0/a1: failed to get trusted.distribute.linkinfo key - Invalid argument
############################################################################
grep '\] E \[' etc-glusterfs-glusterd.vol.log
[2014-01-17 06:24:30.686127] E [glusterd-utils.c:4112:glusterd_nodesvc_unlink_socket_file] 0-management: Failed to remove /var/run/0473c76ba29d6ca04d7410d1d12afc68.socket error: Permission denied
[2014-01-17 06:25:10.414734] E [glusterd-utils.c:8018:glusterd_volume_rebalance_use_rsp_dict] 0-: failed to get index
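The grep above keeps only error-level (`E`) entries. As a possible follow-up for triage, the same filter can be written in Python and extended to group the errors by the function that logged them. This is a sketch under the assumption that every entry starts with `[timestamp] E [file:line:function]`, as the entries in this report do. `errors_by_function` is a hypothetical name, and the sample lines are copied from this report.

```python
import re
from collections import Counter

# Same filter as grep '\] E \[', plus capture of the source location.
ERROR_ENTRY = re.compile(
    r"^\[(?P<ts>[^\]]+)\] E "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]"
)


def errors_by_function(log_text: str) -> Counter:
    """Count error-level log entries per logging function."""
    counts = Counter()
    for line in log_text.splitlines():
        m = ERROR_ENTRY.match(line.strip())
        if m:
            counts[m.group("func")] += 1
    return counts


# Lines copied from this report; the last one is not a log entry
# and is ignored by the filter.
SAMPLE = """\
[2014-01-17 06:24:30.686127] E [glusterd-utils.c:4112:glusterd_nodesvc_unlink_socket_file] 0-management: Failed to remove /var/run/0473c76ba29d6ca04d7410d1d12afc68.socket error: Permission denied
[2014-01-17 06:25:10.414734] E [glusterd-utils.c:8018:glusterd_volume_rebalance_use_rsp_dict] 0-: failed to get index
volume rebalance: volume2: success:
"""

if __name__ == "__main__":
    for func, count in errors_by_function(SAMPLE).most_common():
        print(f"{count:6d}  {func}")
```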