Bug 1064481 - DHT: REBALANCE - Rebalance crawl on a directory will never visit peer directories if fix-layout fails for any of the descendant directories
Keywords:
Status: CLOSED DUPLICATE of bug 1237059
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: distribute
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
: ---
Assignee: Nithya Balachandran
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1243815
 
Reported: 2014-02-12 16:54 UTC by shylesh
Modified: 2015-11-27 12:06 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1243815 (view as bug list)
Environment:
Last Closed: 2015-11-27 12:06:59 UTC
Embargoed:



Description shylesh 2014-02-12 16:54:22 UTC
Description of problem:
Rebalance crawls the directory tree depth-first. If fix-layout fails on any descendant of a directory, rebalance exits and never visits the remaining directories at higher levels (the peers of the directory in question).

Version-Release number of selected component (if applicable):
3.4.0.59rhs-1.el6rhs.x86_64

How reproducible:
always

Steps to Reproduce:
1. Create a 3-brick distribute volume.
2. From the mount point, create a deep directory tree (say, 100 levels) with directories and files at each level:

for i in {1..100}
do
 # -p: from the second level on, $i already exists (created by the inner loop above it)
 mkdir -p "$i"
 cd "$i" || exit 1
 for j in {1..100}
 do
   mkdir -p "$j"
   touch "file.$j"
 done
done

3. Add 2 more bricks and run rebalance.

4. While migration is in progress and the crawl is at, say, directory depth 50 (this can be found by monitoring the rebalance log), delete directory 50 from the mount point:

rm -rf 50/

5. After some time, rebalance reports failures saying that fix-layout failed for some directory.
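
The step that waits for the crawl to reach a given depth relies on reading the rebalance log. A minimal sketch of that check follows, based only on the "migrate data called on <path>" message shown in the logs in this report. The function name crawl_depth is hypothetical, the log file path must be supplied by the caller (it is commonly under /var/log/glusterfs/, but treat that as an assumption), and the sketch assumes each log entry sits on a single line.

```shell
# crawl_depth LOGFILE
# Prints the number of path components in the directory most recently
# reported by a "migrate data called on <path>" log entry.
# Assumption: one log entry per line (not hard-wrapped).
crawl_depth() {
    grep 'migrate data called on ' "$1" \
        | tail -n 1 \
        | sed 's/.*migrate data called on //' \
        | awk '{
              # Count non-empty components so "/" gives 0 and "/a/b" gives 2.
              n = split($0, parts, "/")
              depth = 0
              for (i = 1; i <= n; i++)
                  if (parts[i] != "") depth++
              print depth
          }'
}
```

Calling it periodically, for example from `watch`, with the volume's rebalance log as the argument gives a rough idea of when to run the `rm -rf`.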



Actual results:
Once fix-layout fails for a directory, the rebalance process exits without processing the remaining directories at higher levels. Because the crawl is depth-first, many top-level directories may never have been visited, so no data is migrated from those directories.

Expected results:
If fix-layout fails for any directory, rebalance should continue and fix the remaining directories.
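
The expected behaviour can be illustrated with a small shell sketch of a depth-first crawl that records a failure and moves on to the next sibling instead of propagating the error to the root. This is not the GlusterFS implementation (that lives in dht-rebalance.c); fix_layout here is a hypothetical stand-in that fails when a directory has vanished or contains a marker file named .fail, and crawl, visited and failed are likewise illustrative names.

```shell
# Hypothetical stand-in for the per-directory fix-layout step: fails when
# the directory no longer exists or contains a marker file named ".fail".
fix_layout() {
    [ -d "$1" ] && [ ! -e "$1/.fail" ]
}

visited=0   # directories whose layout was fixed
failed=0    # directories where fix_layout failed

# Depth-first crawl. A failure skips only the failing subtree; the caller's
# loop continues with the remaining siblings.
crawl() {
    if ! fix_layout "$1"; then
        failed=$((failed + 1))
        return 0
    fi
    visited=$((visited + 1))
    # The glob is expanded once, before any recursive call runs.
    for child in "$1"/*/; do
        [ -d "$child" ] || continue   # unmatched glob or vanished entry
        crawl "${child%/}"
    done
}
```

With a tree containing a/a1 (marked to fail), b and c, calling crawl on the root counts one failure and still visits the root, a, b and c. Returning the error from the failing directory instead would stop the loop before b and c, which matches the behaviour reported here.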

Additional info:
Volume Name: dht1
Type: Distribute
Volume ID: c0abd5ee-2f93-4de8-a287-178fde6e2283
Status: Started
Number of Bricks: 5
Transport-type: tcp
Bricks:
Brick1: 10.70.35.187:/rhs/brick1/d1
Brick2: 10.70.35.187:/rhs/brick1/d2
Brick3: 10.70.35.228:/rhs/brick1/d1
Brick4: 10.70.35.228:/rhs/brick1/212
Brick5: 10.70.35.212:/rhs/brick1/d1


cluster info
----------------
10.70.35.187
10.70.35.212
10.70.35.228


rebalance logs
--------------
[2014-02-12 09:04:58.185772] I [dht-rebalance.c:1121:gf_defrag_migrate_data] 0-dht1-dht: migrate data called on /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43/44/45/46/47/48/etc8/libreport/events
[2014-02-12 09:04:58.212112] E [dht-rebalance.c:1217:gf_defrag_migrate_data] 0-dht1-dht: /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43/44/45/46/47/48/etc8/libreport/events/report_Kerneloops.xml lookup failed
[2014-02-12 09:04:58.244667] I [dht-common.c:1119:dht_lookup_linkfile_cbk] 0-dht1-dht: lookup of /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43/44/45/46/47/48/etc8/libreport/events/report_Mailx.xml on dht1-client-0 (following linkfile) failed (No such file or directory)
[2014-02-12 09:04:58.245925] E [dht-rebalance.c:1217:gf_defrag_migrate_data] 0-dht1-dht: /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43/44/45/46/47/48/etc8/libreport/events/report_Mailx.xml lookup failed
[2014-02-12 09:04:58.249012] I [dht-rebalance.c:1345:gf_defrag_migrate_data] 0-dht1-dht: Migration operation on dir /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43/44/45/46/47/48/etc8/libreport/events took 0.06 secs
[2014-02-12 09:04:58.249687] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-dht1-client-4: remote operation failed: No such file or directory. Path: /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43/44/45/46/47/48/etc8/libreport/events (358296be-cf50-4722-8127-87ca87d53e3b)
[2014-02-12 09:04:58.250141] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-dht1-client-0: remote operation failed: No such file or directory. Path: /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43/44/45/46/47/48/etc8/libreport/events (358296be-cf50-4722-8127-87ca87d53e3b)
[2014-02-12 09:04:58.250195] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-dht1-client-3: remote operation failed: No such file or directory. Path: /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43/44/45/46/47/48/etc8/libreport/events (358296be-cf50-4722-8127-87ca87d53e3b)
[2014-02-12 09:04:58.250247] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-dht1-client-1: remote operation failed: No such file or directory. Path: /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43/44/45/46/47/48/etc8/libreport/events (358296be-cf50-4722-8127-87ca87d53e3b)
[2014-02-12 09:04:58.291056] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-dht1-client-2: remote operation failed: No such file or directory. Path: /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43/44/45/46/47/48/etc8/libreport/events (358296be-cf50-4722-8127-87ca87d53e3b)
[2014-02-12 09:04:58.291136] E [dht-rebalance.c:1407:gf_defrag_fix_layout] 0-dht1-dht: Failed to open dir /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43/44/45/46/47/48/etc8/libreport/events
[2014-02-12 09:04:58.291158] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43/44/45/46/47/48/etc8/libreport/events
[2014-02-12 09:04:58.291341] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43/44/45/46/47/48/etc8/libreport
[2014-02-12 09:04:58.291519] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43/44/45/46/47/48/etc8
[2014-02-12 09:04:58.291847] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43/44/45/46/47/48
[2014-02-12 09:04:58.292138] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43/44/45/46/47
[2014-02-12 09:04:58.292315] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43/44/45/46
[2014-02-12 09:04:58.292573] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43/44/45
[2014-02-12 09:04:58.292707] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43/44
[2014-02-12 09:04:58.293231] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43
[2014-02-12 09:04:58.293455] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42
[2014-02-12 09:04:58.293836] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41
[2014-02-12 09:04:58.293914] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40
[2014-02-12 09:04:58.294245] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39
[2014-02-12 09:04:58.294444] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38
[2014-02-12 09:04:58.294859] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30/31/32/34/35/37
[2014-02-12 09:04:58.295116] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30/31/32/34/35
[2014-02-12 09:04:58.295419] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30/31/32/34
[2014-02-12 09:04:58.295672] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30/31/32
[2014-02-12 09:04:58.296050] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30/31
[2014-02-12 09:04:58.296328] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30
[2014-02-12 09:04:58.296598] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29
[2014-02-12 09:04:58.298708] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28
[2014-02-12 09:04:58.299179] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27
[2014-02-12 09:04:58.299522] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25
[2014-02-12 09:04:58.300027] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24
[2014-02-12 09:04:58.300687] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8
[2014-02-12 09:04:58.300908] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7
[2014-02-12 09:04:58.301004] I [dht-rebalance.c:1783:gf_defrag_status_get] 0-glusterfs: Rebalance is completed. Time taken is 5084.00 secs
[2014-02-12 09:04:58.301015] I [dht-rebalance.c:1786:gf_defrag_status_get] 0-glusterfs: Files migrated: 52862, size: 1036401138, lookups: 172572, failures: 27, skipped: 3
[2014-02-12 09:04:58.366534] W [glusterfsd.c:1099:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x3c312e894d] (-->/lib64/libpthread.so.0() [0x3c31607851] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd) [0x4052fd]))) 0-: received signum (15), shutting down
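
In the log above, the first "Fix layout failed for" entry names the directory where the failure originated, and each later entry is one of its ancestors as the error travels up to the root. A minimal sketch for pulling that out of a rebalance log follows. The function name fix_layout_failures is hypothetical, and the sketch assumes each log entry is on a single line in the file passed as the argument.

```shell
# fix_layout_failures LOGFILE
# Prints "<count> <first-path>": the number of "Fix layout failed for"
# entries and the path from the first one (the origin of the failure chain).
# Assumption: one log entry per line (not hard-wrapped).
fix_layout_failures() {
    awk '
        /Fix layout failed for / {
            count++
            if (count == 1) {
                # Strip everything up to and including the message prefix.
                sub(/.*Fix layout failed for /, "")
                first = $0
            }
        }
        END { printf "%d %s\n", count, first }
    ' "$1"
}
```

A count greater than one with paths that shorten by one component each time is the signature of the propagation described in this report.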



Attached the sosreports.

Comment 3 Susant Kumar Palai 2015-11-27 12:06:59 UTC

*** This bug has been marked as a duplicate of bug 1237059 ***

