+++ This bug was initially created as a clone of Bug #1064481 +++

Description of problem:
Rebalance crawls the directory tree depth-first. If fix-layout fails on any descendant of a directory, rebalance exits and never visits the remaining directories at higher levels (the peers of the directory in question).

Version-Release number of selected component (if applicable):
3.4.0.59rhs-1.el6rhs.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create a 3-brick distribute volume.
2. Create a deep directory tree, say 100 levels, with directories and files at each level:
   for i in {1..100}
   do
       mkdir $i
       cd $i
       for j in {1..100}
       do
           mkdir $j
           touch file.$j
       done
   done
3. Add 2 more bricks and run rebalance.
4. While migration is in progress, say when the crawl is at directory depth 50 (this can be found by monitoring the rebalance log), delete directory 50 from the mount point:
   rm -rf 50/
5. After some time, rebalance reports failures saying that fix-layout failed for some directory.

Actual results:
Once fix-layout fails for a directory, the rebalance process exits without processing the remaining directories at higher levels. Because the crawl is depth-first, many top-level directories may never be visited, so no data is migrated from them.

Expected results:
If fix-layout fails for any directory, rebalance should continue to fix the other directories.

Additional info:

Volume Name: dht1
Type: Distribute
Volume ID: c0abd5ee-2f93-4de8-a287-178fde6e2283
Status: Started
Number of Bricks: 5
Transport-type: tcp
Bricks:
Brick1: 10.70.35.187:/rhs/brick1/d1
Brick2: 10.70.35.187:/rhs/brick1/d2
Brick3: 10.70.35.228:/rhs/brick1/d1
Brick4: 10.70.35.228:/rhs/brick1/212
Brick5: 10.70.35.212:/rhs/brick1/d1

cluster info
----------------
10.70.35.187
10.70.35.212
10.70.35.228

rebalance logs
--------------
[2014-02-12 09:04:58.185772] I [dht-rebalance.c:1121:gf_defrag_migrate_data] 0-dht1-dht: migrate data called on /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43/44/45/46/47/48/etc8/libreport/events
[2014-02-12 09:04:58.212112] E [dht-rebalance.c:1217:gf_defrag_migrate_data] 0-dht1-dht: /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43/44/45/46/47/48/etc8/libreport/events/report_Kerneloops.xml lookup failed
[2014-02-12 09:04:58.244667] I [dht-common.c:1119:dht_lookup_linkfile_cbk] 0-dht1-dht: lookup of /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43/44/45/46/47/48/etc8/libreport/events/report_Mailx.xml on dht1-client-0 (following linkfile) failed (No such file or directory)
[2014-02-12 09:04:58.245925] E [dht-rebalance.c:1217:gf_defrag_migrate_data] 0-dht1-dht: /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43/44/45/46/47/48/etc8/libreport/events/report_Mailx.xml lookup failed
[2014-02-12 09:04:58.249012] I [dht-rebalance.c:1345:gf_defrag_migrate_data] 0-dht1-dht: Migration operation on dir /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43/44/45/46/47/48/etc8/libreport/events took 0.06 secs
[2014-02-12 09:04:58.249687] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-dht1-client-4: remote operation failed: No such file or directory. Path: /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43/44/45/46/47/48/etc8/libreport/events (358296be-cf50-4722-8127-87ca87d53e3b)
[2014-02-12 09:04:58.250141] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-dht1-client-0: remote operation failed: No such file or directory. Path: /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43/44/45/46/47/48/etc8/libreport/events (358296be-cf50-4722-8127-87ca87d53e3b)
[2014-02-12 09:04:58.250195] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-dht1-client-3: remote operation failed: No such file or directory. Path: /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43/44/45/46/47/48/etc8/libreport/events (358296be-cf50-4722-8127-87ca87d53e3b)
[2014-02-12 09:04:58.250247] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-dht1-client-1: remote operation failed: No such file or directory. Path: /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43/44/45/46/47/48/etc8/libreport/events (358296be-cf50-4722-8127-87ca87d53e3b)
[2014-02-12 09:04:58.291056] W [client-rpc-fops.c:2523:client3_3_opendir_cbk] 0-dht1-client-2: remote operation failed: No such file or directory. Path: /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43/44/45/46/47/48/etc8/libreport/events (358296be-cf50-4722-8127-87ca87d53e3b)
[2014-02-12 09:04:58.291136] E [dht-rebalance.c:1407:gf_defrag_fix_layout] 0-dht1-dht: Failed to open dir /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43/44/45/46/47/48/etc8/libreport/events
[2014-02-12 09:04:58.291158] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43/44/45/46/47/48/etc8/libreport/events
[2014-02-12 09:04:58.291341] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43/44/45/46/47/48/etc8/libreport
[2014-02-12 09:04:58.291519] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43/44/45/46/47/48/etc8
[2014-02-12 09:04:58.291847] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43/44/45/46/47/48
[2014-02-12 09:04:58.292138] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43/44/45/46/47
[2014-02-12 09:04:58.292315] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43/44/45/46
[2014-02-12 09:04:58.292573] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43/44/45
[2014-02-12 09:04:58.292707] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43/44
[2014-02-12 09:04:58.293231] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42/43
[2014-02-12 09:04:58.293455] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41/42
[2014-02-12 09:04:58.293836] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40/41
[2014-02-12 09:04:58.293914] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39/40
[2014-02-12 09:04:58.294245] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38/39
[2014-02-12 09:04:58.294444] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30/31/32/34/35/37/38
[2014-02-12 09:04:58.294859] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30/31/32/34/35/37
[2014-02-12 09:04:58.295116] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30/31/32/34/35
[2014-02-12 09:04:58.295419] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30/31/32/34
[2014-02-12 09:04:58.295672] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30/31/32
[2014-02-12 09:04:58.296050] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30/31
[2014-02-12 09:04:58.296328] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29/30
[2014-02-12 09:04:58.296598] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28/29
[2014-02-12 09:04:58.298708] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27/28
[2014-02-12 09:04:58.299179] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25/27
[2014-02-12 09:04:58.299522] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24/25
[2014-02-12 09:04:58.300027] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8/24
[2014-02-12 09:04:58.300687] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7/8
[2014-02-12 09:04:58.300908] E [dht-rebalance.c:1498:gf_defrag_fix_layout] 0-dht1-dht: Fix layout failed for /mv7
[2014-02-12 09:04:58.301004] I [dht-rebalance.c:1783:gf_defrag_status_get] 0-glusterfs: Rebalance is completed. Time taken is 5084.00 secs
[2014-02-12 09:04:58.301015] I [dht-rebalance.c:1786:gf_defrag_status_get] 0-glusterfs: Files migrated: 52862, size: 1036401138, lookups: 172572, failures: 27, skipped: 3
[2014-02-12 09:04:58.366534] W [glusterfsd.c:1099:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x3c312e894d] (-->/lib64/libpthread.so.0() [0x3c31607851] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd) [0x4052fd]))) 0-: received signum (15), shutting down
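
For anyone triaging a similar rebalance log, the chain of "Fix layout failed for" entries above can be pulled out with a small filter. This is only a sketch: the function name failed_dirs is hypothetical, the message text is taken from the log entries quoted in this report, and it assumes paths are not broken by line wrapping.

```python
import re
from typing import List

# Matches the message text seen in the rebalance log quoted in this report.
FIX_LAYOUT_FAILED = re.compile(r"Fix layout failed for (\S+)")


def failed_dirs(log_text: str) -> List[str]:
    """Return every directory named in a 'Fix layout failed for' entry, in log order."""
    return FIX_LAYOUT_FAILED.findall(log_text)


# Three entries copied from the log in this report.
SAMPLE = (
    "[2014-02-12 09:04:58.300687] E [dht-rebalance.c:1498:gf_defrag_fix_layout] "
    "0-dht1-dht: Fix layout failed for /mv7/8\n"
    "[2014-02-12 09:04:58.300908] E [dht-rebalance.c:1498:gf_defrag_fix_layout] "
    "0-dht1-dht: Fix layout failed for /mv7\n"
    "[2014-02-12 09:04:58.301004] I [dht-rebalance.c:1783:gf_defrag_status_get] "
    "0-glusterfs: Rebalance is completed. Time taken is 5084.00 secs\n"
)

if __name__ == "__main__":
    for path in failed_dirs(SAMPLE):
        print(path)
```

Each successive path in the output is the parent of the previous one, which is how the failure can be seen propagating from the deleted directory up to the top-level directory before rebalance stops.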
REVIEW: http://review.gluster.org/11697 (dht: Continue rebalance crawl if fix-layout fails for any one descendant directory) posted (#1) for review on master by Sakshi Bansal (sabansal)
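
To illustrate the behavior named in the patch title, here is a minimal sketch of a depth-first crawl that either stops at the first failure (as reported in this bug) or records the failure and carries on with the remaining directories. It is not the patch and not GlusterFS code: the in-memory tree, the crawl function, and the stop_on_error flag are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory directory tree: path -> list of child directory paths.
Tree = Dict[str, List[str]]


def crawl(tree: Tree, path: str, fix_layout: Callable[[str], bool],
          failures: List[str], stop_on_error: bool = False) -> bool:
    """Depth-first crawl; returns False only when the crawl was aborted."""
    if not fix_layout(path):
        failures.append(path)
        return False  # nothing below a failed directory is visited
    for child in tree.get(path, []):
        ok = crawl(tree, child, fix_layout, failures, stop_on_error)
        if not ok and stop_on_error:
            # Reported behavior: the failure propagates to every ancestor
            # and the remaining siblings are never visited.
            failures.append(path)
            return False
        # Expected behavior: the failure is already recorded; keep going.
    return True


def demo(stop_on_error: bool):
    tree = {"/": ["/a", "/b"], "/a": ["/a/1", "/a/2"], "/b": ["/b/1"]}
    visited: List[str] = []

    def fix_layout(path: str) -> bool:
        visited.append(path)
        return path != "/a/1"  # pretend /a/1 was deleted mid-crawl

    failures: List[str] = []
    crawl(tree, "/", fix_layout, failures, stop_on_error)
    return visited, failures


if __name__ == "__main__":
    print(demo(stop_on_error=True))
    print(demo(stop_on_error=False))
```

With stop_on_error=True the crawl never reaches /a/2 or /b, matching the chain of failures in the log; with the default it records /a/1 as failed and still visits every other directory.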
This bug was accidentally moved from POST to MODIFIED via an error in automation. Please see mmccune with any questions.
This update is done in bulk based on the state of the patch and the time since last activity. If the issue is still seen, please reopen the bug.