Bug 803237 - [fa5b0347193f8d1a4b917a2edb338423cb175e66] rmdir hung when graph change is done
Summary: [fa5b0347193f8d1a4b917a2edb338423cb175e66] rmdir hung when graph change is done
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 817967
Reported: 2012-03-14 09:52 UTC by Anush Shetty
Modified: 2013-07-24 17:20 UTC

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-24 17:20:35 UTC
Regression: ---
Mount Type: fuse
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Anush Shetty 2012-03-14 09:52:04 UTC
Description of problem: On a pure replicate setup, rmdir hung while graph changes were being triggered by running gluster volume set in a loop.


Version-Release number of selected component (if applicable): upstream


How reproducible: Consistently


Steps to Reproduce:
1. while true; do mkdir -p dot; rm -rf dot; done
2. while true; do gluster volume set test2 performance.write-behind off; sleep 1; gluster volume set test2 performance.write-behind on; sleep 1; done
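The two loops above have to run at the same time for the graph change to overlap with the directory operations. A minimal sketch of one way to drive both from a single script, and to flag an rm that does not return, is below. The names MOUNT, LIMIT and RUN_REPRO, the mount path /mnt/test2 and the use of GNU timeout are assumptions added for illustration; only the volume name and the gluster volume set commands come from the steps above. Whether a blocked rm on a FUSE mount reacts to the signal that timeout sends is not something this report establishes, so treat the result as a hint rather than proof of a hang.

```shell
# Sketch only: MOUNT, LIMIT and RUN_REPRO are hypothetical names, not from the report.
MOUNT="${MOUNT:-/mnt/test2}"   # assumed FUSE mount point of volume test2
LIMIT="${LIMIT:-30}"           # seconds before an rm is treated as hung

# One iteration of step 1: $1 is the directory to work in, $2 the time limit.
# GNU timeout exits with status 124 if rm is still running when the limit expires.
churn_once() {
    mkdir -p "$1/dot" || return 1
    timeout "$2" rm -rf "$1/dot"
}

# Step 2: toggle write-behind so the client keeps switching graphs.
toggle_write_behind() {
    while true; do
        gluster volume set test2 performance.write-behind off; sleep 1
        gluster volume set test2 performance.write-behind on; sleep 1
    done
}

# The loops start only when RUN_REPRO=1, so the file can be sourced safely.
if [ "${RUN_REPRO:-0}" = "1" ]; then
    toggle_write_behind &
    toggler=$!
    trap 'kill "$toggler" 2>/dev/null' EXIT
    while true; do
        churn_once "$MOUNT" "$LIMIT"
        if [ $? -eq 124 ]; then
            echo "rm exceeded ${LIMIT}s; suspected hang" >&2
            break
        fi
    done
fi
```

Other failures of mkdir or rm (for example the "resolution failed" errors seen in the client log) do not stop the loop, which mirrors the original one-liner.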


  
Actual results: rm hung


Expected results: Graph change shouldn't cause hangs on the client


Additional info:

Client log-

[2012-03-14 15:10:43.036206] D [afr-dir-read.c:135:afr_examine_dir_readdir_cbk] 43-test2-replicate-0: <gfid:cc7894eb-c972-4bbe-b7e8-9a33c81d9139>: no entries found in test2-client-0
[2012-03-14 15:10:43.036324] D [afr-common.c:698:afr_get_call_child] 43-test2-replicate-0: Returning 0, call_child: 0, last_index: -1
[2012-03-14 15:10:43.036511] D [afr-common.c:698:afr_get_call_child] 43-test2-replicate-0: Returning 0, call_child: 0, last_index: -1
[2012-03-14 15:10:43.036663] E [fuse-bridge.c:1366:fuse_rmdir_resume] 0-glusterfs-fuse: RMDIR 1 (00000000-0000-0000-0000-000000000000/dot) resolution failed
[2012-03-14 15:10:43.038464] D [afr-common.c:129:afr_lookup_xattr_req_prepare] 43-test2-replicate-0: <gfid:00000000-0000-0000-0000-000000000000>: failed to get the gfid from dict
[2012-03-14 15:10:43.038996] D [afr-self-heal-common.c:148:afr_sh_print_pending_matrix] 43-test2-replicate-0: pending_matrix: [ 0 0 ]
[2012-03-14 15:10:43.039018] D [afr-self-heal-common.c:148:afr_sh_print_pending_matrix] 43-test2-replicate-0: pending_matrix: [ 0 0 ]
[2012-03-14 15:10:43.039033] D [afr-self-heal-common.c:753:afr_mark_sources] 43-test2-replicate-0: Number of sources: 0
[2012-03-14 15:10:43.039067] D [afr-self-heal-data.c:799:afr_lookup_select_read_child_by_txn_type] 43-test2-replicate-0: returning read_child: 1
[2012-03-14 15:10:43.039081] D [afr-common.c:1275:afr_lookup_select_read_child] 43-test2-replicate-0: Source selected as 1 for <gfid:00000000-0000-0000-0000-000000000000>
[2012-03-14 15:10:43.039100] D [afr-common.c:1082:afr_lookup_build_response_params] 43-test2-replicate-0: Building lookup response from 1
[2012-03-14 15:10:43.039187] D [afr-common.c:698:afr_get_call_child] 43-test2-replicate-0: Returning 0, call_child: 1, last_index: -1
[2012-03-14 15:10:43.041169] D [afr-common.c:129:afr_lookup_xattr_req_prepare] 43-test2-replicate-0: <gfid:00000000-0000-0000-0000-000000000000>: failed to get the gfid from dict
[2012-03-14 15:10:43.041559] D [afr-self-heal-common.c:148:afr_sh_print_pending_matrix] 43-test2-replicate-0: pending_matrix: [ 0 0 ]
[2012-03-14 15:10:43.041579] D [afr-self-heal-common.c:148:afr_sh_print_pending_matrix] 43-test2-replicate-0: pending_matrix: [ 0 0 ]
[2012-03-14 15:10:43.041594] D [afr-self-heal-common.c:753:afr_mark_sources] 43-test2-replicate-0: Number of sources: 0
[2012-03-14 15:10:43.041608] D [afr-self-heal-data.c:799:afr_lookup_select_read_child_by_txn_type] 43-test2-replicate-0: returning read_child: 0
[2012-03-14 15:10:43.041622] D [afr-common.c:1275:afr_lookup_select_read_child] 43-test2-replicate-0: Source selected as 0 for <gfid:00000000-0000-0000-0000-000000000000>
[2012-03-14 15:10:43.041639] D [afr-common.c:1082:afr_lookup_build_response_params] 43-test2-replicate-0: Building lookup response from 0
[2012-03-14 15:10:43.042136] D [afr-dir-read.c:135:afr_examine_dir_readdir_cbk] 43-test2-replicate-0: <gfid:cc7894eb-c972-4bbe-b7e8-9a33c81d9139>: no entries found in test2-client-0
[2012-03-14 15:10:43.042169] D [afr-dir-read.c:135:afr_examine_dir_readdir_cbk] 43-test2-replicate-0: <gfid:cc7894eb-c972-4bbe-b7e8-9a33c81d9139>: no entries found in test2-client-1
[2012-03-14 15:10:43.042734] D [afr-common.c:698:afr_get_call_child] 43-test2-replicate-0: Returning 0, call_child: 0, last_index: -1
[2012-03-14 15:10:43.042962] D [afr-common.c:698:afr_get_call_child] 43-test2-replicate-0: Returning 0, call_child: 0, last_index: -1
[2012-03-14 15:10:43.043120] E [fuse-bridge.c:1366:fuse_rmdir_resume] 0-glusterfs-fuse: RMDIR 1 (00000000-0000-0000-0000-000000000000/dot) resolution failed
[2012-03-14 15:10:43.044941] D [afr-common.c:129:afr_lookup_xattr_req_prepare] 43-test2-replicate-0: <gfid:00000000-0000-0000-0000-000000000000>: failed to get the gfid from dict

Comment 1 Raghavendra G 2012-03-14 11:34:37 UTC
The client statedump shows that rmdir did not complete in afr, hence assigning the bug to replicate.

Comment 2 Pranith Kumar K 2012-03-18 16:35:40 UTC
I suspect this is the same issue as bug 803209, but for entrylks. There is no way to confirm this because the statedumps of the bricks and clients are not attached to the bug. I ran the steps above with the fixes for 803209 applied for ~10 minutes and observed no hang, so I am closing the bug.

Comment 3 Anush Shetty 2012-03-19 08:32:19 UTC
Verified with 3.3.0qa29

