Bug 803237

Summary: [fa5b0347193f8d1a4b917a2edb338423cb175e66] rmdir hung when graph change is done
Product: [Community] GlusterFS Reporter: Anush Shetty <ashetty>
Component: replicate Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: mainline CC: gluster-bugs, rgowdapp
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.4.0 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-07-24 17:20:35 UTC Type: ---
Regression: --- Mount Type: fuse
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 817967    

Description Anush Shetty 2012-03-14 09:52:04 UTC
Description of problem: On a pure replicate setup, rmdir hung when graph changes were triggered by running volume set in a loop.


Version-Release number of selected component (if applicable): upstream


How reproducible: Consistently


Steps to Reproduce:
1. while true; do mkdir -p dot; rm -rf dot; done
2. while true; do gluster volume set test2 performance.write-behind off; sleep 1; gluster volume set test2 performance.write-behind on; sleep 1; done
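When repeating these steps, it can help to bound each rm with a deadline so that a hang is reported instead of silently blocking the shell. The following is a minimal sketch, not part of the original report: it assumes GNU coreutils timeout is installed, and the function names run_with_deadline and churn_dirs as well as the 30-second default are hypothetical choices made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: run a command under a deadline and flag a suspected hang.
# Assumes GNU coreutils `timeout`, which exits with status 124 when the
# deadline expires before the command finishes.
run_with_deadline() {
    deadline="$1"; shift
    status=0
    timeout "$deadline" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        echo "HANG: '$*' did not finish within ${deadline}s" >&2
    fi
    return "$status"
}

# Step 1 from the report, stopping at the first failure or suspected hang.
# Meant to be run from a directory on the FUSE mount of the volume.
churn_dirs() {
    deadline="${1:-30}"   # illustrative default, not taken from the report
    while true; do
        mkdir -p dot
        run_with_deadline "$deadline" rm -rf dot || return $?
    done
}
```

Run churn_dirs from the mount while step 2 runs in a second shell. If the stuck rm does not react to the signal that timeout sends, timeout itself may block as well, so this is only a convenience and not a reliable hang detector.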


  
Actual results: rm hung


Expected results: Graph change shouldn't cause hangs on the client


Additional info:

Client log-

[2012-03-14 15:10:43.036206] D [afr-dir-read.c:135:afr_examine_dir_readdir_cbk] 43-test2-replicate-0: <gfid:cc7894eb-c972-4bbe-b7e8-9a33c81d9139>: no entries found in test2-client-0
[2012-03-14 15:10:43.036324] D [afr-common.c:698:afr_get_call_child] 43-test2-replicate-0: Returning 0, call_child: 0, last_index: -1
[2012-03-14 15:10:43.036511] D [afr-common.c:698:afr_get_call_child] 43-test2-replicate-0: Returning 0, call_child: 0, last_index: -1
[2012-03-14 15:10:43.036663] E [fuse-bridge.c:1366:fuse_rmdir_resume] 0-glusterfs-fuse: RMDIR 1 (00000000-0000-0000-0000-000000000000/dot) resolution failed
[2012-03-14 15:10:43.038464] D [afr-common.c:129:afr_lookup_xattr_req_prepare] 43-test2-replicate-0: <gfid:00000000-0000-0000-0000-000000000000>: failed to get the gfid from dict
[2012-03-14 15:10:43.038996] D [afr-self-heal-common.c:148:afr_sh_print_pending_matrix] 43-test2-replicate-0: pending_matrix: [ 0 0 ]
[2012-03-14 15:10:43.039018] D [afr-self-heal-common.c:148:afr_sh_print_pending_matrix] 43-test2-replicate-0: pending_matrix: [ 0 0 ]
[2012-03-14 15:10:43.039033] D [afr-self-heal-common.c:753:afr_mark_sources] 43-test2-replicate-0: Number of sources: 0
[2012-03-14 15:10:43.039067] D [afr-self-heal-data.c:799:afr_lookup_select_read_child_by_txn_type] 43-test2-replicate-0: returning read_child: 1
[2012-03-14 15:10:43.039081] D [afr-common.c:1275:afr_lookup_select_read_child] 43-test2-replicate-0: Source selected as 1 for <gfid:00000000-0000-0000-0000-000000000000>
[2012-03-14 15:10:43.039100] D [afr-common.c:1082:afr_lookup_build_response_params] 43-test2-replicate-0: Building lookup response from 1
[2012-03-14 15:10:43.039187] D [afr-common.c:698:afr_get_call_child] 43-test2-replicate-0: Returning 0, call_child: 1, last_index: -1
[2012-03-14 15:10:43.041169] D [afr-common.c:129:afr_lookup_xattr_req_prepare] 43-test2-replicate-0: <gfid:00000000-0000-0000-0000-000000000000>: failed to get the gfid from dict
[2012-03-14 15:10:43.041559] D [afr-self-heal-common.c:148:afr_sh_print_pending_matrix] 43-test2-replicate-0: pending_matrix: [ 0 0 ]
[2012-03-14 15:10:43.041579] D [afr-self-heal-common.c:148:afr_sh_print_pending_matrix] 43-test2-replicate-0: pending_matrix: [ 0 0 ]
[2012-03-14 15:10:43.041594] D [afr-self-heal-common.c:753:afr_mark_sources] 43-test2-replicate-0: Number of sources: 0
[2012-03-14 15:10:43.041608] D [afr-self-heal-data.c:799:afr_lookup_select_read_child_by_txn_type] 43-test2-replicate-0: returning read_child: 0
[2012-03-14 15:10:43.041622] D [afr-common.c:1275:afr_lookup_select_read_child] 43-test2-replicate-0: Source selected as 0 for <gfid:00000000-0000-0000-0000-000000000000>
[2012-03-14 15:10:43.041639] D [afr-common.c:1082:afr_lookup_build_response_params] 43-test2-replicate-0: Building lookup response from 0
[2012-03-14 15:10:43.042136] D [afr-dir-read.c:135:afr_examine_dir_readdir_cbk] 43-test2-replicate-0: <gfid:cc7894eb-c972-4bbe-b7e8-9a33c81d9139>: no entries found in test2-client-0
[2012-03-14 15:10:43.042169] D [afr-dir-read.c:135:afr_examine_dir_readdir_cbk] 43-test2-replicate-0: <gfid:cc7894eb-c972-4bbe-b7e8-9a33c81d9139>: no entries found in test2-client-1
[2012-03-14 15:10:43.042734] D [afr-common.c:698:afr_get_call_child] 43-test2-replicate-0: Returning 0, call_child: 0, last_index: -1
[2012-03-14 15:10:43.042962] D [afr-common.c:698:afr_get_call_child] 43-test2-replicate-0: Returning 0, call_child: 0, last_index: -1
[2012-03-14 15:10:43.043120] E [fuse-bridge.c:1366:fuse_rmdir_resume] 0-glusterfs-fuse: RMDIR 1 (00000000-0000-0000-0000-000000000000/dot) resolution failed
[2012-03-14 15:10:43.044941] D [afr-common.c:129:afr_lookup_xattr_req_prepare] 43-test2-replicate-0: <gfid:00000000-0000-0000-0000-000000000000>: failed to get the gfid from dict
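For a longer client log, a per-function summary makes the error lines easier to spot among the debug output. The sketch below is an illustration only: it assumes the line layout visible in the excerpt above ([date time] LEVEL [file:line:function] xlator: message), and the function name summarize_log is hypothetical.

```shell
#!/bin/sh
# Count log messages per severity letter and source function.
# Assumed line layout, as in the excerpt above:
#   [date time] LEVEL [file:line:function] xlator: message
# Lines that do not start with a timestamp (for example wrapped
# continuation lines) are ignored.
summarize_log() {
    awk '
        /^\[[0-9-]+ [0-9:.]+\] [A-Z] \[/ {
            level = $3
            loc = substr($4, 2, length($4) - 2)   # strip the surrounding [ ]
            n = split(loc, parts, ":")            # file, line, function
            count[level " " parts[n]]++
        }
        END { for (k in count) print count[k], k }
    ' "$@" | sort -k1,1nr -k2
}
```

Called as summarize_log followed by the path of the client log, or with the log on standard input, it prints one line per severity and function pair, most frequent first.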

Comment 1 Raghavendra G 2012-03-14 11:34:37 UTC
The client statedump shows that the rmdir did not complete in afr, hence assigning the bug to replicate.

Comment 2 Pranith Kumar K 2012-03-18 16:35:40 UTC
I am guessing this is the same issue as 803209, for entrylks. There is no way to confirm this, as the statedumps of the bricks and clients are not attached to the bug. I ran the steps above with the fixes for 803209 for ~10 minutes and observed no hang, so I am closing the bug.
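For future reports of this kind, statedumps can be requested while the hang is still in progress. The sketch below is an assumption-laden illustration, not something taken from this bug: it relies on the glusterfs client process writing a statedump when it receives SIGUSR1 and on the gluster volume statedump command being available in the installed version. The function names run and request_statedumps and the DRY_RUN switch are hypothetical. Where the dump files are written depends on the version and configuration.

```shell
#!/bin/sh
# Print (default) or execute the commands that request statedumps.
# DRY_RUN=1 only echoes the commands; set DRY_RUN=0 to actually run them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1: volume name
# $2: PID of the glusterfs client (FUSE mount) process
request_statedumps() {
    volname="$1"
    client_pid="$2"
    run kill -USR1 "$client_pid"              # ask the client for a statedump
    run gluster volume statedump "$volname"   # ask the bricks for statedumps
}
```

With the default DRY_RUN=1, request_statedumps test2 followed by the client PID only prints the two commands, which allows checking them before running anything against a live volume.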

Comment 3 Anush Shetty 2012-03-19 08:32:19 UTC
Verified with 3.3.0qa29