Description of problem:
On a pure replicate setup, rmdir was hung when graph change was done with volume set in a loop.

Version-Release number of selected component (if applicable):
upstream

How reproducible:
Consistently

Steps to Reproduce:
1. while true; do mkdir -p dot; rm -rf dot; done
2. while true; do gluster volume set test2 performance.write-behind off; sleep 1; gluster volume set test2 performance.write-behind on; sleep 1; done

Actual results:
rm was hung

Expected results:
Graph change shouldn't cause hangs on the client

Additional info:
Client log:
[2012-03-14 15:10:43.036206] D [afr-dir-read.c:135:afr_examine_dir_readdir_cbk] 43-test2-replicate-0: <gfid:cc7894eb-c972-4bbe-b7e8-9a33c81d9139>: no entries found in test2-client-0
[2012-03-14 15:10:43.036324] D [afr-common.c:698:afr_get_call_child] 43-test2-replicate-0: Returning 0, call_child: 0, last_index: -1
[2012-03-14 15:10:43.036511] D [afr-common.c:698:afr_get_call_child] 43-test2-replicate-0: Returning 0, call_child: 0, last_index: -1
[2012-03-14 15:10:43.036663] E [fuse-bridge.c:1366:fuse_rmdir_resume] 0-glusterfs-fuse: RMDIR 1 (00000000-0000-0000-0000-000000000000/dot) resolution failed
[2012-03-14 15:10:43.038464] D [afr-common.c:129:afr_lookup_xattr_req_prepare] 43-test2-replicate-0: <gfid:00000000-0000-0000-0000-000000000000>: failed to get the gfid from dict
[2012-03-14 15:10:43.038996] D [afr-self-heal-common.c:148:afr_sh_print_pending_matrix] 43-test2-replicate-0: pending_matrix: [ 0 0 ]
[2012-03-14 15:10:43.039018] D [afr-self-heal-common.c:148:afr_sh_print_pending_matrix] 43-test2-replicate-0: pending_matrix: [ 0 0 ]
[2012-03-14 15:10:43.039033] D [afr-self-heal-common.c:753:afr_mark_sources] 43-test2-replicate-0: Number of sources: 0
[2012-03-14 15:10:43.039067] D [afr-self-heal-data.c:799:afr_lookup_select_read_child_by_txn_type] 43-test2-replicate-0: returning read_child: 1
[2012-03-14 15:10:43.039081] D [afr-common.c:1275:afr_lookup_select_read_child] 43-test2-replicate-0: Source selected as 1 for <gfid:00000000-0000-0000-0000-000000000000>
[2012-03-14 15:10:43.039100] D [afr-common.c:1082:afr_lookup_build_response_params] 43-test2-replicate-0: Building lookup response from 1
[2012-03-14 15:10:43.039187] D [afr-common.c:698:afr_get_call_child] 43-test2-replicate-0: Returning 0, call_child: 1, last_index: -1
[2012-03-14 15:10:43.041169] D [afr-common.c:129:afr_lookup_xattr_req_prepare] 43-test2-replicate-0: <gfid:00000000-0000-0000-0000-000000000000>: failed to get the gfid from dict
[2012-03-14 15:10:43.041559] D [afr-self-heal-common.c:148:afr_sh_print_pending_matrix] 43-test2-replicate-0: pending_matrix: [ 0 0 ]
[2012-03-14 15:10:43.041579] D [afr-self-heal-common.c:148:afr_sh_print_pending_matrix] 43-test2-replicate-0: pending_matrix: [ 0 0 ]
[2012-03-14 15:10:43.041594] D [afr-self-heal-common.c:753:afr_mark_sources] 43-test2-replicate-0: Number of sources: 0
[2012-03-14 15:10:43.041608] D [afr-self-heal-data.c:799:afr_lookup_select_read_child_by_txn_type] 43-test2-replicate-0: returning read_child: 0
[2012-03-14 15:10:43.041622] D [afr-common.c:1275:afr_lookup_select_read_child] 43-test2-replicate-0: Source selected as 0 for <gfid:00000000-0000-0000-0000-000000000000>
[2012-03-14 15:10:43.041639] D [afr-common.c:1082:afr_lookup_build_response_params] 43-test2-replicate-0: Building lookup response from 0
[2012-03-14 15:10:43.042136] D [afr-dir-read.c:135:afr_examine_dir_readdir_cbk] 43-test2-replicate-0: <gfid:cc7894eb-c972-4bbe-b7e8-9a33c81d9139>: no entries found in test2-client-0
[2012-03-14 15:10:43.042169] D [afr-dir-read.c:135:afr_examine_dir_readdir_cbk] 43-test2-replicate-0: <gfid:cc7894eb-c972-4bbe-b7e8-9a33c81d9139>: no entries found in test2-client-1
[2012-03-14 15:10:43.042734] D [afr-common.c:698:afr_get_call_child] 43-test2-replicate-0: Returning 0, call_child: 0, last_index: -1
[2012-03-14 15:10:43.042962] D [afr-common.c:698:afr_get_call_child] 43-test2-replicate-0: Returning 0, call_child: 0, last_index: -1
[2012-03-14 15:10:43.043120] E [fuse-bridge.c:1366:fuse_rmdir_resume] 0-glusterfs-fuse: RMDIR 1 (00000000-0000-0000-0000-000000000000/dot) resolution failed
[2012-03-14 15:10:43.044941] D [afr-common.c:129:afr_lookup_xattr_req_prepare] 43-test2-replicate-0: <gfid:00000000-0000-0000-0000-000000000000>: failed to get the gfid from dict
The statedump of the client shows that the rmdir did not complete in afr, hence assigning the bug to replicate.
I suspect this is the same issue as bug 803209 (entrylks). There is no way to confirm this, because the statedumps of the bricks and clients are not attached to this bug. I ran the steps above with the fixes for 803209 for ~10 minutes and observed no hang, so I am closing the bug.
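For anyone repeating this check, a time-bounded version of step 1 makes a hang show up as a failure instead of a stuck terminal. This is only a sketch, not what was actually run: the function names run_iteration and check_for_hang and the 30-second limit are made up for illustration, and it assumes the coreutils timeout command is available. The volume set loop from step 2 still has to be run separately in another shell. A process stuck in uninterruptible sleep may not respond to the signal from timeout, so the script itself could still block in that case.

```shell
#!/bin/sh
# Hypothetical helper: run one mkdir/rm iteration under a time limit.
# Exit status is 124 if the limit was exceeded (coreutils timeout convention).
run_iteration() {
    dir=$1
    limit=$2
    timeout "$limit" sh -c 'mkdir -p "$1" && rm -rf "$1"' sh "$dir"
}

# Repeat the iteration for `duration` seconds and stop at the first
# iteration that exceeds `limit` seconds. Other non-zero exit codes
# (for example a transient failure during a graph switch) are ignored.
check_for_hang() {
    dir=$1
    duration=$2
    limit=$3
    end=$(( $(date +%s) + duration ))
    while [ "$(date +%s)" -lt "$end" ]; do
        run_iteration "$dir" "$limit"
        rc=$?
        if [ "$rc" -eq 124 ]; then
            echo "possible hang: iteration exceeded ${limit}s"
            return 1
        fi
    done
    echo "no hang observed in ${duration}s"
    return 0
}

# Example (run from inside the mount point while step 2 loops elsewhere):
#   check_for_hang dot 600 30
```

With these assumed values the loop runs for about 10 minutes and flags any single mkdir/rm pass that takes longer than 30 seconds.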
Verified with 3.3.0qa29