Description of problem:
fileop on an nfs mount failed in a striped-replicated volume when two of the replicated subvolumes were taken down and brought back up.

Version-Release number of selected component (if applicable):
glusterfs-3.3.0qa31

How reproducible:
random

Steps to Reproduce:
1. Create and start a 2*2 striped-replicated volume.
2. Do a fuse mount and run fs-perf-test from it.
3. While fs-perf-test is running, take down one sub-volume of the replicate translator.
4. Start fileop from the nfs mount (fileop -f 50).
5. After some time, bring the glusterfsd back up.

Actual results:
fileop failed.

[root@QA-23 nfs]# /opt/qa/tools/fileop -f 50
Fileop: Working in ., File size is 1, Output is in Ops/sec. (A=Avg, B=Best, W=Worst)
. mkdir chdir rmdir create open read write close stat access chmod readdir link unlink delete Total_files
Mkdir failed

Expected results:
fileop should succeed.

Additional info:
Entries from the nfs log

[2012-03-26 06:13:13.732442] I [client-handshake.c:1430:client_setvolume_cbk] 0-hosdu-client-0: Connected to 172.17.251.63:24009, attached to remote volume '/data/bricks/hosdu_brick1'.
[2012-03-26 06:13:13.732472] I [client-handshake.c:1442:client_setvolume_cbk] 0-hosdu-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2012-03-26 06:13:13.733019] I [afr-common.c:3510:afr_notify] 0-hosdu-replicate-0: Subvolume 'hosdu-client-0' came back up; going online.
[2012-03-26 06:13:13.733293] I [client-handshake.c:456:client_set_lk_version_cbk] 0-hosdu-client-0: Server lk version = 1
[2012-03-26 06:13:13.743912] W [client.c:2028:client_rpc_notify] 0-hosdu-client-2: Cancelling the grace timer
[2012-03-26 06:13:13.747209] I [client-handshake.c:1633:select_server_supported_programs] 0-hosdu-client-2: Using Program GlusterFS 3.3.0qa31, Num (1298437), Version (330)
[2012-03-26 06:13:13.747537] I [client-handshake.c:1430:client_setvolume_cbk] 0-hosdu-client-2: Connected to 172.17.251.65:24009, attached to remote volume '/data/bricks/hosdu_brick3'.
[2012-03-26 06:13:13.747595] I [client-handshake.c:1442:client_setvolume_cbk] 0-hosdu-client-2: Server and Client lk-version numbers are not same, reopening the fds
[2012-03-26 06:13:13.747708] I [afr-common.c:3510:afr_notify] 0-hosdu-replicate-1: Subvolume 'hosdu-client-2' came back up; going online.
[2012-03-26 06:13:13.748936] I [client-handshake.c:456:client_set_lk_version_cbk] 0-hosdu-client-2: Server lk version = 1
[2012-03-26 06:13:13.757990] W [client.c:2028:client_rpc_notify] 0-hosdu-client-1: Cancelling the grace timer
[2012-03-26 06:13:13.760016] I [client-handshake.c:1633:select_server_supported_programs] 0-hosdu-client-1: Using Program GlusterFS 3.3.0qa31, Num (1298437), Version (330)
[2012-03-26 06:13:13.760336] I [client-handshake.c:1430:client_setvolume_cbk] 0-hosdu-client-1: Connected to 172.17.251.66:24009, attached to remote volume '/data/bricks/hosdu_brick2'.
[2012-03-26 06:13:13.760367] I [client-handshake.c:1442:client_setvolume_cbk] 0-hosdu-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2012-03-26 06:13:13.760814] I [client-handshake.c:456:client_set_lk_version_cbk] 0-hosdu-client-1: Server lk version = 1
[2012-03-26 06:13:13.767516] W [client.c:2028:client_rpc_notify] 0-hosdu-client-3: Cancelling the grace timer
[2012-03-26 06:13:13.767811] I [client-handshake.c:1633:select_server_supported_programs] 0-hosdu-client-3: Using Program GlusterFS 3.3.0qa31, Num (1298437), Version (330)
[2012-03-26 06:13:13.768361] I [client-handshake.c:1430:client_setvolume_cbk] 0-hosdu-client-3: Connected to 172.17.251.64:24009, attached to remote volume '/data/bricks/hosdu_brick4'.
[2012-03-26 06:13:13.768381] I [client-handshake.c:1442:client_setvolume_cbk] 0-hosdu-client-3: Server and Client lk-version numbers are not same, reopening the fds
[2012-03-26 06:13:13.768883] I [client-handshake.c:456:client_set_lk_version_cbk] 0-hosdu-client-3: Server lk version = 1
[2012-03-26 06:13:13.768955] I [afr-common.c:1860:afr_set_root_inode_on_first_lookup] 0-hosdu-replicate-0: added root inode
[2012-03-26 06:13:13.769224] I [afr-common.c:1860:afr_set_root_inode_on_first_lookup] 0-hosdu-replicate-1: added root inode
[2012-03-26 06:13:14.809569] I [afr-common.c:1198:afr_detect_self_heal_by_lookup_status] 0-hosdu-replicate-1: entries are missing in lookup of <gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>.
[2012-03-26 06:13:14.809639] I [afr-common.c:1323:afr_launch_self_heal] 0-hosdu-replicate-1: background meta-data data entry missing-entry gfid self-heal triggered. path: <gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>, reason: lookup detected pending operations
[2012-03-26 06:13:14.809855] I [afr-common.c:1198:afr_detect_self_heal_by_lookup_status] 0-hosdu-replicate-0: entries are missing in lookup of <gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>.
[2012-03-26 06:13:14.809886] I [afr-common.c:1323:afr_launch_self_heal] 0-hosdu-replicate-0: background meta-data data entry missing-entry gfid self-heal triggered. path: <gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>, reason: lookup detected pending operations
[2012-03-26 06:13:14.812818] W [client3_1-fops.c:1224:client3_1_inodelk_cbk] 0-hosdu-client-1: remote operation failed: No such file or directory
[2012-03-26 06:13:14.813006] E [afr-self-heal-metadata.c:547:afr_sh_metadata_post_nonblocking_inodelk_cbk] 0-hosdu-replicate-0: Non Blocking metadata inodelks failed for <gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>.
[2012-03-26 06:13:14.813027] E [afr-self-heal-metadata.c:549:afr_sh_metadata_post_nonblocking_inodelk_cbk] 0-hosdu-replicate-0: Metadata self-heal failed for <gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>.
[2012-03-26 06:13:14.813536] W [client3_1-fops.c:1301:client3_1_entrylk_cbk] 0-hosdu-client-1: remote operation failed: No such file or directory
[2012-03-26 06:13:14.813661] E [afr-self-heal-entry.c:2375:afr_sh_post_nonblocking_entry_cbk] 0-hosdu-replicate-0: Non Blocking entrylks failed for <gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>.
[2012-03-26 06:13:14.813685] E [afr-self-heal-common.c:2034:afr_self_heal_completion_cbk] 0-hosdu-replicate-0: background meta-data data entry self-heal failed on <gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>
[2012-03-26 06:13:14.872288] W [client3_1-fops.c:1224:client3_1_inodelk_cbk] 0-hosdu-client-2: remote operation failed: No such file or directory
[2012-03-26 06:13:14.945495] E [afr-self-heal-metadata.c:547:afr_sh_metadata_post_nonblocking_inodelk_cbk] 0-hosdu-replicate-1: Non Blocking metadata inodelks failed for <gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>.
[2012-03-26 06:13:14.945523] E [afr-self-heal-metadata.c:549:afr_sh_metadata_post_nonblocking_inodelk_cbk] 0-hosdu-replicate-1: Metadata self-heal failed for <gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>.
[2012-03-26 06:13:15.055726] W [client3_1-fops.c:1301:client3_1_entrylk_cbk] 0-hosdu-client-2: remote operation failed: No such file or directory
[2012-03-26 06:13:15.359104] E [afr-self-heal-entry.c:2375:afr_sh_post_nonblocking_entry_cbk] 0-hosdu-replicate-1: Non Blocking entrylks failed for <gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>.
[2012-03-26 06:13:15.359136] E [afr-self-heal-common.c:2034:afr_self_heal_completion_cbk] 0-hosdu-replicate-1: background meta-data data entry self-heal failed on <gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>
[2012-03-26 06:13:15.360047] I [afr-common.c:1198:afr_detect_self_heal_by_lookup_status] 0-hosdu-replicate-0: entries are missing in lookup of <gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>/fileop_dir_29_1_42.
[2012-03-26 06:13:15.360071] I [afr-common.c:1323:afr_launch_self_heal] 0-hosdu-replicate-0: background meta-data data entry missing-entry gfid self-heal triggered. path: <gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>/fileop_dir_29_1_42, reason: lookup detected pending operations
[2012-03-26 06:13:15.361847] W [client3_1-fops.c:1301:client3_1_entrylk_cbk] 0-hosdu-client-1: remote operation failed: No such file or directory
[2012-03-26 06:13:15.361992] I [afr-self-heal-common.c:1821:afr_sh_post_nb_entrylk_conflicting_sh_cbk] 0-hosdu-replicate-0: Non blocking entrylks failed.
[2012-03-26 06:13:15.362013] I [afr-self-heal-common.c:917:afr_sh_missing_entries_done] 0-hosdu-replicate-0: split brain found, aborting selfheal of <gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>/fileop_dir_29_1_42
[2012-03-26 06:13:15.362025] E [afr-self-heal-common.c:2034:afr_self_heal_completion_cbk] 0-hosdu-replicate-0: background meta-data data entry missing-entry gfid self-heal failed on <gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>/fileop_dir_29_1_42
[2012-03-26 06:13:15.363657] W [client3_1-fops.c:1301:client3_1_entrylk_cbk] 0-hosdu-client-1: remote operation failed: No such file or directory
[2012-03-26 06:13:15.364276] W [client3_1-fops.c:1301:client3_1_entrylk_cbk] 0-hosdu-client-1: remote operation failed: No such file or directory
[2012-03-26 06:13:15.364446] W [client3_1-fops.c:302:client3_1_mkdir_cbk] 0-hosdu-client-0: remote operation failed: File exists. Path: <gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>/fileop_dir_29_1_42
[2012-03-26 06:13:15.364480] W [nfs3.c:2728:nfs3svc_mkdir_cbk] 0-nfs: 283207ae: <gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>/fileop_dir_29_1_42 => -1 (File exists)

Entries from other nfs log.

[2012-03-26 06:13:13.513305] W [client.c:2028:client_rpc_notify] 0-hosdu-client-2: Cancelling the grace timer
[2012-03-26 06:13:13.513560] I [client-handshake.c:1633:select_server_supported_programs] 0-hosdu-client-2: Using Program GlusterFS 3.3.0qa31, Num (1298437), Version (330)
[2012-03-26 06:13:13.513876] I [client-handshake.c:1430:client_setvolume_cbk] 0-hosdu-client-2: Connected to 172.17.251.65:24009, attached to remote volume '/data/bricks/hosdu_brick3'.
[2012-03-26 06:13:13.513894] I [client-handshake.c:1442:client_setvolume_cbk] 0-hosdu-client-2: Server and Client lk-version numbers are not same, reopening the fds
[2012-03-26 06:13:13.514144] I [client-handshake.c:456:client_set_lk_version_cbk] 0-hosdu-client-2: Server lk version = 1
[2012-03-26 06:13:13.516278] W [client.c:2028:client_rpc_notify] 0-hosdu-client-0: Cancelling the grace timer
[2012-03-26 06:13:13.516668] I [client-handshake.c:1633:select_server_supported_programs] 0-hosdu-client-0: Using Program GlusterFS 3.3.0qa31, Num (1298437), Version (330)
[2012-03-26 06:13:13.518409] I [client-handshake.c:1430:client_setvolume_cbk] 0-hosdu-client-0: Connected to 172.17.251.63:24009, attached to remote volume '/data/bricks/hosdu_brick1'.
[2012-03-26 06:13:13.518432] I [client-handshake.c:1442:client_setvolume_cbk] 0-hosdu-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2012-03-26 06:13:13.519015] I [afr-common.c:1860:afr_set_root_inode_on_first_lookup] 0-hosdu-replicate-0: added root inode
[2012-03-26 06:13:13.519054] I [afr-common.c:1860:afr_set_root_inode_on_first_lookup] 0-hosdu-replicate-1: added root inode
[2012-03-26 06:13:13.519109] I [afr-common.c:1323:afr_launch_self_heal] 0-hosdu-replicate-1: background entry self-heal triggered. path: /, reason: lookup detected pending operations
[2012-03-26 06:13:13.519771] I [client-handshake.c:456:client_set_lk_version_cbk] 0-hosdu-client-0: Server lk version = 1
[2012-03-26 06:13:15.166140] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_0 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.182348] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_1 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.182465] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_2 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.182631] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_3 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.182789] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_4 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.182967] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_5 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.183115] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_6 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.183291] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_7 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.183448] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_8 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.183604] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_9 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.183762] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_10 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.183963] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_11 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.184034] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_12 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.184182] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_13 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.184368] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_14 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.188309] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_15 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.190265] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_16 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.192289] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_17 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.194287] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_18 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.194448] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_19 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.194601] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_20 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.194759] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_21 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.196266] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_22 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.198270] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_23 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.198411] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_24 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.198599] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_25 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.198735] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_26 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.200367] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_27 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.200437] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_28 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.212542] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_29 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.399903] I [afr-self-heal-common.c:2037:afr_self_heal_completion_cbk] 0-hosdu-replicate-1: background entry self-heal completed on /

I'm archiving all the logs.
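For anyone triaging logs like the ones above, here is a minimal Python sketch that splits a pasted log back into individual entries and counts them per translator and function. It is not part of GlusterFS: `parse_entries`, `count_by`, and `sample` are hypothetical names, and the regular expression only assumes the entry header layout visible in the logs above (timestamp, one-letter level, `[file:line:function]`, translator name). Any free text pasted after an entry ends up in that entry's message.

```python
import re
from collections import Counter

# Header of one log entry as it appears in the logs above, e.g.
# [2012-03-26 06:13:15.364480] W [nfs3.c:2728:nfs3svc_mkdir_cbk] 0-nfs:
# This layout is inferred from the pasted logs, not from a format spec.
ENTRY = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"\[(?P<file>[^:\]\s]+):(?P<line>\d+):(?P<func>[^\]\s]+)\]\s+"
    r"(?P<xlator>[^\s:]+):"
)


def parse_entries(text):
    """Split pasted log text into entries, even if several share one line."""
    matches = list(ENTRY.finditer(text))
    entries = []
    for i, m in enumerate(matches):
        # The message runs until the next entry header (or end of text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        entry = m.groupdict()
        # Collapse whitespace so messages wrapped across lines read as one.
        entry["msg"] = " ".join(text[m.end():end].split())
        entries.append(entry)
    return entries


def count_by(entries, level):
    """Count entries of one level (I, W, E) per (translator, function)."""
    return Counter(
        (e["xlator"], e["func"]) for e in entries if e["level"] == level
    )


# Two entries copied from the nfs log above, run together as in the paste.
sample = (
    "[2012-03-26 06:13:15.364446] W [client3_1-fops.c:302:client3_1_mkdir_cbk] "
    "0-hosdu-client-0: remote operation failed: File exists. Path: "
    "<gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>/fileop_dir_29_1_42 "
    "[2012-03-26 06:13:15.364480] W [nfs3.c:2728:nfs3svc_mkdir_cbk] 0-nfs: "
    "283207ae: <gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>/fileop_dir_29_1_42 "
    "=> -1 (File exists)"
)

if __name__ == "__main__":
    for key, n in count_by(parse_entries(sample), "W").most_common():
        print(n, key)
```

Running `count_by(parse_entries(log_text), "E")` over a full nfs log would group the self-heal errors by replicate subvolume and callback, which makes it easier to see which subvolume the failures cluster on.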
Checked on the release-3.3 branch at 281c79c. Could not reproduce this: fileop succeeds when following the steps given. @MS, can you confirm?
Closing this as it is no longer reproducible. Please feel free to re-open if observed again.