Bug 854637 - [glusterfs-3.3.0qa31] - fileop failed in striped-replicated volume
Summary: [glusterfs-3.3.0qa31] - fileop failed in striped-replicated volume
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterfs
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Amar Tumballi
QA Contact: M S Vishwanath Bhat
URL:
Whiteboard:
Depends On: 806851
Blocks:
Reported: 2012-09-05 13:35 UTC by Vidya Sakar
Modified: 2016-06-01 01:56 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 806851
Environment:
Last Closed: 2013-10-03 09:51:07 UTC
Embargoed:



Description Vidya Sakar 2012-09-05 13:35:56 UTC
+++ This bug was initially created as a clone of Bug #806851 +++

Description of problem:
fileop on an NFS mount failed on a striped-replicated volume when two of the replicate subvolumes were taken down and brought back up.

Version-Release number of selected component (if applicable):
glusterfs-3.3.0qa31

How reproducible:
random

Steps to Reproduce:
1. Create and start a 2x2 striped-replicated volume.
2. Do a fuse mount and run fs-perf-test from it.
3. While fs-perf-test is running, take down one subvolume of the replicate translator.
4. Start fileop from the NFS mount (fileop -f 50).
5. After some time, bring the glusterfsd back up.
  
Actual results:
fileop failed.

[root@QA-23 nfs]# /opt/qa/tools/fileop -f 50

Fileop:  Working in ., File size is 1,  Output is in Ops/sec. (A=Avg, B=Best, W=Worst)
 .       mkdir   chdir   rmdir  create    open    read   write   close    stat  access   chmod readdir  link    unlink  delete  Total_files
Mkdir failed


Expected results:
fileop should succeed. 

Additional info:

Entries from the nfs log


[2012-03-26 06:13:13.732442] I [client-handshake.c:1430:client_setvolume_cbk] 0-hosdu-client-0: Connected to 172.17.251.63:24009, attached to remote volume '/data/bricks/hosdu_brick1'.
[2012-03-26 06:13:13.732472] I [client-handshake.c:1442:client_setvolume_cbk] 0-hosdu-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2012-03-26 06:13:13.733019] I [afr-common.c:3510:afr_notify] 0-hosdu-replicate-0: Subvolume 'hosdu-client-0' came back up; going online.
[2012-03-26 06:13:13.733293] I [client-handshake.c:456:client_set_lk_version_cbk] 0-hosdu-client-0: Server lk version = 1
[2012-03-26 06:13:13.743912] W [client.c:2028:client_rpc_notify] 0-hosdu-client-2: Cancelling the grace timer
[2012-03-26 06:13:13.747209] I [client-handshake.c:1633:select_server_supported_programs] 0-hosdu-client-2: Using Program GlusterFS 3.3.0qa31, Num (1298437), Version (330)
[2012-03-26 06:13:13.747537] I [client-handshake.c:1430:client_setvolume_cbk] 0-hosdu-client-2: Connected to 172.17.251.65:24009, attached to remote volume '/data/bricks/hosdu_brick3'.
[2012-03-26 06:13:13.747595] I [client-handshake.c:1442:client_setvolume_cbk] 0-hosdu-client-2: Server and Client lk-version numbers are not same, reopening the fds
[2012-03-26 06:13:13.747708] I [afr-common.c:3510:afr_notify] 0-hosdu-replicate-1: Subvolume 'hosdu-client-2' came back up; going online.
[2012-03-26 06:13:13.748936] I [client-handshake.c:456:client_set_lk_version_cbk] 0-hosdu-client-2: Server lk version = 1
[2012-03-26 06:13:13.757990] W [client.c:2028:client_rpc_notify] 0-hosdu-client-1: Cancelling the grace timer
[2012-03-26 06:13:13.760016] I [client-handshake.c:1633:select_server_supported_programs] 0-hosdu-client-1: Using Program GlusterFS 3.3.0qa31, Num (1298437), Version (330)
[2012-03-26 06:13:13.760336] I [client-handshake.c:1430:client_setvolume_cbk] 0-hosdu-client-1: Connected to 172.17.251.66:24009, attached to remote volume '/data/bricks/hosdu_brick2'.
[2012-03-26 06:13:13.760367] I [client-handshake.c:1442:client_setvolume_cbk] 0-hosdu-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2012-03-26 06:13:13.760814] I [client-handshake.c:456:client_set_lk_version_cbk] 0-hosdu-client-1: Server lk version = 1
[2012-03-26 06:13:13.767516] W [client.c:2028:client_rpc_notify] 0-hosdu-client-3: Cancelling the grace timer
[2012-03-26 06:13:13.767811] I [client-handshake.c:1633:select_server_supported_programs] 0-hosdu-client-3: Using Program GlusterFS 3.3.0qa31, Num (1298437), Version (330)
[2012-03-26 06:13:13.768361] I [client-handshake.c:1430:client_setvolume_cbk] 0-hosdu-client-3: Connected to 172.17.251.64:24009, attached to remote volume '/data/bricks/hosdu_brick4'.
[2012-03-26 06:13:13.768381] I [client-handshake.c:1442:client_setvolume_cbk] 0-hosdu-client-3: Server and Client lk-version numbers are not same, reopening the fds
[2012-03-26 06:13:13.768883] I [client-handshake.c:456:client_set_lk_version_cbk] 0-hosdu-client-3: Server lk version = 1
[2012-03-26 06:13:13.768955] I [afr-common.c:1860:afr_set_root_inode_on_first_lookup] 0-hosdu-replicate-0: added root inode
[2012-03-26 06:13:13.769224] I [afr-common.c:1860:afr_set_root_inode_on_first_lookup] 0-hosdu-replicate-1: added root inode
[2012-03-26 06:13:14.809569] I [afr-common.c:1198:afr_detect_self_heal_by_lookup_status] 0-hosdu-replicate-1: entries are missing in lookup of <gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>.
[2012-03-26 06:13:14.809639] I [afr-common.c:1323:afr_launch_self_heal] 0-hosdu-replicate-1: background  meta-data data entry missing-entry gfid self-heal triggered. path: <gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>, reason: lookup detected pending operations
[2012-03-26 06:13:14.809855] I [afr-common.c:1198:afr_detect_self_heal_by_lookup_status] 0-hosdu-replicate-0: entries are missing in lookup of <gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>.
[2012-03-26 06:13:14.809886] I [afr-common.c:1323:afr_launch_self_heal] 0-hosdu-replicate-0: background  meta-data data entry missing-entry gfid self-heal triggered. path: <gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>, reason: lookup detected pending operations
[2012-03-26 06:13:14.812818] W [client3_1-fops.c:1224:client3_1_inodelk_cbk] 0-hosdu-client-1: remote operation failed: No such file or directory
[2012-03-26 06:13:14.813006] E [afr-self-heal-metadata.c:547:afr_sh_metadata_post_nonblocking_inodelk_cbk] 0-hosdu-replicate-0: Non Blocking metadata inodelks failed for <gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>.
[2012-03-26 06:13:14.813027] E [afr-self-heal-metadata.c:549:afr_sh_metadata_post_nonblocking_inodelk_cbk] 0-hosdu-replicate-0: Metadata self-heal failed for <gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>.
[2012-03-26 06:13:14.813536] W [client3_1-fops.c:1301:client3_1_entrylk_cbk] 0-hosdu-client-1: remote operation failed: No such file or directory
[2012-03-26 06:13:14.813661] E [afr-self-heal-entry.c:2375:afr_sh_post_nonblocking_entry_cbk] 0-hosdu-replicate-0: Non Blocking entrylks failed for <gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>.
[2012-03-26 06:13:14.813685] E [afr-self-heal-common.c:2034:afr_self_heal_completion_cbk] 0-hosdu-replicate-0: background  meta-data data entry self-heal failed on <gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>
[2012-03-26 06:13:14.872288] W [client3_1-fops.c:1224:client3_1_inodelk_cbk] 0-hosdu-client-2: remote operation failed: No such file or directory
[2012-03-26 06:13:14.945495] E [afr-self-heal-metadata.c:547:afr_sh_metadata_post_nonblocking_inodelk_cbk] 0-hosdu-replicate-1: Non Blocking metadata inodelks failed for <gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>.
[2012-03-26 06:13:14.945523] E [afr-self-heal-metadata.c:549:afr_sh_metadata_post_nonblocking_inodelk_cbk] 0-hosdu-replicate-1: Metadata self-heal failed for <gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>.
[2012-03-26 06:13:15.055726] W [client3_1-fops.c:1301:client3_1_entrylk_cbk] 0-hosdu-client-2: remote operation failed: No such file or directory
[2012-03-26 06:13:15.359104] E [afr-self-heal-entry.c:2375:afr_sh_post_nonblocking_entry_cbk] 0-hosdu-replicate-1: Non Blocking entrylks failed for <gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>.
[2012-03-26 06:13:15.359136] E [afr-self-heal-common.c:2034:afr_self_heal_completion_cbk] 0-hosdu-replicate-1: background  meta-data data entry self-heal failed on <gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>
[2012-03-26 06:13:15.360047] I [afr-common.c:1198:afr_detect_self_heal_by_lookup_status] 0-hosdu-replicate-0: entries are missing in lookup of <gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>/fileop_dir_29_1_42.
[2012-03-26 06:13:15.360071] I [afr-common.c:1323:afr_launch_self_heal] 0-hosdu-replicate-0: background  meta-data data entry missing-entry gfid self-heal triggered. path: <gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>/fileop_dir_29_1_42, reason: lookup detected pending operations
[2012-03-26 06:13:15.361847] W [client3_1-fops.c:1301:client3_1_entrylk_cbk] 0-hosdu-client-1: remote operation failed: No such file or directory
[2012-03-26 06:13:15.361992] I [afr-self-heal-common.c:1821:afr_sh_post_nb_entrylk_conflicting_sh_cbk] 0-hosdu-replicate-0: Non blocking entrylks failed.
[2012-03-26 06:13:15.362013] I [afr-self-heal-common.c:917:afr_sh_missing_entries_done] 0-hosdu-replicate-0: split brain found, aborting selfheal of <gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>/fileop_dir_29_1_42
[2012-03-26 06:13:15.362025] E [afr-self-heal-common.c:2034:afr_self_heal_completion_cbk] 0-hosdu-replicate-0: background  meta-data data entry missing-entry gfid self-heal failed on <gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>/fileop_dir_29_1_42

[2012-03-26 06:13:15.363657] W [client3_1-fops.c:1301:client3_1_entrylk_cbk] 0-hosdu-client-1: remote operation failed: No such file or directory
[2012-03-26 06:13:15.364276] W [client3_1-fops.c:1301:client3_1_entrylk_cbk] 0-hosdu-client-1: remote operation failed: No such file or directory
[2012-03-26 06:13:15.364446] W [client3_1-fops.c:302:client3_1_mkdir_cbk] 0-hosdu-client-0: remote operation failed: File exists. Path: <gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>/fileop_dir_29_1_42
[2012-03-26 06:13:15.364480] W [nfs3.c:2728:nfs3svc_mkdir_cbk] 0-nfs: 283207ae: <gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>/fileop_dir_29_1_42 => -1 (File exists)
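
The log entries above follow a regular shape: timestamp, one-letter severity, a [file:line:function] tuple, the translator name, and the message. As a rough triage aid (not part of the original report), the following Python sketch tallies lines by severity and counts error lines per translator. The regular expression, the function names, and the SAMPLE list are assumptions made for illustration; the sample lines themselves are copied from the excerpt above.

```python
import re
from collections import Counter

# Assumed line shape, inferred from the excerpt above:
# [timestamp] LEVEL [file:line:function] translator: message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<xlator>\S+):\s+(?P<msg>.*)$"
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LOG_LINE.match(line.strip())
    return m.groupdict() if m else None

def summarize(lines):
    """Count lines per severity and error (E) lines per translator."""
    by_level = Counter()
    errors_by_xlator = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue  # skip blank or wrapped lines
        by_level[rec["level"]] += 1
        if rec["level"] == "E":
            errors_by_xlator[rec["xlator"]] += 1
    return by_level, errors_by_xlator

# Three lines copied from the nfs log in this report.
SAMPLE = [
    "[2012-03-26 06:13:14.812818] W [client3_1-fops.c:1224:client3_1_inodelk_cbk] "
    "0-hosdu-client-1: remote operation failed: No such file or directory",
    "[2012-03-26 06:13:14.813027] E [afr-self-heal-metadata.c:549:"
    "afr_sh_metadata_post_nonblocking_inodelk_cbk] 0-hosdu-replicate-0: "
    "Metadata self-heal failed for <gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>.",
    "[2012-03-26 06:13:15.364480] W [nfs3.c:2728:nfs3svc_mkdir_cbk] 0-nfs: 283207ae: "
    "<gfid:e2375d37-eeeb-4f7e-8be8-69c1d29f0689>/fileop_dir_29_1_42 => -1 (File exists)",
]

if __name__ == "__main__":
    levels, errors = summarize(SAMPLE)
    print(dict(levels))
    print(dict(errors))
```

Run against the full log, a summary like this would make it easier to see that the E lines cluster in the replicate translators shortly before the NFS mkdir returns "File exists".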


Entries from the other nfs log.


[2012-03-26 06:13:13.513305] W [client.c:2028:client_rpc_notify] 0-hosdu-client-2: Cancelling the grace timer
[2012-03-26 06:13:13.513560] I [client-handshake.c:1633:select_server_supported_programs] 0-hosdu-client-2: Using Program GlusterFS 3.3.0qa31, Num (1298437), Version (330)
[2012-03-26 06:13:13.513876] I [client-handshake.c:1430:client_setvolume_cbk] 0-hosdu-client-2: Connected to 172.17.251.65:24009, attached to remote volume '/data/bricks/hosdu_brick3'.
[2012-03-26 06:13:13.513894] I [client-handshake.c:1442:client_setvolume_cbk] 0-hosdu-client-2: Server and Client lk-version numbers are not same, reopening the fds
[2012-03-26 06:13:13.514144] I [client-handshake.c:456:client_set_lk_version_cbk] 0-hosdu-client-2: Server lk version = 1
[2012-03-26 06:13:13.516278] W [client.c:2028:client_rpc_notify] 0-hosdu-client-0: Cancelling the grace timer
[2012-03-26 06:13:13.516668] I [client-handshake.c:1633:select_server_supported_programs] 0-hosdu-client-0: Using Program GlusterFS 3.3.0qa31, Num (1298437), Version (330)
[2012-03-26 06:13:13.518409] I [client-handshake.c:1430:client_setvolume_cbk] 0-hosdu-client-0: Connected to 172.17.251.63:24009, attached to remote volume '/data/bricks/hosdu_brick1'.
[2012-03-26 06:13:13.518432] I [client-handshake.c:1442:client_setvolume_cbk] 0-hosdu-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2012-03-26 06:13:13.519015] I [afr-common.c:1860:afr_set_root_inode_on_first_lookup] 0-hosdu-replicate-0: added root inode
[2012-03-26 06:13:13.519054] I [afr-common.c:1860:afr_set_root_inode_on_first_lookup] 0-hosdu-replicate-1: added root inode
[2012-03-26 06:13:13.519109] I [afr-common.c:1323:afr_launch_self_heal] 0-hosdu-replicate-1: background  entry self-heal triggered. path: /, reason: lookup detected pending operations
[2012-03-26 06:13:13.519771] I [client-handshake.c:456:client_set_lk_version_cbk] 0-hosdu-client-0: Server lk version = 1
[2012-03-26 06:13:15.166140] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_0 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.182348] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_1 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.182465] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_2 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.182631] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_3 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.182789] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_4 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.182967] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_5 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.183115] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_6 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.183291] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_7 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.183448] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_8 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.183604] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_9 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.183762] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_10 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.183963] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_11 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.184034] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_12 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.184182] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_13 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.184368] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_14 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.188309] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_15 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.190265] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_16 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.192289] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_17 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.194287] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_18 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.194448] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_19 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.194601] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_20 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.194759] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_21 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.196266] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_22 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.198270] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_23 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.198411] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_24 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.198599] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_25 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.198735] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_26 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.200367] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_27 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.200437] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_28 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.212542] E [afr-self-heal-common.c:1007:afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_29 on subvolume hosdu-client-2 => -1 (No such file or directory)
[2012-03-26 06:13:15.399903] I [afr-self-heal-common.c:2037:afr_self_heal_completion_cbk] 0-hosdu-replicate-1: background  entry self-heal completed on /
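
The second log repeats the same lookup failure for /fileop_L1_0 through /fileop_L1_29 on hosdu-client-2. A small Python sketch like the one below could collapse such runs into a per-subvolume list of missing paths. The pattern and the function name are assumptions for illustration, based only on the message text shown above.

```python
import re
from collections import defaultdict

# Assumed message shape, taken from the afr_sh_common_lookup_resp_handler lines above.
LOOKUP_FAIL = re.compile(
    r"path (?P<path>\S+) on subvolume (?P<subvol>\S+) => -1 \((?P<err>[^)]+)\)"
)

def missing_by_subvolume(lines):
    """Group paths that failed lookup with ENOENT by the subvolume that reported them."""
    missing = defaultdict(list)
    for line in lines:
        m = LOOKUP_FAIL.search(line)
        if m and m.group("err") == "No such file or directory":
            missing[m.group("subvol")].append(m.group("path"))
    return dict(missing)

# Two lines copied from the log above, plus one unrelated line that should be ignored.
SAMPLE = [
    "[2012-03-26 06:13:15.166140] E [afr-self-heal-common.c:1007:"
    "afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_0 "
    "on subvolume hosdu-client-2 => -1 (No such file or directory)",
    "[2012-03-26 06:13:15.182348] E [afr-self-heal-common.c:1007:"
    "afr_sh_common_lookup_resp_handler] 0-hosdu-replicate-1: path /fileop_L1_1 "
    "on subvolume hosdu-client-2 => -1 (No such file or directory)",
    "[2012-03-26 06:13:15.399903] I [afr-self-heal-common.c:2037:"
    "afr_self_heal_completion_cbk] 0-hosdu-replicate-1: background  entry "
    "self-heal completed on /",
]

if __name__ == "__main__":
    for subvol, paths in missing_by_subvolume(SAMPLE).items():
        print(subvol, len(paths), paths[0], "..", paths[-1])
```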


I'm archiving all the logs.

--- Additional comment from kaushal on 2012-06-05 03:05:04 EDT ---

Checked on the release-3.3 branch at 281c79c.
Couldn't reproduce this; fileop succeeds when following the steps given.

@MS can you confirm?

Comment 2 M S Vishwanath Bhat 2013-01-17 11:34:52 UTC
Verification of this bug is blocked by https://bugzilla.redhat.com/show_bug.cgi?id=896462. Issue 896462 is different from this one, so I opened a new bug for it.

Comment 5 Sudhir D 2013-07-30 05:53:44 UTC
Removing 2.1, as stripe is not slated for this release.

Comment 6 Vivek Agarwal 2013-10-03 09:51:07 UTC
Per discussion with Sudhir/Amar, closing it upstream.

