Description of problem: disperse volume, after add-brick success, postmark create file failed by NFS service. Version-Release number of selected component (if applicable): How reproducible: Steps to Reproduce: 1.Create disperse volume: Volume Name: test Type: Disperse Volume ID: 5a7292e6-b56d-438a-a544-b7a21512c9fc Status: Started Number of Bricks: 1 x (2 + 1) = 3 Transport-type: tcp Bricks: Brick1: node-1:/disk1 Brick2: node-2:/disk1 Brick3: node-3:/disk1 Options Reconfigured: performance.readdir-ahead: on 2.then mount -t nfs IP:/test /mnt/, run postmark. postmark'conf: set size 10240000 10240000 set number 200 set location /mnt/test set subdirectories 20 set read 65536 set write 65536 set transactions 1 run quit 3.gluster vol add-brick test node-{1..3}:/disk2 node-{1..3}:/disk3 force; sleep 1; gluster vol rebalance test fix-layout start; Actual results: Some file create failed.lisk this: PostMark v1.51 : 8/14/01 Reading configuration from file 'conf' Creating subdirectories...Done Creating files...Error: cannot open '/mnt/test/s1/53' for writing Expected results: Create and write all are OK. Additional info:
72: subvolumes test 73: end-volume 74: +------------------------------------------------------------------------------+ [2016-06-18 03:22:01.264991] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-test-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2016-06-18 03:22:01.265524] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-test-client-0: Connected to test-client-0, attached to remote volume '/disk1'. [2016-06-18 03:22:01.265550] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-test-client-0: Server and Client lk-version numbers are not same, reopening the fds [2016-06-18 03:22:01.265813] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-test-client-0: Server lk version = 1 [2016-06-18 03:22:01.444191] I [rpc-clnt.c:1868:rpc_clnt_reconfig] 0-test-client-1: changing port to 49386 (from 0) [2016-06-18 03:22:01.456021] I [rpc-clnt.c:1868:rpc_clnt_reconfig] 0-test-client-2: changing port to 49383 (from 0) [2016-06-18 03:22:01.462156] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-test-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2016-06-18 03:22:01.479843] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-test-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2016-06-18 03:22:01.479897] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-test-client-1: Connected to test-client-1, attached to remote volume '/disk1'. [2016-06-18 03:22:01.479934] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-test-client-1: Server and Client lk-version numbers are not same, reopening the fds [2016-06-18 03:22:01.488189] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-test-client-1: Server lk version = 1 [2016-06-18 03:22:01.491425] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-test-client-2: Connected to test-client-2, attached to remote volume '/disk1'. [2016-06-18 03:22:01.491456] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-test-client-2: Server and Client lk-version numbers are not same, reopening the fds [2016-06-18 03:22:01.491541] I [MSGID: 122061] [ec.c:313:ec_up] 0-test-disperse-0: Going UP [2016-06-18 03:22:01.498240] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-test-client-2: Server lk version = 1 [2016-06-18 03:22:01.505906] I [MSGID: 109063] [dht-layout.c:718:dht_layout_normalize] 0-test-dht: Found anomalies in / (gfid = 00000000-0000-0000-0000-000000000001). Holes=1 overlaps=0 [2016-06-18 03:22:01.515154] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of / with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], [2016-06-18 03:22:04.504321] W [rpcsvc.c:278:rpcsvc_program_actor] 0-rpc-service: RPC program version not available (req 100003 4) for 10.10.21.50:933 [2016-06-18 03:22:04.504364] E [rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully [2016-06-18 03:22:04.573785] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], [2016-06-18 03:22:09.091360] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s0 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], [2016-06-18 03:22:09.101556] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s1 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], [2016-06-18 03:22:09.109635] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s2 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], [2016-06-18 03:22:09.118082] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s3 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], [2016-06-18 03:22:09.127005] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s4 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], [2016-06-18 03:22:09.136066] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s5 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], [2016-06-18 03:22:09.145362] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s6 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], [2016-06-18 03:22:09.153774] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s7 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], [2016-06-18 03:22:09.162330] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s8 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], [2016-06-18 03:22:09.171154] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s9 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], [2016-06-18 03:22:09.179886] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s10 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], [2016-06-18 03:22:09.188966] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s11 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], [2016-06-18 03:22:09.197871] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s12 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], [2016-06-18 03:22:09.205977] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s13 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], [2016-06-18 03:22:09.214266] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s14 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], [2016-06-18 03:22:09.222169] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s15 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], [2016-06-18 03:22:09.230658] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s16 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], [2016-06-18 03:22:09.239091] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s17 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], [2016-06-18 03:22:09.247742] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s18 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], [2016-06-18 03:22:09.257356] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s19 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], [2016-06-18 03:22:22.606438] I [MSGID: 100030] [glusterfsd.c:2338:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.5 (args: /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/gluster/0088583f7375696dfe2eba504070538b.socket) [2016-06-18 03:22:22.614536] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1 [2016-06-18 03:22:23.606427] I [rpcsvc.c:2196:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 16 [2016-06-18 03:22:23.615095] W [MSGID: 112153] [mount3.c:3936:mnt3svc_init] 0-nfs-mount: Exports auth has been disabled! [2016-06-18 03:22:23.635291] I [rpc-drc.c:694:rpcsvc_drc_init] 0-rpc-service: DRC is turned OFF [2016-06-18 03:22:23.635338] I [MSGID: 112110] [nfs.c:1505:init] 0-nfs: NFS service started [2016-06-18 03:22:23.668644] W [graph.c:357:_log_if_unknown_option] 0-nfs-server: option 'rpc-auth.auth-glusterfs' is not recognized [2016-06-18 03:22:23.668772] W [graph.c:357:_log_if_unknown_option] 0-nfs-server: option 'rpc-auth-allow-insecure' is not recognized [2016-06-18 03:22:23.668860] W [graph.c:357:_log_if_unknown_option] 0-nfs-server: option 'transport-type' is not recognized [2016-06-18 03:22:23.669045] I [MSGID: 114020] [client.c:2106:notify] 0-test-client-0: parent translators are ready, attempting connect on transport [2016-06-18 03:22:23.670030] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2 [2016-06-18 03:22:23.670093] I [MSGID: 114020] [client.c:2106:notify] 0-test-client-1: parent translators are ready, attempting connect on transport [2016-06-18 03:22:23.671026] I [rpc-clnt.c:1868:rpc_clnt_reconfig] 0-test-client-0: changing port to 49155 (from 0) [2016-06-18 03:22:23.672298] I [MSGID: 114020] [client.c:2106:notify] 0-test-client-2: parent translators are ready, attempting connect on transport [2016-06-18 03:22:23.672757] I [MSGID: 114020] [client.c:2106:notify] 0-test-client-3: parent translators are ready, attempting connect on transport [2016-06-18 03:22:23.673459] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-test-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2016-06-18 03:22:23.673900] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-test-client-0: Connected to test-client-0, attached to remote volume '/disk1'. [2016-06-18 03:22:23.673931] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-test-client-0: Server and Client lk-version numbers are not same, reopening the fds [2016-06-18 03:22:23.674167] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-test-client-0: Server lk version = 1 [2016-06-18 03:22:23.674242] I [MSGID: 114020] [client.c:2106:notify] 0-test-client-4: parent translators are ready, attempting connect on transport [2016-06-18 03:22:23.674705] I [MSGID: 114020] [client.c:2106:notify] 0-test-client-5: parent translators are ready, attempting connect on transport [2016-06-18 03:22:23.675173] I [MSGID: 114020] [client.c:2106:notify] 0-test-client-6: parent translators are ready, attempting connect on transport [2016-06-18 03:22:23.675979] I [rpc-clnt.c:1868:rpc_clnt_reconfig] 0-test-client-3: changing port to 49156 (from 0) [2016-06-18 03:22:23.676874] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-test-client-3: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2016-06-18 03:22:23.677164] I [MSGID: 114020] [client.c:2106:notify] 0-test-client-7: parent translators are ready, attempting connect on transport [2016-06-18 03:22:23.677646] I [MSGID: 114020] [client.c:2106:notify] 0-test-client-8: parent translators are ready, attempting connect on transport Final graph: +------------------------------------------------------------------------------+ 1: volume test-client-0 2: type protocol/client 3: option clnt-lk-version 1 4: option volfile-checksum 0 5: option volfile-key gluster/nfs 6: option client-version 3.7.5 7: option process-uuid node-1-21289-2016/06/18-03:22:22:598221-test-client-0-0-0 8: option fops-version 1298437 9: option ping-timeout 42 10: option remote-host node-1 11: option remote-subvolume /disk1 12: option transport-type socket 13: option username c6ebd82f-94a3-4947-baa4-ddae560558c6 14: option password 07a35b39-bcb9-4f67-b92b-5b8054957cf7 15: option send-gids true 16: end-volume 17: 18: volume test-client-1 19: type protocol/client 20: option ping-timeout 42 21: option remote-host node-2 22: option remote-subvolume /disk1 23: option transport-type socket 24: option username c6ebd82f-94a3-4947-baa4-ddae560558c6 25: option password 07a35b39-bcb9-4f67-b92b-5b8054957cf7 26: option send-gids true 27: end-volume 28: 29: volume test-client-2 30: type protocol/client 31: option ping-timeout 42 32: option remote-host node-3 33: option remote-subvolume /disk1 34: option transport-type socket 35: option username c6ebd82f-94a3-4947-baa4-ddae560558c6 36: option password 07a35b39-bcb9-4f67-b92b-5b8054957cf7 37: option send-gids true 38: end-volume 39: 40: volume test-disperse-0 41: type cluster/disperse 42: option redundancy 1 43: subvolumes test-client-0 test-client-1 test-client-2 44: end-volume 45: 46: volume test-client-3 47: type protocol/client 48: option clnt-lk-version 1 49: option volfile-checksum 0 50: option volfile-key gluster/nfs 51: option client-version 3.7.5 52: option process-uuid node-1-21289-2016/06/18-03:22:22:598221-test-client-3-0-0 53: option fops-version 1298437 54: option ping-timeout 42 55: option remote-host node-1 56: option remote-subvolume /disk2 57: option transport-type socket 58: option username c6ebd82f-94a3-4947-baa4-ddae560558c6 59: option password 07a35b39-bcb9-4f67-b92b-5b8054957cf7 60: option send-gids true 61: end-volume 62: 63: volume test-client-4 64: type protocol/client 65: option ping-timeout 42 66: option remote-host node-2 67: option remote-subvolume /disk2 68: option transport-type socket 69: option username c6ebd82f-94a3-4947-baa4-ddae560558c6 70: option password 07a35b39-bcb9-4f67-b92b-5b8054957cf7 71: option send-gids true 72: end-volume 73: 74: volume test-client-5 75: type protocol/client 76: option ping-timeout 42 77: option remote-host node-3 78: option remote-subvolume /disk2 79: option transport-type socket 80: option username c6ebd82f-94a3-4947-baa4-ddae560558c6 81: option password 07a35b39-bcb9-4f67-b92b-5b8054957cf7 82: option send-gids true 83: end-volume 84: 85: volume test-disperse-1 86: type cluster/disperse 87: option redundancy 1 88: subvolumes test-client-3 test-client-4 test-client-5 89: end-volume 90: 91: volume test-client-6 92: type protocol/client 93: option ping-timeout 42 94: option remote-host node-1 95: option remote-subvolume /disk3 96: option transport-type socket 97: option username c6ebd82f-94a3-4947-baa4-ddae560558c6 98: option password 07a35b39-bcb9-4f67-b92b-5b8054957cf7 99: option send-gids true 100: end-volume 101: 102: volume test-client-7 103: type protocol/client 104: option ping-timeout 42 105: option remote-host node-2 106: option remote-subvolume /disk3 107: option transport-type socket 108: option username c6ebd82f-94a3-4947-baa4-ddae560558c6 109: option password 07a35b39-bcb9-4f67-b92b-5b8054957cf7 110: option send-gids true 111: end-volume 112: 113: volume test-client-8 114: type protocol/client 115: option ping-timeout 42 116: option remote-host node-3 117: option remote-subvolume /disk3 118: option transport-type socket 119: option username c6ebd82f-94a3-4947-baa4-ddae560558c6 120: option password 07a35b39-bcb9-4f67-b92b-5b8054957cf7 121: option send-gids true 122: end-volume 123: 124: volume test-disperse-2 125: type cluster/disperse 126: option redundancy 1 127: subvolumes test-client-6 test-client-7 test-client-8 128: end-volume 129: 130: volume test-dht 131: type cluster/distribute 132: subvolumes test-disperse-0 test-disperse-1 test-disperse-2 133: end-volume 134: 135: volume test-write-behind 136: type performance/write-behind 137: subvolumes test-dht 138: end-volume 139: 140: volume test 141: type debug/io-stats 142: option log-level INFO 143: option latency-measurement off 144: option count-fop-hits off 145: subvolumes test-write-behind 146: end-volume 147: 148: volume nfs-server 149: type nfs/server 150: option rpc-auth.auth-glusterfs on 151: option rpc-auth.auth-unix on 152: option rpc-auth.auth-null on 153: option rpc-auth.ports.insecure on 154: option rpc-auth-allow-insecure on 155: option transport-type socket 156: option transport.socket.listen-port 2049 157: option nfs.dynamic-volumes on 158: option nfs.nlm on 159: option nfs.drc off 160: option rpc-auth.addr.test.allow * 161: option nfs3.test.volume-id 7ddb89bc-5c32-4122-8b45-dee91f03aaca 162: subvolumes test 163: end-volume 164: +------------------------------------------------------------------------------+ [2016-06-18 03:22:23.679499] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-test-client-3: Connected to test-client-3, attached to remote volume '/disk2'. [2016-06-18 03:22:23.679523] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-test-client-3: Server and Client lk-version numbers are not same, reopening the fds [2016-06-18 03:22:23.679792] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-test-client-3: Server lk version = 1 [2016-06-18 03:22:23.681832] I [rpc-clnt.c:1868:rpc_clnt_reconfig] 0-test-client-6: changing port to 49157 (from 0) [2016-06-18 03:22:23.682691] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-test-client-6: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2016-06-18 03:22:23.683100] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-test-client-6: Connected to test-client-6, attached to remote volume '/disk3'. [2016-06-18 03:22:23.683122] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-test-client-6: Server and Client lk-version numbers are not same, reopening the fds [2016-06-18 03:22:23.683293] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-test-client-6: Server lk version = 1 [2016-06-18 03:22:23.683779] I [rpc-clnt.c:1868:rpc_clnt_reconfig] 0-test-client-1: changing port to 49386 (from 0) [2016-06-18 03:22:23.683826] E [MSGID: 114058] [client-handshake.c:1524:client_query_portmap_cbk] 0-test-client-7: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. [2016-06-18 03:22:23.683880] I [MSGID: 114018] [client.c:2030:client_rpc_notify] 0-test-client-7: disconnected from test-client-7. Client process will keep trying to connect to glusterd until brick's port is available [2016-06-18 03:22:23.684492] E [MSGID: 114058] [client-handshake.c:1524:client_query_portmap_cbk] 0-test-client-4: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. [2016-06-18 03:22:23.684547] I [MSGID: 114018] [client.c:2030:client_rpc_notify] 0-test-client-4: disconnected from test-client-4. Client process will keep trying to connect to glusterd until brick's port is available [2016-06-18 03:22:23.685626] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-test-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2016-06-18 03:22:23.686213] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-test-client-1: Connected to test-client-1, attached to remote volume '/disk1'. [2016-06-18 03:22:23.686244] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-test-client-1: Server and Client lk-version numbers are not same, reopening the fds [2016-06-18 03:22:23.686597] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-test-client-1: Server lk version = 1 [2016-06-18 03:22:23.814665] I [rpc-clnt.c:1868:rpc_clnt_reconfig] 0-test-client-2: changing port to 49383 (from 0) [2016-06-18 03:22:23.818282] E [MSGID: 114058] [client-handshake.c:1524:client_query_portmap_cbk] 0-test-client-5: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. [2016-06-18 03:22:23.818350] I [MSGID: 114018] [client.c:2030:client_rpc_notify] 0-test-client-5: disconnected from test-client-5. Client process will keep trying to connect to glusterd until brick's port is available [2016-06-18 03:22:23.818353] E [MSGID: 114058] [client-handshake.c:1524:client_query_portmap_cbk] 0-test-client-8: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. [2016-06-18 03:22:23.818442] I [MSGID: 114018] [client.c:2030:client_rpc_notify] 0-test-client-8: disconnected from test-client-8. Client process will keep trying to connect to glusterd until brick's port is available [2016-06-18 03:22:23.908262] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-test-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2016-06-18 03:22:23.954398] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-test-client-2: Connected to test-client-2, attached to remote volume '/disk1'. [2016-06-18 03:22:23.954433] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-test-client-2: Server and Client lk-version numbers are not same, reopening the fds [2016-06-18 03:22:23.954500] I [MSGID: 122061] [ec.c:313:ec_up] 0-test-disperse-0: Going UP [2016-06-18 03:22:24.001334] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-test-client-2: Server lk version = 1 [2016-06-18 03:22:26.203237] W [MSGID: 109075] [dht-diskusage.c:49:dht_du_info_cbk] 0-test-dht: failed to get disk info from test-disperse-1 [Transport endpoint is not connected] [2016-06-18 03:22:26.203304] W [MSGID: 109075] [dht-diskusage.c:49:dht_du_info_cbk] 0-test-dht: failed to get disk info from test-disperse-2 [Transport endpoint is not connected] [2016-06-18 03:22:26.953613] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec] 0-mgmt: Volume file changed [2016-06-18 03:22:27.219502] W [MSGID: 109075] [dht-diskusage.c:49:dht_du_info_cbk] 0-test-dht: failed to get disk info from test-disperse-1 [Transport endpoint is not connected] [2016-06-18 03:22:27.219557] W [MSGID: 109075] [dht-diskusage.c:49:dht_du_info_cbk] 0-test-dht: failed to get disk info from test-disperse-2 [Transport endpoint is not connected] [2016-06-18 03:22:27.629995] I [rpc-clnt.c:1868:rpc_clnt_reconfig] 0-test-client-7: changing port to 49388 (from 0) [2016-06-18 03:22:27.630752] I [rpc-clnt.c:1868:rpc_clnt_reconfig] 0-test-client-4: changing port to 49387 (from 0) [2016-06-18 03:22:27.642401] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-test-client-7: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2016-06-18 03:22:27.643668] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-test-client-4: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2016-06-18 03:22:27.650505] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-test-client-7: Connected to test-client-7, attached to remote volume '/disk3'. [2016-06-18 03:22:27.650552] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-test-client-7: Server and Client lk-version numbers are not same, reopening the fds [2016-06-18 03:22:27.650657] I [MSGID: 122061] [ec.c:313:ec_up] 0-test-disperse-2: Going UP [2016-06-18 03:22:27.650717] W [MSGID: 122035] [ec-common.c:419:ec_child_select] 0-test-disperse-2: Executing operation with some subvolumes unavailable (4) [2016-06-18 03:22:27.651788] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-test-client-4: Connected to test-client-4, attached to remote volume '/disk2'. [2016-06-18 03:22:27.651828] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-test-client-4: Server and Client lk-version numbers are not same, reopening the fds [2016-06-18 03:22:27.651925] I [MSGID: 122061] [ec.c:313:ec_up] 0-test-disperse-1: Going UP [2016-06-18 03:22:27.651974] W [MSGID: 122035] [ec-common.c:419:ec_child_select] 0-test-disperse-1: Executing operation with some subvolumes unavailable (4) [2016-06-18 03:22:27.654804] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-test-client-4: Server lk version = 1 [2016-06-18 03:22:27.654872] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-test-client-7: Server lk version = 1 [2016-06-18 03:22:27.675801] I [rpc-clnt.c:1868:rpc_clnt_reconfig] 0-test-client-5: changing port to 49384 (from 0) [2016-06-18 03:22:27.676249] I [rpc-clnt.c:1868:rpc_clnt_reconfig] 0-test-client-8: changing port to 49385 (from 0) [2016-06-18 03:22:27.745561] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-test-client-5: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2016-06-18 03:22:27.767869] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-test-client-5: Connected to test-client-5, attached to remote volume '/disk2'. [2016-06-18 03:22:27.767904] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-test-client-5: Server and Client lk-version numbers are not same, reopening the fds [2016-06-18 03:22:27.789158] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-test-client-5: Server lk version = 1 [2016-06-18 03:22:27.807143] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-test-client-8: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2016-06-18 03:22:27.831490] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-test-client-8: Connected to test-client-8, attached to remote volume '/disk3'. [2016-06-18 03:22:27.869927] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-test-client-8: Server and Client lk-version numbers are not same, reopening the fds [2016-06-18 03:22:27.870217] W [MSGID: 122035] [ec-common.c:419:ec_child_select] 0-test-disperse-2: Executing operation with some subvolumes unavailable (4) [2016-06-18 03:22:27.892882] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-test-client-8: Server lk version = 1 [2016-06-18 03:22:28.377750] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test with [Subvol_name: test-disperse-1, Err: 0 , Start: 0 , Stop: 0 , Hash: 0 ], [Subvol_name: test-disperse-2, Err: 0 , Start: 0 , Stop: 0 , Hash: 0 ], [Subvol_name: test-disperse-0, Err: 0 , Start: 0 , Stop: 4294967295 , Hash: 1 ], [2016-06-18 03:22:28.791273] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s12 with [Subvol_name: test-disperse-1, Err: 0 , Start: 0 , Stop: 0 , Hash: 0 ], [Subvol_name: test-disperse-2, Err: 0 , Start: 0 , Stop: 0 , Hash: 0 ], [Subvol_name: test-disperse-0, Err: 0 , Start: 0 , Stop: 4294967295 , Hash: 1 ], [2016-06-18 03:22:29.293440] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing [2016-06-18 03:27:06.862268] W [socket.c:701:__socket_rwv] 0-test-client-3: readv on 10.10.21.50:49156 failed (No data available) [2016-06-18 03:27:06.862333] I [MSGID: 114018] [client.c:2030:client_rpc_notify] 0-test-client-3: disconnected from test-client-3. Client process will keep trying to connect to glusterd until brick's port is available [2016-06-18 03:27:06.862675] W [socket.c:701:__socket_rwv] 0-test-client-0: readv on 10.10.21.50:49155 failed (No data available) [2016-06-18 03:27:06.862718] I [MSGID: 114018] [client.c:2030:client_rpc_notify] 0-test-client-0: disconnected from test-client-0. Client process will keep trying to connect to glusterd until brick's port is available [2016-06-18 03:27:06.866850] W [socket.c:701:__socket_rwv] 0-test-client-6: readv on 10.10.21.50:49157 failed (No data available) [2016-06-18 03:27:06.866895] I [MSGID: 114018] [client.c:2030:client_rpc_notify] 0-test-client-6: disconnected from test-client-6. Client process will keep trying to connect to glusterd until brick's port is available
release-3.7 commit 88d19dfd60fc705375abc8e739b50db111a65a48
afr has same problem
I modify code like this: index 229bfee..9808f12 100644 --- a/xlators/nfs/server/src/nfs3-helpers.c +++ b/xlators/nfs/server/src/nfs3-helpers.c @@ -3791,6 +3791,14 @@ nfs3_fh_resolve_entry_hard (nfs3_call_state_t *cs) (nfs3_create_op (cs) && !nfs3_create_exclusive_op (cs))) { cs->lookuptype = GF_NFS3_FRESH; cs->resolve_ret = 0; + /*Make sure cs->hardresolved == 0 + * before nfs3_call_resume(cs). + * add by Jiademing*/ + if (cs->hardresolved) { + gf_msg (GF_NFS3, GF_LOG_INFO, 0, + NFS_MSG_LOOKUP_ROOT_FAIL, "reset cs->hardresolved = 0"); + cs->hardresolved = 0; + } nfs3_call_resume (cs); } else { cs->hardresolved = 1; then test is OK. when nfsd re-init, inode->parent not found, 1)then call lookup to resolve parent, cs->hardresolved = 1; 2)lookup success, recall nfs3_fh_resolve_entry_hard().then its own inode not found, retry lookup(its inode), but because cs->hardresolved == 1,so direct return at nfs3_lookup_resume(), not dispatch this lookup operation, so failed. Can we reset cs->hardresolved = 0 at nfs3_fh_resolve_entry_hard()?
As updated by you in the bug1356068, the fix is similar. Sorry that this bug was not looked upon earlier. Post the fix of bug1356068 (http://review.gluster.org/14911 ) gets merged, kindly confirm if that indeed resolves your issue.
(In reply to Soumya Koduri from comment #5) > As updated by you in the bug1356068, the fix is similar. Sorry that this bug > was not looked upon earlier. Post the fix of bug1356068 > (http://review.gluster.org/14911 ) gets merged, kindly confirm if that > indeed resolves your issue. Yes, I reset cs->hardresolved = 0, then test is OK.
This bug is getting closed because GlusteFS-3.7 has reached its end-of-life. Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS. If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.