Bug 1347903 - disperse volume, after add-brick success, postmark create file failed by NFS service.
Summary: disperse volume, after add-brick success, postmark create file failed by NFS...
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: nfs
Version: 3.7.12
Hardware: x86_64
OS: Linux
medium
medium
Target Milestone: ---
Assignee: Soumya Koduri
QA Contact:
URL:
Whiteboard:
Depends On: 1356068
Blocks:
TreeView+ depends on / blocked
 
Reported: 2016-06-18 03:38 UTC by jiademing.dd
Modified: 2017-03-08 10:54 UTC (History)
4 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2017-03-08 10:54:10 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments (Terms of Use)

Description jiademing.dd 2016-06-18 03:38:40 UTC
Description of problem:
disperse volume, after add-brick success, postmark create file failed  by NFS service.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.Create disperse volume:
Volume Name: test
Type: Disperse
Volume ID: 5a7292e6-b56d-438a-a544-b7a21512c9fc
Status: Started
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: node-1:/disk1
Brick2: node-2:/disk1
Brick3: node-3:/disk1
Options Reconfigured:
performance.readdir-ahead: on

2.then mount -t nfs IP:/test /mnt/,
run postmark.
postmark'conf:
set size 10240000 10240000
set number 200
set location /mnt/test
set subdirectories 20
set read 65536
set write 65536
set transactions 1
run
quit

3.gluster vol add-brick test node-{1..3}:/disk2 node-{1..3}:/disk3 force;
sleep 1;
gluster vol rebalance test fix-layout start;

Actual results:
Some file create failed.lisk this:
PostMark v1.51 : 8/14/01
Reading configuration from file 'conf'
Creating subdirectories...Done
Creating files...Error: cannot open '/mnt/test/s1/53' for writing

Expected results:
Create and write all are OK.

Additional info:

Comment 1 jiademing.dd 2016-06-18 03:39:18 UTC
 72:     subvolumes test
 73: end-volume
 74:  
+------------------------------------------------------------------------------+
[2016-06-18 03:22:01.264991] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-test-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-06-18 03:22:01.265524] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-test-client-0: Connected to test-client-0, attached to remote volume '/disk1'.
[2016-06-18 03:22:01.265550] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-test-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2016-06-18 03:22:01.265813] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-test-client-0: Server lk version = 1
[2016-06-18 03:22:01.444191] I [rpc-clnt.c:1868:rpc_clnt_reconfig] 0-test-client-1: changing port to 49386 (from 0)
[2016-06-18 03:22:01.456021] I [rpc-clnt.c:1868:rpc_clnt_reconfig] 0-test-client-2: changing port to 49383 (from 0)
[2016-06-18 03:22:01.462156] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-test-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-06-18 03:22:01.479843] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-test-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-06-18 03:22:01.479897] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-test-client-1: Connected to test-client-1, attached to remote volume '/disk1'.
[2016-06-18 03:22:01.479934] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-test-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2016-06-18 03:22:01.488189] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-test-client-1: Server lk version = 1
[2016-06-18 03:22:01.491425] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-test-client-2: Connected to test-client-2, attached to remote volume '/disk1'.
[2016-06-18 03:22:01.491456] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-test-client-2: Server and Client lk-version numbers are not same, reopening the fds
[2016-06-18 03:22:01.491541] I [MSGID: 122061] [ec.c:313:ec_up] 0-test-disperse-0: Going UP
[2016-06-18 03:22:01.498240] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-test-client-2: Server lk version = 1
[2016-06-18 03:22:01.505906] I [MSGID: 109063] [dht-layout.c:718:dht_layout_normalize] 0-test-dht: Found anomalies in / (gfid = 00000000-0000-0000-0000-000000000001). Holes=1 overlaps=0
[2016-06-18 03:22:01.515154] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of / with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], 
[2016-06-18 03:22:04.504321] W [rpcsvc.c:278:rpcsvc_program_actor] 0-rpc-service: RPC program version not available (req 100003 4) for 10.10.21.50:933
[2016-06-18 03:22:04.504364] E [rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2016-06-18 03:22:04.573785] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], 
[2016-06-18 03:22:09.091360] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s0 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], 
[2016-06-18 03:22:09.101556] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s1 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], 
[2016-06-18 03:22:09.109635] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s2 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], 
[2016-06-18 03:22:09.118082] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s3 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], 
[2016-06-18 03:22:09.127005] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s4 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], 
[2016-06-18 03:22:09.136066] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s5 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], 
[2016-06-18 03:22:09.145362] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s6 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], 
[2016-06-18 03:22:09.153774] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s7 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], 
[2016-06-18 03:22:09.162330] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s8 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], 
[2016-06-18 03:22:09.171154] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s9 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], 
[2016-06-18 03:22:09.179886] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s10 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], 
[2016-06-18 03:22:09.188966] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s11 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], 
[2016-06-18 03:22:09.197871] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s12 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], 
[2016-06-18 03:22:09.205977] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s13 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], 
[2016-06-18 03:22:09.214266] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s14 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], 
[2016-06-18 03:22:09.222169] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s15 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], 
[2016-06-18 03:22:09.230658] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s16 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], 
[2016-06-18 03:22:09.239091] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s17 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], 
[2016-06-18 03:22:09.247742] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s18 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], 
[2016-06-18 03:22:09.257356] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s19 with [Subvol_name: test-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ], 
[2016-06-18 03:22:22.606438] I [MSGID: 100030] [glusterfsd.c:2338:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.5 (args: /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/gluster/0088583f7375696dfe2eba504070538b.socket)
[2016-06-18 03:22:22.614536] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2016-06-18 03:22:23.606427] I [rpcsvc.c:2196:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 16
[2016-06-18 03:22:23.615095] W [MSGID: 112153] [mount3.c:3936:mnt3svc_init] 0-nfs-mount: Exports auth has been disabled!
[2016-06-18 03:22:23.635291] I [rpc-drc.c:694:rpcsvc_drc_init] 0-rpc-service: DRC is turned OFF
[2016-06-18 03:22:23.635338] I [MSGID: 112110] [nfs.c:1505:init] 0-nfs: NFS service started
[2016-06-18 03:22:23.668644] W [graph.c:357:_log_if_unknown_option] 0-nfs-server: option 'rpc-auth.auth-glusterfs' is not recognized
[2016-06-18 03:22:23.668772] W [graph.c:357:_log_if_unknown_option] 0-nfs-server: option 'rpc-auth-allow-insecure' is not recognized
[2016-06-18 03:22:23.668860] W [graph.c:357:_log_if_unknown_option] 0-nfs-server: option 'transport-type' is not recognized
[2016-06-18 03:22:23.669045] I [MSGID: 114020] [client.c:2106:notify] 0-test-client-0: parent translators are ready, attempting connect on transport
[2016-06-18 03:22:23.670030] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2016-06-18 03:22:23.670093] I [MSGID: 114020] [client.c:2106:notify] 0-test-client-1: parent translators are ready, attempting connect on transport
[2016-06-18 03:22:23.671026] I [rpc-clnt.c:1868:rpc_clnt_reconfig] 0-test-client-0: changing port to 49155 (from 0)
[2016-06-18 03:22:23.672298] I [MSGID: 114020] [client.c:2106:notify] 0-test-client-2: parent translators are ready, attempting connect on transport
[2016-06-18 03:22:23.672757] I [MSGID: 114020] [client.c:2106:notify] 0-test-client-3: parent translators are ready, attempting connect on transport
[2016-06-18 03:22:23.673459] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-test-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-06-18 03:22:23.673900] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-test-client-0: Connected to test-client-0, attached to remote volume '/disk1'.
[2016-06-18 03:22:23.673931] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-test-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2016-06-18 03:22:23.674167] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-test-client-0: Server lk version = 1
[2016-06-18 03:22:23.674242] I [MSGID: 114020] [client.c:2106:notify] 0-test-client-4: parent translators are ready, attempting connect on transport
[2016-06-18 03:22:23.674705] I [MSGID: 114020] [client.c:2106:notify] 0-test-client-5: parent translators are ready, attempting connect on transport
[2016-06-18 03:22:23.675173] I [MSGID: 114020] [client.c:2106:notify] 0-test-client-6: parent translators are ready, attempting connect on transport
[2016-06-18 03:22:23.675979] I [rpc-clnt.c:1868:rpc_clnt_reconfig] 0-test-client-3: changing port to 49156 (from 0)
[2016-06-18 03:22:23.676874] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-test-client-3: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-06-18 03:22:23.677164] I [MSGID: 114020] [client.c:2106:notify] 0-test-client-7: parent translators are ready, attempting connect on transport
[2016-06-18 03:22:23.677646] I [MSGID: 114020] [client.c:2106:notify] 0-test-client-8: parent translators are ready, attempting connect on transport
Final graph:
+------------------------------------------------------------------------------+
  1: volume test-client-0
  2:     type protocol/client
  3:     option clnt-lk-version 1
  4:     option volfile-checksum 0
  5:     option volfile-key gluster/nfs
  6:     option client-version 3.7.5
  7:     option process-uuid node-1-21289-2016/06/18-03:22:22:598221-test-client-0-0-0
  8:     option fops-version 1298437
  9:     option ping-timeout 42
 10:     option remote-host node-1
 11:     option remote-subvolume /disk1
 12:     option transport-type socket
 13:     option username c6ebd82f-94a3-4947-baa4-ddae560558c6
 14:     option password 07a35b39-bcb9-4f67-b92b-5b8054957cf7
 15:     option send-gids true
 16: end-volume
 17:  
 18: volume test-client-1
 19:     type protocol/client
 20:     option ping-timeout 42
 21:     option remote-host node-2
 22:     option remote-subvolume /disk1
 23:     option transport-type socket
 24:     option username c6ebd82f-94a3-4947-baa4-ddae560558c6
 25:     option password 07a35b39-bcb9-4f67-b92b-5b8054957cf7
 26:     option send-gids true
 27: end-volume
 28:  
 29: volume test-client-2
 30:     type protocol/client
 31:     option ping-timeout 42
 32:     option remote-host node-3
 33:     option remote-subvolume /disk1
 34:     option transport-type socket
 35:     option username c6ebd82f-94a3-4947-baa4-ddae560558c6
 36:     option password 07a35b39-bcb9-4f67-b92b-5b8054957cf7
 37:     option send-gids true
 38: end-volume
 39:  
 40: volume test-disperse-0
 41:     type cluster/disperse
 42:     option redundancy 1
 43:     subvolumes test-client-0 test-client-1 test-client-2
 44: end-volume
 45:  
 46: volume test-client-3
 47:     type protocol/client
 48:     option clnt-lk-version 1
 49:     option volfile-checksum 0
 50:     option volfile-key gluster/nfs
 51:     option client-version 3.7.5
 52:     option process-uuid node-1-21289-2016/06/18-03:22:22:598221-test-client-3-0-0
 53:     option fops-version 1298437
 54:     option ping-timeout 42
 55:     option remote-host node-1
 56:     option remote-subvolume /disk2
 57:     option transport-type socket
 58:     option username c6ebd82f-94a3-4947-baa4-ddae560558c6
 59:     option password 07a35b39-bcb9-4f67-b92b-5b8054957cf7
 60:     option send-gids true
 61: end-volume
 62:  
 63: volume test-client-4
 64:     type protocol/client
 65:     option ping-timeout 42
 66:     option remote-host node-2
 67:     option remote-subvolume /disk2
 68:     option transport-type socket
 69:     option username c6ebd82f-94a3-4947-baa4-ddae560558c6
 70:     option password 07a35b39-bcb9-4f67-b92b-5b8054957cf7
 71:     option send-gids true
 72: end-volume
 73:  
 74: volume test-client-5
 75:     type protocol/client
 76:     option ping-timeout 42
 77:     option remote-host node-3
 78:     option remote-subvolume /disk2
 79:     option transport-type socket
 80:     option username c6ebd82f-94a3-4947-baa4-ddae560558c6
 81:     option password 07a35b39-bcb9-4f67-b92b-5b8054957cf7
 82:     option send-gids true
 83: end-volume
 84:  
 85: volume test-disperse-1
 86:     type cluster/disperse
 87:     option redundancy 1
 88:     subvolumes test-client-3 test-client-4 test-client-5
 89: end-volume
 90:  
 91: volume test-client-6
 92:     type protocol/client
 93:     option ping-timeout 42
 94:     option remote-host node-1
 95:     option remote-subvolume /disk3
 96:     option transport-type socket
 97:     option username c6ebd82f-94a3-4947-baa4-ddae560558c6
 98:     option password 07a35b39-bcb9-4f67-b92b-5b8054957cf7
 99:     option send-gids true
100: end-volume
101:  
102: volume test-client-7
103:     type protocol/client
104:     option ping-timeout 42
105:     option remote-host node-2
106:     option remote-subvolume /disk3
107:     option transport-type socket
108:     option username c6ebd82f-94a3-4947-baa4-ddae560558c6
109:     option password 07a35b39-bcb9-4f67-b92b-5b8054957cf7
110:     option send-gids true
111: end-volume
112:  
113: volume test-client-8
114:     type protocol/client
115:     option ping-timeout 42
116:     option remote-host node-3
117:     option remote-subvolume /disk3
118:     option transport-type socket
119:     option username c6ebd82f-94a3-4947-baa4-ddae560558c6
120:     option password 07a35b39-bcb9-4f67-b92b-5b8054957cf7
121:     option send-gids true
122: end-volume
123:  
124: volume test-disperse-2
125:     type cluster/disperse
126:     option redundancy 1
127:     subvolumes test-client-6 test-client-7 test-client-8
128: end-volume
129:  
130: volume test-dht
131:     type cluster/distribute
132:     subvolumes test-disperse-0 test-disperse-1 test-disperse-2
133: end-volume
134:  
135: volume test-write-behind
136:     type performance/write-behind
137:     subvolumes test-dht
138: end-volume
139:  
140: volume test
141:     type debug/io-stats
142:     option log-level INFO
143:     option latency-measurement off
144:     option count-fop-hits off
145:     subvolumes test-write-behind
146: end-volume
147:  
148: volume nfs-server
149:     type nfs/server
150:     option rpc-auth.auth-glusterfs on
151:     option rpc-auth.auth-unix on
152:     option rpc-auth.auth-null on
153:     option rpc-auth.ports.insecure on
154:     option rpc-auth-allow-insecure on
155:     option transport-type socket
156:     option transport.socket.listen-port 2049
157:     option nfs.dynamic-volumes on
158:     option nfs.nlm on
159:     option nfs.drc off
160:     option rpc-auth.addr.test.allow *
161:     option nfs3.test.volume-id 7ddb89bc-5c32-4122-8b45-dee91f03aaca
162:     subvolumes test
163: end-volume
164:  
+------------------------------------------------------------------------------+
[2016-06-18 03:22:23.679499] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-test-client-3: Connected to test-client-3, attached to remote volume '/disk2'.
[2016-06-18 03:22:23.679523] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-test-client-3: Server and Client lk-version numbers are not same, reopening the fds
[2016-06-18 03:22:23.679792] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-test-client-3: Server lk version = 1
[2016-06-18 03:22:23.681832] I [rpc-clnt.c:1868:rpc_clnt_reconfig] 0-test-client-6: changing port to 49157 (from 0)
[2016-06-18 03:22:23.682691] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-test-client-6: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-06-18 03:22:23.683100] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-test-client-6: Connected to test-client-6, attached to remote volume '/disk3'.
[2016-06-18 03:22:23.683122] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-test-client-6: Server and Client lk-version numbers are not same, reopening the fds
[2016-06-18 03:22:23.683293] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-test-client-6: Server lk version = 1
[2016-06-18 03:22:23.683779] I [rpc-clnt.c:1868:rpc_clnt_reconfig] 0-test-client-1: changing port to 49386 (from 0)
[2016-06-18 03:22:23.683826] E [MSGID: 114058] [client-handshake.c:1524:client_query_portmap_cbk] 0-test-client-7: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2016-06-18 03:22:23.683880] I [MSGID: 114018] [client.c:2030:client_rpc_notify] 0-test-client-7: disconnected from test-client-7. Client process will keep trying to connect to glusterd until brick's port is available
[2016-06-18 03:22:23.684492] E [MSGID: 114058] [client-handshake.c:1524:client_query_portmap_cbk] 0-test-client-4: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2016-06-18 03:22:23.684547] I [MSGID: 114018] [client.c:2030:client_rpc_notify] 0-test-client-4: disconnected from test-client-4. Client process will keep trying to connect to glusterd until brick's port is available
[2016-06-18 03:22:23.685626] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-test-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-06-18 03:22:23.686213] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-test-client-1: Connected to test-client-1, attached to remote volume '/disk1'.
[2016-06-18 03:22:23.686244] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-test-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2016-06-18 03:22:23.686597] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-test-client-1: Server lk version = 1
[2016-06-18 03:22:23.814665] I [rpc-clnt.c:1868:rpc_clnt_reconfig] 0-test-client-2: changing port to 49383 (from 0)
[2016-06-18 03:22:23.818282] E [MSGID: 114058] [client-handshake.c:1524:client_query_portmap_cbk] 0-test-client-5: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2016-06-18 03:22:23.818350] I [MSGID: 114018] [client.c:2030:client_rpc_notify] 0-test-client-5: disconnected from test-client-5. Client process will keep trying to connect to glusterd until brick's port is available
[2016-06-18 03:22:23.818353] E [MSGID: 114058] [client-handshake.c:1524:client_query_portmap_cbk] 0-test-client-8: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2016-06-18 03:22:23.818442] I [MSGID: 114018] [client.c:2030:client_rpc_notify] 0-test-client-8: disconnected from test-client-8. Client process will keep trying to connect to glusterd until brick's port is available
[2016-06-18 03:22:23.908262] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-test-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-06-18 03:22:23.954398] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-test-client-2: Connected to test-client-2, attached to remote volume '/disk1'.
[2016-06-18 03:22:23.954433] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-test-client-2: Server and Client lk-version numbers are not same, reopening the fds
[2016-06-18 03:22:23.954500] I [MSGID: 122061] [ec.c:313:ec_up] 0-test-disperse-0: Going UP
[2016-06-18 03:22:24.001334] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-test-client-2: Server lk version = 1
[2016-06-18 03:22:26.203237] W [MSGID: 109075] [dht-diskusage.c:49:dht_du_info_cbk] 0-test-dht: failed to get disk info from test-disperse-1 [Transport endpoint is not connected]
[2016-06-18 03:22:26.203304] W [MSGID: 109075] [dht-diskusage.c:49:dht_du_info_cbk] 0-test-dht: failed to get disk info from test-disperse-2 [Transport endpoint is not connected]
[2016-06-18 03:22:26.953613] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2016-06-18 03:22:27.219502] W [MSGID: 109075] [dht-diskusage.c:49:dht_du_info_cbk] 0-test-dht: failed to get disk info from test-disperse-1 [Transport endpoint is not connected]
[2016-06-18 03:22:27.219557] W [MSGID: 109075] [dht-diskusage.c:49:dht_du_info_cbk] 0-test-dht: failed to get disk info from test-disperse-2 [Transport endpoint is not connected]
[2016-06-18 03:22:27.629995] I [rpc-clnt.c:1868:rpc_clnt_reconfig] 0-test-client-7: changing port to 49388 (from 0)
[2016-06-18 03:22:27.630752] I [rpc-clnt.c:1868:rpc_clnt_reconfig] 0-test-client-4: changing port to 49387 (from 0)
[2016-06-18 03:22:27.642401] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-test-client-7: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-06-18 03:22:27.643668] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-test-client-4: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-06-18 03:22:27.650505] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-test-client-7: Connected to test-client-7, attached to remote volume '/disk3'.
[2016-06-18 03:22:27.650552] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-test-client-7: Server and Client lk-version numbers are not same, reopening the fds
[2016-06-18 03:22:27.650657] I [MSGID: 122061] [ec.c:313:ec_up] 0-test-disperse-2: Going UP
[2016-06-18 03:22:27.650717] W [MSGID: 122035] [ec-common.c:419:ec_child_select] 0-test-disperse-2: Executing operation with some subvolumes unavailable (4)
[2016-06-18 03:22:27.651788] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-test-client-4: Connected to test-client-4, attached to remote volume '/disk2'.
[2016-06-18 03:22:27.651828] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-test-client-4: Server and Client lk-version numbers are not same, reopening the fds
[2016-06-18 03:22:27.651925] I [MSGID: 122061] [ec.c:313:ec_up] 0-test-disperse-1: Going UP
[2016-06-18 03:22:27.651974] W [MSGID: 122035] [ec-common.c:419:ec_child_select] 0-test-disperse-1: Executing operation with some subvolumes unavailable (4)
[2016-06-18 03:22:27.654804] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-test-client-4: Server lk version = 1
[2016-06-18 03:22:27.654872] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-test-client-7: Server lk version = 1
[2016-06-18 03:22:27.675801] I [rpc-clnt.c:1868:rpc_clnt_reconfig] 0-test-client-5: changing port to 49384 (from 0)
[2016-06-18 03:22:27.676249] I [rpc-clnt.c:1868:rpc_clnt_reconfig] 0-test-client-8: changing port to 49385 (from 0)
[2016-06-18 03:22:27.745561] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-test-client-5: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-06-18 03:22:27.767869] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-test-client-5: Connected to test-client-5, attached to remote volume '/disk2'.
[2016-06-18 03:22:27.767904] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-test-client-5: Server and Client lk-version numbers are not same, reopening the fds
[2016-06-18 03:22:27.789158] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-test-client-5: Server lk version = 1
[2016-06-18 03:22:27.807143] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-test-client-8: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-06-18 03:22:27.831490] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-test-client-8: Connected to test-client-8, attached to remote volume '/disk3'.
[2016-06-18 03:22:27.869927] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-test-client-8: Server and Client lk-version numbers are not same, reopening the fds
[2016-06-18 03:22:27.870217] W [MSGID: 122035] [ec-common.c:419:ec_child_select] 0-test-disperse-2: Executing operation with some subvolumes unavailable (4)
[2016-06-18 03:22:27.892882] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-test-client-8: Server lk version = 1
[2016-06-18 03:22:28.377750] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test with [Subvol_name: test-disperse-1, Err: 0 , Start: 0 , Stop: 0 , Hash: 0 ], [Subvol_name: test-disperse-2, Err: 0 , Start: 0 , Stop: 0 , Hash: 0 ], [Subvol_name: test-disperse-0, Err: 0 , Start: 0 , Stop: 4294967295 , Hash: 1 ], 
[2016-06-18 03:22:28.791273] I [MSGID: 109036] [dht-common.c:8906:dht_log_new_layout_for_dir_selfheal] 0-test-dht: Setting layout of /test/s12 with [Subvol_name: test-disperse-1, Err: 0 , Start: 0 , Stop: 0 , Hash: 0 ], [Subvol_name: test-disperse-2, Err: 0 , Start: 0 , Stop: 0 , Hash: 0 ], [Subvol_name: test-disperse-0, Err: 0 , Start: 0 , Stop: 4294967295 , Hash: 1 ], 
[2016-06-18 03:22:29.293440] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2016-06-18 03:27:06.862268] W [socket.c:701:__socket_rwv] 0-test-client-3: readv on 10.10.21.50:49156 failed (No data available)
[2016-06-18 03:27:06.862333] I [MSGID: 114018] [client.c:2030:client_rpc_notify] 0-test-client-3: disconnected from test-client-3. Client process will keep trying to connect to glusterd until brick's port is available
[2016-06-18 03:27:06.862675] W [socket.c:701:__socket_rwv] 0-test-client-0: readv on 10.10.21.50:49155 failed (No data available)
[2016-06-18 03:27:06.862718] I [MSGID: 114018] [client.c:2030:client_rpc_notify] 0-test-client-0: disconnected from test-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2016-06-18 03:27:06.866850] W [socket.c:701:__socket_rwv] 0-test-client-6: readv on 10.10.21.50:49157 failed (No data available)
[2016-06-18 03:27:06.866895] I [MSGID: 114018] [client.c:2030:client_rpc_notify] 0-test-client-6: disconnected from test-client-6. Client process will keep trying to connect to glusterd until brick's port is available

Comment 2 jiademing.dd 2016-06-18 06:31:50 UTC
release-3.7   commit 88d19dfd60fc705375abc8e739b50db111a65a48

Comment 3 jiademing.dd 2016-06-21 01:42:54 UTC
afr has same problem

Comment 4 jiademing.dd 2016-06-22 10:38:20 UTC
I modify code like this:

index 229bfee..9808f12 100644
--- a/xlators/nfs/server/src/nfs3-helpers.c
+++ b/xlators/nfs/server/src/nfs3-helpers.c
@@ -3791,6 +3791,14 @@ nfs3_fh_resolve_entry_hard (nfs3_call_state_t *cs)
                    (nfs3_create_op (cs) && !nfs3_create_exclusive_op (cs))) {
                         cs->lookuptype = GF_NFS3_FRESH;
                         cs->resolve_ret = 0;
+                        /*Make sure cs->hardresolved == 0
+                         * before nfs3_call_resume(cs).
+                         * add by Jiademing*/
+                        if (cs->hardresolved) {
+                                gf_msg (GF_NFS3, GF_LOG_INFO, 0,
+                                        NFS_MSG_LOOKUP_ROOT_FAIL, "reset cs->hardresolved = 0");
+                                cs->hardresolved = 0;
+                        }
                         nfs3_call_resume (cs);
                 } else {
                        cs->hardresolved = 1;
then test is OK.

when nfsd re-init, inode->parent not found, 
1)then call lookup to resolve parent, cs->hardresolved = 1;
2)lookup success, recall nfs3_fh_resolve_entry_hard().then its own inode not found, retry lookup(its inode), but because cs->hardresolved == 1,so direct return at nfs3_lookup_resume(), not dispatch this lookup operation, so failed.  
Can we reset cs->hardresolved = 0 at nfs3_fh_resolve_entry_hard()?

Comment 5 Soumya Koduri 2016-07-16 16:59:43 UTC
As updated by you in the bug1356068, the fix is similar. Sorry that this bug was not looked upon earlier. Post the fix of bug1356068 (http://review.gluster.org/14911 ) gets merged, kindly confirm if that indeed resolves your issue.

Comment 6 jiademing.dd 2016-07-18 01:43:01 UTC
(In reply to Soumya Koduri from comment #5)
> As updated by you in the bug1356068, the fix is similar. Sorry that this bug
> was not looked upon earlier. Post the fix of bug1356068
> (http://review.gluster.org/14911 ) gets merged, kindly confirm if that
> indeed resolves your issue.

Yes, I reset cs->hardresolved = 0, then test is OK.

Comment 7 Kaushal 2017-03-08 10:54:10 UTC
This bug is getting closed because GlusteFS-3.7 has reached its end-of-life.

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS.
If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.


Note You need to log in before you can comment on or make changes to this bug.