While running the cthon lock tests, the simple locking test fails. Corresponding logs:
[2012-04-18 08:31:03.341302] W [xdr-rpcclnt.c:88:rpc_request_to_xdr] 0-rpc: failed to encode call msg
[2012-04-18 08:31:03.341351] E [rpc-clnt.c:1268:rpc_clnt_record_build_record] 0-nfs-test-big-client-1: Failed to build record header
[2012-04-18 08:31:03.341371] W [rpc-clnt.c:1328:rpc_clnt_record] 0-nfs-test-big-client-1: cannot build rpc-record
[2012-04-18 08:31:03.341386] W [rpc-clnt.c:1467:rpc_clnt_submit] 0-nfs-test-big-client-1: cannot build rpc-record
[2012-04-18 08:31:03.341404] W [client3_1-fops.c:2173:client3_1_lk_cbk] 0-nfs-test-big-client-1: remote operation failed: Transport endpoint is not connected
A little more of the log would help.
For me, all cthon lock tests pass on the qa-35 setup with a distribute-replicate (2x2) volume. By the way, I am using cthon from the suggested repo, git://linux-nfs.org/~steved/cthon04.git
(In reply to comment #1)
> little more log snippet will help.

No more details are logged. The above log repeats, and no other messages appear.
(In reply to comment #2)
> for me all cthon lock tests are a pass on qa-35 setup for a
> distribute-replicate(2X2) volume, btw I am using cthon from the suggested cthon
> repo,
> git://linux-nfs.org/~steved/cthon04.git

The one I tried is a 4-node distribute volume, though that should not matter. Locking fails outside cthon as well: a small program that locks a region of a file also fails.
The network is quite flaky; the link goes UP/DOWN quite often. I am not sure whether this contributes to the issue, since the other FOPs work transparently.
With a 33-character client hostname, things work. With a 34-character hostname, they fail. I suspect that once the lock owner exceeds some length (about 40 bytes, counting the pid and the hostname), we see this behaviour. The problem appears to be in encoding the lock arguments (the lock owner in particular).
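A small sketch can make the length dependence concrete. XDR (RFC 4506) encodes a variable-length opaque field, such as a lock owner, as a 4-byte length followed by the bytes, zero-padded to a multiple of 4. The `"<pid>@<hostname>"` owner format and the helper names below are hypothetical; the actual GlusterFS lock owner layout is not shown in this report.

```python
import struct


def xdr_opaque(data: bytes) -> bytes:
    """Encode variable-length opaque data as XDR (RFC 4506) does:
    4-byte big-endian length, the bytes, then zero padding to a 4-byte boundary."""
    pad = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\0" * pad


def lock_owner(pid: int, hostname: str) -> bytes:
    # Hypothetical owner format "<pid>@<hostname>"; not the real GlusterFS layout.
    return f"{pid}@{hostname}".encode()


for host in ("client01", "longname-nlmtest-verification-purpose"):
    owner = lock_owner(4242, host)
    # The encoded size grows with the hostname, so a buffer sized without
    # looking at the owner length will eventually be too small.
    print(f"{host}: owner {len(owner)} bytes, encoded {len(xdr_opaque(owner))} bytes")
```

Running it shows the encoded owner growing in 4-byte steps as the hostname gets longer.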
I changed the client hostname and ran the cthon test. There is one failure, and it appears only after switching to a long hostname:

[root@longname-nlmtest-verification-purpose cthon04]# hostname
longname-nlmtest-verification-purpose
[root@longname-nlmtest-verification-purpose cthon04]# hostname | wc -c
38

cthon fails in Test #10:

Test #10 - Make sure a locked region is split properly.
Parent: 10.0 - F_TLOCK [ 0, 3] PASSED.
Parent: 10.1 - F_ULOCK [ 1, 1] PASSED.
Child: 10.2 - F_TEST [ 0, 1] PASSED.
Child: 10.3 - F_TEST [ 2, 1] PASSED.
Child: 10.4 - F_TEST [ 3, ENDING] FAILED!
Child: **** Expected success, returned EACCES...
Child: **** Probably implementation error.
** CHILD pass 1 results: 48/48 pass, 0/0 warn, 1/1 fail (pass/total).
Parent: Child died
** PARENT pass 1 results: 29/29 pass, 0/0 warn, 0/0 fail (pass/total).
lock tests failed
Tests failed, leaving /mnt/nfs-test mounted
###############################################################
Logs from /var/log/glusterfs/nfs.log:
[2012-04-19 05:53:47.187445] W [xdr-rpcclnt.c:88:rpc_request_to_xdr] 0-rpc: failed to encode call msg
[2012-04-19 05:53:47.187568] E [rpc-clnt.c:1268:rpc_clnt_record_build_record] 0-dist-rep-client-2: Failed to build record header
[2012-04-19 05:53:47.187614] W [rpc-clnt.c:1328:rpc_clnt_record] 0-dist-rep-client-2: cannot build rpc-record
[2012-04-19 05:53:47.187652] W [rpc-clnt.c:1467:rpc_clnt_submit] 0-dist-rep-client-2: cannot build rpc-record
[2012-04-19 05:53:47.187693] W [client3_1-fops.c:2173:client3_1_lk_cbk] 0-dist-rep-client-2: remote operation failed: Transport endpoint is not connected
[2012-04-19 05:53:47.187742] W [xdr-rpcclnt.c:88:rpc_request_to_xdr] 0-rpc: failed to encode call msg
[2012-04-19 05:53:47.187783] E [rpc-clnt.c:1268:rpc_clnt_record_build_record] 0-dist-rep-client-3: Failed to build record header
[2012-04-19 05:53:47.187820] W [rpc-clnt.c:1328:rpc_clnt_record] 0-dist-rep-client-3: cannot build rpc-record
[2012-04-19 05:53:47.187855] W [rpc-clnt.c:1467:rpc_clnt_submit] 0-dist-rep-client-3: cannot build rpc-record
[2012-04-19 05:53:47.187892] W [client3_1-fops.c:2173:client3_1_lk_cbk] 0-dist-rep-client-3: remote operation failed: Transport endpoint is not connected
##############################################################
[root@RHS-71 ~]# glusterfs -V
glusterfs 3.3.0qa35 built on Apr 17 2012 11:22:39
[root@RHS-71 ~]# gluster volume info

Volume Name: dist-rep
Type: Distributed-Replicate
Volume ID: 0a559a33-6cbe-4853-8f75-f7db6c880cc4
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 172.17.251.71:/export/dr
Brick2: 172.17.251.72:/export/drr
Brick3: 172.17.251.73:/export/ddr
Brick4: 172.17.251.74:/export/ddrr
CHANGE: http://review.gluster.com/3191 (rpc-clnt: use the correct xdr_size for getting the iobuf) merged in master by Anand Avati (avati)
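The title of the merged change says the iobuf should be sized from the correct xdr_size. The sketch below illustrates that idea in Python rather than GlusterFS's C code. It compares encoding lock arguments into a buffer of a hypothetical fixed size with encoding into one sized from the arguments themselves. The argument layout, `FIXED_GUESS`, and all function names are assumptions for illustration, and `struct.pack_into` raising `struct.error` merely stands in for the "failed to encode call msg" warning.

```python
import struct

FIXED_GUESS = 48  # hypothetical fixed buffer size; not a value from GlusterFS


def padded(n: int) -> int:
    # XDR rounds variable-length data up to a multiple of 4 bytes.
    return (n + 3) & ~3


def lk_args_xdr_size(owner: bytes) -> int:
    # Hypothetical layout: start and length as 64-bit hypers, then the
    # owner as variable-length opaque (4-byte length + padded bytes).
    return 8 + 8 + 4 + padded(len(owner))


def encode_lk_args(buf: bytearray, owner: bytes, start: int, length: int) -> int:
    """Encode into a preallocated buffer and return the number of bytes used.
    struct.pack_into raises struct.error if buf is too small."""
    struct.pack_into(">QQI", buf, 0, start, length, len(owner))
    # The "s" format zero-pads the owner up to the requested width.
    struct.pack_into(f">{padded(len(owner))}s", buf, 20, owner)
    return 20 + padded(len(owner))


def submit(owner: bytes, size_from_args: bool) -> bytes:
    # Buggy path: a fixed guess. Fixed path: ask the arguments how big they are.
    size = lk_args_xdr_size(owner) if size_from_args else FIXED_GUESS
    buf = bytearray(size)
    used = encode_lk_args(buf, owner, 0, 3)
    return bytes(buf[:used])


long_owner = b"4242@longname-nlmtest-verification-purpose"
try:
    submit(long_owner, size_from_args=False)
except struct.error as err:
    print("fixed-size buffer:", err)
print("sized buffer:", len(submit(long_owner, size_from_args=True)), "bytes")
```

The point at which the fixed guess breaks here is an artifact of the made-up 48-byte size. The report itself only establishes that failures began at around a 34-character hostname.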
Thanks to Krishna for pointing me to the right place (in comment #6).
Works on 3.3.0qa36