Bug 813787 - NFS: locking tests fails
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: core
Version: pre-release
Hardware: Unspecified Unspecified
Priority: medium
Severity: high
Assigned To: Amar Tumballi
Depends On:
Blocks: 817967
Reported: 2012-04-18 08:41 EDT by Sachidananda Urs
Modified: 2015-12-01 11:45 EST
See Also:
Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Last Closed: 2013-07-24 13:19:28 EDT
Type: Bug
Description Sachidananda Urs 2012-04-18 08:41:06 EDT
While running the cthon lock tests, the simple locking test fails.

Corresponding logs:

[2012-04-18 08:31:03.341302] W [xdr-rpcclnt.c:88:rpc_request_to_xdr] 0-rpc: failed to encode call msg
[2012-04-18 08:31:03.341351] E [rpc-clnt.c:1268:rpc_clnt_record_build_record] 0-nfs-test-big-client-1: Failed to build record header
[2012-04-18 08:31:03.341371] W [rpc-clnt.c:1328:rpc_clnt_record] 0-nfs-test-big-client-1: cannot build rpc-record
[2012-04-18 08:31:03.341386] W [rpc-clnt.c:1467:rpc_clnt_submit] 0-nfs-test-big-client-1: cannot build rpc-record
[2012-04-18 08:31:03.341404] W [client3_1-fops.c:2173:client3_1_lk_cbk] 0-nfs-test-big-client-1: remote operation failed: Transport endpoint is not connected
Comment 1 Amar Tumballi 2012-04-18 10:07:11 EDT
A little more of the log would help.
Comment 2 Saurabh 2012-04-18 11:49:52 EDT
For me, all cthon lock tests pass on the qa-35 setup with a distribute-replicate (2x2) volume. I am using cthon from the suggested repo:
git://linux-nfs.org/~steved/cthon04.git
Comment 3 Sachidananda Urs 2012-04-18 12:01:50 EDT
(In reply to comment #1)
> A little more of the log would help.

No more details are logged. The log above simply repeats, and no other messages appear.
Comment 4 Sachidananda Urs 2012-04-18 12:03:30 EDT
(In reply to comment #2)
> For me, all cthon lock tests pass on the qa-35 setup with a
> distribute-replicate (2x2) volume. I am using cthon from the suggested repo:
> git://linux-nfs.org/~steved/cthon04.git

I tried a 4-node distribute volume, but that should not matter. Locking also fails outside cthon: a small program that locks a region of a file fails as well.
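For anyone who wants to reproduce this without cthon, here is a minimal sketch of such a region-locking program. It is not the program used above; the function name `lock_region` and the path in the usage note are hypothetical. It takes a non-blocking exclusive POSIX lock on one byte range and releases it again.

```python
import fcntl
import os


def lock_region(path, start, length):
    """Try to take, then release, an exclusive lock on one byte range.

    Returns True if the lock was granted, False if the lock call failed.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Non-blocking exclusive lock on [start, start + length).
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, length, start)
        # Release the same range again.
        fcntl.lockf(fd, fcntl.LOCK_UN, length, start)
        return True
    except OSError as err:
        print("lock failed:", err)
        return False
    finally:
        os.close(fd)
```

On the NFS mount this could be called as `lock_region("/mnt/nfs-test/lockfile", 0, 10)`, where the file name is only an example. On a healthy mount it should return True.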
Comment 5 Sachidananda Urs 2012-04-18 12:08:59 EDT
The network is quite flaky, and the link goes up and down quite often. I am not sure whether this contributes to the issue, since the other FOPs work transparently.
Comment 6 Krishna Srinivas 2012-04-18 16:30:39 EDT
If I change the client hostname to 33 characters, things work fine. If I change it to 34 characters, they do not. I think we see this behaviour when the length of the lock owner exceeds some value (40, if we include the pid and hostname). The problem seems to be in encoding the lock arguments (the lock owner in particular).
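To illustrate how an encode step can start failing once a variable-length field grows, here is a rough sketch of XDR sizing for a variable-length opaque field (a 4-byte length word followed by the data padded to a multiple of 4 bytes). This is not GlusterFS code: the helper names and the fixed `buf_size` parameter are assumptions made for illustration, and the sketch does not try to reproduce the exact 33/34 character threshold seen here.

```python
import struct


def xdr_opaque_size(n):
    """Bytes needed to XDR-encode a variable-length opaque of n data bytes."""
    # 4-byte length prefix, then the data rounded up to a 4-byte boundary.
    return 4 + ((n + 3) // 4) * 4


def encode_opaque(data, buf_size):
    """Encode data as an XDR opaque, failing if buf_size is too small.

    buf_size stands in for a buffer that was sized before encoding
    (hypothetical; the real sizing logic lives in the RPC client).
    """
    size = xdr_opaque_size(len(data))
    if size > buf_size:
        raise ValueError("buffer too small: need %d, have %d" % (size, buf_size))
    pad = size - 4 - len(data)
    return struct.pack(">I", len(data)) + data + b"\x00" * pad
```

If the buffer is sized from an estimate that does not account for the real length of the field, short owners fit and longer ones fail at encode time, which would look like a "failed to encode call msg" style error.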
Comment 7 Saurabh 2012-04-19 01:58:14 EDT
I changed the client's hostname and ran the cthon tests. There is one failure, and it appears only after changing the hostname to a long name:

[root@longname-nlmtest-verification-purpose cthon04]# hostname
longname-nlmtest-verification-purpose

[root@longname-nlmtest-verification-purpose cthon04]# hostname | wc -c
38

cthon fails in Test #10:

Test #10 - Make sure a locked region is split properly.
	Parent: 10.0  - F_TLOCK [               0,               3] PASSED.
	Parent: 10.1  - F_ULOCK [               1,               1] PASSED.
	Child:  10.2  - F_TEST  [               0,               1] PASSED.
	Child:  10.3  - F_TEST  [               2,               1] PASSED.
	Child:  10.4  - F_TEST  [               3,          ENDING] FAILED!
	Child:  **** Expected success, returned EACCES...
	Child:  **** Probably implementation error.

**  CHILD pass 1 results: 48/48 pass, 0/0 warn, 1/1 fail (pass/total).
	Parent: Child died

** PARENT pass 1 results: 29/29 pass, 0/0 warn, 0/0 fail (pass/total).
lock tests failed
Tests failed, leaving /mnt/nfs-test mounted
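
The sequence in Test #10 can be approximated outside cthon with a short script. This is a sketch under assumptions, not the cthon source: the function names are hypothetical, and the child probes each range with a non-blocking lock attempt rather than F_TEST. The parent locks bytes 0 to 2, unlocks byte 1, and a forked child reports which ranges it can lock.

```python
import fcntl
import json
import os


def region_is_free(fd, start, length):
    """Return True if this process can lock the range (length 0 = to end)."""
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, length, start)
    except OSError:
        return False
    fcntl.lockf(fd, fcntl.LOCK_UN, length, start)
    return True


def split_lock_probe(path):
    """Lock [0,3), unlock byte 1, then let a child probe the pieces."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 3, 0)  # parent: lock bytes 0..2
        fcntl.lockf(fd, fcntl.LOCK_UN, 1, 1)                  # parent: unlock byte 1
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: POSIX locks are per process, so the parent's locks conflict.
            try:
                os.close(r)
                results = {
                    "byte 0": region_is_free(fd, 0, 1),
                    "byte 1": region_is_free(fd, 1, 1),
                    "byte 2": region_is_free(fd, 2, 1),
                    "byte 3 to end": region_is_free(fd, 3, 0),
                }
                os.write(w, json.dumps(results).encode())
            finally:
                os._exit(0)
        os.close(w)
        data = b""
        while True:
            chunk = os.read(r, 4096)
            if not chunk:
                break
            data += chunk
        os.close(r)
        os.waitpid(pid, 0)
        return json.loads(data.decode())
    finally:
        os.close(fd)
```

With correct lock splitting, the child should find bytes 0 and 2 still locked, and byte 1 and everything from byte 3 onward free. The failure above corresponds to the last range being reported as locked.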

###############################################################
Logs from /var/log/glusterfs/nfs.log:

[2012-04-19 05:53:47.187445] W [xdr-rpcclnt.c:88:rpc_request_to_xdr] 0-rpc: failed to encode call msg
[2012-04-19 05:53:47.187568] E [rpc-clnt.c:1268:rpc_clnt_record_build_record] 0-dist-rep-client-2: Failed to build record header
[2012-04-19 05:53:47.187614] W [rpc-clnt.c:1328:rpc_clnt_record] 0-dist-rep-client-2: cannot build rpc-record
[2012-04-19 05:53:47.187652] W [rpc-clnt.c:1467:rpc_clnt_submit] 0-dist-rep-client-2: cannot build rpc-record
[2012-04-19 05:53:47.187693] W [client3_1-fops.c:2173:client3_1_lk_cbk] 0-dist-rep-client-2: remote operation failed: Transport endpoint is not connected
[2012-04-19 05:53:47.187742] W [xdr-rpcclnt.c:88:rpc_request_to_xdr] 0-rpc: failed to encode call msg
[2012-04-19 05:53:47.187783] E [rpc-clnt.c:1268:rpc_clnt_record_build_record] 0-dist-rep-client-3: Failed to build record header
[2012-04-19 05:53:47.187820] W [rpc-clnt.c:1328:rpc_clnt_record] 0-dist-rep-client-3: cannot build rpc-record
[2012-04-19 05:53:47.187855] W [rpc-clnt.c:1467:rpc_clnt_submit] 0-dist-rep-client-3: cannot build rpc-record
[2012-04-19 05:53:47.187892] W [client3_1-fops.c:2173:client3_1_lk_cbk] 0-dist-rep-client-3: remote operation failed: Transport endpoint is not connected



##############################################################

[root@RHS-71 ~]# glusterfs -V
glusterfs 3.3.0qa35 built on Apr 17 2012 11:22:39


[root@RHS-71 ~]# gluster volume info
 
Volume Name: dist-rep
Type: Distributed-Replicate
Volume ID: 0a559a33-6cbe-4853-8f75-f7db6c880cc4
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 172.17.251.71:/export/dr
Brick2: 172.17.251.72:/export/drr
Brick3: 172.17.251.73:/export/ddr
Brick4: 172.17.251.74:/export/ddrr
Comment 8 Anand Avati 2012-04-19 03:28:46 EDT
CHANGE: http://review.gluster.com/3191 (rpc-clnt: use the correct xdr_size for getting the iobuf) merged in master by Anand Avati (avati@redhat.com)
Comment 9 Amar Tumballi 2012-04-19 03:38:23 EDT
Thanks to Krishna for pointing me to the right place (in comment #6).
Comment 10 Sachidananda Urs 2012-04-20 01:58:57 EDT
Works on 3.3.0qa36