Bug 1116872

Summary: BVT: Connectathon lock test failed on NFS V3
Product: Red Hat Gluster Storage
Reporter: Lalatendu Mohanty <lmohanty>
Component: gluster-nfs
Assignee: Niels de Vos <ndevos>
Status: CLOSED WONTFIX
QA Contact: Lalatendu Mohanty <lmohanty>
Severity: medium
Docs Contact:
Priority: unspecified
Version: rhgs-3.0
CC: lmohanty, mzywusko, rcyriac, rhs-bugs, sankarshan, skoduri, storage-qa-internal
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-04-16 18:08:54 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
  logs (flags: none)
  logs (flags: none)

Description Lalatendu Mohanty 2014-07-07 14:08:44 UTC
Description of problem:
The Connectathon (cthon) lock test failed on an F_ULOCK step while running against a GlusterFS volume mounted over NFSv3. See "Actual results:" below for the cthon log snippet.


Version-Release number of selected component (if applicable):

glusterfs-server-3.6.0.24-1.el6rhs.x86_64.rpm

How reproducible:
Intermittent

Steps to Reproduce:
1. Create a GlusterFS volume and mount it on the client using NFSv3.
2. Download cthon: git clone git://linux-nfs.org/~steved/cthon04.git
3. Run cthon on the mount point:
${CTHON_PATH}/server -l -o vers=3 -p ${VOLNAME} -m ${mountPath} ${nfsServer}

Actual results:

Test #7 - Test parent/child mutual exclusion.
	Parent: 7.0  - F_TLOCK [             ffc,               9] PASSED.
	Parent: Wrote 'aaaa eh' to testfile [ 4092, 7 ].
	Parent: Now free child to run, should block on lock.
	Parent: Check data in file to insure child blocked.
	Parent: Read 'aaaa eh' from testfile [ 4092, 7 ].
	Parent: 7.1  - COMPARE [             ffc,               7] PASSED.
	Parent: Now unlock region so child will unblock.
	Parent: 7.2  - F_ULOCK [             ffc,               9] PASSED.
	Child:  7.3  - F_LOCK  [             ffc,               9] PASSED.
	Child:  Write child's version of the data and release lock.
	Parent: Now try to regain lock, parent should block.
	Child:  Wrote 'bebebebeb' to testfile [ 4092, 9 ].
	Parent: 7.5  - F_LOCK  [             ffc,               9] PASSED.
	Parent: Check data in file to insure child unblocked.
	Child:  7.4  - F_ULOCK [             ffc,               9] PASSED.
	Parent: Read 'bebebebeb' from testfile [ 4092, 9 ].
	Parent: 7.6  - COMPARE [             ffc,               9] PASSED.
	Parent: 7.7  - F_ULOCK [             ffc,               9] FAILED!
	Parent: **** Expected success, returned errno=37...
	Parent: **** Probably implementation error.

** PARENT pass 1 results: 26/26 pass, 0/0 warn, 1/1 fail (pass/total).

Expected results:

The test should not fail.

Additional info:

Comment 2 Lalatendu Mohanty 2014-07-07 15:21:41 UTC
Created attachment 916132 [details]
logs

Comment 3 Lalatendu Mohanty 2014-07-07 15:22:36 UTC
Beaker Job link: https://beaker.engineering.redhat.com/jobs/686753

Comment 4 santosh pradhan 2014-07-07 17:23:34 UTC
Can you point me to the NFS log?

Comment 5 santosh pradhan 2014-07-07 17:27:21 UTC

Again, what is the chance of hitting the issue in 5 retries? If it's just intermittent (maybe once in 10 times), it should not be high priority.

Comment 6 Lalatendu Mohanty 2014-07-08 01:51:08 UTC
Created attachment 916218 [details]
logs

Comment 7 Lalatendu Mohanty 2014-07-08 01:54:51 UTC
So far, BVT has hit this issue once in 4 runs; as of now we have done only 4 BVT runs on build glusterfs-server-3.6.0.24-1.el6rhs.

Comment 8 Saurabh 2014-07-08 08:17:16 UTC
I have not seen the issue mentioned in this bug, even after executing the cthon test more than 10 times.

The test result from one of the runs:

** PARENT pass 1 results: 49/49 pass, 1/1 warn, 0/0 fail (pass/total).

**  CHILD pass 1 results: 64/64 pass, 0/0 warn, 0/0 fail (pass/total).
Congratulations, you passed the locking tests!

All tests completed

Comment 9 santosh pradhan 2014-07-08 08:58:49 UTC
I tried the cthon lock test on both nodes of a 2x2 volume with 10 iterations and did not see the issue.

I am not sure what the next step is, but lowering the severity would be the first thing to do.

Comment 10 santosh pradhan 2014-07-08 09:04:16 UTC
Had a discussion with Lala (the bug reporter) and we agreed on:

1. Reduce the severity of the issue.

2. Keep it open for a while; if he hits it again (as he already mentioned, the occurrence is intermittent), it may need some investigation, otherwise it will be closed as "Not a bug".

-Santosh

Comment 11 Lalatendu Mohanty 2014-07-08 09:47:40 UTC
Just to give some more background: we have been running cthon in BVT for the last 3 to 4 months, and this test has never failed in the past. The automation has not changed over that period.

Also, in BVT other tests (fssanity, dht, rebalance, top/profile, volume set) run before this one, so the failure could be a side effect of those tests. To be clear, we create a fresh setup, i.e. a new gluster volume, for cthon (we reuse the nodes but not the volumes from the other tests).

So I would prefer to keep the bug open at low priority for a considerable amount of time before deciding to close it, as I have seen the reproduction rate of intermittent issues change within a release cycle.

Comment 12 Vivek Agarwal 2014-07-10 07:44:36 UTC
Per discussion and based on earlier comments, removing the blocker? flag

Comment 13 Lalatendu Mohanty 2014-07-10 15:25:48 UTC
This issue was reproduced again in today's BVT run, so I have increased the severity to medium.

https://beaker.engineering.redhat.com/jobs/690097

It is exactly the same issue as before. Do you think the same logs (which I provided last time) will help, or should I change the automation to collect more information? I think it would be difficult to reproduce manually.

Comment 14 Lalatendu Mohanty 2014-08-04 09:07:33 UTC
This issue was reproduced in today's BVT run. GlusterFS build: glusterfs-server-3.6.0.25-1.el6rhs.x86_64

https://beaker.engineering.redhat.com/jobs/711348

Comment 15 Lalatendu Mohanty 2014-08-07 15:52:28 UTC
The issue was reproduced again in a BVT run on build glusterfs-server-3.6.0.27-1.el6rhs.x86_64

https://beaker.engineering.redhat.com/jobs/714258

Comment 16 Lalatendu Mohanty 2014-09-08 06:02:11 UTC
Issue reproduced on build glusterfs-server-3.6.0.28-1.el6rhs.x86_64 

Beaker Job: https://beaker.engineering.redhat.com/jobs/739612

Comment 19 Lalatendu Mohanty 2014-11-29 03:03:13 UTC
This was reproduced again on glusterfs build 3.6.0.34-1 in two consecutive BVT runs.

https://beaker.engineering.redhat.com/jobs/813582
https://beaker.engineering.redhat.com/jobs/813593

Comment 20 Lalatendu Mohanty 2015-02-20 06:43:20 UTC
Another failure for RHS 3.0.4 i.e. glusterfs-server-3.6.0.45-1.el6rhs.x86_64
https://beaker.engineering.redhat.com/jobs/886802