Bug 1116872 - BVT: Connectathon lock test failed on NFS V3
Summary: BVT: Connectathon lock test failed on NFS V3
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: gluster-nfs
Version: rhgs-3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Niels de Vos
QA Contact: Lalatendu Mohanty
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-07-07 14:08 UTC by Lalatendu Mohanty
Modified: 2018-04-16 18:09 UTC (History)
CC: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-04-16 18:08:54 UTC
Embargoed:


Attachments (Terms of Use)
logs (1.06 MB, application/x-tar), 2014-07-07 15:21 UTC, Lalatendu Mohanty
logs (2.29 MB, application/x-tar), 2014-07-08 01:51 UTC, Lalatendu Mohanty

Description Lalatendu Mohanty 2014-07-07 14:08:44 UTC
Description of problem:
The Connectathon (cthon) lock test failed on an F_ULOCK operation while running against a GlusterFS volume mounted over NFS v3. See "Actual results:" for the cthon log snippet.


Version-Release number of selected component (if applicable):

glusterfs-server-3.6.0.24-1.el6rhs.x86_64.rpm

How reproducible:
Intermittent

Steps to Reproduce:
1. Create a GlusterFS volume and mount it on a client using NFS v3.
2. Download cthon: git clone git://linux-nfs.org/~steved/cthon04.git
3. Run cthon against the mount point:
${CTHON_PATH}/server -l -o vers=3 -p ${VOLNAME} -m ${mountPath} ${nfsServer}

Actual results:

Test #7 - Test parent/child mutual exclusion.
	Parent: 7.0  - F_TLOCK [             ffc,               9] PASSED.
	Parent: Wrote 'aaaa eh' to testfile [ 4092, 7 ].
	Parent: Now free child to run, should block on lock.
	Parent: Check data in file to insure child blocked.
	Parent: Read 'aaaa eh' from testfile [ 4092, 7 ].
	Parent: 7.1  - COMPARE [             ffc,               7] PASSED.
	Parent: Now unlock region so child will unblock.
	Parent: 7.2  - F_ULOCK [             ffc,               9] PASSED.
	Child:  7.3  - F_LOCK  [             ffc,               9] PASSED.
	Child:  Write child's version of the data and release lock.
	Parent: Now try to regain lock, parent should block.
	Child:  Wrote 'bebebebeb' to testfile [ 4092, 9 ].
	Parent: 7.5  - F_LOCK  [             ffc,               9] PASSED.
	Parent: Check data in file to insure child unblocked.
	Child:  7.4  - F_ULOCK [             ffc,               9] PASSED.
	Parent: Read 'bebebebeb' from testfile [ 4092, 9 ].
	Parent: 7.6  - COMPARE [             ffc,               9] PASSED.
	Parent: 7.7  - F_ULOCK [             ffc,               9] FAILED!
	Parent: **** Expected success, returned errno=37...
	Parent: **** Probably implementation error.

** PARENT pass 1 results: 26/26 pass, 0/0 warn, 1/1 fail (pass/total).

Expected results:

The test should not fail.

Additional info:

Comment 2 Lalatendu Mohanty 2014-07-07 15:21:41 UTC
Created attachment 916132 [details]
logs

Comment 3 Lalatendu Mohanty 2014-07-07 15:22:36 UTC
Beaker Job link: https://beaker.engineering.redhat.com/jobs/686753

Comment 4 santosh pradhan 2014-07-07 17:23:34 UTC
Can you point me to the NFS log?

Comment 5 santosh pradhan 2014-07-07 17:27:21 UTC

Again, what is the chance of hitting the issue in 5 retries? If it's just intermittent (maybe once in 10 times), it should not be high priority.

Comment 6 Lalatendu Mohanty 2014-07-08 01:51:08 UTC
Created attachment 916218 [details]
logs

Comment 7 Lalatendu Mohanty 2014-07-08 01:54:51 UTC
So far, BVT has hit this issue once out of 4 runs; as of now we have done only 4 BVT runs on build glusterfs-server-3.6.0.24-1.el6rhs.

Comment 8 Saurabh 2014-07-08 08:17:16 UTC
I have not seen the issue mentioned in this bug, even after executing the cthon test more than 10 times.

The test result from one of those runs:

** PARENT pass 1 results: 49/49 pass, 1/1 warn, 0/0 fail (pass/total).

**  CHILD pass 1 results: 64/64 pass, 0/0 warn, 0/0 fail (pass/total).
Congratulations, you passed the locking tests!

All tests completed

Comment 9 santosh pradhan 2014-07-08 08:58:49 UTC
I tried the cthon lock test on both nodes of a 2x2 volume for 10 iterations and did not see the issue.

I am not sure what the next step is, but lowering the severity would be the first thing to do.

Comment 10 santosh pradhan 2014-07-08 09:04:16 UTC
I had a discussion with Lala (the bug reporter) and we agreed on:

1. Reduce the severity of the issue.

2. Keep the bug open for a while. If he hits it again (as he already mentioned, the occurrence is intermittent), it may need some investigation; otherwise it will be closed as "Not a bug".

-Santosh

Comment 11 Lalatendu Mohanty 2014-07-08 09:47:40 UTC
Just to give some more background: we have been running cthon in BVT for the last 3 to 4 months, and this test has never failed in the past. The automation has not changed in that time.

Also, in BVT other tests (fssanity, dht, rebalance, top profile, volume set tests) run before this one, so the issue could be a side effect of those tests. To be clear, we create a new set-up, i.e. a fresh gluster volume, to run cthon (we re-use the nodes but not the volume from the other tests).

So I would prefer to keep the bug open at low priority for a considerable amount of time before deciding whether to close it, as I have seen the reproduction rate of intermittent issues change within a release cycle.

Comment 12 Vivek Agarwal 2014-07-10 07:44:36 UTC
Per discussion and based on earlier comments, removing the blocker? flag

Comment 13 Lalatendu Mohanty 2014-07-10 15:25:48 UTC
This issue was reproduced again in today's BVT run, so I have increased the severity to medium.

https://beaker.engineering.redhat.com/jobs/690097

It is exactly the same issue as before. Do you think the same logs (as I provided last time) will help, or should I change the automation to gather more information? I think it would be difficult to reproduce manually.

Comment 14 Lalatendu Mohanty 2014-08-04 09:07:33 UTC
This issue was reproduced in today's BVT run. GlusterFS build: glusterfs-server-3.6.0.25-1.el6rhs.x86_64

https://beaker.engineering.redhat.com/jobs/711348

Comment 15 Lalatendu Mohanty 2014-08-07 15:52:28 UTC
The issue reproduced again in a BVT run on build glusterfs-server-3.6.0.27-1.el6rhs.x86_64:

https://beaker.engineering.redhat.com/jobs/714258

Comment 16 Lalatendu Mohanty 2014-09-08 06:02:11 UTC
Issue reproduced on build glusterfs-server-3.6.0.28-1.el6rhs.x86_64 

Beaker Job: https://beaker.engineering.redhat.com/jobs/739612

Comment 19 Lalatendu Mohanty 2014-11-29 03:03:13 UTC
This reproduced again on glusterfs build 3.6.0.34-1, in two consecutive BVT runs:

https://beaker.engineering.redhat.com/jobs/813582
https://beaker.engineering.redhat.com/jobs/813593

Comment 20 Lalatendu Mohanty 2015-02-20 06:43:20 UTC
Another failure on RHS 3.0.4, i.e. glusterfs-server-3.6.0.45-1.el6rhs.x86_64:
https://beaker.engineering.redhat.com/jobs/886802

