Description of problem:
Cthon lock test (i.e. F_ULOCK) failed while running it on a GlusterFS volume using NFSv3. See "Actual results:" for the cthon log snippet.

Version-Release number of selected component (if applicable):
glusterfs-server-3.6.0.24-1.el6rhs.x86_64.rpm

How reproducible:
Intermittent

Steps to Reproduce:
1. Create a GlusterFS volume and mount it on the client using NFS v3.
2. Download cthon: git clone git://linux-nfs.org/~steved/cthon04.git
3. Run cthon on the mount point:
   ${CTHON_PATH}/server -l -o vers=3 -p ${VOLNAME} -m ${mountPath} ${nfsServer}

Actual results:
Test #7 - Test parent/child mutual exclusion.
Parent: 7.0 - F_TLOCK [ ffc, 9] PASSED.
Parent: Wrote 'aaaa eh' to testfile [ 4092, 7 ].
Parent: Now free child to run, should block on lock.
Parent: Check data in file to insure child blocked.
Parent: Read 'aaaa eh' from testfile [ 4092, 7 ].
Parent: 7.1 - COMPARE [ ffc, 7] PASSED.
Parent: Now unlock region so child will unblock.
Parent: 7.2 - F_ULOCK [ ffc, 9] PASSED.
Child: 7.3 - F_LOCK [ ffc, 9] PASSED.
Child: Write child's version of the data and release lock.
Parent: Now try to regain lock, parent should block.
Child: Wrote 'bebebebeb' to testfile [ 4092, 9 ].
Parent: 7.5 - F_LOCK [ ffc, 9] PASSED.
Parent: Check data in file to insure child unblocked.
Child: 7.4 - F_ULOCK [ ffc, 9] PASSED.
Parent: Read 'bebebebeb' from testfile [ 4092, 9 ].
Parent: 7.6 - COMPARE [ ffc, 9] PASSED.
Parent: 7.7 - F_ULOCK [ ffc, 9] FAILED!
Parent: **** Expected success, returned errno=37...
Parent: **** Probably implementation error.
** PARENT pass 1 results: 26/26 pass, 0/0 warn, 1/1 fail (pass/total).

Expected results:
The test should not fail.

Additional info:
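For context on what the failing step exercises: errno 37 on Linux is ENOLCK ("No locks available"), which NFS clients typically surface when the lock manager cannot service a request. Below is a minimal local-filesystem sketch (assumed, not cthon's actual source) of the parent/child mutual-exclusion sequence from test #7, using Python's fcntl.lockf, which wraps the same POSIX record locks behind lockf(3)'s F_TLOCK / F_LOCK / F_ULOCK. It only illustrates the semantics the test expects; it does not reproduce the GlusterFS/NFSv3 failure.

```python
# Sketch of cthon lock test #7's parent/child sequence on a local temp
# file (hypothetical reconstruction, not cthon's code).
import fcntl
import os
import tempfile

def run_demo():
    tmp = tempfile.NamedTemporaryFile(delete=False)
    tmp.write(b"aaaa")                       # parent's version of the data
    tmp.close()

    go_r, go_w = os.pipe()      # parent -> child: "start contending for the lock"
    held_r, held_w = os.pipe()  # child -> parent: "I hold the lock now"

    pid = os.fork()
    if pid == 0:                             # ---- child ----
        os.close(go_w); os.close(held_r)
        os.read(go_r, 1)                     # wait until parent holds the first lock
        with open(tmp.name, "r+b") as f:
            fcntl.lockf(f, fcntl.LOCK_EX)    # 7.3 F_LOCK: blocks while parent holds it
            os.write(held_w, b"L")
            f.write(b"bbbb")                 # child's version of the data
            f.flush()
            fcntl.lockf(f, fcntl.LOCK_UN)    # 7.4 F_ULOCK
        os._exit(0)

    # ---- parent ----
    os.close(go_r); os.close(held_w)
    with open(tmp.name, "r+b") as f:
        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)  # 7.0 F_TLOCK (try-lock)
        os.write(go_w, b"G")                 # free the child to run
        fcntl.lockf(f, fcntl.LOCK_UN)        # 7.2 F_ULOCK: unblocks the child
        os.read(held_r, 1)                   # child now owns the lock
        fcntl.lockf(f, fcntl.LOCK_EX)        # 7.5 F_LOCK: blocks until child unlocks
        f.seek(0)
        data = f.read(4)                     # should see the child's data
        fcntl.lockf(f, fcntl.LOCK_UN)        # 7.7 F_ULOCK: the step failing with ENOLCK
    os.waitpid(pid, 0)
    os.unlink(tmp.name)
    return data

if __name__ == "__main__":
    print(run_demo())  # b'bbbb'
```

On a local filesystem every lockf call here succeeds; in the bug, the final unlock at step 7.7 is what comes back with errno 37.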
Created attachment 916132 [details] logs
Beaker Job link: https://beaker.engineering.redhat.com/jobs/686753
Can you point me to the NFS log?
Again, what is the chance of hitting the issue in 5 retries? If it's just intermittent (maybe once in 10 times), it should not be of high priority.
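For reference, the arithmetic behind the retry question (using the hypothetical "once in 10 times" rate from this comment, not a measured figure): with independent runs and per-run failure probability p, the chance of at least one failure in n runs is 1 - (1 - p)^n, so p = 0.1 gives roughly 41% over 5 retries and 65% over 10.

```python
# Chance of hitting an intermittent failure at least once in n runs,
# assuming independent runs with per-run failure probability p.
# p = 0.1 is the hypothetical "once in 10 times" rate, not measured data.
def p_at_least_one_failure(p, n):
    return 1 - (1 - p) ** n

print(round(p_at_least_one_failure(0.1, 5), 3))   # 0.41
print(round(p_at_least_one_failure(0.1, 10), 3))  # 0.651
```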
Created attachment 916218 [details] logs
So far BVT has hit this issue once out of 4 runs; as of now we have done only 4 BVT runs on build glusterfs-server-3.6.0.24-1.el6rhs.
I have not seen the issue mentioned in this bug even after executing the cthon test more than 10 times. One run's test result:
** PARENT pass 1 results: 49/49 pass, 1/1 warn, 0/0 fail (pass/total).
** CHILD pass 1 results: 64/64 pass, 0/0 warn, 0/0 fail (pass/total).
Congratulations, you passed the locking tests!
All tests completed
I tried the cthon lock test from both nodes on a 2x2 volume with 10 iterations and did not see the issue. I am not sure what the next step is, but lowering the severity would be the first thing to do.
Had a discussion with Lala (bug reporter) and we agreed upon:
1. Reduce the severity of the issue.
2. Keep it open for a while; if he hits it again (as he already mentioned, the occurrence is intermittent), it may need some investigation, otherwise it will be closed as "Not a bug".
-Santosh
Just to give some more background: we have been running cthon in BVT for the last 3 to 4 months, and this test has never failed in the past. The automation has not changed in that time either. In BVT, other tests (fssanity, dht, rebalance, top/profile, volume set) run before this one, so the failure could be a side effect of those tests. To be clear, we create a fresh GlusterFS volume to run cthon (we re-use the nodes but not the volumes from other tests). So I would prefer to keep the bug open at low priority for a considerable amount of time before deciding to close it, as I have seen the reproduction rate of intermittent issues change within a release cycle.
Per discussion and based on earlier comments, removing the blocker? flag
This issue was reproduced again in today's BVT run, so I have increased the severity to medium.
https://beaker.engineering.redhat.com/jobs/690097
It is exactly the same issue as before. Do you think the same logs (the ones I provided last time) will help, or should I change the automation to gather more information? I think it would be difficult to reproduce manually.
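On the question of changing the automation: one low-effort option is to loop the lock test and keep the full output of the first failing run, so the captured logs correspond exactly to a failure. A hypothetical sketch (the command line and paths are placeholders based on the bug's reproduction steps, not a tested harness):

```python
# Hypothetical retry harness: re-run a flaky test command until it fails,
# returning the failing run number and its combined output for triage.
import subprocess

def run_until_failure(cmd, max_runs=10):
    """Run `cmd` up to max_runs times; return (run_no, output) of the
    first failing run, or (None, None) if every run passed."""
    for run_no in range(1, max_runs + 1):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if proc.returncode != 0:
            return run_no, proc.stdout + proc.stderr
    return None, None

# Example invocation (placeholder paths/hostname; adjust to the real setup):
# run_until_failure(
#     ["/opt/cthon04/server", "-l", "-o", "vers=3",
#      "-p", "VOLNAME", "-m", "/mnt/nfs", "nfs-server.example.com"],
#     max_runs=10)
```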
This issue was reproduced in today's BVT run on GlusterFS build glusterfs-server-3.6.0.25-1.el6rhs.x86_64.
https://beaker.engineering.redhat.com/jobs/711348
The issue reproduced again in a BVT run on build glusterfs-server-3.6.0.27-1.el6rhs.x86_64.
https://beaker.engineering.redhat.com/jobs/714258
Issue reproduced on build glusterfs-server-3.6.0.28-1.el6rhs.x86_64.
Beaker job: https://beaker.engineering.redhat.com/jobs/739612
This reproduced again on glusterfs build 3.6.0.34-1, in two consecutive BVT runs:
https://beaker.engineering.redhat.com/jobs/813582
https://beaker.engineering.redhat.com/jobs/813593
Another failure on RHS 3.0.4, i.e. glusterfs-server-3.6.0.45-1.el6rhs.x86_64:
https://beaker.engineering.redhat.com/jobs/886802