Bug 236577 - [Realtime][Kernel][NFS] Connectathon NFSv3, UDP, Lock test fails against RHEL3 NFSv3 server
[Realtime][Kernel][NFS] Connectathon NFSv3, UDP, Lock test fails against RHEL...
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: realtime-kernel (Show other bugs)
1.0
All Linux
medium Severity medium
: ---
: ---
Assigned To: Steve Dickson
http://rhts.lab.boston.redhat.com/cgi...
: Reopened
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2007-04-16 11:18 EDT by Jeff Burke
Modified: 2012-01-09 17:07 EST (History)
5 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-01-09 17:07:29 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Jeff Burke 2007-04-16 11:18:00 EDT
Description of problem:
While running the connectathon test suite the following test failed.
/kernel/filesystems/nfs/connectathon/rhel3-nfs/nfsvers=3_udp/lock

Version-Release number of selected component (if applicable):
 2.6.20-14.el5rt

How reproducible:
 Intermittant 10% of the time it fails.

Steps to Reproduce:
1. Run the RHTS test /kernel/filesystems/nfs/connectathon while running the RT
kernel
  
Actual results:
Start tests on path /mnt/rhel3-nfs/dhcp78-129.test [y/n]? 
sh ./runtests  -l -t /mnt/rhel3-nfs/dhcp78-129.test

Starting LOCKING tests: test directory /mnt/rhel3-nfs/dhcp78-129.test (arg: -t)

Testing native post-LFS locking

Creating parent/child synchronization pipes.

Test #1 - Test regions of an unlocked file.
	Parent: 1.1  - F_TEST  [               0,               1] PASSED.
	Parent: 1.2  - F_TEST  [               0,          ENDING] PASSED.
	Parent: 1.3  - F_TEST  [               0,7fffffffffffffff] PASSED.
	Parent: 1.4  - F_TEST  [               1,               1] PASSED.
	Parent: 1.5  - F_TEST  [               1,          ENDING] PASSED.
	Parent: 1.6  - F_TEST  [               1,7fffffffffffffff] PASSED.
	Parent: 1.7  - F_TEST  [7fffffffffffffff,               1] PASSED.
	Parent: 1.8  - F_TEST  [7fffffffffffffff,          ENDING] PASSED.
	Parent: 1.9  - F_TEST  [7fffffffffffffff,7fffffffffffffff] PASSED.

Test #2 - Try to lock the whole file.
	Parent: 2.0  - F_TLOCK [               0,          ENDING] PASSED.
	Child:  2.1  - F_TEST  [               0,               1] PASSED.
	Child:  2.2  - F_TEST  [               0,          ENDING] PASSED.
	Child:  2.3  - F_TEST  [               0,7fffffffffffffff] PASSED.
	Child:  2.4  - F_TEST  [               1,               1] PASSED.
	Child:  2.5  - F_TEST  [               1,          ENDING] PASSED.
	Child:  2.6  - F_TEST  [               1,7fffffffffffffff] PASSED.
	Child:  2.7  - F_TEST  [7fffffffffffffff,               1] PASSED.
	Child:  2.8  - F_TEST  [7fffffffffffffff,          ENDING] PASSED.
	Child:  2.9  - F_TEST  [7fffffffffffffff,7fffffffffffffff] PASSED.
	Parent: 2.10 - F_ULOCK [               0,          ENDING] PASSED.

Test #3 - Try to lock just the 1st byte.
	Parent: 3.0  - F_TLOCK [               0,               1] PASSED.
	Child:  3.1  - F_TEST  [               0,               1] PASSED.
	Child:  3.2  - F_TEST  [               0,          ENDING] PASSED.
	Child:  3.3  - F_TEST  [               1,               1] PASSED.
	Child:  3.4  - F_TEST  [               1,          ENDING] PASSED.
	Parent: 3.5  - F_ULOCK [               0,               1] PASSED.

Test #4 - Try to lock the 2nd byte, test around it.
	Parent: 4.0  - F_TLOCK [               1,               1] PASSED.
	Child:  4.1  - F_TEST  [               0,               1] PASSED.
	Child:  4.2  - F_TEST  [               0,               2] PASSED.
	Child:  4.3  - F_TEST  [               0,          ENDING] PASSED.
	Child:  4.4  - F_TEST  [               1,               1] PASSED.
	Child:  4.5  - F_TEST  [               1,               2] PASSED.
	Child:  4.6  - F_TEST  [               1,          ENDING] PASSED.
	Child:  4.7  - F_TEST  [               2,               1] PASSED.
	Child:  4.8  - F_TEST  [               2,               2] PASSED.
	Child:  4.9  - F_TEST  [               2,          ENDING] PASSED.
	Parent: 4.10 - F_ULOCK [               1,               1] PASSED.

Test #5 - Try to lock 1st and 2nd bytes, test around them.
	Parent: 5.0  - F_TLOCK [               0,               1] PASSED.
	Parent: 5.1  - F_TLOCK [               2,               1] PASSED.
	Child:  5.2  - F_TEST  [               0,               1] PASSED.
	Child:  5.3  - F_TEST  [               0,               2] PASSED.
	Child:  5.4  - F_TEST  [               0,          ENDING] PASSED.
	Child:  5.5  - F_TEST  [               1,               1] PASSED.
	Child:  5.6  - F_TEST  [               1,               2] PASSED.
	Child:  5.7  - F_TEST  [               1,          ENDING] PASSED.
	Child:  5.8  - F_TEST  [               2,               1] PASSED.
	Child:  5.9  - F_TEST  [               2,               2] PASSED.
	Child:  5.10 - F_TEST  [               2,          ENDING] PASSED.
	Child:  5.11 - F_TEST  [               3,               1] PASSED.
	Child:  5.12 - F_TEST  [               3,               2] PASSED.
	Child:  5.13 - F_TEST  [               3,          ENDING] PASSED.
	Parent: 5.14 - F_ULOCK [               0,               1] PASSED.
	Parent: 5.15 - F_ULOCK [               2,               1] PASSED.

Test #6 - Try to lock the MAXEOF byte.
	Parent: 6.0  - F_TLOCK [7fffffffffffffff,               1] PASSED.
	Child:  6.1  - F_TEST  [7ffffffffffffffe,               1] PASSED.
	Child:  6.2  - F_TEST  [7ffffffffffffffe,               2] PASSED.
	Child:  6.3  - F_TEST  [7ffffffffffffffe,          ENDING] PASSED.
	Child:  6.4  - F_TEST  [7fffffffffffffff,               1] PASSED.
	Child:  6.5  - F_TEST  [7fffffffffffffff,               2] PASSED.
	Child:  6.6  - F_TEST  [7fffffffffffffff,          ENDING] PASSED.
	Child:  6.7  - F_TEST  [8000000000000000,          ENDING] PASSED.
	Child:  6.8  - F_TEST  [8000000000000000,               1] PASSED.
	Child:  6.9  - F_TEST  [8000000000000000,7fffffffffffffff] PASSED.
	Child:  6.10 - F_TEST  [8000000000000000,8000000000000000] PASSED.
	Parent: 6.11 - F_ULOCK [7fffffffffffffff,               1] PASSED.

Test #7 - Test parent/child mutual exclusion.
	Parent: 7.0  - F_TLOCK [             ffc,               9] PASSED.
	Parent: Wrote 'aaaa eh' to testfile [ 4092, 7 ].
	Parent: Now free child to run, should block on lock.
	Parent: Check data in file to insure child blocked.
	Parent: Read 'aaaa eh' from testfile [ 4092, 7 ].
	Parent: 7.1  - COMPARE [             ffc,               7] PASSED.
	Parent: Now unlock region so child will unblock.
	Parent: 7.2  - F_ULOCK [             ffc,               9] PASSED.
	Child:  7.3  - F_LOCK  [             ffc,               9] PASSED.
	Child:  Write child's version of the data and release lock.
	Parent: Now try to regain lock, parent should block.
	Child:  Wrote 'bebebebeb' to testfile [ 4092, 9 ].
	Child:  7.4  - F_ULOCK [             ffc,               9] PASSED.
	Parent: 7.5  - F_LOCK  [             ffc,               9] PASSED.
	Parent: Check data in file to insure child unblocked.
	Parent: Read 'bebebebeb' from testfile [ 4092, 9 ].
	Parent: 7.6  - COMPARE [             ffc,               9] PASSED.
	Parent: 7.7  - F_ULOCK [             ffc,               9] PASSED.

Test #8 - Rate test performing lock/unlock cycles.
	Parent: Performed 1000 lock/unlock cycles in 560 msecs. [214285 lpm].

Test #10 - Make sure a locked region is split properly.
	Parent: 10.0  - F_TLOCK [               0,               3] PASSED.
	Parent: 10.1  - F_ULOCK [               1,               1] PASSED.
	Child:  10.2  - F_TEST  [               0,               1] PASSED.
	Child:  10.3  - F_TEST  [               2,               1] PASSED.
	Child:  10.4  - F_TEST  [               3,          ENDING] PASSED.
	Child:  10.5  - F_TEST  [               1,               1] PASSED.
	Parent: 10.6  - F_ULOCK [               0,               1] PASSED.
	Parent: 10.7  - F_ULOCK [               2,               1] PASSED.
	Child:  10.8  - F_TEST  [               0,               3] PASSED.
	Parent: 10.9  - F_ULOCK [               0,               1] PASSED.
	Parent: 10.10 - F_TLOCK [               1,               3] PASSED.
	Parent: 10.11 - F_ULOCK [               2,               1] PASSED.
	Child:  10.12 - F_TEST  [               1,               1] PASSED.
	Child:  10.13 - F_TEST  [               3,               1] PASSED.
	Child:  10.14 - F_TEST  [               4,          ENDING] PASSED.
	Child:  10.15 - F_TEST  [               2,               1] PASSED.
	Child:  10.16 - F_TEST  [               0,               1] PASSED.

Test #11 - Make sure close() releases the process's locks.
	Parent: 11.0  - F_TLOCK [               0,          ENDING] PASSED.
	Parent: Closed testfile.
	Child:  11.1  - F_TLOCK [               0,          ENDING] PASSED.
	Child:  11.2  - F_ULOCK [               0,          ENDING] PASSED.
	Parent: 11.3  - F_TLOCK [              1d,             5b7] PASSED.
	Parent: 11.4  - F_TLOCK [            2000,              57] PASSED.
	Parent: Closed testfile.
	Child:  11.5  - F_TLOCK [               0,          ENDING] PASSED.
	Child:  11.6  - F_ULOCK [               0,          ENDING] PASSED.
	Parent: Wrote '123456789abcdef' to testfile [ 0, 16 ].
	Parent: 11.7  - F_TLOCK [               0,          ENDING] PASSED.
	Parent: Wrote '123456789abcdef' to testfile [ 13, 16 ].
	Parent: Closed testfile.
	Child:  11.8  - F_TLOCK [               0,          ENDING] PASSED.
	Child:  11.9  - F_ULOCK [               0,          ENDING] PASSED.
	Parent: Wrote '123456789abcdef' to testfile [ 0, 16 ].
	Parent: 11.10 - F_TLOCK [               0,          ENDING] PASSED.
	Parent: Truncated testfile.
	Parent: Closed testfile.
	Child:  11.11 - F_TLOCK [               0,          ENDING] PASSED.
	Child:  11.12 - F_ULOCK [               0,          ENDING] PASSED.

Test #12 - Signalled process should release locks.
	Child:  12.0  - F_TLOCK [               0,          ENDING] PASSED.
	Parent: Killed child process.
	Parent: 12.1  - F_TLOCK [               0,          ENDING] FAILED!
	Parent: **** Expected success, returned EAGAIN...
	Parent: **** Probably implementation error.

** PARENT pass 1 results: 39/39 pass, 0/0 warn, 1/1 fail (pass/total).

**  CHILD pass 1 results: 64/64 pass, 0/0 warn, 0/0 fail (pass/total).
lock tests failed

Expected results:
 This test should pass 100% of the time.

Additional info:
 Here is a link to the packet capture while the test was running.
http://rhts.lab.boston.redhat.com/testlogs/18994/73670/344426/rhel3-nfs_nfsvers=3_udp_lock.cap.bz2
Comment 1 Steve Dickson 2007-05-01 10:40:40 EDT
Have you every seen this on TCP mounts? 
Comment 2 Jeff Burke 2007-05-01 10:42:18 EDT
Nope, if i did I would have open another BZ
Comment 3 Steve Dickson 2007-05-01 11:35:20 EDT
Well putting the server under load, I am able to
get the lock tests to hang, but in test 8 (the rate
test) instead of test 7 (the blocking test). 

One other notable, is using a RHEL5 xen guest I
do see the same hang, but it appears the xen
kernel does recover.. 
Comment 4 Steve Dickson 2007-05-01 15:03:56 EDT
A couple of things... 

1) It appears this is a UDP only thing since I
am not able to reproduce the hang using TCP mounts... 

2) the ethereal trace is no longer avaiable... 
Comment 5 Jeff Burke 2007-05-01 15:54:36 EDT
Steve,
   RHTS was upgraded yesterday, The link was broken. You can find the
information here.
http://armstrong.rhts.boston.redhat.com/testlogs/18994/73670/344426/rhel3-nfs_nfsvers=3_udp_lock.cap.bz2

Thanks Jeff
Comment 6 Steve Dickson 2007-05-03 17:40:47 EDT
Ok... after further review... it appears this problem only
appears on RT kernels... Using a non-rt kernel, I was not
able to get the cthon test to hang... 
Comment 7 Tim Burke 2007-06-14 15:36:38 EDT
Steve - have you differentiated whether this issue is RT specific vs a more
recent upstream issue?  Clark does a generic kernel build which is just the
upstream stuff without the RT patches.  Does the flaw occur there?
Comment 8 Steve Dickson 2007-06-15 05:45:52 EDT
I generally use Rawhide kernels for my "generic" kernels. when I ran
using those I was not able to reproduce the problem. But its been
a while since I tried... let me set up another test to see how 
things have changed... if they have... 
Comment 9 Peter Staubach 2008-04-28 15:45:10 EDT
This issue has been seen on vanilla RHEL-5.  It was also observed upstream
and should be fixed there.

There is also a testsuite timing dependency which can lead to this
behavior.  The testsuite in RHTS has been patched.
Comment 10 Clark Williams 2008-05-27 18:11:41 EDT
I haven't seen this on recent  RHTS runs. closing for now
Comment 11 Peter Staubach 2008-05-28 16:51:36 EDT
I see this, or at least, I did before I constructed the testsuite
changes and the kernel changes to address it.
Comment 12 Clark Williams 2009-02-10 18:30:46 EST
do we still see this on recent (2.6.24.7-101.el5rt) RT kernels?
Comment 13 Jeff Burke 2009-02-11 15:36:43 EST
Clark,
  Yes we still see this with the current version 2.6.24.7-101.el5rt

Here are the links to the results:
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=6270210

http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=6278249

Thanks,
Jeff
Comment 14 Luis Claudio R. Goncalves 2009-02-25 16:05:12 EST
Isn't this bug related to BZ486645? Jon Masters added a patch to kernel -104 that may help fixing this issue (though I haven't been seeing these failures recently - at least not the ones that used to freeze the machine).
Comment 15 Jeff Burke 2009-02-25 16:40:44 EST
I do not believe4 so. This is a different type of failure. It is a race condition. The latest RHEL5.4 kernel fixed the issue.
https://bugzilla.redhat.com/show_bug.cgi?id=448929
Comment 16 Beth Uptagrafft 2009-09-22 15:40:17 EDT
Do we still see this on recent (2.6.24.7-132.el5rt) RT kernels?
Comment 17 Jeff Burke 2009-09-23 09:00:39 EDT
Looking at the latest results. I do not see the failure. But it is a timing bug.
Comment 19 Clark Williams 2012-01-09 17:07:29 EST
Haven't seen this issue on our 2.6.33  or 3.0.9 kernels

Note You need to log in before you can comment on or make changes to this bug.