Bug 236577 - [Realtime][Kernel][NFS] Connectathon NFSv3, UDP, Lock test fails against RHEL3 NFSv3 server
Summary: [Realtime][Kernel][NFS] Connectathon NFSv3, UDP, Lock test fails against RHEL...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: realtime-kernel
Version: 1.0
Hardware: All
OS: Linux
medium
medium
Target Milestone: ---
: ---
Assignee: Steve Dickson
QA Contact:
URL: http://rhts.lab.boston.redhat.com/cgi...
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2007-04-16 15:18 UTC by Jeff Burke
Modified: 2012-01-09 22:07 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-01-09 22:07:29 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Jeff Burke 2007-04-16 15:18:00 UTC
Description of problem:
While running the connectathon test suite the following test failed.
/kernel/filesystems/nfs/connectathon/rhel3-nfs/nfsvers=3_udp/lock

Version-Release number of selected component (if applicable):
 2.6.20-14.el5rt

How reproducible:
 Intermittant 10% of the time it fails.

Steps to Reproduce:
1. Run the RHTS test /kernel/filesystems/nfs/connectathon while running the RT
kernel
  
Actual results:
Start tests on path /mnt/rhel3-nfs/dhcp78-129.test [y/n]? 
sh ./runtests  -l -t /mnt/rhel3-nfs/dhcp78-129.test

Starting LOCKING tests: test directory /mnt/rhel3-nfs/dhcp78-129.test (arg: -t)

Testing native post-LFS locking

Creating parent/child synchronization pipes.

Test #1 - Test regions of an unlocked file.
	Parent: 1.1  - F_TEST  [               0,               1] PASSED.
	Parent: 1.2  - F_TEST  [               0,          ENDING] PASSED.
	Parent: 1.3  - F_TEST  [               0,7fffffffffffffff] PASSED.
	Parent: 1.4  - F_TEST  [               1,               1] PASSED.
	Parent: 1.5  - F_TEST  [               1,          ENDING] PASSED.
	Parent: 1.6  - F_TEST  [               1,7fffffffffffffff] PASSED.
	Parent: 1.7  - F_TEST  [7fffffffffffffff,               1] PASSED.
	Parent: 1.8  - F_TEST  [7fffffffffffffff,          ENDING] PASSED.
	Parent: 1.9  - F_TEST  [7fffffffffffffff,7fffffffffffffff] PASSED.

Test #2 - Try to lock the whole file.
	Parent: 2.0  - F_TLOCK [               0,          ENDING] PASSED.
	Child:  2.1  - F_TEST  [               0,               1] PASSED.
	Child:  2.2  - F_TEST  [               0,          ENDING] PASSED.
	Child:  2.3  - F_TEST  [               0,7fffffffffffffff] PASSED.
	Child:  2.4  - F_TEST  [               1,               1] PASSED.
	Child:  2.5  - F_TEST  [               1,          ENDING] PASSED.
	Child:  2.6  - F_TEST  [               1,7fffffffffffffff] PASSED.
	Child:  2.7  - F_TEST  [7fffffffffffffff,               1] PASSED.
	Child:  2.8  - F_TEST  [7fffffffffffffff,          ENDING] PASSED.
	Child:  2.9  - F_TEST  [7fffffffffffffff,7fffffffffffffff] PASSED.
	Parent: 2.10 - F_ULOCK [               0,          ENDING] PASSED.

Test #3 - Try to lock just the 1st byte.
	Parent: 3.0  - F_TLOCK [               0,               1] PASSED.
	Child:  3.1  - F_TEST  [               0,               1] PASSED.
	Child:  3.2  - F_TEST  [               0,          ENDING] PASSED.
	Child:  3.3  - F_TEST  [               1,               1] PASSED.
	Child:  3.4  - F_TEST  [               1,          ENDING] PASSED.
	Parent: 3.5  - F_ULOCK [               0,               1] PASSED.

Test #4 - Try to lock the 2nd byte, test around it.
	Parent: 4.0  - F_TLOCK [               1,               1] PASSED.
	Child:  4.1  - F_TEST  [               0,               1] PASSED.
	Child:  4.2  - F_TEST  [               0,               2] PASSED.
	Child:  4.3  - F_TEST  [               0,          ENDING] PASSED.
	Child:  4.4  - F_TEST  [               1,               1] PASSED.
	Child:  4.5  - F_TEST  [               1,               2] PASSED.
	Child:  4.6  - F_TEST  [               1,          ENDING] PASSED.
	Child:  4.7  - F_TEST  [               2,               1] PASSED.
	Child:  4.8  - F_TEST  [               2,               2] PASSED.
	Child:  4.9  - F_TEST  [               2,          ENDING] PASSED.
	Parent: 4.10 - F_ULOCK [               1,               1] PASSED.

Test #5 - Try to lock 1st and 2nd bytes, test around them.
	Parent: 5.0  - F_TLOCK [               0,               1] PASSED.
	Parent: 5.1  - F_TLOCK [               2,               1] PASSED.
	Child:  5.2  - F_TEST  [               0,               1] PASSED.
	Child:  5.3  - F_TEST  [               0,               2] PASSED.
	Child:  5.4  - F_TEST  [               0,          ENDING] PASSED.
	Child:  5.5  - F_TEST  [               1,               1] PASSED.
	Child:  5.6  - F_TEST  [               1,               2] PASSED.
	Child:  5.7  - F_TEST  [               1,          ENDING] PASSED.
	Child:  5.8  - F_TEST  [               2,               1] PASSED.
	Child:  5.9  - F_TEST  [               2,               2] PASSED.
	Child:  5.10 - F_TEST  [               2,          ENDING] PASSED.
	Child:  5.11 - F_TEST  [               3,               1] PASSED.
	Child:  5.12 - F_TEST  [               3,               2] PASSED.
	Child:  5.13 - F_TEST  [               3,          ENDING] PASSED.
	Parent: 5.14 - F_ULOCK [               0,               1] PASSED.
	Parent: 5.15 - F_ULOCK [               2,               1] PASSED.

Test #6 - Try to lock the MAXEOF byte.
	Parent: 6.0  - F_TLOCK [7fffffffffffffff,               1] PASSED.
	Child:  6.1  - F_TEST  [7ffffffffffffffe,               1] PASSED.
	Child:  6.2  - F_TEST  [7ffffffffffffffe,               2] PASSED.
	Child:  6.3  - F_TEST  [7ffffffffffffffe,          ENDING] PASSED.
	Child:  6.4  - F_TEST  [7fffffffffffffff,               1] PASSED.
	Child:  6.5  - F_TEST  [7fffffffffffffff,               2] PASSED.
	Child:  6.6  - F_TEST  [7fffffffffffffff,          ENDING] PASSED.
	Child:  6.7  - F_TEST  [8000000000000000,          ENDING] PASSED.
	Child:  6.8  - F_TEST  [8000000000000000,               1] PASSED.
	Child:  6.9  - F_TEST  [8000000000000000,7fffffffffffffff] PASSED.
	Child:  6.10 - F_TEST  [8000000000000000,8000000000000000] PASSED.
	Parent: 6.11 - F_ULOCK [7fffffffffffffff,               1] PASSED.

Test #7 - Test parent/child mutual exclusion.
	Parent: 7.0  - F_TLOCK [             ffc,               9] PASSED.
	Parent: Wrote 'aaaa eh' to testfile [ 4092, 7 ].
	Parent: Now free child to run, should block on lock.
	Parent: Check data in file to insure child blocked.
	Parent: Read 'aaaa eh' from testfile [ 4092, 7 ].
	Parent: 7.1  - COMPARE [             ffc,               7] PASSED.
	Parent: Now unlock region so child will unblock.
	Parent: 7.2  - F_ULOCK [             ffc,               9] PASSED.
	Child:  7.3  - F_LOCK  [             ffc,               9] PASSED.
	Child:  Write child's version of the data and release lock.
	Parent: Now try to regain lock, parent should block.
	Child:  Wrote 'bebebebeb' to testfile [ 4092, 9 ].
	Child:  7.4  - F_ULOCK [             ffc,               9] PASSED.
	Parent: 7.5  - F_LOCK  [             ffc,               9] PASSED.
	Parent: Check data in file to insure child unblocked.
	Parent: Read 'bebebebeb' from testfile [ 4092, 9 ].
	Parent: 7.6  - COMPARE [             ffc,               9] PASSED.
	Parent: 7.7  - F_ULOCK [             ffc,               9] PASSED.

Test #8 - Rate test performing lock/unlock cycles.
	Parent: Performed 1000 lock/unlock cycles in 560 msecs. [214285 lpm].

Test #10 - Make sure a locked region is split properly.
	Parent: 10.0  - F_TLOCK [               0,               3] PASSED.
	Parent: 10.1  - F_ULOCK [               1,               1] PASSED.
	Child:  10.2  - F_TEST  [               0,               1] PASSED.
	Child:  10.3  - F_TEST  [               2,               1] PASSED.
	Child:  10.4  - F_TEST  [               3,          ENDING] PASSED.
	Child:  10.5  - F_TEST  [               1,               1] PASSED.
	Parent: 10.6  - F_ULOCK [               0,               1] PASSED.
	Parent: 10.7  - F_ULOCK [               2,               1] PASSED.
	Child:  10.8  - F_TEST  [               0,               3] PASSED.
	Parent: 10.9  - F_ULOCK [               0,               1] PASSED.
	Parent: 10.10 - F_TLOCK [               1,               3] PASSED.
	Parent: 10.11 - F_ULOCK [               2,               1] PASSED.
	Child:  10.12 - F_TEST  [               1,               1] PASSED.
	Child:  10.13 - F_TEST  [               3,               1] PASSED.
	Child:  10.14 - F_TEST  [               4,          ENDING] PASSED.
	Child:  10.15 - F_TEST  [               2,               1] PASSED.
	Child:  10.16 - F_TEST  [               0,               1] PASSED.

Test #11 - Make sure close() releases the process's locks.
	Parent: 11.0  - F_TLOCK [               0,          ENDING] PASSED.
	Parent: Closed testfile.
	Child:  11.1  - F_TLOCK [               0,          ENDING] PASSED.
	Child:  11.2  - F_ULOCK [               0,          ENDING] PASSED.
	Parent: 11.3  - F_TLOCK [              1d,             5b7] PASSED.
	Parent: 11.4  - F_TLOCK [            2000,              57] PASSED.
	Parent: Closed testfile.
	Child:  11.5  - F_TLOCK [               0,          ENDING] PASSED.
	Child:  11.6  - F_ULOCK [               0,          ENDING] PASSED.
	Parent: Wrote '123456789abcdef' to testfile [ 0, 16 ].
	Parent: 11.7  - F_TLOCK [               0,          ENDING] PASSED.
	Parent: Wrote '123456789abcdef' to testfile [ 13, 16 ].
	Parent: Closed testfile.
	Child:  11.8  - F_TLOCK [               0,          ENDING] PASSED.
	Child:  11.9  - F_ULOCK [               0,          ENDING] PASSED.
	Parent: Wrote '123456789abcdef' to testfile [ 0, 16 ].
	Parent: 11.10 - F_TLOCK [               0,          ENDING] PASSED.
	Parent: Truncated testfile.
	Parent: Closed testfile.
	Child:  11.11 - F_TLOCK [               0,          ENDING] PASSED.
	Child:  11.12 - F_ULOCK [               0,          ENDING] PASSED.

Test #12 - Signalled process should release locks.
	Child:  12.0  - F_TLOCK [               0,          ENDING] PASSED.
	Parent: Killed child process.
	Parent: 12.1  - F_TLOCK [               0,          ENDING] FAILED!
	Parent: **** Expected success, returned EAGAIN...
	Parent: **** Probably implementation error.

** PARENT pass 1 results: 39/39 pass, 0/0 warn, 1/1 fail (pass/total).

**  CHILD pass 1 results: 64/64 pass, 0/0 warn, 0/0 fail (pass/total).
lock tests failed

Expected results:
 This test should pass 100% of the time.

Additional info:
 Here is a link to the packet capture while the test was running.
http://rhts.lab.boston.redhat.com/testlogs/18994/73670/344426/rhel3-nfs_nfsvers=3_udp_lock.cap.bz2

Comment 1 Steve Dickson 2007-05-01 14:40:40 UTC
Have you every seen this on TCP mounts? 

Comment 2 Jeff Burke 2007-05-01 14:42:18 UTC
Nope, if i did I would have open another BZ

Comment 3 Steve Dickson 2007-05-01 15:35:20 UTC
Well putting the server under load, I am able to
get the lock tests to hang, but in test 8 (the rate
test) instead of test 7 (the blocking test). 

One other notable, is using a RHEL5 xen guest I
do see the same hang, but it appears the xen
kernel does recover.. 

Comment 4 Steve Dickson 2007-05-01 19:03:56 UTC
A couple of things... 

1) It appears this is a UDP only thing since I
am not able to reproduce the hang using TCP mounts... 

2) the ethereal trace is no longer avaiable... 

Comment 5 Jeff Burke 2007-05-01 19:54:36 UTC
Steve,
   RHTS was upgraded yesterday, The link was broken. You can find the
information here.
http://armstrong.rhts.boston.redhat.com/testlogs/18994/73670/344426/rhel3-nfs_nfsvers=3_udp_lock.cap.bz2

Thanks Jeff

Comment 6 Steve Dickson 2007-05-03 21:40:47 UTC
Ok... after further review... it appears this problem only
appears on RT kernels... Using a non-rt kernel, I was not
able to get the cthon test to hang... 

Comment 7 Tim Burke 2007-06-14 19:36:38 UTC
Steve - have you differentiated whether this issue is RT specific vs a more
recent upstream issue?  Clark does a generic kernel build which is just the
upstream stuff without the RT patches.  Does the flaw occur there?

Comment 8 Steve Dickson 2007-06-15 09:45:52 UTC
I generally use Rawhide kernels for my "generic" kernels. when I ran
using those I was not able to reproduce the problem. But its been
a while since I tried... let me set up another test to see how 
things have changed... if they have... 

Comment 9 Peter Staubach 2008-04-28 19:45:10 UTC
This issue has been seen on vanilla RHEL-5.  It was also observed upstream
and should be fixed there.

There is also a testsuite timing dependency which can lead to this
behavior.  The testsuite in RHTS has been patched.

Comment 10 Clark Williams 2008-05-27 22:11:41 UTC
I haven't seen this on recent  RHTS runs. closing for now

Comment 11 Peter Staubach 2008-05-28 20:51:36 UTC
I see this, or at least, I did before I constructed the testsuite
changes and the kernel changes to address it.

Comment 12 Clark Williams 2009-02-10 23:30:46 UTC
do we still see this on recent (2.6.24.7-101.el5rt) RT kernels?

Comment 13 Jeff Burke 2009-02-11 20:36:43 UTC
Clark,
  Yes we still see this with the current version 2.6.24.7-101.el5rt

Here are the links to the results:
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=6270210

http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=6278249

Thanks,
Jeff

Comment 14 Luis Claudio R. Goncalves 2009-02-25 21:05:12 UTC
Isn't this bug related to BZ486645? Jon Masters added a patch to kernel -104 that may help fixing this issue (though I haven't been seeing these failures recently - at least not the ones that used to freeze the machine).

Comment 15 Jeff Burke 2009-02-25 21:40:44 UTC
I do not believe4 so. This is a different type of failure. It is a race condition. The latest RHEL5.4 kernel fixed the issue.
https://bugzilla.redhat.com/show_bug.cgi?id=448929

Comment 16 Beth Uptagrafft 2009-09-22 19:40:17 UTC
Do we still see this on recent (2.6.24.7-132.el5rt) RT kernels?

Comment 17 Jeff Burke 2009-09-23 13:00:39 UTC
Looking at the latest results. I do not see the failure. But it is a timing bug.

Comment 19 Clark Williams 2012-01-09 22:07:29 UTC
Haven't seen this issue on our 2.6.33  or 3.0.9 kernels


Note You need to log in before you can comment on or make changes to this bug.