Bug 434966 - [RHEL4 U7] Kernel NFS Connectathon Test#12, 12.1 Failing
Summary: [RHEL4 U7] Kernel NFS Connectathon Test#12, 12.1 Failing
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.7
Hardware: All
OS: Linux
low
medium
Target Milestone: rc
: ---
Assignee: Peter Staubach
QA Contact: Martin Jenner
URL: http://rhts.redhat.com/cgi-bin/rhts/t...
Whiteboard:
Depends On:
Blocks: 448929 461304
TreeView+ depends on / blocked
 
Reported: 2008-02-26 15:41 UTC by Jeff Burke
Modified: 2009-01-12 15:38 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-01-12 15:38:51 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Jeff Burke 2008-02-26 15:41:53 UTC
Description of problem:
Test #12 - Signalled process should release locks.
	Parent: Killed child process.
	Parent: 12.1  - F_TLOCK [       0,  ENDING] FAILED!
	Parent: **** Expected success, returned EAGAIN...
	Parent: **** Probably implementation error.

Version-Release number of selected component (if applicable):
2.6.9-68.13.EL

How reproducible:
 Intermittent 

Steps to Reproduce:
1. Install RHEL4-U6 Baseline. 
2. Install 2.6.9-68.13.EL kernel
3. Run the RTHS Connectathon test
  
Actual results:
Capturing on eth0
===== Starting 'nfsvers=2_udp' test 'lock' =====
----- start: Mon Feb 25 19:55:20 EST 2008 -----
./server -l -F nfs -onfsvers=2,udp -p /export/home rhel5-nfs
Start tests on path /mnt/rhel5-nfs/hp-xw4200-01.test [y/n]? 
sh ./runtests  -l -t /mnt/rhel5-nfs/hp-xw4200-01.test

Starting LOCKING tests: test directory /mnt/rhel5-nfs/hp-xw4200-01.test (arg: -t)

Testing native post-LFS locking

Creating parent/child synchronization pipes.

Test #1 - Test regions of an unlocked file.
	Parent: 1.1  - F_TEST  [       0,       1] PASSED.
	Parent: 1.2  - F_TEST  [       0,  ENDING] PASSED.
	Parent: 1.3  - F_TEST  [       0,7fffffff] PASSED.
	Parent: 1.4  - F_TEST  [       1,       1] PASSED.
	Parent: 1.5  - F_TEST  [       1,  ENDING] PASSED.
	Parent: 1.6  - F_TEST  [       1,7fffffff] PASSED.
	Parent: 1.7  - F_TEST  [7fffffff,       1] PASSED.
	Parent: 1.8  - F_TEST  [7fffffff,  ENDING] PASSED.
	Parent: 1.9  - F_TEST  [7fffffff,7fffffff] PASSED.

Test #2 - Try to lock the whole file.
	Parent: 2.0  - F_TLOCK [       0,  ENDING] PASSED.
	Child:  2.1  - F_TEST  [       0,       1] PASSED.
	Child:  2.2  - F_TEST  [       0,  ENDING] PASSED.
	Child:  2.3  - F_TEST  [       0,7fffffff] PASSED.
	Child:  2.4  - F_TEST  [       1,       1] PASSED.
	Child:  2.5  - F_TEST  [       1,  ENDING] PASSED.
	Child:  2.6  - F_TEST  [       1,7fffffff] PASSED.
	Child:  2.7  - F_TEST  [7fffffff,       1] PASSED.
	Child:  2.8  - F_TEST  [7fffffff,  ENDING] PASSED.
	Child:  2.9  - F_TEST  [7fffffff,7fffffff] PASSED.
	Parent: 2.10 - F_ULOCK [       0,  ENDING] PASSED.

Test #3 - Try to lock just the 1st byte.
	Parent: 3.0  - F_TLOCK [       0,       1] PASSED.
	Child:  3.1  - F_TEST  [       0,       1] PASSED.
	Child:  3.2  - F_TEST  [       0,  ENDING] PASSED.
	Child:  3.3  - F_TEST  [       1,       1] PASSED.
	Child:  3.4  - F_TEST  [       1,  ENDING] PASSED.
	Parent: 3.5  - F_ULOCK [       0,       1] PASSED.

Test #4 - Try to lock the 2nd byte, test around it.
	Parent: 4.0  - F_TLOCK [       1,       1] PASSED.
	Child:  4.1  - F_TEST  [       0,       1] PASSED.
	Child:  4.2  - F_TEST  [       0,       2] PASSED.
	Child:  4.3  - F_TEST  [       0,  ENDING] PASSED.
	Child:  4.4  - F_TEST  [       1,       1] PASSED.
	Child:  4.5  - F_TEST  [       1,       2] PASSED.
	Child:  4.6  - F_TEST  [       1,  ENDING] PASSED.
	Child:  4.7  - F_TEST  [       2,       1] PASSED.
	Child:  4.8  - F_TEST  [       2,       2] PASSED.
	Child:  4.9  - F_TEST  [       2,  ENDING] PASSED.
	Parent: 4.10 - F_ULOCK [       1,       1] PASSED.

Test #5 - Try to lock 1st and 2nd bytes, test around them.
	Parent: 5.0  - F_TLOCK [       0,       1] PASSED.
	Parent: 5.1  - F_TLOCK [       2,       1] PASSED.
	Child:  5.2  - F_TEST  [       0,       1] PASSED.
	Child:  5.3  - F_TEST  [       0,       2] PASSED.
	Child:  5.4  - F_TEST  [       0,  ENDING] PASSED.
	Child:  5.5  - F_TEST  [       1,       1] PASSED.
	Child:  5.6  - F_TEST  [       1,       2] PASSED.
	Child:  5.7  - F_TEST  [       1,  ENDING] PASSED.
	Child:  5.8  - F_TEST  [       2,       1] PASSED.
	Child:  5.9  - F_TEST  [       2,       2] PASSED.
	Child:  5.10 - F_TEST  [       2,  ENDING] PASSED.
	Child:  5.11 - F_TEST  [       3,       1] PASSED.
	Child:  5.12 - F_TEST  [       3,       2] PASSED.
	Child:  5.13 - F_TEST  [       3,  ENDING] PASSED.
	Parent: 5.14 - F_ULOCK [       0,       1] PASSED.
	Parent: 5.15 - F_ULOCK [       2,       1] PASSED.

Test #6 - Try to lock the MAXEOF byte.
	Parent: 6.0  - F_TLOCK [7fffffff,       1] PASSED.
	Child:  6.1  - F_TEST  [7ffffffe,       1] PASSED.
	Child:  6.2  - F_TEST  [7ffffffe,       2] PASSED.
	Child:  6.3  - F_TEST  [7ffffffe,  ENDING] PASSED.
	Child:  6.4  - F_TEST  [7fffffff,       1] PASSED.
	Child:  6.5  - F_TEST  [7fffffff,       2] PASSED.
	Child:  6.6  - F_TEST  [7fffffff,  ENDING] PASSED.
	Child:  6.7  - F_TEST  [80000000,  ENDING] PASSED.
	Child:  6.8  - F_TEST  [80000000,       1] PASSED.
	Child:  6.9  - F_TEST  [80000000,7fffffff] PASSED.
	Child:  6.10 - F_TEST  [80000000,80000000] WARNING!
	Child:  **** Expected EINVAL, returned EACCES...
	Parent: 6.11 - F_ULOCK [7fffffff,       1] PASSED.

Test #7 - Test parent/child mutual exclusion.
	Parent: 7.0  - F_TLOCK [     ffc,       9] PASSED.
	Parent: Wrote 'aaaa eh' to testfile [ 4092, 7 ].
	Parent: Now free child to run, should block on lock.
	Parent: Check data in file to insure child blocked.
	Parent: Read 'aaaa eh' from testfile [ 4092, 7 ].
	Parent: 7.1  - COMPARE [     ffc,       7] PASSED.
	Parent: Now unlock region so child will unblock.
	Parent: 7.2  - F_ULOCK [     ffc,       9] PASSED.
	Child:  7.3  - F_LOCK  [     ffc,       9] PASSED.
	Child:  Write child's version of the data and release lock.
	Parent: Now try to regain lock, parent should block.
	Child:  Wrote 'bebebebeb' to testfile [ 4092, 9 ].
	Child:  7.4  - F_ULOCK [     ffc,       9] PASSED.
	Parent: 7.5  - F_LOCK  [     ffc,       9] PASSED.
	Parent: Check data in file to insure child unblocked.
	Parent: Read 'bebebebeb' from testfile [ 4092, 9 ].
	Parent: 7.6  - COMPARE [     ffc,       9] PASSED.
	Parent: 7.7  - F_ULOCK [     ffc,       9] PASSED.

Test #8 - Rate test performing lock/unlock cycles.
	Parent: Performed 1000 lock/unlock cycles in 1230 msecs. [97560 lpm].

Test #10 - Make sure a locked region is split properly.
	Parent: 10.0  - F_TLOCK [       0,       3] PASSED.
	Parent: 10.1  - F_ULOCK [       1,       1] PASSED.
	Child:  10.2  - F_TEST  [       0,       1] PASSED.
	Child:  10.3  - F_TEST  [       2,       1] PASSED.
	Child:  10.4  - F_TEST  [       3,  ENDING] PASSED.
	Child:  10.5  - F_TEST  [       1,       1] PASSED.
	Parent: 10.6  - F_ULOCK [       0,       1] PASSED.
	Parent: 10.7  - F_ULOCK [       2,       1] PASSED.
	Child:  10.8  - F_TEST  [       0,       3] PASSED.
	Parent: 10.9  - F_ULOCK [       0,       1] PASSED.
	Parent: 10.10 - F_TLOCK [       1,       3] PASSED.
	Parent: 10.11 - F_ULOCK [       2,       1] PASSED.
	Child:  10.12 - F_TEST  [       1,       1] PASSED.
	Child:  10.13 - F_TEST  [       3,       1] PASSED.
	Child:  10.14 - F_TEST  [       4,  ENDING] PASSED.
	Child:  10.15 - F_TEST  [       2,       1] PASSED.
	Child:  10.16 - F_TEST  [       0,       1] PASSED.

Test #11 - Make sure close() releases the process's locks.
	Parent: 11.0  - F_TLOCK [       0,  ENDING] PASSED.
	Parent: Closed testfile.
	Child:  11.1  - F_TLOCK [       0,  ENDING] PASSED.
	Child:  11.2  - F_ULOCK [       0,  ENDING] PASSED.
	Parent: 11.3  - F_TLOCK [      1d,     5b7] PASSED.
	Parent: 11.4  - F_TLOCK [    2000,      57] PASSED.
	Parent: Closed testfile.
	Child:  11.5  - F_TLOCK [       0,  ENDING] PASSED.
	Child:  11.6  - F_ULOCK [       0,  ENDING] PASSED.
	Parent: Wrote '123456789abcdef' to testfile [ 0, 16 ].
	Parent: 11.7  - F_TLOCK [       0,  ENDING] PASSED.
	Parent: Wrote '123456789abcdef' to testfile [ 13, 16 ].
	Parent: Closed testfile.
	Child:  11.8  - F_TLOCK [       0,  ENDING] PASSED.
	Child:  11.9  - F_ULOCK [       0,  ENDING] PASSED.
	Parent: Wrote '123456789abcdef' to testfile [ 0, 16 ].
	Parent: 11.10 - F_TLOCK [       0,  ENDING] PASSED.
	Parent: Truncated testfile.
	Parent: Closed testfile.
	Child:  11.11 - F_TLOCK [       0,  ENDING] PASSED.
	Child:  11.12 - F_ULOCK [       0,  ENDING] PASSED.

Test #12 - Signalled process should release locks.
	Parent: Killed child process.
	Parent: 12.1  - F_TLOCK [       0,  ENDING] FAILED!
	Parent: **** Expected success, returned EAGAIN...
	Parent: **** Probably implementation error.

** PARENT pass 1 results: 39/39 pass, 0/0 warn, 1/1 fail (pass/total).

**  CHILD pass 1 results: 63/63 pass, 1/1 warn, 0/0 fail (pass/total).
lock tests failed
Tests failed, leaving /mnt/rhel5-nfs mounted
----- end: Mon Feb 25 19:55:37 EST 2008 -----
----- return code: 1 -----
4451 packets captured

Expected results:
 This test should pass all the time

Additional info:
 We have seen this against several static server RHEL5, Sol9

Comment 1 Jeff Burke 2008-03-01 19:44:30 UTC
Not sure if this requires a separate bug at this point in time. We are seeing
this issue with the latest RHEL5.2 kernel 2.6.9.18-84.el5

Test #12 - Signalled process should release locks.
	Child:  12.0  - F_TLOCK [       0,  ENDING] PASSED.
	Parent: Killed child process.
	Parent: 12.1  - F_TLOCK [       0,  ENDING] FAILED!
	Parent: **** Expected success, returned EAGAIN...
	Parent: **** Probably implementation error.

** PARENT pass 1 results: 39/39 pass, 0/0 warn, 1/1 fail (pass/total).

**  CHILD pass 1 results: 64/64 pass, 0/0 warn, 0/0 fail (pass/total).
lock tests failed
Tests failed, leaving /mnt/sol9-nfs mounted
----- end: Sat Mar  1 01:52:37 EST 2008 -----
----- return code: 1 -----
4407 packets captured

http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=2091722

Comment 2 Peter Staubach 2008-03-03 14:34:42 UTC
I looked at the packet capture for this last failure.  It appears to me
that sol9 failed to respond for long enough that test12, which is very
timing dependent, failed.  I believe that the backoff delays caused the
child of the child process to delay dropping its lock long enough for
the parent process to fail to acquire the lock when it tried.

Perhaps running the lock tests with a longer wait time will help.
Otherwise, we could/should look at rewriting this test to eliminate
the timing dependencies.

Comment 6 RHEL Program Management 2008-05-29 15:36:53 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 7 RHEL Program Management 2008-09-03 13:16:05 UTC
Updating PM score.

Comment 9 RHEL Program Management 2009-01-12 15:38:51 UTC
Development Management has reviewed and declined this request.  You may appeal
this decision by reopening this request.


Note You need to log in before you can comment on or make changes to this bug.