Description of problem: Test #12 - Signalled process should release locks. Parent: Killed child process. Parent: 12.1 - F_TLOCK [ 0, ENDING] FAILED! Parent: **** Expected success, returned EAGAIN... Parent: **** Probably implementation error. Version-Release number of selected component (if applicable): 2.6.9-68.13.EL How reproducible: Intermittent Steps to Reproduce: 1. Install RHEL4-U6 Baseline. 2. Install 2.6.9-68.13.EL kernel 3. Run the RTHS Connectathon test Actual results: Capturing on eth0 ===== Starting 'nfsvers=2_udp' test 'lock' ===== ----- start: Mon Feb 25 19:55:20 EST 2008 ----- ./server -l -F nfs -onfsvers=2,udp -p /export/home rhel5-nfs Start tests on path /mnt/rhel5-nfs/hp-xw4200-01.test [y/n]? sh ./runtests -l -t /mnt/rhel5-nfs/hp-xw4200-01.test Starting LOCKING tests: test directory /mnt/rhel5-nfs/hp-xw4200-01.test (arg: -t) Testing native post-LFS locking Creating parent/child synchronization pipes. Test #1 - Test regions of an unlocked file. Parent: 1.1 - F_TEST [ 0, 1] PASSED. Parent: 1.2 - F_TEST [ 0, ENDING] PASSED. Parent: 1.3 - F_TEST [ 0,7fffffff] PASSED. Parent: 1.4 - F_TEST [ 1, 1] PASSED. Parent: 1.5 - F_TEST [ 1, ENDING] PASSED. Parent: 1.6 - F_TEST [ 1,7fffffff] PASSED. Parent: 1.7 - F_TEST [7fffffff, 1] PASSED. Parent: 1.8 - F_TEST [7fffffff, ENDING] PASSED. Parent: 1.9 - F_TEST [7fffffff,7fffffff] PASSED. Test #2 - Try to lock the whole file. Parent: 2.0 - F_TLOCK [ 0, ENDING] PASSED. Child: 2.1 - F_TEST [ 0, 1] PASSED. Child: 2.2 - F_TEST [ 0, ENDING] PASSED. Child: 2.3 - F_TEST [ 0,7fffffff] PASSED. Child: 2.4 - F_TEST [ 1, 1] PASSED. Child: 2.5 - F_TEST [ 1, ENDING] PASSED. Child: 2.6 - F_TEST [ 1,7fffffff] PASSED. Child: 2.7 - F_TEST [7fffffff, 1] PASSED. Child: 2.8 - F_TEST [7fffffff, ENDING] PASSED. Child: 2.9 - F_TEST [7fffffff,7fffffff] PASSED. Parent: 2.10 - F_ULOCK [ 0, ENDING] PASSED. Test #3 - Try to lock just the 1st byte. Parent: 3.0 - F_TLOCK [ 0, 1] PASSED. Child: 3.1 - F_TEST [ 0, 1] PASSED. Child: 3.2 - F_TEST [ 0, ENDING] PASSED. Child: 3.3 - F_TEST [ 1, 1] PASSED. Child: 3.4 - F_TEST [ 1, ENDING] PASSED. Parent: 3.5 - F_ULOCK [ 0, 1] PASSED. Test #4 - Try to lock the 2nd byte, test around it. Parent: 4.0 - F_TLOCK [ 1, 1] PASSED. Child: 4.1 - F_TEST [ 0, 1] PASSED. Child: 4.2 - F_TEST [ 0, 2] PASSED. Child: 4.3 - F_TEST [ 0, ENDING] PASSED. Child: 4.4 - F_TEST [ 1, 1] PASSED. Child: 4.5 - F_TEST [ 1, 2] PASSED. Child: 4.6 - F_TEST [ 1, ENDING] PASSED. Child: 4.7 - F_TEST [ 2, 1] PASSED. Child: 4.8 - F_TEST [ 2, 2] PASSED. Child: 4.9 - F_TEST [ 2, ENDING] PASSED. Parent: 4.10 - F_ULOCK [ 1, 1] PASSED. Test #5 - Try to lock 1st and 2nd bytes, test around them. Parent: 5.0 - F_TLOCK [ 0, 1] PASSED. Parent: 5.1 - F_TLOCK [ 2, 1] PASSED. Child: 5.2 - F_TEST [ 0, 1] PASSED. Child: 5.3 - F_TEST [ 0, 2] PASSED. Child: 5.4 - F_TEST [ 0, ENDING] PASSED. Child: 5.5 - F_TEST [ 1, 1] PASSED. Child: 5.6 - F_TEST [ 1, 2] PASSED. Child: 5.7 - F_TEST [ 1, ENDING] PASSED. Child: 5.8 - F_TEST [ 2, 1] PASSED. Child: 5.9 - F_TEST [ 2, 2] PASSED. Child: 5.10 - F_TEST [ 2, ENDING] PASSED. Child: 5.11 - F_TEST [ 3, 1] PASSED. Child: 5.12 - F_TEST [ 3, 2] PASSED. Child: 5.13 - F_TEST [ 3, ENDING] PASSED. Parent: 5.14 - F_ULOCK [ 0, 1] PASSED. Parent: 5.15 - F_ULOCK [ 2, 1] PASSED. Test #6 - Try to lock the MAXEOF byte. Parent: 6.0 - F_TLOCK [7fffffff, 1] PASSED. Child: 6.1 - F_TEST [7ffffffe, 1] PASSED. Child: 6.2 - F_TEST [7ffffffe, 2] PASSED. Child: 6.3 - F_TEST [7ffffffe, ENDING] PASSED. Child: 6.4 - F_TEST [7fffffff, 1] PASSED. Child: 6.5 - F_TEST [7fffffff, 2] PASSED. Child: 6.6 - F_TEST [7fffffff, ENDING] PASSED. Child: 6.7 - F_TEST [80000000, ENDING] PASSED. Child: 6.8 - F_TEST [80000000, 1] PASSED. Child: 6.9 - F_TEST [80000000,7fffffff] PASSED. Child: 6.10 - F_TEST [80000000,80000000] WARNING! Child: **** Expected EINVAL, returned EACCES... Parent: 6.11 - F_ULOCK [7fffffff, 1] PASSED. Test #7 - Test parent/child mutual exclusion. Parent: 7.0 - F_TLOCK [ ffc, 9] PASSED. Parent: Wrote 'aaaa eh' to testfile [ 4092, 7 ]. Parent: Now free child to run, should block on lock. Parent: Check data in file to insure child blocked. Parent: Read 'aaaa eh' from testfile [ 4092, 7 ]. Parent: 7.1 - COMPARE [ ffc, 7] PASSED. Parent: Now unlock region so child will unblock. Parent: 7.2 - F_ULOCK [ ffc, 9] PASSED. Child: 7.3 - F_LOCK [ ffc, 9] PASSED. Child: Write child's version of the data and release lock. Parent: Now try to regain lock, parent should block. Child: Wrote 'bebebebeb' to testfile [ 4092, 9 ]. Child: 7.4 - F_ULOCK [ ffc, 9] PASSED. Parent: 7.5 - F_LOCK [ ffc, 9] PASSED. Parent: Check data in file to insure child unblocked. Parent: Read 'bebebebeb' from testfile [ 4092, 9 ]. Parent: 7.6 - COMPARE [ ffc, 9] PASSED. Parent: 7.7 - F_ULOCK [ ffc, 9] PASSED. Test #8 - Rate test performing lock/unlock cycles. Parent: Performed 1000 lock/unlock cycles in 1230 msecs. [97560 lpm]. Test #10 - Make sure a locked region is split properly. Parent: 10.0 - F_TLOCK [ 0, 3] PASSED. Parent: 10.1 - F_ULOCK [ 1, 1] PASSED. Child: 10.2 - F_TEST [ 0, 1] PASSED. Child: 10.3 - F_TEST [ 2, 1] PASSED. Child: 10.4 - F_TEST [ 3, ENDING] PASSED. Child: 10.5 - F_TEST [ 1, 1] PASSED. Parent: 10.6 - F_ULOCK [ 0, 1] PASSED. Parent: 10.7 - F_ULOCK [ 2, 1] PASSED. Child: 10.8 - F_TEST [ 0, 3] PASSED. Parent: 10.9 - F_ULOCK [ 0, 1] PASSED. Parent: 10.10 - F_TLOCK [ 1, 3] PASSED. Parent: 10.11 - F_ULOCK [ 2, 1] PASSED. Child: 10.12 - F_TEST [ 1, 1] PASSED. Child: 10.13 - F_TEST [ 3, 1] PASSED. Child: 10.14 - F_TEST [ 4, ENDING] PASSED. Child: 10.15 - F_TEST [ 2, 1] PASSED. Child: 10.16 - F_TEST [ 0, 1] PASSED. Test #11 - Make sure close() releases the process's locks. Parent: 11.0 - F_TLOCK [ 0, ENDING] PASSED. Parent: Closed testfile. Child: 11.1 - F_TLOCK [ 0, ENDING] PASSED. Child: 11.2 - F_ULOCK [ 0, ENDING] PASSED. Parent: 11.3 - F_TLOCK [ 1d, 5b7] PASSED. Parent: 11.4 - F_TLOCK [ 2000, 57] PASSED. Parent: Closed testfile. Child: 11.5 - F_TLOCK [ 0, ENDING] PASSED. Child: 11.6 - F_ULOCK [ 0, ENDING] PASSED. Parent: Wrote '123456789abcdef' to testfile [ 0, 16 ]. Parent: 11.7 - F_TLOCK [ 0, ENDING] PASSED. Parent: Wrote '123456789abcdef' to testfile [ 13, 16 ]. Parent: Closed testfile. Child: 11.8 - F_TLOCK [ 0, ENDING] PASSED. Child: 11.9 - F_ULOCK [ 0, ENDING] PASSED. Parent: Wrote '123456789abcdef' to testfile [ 0, 16 ]. Parent: 11.10 - F_TLOCK [ 0, ENDING] PASSED. Parent: Truncated testfile. Parent: Closed testfile. Child: 11.11 - F_TLOCK [ 0, ENDING] PASSED. Child: 11.12 - F_ULOCK [ 0, ENDING] PASSED. Test #12 - Signalled process should release locks. Parent: Killed child process. Parent: 12.1 - F_TLOCK [ 0, ENDING] FAILED! Parent: **** Expected success, returned EAGAIN... Parent: **** Probably implementation error. ** PARENT pass 1 results: 39/39 pass, 0/0 warn, 1/1 fail (pass/total). ** CHILD pass 1 results: 63/63 pass, 1/1 warn, 0/0 fail (pass/total). lock tests failed Tests failed, leaving /mnt/rhel5-nfs mounted ----- end: Mon Feb 25 19:55:37 EST 2008 ----- ----- return code: 1 ----- 4451 packets captured Expected results: This test should pass all the time Additional info: We have seen this against several static server RHEL5, Sol9
Not sure if this requires a separate bug at this point in time. We are seeing this issue with the latest RHEL5.2 kernel 2.6.9.18-84.el5 Test #12 - Signalled process should release locks. Child: 12.0 - F_TLOCK [ 0, ENDING] PASSED. Parent: Killed child process. Parent: 12.1 - F_TLOCK [ 0, ENDING] FAILED! Parent: **** Expected success, returned EAGAIN... Parent: **** Probably implementation error. ** PARENT pass 1 results: 39/39 pass, 0/0 warn, 1/1 fail (pass/total). ** CHILD pass 1 results: 64/64 pass, 0/0 warn, 0/0 fail (pass/total). lock tests failed Tests failed, leaving /mnt/sol9-nfs mounted ----- end: Sat Mar 1 01:52:37 EST 2008 ----- ----- return code: 1 ----- 4407 packets captured http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=2091722
I looked at the packet capture for this last failure. It appears to me that sol9 failed to respond for long enough that test12, which is very timing dependent, failed. I believe that the backoff delays caused the child of the child process to delay dropping its lock long enough for the parent process to fail to acquire the lock when it tried. Perhaps running the lock tests with a longer wait time will help. Otherwise, we could/should look at rewriting this test to eliminate the timing dependencies.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Updating PM score.
Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.