This bug has been copied from bug #432855 and has been proposed to be backported to 4.5 z-stream (EUS).
The bug is not fixed. I tested it with the following steps:
1) Export a directory on the NFS server; /etc/exports contains:
    /export/euc/sxfs *(rw,no_root_squash)
2) Mount the export on the client.
3) Create a file inside the mount point.
4) Copy the testlocks script and the binary compiled from check_lock.c into /tmp (both files are from IT161907).
5) Modify the testlocks script to use the correct path for the test file.
6) Run ./testlocks > /tmp/output.txt
The following lines appear in output.txt:
    05:39:50.0850 unlock() took more than 80 ms: 655 ms
    05:40:19.4932 (4730) lockf() took more than 80 ms: 30147 ms
    05:40:19.5329 (4727) lockf() took more than 80 ms: 30187 ms
    05:40:19.5759 (4716) lockf() took more than 80 ms: 30230 ms
    05:40:19.6999 (4715) lockf() took more than 80 ms: 30354 ms
    05:40:19.7465 (4722) lockf() took more than 80 ms: 30400 ms
    05:40:19.7890 (4721) lockf() took more than 80 ms: 30443 ms
    05:40:19.9125 (4726) lockf() took more than 80 ms: 30566 ms
    05:40:19.9556 (4729) lockf() took more than 80 ms: 30609 ms
    05:40:20.0850 (4728) lockf() took more than 80 ms: 30740 ms
These messages are not expected to appear.
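check_lock.c from IT161907 is not reproduced in this bug. The following is therefore only a hypothetical Python sketch of the kind of measurement the testlocks run appears to make: time each lock and unlock call and report any that exceed 80 ms. The names timed_lockf and lock_cycle are made up for illustration, and the use of time.monotonic instead of gettimeofday() is an assumption of this sketch, not a description of the original tool.

```python
import fcntl
import os
import time

# Hypothetical threshold, matching the "took more than 80 ms" messages above.
THRESHOLD_MS = 80


def timed_lockf(fd, cmd, label, clock=time.monotonic, threshold_ms=THRESHOLD_MS):
    """Run fcntl.lockf(fd, cmd) and time it.

    Returns (elapsed_ms, warning), where warning is a message string if the
    call took longer than threshold_ms and None otherwise.
    """
    start = clock()
    fcntl.lockf(fd, cmd)  # LOCK_EX blocks until the whole-file lock is granted
    elapsed_ms = (clock() - start) * 1000.0
    warning = None
    if elapsed_ms > threshold_ms:
        warning = (f"({os.getpid()}) {label}() took more than "
                   f"{threshold_ms} ms: {elapsed_ms:.0f} ms")
    return elapsed_ms, warning


def lock_cycle(path, iterations=100):
    """Lock and unlock the whole file repeatedly, collecting slow-call warnings."""
    warnings = []
    # LOCK_EX requires the file to be open for writing.
    with open(path, "r+b") as f:
        fd = f.fileno()
        for _ in range(iterations):
            for cmd, label in ((fcntl.LOCK_EX, "lockf"), (fcntl.LOCK_UN, "unlock")):
                _, warning = timed_lockf(fd, cmd, label)
                if warning is not None:
                    warnings.append(warning)
    return warnings
```

Because POSIX record locks are held per process, lock_cycle only sees contention when several separate processes run it against the same file on the NFS mount at once, which is what the distinct PIDs in the output suggest testlocks does.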
Putting this back to ON_QA, since I think this is beyond what we can look into here, given that the solution was inherited from other streams (4.7 and 4.6.z). Has this been tested on the original 4.7 test bits in bug 432855?
Yes, IBM has tested 4.4, 4.6, and 4.7.
Looking back through the comments, the test results show that the testing reported in comment #4 was done using KVM guests. The reproducer for this problem depends heavily on gettimeofday() calls, and I've had very inconsistent results from those under KVM. Is this problem still reproducible on bare-metal machines? I haven't tested this particular kernel, but I'm highly suspicious of this reproducer being run on a KVM-based setup.
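One way to check this suspicion on a given machine, before trusting the reproducer there, is to look at the clock directly. The following is a hypothetical Python sketch, not part of the IT161907 tools: it reads time.time(), which returns the same wall clock that gettimeofday() reports, back to back and counts backward steps and large gaps. A backward step, or frequent gaps above 80 ms while nothing else is happening, would suggest that "took more than 80 ms" messages from that host may be clock artifacts rather than real lock latency. The function name and the 80 ms gap threshold are assumptions for illustration, and occasional gaps can also come from ordinary scheduling delays.

```python
import time


def clock_anomalies(clock=time.time, samples=100_000, gap_ms=80.0):
    """Sample `clock` back to back and count suspicious steps.

    Returns (backward_steps, large_gaps, max_step_ms). On a healthy host,
    consecutive reads should never go backwards and should rarely be more
    than gap_ms apart.
    """
    backward = 0
    gaps = 0
    max_step_ms = 0.0
    prev = clock()
    for _ in range(samples - 1):
        now = clock()
        step_ms = (now - prev) * 1000.0
        if step_ms < 0:
            backward += 1      # wall clock jumped backwards
        elif step_ms > gap_ms:
            gaps += 1          # suspiciously long pause between two reads
        max_step_ms = max(max_step_ms, step_ms)
        prev = now
    return backward, gaps, max_step_ms
```

Running it on both a KVM guest and a bare-metal host and comparing the counts would give a rough indication of whether the guest's clock is trustworthy for this kind of measurement.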
Just to clarify how IBM did their testing:
1) We only tested on bare metal.
2) We used the reproducer test case and had no problems.
3) We ran it in our test environment for over a week and saw no issues.
4) The customer tested in their environment and saw no issues.
Hope this helps. Please let me know if you need anything else from IBM. Thanks
OK, I think I see the problem with >= -55.0.18... This patch seems to have been added as part of the backporting effort for this bug:
linux-2.6.9-nlm_compare_locks-fl_owner.patch
...and it is causing some of the lock comparisons to fail, which makes the client send NLM_DENIED in reply to a grant callback. If you back that patch out of the set, the problem should go away. Let me know if you need other assistance...
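The contents of linux-2.6.9-nlm_compare_locks-fl_owner.patch are not shown in this bug, so the following is a simplified, hypothetical Python model rather than kernel code. It assumes that the patch adds an owner equality check to the lock comparison, and that the lock the client decodes from the server's grant callback does not carry the client's local owner handle. The field list and the "GRANTED"/"DENIED" labels are also assumptions of the model. Under those assumptions, the extra check makes an otherwise identical lock compare unequal, so no waiting lock matches and the client refuses the grant.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Lock:
    """Simplified stand-in for the fields a lock comparison might look at."""
    pid: int
    start: int
    end: int
    kind: str                       # "R", "W" or "UN" (hypothetical encoding)
    owner: Optional[object] = None  # local owner handle, None if unknown


def compare_locks(a: Lock, b: Lock, check_owner: bool) -> bool:
    """Return True if lock b matches lock a under this model."""
    same = (a.pid == b.pid
            and a.start == b.start
            and a.end == b.end
            and (a.kind == b.kind or b.kind == "UN"))
    if check_owner:
        # Assumed effect of the fl_owner patch: owners must also match.
        same = same and a.owner == b.owner
    return same


def grant_reply(waiting, granted, check_owner):
    """Reply a client would give to a grant callback in this model."""
    for block in waiting:
        if compare_locks(block, granted, check_owner):
            return "GRANTED"
    return "DENIED"
```

In this model, check_owner=False corresponds to a kernel without the patch, which is consistent with the later report that the test kernels lacking it did not hit the bug. How long a refused blocked lockf() then waits depends on the client's retry behaviour, which the model does not cover.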
I tested Vitaly's kernel-2.6.9-55.0.20.EL.bz440401.* kernels, obtained from http://porkchop.devel.redhat.com/brewroot/scratch/vmayatsk/task_1399361/ (these kernels do not include the nlm patch), and they did not hit the bug.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0846.html