Red Hat Bugzilla – Bug 440401
LTC41942 - 30 second flock() calls against files stored on a NetApp while using NFS
Last modified: 2015-03-30 22:32:23 EDT
This bug has been copied from bug #432855 and has been proposed
to be backported to 4.5 z-stream (EUS).
The bug is not fixed; it was tested with the following steps:
1) Export a directory on the NFS server; the relevant /etc/exports entry is as follows:
2) Mount the export on the client.
3) Create a file inside the mount point.
4) Copy the testlocks script and the binary compiled from check_lock.c into /tmp
(the files are from IT161907).
5) Modify the testlocks script to correct the path to the test file.
6) ./testlocks > /tmp/output.txt
The following lines appear in output.txt:
05:39:50.0850 unlock() took more than 80 ms: 655 ms
05:40:19.4932 (4730) lockf() took more than 80 ms: 30147 ms
05:40:19.5329 (4727) lockf() took more than 80 ms: 30187 ms
05:40:19.5759 (4716) lockf() took more than 80 ms: 30230 ms
05:40:19.6999 (4715) lockf() took more than 80 ms: 30354 ms
05:40:19.7465 (4722) lockf() took more than 80 ms: 30400 ms
05:40:19.7890 (4721) lockf() took more than 80 ms: 30443 ms
05:40:19.9125 (4726) lockf() took more than 80 ms: 30566 ms
05:40:19.9556 (4729) lockf() took more than 80 ms: 30609 ms
05:40:20.0850 (4728) lockf() took more than 80 ms: 30740 ms
These lines are not expected to appear.
Putting this back on ON_QA, since I think we are beyond the point where we can
take a further look: this solution was inherited from other streams (4.7 and 4.6.z).
Has this been tested on the original 4.7 test bits in bug 432855?
Yes, IBM has tested 4.4, 4.6, and 4.7.
Looking back through the comments, the test results show that the testing
reported in comment #4 was done using KVM guests. The reproducer for this
problem is highly dependent on gettimeofday() calls, and I've had very
inconsistent results from those under KVM. Is this problem still reproducible on
bare-metal machines? I've not tested this particular kernel, but I'm highly
suspicious of this reproducer being run on a KVM-based setup.
Just to clarify how IBM did their testing:
1) We only tested using bare metal.
2) We used the reproducer testcase and had no problems.
3) We used it in our test environment for over a week and saw no issues.
4) The customer tested in their environment and saw no issues.
Hope this helps.
Please let me know if you need anything else from IBM.
Ok, I think I see the problem with >= -55.0.18...
This patch was added during the backporting work:
...that's causing some of the lock comparisons to fail, which causes the
client to send NLM_DENIED on a grant callback. If you back that patch out of
the set, the problem should go away.
That patch seems to have been added as part of the backporting effort for this.
Let me know if you need other assistance...
I tested Vitaly's kernel-2.6.9-55.0.20.EL.bz440401.* kernels obtained from
(these kernels do not include the NLM patch), and they did not hit the bug.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.