Bug 440401 - LTC41942-30 second flock() calls against files stored on a NetApp while using NFS
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.5.z
Hardware: All Linux
Priority: urgent Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: Vitaly Mayatskikh
QA Contact: Martin Jenner
Keywords: ZStream
Depends On: 432855 436129 445181 1207483
Reported: 2008-04-03 08:24 EDT by RHEL Product and Program Management
Modified: 2015-03-30 22:32 EDT
Doc Type: Bug Fix
Last Closed: 2008-08-26 10:52:10 EDT
Description RHEL Product and Program Management 2008-04-03 08:24:15 EDT
This bug has been copied from bug #432855 and has been proposed
to be backported to 4.5 z-stream (EUS).
Comment 4 Zhang Kexin 2008-05-15 09:56:32 EDT
The bug is not fixed. I tested with the following steps:

1)  Export a directory on the NFS server; /etc/exports contains:
    /export/euc/sxfs *(rw,no_root_squash)
2)  Mount the export on the client.
3)  Create a file inside the mount point.
4)  Copy the testlocks script and the binary compiled from check_lock.c into /tmp
    (the files are from IT161907).
5)  Modify the testlocks script to correct the path of the test file.
6)  ./testlocks > /tmp/output.txt

The following lines appear in output.txt:
05:39:50.0850 unlock() took more than 80 ms: 655 ms
05:40:19.4932 (4730) lockf() took more than 80 ms: 30147 ms
05:40:19.5329 (4727) lockf() took more than 80 ms: 30187 ms
05:40:19.5759 (4716) lockf() took more than 80 ms: 30230 ms
05:40:19.6999 (4715) lockf() took more than 80 ms: 30354 ms
05:40:19.7465 (4722) lockf() took more than 80 ms: 30400 ms
05:40:19.7890 (4721) lockf() took more than 80 ms: 30443 ms
05:40:19.9125 (4726) lockf() took more than 80 ms: 30566 ms
05:40:19.9556 (4729) lockf() took more than 80 ms: 30609 ms
05:40:20.0850 (4728) lockf() took more than 80 ms: 30740 ms

These lines are not expected to appear.
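
Slow calls in output of this shape can be tallied with a short script. The following is a minimal Python sketch, assuming each report line has the form shown above (timestamp, optional PID in parentheses, call name, threshold, measured duration). The name parse_slow_calls and the regular expression are illustrative and are not part of the testlocks script.

```python
import re

# Matches lines such as:
#   05:40:19.4932 (4730) lockf() took more than 80 ms: 30147 ms
#   05:39:50.0850 unlock() took more than 80 ms: 655 ms
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?:\((?P<pid>\d+)\)\s+)?"
    r"(?P<call>\w+)\(\) took more than (?P<limit>\d+) ms: (?P<ms>\d+) ms$"
)


def parse_slow_calls(lines):
    """Return (time, pid or None, call, duration_ms) for each report line."""
    results = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip anything that is not a slow-call report
        pid = int(m.group("pid")) if m.group("pid") else None
        results.append((m.group("time"), pid, m.group("call"), int(m.group("ms"))))
    return results


if __name__ == "__main__":
    # The path matches step 6 above; adjust it if the output went elsewhere.
    with open("/tmp/output.txt") as fh:
        for row in parse_slow_calls(fh):
            print(row)
```

A kernel that does not hit the bug would be expected to produce an empty list here.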


Comment 5 Andrius Benokraitis 2008-05-15 10:12:10 EDT
Putting this back to ON_QA, since I think we are past the point where we can
take a look at this: the solution was inherited from other streams (4.7 and 4.6.z).

Has this been tested on the original 4.7 test bits in bug 432855?
Comment 6 Stephanie Glass 2008-05-15 11:38:29 EDT
Yes, IBM has tested the 4.4, 4.6 and 4.7.
Comment 8 Jeff Layton 2008-06-17 14:56:25 EDT
Looking back through the comments, the test results show that the testing
reported in comment #4 was done using KVM guests. The reproducer for this
problem is highly dependent on gettimeofday() calls, and I've had very
inconsistent results from those under KVM. Is this problem still reproducible on
bare-metal machines? I've not tested this particular kernel, but I'm highly
suspicious of this reproducer being run on a KVM-based setup.
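
One way to make a latency measurement independent of wall-clock adjustments is to time the call with a monotonic clock. The sketch below is not the check_lock.c reproducer, which is not shown in this report. It is a hypothetical Python equivalent that times a single blocking lockf() call with time.monotonic(). The function name timed_lockf is an assumption, and the 80 ms default only mirrors the threshold seen in comment #4. Whether a monotonic clock behaves better in a virtualized guest depends on the guest clock source, which this report does not establish.

```python
import fcntl
import os
import time


def timed_lockf(path, threshold_ms=80.0):
    """Take and release an exclusive lock on path; return (elapsed_ms, slow)."""
    fd = os.open(path, os.O_RDWR)
    try:
        start = time.monotonic()        # not affected by wall-clock changes
        fcntl.lockf(fd, fcntl.LOCK_EX)  # blocks until the lock is granted
        elapsed_ms = (time.monotonic() - start) * 1000.0
        fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
    return elapsed_ms, elapsed_ms > threshold_ms
```

Calling timed_lockf() on the test file inside the NFS mount point from several processes at once would approximate the contention that the reproducer creates.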
Comment 9 Stephanie Glass 2008-07-11 15:03:24 EDT
Just to clarify how IBM did their testing:

1) We only tested on bare metal.
2) We used the reproducer test case and had no problems.
3) We used it in our test environment for over a week and saw no issues.
4) The customer tested in their environment and saw no issues.

Hope this helps.

Please let me know if you need anything else from IBM.

Thanks
Comment 11 Jeff Layton 2008-07-17 13:26:17 EDT
Ok, I think I see the problem with >= -55.0.18...

This patch was added during the backporting work:

    linux-2.6.9-nlm_compare_locks-fl_owner.patch

...that's causing some of the lock comparisons to fail, which causes the
client to send NLM_DENIED on a grant callback. If you back that patch out of the
set, then the problem should go away.

That patch seems to have been added as part of the backporting effort for this.

Let me know if you need other assistance...
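
To illustrate how an extra field in a lock comparison can turn a grant callback into a denial, here is a simplified Python model. It is not the kernel's lock comparison code and not the patch itself: the Lock class, its fields, compare_locks() and handle_grant_callback() are all hypothetical, and the assumption that the lock in the callback carries a different owner value than the client's blocked request is made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lock:
    pid: int
    owner: int  # stand-in for an opaque lock-owner identifier (assumption)
    start: int
    end: int


def compare_locks(a, b, check_owner):
    """Return True if the two locks are treated as the same request."""
    same = (a.pid, a.start, a.end) == (b.pid, b.start, b.end)
    if check_owner:
        same = same and a.owner == b.owner
    return same


def handle_grant_callback(blocked, granted, check_owner):
    """Answer a grant callback: accept it only if a blocked request matches."""
    for request in blocked:
        if compare_locks(request, granted, check_owner):
            return "GRANTED"
    return "DENIED"  # no blocked request matched the callback


if __name__ == "__main__":
    blocked = [Lock(pid=4730, owner=1, start=0, end=10)]
    granted = Lock(pid=4730, owner=2, start=0, end=10)
    print(handle_grant_callback(blocked, granted, check_owner=False))
    print(handle_grant_callback(blocked, granted, check_owner=True))
```

In this model the same callback is accepted when the owner is ignored and rejected when it is compared, which is the kind of mismatch described above.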
Comment 12 Zhang Kexin 2008-07-21 20:49:50 EDT
I tested Vitaly's kernel-2.6.9-55.0.20.EL.bz440401.* kernels, obtained from
http://porkchop.devel.redhat.com/brewroot/scratch/vmayatsk/task_1399361/ 
(these kernels do not include the nlm patch), and they did not hit the bug.
Comment 16 errata-xmlrpc 2008-08-26 10:52:10 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0846.html
