440401 – LTC41942-30 second flock() calls against files stored on a NetApp while using NFS

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	4.5.z
Hardware:	All
OS:	Linux
Priority:	urgent
Severity:	urgent
Target Milestone:	rc
Target Release:	---
Assignee:	Vitaly Mayatskikh
QA Contact:	Martin Jenner
Docs Contact:
URL:
Whiteboard:
Depends On:	432855 436129 445181 1207483
Blocks:

Reported:	2008-04-03 12:24 UTC by RHEL Program Management
Modified:	2015-03-31 02:32 UTC (History)
CC List:	10 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2008-08-26 14:52:10 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2008:0846	0	normal	SHIPPED_LIVE	kernel bug fix update	2008-08-26 14:51:56 UTC

Description RHEL Program Management 2008-04-03 12:24:15 UTC

This bug has been copied from bug #432855 and has been proposed
to be backported to 4.5 z-stream (EUS).

Comment 4 Zhang Kexin 2008-05-15 13:56:32 UTC

The bug is not fixed. Tested with the following steps:

1)  Export a directory on the NFS server. /etc/exports contains:
    /export/euc/sxfs *(rw,no_root_squash)
2)  Mount the export on the client
3)  Create a file inside the mount point
4)  Copy the testlocks script and the binary compiled from check_lock.c into /tmp
    (the files are from IT161907)
5)  Modify the testlocks script to correct the path to the test file
6)  ./testlocks > /tmp/output.txt

The following lines appear in output.txt:
05:39:50.0850 unlock() took more than 80 ms: 655 ms
05:40:19.4932 (4730) lockf() took more than 80 ms: 30147 ms
05:40:19.5329 (4727) lockf() took more than 80 ms: 30187 ms
05:40:19.5759 (4716) lockf() took more than 80 ms: 30230 ms
05:40:19.6999 (4715) lockf() took more than 80 ms: 30354 ms
05:40:19.7465 (4722) lockf() took more than 80 ms: 30400 ms
05:40:19.7890 (4721) lockf() took more than 80 ms: 30443 ms
05:40:19.9125 (4726) lockf() took more than 80 ms: 30566 ms
05:40:19.9556 (4729) lockf() took more than 80 ms: 30609 ms
05:40:20.0850 (4728) lockf() took more than 80 ms: 30740 ms

but this is not expected to appear.
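
For readers without access to the IT161907 files, the kind of measurement behind these log lines can be sketched as follows. This is not check_lock.c: the function names, the single lock/unlock cycle, and the use of a monotonic clock instead of gettimeofday() are all assumptions made for illustration. Only the 80 ms threshold and the message wording are taken from the output above.

```python
import fcntl
import os
import time

THRESHOLD_MS = 80  # threshold seen in the log lines above


def timed_ms(fn, *args):
    """Run fn(*args) and return the elapsed time in milliseconds."""
    start = time.monotonic()  # assumption: monotonic clock, not gettimeofday()
    fn(*args)
    return (time.monotonic() - start) * 1000.0


def check_lock(path, threshold_ms=THRESHOLD_MS):
    """Lock and unlock `path` once; return a warning for each slow call."""
    warnings = []
    fd = os.open(path, os.O_RDWR)
    try:
        lock_ms = timed_ms(fcntl.lockf, fd, fcntl.LOCK_EX)
        if lock_ms > threshold_ms:
            warnings.append(
                f"lockf() took more than {threshold_ms} ms: {lock_ms:.0f} ms")
        unlock_ms = timed_ms(fcntl.lockf, fd, fcntl.LOCK_UN)
        if unlock_ms > threshold_ms:
            warnings.append(
                f"unlock() took more than {threshold_ms} ms: {unlock_ms:.0f} ms")
    finally:
        os.close(fd)
    return warnings
```

Calling check_lock() on a file under the NFS mount point from several processes at once approximates the contention visible in the log, where nine different PIDs each waited about 30 seconds.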

Comment 5 Andrius Benokraitis 2008-05-15 14:12:10 UTC

Putting back on ON_QA since I think we are beyond where we can take a look at
this, since this solution was inherited from other streams (4.7 and 4.6.z).

Has this been tested on the original 4.7 test bits in bug 432855?

Comment 6 Stephanie Glass 2008-05-15 15:38:29 UTC

Yes, IBM has tested the 4.4, 4.6 and 4.7.

Comment 8 Jeff Layton 2008-06-17 18:56:25 UTC

Looking back through the comments, the test results show that the testing
reported in comment #4 was done using KVM guests. The reproducer for this
problem is highly dependent on gettimeofday() calls, and I've had very
inconsistent results from those under KVM. Is this problem still reproducible on
bare-metal machines? I've not tested this particular kernel, but I'm highly
suspicious of this reproducer being run on a KVM-based setup.
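
One way to judge whether timestamps on a given machine can be trusted before running the reproducer is to compare the wall clock against a monotonic clock over a short interval. The sketch below is only an illustration of that idea, not something used in this bug; the function name and parameters are hypothetical.

```python
import time


def max_clock_disagreement_ms(samples=50, interval_s=0.01,
                              wall=time.time, mono=time.monotonic,
                              sleep=time.sleep):
    """Return the largest per-interval difference (in ms) between the time
    elapsed on the wall clock and on the monotonic clock."""
    worst = 0.0
    prev_wall, prev_mono = wall(), mono()
    for _ in range(samples):
        sleep(interval_s)
        now_wall, now_mono = wall(), mono()
        # On a well-behaved clock both deltas are nearly identical.
        diff_ms = abs((now_wall - prev_wall) - (now_mono - prev_mono)) * 1000.0
        worst = max(worst, diff_ms)
        prev_wall, prev_mono = now_wall, now_mono
    return worst
```

A result far above the reproducer's 80 ms threshold would suggest that the reported lockf() durations on that machine say more about the clock than about NFS locking.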

Comment 9 Stephanie Glass 2008-07-11 19:03:24 UTC

Just to clarify how IBM did their testing:

1) We only tested using bare metal.
2) We used the reproducer testcase and had no problems.
3) We used it in our test environment for over a week and saw no issues.
4) The customer tested in their environment and saw no issues.

Hope this helps.

Please let me know if you need anything else from IBM.

Thanks

Comment 11 Jeff Layton 2008-07-17 17:26:17 UTC

Ok, I think I see the problem with >= -55.0.18...

This patch was added during the backporting work:

    linux-2.6.9-nlm_compare_locks-fl_owner.patch

...that's causing some of the lock comparisons to fail, which causes the
client to send NLM_DENIED on a grant callback. If you back that patch out of the
set, then the problem should go away.

That patch seems to have been added as part of the backporting effort for this.

Let me know if you need other assistance...
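
The failure mode described in this comment can be illustrated with a toy model. This is not kernel code, and the patch contents are not shown in this report: that the patch adds an owner comparison is an assumption based only on its file name. The `Lock` fields loosely mirror those of a kernel file lock, and the "granted"/"denied" strings stand in for the client's reply to the grant callback.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lock:
    pid: int
    owner: object  # local-only identity; a lock rebuilt from a callback lacks it
    start: int
    end: int
    type: str


def compare_locks(a, b, check_owner):
    """Return True if the two locks describe the same request."""
    same = (a.pid, a.start, a.end, a.type) == (b.pid, b.start, b.end, b.type)
    if check_owner:  # assumed effect of the fl_owner patch
        same = same and a.owner == b.owner
    return same


def handle_grant(blocked, granted, check_owner):
    """Client side of a grant callback: look for a matching blocked request."""
    for lock in blocked:
        if compare_locks(lock, granted, check_owner):
            return "granted"
    return "denied"


waiting = Lock(pid=4730, owner="local-owner", start=0, end=0, type="write")
from_server = Lock(pid=4730, owner=None, start=0, end=0, type="write")

without_patch = handle_grant([waiting], from_server, check_owner=False)
with_patch = handle_grant([waiting], from_server, check_owner=True)
```

In this model the callback matches the waiting request when the owner is ignored and is refused when it is compared. A refused callback leaves the waiting process blocked until it retries on its own, which would be consistent with the roughly 30-second lockf() delays in comment 4, although that link is an inference here rather than something stated in the report.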

Comment 12 Zhang Kexin 2008-07-22 00:49:50 UTC

I tested Vitaly's kernel-2.6.9-55.0.20.EL.bz440401.* kernels, obtained from
http://porkchop.devel.redhat.com/brewroot/scratch/vmayatsk/task_1399361/ 
(these kernels do not include the nlm patch), and they did not hit the bug.

Comment 16 errata-xmlrpc 2008-08-26 14:52:10 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0846.html
