Bug 167192 - NFSv3 locking misses important kernel patches
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Hardware: i386 Linux
Priority: medium, Severity: medium
Assigned To: Steve Dickson
QA Contact: Brian Brock
Duplicates: 170545
Depends On:
Blocks: 168429
Reported: 2005-08-31 10:32 EDT by Florian von Kurnatowski
Modified: 2007-11-30 17:07 EST
Fixed In Version: RHSA-2006-0132
Doc Type: Bug Fix
Last Closed: 2006-03-07 14:44:20 EST
Description Florian von Kurnatowski 2005-08-31 10:32:05 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; de-DE; rv:1.7.6) Gecko/20050321 Firefox/1.0.2

Description of problem:
Our email server application uses file-based storage and relies heavily on file locking. One of our larger customers is currently deciding on the storage architecture for their upcoming Scalix project; one option is a NetApp NAS device.

When using the application on NFSv3 storage with RHEL4 as the application server, we see large-scale performance degradation; for example, application startup time goes up from 5 seconds on local storage to 30 seconds on NFS.

The problem was narrowed down to two processes accessing the same file and trying to lock it with a system call similar to the one that produces the following strace output:

fcntl64(3, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0})

When the first process, which is holding the lock, terminates and thereby releases it, the second process hangs for anywhere between 5 and 45 seconds before it sees the release, acquires the lock, and continues.
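
A minimal reproduction sketch for the handover delay described above, assuming Python is available on the NFS client. `fcntl.lockf` with `LOCK_EX` issues a blocking whole-file write lock request, which corresponds to the F_SETLKW call in the strace output. The function name `measure_lock_handover` and the `hold_seconds` parameter are illustrative and not taken from the application.

```python
import fcntl
import os
import time


def measure_lock_handover(path, hold_seconds=1.0):
    """Return how long the waiting process was blocked on the lock, in seconds."""
    ready_r, ready_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: take the whole-file write lock, tell the parent, hold it,
        # then terminate. Process exit releases the lock.
        os.close(ready_r)
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        fcntl.lockf(fd, fcntl.LOCK_EX)
        os.write(ready_w, b"x")
        time.sleep(hold_seconds)
        os._exit(0)

    # Parent: wait until the child really holds the lock.
    os.close(ready_w)
    os.read(ready_r, 1)
    os.close(ready_r)

    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    start = time.monotonic()
    fcntl.lockf(fd, fcntl.LOCK_EX)  # blocks until the child's lock is gone
    waited = time.monotonic() - start

    fcntl.lockf(fd, fcntl.LOCK_UN)
    os.close(fd)
    os.waitpid(pid, 0)
    return waited
```

Called with a path on local storage, the returned value should be close to `hold_seconds`. If the behaviour reported here is present, the same call on the NFSv3 mount would be expected to exceed `hold_seconds` by the extra hang time.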

Using the "nolock" mount option on NFSv3 makes the situation even worse, as then there is no locking at all and the two processes access the file at the same time, resulting in data corruption.
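
To check whether locking is enforced at all on a given mount, a sketch along the following lines could be used, assuming Python on the NFS client. One process holds the write lock while a second attempts a non-blocking lock on the same file; if the second attempt succeeds, the two processes are not excluding each other. The name `lock_is_exclusive` is illustrative only.

```python
import errno
import fcntl
import os


def lock_is_exclusive(path):
    """Return True if a second process is refused the lock while the first holds it."""
    ready_r, ready_w = os.pipe()
    done_r, done_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: hold the whole-file write lock until the parent is finished.
        os.close(ready_r)
        os.close(done_w)
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        fcntl.lockf(fd, fcntl.LOCK_EX)
        os.write(ready_w, b"x")
        os.read(done_r, 1)
        os._exit(0)

    os.close(ready_w)
    os.close(done_r)
    os.read(ready_r, 1)  # the child now holds the lock
    fd = os.open(path, os.O_RDWR)
    try:
        # Non-blocking attempt: must fail if locking is actually enforced.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        exclusive = False
        fcntl.lockf(fd, fcntl.LOCK_UN)
    except OSError as exc:
        if exc.errno not in (errno.EACCES, errno.EAGAIN):
            raise
        exclusive = True
    finally:
        os.close(fd)
        os.write(done_w, b"x")  # let the child exit
        os.close(done_w)
        os.close(ready_r)
        os.waitpid(pid, 0)
    return exclusive
```

A result of `False` on a mount using "nolock" would be consistent with the simultaneous access described above; `True` would indicate that locks are still honoured between processes on the same client.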

Kernel 2.6.10 resolves the problem as it introduces a number of fixes around NFS locking. This is described in 

The patch was modified again in Kernel 2.6.11.

The fix is currently not available in RHEL4 stable and beta kernels.

We re-tested our application on Fedora Core 4 (based on 2.6.11) and the problem seems to disappear completely: without nolock, the second process detects the lock release without any delay; with nolock, the kernel still provides local locking (on the NFS client only, as opposed to on the NFS server). The latter is what we want, because only one client will access the NAS device at a time and local locking will give better performance.
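
When comparing runs with and without nolock, it helps to confirm which option a mount is actually using. The sketch below parses text in /proc/mounts format and reports, for each NFS mount, whether "nolock" appears in its option list. It assumes the kernel lists the option there; the function name and the sample mount line in the usage note are hypothetical.

```python
def nfs_mounts_with_lock_mode(mounts_text):
    """Map each NFS mount point to True if it is mounted with "nolock"."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # not a well-formed mount entry
        _device, mount_point, fstype, options = fields[:4]
        if fstype != "nfs":
            continue  # only NFS mounts are of interest here
        result[mount_point] = "nolock" in options.split(",")
    return result
```

For example, passing the contents of /proc/mounts (read with `open("/proc/mounts").read()`) returns a dictionary such as `{"/mnt/mail": True}` for a hypothetical entry `filer:/vol/mail /mnt/mail nfs rw,vers=3,nolock 0 0`.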

So... ;-) it would be good to include this fix in a RHEL4 kernel update somehow.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
see description.

Actual Results:  see description.

Expected Results:  see description.

Additional info:
Comment 1 Florian von Kurnatowski 2005-08-31 11:11:54 EDT
In addition, while testing on FC4, we're seeing a throughput of
15-16 MB/s to the NetApp box as opposed to 5-6 MB/s on RHEL4, with all other
parameters unchanged.
Comment 2 Florian von Kurnatowski 2005-09-01 03:42:31 EDT
When testing with FC4 and kernel 2.6.11-1, we still saw lock-related lockups
of processes under load. We weren't able to gather exact data, but our working
assumption is that the NFS locking code in 2.6.11 is not fully stable either.
Comment 3 John Haxby 2005-09-01 08:45:47 EDT
A preliminary test suggests that this is a regression from RHEL3.  The "nolock"
option seems to do what Florian requires -- it simply causes a fall-back to
local-locking instead of NFS locking.

Although the man page sort of implies that "nolock" completely disables locking,
that would make it a completely useless option: it would convert a slow locking
mechanism for an exclusively mounted NFS directory (as from a NAS) into
something that is unusable! I've used "nolock" with RHEL3 to greatly speed up
applications that do a lot of file locking -- and, of course, made sure that no
one else is using the NFS-exported file system.

There's still time to get this into RHEL4 U2, isn't there? It's quite important
for people who want to use NetApp servers and the like.
Comment 4 Steve Dickson 2005-09-01 09:35:33 EDT
It's probably too late to get it into U2, but I will try to get it
in as early as possible for U3 and, if need be, you can request
a Hot Fix kernel.
Comment 5 Steve Dickson 2005-10-20 10:24:42 EDT
*** Bug 170545 has been marked as a duplicate of this bug. ***
Comment 11 Red Hat Bugzilla 2006-03-07 14:44:20 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

