Bug 184223

Summary: File locking via fcntl() broken for NFS mounts in RHEL4u2
Product: Red Hat Enterprise Linux 4 Reporter: Steve Dickson <steved>
Component: kernel Assignee: Steve Dickson <steved>
Status: CLOSED WONTFIX QA Contact: Brian Brock <bbrock>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.0 CC: gug, jbaron, maarten, mattdm
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-06-20 16:03:25 UTC Type: ---

Description Steve Dickson 2006-03-07 14:52:40 UTC
Description of problem:

Cooperative (advisory) file locks taken with the fcntl() system call
do not seem to work on NFS-mounted filesystems in RHEL4u2.  The
Python script appended to this message demonstrates the
problem.

The script forks off 70 child processes.  Each child attempts
to obtain a lock on a file.  When the lock is granted the child
simply writes a message to the file, releases the lock and exits.
On a working system, this all occurs very quickly.  On RHEL4,
child processes are left hanging and locks are not granted
for a long time.  It takes between 4 and 20 minutes for all of the
child processes to access the file and exit.

How reproducible:
Use the following script:

#!/usr/bin/python

import os, sys, time, fcntl

FNAME = "test.txt"

NFORKS = 70

def dochild():
    fp = open(FNAME, 'a+b')
    fcntl.lockf(fp.fileno(), fcntl.LOCK_EX)
    fp.write("Hello from: " + str(os.getpid()) + '\n')
    fcntl.lockf(fp.fileno(), fcntl.LOCK_UN)
    fp.close()

for x in xrange(NFORKS):
    pid = os.fork()
    if pid == 0: # Child
        dochild()
        sys.exit(0)



Steps to Reproduce:
1. Mount an NFS filesystem
2. Run the Python script from a directory on that mount
  
Actual results:
Takes several minutes to complete

Expected results:
Should take seconds to complete
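
To put a number on "minutes" versus "seconds", a timed variant of the
reproducer can be sketched as follows.  This is not the script that was
originally run: it assumes Python 3, the helper name run() is made up for
illustration, the parent waits for every child so the total time can be
reported, and the write is flushed before the lock is released so that it
actually lands while the lock is held.

```python
import fcntl
import os
import time


def dochild(fname):
    # Same steps as the child in the original script: lock, append, unlock.
    with open(fname, "ab") as fp:
        fcntl.lockf(fp.fileno(), fcntl.LOCK_EX)
        fp.write(("Hello from: %d\n" % os.getpid()).encode())
        fp.flush()  # push the data out while the lock is still held
        fcntl.lockf(fp.fileno(), fcntl.LOCK_UN)


def run(fname, nforks):
    """Fork nforks children and return the seconds until all have exited."""
    start = time.time()
    pids = []
    for _ in range(nforks):
        pid = os.fork()
        if pid == 0:  # Child
            status = 1
            try:
                dochild(fname)
                status = 0
            finally:
                os._exit(status)  # never fall back into the parent's code
        pids.append(pid)
    for pid in pids:
        os.waitpid(pid, 0)  # blocks until that child has exited
    return time.time() - start
```

For example, calling print(run("test.txt", 70)) from a directory on the NFS
mount prints the elapsed time in seconds for the same 70-child workload.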

Comment 1 Matthew Miller 2008-05-01 18:01:10 UTC
So, this works fine in RHEL 5 but still seems to be a problem on 4. Will a fix
be forthcoming? Thanks.

Comment 2 Maarten Broekman 2008-07-22 15:55:13 UTC
I'm not sure if I'm seeing a similar problem on my RHEL4.6 system.

Kernel:
2.6.9-67.0.4.ELhugemem #1 SMP Fri Jan 18 05:11:24 EST 2008 i686 athlon i386 GNU/Linux

strace:
clone(Process 23784 attached
child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xf6fda708) = 23784
[pid 23783] close(1)                    = 0
[pid 23783] close(2)                    = 0
[pid 23783] time(NULL)                  = 1216299609
[pid 23783] wait4(-1, Process 23783 suspended
 <unfinished ...>
[pid 23784] open("/opt/applogs/rad/ac_data/log/ac.lock", O_WRONLY) = 3
[pid 23784] fcntl64(3, F_SETLKW, {type=F_WRLCK, whence=SEEK_CUR, start=0, len=0}

The process hangs at this point.
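
The strace shows a blocking request (F_SETLKW) that never returns.  One way
to gather more detail in a case like this, sketched here as an assumption
rather than something that was tried in this report, is to ask for the same
exclusive lock without blocking and poll with a timeout, so the process can
report a failure instead of hanging.  The helper name try_lock() and its
timeout and interval parameters are hypothetical; the code assumes Python 3.

```python
import errno
import fcntl
import os
import time


def try_lock(path, timeout=10.0, interval=0.5):
    """Return a locked file descriptor, or None if no lock within timeout."""
    fd = os.open(path, os.O_WRONLY)  # same open mode as in the strace
    deadline = time.time() + timeout
    while True:
        try:
            # Non-blocking exclusive lock: fails at once if it is held.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return fd
        except OSError as e:
            # EACCES/EAGAIN mean "held by someone else"; anything else is
            # a real error and is passed on to the caller.
            if e.errno not in (errno.EACCES, errno.EAGAIN):
                os.close(fd)
                raise
        if time.time() >= deadline:
            os.close(fd)
            return None
        time.sleep(interval)
```

A None return after the timeout means the lock was still reported as held on
every attempt.  The caller owns the returned descriptor and releases the lock
by closing it.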

Comment 3 Jiri Pallich 2012-06-20 16:03:25 UTC
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release you asked us to review has now reached End of Life.
Please see https://access.redhat.com/support/policy/updates/errata/

If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.