Description of problem:
Customer is doing some testing of flocks on GFS. When requesting a flock on a file in GFS opened in rw or ro mode, the call eventually fails with "Resource temporarily unavailable", which is error 11 (EWOULDBLOCK). It appears that a flock cannot be obtained on the file. The flock is requested with this line:

  int ret = flock(g_filehandle, LOCK_SH | LOCK_NB);

If LOCK_NB is not used, no errors are produced. The script that was originally submitted is attached to the bz. See the additional notes for other details. This does not occur on GFS2. This was tested on the latest release from rhn.redhat.com and still fails with errors.

Version-Release number of selected component (if applicable):
kmod-gfs-0.1.23-5.el5 (x86_64) (tested on the latest from rhn.redhat.com as well and it fails)

How reproducible:
Every time

Steps to Reproduce:
1. Set up a GFS filesystem and mount it at /<mountpoint>
2. Download the reproducer script (gfs_trylock_test.cc) to the mount point
3. cd /<mountpoint>
4. Compile the script (see the top of the script for the compile command)
5. Run the resulting binary

Actual results:
Errors occur trying to acquire a flock on a file. "Resource temporarily unavailable" is returned, which is error code 11 (EWOULDBLOCK).

The script's custom error message:
  "flock: tryrdlock failed: handle=3 \
   error=Resource temporarily unavailable errno=11 threadId=4112
   tryacquire_read failed. loopN=7 threadId=4112"

Expected results:
The script should complete with no errors.

Additional info:
I have tested this script with a couple of modifications (which are not included): I removed the file creation, since creating the file would open it in "rw" mode and thus give exclusive access across the forked processes, and changed it to open the fd on a read-only file. The error still occurred in this mode as well.
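For reference, here is a minimal multi-process sketch in the spirit of the attached reproducer (the file path, process count, and loop count are illustrative, not taken from the script):

// Sketch only: forked processes doing non-blocking shared flock/unflock
// loops on one pre-existing file on the GFS mount.
#include <sys/types.h>
#include <sys/file.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>
#include <cerrno>

static int trylock_loop(const char *path, int loops)
{
    int fd = open(path, O_RDONLY);          // read-only, as in the modified test
    if (fd < 0) { perror("open"); return 1; }

    for (int i = 0; i < loops; ++i) {
        if (flock(fd, LOCK_SH | LOCK_NB) != 0) {   // the call that fails on GFS1
            fprintf(stderr, "pid=%d loop=%d: flock: %s (errno=%d)\n",
                    getpid(), i, strerror(errno), errno);
            close(fd);
            return 1;
        }
        flock(fd, LOCK_UN);                        // release before the next try
    }
    close(fd);
    return 0;
}

int main()
{
    const char *path = "lockfile";   // pre-existing file on the GFS mount
    int rc = 0;

    for (int p = 0; p < 4; ++p) {
        pid_t pid = fork();
        if (pid < 0) { perror("fork"); return 1; }
        if (pid == 0)
            _exit(trylock_loop(path, 1000));
    }
    for (int p = 0; p < 4; ++p) {
        int status = 0;
        wait(&status);
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            rc = 1;
    }
    return rc;
}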
Created attachment 356316 [details] Reproducer script
I've been able to recreate this problem on GFS, and I've verified that it does not recreate on GFS2. My belief at this time is that this was fixed in GFS2 by a patch from Abhi that allowed glocks for flocks to be shared. Unfortunately, that GFS2 code has changed a great deal since then, so sorting it all out is a problem. I don't have a fix yet, but it should be fixable. The problem also does not occur if the filesystem is mounted with -o localflocks, but that's not normally sane in a clustered environment.
See also bz #421321
Can you elaborate on the "massive changes"? The code looks pretty similar to me between gfs1 and gfs2. Am I looking at the wrong thing?
Created attachment 383022 [details]
Attempt at a workaround patch

I was able to reduce the parameters in the test script such that I could reproduce this bug with only a single process and two iterations of flock/unflock. It doesn't look like it is the same problem that was fixed in GFS2, where a process queues multiple flocks through multiple descriptors at the same time.

I've observed that this is a race between an unflock (LOCK_UN) and a subsequent flock. The unflock does a dq_uninit on the corresponding glock. When a flock request comes in before the glock can be uninitialized, it fails with -EAGAIN (when the request is non-blocking). This gives the impression that some other process is holding the flock, whereas it's the previous unflock by the same process that's preventing the flock from succeeding. As soon as unflock returns, the user should immediately be able to flock again. When the flock is blocking, it correctly waits for the glock uninitialization from the previous unflock and goes on to process the flock.

For a non-blocking flock request, this patch checks for the condition where a previous unflock-related glock uninitialization may be pending and, if so, disregards the TRY flag. This patch seems to work correctly: the script completes without any errors, and the QA locksmith test also succeeds.
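For anyone wanting to see the race window in userspace terms, this is a sketch of the reduced two-iteration sequence (the reduced script itself is not attached; the file path is illustrative), with comments mapping each call to the kernel-side behaviour described above:

#include <sys/file.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>
#include <cerrno>

int main()
{
    int fd = open("lockfile", O_RDONLY);    // existing file on the GFS mount
    if (fd < 0) { perror("open"); return 1; }

    // Iteration 1: take and drop the shared flock.
    if (flock(fd, LOCK_SH | LOCK_NB) != 0)
        perror("first flock");
    flock(fd, LOCK_UN);   // kernel side: dq_uninit on the glock may still be pending

    // Iteration 2: the very next non-blocking flock can arrive while the glock
    // from the unflock above is still being torn down, and fails with EAGAIN
    // even though no other process holds the flock.
    if (flock(fd, LOCK_SH | LOCK_NB) != 0)
        fprintf(stderr, "second flock: %s (errno=%d)\n", strerror(errno), errno);
    else
        flock(fd, LOCK_UN);

    close(fd);
    return 0;
}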
An advisory has been issued which should address the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0291.html