Description of problem:
Remember, this is all on a single node. With a READ FLOCK held on a file, the first request for another READ FLOCK on that file succeeds, but the second request hangs or returns an error.

Version-Release number of selected component (if applicable):

How reproducible:
All the time

Steps to Reproduce:
1. Process1 opens a GFS file foo, acquires a READ FLOCK on it, and goes to sleep.
2. Process2 opens foo, acquires a READ FLOCK, UNFLOCKs, and closes. No errors.
3. Process3 opens foo and tries to acquire a READ FLOCK on it.

Actual results:
Process3 blocks, or returns a "resource not available" error (depending on the blocking flag used with flock()).

Expected results:
READ FLOCKs should all be compatible with each other, with no blocking or errors. Process3 should behave exactly the same way as Process2.

Additional info:
Attached is a test program that can be used to simulate this scenario.
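The reproduction steps above can be sketched in a single process, since separate open() calls create separate open file descriptions and flock() locking is per-description. This is a minimal sketch, not the attached test program; the path is a placeholder, and on the real cluster it would be a file on the GFS mount.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Returns 0 if all three READ FLOCKs are granted (the expected
   behavior), or the errno from the failing step-3 request
   (EWOULDBLOCK on the buggy GFS). */
int repro_read_flocks(const char *path)
{
    /* Step 1: "Process1" takes a READ FLOCK and keeps holding it. */
    int fd1 = open(path, O_CREAT | O_RDWR, 0644);
    if (fd1 < 0)
        return errno;
    if (flock(fd1, LOCK_SH) != 0) { int e = errno; close(fd1); return e; }

    /* Step 2: "Process2" takes a READ FLOCK, unlocks, and closes. */
    int fd2 = open(path, O_RDWR);
    if (fd2 < 0) { int e = errno; close(fd1); return e; }
    if (flock(fd2, LOCK_SH) != 0) {
        int e = errno; close(fd2); close(fd1); return e;
    }
    flock(fd2, LOCK_UN);
    close(fd2);

    /* Step 3: "Process3" requests a non-blocking READ FLOCK.
       Shared locks are compatible, so this should always succeed. */
    int fd3 = open(path, O_RDWR);
    if (fd3 < 0) { int e = errno; close(fd1); return e; }
    int rc = 0;
    if (flock(fd3, LOCK_SH | LOCK_NB) != 0)
        rc = errno;

    close(fd3);
    flock(fd1, LOCK_UN);
    close(fd1);
    unlink(path);
    return rc;
}
```

On a correctly behaving filesystem, `repro_read_flocks()` returns 0; on the buggy GFS, step 3 fails with EWOULDBLOCK (or blocks, if LOCK_NB is dropped).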
Created attachment 128812 [details] test-program to simulate bug
At least part of the problem is that the GL_NOCACHE flag used on flock glocks assumes that there's only a single glock holder, so when a NOCACHE holder is dequeued the glock is unlocked without any thought that other holders may still exist.
Created attachment 129069 [details] Patch to potentially fix this bz

This patch ensures that a GL_NOCACHE glock is removed from the cache only when gfs_glock_dq is called on its last holder. I haven't seen any ill effects from this patch, but I will feel more comfortable once it has been through a round of QA.
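The idea behind the fix can be modeled with a toy glock structure. This is an illustrative sketch, not the actual gfs source: the struct, field names, and functions here are invented for the demonstration, and real glocks carry far more state.

```c
#include <stdbool.h>

/* Toy model of a glock: a holder count plus a "cached" flag. */
struct toy_glock {
    int  holders;
    bool cached;
};

/* Enqueue a holder: the glock is acquired and cached. */
void toy_glock_nq(struct toy_glock *gl)
{
    gl->holders++;
    gl->cached = true;
}

/* Pre-patch behavior: dequeuing any GL_NOCACHE holder drops the
   glock immediately, even though other holders may still exist. */
void toy_glock_dq_buggy(struct toy_glock *gl)
{
    gl->holders--;
    gl->cached = false;   /* bug: remaining holders lose the lock */
}

/* Patched behavior: the glock is dropped from the cache only when
   gfs_glock_dq is called on the last holder. */
void toy_glock_dq_fixed(struct toy_glock *gl)
{
    gl->holders--;
    if (gl->holders == 0)
        gl->cached = false;
}
```

With two shared holders, the buggy dequeue releases the glock on the first dq, which is what made Process3's READ FLOCK request above fail; the fixed version waits for the last holder.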
Committed above patch into RHEL4, HEAD and STABLE branches.
A little explanation of FLOCKs, GL_NOCACHE, etc.

1. Why do flock glocks need the GL_NOCACHE flag?
If FLOCK glocks are cached on one node after use, another node requesting a conflicting FLOCK with the LOCK_NB flag will be denied, even though the first node has already used and released the FLOCK and should no longer conflict with the second node's request. The GL_NOCACHE flag ensures the glock is not cached.

2. In RHEL3 there was no GL_NOCACHE flag. How did flocks work then?
Without the GL_NOCACHE flag, release of the glock depends on a timeout value associated with FLOCK glocks. This timeout mechanism (flock_demote_ok()) is not implemented, so the glock gets released immediately. But there is a correctness issue here: the release of the glock does not happen synchronously, so the problem in 1. can still occur if the second node requests the flock within the small window between the release of the flock and the release of the glock. The solution is a correct implementation of GL_NOCACHE, which this patch attempts to accomplish.
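The LOCK_NB denial described in 1. is ordinary flock() semantics: a non-blocking request against a conflicting lock fails with EWOULDBLOCK instead of waiting. A minimal local demonstration (the path is a placeholder; within one process, two open() calls give two open file descriptions, so the locks genuinely conflict):

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Returns the errno from a non-blocking exclusive flock attempt on
   'path' while another open file description holds a shared lock,
   or 0 if the lock was (unexpectedly) granted. */
int conflicting_nb_flock(const char *path)
{
    /* The "first node": holds a shared lock. */
    int holder = open(path, O_CREAT | O_RDWR, 0644);
    if (holder < 0)
        return errno;
    if (flock(holder, LOCK_SH) != 0) {
        int e = errno; close(holder); return e;
    }

    /* The "second node": a conflicting non-blocking request. */
    int requester = open(path, O_RDWR);
    if (requester < 0) {
        int e = errno; close(holder); return e;
    }
    int rc = 0;
    if (flock(requester, LOCK_EX | LOCK_NB) != 0)
        rc = errno;   /* expect EWOULDBLOCK while the lock is held */

    close(requester);
    flock(holder, LOCK_UN);
    close(holder);
    unlink(path);
    return rc;
}
```

Once the holder has unlocked, the same request would succeed; the bug was that a cached (or not-yet-released) glock made it look as if the lock were still held.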
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2006-0561.html
Just stumbled upon this bug myself using RHEL4U3. The symptoms I saw were that traffic on the heartbeat (DLM) network was high and performance was poorer on nodes that were not the first to mount the filesystem. The first mounter obtained journal locks and then dequeued them while they still had holders. From that moment on, the other nodes had to do network DLM transactions to get the locks and could never cache them locally. This fix solved the performance problem.