Bug 191222
Summary: | read flock broken on single-node
---|---
Product: | [Retired] Red Hat Cluster Suite
Component: | gfs
Status: | CLOSED ERRATA
Severity: | medium
Priority: | medium
Version: | 4
Hardware: | All
OS: | Linux
Reporter: | Abhijith Das <adas>
Assignee: | Abhijith Das <adas>
QA Contact: | GFS Bugs <gfs-bugs>
CC: | cfeist, nobody+wcheng, swhiteho, teigland
Target Milestone: | ---
Target Release: | ---
Fixed In Version: | RHBA-2006-0561
Doc Type: | Bug Fix
Last Closed: | 2006-08-10 21:35:28 UTC
Description (Abhijith Das, 2006-05-09 20:41:12 UTC)

Created attachment 128812 [details]
test-program to simulate bug

At least part of the problem is that the GL_NOCACHE flag used on flock glocks assumes that there's only a single glock holder, so when a NOCACHE holder is dequeued, the glock is unlocked without any thought that other holders may still exist.
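The attachment itself is not reproduced in this report; the following is a minimal, hypothetical sketch of the multi-holder scenario just described (two shared flock holders on the same node, one of which is then dequeued). The mount path, file name, and final check are assumptions, not the contents of attachment 128812.

```c
/*
 * Hypothetical reproduction sketch (NOT attachment 128812): take two
 * shared flock holders on the same GFS file from one node, drop one of
 * them, and check that a conflicting exclusive request is still refused
 * while the other shared holder remains.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : "/mnt/gfs/flocktest"; /* assumed path */
    int fd1 = open(path, O_RDWR | O_CREAT, 0644);
    int fd2 = open(path, O_RDWR);
    int fd3 = open(path, O_RDWR);

    if (fd1 < 0 || fd2 < 0 || fd3 < 0) {
        perror("open");
        return 1;
    }

    /* Two independent shared holders (separate open file descriptions). */
    if (flock(fd1, LOCK_SH) || flock(fd2, LOCK_SH)) {
        perror("flock(LOCK_SH)");
        return 1;
    }

    /* Dequeue one holder; fd1 still holds its shared lock. */
    flock(fd2, LOCK_UN);

    /* A conflicting exclusive request must still be refused. */
    if (flock(fd3, LOCK_EX | LOCK_NB) == 0)
        printf("BUG: exclusive flock granted while a shared holder remains\n");
    else
        printf("OK: exclusive flock correctly refused\n");

    close(fd1);
    close(fd2);
    close(fd3);
    return 0;
}
```

On a correctly behaving filesystem the final non-blocking exclusive request must fail with EWOULDBLOCK as long as fd1 still holds its shared lock; a grant there would indicate that the remaining holder was lost when the other one was dequeued.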
Created attachment 129069 [details]
Patch to potentially fix this bz
This patch ensures that a GL_NOCACHE glock is removed from the cache only when gfs_glock_dq is called on the last holder. I haven't seen any ill effects from this patch, but will feel more comfortable once it has gone through a round of QA.
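To make the intent of the change concrete, here is a self-contained toy model of that dequeue logic. Apart from the names GL_NOCACHE and gfs_glock_dq, which are taken from the comment above, every type, field, and value is hypothetical and does not reflect the real gfs data structures; the sketch only demonstrates "drop from cache on the last holder only".

```c
/* Toy model of the fix, not actual gfs code. */
#include <stdbool.h>
#include <stdio.h>

#define GL_NOCACHE 0x1

struct glock {
    int holders;        /* number of active holders */
    bool cached;        /* is the lock still held/cached on this node? */
};

struct glock_holder {
    struct glock *gl;
    int flags;
};

static void gfs_glock_dq(struct glock_holder *gh)
{
    struct glock *gl = gh->gl;

    gl->holders--;

    /*
     * Old behaviour: any GL_NOCACHE dequeue dropped the glock,
     * silently unlocking it under the remaining holders.
     *
     * Fixed behaviour: only drop it when this was the last holder.
     */
    if ((gh->flags & GL_NOCACHE) && gl->holders == 0)
        gl->cached = false;
}

int main(void)
{
    struct glock gl = { .holders = 2, .cached = true };
    struct glock_holder a = { &gl, GL_NOCACHE };
    struct glock_holder b = { &gl, GL_NOCACHE };

    gfs_glock_dq(&a);
    printf("after first dq:  holders=%d cached=%d\n", gl.holders, gl.cached);
    gfs_glock_dq(&b);
    printf("after second dq: holders=%d cached=%d\n", gl.holders, gl.cached);
    return 0;
}
```

Running the toy model prints holders=1 cached=1 after the first dequeue and holders=0 cached=0 after the second, i.e. the glock is only released once the last holder is gone.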
Committed above patch into RHEL4, HEAD and STABLE branches.

A little explanation of FLOCKs, GL_NOCACHE etc.:

1. Why do flocks need the GL_NOCACHE flag turned on for their glocks?

If FLOCK glocks are cached on one node after use, another node requesting a conflicting FLOCK coupled with the LOCK_NB flag will be denied. The first node has already used and released the FLOCK and should not conflict with the second node's request. The GL_NOCACHE flag ensures this. (A small flock(2) illustration of this scenario appears at the end of this report.)

2. In RHEL3 there was no GL_NOCACHE flag. How were flocks working then?

Without the GL_NOCACHE flag, the release of the glock depends on a timeout value associated with FLOCK glocks. This timeout mechanism (flock_demote_ok()) is not implemented, and hence the glock gets released immediately. But there is a correctness issue here: the release of the glock doesn't happen synchronously, so the problem in point 1 could still occur if the second node requests the flock within the small window between the release of the flock and the release of the glock. The solution is a correct implementation of GL_NOCACHE, which this patch attempts to accomplish.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0561.html

Just stumbled upon this bug myself using RHEL4U3. The symptoms I saw were that traffic on the heartbeat (DLM) network was high and performance was poorer on nodes which were not the first to mount the filesystem. The first mounter obtained journal locks and then dequeued them while they still had holders. From that moment on, the other nodes had to do network DLM transactions to get the locks and could never cache them locally. This fix solved the performance problem.
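Referring back to point 1 of the explanation above, here is a minimal flock(2) illustration of the cross-node scenario, assuming a shared GFS file at a hypothetical path. Run it on a second node after the first node has taken and then released its flock: with correct GL_NOCACHE behaviour the non-blocking request succeeds, whereas a glock still cached on the first node would cause a spurious EWOULDBLOCK.

```c
/* Non-blocking flock attempt on node B (path is an assumption). */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/file.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/mnt/gfs/shared-file", O_RDWR);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    if (flock(fd, LOCK_EX | LOCK_NB) == 0) {
        /* Expected: node A released its flock, so nothing conflicts. */
        printf("got exclusive flock\n");
        flock(fd, LOCK_UN);
    } else if (errno == EWOULDBLOCK) {
        printf("refused: a conflicting flock appears to be held elsewhere\n");
    } else {
        printf("flock failed: %s\n", strerror(errno));
    }

    close(fd);
    return 0;
}
```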