Bug 272021
Summary: | GFS2 - flocks from same process trip kernel BUG at fs/gfs2/glock.c:1118! | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Abhijith Das <adas> |
Component: | kernel | Assignee: | Don Zickus <dzickus> |
Status: | CLOSED ERRATA | QA Contact: | Dean Jansa <djansa> |
Severity: | medium | Priority: | medium |
Version: | 5.0 | CC: | kanderso, lwang, nobody+wcheng, swhiteho, teigland |
Hardware: | All | OS: | All |
Fixed In Version: | RHBA-2007-0959 | Doc Type: | Bug Fix |
Last Closed: | 2007-11-07 20:02:26 UTC | | |
Description
Abhijith Das
2007-08-31 16:15:03 UTC
Created attachment 183661
Program to create the problem.
Created attachment 193221
Initial patch
There are two scenarios when a single process takes multiple flocks:
a) flocks through a single file descriptor.
One fd means the same struct file * and the same holder structure for all flocks.
b) flocks through multiple file descriptors.
Each fd has a different holder structure.
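The two scenarios can be exercised from user space with a short probe. This is a minimal sketch and not the attached reproducer: the function names flock_same_fd and flock_two_fds are made up for illustration, and the path argument is whatever test file you choose (on the affected kernels it would be a file on a GFS2 mount). On an ordinary local filesystem both functions are expected to return 0.

```c
/* Hypothetical probe for the two scenarios; not the attached reproducer. */
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Scenario (a): two flocks through ONE file descriptor.
 * The second call converts the lock already held through fd.
 * Returns 0 if both flock() calls succeed, -1 otherwise. */
int flock_same_fd(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    int rc = flock(fd, LOCK_SH);
    if (rc == 0)
        rc = flock(fd, LOCK_EX | LOCK_NB); /* SH -> EX on the same fd */

    close(fd); /* closing the descriptor releases the lock */
    return rc;
}

/* Scenario (b): flocks through TWO file descriptors for the same file.
 * Each open() yields its own struct file, so each flock is separate.
 * Returns 0 if both shared locks are granted, -1 otherwise. */
int flock_two_fds(const char *path)
{
    int fd1 = open(path, O_RDWR | O_CREAT, 0644);
    int fd2 = open(path, O_RDWR | O_CREAT, 0644);
    int rc = -1;

    if (fd1 >= 0 && fd2 >= 0) {
        rc = flock(fd1, LOCK_SH);
        if (rc == 0)
            rc = flock(fd2, LOCK_SH); /* second holder, same process */
    }

    if (fd1 >= 0)
        close(fd1);
    if (fd2 >= 0)
        close(fd2);
    return rc;
}
```

Calling flock_two_fds() on a file in a GFS2 mount is the pattern described in scenario (b); whether it matches the attached program exactly is an assumption.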
This patch adds a new function gfs2_flock_glock_nq that's almost like
gfs2_glock_nq. It does the list_add from add_to_queue() but does not perform
the checks that disallow the same process from queueing multiple holders onto a
glock. We need this because of scenario (b) where it's ok for multiple flocks
to come from the same process through multiple file descriptors.
In scenario (a), when a process requests the second flock through the same file
descriptor, we dequeue the first flock, reinit the holder with the new flock
and enqueue.
In scenario (b), when a process requests the second flock through another file
descriptor, we need to find the glock (held by first flock) and queue another
holder (corresponding to the second file descriptor). This goes through
gfs2_flock_glock_nq() which doesn't trip BUG()s if it's the same process
requesting the glocks.
Existing problems that this patch doesn't fix:
1) With gfs2, Ctrl-C will not break out of a process that is blocked waiting
for a flock. So, if we have a single-threaded process that takes a SH flock
and then requests a blocking EX flock, it'll block. Since the blocked process
can never release its own SH flock, we have a deadlock. If the process had two
threads, one for each flock, things go smoothly when the first thread unlocks
the SH flock. I'm not sure how this case can be handled, or whether it's ok to
deadlock if the user's rogue program attempts such a thing.
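The self-blocking pattern in (1) can be probed safely by using LOCK_NB, so the probe reports the conflict instead of hanging. This is a sketch under the assumption that the two flocks go through separate file descriptors (scenario (b)); the function name ex_would_self_block is hypothetical. flock(2) treats descriptors obtained from separate open() calls independently, so the EX request can be denied by the SH lock the same process already holds.

```c
/* Hypothetical probe: does an EX flock conflict with our own SH flock? */
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Returns 1 if the EX request through a second fd would block behind
 * this process's own SH flock, 0 if it was granted, -1 on setup error. */
int ex_would_self_block(const char *path)
{
    int fd1 = open(path, O_RDWR | O_CREAT, 0644);
    int fd2 = open(path, O_RDWR | O_CREAT, 0644);
    int result = -1;

    if (fd1 >= 0 && fd2 >= 0 && flock(fd1, LOCK_SH) == 0) {
        /* LOCK_NB turns "block forever" into an EWOULDBLOCK error. */
        if (flock(fd2, LOCK_EX | LOCK_NB) == 0)
            result = 0;
        else if (errno == EWOULDBLOCK)
            result = 1;
    }

    if (fd1 >= 0)
        close(fd1);
    if (fd2 >= 0)
        close(fd2);
    return result;
}
```

Dropping LOCK_NB from the second call gives the blocking request described above, which a single-threaded process cannot get out of because it never reaches the code that would unlock fd1.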
2) When one process requests promotion or demotion of an flock (i.e. through
the same file descriptor, scenario (a) from above), SH followed by EX or EX
followed by SH, we currently unlock, reinit the holder and relock. There's a
race condition between the unlock and the relock where another process/node can
capture the lock. I don't know if LM_FLAG_PRIORITY would help, but ideally we
should have an atomic operation to promote or demote an flock. This bz does not
cover this issue, but I have a gfs1 bz that does.
Created attachment 194051
Second attempt
This patch adds a new flag, GL_FLOCK, for the gfs2_holder structure. It is set
on holders of glocks representing flocks. add_to_queue() checks this flag and
permits a process to queue more than one holder onto a glock if it is set.
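The idea behind the flag can be modelled in user space. The sketch below is a toy, not the kernel code: toy_holder, toy_glock, toy_add_to_queue and TOY_GL_FLOCK are all hypothetical names, the flag value is arbitrary, and it assumes the check looks at the flag on the holder being queued. Where the real add_to_queue() would hit BUG(), the toy returns -1.

```c
/* Toy user-space model of a same-owner check with a flock exemption. */
#include <stddef.h>

#define TOY_GL_FLOCK    0x1u /* arbitrary value, not the kernel's */
#define TOY_MAX_HOLDERS 8

struct toy_holder {
    int owner_pid;      /* process that owns this holder */
    unsigned int flags; /* TOY_GL_FLOCK marks a flock holder */
};

struct toy_glock {
    struct toy_holder *holders[TOY_MAX_HOLDERS];
    size_t count;
};

/* Returns 0 if queued, -1 if the same owner already has a holder queued
 * and the new holder is not a flock holder, -2 if the queue is full. */
int toy_add_to_queue(struct toy_glock *gl, struct toy_holder *gh)
{
    /* Only non-flock holders are subject to the same-owner check. */
    if (!(gh->flags & TOY_GL_FLOCK)) {
        for (size_t i = 0; i < gl->count; i++) {
            if (gl->holders[i]->owner_pid == gh->owner_pid)
                return -1;
        }
    }

    if (gl->count == TOY_MAX_HOLDERS)
        return -2;

    gl->holders[gl->count++] = gh;
    return 0;
}
```

Compared with a separate enqueue function that skips the checks entirely, a per-holder flag keeps a single queueing path and leaves the same-owner check in force for every holder that is not a flock.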
I'm in the middle of testing this patch out and will update this bz with my
results.
That patch looks much better I think.

http://post-office.corp.redhat.com/archives/rhkernel-list/2007-September/msg00320.html
Posted the rhel5.1 version of this patch to rhkernel-list. Marking this bz POST.

in 2.6.18-48.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHBA-2007-0959.html