Bug 707005 - dlm: fcntl F_SETLKW should be interruptible in GFS2
Summary: dlm: fcntl F_SETLKW should be interruptible in GFS2
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: David Teigland
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On: 678585
Blocks: 695824
 
Reported: 2011-05-23 17:45 UTC by David Teigland
Modified: 2018-11-14 09:43 UTC
CC List: 19 users

Fixed In Version: kernel-2.6.32-156.el6
Doc Type: Bug Fix
Doc Text:
Clone Of: 678585
Environment:
Last Closed: 2011-12-06 13:32:32 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 731775 0 medium CLOSED dlm: dev_write no op messages when flocks are used by multiple processes 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHSA-2011:1530 0 normal SHIPPED_LIVE Moderate: Red Hat Enterprise Linux 6 kernel security, bug fix and enhancement update 2011-12-06 01:45:35 UTC

Internal Links: 731775

Description David Teigland 2011-05-23 17:45:50 UTC
+++ This bug was initially created as a clone of Bug #678585 +++

Created attachment 479523 [details]
/sys/kernel/debug/gfs2/jbm:jbm/glocks from both nodes

Description of problem:

The JBoss QA team hit the following problem while testing failover of the HornetQ application.

Testing is based on RHEL6 cluster, clvmd and GFS2.

The application is running on two nodes. There is one process on each node. Processes on both nodes use shared files[1] on the GFS2 filesystem.

The problem is that when the process is killed (on either node) with SIGKILL (kill -9), it always becomes a zombie that cannot be cleaned up at all.

It appears to be hung on a GFS2/DLM lock - see [2].

When the process is killed on one node it becomes a zombie, while the process on the other node still works without problems. When the process on the other node is then killed as well, it also becomes a zombie. The result is an unkillable zombie process on both nodes.

When one node is rebooted, the zombie process on the other node finishes automatically.

I saved /sys/kernel/debug/gfs2/jbm:jbm/glocks before and after the kill and there are no differences. You can find this file from both nodes attached.



[1]
The following files are open when the process becomes a zombie:
Node A:
java      29389 therfert   45u      REG              253,4  1048576    2394856 /mnt/jbm/common/therfert/nodeA/hornetq/bindings/hornetq-bindings-1.bindings
java      29389 therfert   57u      REG              253,4  1048576    2395114 /mnt/jbm/common/therfert/nodeA/hornetq/bindings/hornetq-bindings-2.bindings
java      29389 therfert   64u      REG              253,4 10485760    2395372 /mnt/jbm/common/therfert/nodeA/hornetq/journal/hornetq-data-1.hq
java      29389 therfert   65u      REG              253,4 10485760    2397940 /mnt/jbm/common/therfert/nodeA/hornetq/journal/hornetq-data-2.hq
java      29389 therfert   81u      REG              253,4  1048576    2421052 /mnt/jbm/common/therfert/nodeA/hornetq/bindings/hornetq-jms-1.jms
java      29389 therfert   84u      REG              253,4  1048576    2421310 /mnt/jbm/common/therfert/nodeA/hornetq/bindings/hornetq-jms-2.jms
java      29389 therfert   90uw     REG              253,4       19    2394838 /mnt/jbm/common/therfert/nodeB/hornetq/journal/server.lock
java      29389 therfert  118uw     REG              253,4       19    2394846 /mnt/jbm/common/therfert/nodeA/hornetq/journal/server.lock

Node B:
java      29389 therfert  118uw     REG              253,4       19    2394846 /mnt/jbm/common/therfert/nodeA/hornetq/journal/server.lock
java      29389 therfert   45u      REG              253,4  1048576    2394856 /mnt/jbm/common/therfert/nodeA/hornetq/bindings/hornetq-bindings-1.bindings
java      29389 therfert   57u      REG              253,4  1048576    2395114 /mnt/jbm/common/therfert/nodeA/hornetq/bindings/hornetq-bindings-2.bindings
java      29389 therfert   64u      REG              253,4 10485760    2395372 /mnt/jbm/common/therfert/nodeA/hornetq/journal/hornetq-data-1.hq
java      29389 therfert   65u      REG              253,4 10485760    2397940 /mnt/jbm/common/therfert/nodeA/hornetq/journal/hornetq-data-2.hq
java      29389 therfert   81u      REG              253,4  1048576    2421052 /mnt/jbm/common/therfert/nodeA/hornetq/bindings/hornetq-jms-1.jms
java      29389 therfert   84u      REG              253,4  1048576    2421310 /mnt/jbm/common/therfert/nodeA/hornetq/bindings/hornetq-jms-2.jms
java      29389 therfert   90uw     REG              253,4       19    2394838 /mnt/jbm/common/therfert/nodeB/hornetq/journal/server.lock


[2]
INFO: task java:29180 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
java          D 0000000000000000  3432 29180  28852 0x00000084
 ffff8804e233fd98 0000000000000046 0000000000000000 ffffffffa05b6ec0
 ffffffffa05b6ed8 0000000000000046 ffff88003b7d6fd8 00000001007ffeaf
 ffff8804e21e1500 ffff8804e233ffd8 0000000000010608 ffff8804e21e1500
Call Trace:
 [<ffffffffa05aace5>] dlm_posix_lock+0x1b5/0x2d0 [dlm]
 [<ffffffff81096e90>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa05e247b>] gfs2_lock+0x7b/0xf0 [gfs2]
 [<ffffffff811d4623>] vfs_lock_file+0x23/0x40
 [<ffffffff811d487f>] fcntl_setlk+0x17f/0x340
 [<ffffffff8119b65d>] sys_fcntl+0x19d/0x580
 [<ffffffff81013172>] system_call_fastpath+0x16/0x1b




Version-Release number of selected component (if applicable):
RHEL6.0 up to date.
Kernel: 2.6.32-71.14.1.el6.x86_64.debug

How reproducible:
https://issues.jboss.org/browse/JBPAPP-5956

Actual results:
Hung, unkillable zombie process after the kill.

Expected results:
The process should terminate immediately.

Additional info:
Let me know what else you need.

Thanks
Tomas

--- Additional comment from pm-rhel on 2011-02-18 09:43:34 EST ---

Since this issue was entered in bugzilla, the release flag has been
set to ? to ensure that it is properly evaluated for this release.

--- Additional comment from swhiteho on 2011-02-18 09:53:27 EST ---

Can you tell me which NICs the cluster is using, and which firmware version the NICs are running, if applicable?

I assume from the report that the application is making use of fcntl POSIX locks. That appears to be the code path in question. Depending on the application requirements, there may be better solutions. Can you confirm that there was no process which was holding a lock and thus blocking the process in question?

--- Additional comment from teigland on 2011-02-18 10:24:19 EST ---

To debug a posix lock problem, the following information from both nodes would be a good start:

cman_tool nodes
corosync-objctl
group_tool -n
dlm_tool log_plock
dlm_tool plocks <fsname>
/etc/cluster/cluster.conf
/var/log/messages
/proc/locks
ps ax -o pid,stat,cmd,wchan

--- Additional comment from therfert on 2011-02-18 10:37:31 EST ---

(In reply to comment #2)
> Can you tell me which NICs the cluster is using, and which firmware version the
> NICs are using, if applicable.

Used NICs: Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X 64-bit 133MHz

> 
> I assume from the report that the application is making use of fcntl POSIX
> locks. That appears to be the code path in question. Depending on the
> application requirements, there may be better solutions. Can you confirm that
> there was no process which was holding a lock and thus blocking the process in
> question?

There is only one process on each node which accesses the shared files, as I explained. However, I don't know exactly what operations those processes are doing.

--- Additional comment from therfert on 2011-02-18 10:52:20 EST ---

Created attachment 479549 [details]
Results of the commands / contents of the files requested in comment #3

Please find the results attached. 

nodeA = messaging-22
nodeB = messaging-23


(In reply to comment #3)
> To debug a posix lock problem, the following information from both nodes would
> be a good start:
> 
> cman_tool nodes
> corosync-objctl
> group_tool -n
> dlm_tool log_plock
> dlm_tool plocks <fsname>
> /etc/cluster/cluster.conf
> /var/log/messages
> /proc/locks
> ps ax -o pid,stat,cmd,wchan

--- Additional comment from therfert on 2011-02-18 11:02:48 EST ---

When I was looking at result of "corosync-objctl", I noticed there is multicast address "totem.interface.mcastaddr=239.192.149.169".

I hadn't expected it to use multicast, since it wasn't configured anywhere.

Multicast is routed via a different interface, bond0.
It is a bonding interface (mode=balance-tlb) which uses the following two physical interfaces:

Intel Corporation 82598EB 10-Gigabit AF Dual Port Network Connection



(In reply to comment #4)
> (In reply to comment #2)
> > Can you tell me which NICs the cluster is using, and which firmware version the
> > NICs are using, if applicable.
> 
> Used NICs: Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X 64-bit 133MHz
> 
> > 
> > I assume from the report that the application is making use of fcntl POSIX
> > locks. That appears to be the code path in question. Depending on the
> > application requirements, there may be better solutions. Can you confirm that
> > there was no process which was holding a lock and thus blocking the process in
> > question?
> 
> There is only one process on each node which access the shared files as I
> explained. However I don't know what operations exactly those processes are
> doing.

--- Additional comment from teigland on 2011-02-18 11:38:52 EST ---

It appears that the application has gotten itself into a standard A,B / B,A deadlock with posix locks.  Our clustered posix lock implementation does not do EDEADLK detection.

2394838 WR 2-2 nodeid 22 pid 29389 owner ffff8804de8c2368 rown 0
2394838 WR 1-1 nodeid 23 pid 28907 owner ffff88050fc29050 rown 0
2394838 WR 1-1 nodeid 22 pid 29389 owner ffff8804de8c2368 rown 0 WAITING
2394846 WR 1-1 nodeid 22 pid 29389 owner ffff8804de8c2368 rown 0
2394846 WR 2-2 nodeid 23 pid 28907 owner ffff88050fc29050 rown 0
2394846 WR 1-1 nodeid 23 pid 28907 owner ffff88050fc29050 rown 0 WAITING

> 38.1 = byte 1 of inode 38 (2394838) = /mnt/jbm/common/therfert/nodeB/hornetq/journal/server.lock
> 38.2 = byte 2 of same

> 46.1 = byte 1 of inode 46 (2394846) = /mnt/jbm/common/therfert/nodeA/hornetq/journal/server.lock
> 46.2 = byte 2 of same

node 22 holds WR on 38.2, 46.1
node 22 waits WR on 38.1 (held by node 23)

node 23 holds WR on 38.1, 46.2
node 23 waits WR on 46.1 (held by node 22)

The standard deadlock avoidance methods (lock ordering, non-blocking requests) are the only two options for avoiding this.
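
For illustration only, here is a minimal C sketch of the lock-ordering approach; it is not taken from HornetQ or the kernel, and the file names and offsets are placeholders. If every node acquires the two byte locks in the same global order, an A,B / B,A cycle cannot form.

#include <sys/types.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void lock_byte(int fd, off_t byte)
{
        struct flock fl = {
                .l_type   = F_WRLCK,
                .l_whence = SEEK_SET,
                .l_start  = byte,
                .l_len    = 1,
        };
        if (fcntl(fd, F_SETLKW, &fl) == -1) {   /* blocking request */
                perror("fcntl(F_SETLKW)");
                exit(1);
        }
}

int main(void)
{
        int fd_a = open("nodeA/server.lock", O_RDWR | O_CREAT, 0644);
        int fd_b = open("nodeB/server.lock", O_RDWR | O_CREAT, 0644);
        if (fd_a < 0 || fd_b < 0) {
                perror("open");
                return 1;
        }
        lock_byte(fd_a, 1);     /* every node locks fileA first ...       */
        lock_byte(fd_b, 1);     /* ... then fileB, so no A,B / B,A cycle  */
        printf("both locks held\n");
        pause();                /* hold the locks until killed */
        return 0;
}

This ignores the live/backup roles described earlier by the HornetQ team, where the backup intentionally blocks on byte 1; it only shows the general principle.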

--- Additional comment from therfert on 2011-02-18 12:23:59 EST ---

Thanks much for this fast finding, Dave.

--- Additional comment from rrajasek on 2011-02-18 14:02:57 EST ---

Comment from Andy Taylor from HornetQ team

We basically hold two locks, at bytes 1 and 2: the live server obtains the lock at byte 1 and holds it, and the backup server obtains the lock at byte 2 and holds it. The backup then tries the lock at byte 1 and blocks until the live server dies. Once the live server dies and its Java process is killed, the lock at byte 1 is removed, and the backup server obtains it and releases its byte-2 backup lock for other backup nodes. We are using the Java FileChannel classes for the locking, so if the lock is not released once the Java process running the live server is killed, this is an issue with how the OS works with the file system. I'm not an expert in this field, so I can't really comment on that. However, the following code should recreate the issue: run it twice from the same directory and then kill one node.

// Imports and a main() wrapper added so the snippet compiles and runs as-is;
// the locking logic is unchanged from the original comment.
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;

public class ServerLockTest
{
   public static void main(String[] args) throws Exception
   {
      File file = new File(".", "server.lock");

      if (!file.exists())
      {
         file.createNewFile();
      }

      RandomAccessFile raFile = new RandomAccessFile(file, "rw");

      FileChannel channel = raFile.getChannel();

      // Exclusive lock on one byte at offset 1; blocks until granted.
      channel.lock(1, 1, false);

      System.out.println("lock obtained");
      Thread.sleep(Long.MAX_VALUE); // hold the lock until the process is killed (added)
   }
}

--- Additional comment from swhiteho on 2011-02-18 14:59:41 EST ---

If I've understood this correctly, the problem is that the fcntl F_SETLKW call on GFS2 is not interruptible? The Linux man page says:

      F_SETLKW (struct flock *)
              As for F_SETLK, but if a conflicting lock is held on the file,
              then wait for that lock to be released.  If a signal is caught
              while waiting, then the call is interrupted and (after the
              signal handler has returned) returns immediately (with return
              value -1 and errno set to EINTR; see signal(7)).

So I'd expect that the process should be able to continue and not get stuck as a zombie if it is killed. I'm not sure whether that is a requirement of fcntl locks or something that is Linux specific though.

We have a bug open for flock which has a similar non-interruptible problem in bz #472380 but so far as I know, nobody has an application for which that matters.

I'm surprised that the fcntl lock implementation is not interruptible though, since we use the library functions provided like other filesystems. If we can confirm the problem, then I'll have a look at it shortly.

I'm currently travelling otherwise I'd check the POSIX docs to see what they have to say as well.
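
As a stand-alone check of the man-page behaviour quoted above, the following generic C sketch (not GFS2-specific; file name and timeout are arbitrary) can be run while another process holds a conflicting lock on byte 1 of server.lock. On a local filesystem the blocked F_SETLKW returns EINTR when the signal arrives, which is exactly what is not happening on GFS2 here.

#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static void on_alarm(int sig) { (void)sig; }  /* handler exists only to interrupt the call */

int main(void)
{
        struct sigaction sa;
        sa.sa_handler = on_alarm;
        sa.sa_flags = 0;                      /* no SA_RESTART: let EINTR through */
        sigemptyset(&sa.sa_mask);
        sigaction(SIGALRM, &sa, NULL);

        int fd = open("server.lock", O_RDWR | O_CREAT, 0644);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        struct flock fl = {
                .l_type = F_WRLCK, .l_whence = SEEK_SET, .l_start = 1, .l_len = 1,
        };

        alarm(5);                             /* signal arrives while we are blocked */
        if (fcntl(fd, F_SETLKW, &fl) == -1 && errno == EINTR)
                printf("F_SETLKW was interrupted by the signal (EINTR)\n");
        else
                printf("F_SETLKW returned without EINTR\n");
        return 0;
}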

--- Additional comment from teigland on 2011-02-18 15:33:10 EST ---

It's using wait_event() to wait for a response from dlm_controld.  One hard part about switching to wait_event_interruptible() is keeping the kernel and userland state from getting out of sync, i.e. corrupted lock state.  But perhaps we could handle process termination as a special case since all lock state is being cleared.

If the dlm could check whether process termination was the cause of wait_event_interruptible returning, then it could possibly let dlm_controld know somehow, so that dlm_controld could do a special lock cleanup.
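
To make the state-consistency concern concrete, here is a user-space analogy in C with pthreads; it is not kernel code. One thread plays the waiting "kernel" side, another plays "dlm_controld". A plain wait on the reply cannot be abandoned; waiting on "reply arrived OR the process is dying" can, provided the late reply is then discarded so both sides stay consistent.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv  = PTHREAD_COND_INITIALIZER;
static bool reply_ready = false;   /* the "dlm_controld" side has answered    */
static bool proc_dying  = false;   /* the waiting "process" is being killed   */

static void *controld(void *arg)   /* slow daemon: replies after 3 seconds */
{
        (void)arg;
        sleep(3);
        pthread_mutex_lock(&mtx);
        reply_ready = true;
        pthread_cond_broadcast(&cv);
        pthread_mutex_unlock(&mtx);
        return NULL;
}

static void *killer(void *arg)     /* simulates process termination after 1 second */
{
        (void)arg;
        sleep(1);
        pthread_mutex_lock(&mtx);
        proc_dying = true;
        pthread_cond_broadcast(&cv);
        pthread_mutex_unlock(&mtx);
        return NULL;
}

int main(int argc, char **argv)
{
        pthread_t t1, t2;
        pthread_create(&t1, NULL, controld, NULL);
        if (argc > 1)                          /* any argument: simulate a kill */
                pthread_create(&t2, NULL, killer, NULL);

        pthread_mutex_lock(&mtx);
        while (!reply_ready && !proc_dying)    /* the "killable" wait */
                pthread_cond_wait(&cv, &mtx);
        if (proc_dying && !reply_ready)
                printf("woken by termination: tell the daemon to discard the pending op\n");
        else
                printf("reply arrived: apply it normally\n");
        pthread_mutex_unlock(&mtx);

        pthread_join(t1, NULL);
        if (argc > 1)
                pthread_join(t2, NULL);
        return 0;
}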

--- Additional comment from rrajasek on 2011-02-21 14:22:29 EST ---

Can somebody please comment on whether this could be fixed on the GFS2 side, or whether changes are still needed on the application side?

--- Additional comment from teigland on 2011-02-21 14:45:19 EST ---

It will require quite a bit of work to know if it's possible to handle this in dlm/gfs2 or not.  If it is possible, the change will probably be too complicated to have ready any time soon.

--- Additional comment from anmiller on 2011-02-22 17:27:32 EST ---

(In reply to comment #13)
> It will require quite a bit of work to know if it's possible to handle this in
> dlm/gfs2 or not.  If it is possible, the change will probably be too
> complicated to have ready any time soon.

We potentially have a customer that this issue will affect.  What kind of priority does this have?

--- Additional comment from swhiteho on 2011-02-23 09:12:23 EST ---

It has a priority which depends on how important it is to our customers ;-) So please let us know if someone is waiting on it, and we'll make sure that it receives appropriate treatment.

--- Additional comment from anmiller on 2011-02-23 09:43:25 EST ---

(In reply to comment #15)
> It has a priority which depends on how important it is to our customers ;-) So
> please let us know if someone is waiting on it, and we'll make sure that it
> receives appropriate treatment.

The customer that will need this is Putnam, which just recently became an EAP customer and is moving its EAP deployment to RHEL too.  To support this "shared storage" high-availability configuration, they need to use GFS2.

--- Additional comment from teigland on 2011-02-23 12:01:58 EST ---

Based on the explanation of what these locks are used for (detecting a node failure), there's another possibly simple way of avoiding the deadlock:  use two different processes to lock the two files.

--- Additional comment from sbradley on 2011-02-23 15:01:34 EST ---

An internal Red Hat person opened a case in Salesforce which was not needed. They were asking what information needs to be gathered to troubleshoot this issue. Here was my response in the case:

We need to gather lockdump information. Please gather all the information in this article and package up all the data as the article states.

https://access.redhat.com/kb/docs/DOC-35673

Gather all the data when the issue occurs and attach it to the case or dropbox. Please make sure all data is packaged into one file because this information needs to stay together. Please note that this article has not been fully tested. If you have problems, please note them for us and we will help correct them.

--- Additional comment from jawilson on 2011-02-23 16:02:34 EST ---

I'm working with Rajesh and team to provide this information.

The KBase doc in question isn't viewable by all (for some reason), so I'm attaching it to this BZ in HTML format.

--- Additional comment from anmiller on 2011-02-23 16:27:50 EST ---

(In reply to comment #17)
> Based on the explanation of what these locks are used for (detecting a node
> failure), there's another possibly simple way of avoiding the deadlock:  use
> two different processes to lock the two files.

There are two different processes.  The first JVM (process) on NodeA is the active node, and the second JVM (process) is on NodeB, which is the backup node.  So, there are two processes on two different nodes locking the file at two positions.

--- Additional comment from teigland on 2011-02-23 16:46:38 EST ---

I mean use more than one process on each node.

--- Additional comment from anmiller on 2011-02-23 17:11:00 EST ---

(In reply to comment #20)
> (In reply to comment #17)
> > Based on the explanation of what these locks are used for (detecting a node
> > failure), there's another possibly simple way of avoiding the deadlock:  use
> > two different processes to lock the two files.
> 
> There are two different processes.  The first JVM (process) on NodeA is the
> active node, and the second JVM (process) is on NodeB, which is the backup
> node.  So, there are two processes on two different nodes locking the file at
> two positions.

Okay, but that's actually possible.

--- Additional comment from anmiller on 2011-02-23 17:50:35 EST ---

(In reply to comment #22)
> (In reply to comment #20)
> > (In reply to comment #17)
> > > Based on the explanation of what these locks are used for (detecting a node
> > > failure), there's another possibly simple way of avoiding the deadlock:  use
> > > two different processes to lock the two files.
> > 
> > There are two different processes.  The first JVM (process) on NodeA is the
> > active node, and the second JVM (process) is on NodeB, which is the backup
> > node.  So, there are two processes on two different nodes locking the file at
> > two positions.
> 
> Okay, but that's actually possible.

I meant NOT possible.

--- Additional comment from teigland on 2011-02-23 17:59:55 EST ---

> we basically hold 2 locks at bytes 1 and 2, the live server obtains the lock at
> byte 1 and holds it, the backup lock holds the lock at byte 2 and then hold it,
> the backup then tries the lock at byte 1 and will block until the live server
> dies. Once the live server dies and the java process is killed the lock at byte
> 1 is removed and the backup server obtains it and releases its backup lock for
> other backup nodes

> 38.1 = byte 1 of inode 38 (2394838) = /mnt/jbm/common/therfert/nodeB/hornetq/journal/server.lock
> 38.2 = byte 2 of same
> 46.1 = byte 1 of inode 46 (2394846) = /mnt/jbm/common/therfert/nodeA/hornetq/journal/server.lock
> 46.2 = byte 2 of same

> node 22 holds WR on 38.2, 46.1
> node 22 waits WR on 38.1 (held by node 23)
> node 23 holds WR on 38.1, 46.2
> node 23 waits WR on 46.1 (held by node 22)

deadlock:

node22,pid1: hold lock live-nodeA    (fileA,byte1)
node22,pid1: wait lock live-nodeB    (fileB,byte1)
node22,pid1: hold lock backup-nodeB  (fileB,byte2)

node23,pid1: hold lock live-nodeB    (fileB,byte1)
node23,pid1: wait lock live-nodeA    (fileA,byte1)
node23,pid1: hold lock backup-nodeA  (fileA,byte2)

no deadlock:

node22,pid1: hold lock live-nodeA    (fileA,byte1)
node22,pid2: wait lock live-nodeB    (fileB,byte1)
node22,pid3: hold lock backup-nodeB  (fileB,byte2)

node23,pid1: hold lock live-nodeB    (fileB,byte1)
node23,pid2: wait lock live-nodeA    (fileA,byte1)
node23,pid3: hold lock backup-nodeA  (fileA,byte2)

What I was suggesting above is that you could fork/exec a process to acquire each of the server locks to avoid the deadlock.

(I am not suggesting that we shouldn't try to handle this in gfs/dlm; I very much think we should.  I am simply offering possible workarounds.)
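
A rough C sketch of that workaround, purely hypothetical (not HornetQ code, and, as noted below, not necessarily practical from a JVM): each server lock lives in its own helper process, so a helper blocked in F_SETLKW never pins a lock that another helper already holds. File names and byte offsets are placeholders.

#include <sys/types.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Child: take a write lock on one byte of one file and hold it until killed. */
static pid_t lock_in_child(const char *path, off_t byte)
{
        pid_t pid = fork();
        if (pid < 0) {
                perror("fork");
                exit(1);
        }
        if (pid != 0)
                return pid;                      /* parent: remember helper pid */

        int fd = open(path, O_RDWR | O_CREAT, 0644);
        struct flock fl = {
                .l_type = F_WRLCK, .l_whence = SEEK_SET, .l_start = byte, .l_len = 1,
        };
        if (fd < 0 || fcntl(fd, F_SETLKW, &fl) == -1) {
                perror(path);
                _exit(1);
        }
        printf("helper %d holds %s byte %ld\n", (int)getpid(), path, (long)byte);
        pause();                                 /* hold the lock until killed */
        _exit(0);
}

int main(void)
{
        /* One helper per server lock; killing a helper releases only its lock. */
        pid_t live   = lock_in_child("nodeA/server.lock", 1);
        pid_t backup = lock_in_child("nodeB/server.lock", 2);
        waitpid(live, NULL, 0);
        waitpid(backup, NULL, 0);
        return 0;
}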

--- Additional comment from jawilson on 2011-02-24 02:16:30 EST ---

Created attachment 480675 [details]
Article on Information Needed That JBoss EAP QA Will Provide

--- Additional comment from jwest on 2011-02-24 11:45:25 EST ---

Marking this as an exception for 6.1 work.

--- Additional comment from anmiller on 2011-02-24 11:59:51 EST ---

(In reply to comment #24)
> > we basically hold 2 locks at bytes 1 and 2, the live server obtains the lock at
> > byte 1 and holds it, the backup lock holds the lock at byte 2 and then hold it,
> > the backup then tries the lock at byte 1 and will block until the live server
> > dies. Once the live server dies and the java process is killed the lock at byte
> > 1 is removed and the backup server obtains it and releases its backup lock for
> > other backup nodes
> 
> > 38.1 = byte 1 of inode 38 (2394838) = /mnt/jbm/common/therfert/nodeB/hornetq/journal/server.lock
> > 38.2 = byte 2 of same
> > 46.1 = byte 1 of inode 46 (2394846) = /mnt/jbm/common/therfert/nodeA/hornetq/journal/server.lock
> > 46.2 = byte 2 of same
> 
> > node 22 holds WR on 38.2, 46.1
> > node 22 waits WR on 38.1 (held by node 23)
> > node 23 holds WR on 38.1, 46.2
> > node 23 waits WR on 46.1 (held by node 22)
> 
> deadlock:
> 
> node22,pid1: hold lock live-nodeA    (fileA,byte1)
> node22,pid1: wait lock live-nodeB    (fileB,byte1)
> node22,pid1: hold lock backup-nodeB  (fileB,byte2)
> 
> node23,pid1: hold lock live-nodeB    (fileB,byte1)
> node23,pid1: wait lock live-nodeA    (fileA,byte1)
> node23,pid1: hold lock backup-nodeA  (fileA,byte2)
> 
> no deadlock:
> 
> node22,pid1: hold lock live-nodeA    (fileA,byte1)
> node22,pid2: wait lock live-nodeB    (fileB,byte1)
> node22,pid3: hold lock backup-nodeB  (fileB,byte2)
> 
> node23,pid1: hold lock live-nodeB    (fileB,byte1)
> node23,pid2: wait lock live-nodeA    (fileA,byte1)
> node23,pid3: hold lock backup-nodeA  (fileA,byte2)
> 
> What I was suggesting above, is that you could fork/exec a process to acquire
> each of the server locks to avoid the deadlock.
> 
> (I am not suggesting that we shouldn't try to handle this in gfs/dlm; I very
> much think we should.  I am simply offering possible workarounds.)

I appreciate the attempt at a workaround, but the process is a Java Virtual Machine.  Doing a fork/exec from that is not a feasible workaround.

--- Additional comment from swhiteho on 2011-02-24 12:11:07 EST ---

I think it is very unlikely that we'll be able to make F_SETLKW interruptible for 6.1, bearing in mind the other items on the list at the moment.

--- Additional comment from teigland on 2011-03-02 17:50:36 EST ---

Created attachment 481967 [details]
kernel patch

Experimental kernel patch to allow a process blocked on plock to be killed, and cleaned up.

--- Additional comment from teigland on 2011-03-02 17:51:44 EST ---

Created attachment 481968 [details]
dlm_controld patch

dlm_controld patch that goes along with the previous kernel patch.

--- Additional comment from teigland on 2011-03-02 18:01:59 EST ---

Using the two patches in comments 29 and 30, I created a simple AB/BA deadlock with plocks, resolved it by killing one of the processes, and had all the lock state properly cleaned up.  I expect this will resolve the specific problem in this bz.  The patches don't have any testing beyond the trivial proof of concept test.  This also includes a change to the user/kernel interface, but I don't believe it creates any incompatibilities with previous user or kernel versions.

--- Additional comment from swhiteho on 2011-03-03 05:29:27 EST ---

The only question is what if we have a "new" kernel with "old" userspace?

So far as I can see all the other combinations would work correctly, and otherwise this looks like a good solution.

--- Additional comment from teigland on 2011-03-03 10:13:01 EST ---

With a new kernel and old userspace, the kernel would complain ("dev_write no op ...") when userspace responded to an unlock caused by a close (as opposed to an unlock the process called).  It would still work.

--- Additional comment from swhiteho on 2011-03-03 10:17:21 EST ---

Ok, excellent. Sounds good.

--- Additional comment from teigland on 2011-03-10 11:34:23 EST ---

I'm waiting to do anything further with this patch until I hear if it works.

--- Additional comment from mharvey on 2011-03-10 12:19:36 EST ---

Dave,  Can you please provide a test package with these patches applied so that middleware qe can apply the new package to our test environment and execute our tests?  Thanks,  Mike

--- Additional comment from teigland on 2011-03-14 16:47:42 EDT ---

scratch kernel build including patch
https://brewweb.devel.redhat.com/taskinfo?taskID=3177437

here's a dlm_controld x86_64 binary with the patch; I'm hoping this will work, but I'm not certain
http://people.redhat.com/~teigland/dlm_controld

--- Additional comment from lhh on 2011-03-18 10:33:34 EDT ---

Clock's ticking; need feedback on Dave's patches if we want these in 6.1.

We're already in the snapshot phase and this has kernel components *and* userspace components.

--- Additional comment from jwest on 2011-03-24 09:28:14 EDT ---

I've discussed this with the middleware team, and at the moment their QE team is testing the workaround for this issue.  After that round of testing is done, they can start focusing on testing the proposed solution for this bug.  Unfortunately this doesn't align with the fast-approaching 6.1 GA date, so I'm going to move this out to 6.2.

--- Additional comment from swhiteho on 2011-04-13 15:29:35 EDT ---

I assume that we are fairly certain that 6.2 is a reasonable target? I'm attaching it to Sayan's tracker bug on that basis.

--- Additional comment from adas on 2011-05-17 14:09:25 EDT ---

http://download.devel.redhat.com/brewroot/scratch/adas/task_3328652/
build with patch in comment #29

Comment 1 David Teigland 2011-05-23 17:53:26 UTC
This issue requires a change to both dlm (in kernel) and dlm_controld (in cluster userspace).  I had thought the original bug 678585 was for the dlm kernel component, but it was actually for cluster.  I've cloned the original bz so we now have two bugs:

bug 678585: for dlm_controld in userspace
bug 707005: for dlm in kernel

Comment 2 RHEL Program Management 2011-05-23 18:00:43 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 5 Nate Straz 2011-08-09 20:04:29 UTC
Verified against kernel-2.6.32-174.el6.x86_64

[root@buzz-01 shared]# /tmp/lockhold A B
Press enter to lock A

Attempting to lock A
Lock Acquired
Press enter to lock B

Attempting to lock B
^C

[root@buzz-02 shared]# /tmp/lockhold B A
Press enter to lock B

Attempting to lock B
Lock Acquired
Press enter to lock A

Attempting to lock A
Lock Acquired
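
The /tmp/lockhold source is not attached to this bug; for reference, a minimal tool with the same shape might look like the hypothetical C reconstruction below. It locks the files named on the command line in order, prompting before each one, so two copies run with the arguments reversed ("lockhold A B" and "lockhold B A") reproduce the A,B / B,A deadlock, and ^C tests whether the blocked F_SETLKW can now be interrupted. The byte offset is a guess.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        for (int i = 1; i < argc; i++) {
                printf("Press enter to lock %s\n", argv[i]);
                getchar();

                printf("Attempting to lock %s\n", argv[i]);
                int fd = open(argv[i], O_RDWR | O_CREAT, 0644);
                struct flock fl = {
                        .l_type = F_WRLCK, .l_whence = SEEK_SET, .l_start = 0, .l_len = 1,
                };
                if (fd < 0 || fcntl(fd, F_SETLKW, &fl) == -1) {
                        perror(argv[i]);
                        return 1;
                }
                printf("Lock Acquired\n");
        }
        pause();        /* keep the locks held */
        return 0;
}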

Comment 6 Aristeu Rozanski 2011-08-11 20:08:41 UTC
Patch(es) available on kernel-2.6.32-156.el6

Comment 9 errata-xmlrpc 2011-12-06 13:32:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2011-1530.html

