Created attachment 479523 [details]
/sys/kernel/debug/gfs2/jbm:jbm/glocks from both nodes

Description of problem:
The JBoss QA team hit the following problem while testing failover of a HornetQ application. Testing is based on a RHEL6 cluster with clvmd and GFS2. The application runs on two nodes, with one process on each node. The processes on both nodes use shared files[1] on the GFS2 filesystem.

The problem is that when the process is killed on either node with SIGKILL (kill -9), it always becomes a zombie that cannot be reaped. It appears to be hung on a GFS2/DLM lock - see the trace in [2]. When the process is killed on one node it becomes a zombie, while the process on the other node keeps working without a problem. When that process is then killed as well, it also becomes a zombie, leaving an unkillable zombie process on both nodes. When one node is rebooted, the zombie process on the other node exits automatically.

I saved /sys/kernel/debug/gfs2/jbm:jbm/glocks before and after the kill and there are no differences. The file from both nodes is attached.

[1] The following files are opened when the process becomes a zombie:

Node A:
java 29389 therfert 45u REG 253,4 1048576 2394856 /mnt/jbm/common/therfert/nodeA/hornetq/bindings/hornetq-bindings-1.bindings
java 29389 therfert 57u REG 253,4 1048576 2395114 /mnt/jbm/common/therfert/nodeA/hornetq/bindings/hornetq-bindings-2.bindings
java 29389 therfert 64u REG 253,4 10485760 2395372 /mnt/jbm/common/therfert/nodeA/hornetq/journal/hornetq-data-1.hq
java 29389 therfert 65u REG 253,4 10485760 2397940 /mnt/jbm/common/therfert/nodeA/hornetq/journal/hornetq-data-2.hq
java 29389 therfert 81u REG 253,4 1048576 2421052 /mnt/jbm/common/therfert/nodeA/hornetq/bindings/hornetq-jms-1.jms
java 29389 therfert 84u REG 253,4 1048576 2421310 /mnt/jbm/common/therfert/nodeA/hornetq/bindings/hornetq-jms-2.jms
java 29389 therfert 90uw REG 253,4 19 2394838 /mnt/jbm/common/therfert/nodeB/hornetq/journal/server.lock
java 29389 therfert 118uw REG 253,4 19 2394846 /mnt/jbm/common/therfert/nodeA/hornetq/journal/server.lock

Node B:
java 29389 therfert 118uw REG 253,4 19 2394846 /mnt/jbm/common/therfert/nodeA/hornetq/journal/server.lock
java 29389 therfert 45u REG 253,4 1048576 2394856 /mnt/jbm/common/therfert/nodeA/hornetq/bindings/hornetq-bindings-1.bindings
java 29389 therfert 57u REG 253,4 1048576 2395114 /mnt/jbm/common/therfert/nodeA/hornetq/bindings/hornetq-bindings-2.bindings
java 29389 therfert 64u REG 253,4 10485760 2395372 /mnt/jbm/common/therfert/nodeA/hornetq/journal/hornetq-data-1.hq
java 29389 therfert 65u REG 253,4 10485760 2397940 /mnt/jbm/common/therfert/nodeA/hornetq/journal/hornetq-data-2.hq
java 29389 therfert 81u REG 253,4 1048576 2421052 /mnt/jbm/common/therfert/nodeA/hornetq/bindings/hornetq-jms-1.jms
java 29389 therfert 84u REG 253,4 1048576 2421310 /mnt/jbm/common/therfert/nodeA/hornetq/bindings/hornetq-jms-2.jms
java 29389 therfert 90uw REG 253,4 19 2394838 /mnt/jbm/common/therfert/nodeB/hornetq/journal/server.lock

[2]
INFO: task java:29180 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
java D 0000000000000000 3432 29180 28852 0x00000084
 ffff8804e233fd98 0000000000000046 0000000000000000 ffffffffa05b6ec0
 ffffffffa05b6ed8 0000000000000046 ffff88003b7d6fd8 00000001007ffeaf
 ffff8804e21e1500 ffff8804e233ffd8 0000000000010608 ffff8804e21e1500
Call Trace:
 [<ffffffffa05aace5>] dlm_posix_lock+0x1b5/0x2d0 [dlm]
 [<ffffffff81096e90>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa05e247b>] gfs2_lock+0x7b/0xf0 [gfs2]
 [<ffffffff811d4623>] vfs_lock_file+0x23/0x40
 [<ffffffff811d487f>] fcntl_setlk+0x17f/0x340
 [<ffffffff8119b65d>] sys_fcntl+0x19d/0x580
 [<ffffffff81013172>] system_call_fastpath+0x16/0x1b

Version-Release number of selected component (if applicable):
RHEL6.0 up to date. Kernel: 2.6.32-71.14.1.el6.x86_64.debug

How reproducible:
https://issues.jboss.org/browse/JBPAPP-5956

Actual results:
hung zombie process after the kill

Expected results:
the process should be killed immediately

Additional info:
Let me know what else you need.

Thanks
Tomas
Can you tell me which NICs the cluster is using, and which firmware version the NICs are using, if applicable?

I assume from the report that the application is making use of fcntl POSIX locks. That appears to be the code path in question. Depending on the application requirements, there may be better solutions. Can you confirm that there was no process holding a lock and thus blocking the process in question?
To debug a posix lock problem, the following information from both nodes would be a good start:

cman_tool nodes
corosync-objctl
group_tool -n
dlm_tool log_plock
dlm_tool plocks <fsname>
/etc/cluster/cluster.conf
/var/log/messages
/proc/locks
ps ax -o pid,stat,cmd,wchan
(In reply to comment #2)
> Can you tell me which NICs the cluster is using, and which firmware version
> the NICs are using, if applicable.

Used NICs: Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X 64-bit 133MHz

> I assume from the report that the application is making use of fcntl POSIX
> locks. That appears to be the code path in question. Depending on the
> application requirements, there may be better solutions. Can you confirm that
> there was no process holding a lock and thus blocking the process in question?

There is only one process on each node that accesses the shared files, as explained above. However, I don't know exactly which operations those processes are performing.
Created attachment 479549 [details]
Results of the commands/content of the files requested in comment #3

Please find the results attached.

nodeA = messaging-22
nodeB = messaging-23

(In reply to comment #3)
> To debug a posix lock problem, the following information from both nodes
> would be a good start:
>
> cman_tool nodes
> corosync-objctl
> group_tool -n
> dlm_tool log_plock
> dlm_tool plocks <fsname>
> /etc/cluster/cluster.conf
> /var/log/messages
> /proc/locks
> ps ax -o pid,stat,cmd,wchan
When I was looking at the output of "corosync-objctl", I noticed the multicast address "totem.interface.mcastaddr=239.192.149.169". I hadn't expected it to use multicast, since multicast wasn't configured anywhere. Multicast is routed via a different interface, bond0. It is a bonding interface (mode=balance-tlb) over the following two physical interfaces: Intel Corporation 82598EB 10-Gigabit AF Dual Port Network Connection.

(In reply to comment #4)
> (In reply to comment #2)
> > Can you tell me which NICs the cluster is using, and which firmware version
> > the NICs are using, if applicable.
>
> Used NICs: Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X 64-bit 133MHz
>
> > I assume from the report that the application is making use of fcntl POSIX
> > locks. That appears to be the code path in question. Depending on the
> > application requirements, there may be better solutions. Can you confirm that
> > there was no process holding a lock and thus blocking the process in question?
>
> There is only one process on each node that accesses the shared files, as
> explained above. However, I don't know exactly which operations those
> processes are performing.
It appears that the application has gotten itself into a standard A,B / B,A deadlock with posix locks. Our clustered posix lock implementation does not do EDEADLK detection.

2394838 WR 2-2 nodeid 22 pid 29389 owner ffff8804de8c2368 rown 0
2394838 WR 1-1 nodeid 23 pid 28907 owner ffff88050fc29050 rown 0
2394838 WR 1-1 nodeid 22 pid 29389 owner ffff8804de8c2368 rown 0 WAITING
2394846 WR 1-1 nodeid 22 pid 29389 owner ffff8804de8c2368 rown 0
2394846 WR 2-2 nodeid 23 pid 28907 owner ffff88050fc29050 rown 0
2394846 WR 1-1 nodeid 23 pid 28907 owner ffff88050fc29050 rown 0 WAITING

> 38.1 = byte 1 of inode 38 (2394838) = /mnt/jbm/common/therfert/nodeB/hornetq/journal/server.lock
> 38.2 = byte 2 of same
> 46.1 = byte 1 of inode 46 (2394846) = /mnt/jbm/common/therfert/nodeA/hornetq/journal/server.lock
> 46.2 = byte 2 of same

node 22 holds WR on 38.2, 46.1
node 22 waits WR on 38.1 (held by node 23)
node 23 holds WR on 38.1, 46.2
node 23 waits WR on 46.1 (held by node 22)

The standard deadlock avoidance methods (lock ordering and non-blocking requests) are the only two options for avoiding this.
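For anyone who wants to experiment with the second of those options, here is a minimal C sketch of the non-blocking approach (the file names and retry policy are assumptions for the example, not code from this report): both byte locks are requested with F_SETLK, and if the second one is busy the first is dropped and the attempt retried, so a process never holds one server lock while blocked waiting for the other, and the A,B / B,A cycle cannot form.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Try to take a 1-byte write lock at 'offset' without blocking. */
static int try_lock_byte(int fd, off_t offset)
{
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                        .l_start = offset, .l_len = 1 };
    return fcntl(fd, F_SETLK, &fl);    /* F_SETLK never blocks */
}

int main(void)
{
    /* Hypothetical lock files standing in for the two server.lock files. */
    int fd_a = open("serverA.lock", O_RDWR | O_CREAT, 0644);
    int fd_b = open("serverB.lock", O_RDWR | O_CREAT, 0644);
    if (fd_a < 0 || fd_b < 0)
        return 1;

    for (;;) {
        if (try_lock_byte(fd_a, 1) == 0) {
            if (try_lock_byte(fd_b, 1) == 0)
                break;                 /* got both locks */
            /* Could not get B: drop A, so we never hold one lock while
             * waiting for the other - holding while waiting is what
             * forms the AB/BA cycle. */
            struct flock un = { .l_type = F_UNLCK, .l_whence = SEEK_SET,
                                .l_start = 1, .l_len = 1 };
            fcntl(fd_a, F_SETLK, &un);
        }
        sleep(1);                      /* back off, then retry */
    }
    printf("both locks acquired\n");
    pause();                           /* hold the locks */
    return 0;
}

The same idea is available to the Java application through FileChannel.tryLock() instead of FileChannel.lock().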
Thanks much for this fast finding, Dave.
Comment from Andy Taylor from the HornetQ team:

We basically hold two locks, at bytes 1 and 2. The live server obtains the lock at byte 1 and holds it. The backup server obtains the lock at byte 2 and holds it, then tries the lock at byte 1 and blocks until the live server dies. Once the live server dies and its java process is killed, the lock at byte 1 is released; the backup server obtains it and releases its backup lock for other backup nodes.

We are using the Java FileChannel classes for the locking, so if the lock is not released once the java process running the live server is killed, then this is an issue with how the OS works with the file system. I'm not an expert in this field, so I can't really comment on that. However, the following code should recreate the issue; run it twice from the same directory and then kill one node.

import java.io.File;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
public class ServerLockTest {
    public static void main(String[] args) throws Exception {
        File file = new File(".", "server.lock");
        if (!file.exists()) {
            file.createNewFile();
        }
        RandomAccessFile raFile = new RandomAccessFile(file, "rw");
        FileChannel channel = raFile.getChannel();
        channel.lock(1, 1, false); // blocks until the 1-byte lock at offset 1 is free
        System.out.println("lock obtained");
        Thread.sleep(Long.MAX_VALUE); // keep holding the lock until the process is killed
    }
}
If I've understood this correctly, the problem is that the fcntl F_SETLKW call on GFS2 is not interruptible? The Linux man page says:

F_SETLKW (struct flock *)
    As for F_SETLK, but if a conflicting lock is held on the file, then wait for that lock to be released. If a signal is caught while waiting, then the call is interrupted and (after the signal handler has returned) returns immediately (with return value -1 and errno set to EINTR; see signal(7)).

So I'd expect that the process should be able to continue and not get stuck as a zombie if it is killed. I'm not sure whether that is a requirement of fcntl locks or something that is Linux specific, though.

We have a bug open for flock, which has a similar non-interruptible problem, in bz #472380, but so far as I know nobody has an application for which that matters. I'm surprised that the fcntl lock implementation is not interruptible though, since we use the library functions provided, like other filesystems. If we can confirm the problem, then I'll have a look at it shortly. I'm currently travelling, otherwise I'd check the POSIX docs to see what they have to say as well.
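As a reference point, here is a minimal C sketch (the lock file name is an assumption, not from this report) of the behaviour the man page describes on a local filesystem: a process blocked in F_SETLKW returns -1 with errno set to EINTR when a signal is caught. On GFS2 at the time of this report, the wait inside dlm_posix_lock() was not interruptible, so signals, including SIGKILL, could not break it.

#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void on_alarm(int sig) { (void)sig; /* only purpose: interrupt the syscall */ }

int main(void)
{
    /* Hypothetical lock file; run a second copy first so the lock is already held. */
    int fd = open("test.lock", O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return 1;

    /* Install a handler without SA_RESTART so the blocked fcntl() is
     * interrupted rather than transparently restarted. */
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigaction(SIGALRM, &sa, NULL);
    alarm(5);                          /* deliver SIGALRM in 5 seconds */

    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                        .l_start = 1, .l_len = 1 };
    if (fcntl(fd, F_SETLKW, &fl) == -1)
        printf("fcntl: %s\n", strerror(errno));   /* EINTR expected here */
    else
        printf("lock acquired\n");
    return 0;
}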
It's using wait_event() to wait for a response from dlm_controld. One hard part about wait_event_interruptible() is the kernel and userland state getting out of sync, i.e. corrupted lock state. But perhaps we could handle process termination as a special case since all lock state is being cleared. If the dlm could check whether process termination was the cause of wait_event_interruptible returning, then it could possibly let dlm_controld know somehow, so that dlm_controld could do a special lock cleanup.
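For readers unfamiliar with the kernel side, the idiom under discussion looks roughly like the fragment below. This is a generic illustration under stated assumptions, not the dlm patch itself: wait_event() cannot be interrupted at all, while wait_event_killable() lets a fatal signal such as SIGKILL end the wait, after which the code must unhook the orphaned request and tell dlm_controld to discard its copy of the state. The op structure, wait queue, and helper name are hypothetical.

/* Illustrative kernel-style fragment; names are hypothetical, and this is
 * not the actual fs/dlm/plock.c change. */

/* Before: sleep until dlm_controld replies, regardless of signals. */
wait_event(recv_wq, op->done);

/* After: a fatal signal (e.g. SIGKILL) can end the wait early. */
rv = wait_event_killable(recv_wq, op->done);
if (rv == -ERESTARTSYS) {
        /* Remove op from the pending list and notify dlm_controld
         * (hypothetical helper) so its lock state is cleaned up too. */
        cancel_pending_plock(op);
        return -EINTR;
}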
Can somebody please comment on whether this could be fixed on the GFS2 side, or whether changes are still needed on the application side?
It will require quite a bit of work to know if it's possible to handle this in dlm/gfs2 or not. If it is possible, the change will probably be too complicated to have ready any time soon.
(In reply to comment #13)
> It will require quite a bit of work to know if it's possible to handle this
> in dlm/gfs2 or not. If it is possible, the change will probably be too
> complicated to have ready any time soon.

We potentially have a customer that this issue will affect. What kind of priority does this have?
Based on the explanation of what these locks are used for (detecting a node failure), there's another possibly simple way of avoiding the deadlock: use two different processes to lock the two files.
(In reply to comment #17)
> Based on the explanation of what these locks are used for (detecting a node
> failure), there's another possibly simple way of avoiding the deadlock: use
> two different processes to lock the two files.

There are two different processes. The first JVM (process) on NodeA is the active node, and the second JVM (process) is on NodeB, which is the backup node. So, there are two processes on two different nodes locking the file at two positions.
I mean use more than one process on each node.
(In reply to comment #20)
> (In reply to comment #17)
> > Based on the explanation of what these locks are used for (detecting a node
> > failure), there's another possibly simple way of avoiding the deadlock: use
> > two different processes to lock the two files.
>
> There are two different processes. The first JVM (process) on NodeA is the
> active node, and the second JVM (process) is on NodeB, which is the backup
> node. So, there are two processes on two different nodes locking the file at
> two positions.

Okay, but that's actually possible.
(In reply to comment #22)
> (In reply to comment #20)
> > (In reply to comment #17)
> > > Based on the explanation of what these locks are used for (detecting a
> > > node failure), there's another possibly simple way of avoiding the
> > > deadlock: use two different processes to lock the two files.
> >
> > There are two different processes. The first JVM (process) on NodeA is the
> > active node, and the second JVM (process) is on NodeB, which is the backup
> > node. So, there are two processes on two different nodes locking the file
> > at two positions.
>
> Okay, but that's actually possible.

I meant NOT possible.
> we basically hold 2 locks at bytes 1 and 2, the live server obtains the lock
> at byte 1 and holds it, the backup lock holds the lock at byte 2 and then
> hold it, the backup then tries the lock at byte 1 and will block until the
> live server dies. Once the live server dies and the java process is killed
> the lock at byte 1 is removed and the backup server obtains it and releases
> its backup lock for other backup nodes

> 38.1 = byte 1 of inode 38 (2394838) = /mnt/jbm/common/therfert/nodeB/hornetq/journal/server.lock
> 38.2 = byte 2 of same
> 46.1 = byte 1 of inode 46 (2394846) = /mnt/jbm/common/therfert/nodeA/hornetq/journal/server.lock
> 46.2 = byte 2 of same

> node 22 holds WR on 38.2, 46.1
> node 22 waits WR on 38.1 (held by node 23)
> node 23 holds WR on 38.1, 46.2
> node 23 waits WR on 46.1 (held by node 22)

deadlock:

node22,pid1: hold lock live-nodeA (fileA,byte1)
node22,pid1: wait lock live-nodeB (fileB,byte1)
node22,pid1: hold lock backup-nodeB (fileB,byte2)

node23,pid1: hold lock live-nodeB (fileB,byte1)
node23,pid1: wait lock live-nodeA (fileA,byte1)
node23,pid1: hold lock backup-nodeA (fileA,byte2)

no deadlock:

node22,pid1: hold lock live-nodeA (fileA,byte1)
node22,pid2: wait lock live-nodeB (fileB,byte1)
node22,pid3: hold lock backup-nodeB (fileB,byte2)

node23,pid1: hold lock live-nodeB (fileB,byte1)
node23,pid2: wait lock live-nodeA (fileA,byte1)
node23,pid3: hold lock backup-nodeA (fileA,byte2)

What I was suggesting above is that you could fork/exec a process to acquire each of the server locks, to avoid the deadlock.

(I am not suggesting that we shouldn't try to handle this in gfs/dlm; I very much think we should. I am simply offering possible workarounds.)
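To make the suggestion concrete, here is a minimal C sketch of the "one process per lock" workaround (hypothetical paths and helper name, not HornetQ code): each server lock is taken by its own forked helper, so no single process ever holds one lock while blocking on another, and killing a helper releases exactly that one lock.

#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Fork a child that takes a 1-byte write lock at 'offset' in 'path' and
 * then just sits on it.  POSIX locks are per-process, so killing that one
 * child releases only its lock and leaves the others alone. */
static pid_t hold_lock_in_child(const char *path, off_t offset)
{
    pid_t pid = fork();
    if (pid != 0)
        return pid;                    /* parent (or fork error) */

    int fd = open(path, O_RDWR | O_CREAT, 0644);
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                        .l_start = offset, .l_len = 1 };
    if (fd >= 0 && fcntl(fd, F_SETLKW, &fl) == 0)   /* may block, but only
                                                        this child blocks */
        printf("holding %s byte %ld\n", path, (long)offset);
    pause();                           /* hold the lock until killed */
    _exit(0);
}

int main(void)
{
    /* Hypothetical paths standing in for the two server.lock files. */
    pid_t live   = hold_lock_in_child("nodeA/server.lock", 1);
    pid_t backup = hold_lock_in_child("nodeB/server.lock", 2);
    printf("helper pids: live %d, backup %d\n", (int)live, (int)backup);
    pause();                           /* the main process never blocks on a plock */
    return 0;
}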
(In reply to comment #24)
> > we basically hold 2 locks at bytes 1 and 2, the live server obtains the lock
> > at byte 1 and holds it, the backup lock holds the lock at byte 2 and then
> > hold it, the backup then tries the lock at byte 1 and will block until the
> > live server dies. Once the live server dies and the java process is killed
> > the lock at byte 1 is removed and the backup server obtains it and releases
> > its backup lock for other backup nodes
>
> > 38.1 = byte 1 of inode 38 (2394838) = /mnt/jbm/common/therfert/nodeB/hornetq/journal/server.lock
> > 38.2 = byte 2 of same
> > 46.1 = byte 1 of inode 46 (2394846) = /mnt/jbm/common/therfert/nodeA/hornetq/journal/server.lock
> > 46.2 = byte 2 of same
>
> > node 22 holds WR on 38.2, 46.1
> > node 22 waits WR on 38.1 (held by node 23)
> > node 23 holds WR on 38.1, 46.2
> > node 23 waits WR on 46.1 (held by node 22)
>
> deadlock:
>
> node22,pid1: hold lock live-nodeA (fileA,byte1)
> node22,pid1: wait lock live-nodeB (fileB,byte1)
> node22,pid1: hold lock backup-nodeB (fileB,byte2)
>
> node23,pid1: hold lock live-nodeB (fileB,byte1)
> node23,pid1: wait lock live-nodeA (fileA,byte1)
> node23,pid1: hold lock backup-nodeA (fileA,byte2)
>
> no deadlock:
>
> node22,pid1: hold lock live-nodeA (fileA,byte1)
> node22,pid2: wait lock live-nodeB (fileB,byte1)
> node22,pid3: hold lock backup-nodeB (fileB,byte2)
>
> node23,pid1: hold lock live-nodeB (fileB,byte1)
> node23,pid2: wait lock live-nodeA (fileA,byte1)
> node23,pid3: hold lock backup-nodeA (fileA,byte2)
>
> What I was suggesting above is that you could fork/exec a process to acquire
> each of the server locks, to avoid the deadlock.
>
> (I am not suggesting that we shouldn't try to handle this in gfs/dlm; I very
> much think we should. I am simply offering possible workarounds.)

I appreciate the attempt at a workaround, but the process is a Java Virtual Machine. Doing a fork/exec from that is not a feasible workaround.
Created attachment 481967 [details]
kernel patch

Experimental kernel patch to allow a process blocked on a plock to be killed and cleaned up.
Created attachment 481968 [details]
dlm_controld patch

dlm_controld patch that goes along with the previous kernel patch.
Using the two patches in comments 29 and 30, I created a simple AB/BA deadlock with plocks, resolved it by killing one of the processes, and had all the lock state properly cleaned up. I expect this will resolve the specific problem in this bz.

The patches have not had any testing beyond that trivial proof-of-concept test. They also include a change to the user/kernel interface, but I don't believe it creates any incompatibilities with previous userspace or kernel versions.
The only question is what happens if we have a "new" kernel with "old" userspace. As far as I can see, all the other combinations would work correctly, and otherwise this looks like a good solution.
With a new kernel and old userspace, the kernel would complain ("dev_write no op ...") when userspace responded to an unlock caused by a close (as opposed to an unlock the process called). It would still work.
Ok, excellent. Sounds good.
I'm holding off on doing anything further with this patch until I hear whether it works.
Dave,

Can you please provide a test package with these patches applied, so that middleware QE can apply the new package to our test environment and execute our tests?

Thanks,
Mike
Scratch kernel build including the patch:
https://brewweb.devel.redhat.com/taskinfo?taskID=3177437

Here's a dlm_controld x86_64 binary with the patch; I'm hoping this will work, but I'm not certain:
http://people.redhat.com/~teigland/dlm_controld
http://download.devel.redhat.com/brewroot/scratch/adas/task_3328652/

Build with the patch in comment #29.
This issue requires a change to both dlm (in kernel) and dlm_controld (in cluster userspace). I had thought the original bug 678585 was for the dlm kernel component, but it was actually for cluster. I've cloned the original bz, so we now have two bugs:

bug 678585: for dlm_controld in userspace
bug 707005: for dlm in kernel
Posted: https://www.redhat.com/archives/cluster-devel/2011-May/msg00068.html

(Waiting to push to the cluster.git RHEL6 branch until the 6.2 branch is ready.)
To test, create a deadlock between two processes on separate nodes, then kill one of the processes:

node1: lock fileA
node2: lock fileB
node1: lock fileB
node2: lock fileA

Kill the process on node1, and node2 should get the lock on fileA.
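For reference, here is a minimal sketch of a lockhold-style test program that can drive this procedure (an assumption modelled on the output in the next comment, not the actual utility used): it takes a write lock on each file named on the command line, in order, waiting for Enter before each attempt. Running it as "lockhold A B" on one node and "lockhold B A" on the other, and pressing Enter in the order shown above, produces the AB/BA deadlock; after killing one copy, the other should acquire its second lock.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    for (int i = 1; i < argc; i++) {
        printf("Press enter to lock %s\n", argv[i]);
        getchar();

        int fd = open(argv[i], O_RDWR | O_CREAT, 0644);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                            .l_start = 0, .l_len = 1 };
        printf("Attempting to lock %s\n", argv[i]);
        if (fcntl(fd, F_SETLKW, &fl) == -1) {   /* blocks while another
                                                   process holds the lock */
            perror("fcntl");
            return 1;
        }
        printf("Lock Acquired\n");
    }
    pause();   /* hold all locks until the process is killed */
    return 0;
}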
Verified against cman-3.0.12.1-7.el6.x86_64

[root@buzz-01 shared]# /tmp/lockhold A B
Press enter to lock A
Attempting to lock A
Lock Acquired
Press enter to lock B
Attempting to lock B
^C

[root@buzz-02 shared]# /tmp/lockhold B A
Press enter to lock B
Attempting to lock B
Lock Acquired
Press enter to lock A
Attempting to lock A
Lock Acquired
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: gfs2 posix lock operations (implemented in dlm) are not interruptible while they wait for another posix lock. They were originally implemented this way for simplicity.
Consequence: processes that created a deadlock with posix locks, e.g. AB/BA, could not be killed to resolve the problem, and one node would need to be reset.
Fix: the dlm uses a new kernel feature that allows the waiting process to be killed, and information about the killed process is now passed to dlm_controld so it can clean up.
Result: processes deadlocked on gfs2 posix locks can now be recovered by killing one or more of them.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1516.html