Bug 197396 - dmeventd threads ignoring pthread_cancel causing cluster mirror recovery commands to fail
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: device-mapper
Version: 4.0
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Jonathan Earl Brassow
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 181411
Reported: 2006-06-30 20:34 UTC by Jonathan Earl Brassow
Modified: 2010-01-12 02:16 UTC (History)
CC List: 9 users

Fixed In Version: RHBA-2006-0434
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-08-10 21:26:31 UTC
Target Upstream Version:
Embargoed:


Attachments
Patch to handle threads that ignore pthread_cancel (4.95 KB, text/x-patch)
2006-06-30 20:34 UTC, Jonathan Earl Brassow


Links
System ID: Red Hat Product Errata RHBA-2006:0434
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: device-mapper bug fix and enhancement update
Last Updated: 2006-08-09 04:00:00 UTC

Description Jonathan Earl Brassow 2006-06-30 20:34:50 UTC
Mirroring uses a process called 'dmeventd' to monitor devices for failures. 
When dmeventd receives a request to monitor a device, it creates a new thread to
watch it.  Also, when a device no longer needs monitoring, dmeventd will reap
the thread.

Monitoring and unmonitoring events occur as a normal part of recovery while the
device is being reconfigured.

In the case of cluster mirrors, if a log device were to fail, dmeventd would
try to reduce the mirror from one with a disk log to one without.  This means
the device must be unmonitored and then remonitored when the new device is
ready.  dmeventd would get stuck waiting to reap a thread that ignored
pthread_cancel, and this would hang the recovery command.  The overall command
would then time out, leaving the bad mirror in place.  Further, since the
recovery command is hung on a particular node, all lvm commands on that node
hang waiting for a lock and subsequently fail.

Comment 1 Jonathan Earl Brassow 2006-06-30 20:34:51 UTC
Created attachment 131831
Patch to handle threads that ignore pthread_cancel

Comment 5 Alasdair Kergon 2006-07-05 17:57:14 UTC
included in 1.02.07-3.0

Comment 9 Red Hat Bugzilla 2006-08-10 21:26:31 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0434.html


