197396 – dmeventd threads ignoring pthread_cancel causing cluster mirror recovery commands to fail

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	device-mapper
Sub Component:
Version:	4.0
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Jonathan Earl Brassow
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	181411

Reported:	2006-06-30 20:34 UTC by Jonathan Earl Brassow
Modified:	2010-01-12 02:16 UTC
CC List:	9 users
Fixed In Version:	RHBA-2006-0434
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2006-08-10 21:26:31 UTC
Target Upstream Version:
Embargoed:

Attachments
Patch to handle threads that ignore pthread_cancel (4.95 KB, text/x-patch) 2006-06-30 20:34 UTC, Jonathan Earl Brassow

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2006:0434	0	normal	SHIPPED_LIVE	device-mapper bug fix and enhancement update	2006-08-09 04:00:00 UTC

Description Jonathan Earl Brassow 2006-06-30 20:34:50 UTC

Mirroring uses a daemon called 'dmeventd' to monitor devices for failures.
When dmeventd receives a request to monitor a device, it creates a new thread to
watch it.  When a device no longer needs monitoring, dmeventd reaps the thread.
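
The create/reap cycle described above can be sketched roughly as follows.  This
is a minimal illustration, not dmeventd's code: 'struct monitor',
'monitor_device', 'start_monitor' and 'stop_monitor' are hypothetical names, and
the worker only sleeps where a real monitor thread would wait for a device event.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical per-device record; not dmeventd's actual data structure. */
struct monitor {
    pthread_t thread;
    char device[64];
};

/* Stand-in for "block until the device reports an event". */
static void *monitor_device(void *arg)
{
    struct timespec tick = { 0, 10 * 1000 * 1000 }; /* 10 ms */

    (void)arg;
    for (;;)
        nanosleep(&tick, NULL); /* nanosleep() is a cancellation point */
    return NULL;
}

/* Register a device: one new thread per monitored device. */
struct monitor *start_monitor(const char *device)
{
    struct monitor *m = calloc(1, sizeof(*m));

    if (!m)
        return NULL;
    snprintf(m->device, sizeof(m->device), "%s", device);
    if (pthread_create(&m->thread, NULL, monitor_device, m) != 0) {
        free(m);
        return NULL;
    }
    return m;
}

/* Unregister a device: request cancellation, then reap with pthread_join(). */
int stop_monitor(struct monitor *m)
{
    void *result;

    assert(m != NULL);
    if (pthread_cancel(m->thread) != 0)
        return -1;
    if (pthread_join(m->thread, &result) != 0)
        return -1;
    free(m);
    return result == PTHREAD_CANCELED ? 0 : -1;
}
```

Note that pthread_join() has no timeout: in this sketch, stop_monitor() returns
only once the worker actually acts on the cancellation request.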

Monitoring/unmonitoring events happen as a natural course of recovery while the
device is being reconfigured.

In the case of cluster mirrors, if a log device were to fail, dmeventd would
try to reduce the mirror from one with a disk log to one without.  This means
the device must be unmonitored, then remonitored once the new device is ready.
dmeventd would get stuck waiting for a thread to be reaped, and the recovery
command would hang.  The overall command would then time out, leaving the bad
mirror in place.  Further, since the recovery command is hung on a particular
node, all lvm commands to that node hang waiting for a lock and subsequently fail.
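
This report does not say why the threads ignored the cancel request.  In general,
POSIX cancellation is deferred by default, so a thread that has disabled
cancellation, or that never reaches a cancellation point, keeps running after
pthread_cancel().  The sketch below reproduces that general behaviour under the
assumption that cancellation is disabled; 'stubborn_worker' and
'cancel_is_ignored' are hypothetical names and are not taken from dmeventd.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

static atomic_int release_worker; /* set to 1 to let the worker return */
static atomic_int heartbeat;      /* bumped on every loop iteration */

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };

    nanosleep(&ts, NULL);
}

/* Worker that disables cancellation, so pthread_cancel() has no effect. */
static void *stubborn_worker(void *arg)
{
    (void)arg;
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL);
    while (!atomic_load(&release_worker)) {
        atomic_fetch_add(&heartbeat, 1);
        sleep_ms(1);
    }
    return NULL;
}

/*
 * Returns 1 if the worker kept running after pthread_cancel(),
 * 0 if it did not, and -1 if the thread could not be created.
 */
int cancel_is_ignored(void)
{
    pthread_t t;
    void *result;
    int before, after;

    atomic_store(&release_worker, 0);
    atomic_store(&heartbeat, 0);
    if (pthread_create(&t, NULL, stubborn_worker, NULL) != 0)
        return -1;

    while (atomic_load(&heartbeat) == 0)
        sleep_ms(1);              /* wait until the worker is running */

    pthread_cancel(t);            /* request stays pending, never acted on */
    before = atomic_load(&heartbeat);
    sleep_ms(50);
    after = atomic_load(&heartbeat);

    /*
     * Calling pthread_join() at this point would block indefinitely,
     * so the worker is released cooperatively before it is joined.
     */
    atomic_store(&release_worker, 1);
    pthread_join(t, &result);

    return after > before && result != PTHREAD_CANCELED;
}
```

A reaper that calls pthread_join() right after pthread_cancel() on such a thread
waits for as long as the thread keeps running, which matches the kind of hang
described above.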

Comment 1 Jonathan Earl Brassow 2006-06-30 20:34:51 UTC

Created attachment 131831
Patch to handle threads that ignore pthread_cancel

Comment 5 Alasdair Kergon 2006-07-05 17:57:14 UTC

included in 1.02.07-3.0

Comment 9 Red Hat Bugzilla 2006-08-10 21:26:31 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0434.html