Red Hat Bugzilla – Bug 197396
dmeventd threads ignoring pthread_cancel causing cluster mirror recovery commands to fail
Last modified: 2010-01-11 21:16:33 EST
Mirroring uses a process called 'dmeventd' to monitor devices for failures.
When dmeventd receives a request to monitor a device, it creates a new thread to
watch it. Also, when a device no longer needs monitoring, dmeventd will reap
the thread that was watching it.
Monitoring/unmonitoring events happen as a natural course of recovery while the
device is being reconfigured.
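The thread-per-device model described above can be sketched roughly as follows. This is an illustration only, not dmeventd's code: `struct monitor`, `watch_device`, `monitor_start` and `monitor_stop` are hypothetical names, and `sleep()` stands in for waiting on a real device event.

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical per-device record; not dmeventd's real data structure. */
struct monitor {
    pthread_t thread;
    const char *device;
};

/* Worker: "watches" the device forever. sleep() is a cancellation
 * point, so a pending pthread_cancel() takes effect there. */
static void *watch_device(void *arg)
{
    (void)arg;
    for (;;)
        sleep(1);   /* stand-in for waiting on a device event */
    return NULL;    /* not reached */
}

/* Start monitoring: one new thread per device. Returns 0 on success. */
int monitor_start(struct monitor *m, const char *device)
{
    assert(m != NULL);
    m->device = device;
    return pthread_create(&m->thread, NULL, watch_device, m);
}

/* Stop monitoring: cancel the thread, then reap it with pthread_join().
 * Returns 0 if the thread was canceled and reaped. Note that
 * pthread_join() blocks until the thread has actually exited. */
int monitor_stop(struct monitor *m)
{
    void *res;
    int r;

    assert(m != NULL);
    r = pthread_cancel(m->thread);
    if (r != 0)
        return r;
    r = pthread_join(m->thread, &res);
    if (r != 0)
        return r;
    return res == PTHREAD_CANCELED ? 0 : -1;
}
```

In this sketch, unmonitoring only returns promptly because the worker reaches a cancellation point with cancellation enabled.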
In the case of cluster mirrors, if a log device were to fail, dmeventd would
try to reduce the mirror from one with a disk log to one without. This means
the device must be unmonitored and remonitored when the new device is ready.
If the monitoring thread ignored pthread_cancel, dmeventd would get stuck
waiting for that thread to be reaped, and the recovery command would hang.
The overall command would then time out, leaving the bad mirror in place.
Further, since the recovery command is hung on a particular node, all LVM
commands on that node hang waiting for a lock and subsequently fail.
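To make the failure mode concrete, here is a minimal sketch of a thread that ignores pthread_cancel, together with one way to reap it without blocking forever. It is not the attached patch and makes no claim about how dmeventd was actually fixed. All names (`struct worker`, `worker_start`, `worker_reap`, the `stop` flag) are hypothetical, and the cooperative stop flag is an assumed fallback chosen for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical worker state; names are illustrative, not from dmeventd. */
struct worker {
    pthread_t thread;
    pthread_mutex_t lock;
    pthread_cond_t exited;
    bool done;   /* set when the thread is about to exit */
    bool stop;   /* cooperative stop request (assumed fallback) */
};

/* Cleanup handler: tell the reaper that the thread is exiting. */
static void announce_exit(void *arg)
{
    struct worker *w = arg;
    pthread_mutex_lock(&w->lock);
    w->done = true;
    pthread_cond_signal(&w->exited);
    pthread_mutex_unlock(&w->lock);
}

static void *stubborn_thread(void *arg)
{
    struct worker *w = arg;
    struct timespec tick = { 0, 10 * 1000 * 1000 };   /* 10 ms */
    bool stop = false;
    int old;

    pthread_cleanup_push(announce_exit, w);
    /* Simulate the bug: cancellation requests stay pending forever. */
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old);
    while (!stop) {
        nanosleep(&tick, NULL);
        pthread_mutex_lock(&w->lock);
        stop = w->stop;
        pthread_mutex_unlock(&w->lock);
    }
    pthread_cleanup_pop(1);   /* run announce_exit on normal exit too */
    return NULL;
}

int worker_start(struct worker *w)
{
    assert(w != NULL);
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->exited, NULL);
    w->done = false;
    w->stop = false;
    return pthread_create(&w->thread, NULL, stubborn_thread, w);
}

/* Wait up to timeout_ms for the exit announcement.
 * Returns 0 if the thread is exiting, ETIMEDOUT otherwise. */
static int wait_exit(struct worker *w, long timeout_ms)
{
    struct timespec deadline;
    int r = 0;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }
    pthread_mutex_lock(&w->lock);
    while (!w->done && r == 0)
        r = pthread_cond_timedwait(&w->exited, &w->lock, &deadline);
    r = w->done ? 0 : ETIMEDOUT;
    pthread_mutex_unlock(&w->lock);
    return r;
}

/* Returns 0 if pthread_cancel() alone worked, 1 if the stop flag was
 * needed, -1 if the thread could not be reaped within the time limits. */
int worker_reap(struct worker *w, long timeout_ms)
{
    int fallback = 0;

    pthread_cancel(w->thread);
    if (wait_exit(w, timeout_ms) != 0) {
        /* The cancel was ignored: ask the thread to stop itself. */
        pthread_mutex_lock(&w->lock);
        w->stop = true;
        pthread_mutex_unlock(&w->lock);
        fallback = 1;
        if (wait_exit(w, timeout_ms) != 0)
            return -1;   /* report failure rather than hang */
    }
    pthread_join(w->thread, NULL);   /* thread has announced its exit */
    return fallback;
}
```

A bare pthread_join() after the pthread_cancel() here would block indefinitely, which matches the hang described above. Bounding the wait lets the caller either fall back to another mechanism or fail cleanly instead of holding a lock forever.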
Created attachment 131831 [details]
Patch to handle threads that ignore pthread_cancel
Fix included in 1.02.07-3.0.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.