Mirroring uses a process called 'dmeventd' to monitor devices for failures. When dmeventd receives a request to monitor a device, it creates a new thread to watch it. When a device no longer needs monitoring, dmeventd reaps the thread. Monitoring/unmonitoring events happen as a natural part of recovery while the device is being reconfigured. In the case of cluster mirrors, if a log device were to fail, dmeventd would try to reduce the mirror from one with a disk log to one without. This means the device must be unmonitored and then remonitored when the new device is ready. During this step, dmeventd would get stuck waiting for a thread to be reaped and hang the recovery command. The overall command would then time out, leaving the bad mirror in place. Further, since the recovery command is hung on a particular node, all LVM commands on that node hang waiting for a lock and subsequently fail.
Created attachment 131831 [details] Patch to handle threads that ignore pthread_cancel
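For readers without access to the attachment, here is a minimal sketch of one way to avoid blocking forever on a thread that ignores pthread_cancel. It is not the attached patch and does not use dmeventd's real data structures: 'struct monitor', 'start_monitor', 'reap_monitor' and the 'ignore_cancel' knob are all hypothetical names chosen for illustration. The idea is to cancel the thread, wait a bounded time for it to report that it exited, and detach it instead of joining if the deadline passes.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical per-device record; NOT dmeventd's real structure. */
struct monitor {
    pthread_t thread;
    pthread_mutex_t lock;
    pthread_cond_t exited;
    bool done;           /* set by the thread's cleanup handler on exit */
    bool ignore_cancel;  /* illustration only: simulate a thread that ignores cancel */
};

/* Cleanup handler: runs when the thread is cancelled, and reports the exit. */
static void mark_done(void *arg)
{
    struct monitor *m = arg;
    pthread_mutex_lock(&m->lock);
    m->done = true;
    pthread_cond_signal(&m->exited);
    pthread_mutex_unlock(&m->lock);
}

static void *monitor_thread(void *arg)
{
    struct monitor *m = arg;
    if (m->ignore_cancel)
        pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL);
    pthread_cleanup_push(mark_done, m);
    for (;;)
        sleep(1);        /* stand-in for waiting on device events; a cancellation point */
    pthread_cleanup_pop(1);
    return NULL;
}

static int start_monitor(struct monitor *m, bool ignore_cancel)
{
    pthread_mutex_init(&m->lock, NULL);
    pthread_cond_init(&m->exited, NULL);
    m->done = false;
    m->ignore_cancel = ignore_cancel;
    return pthread_create(&m->thread, NULL, monitor_thread, m);
}

/*
 * Cancel the thread and wait at most timeout_sec for it to exit.
 * Returns 0 if the thread exited and was joined, or ETIMEDOUT if it
 * was detached and abandoned. In the ETIMEDOUT case the caller must
 * keep *m alive, because the abandoned thread may still touch it.
 */
static int reap_monitor(struct monitor *m, int timeout_sec)
{
    struct timespec deadline;
    bool done;
    int rc = 0;

    pthread_cancel(m->thread);

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;

    pthread_mutex_lock(&m->lock);
    while (!m->done && rc == 0)
        rc = pthread_cond_timedwait(&m->exited, &m->lock, &deadline);
    done = m->done;
    pthread_mutex_unlock(&m->lock);

    if (done) {
        pthread_join(m->thread, NULL);  /* returns promptly: thread is exiting */
        return 0;
    }
    pthread_detach(m->thread);          /* do not block the caller on a stuck thread */
    return ETIMEDOUT;
}
```

With this shape, a caller that unmonitors a device can treat ETIMEDOUT as "thread abandoned" and carry on with reconfiguration rather than hanging. The trade-off is that an abandoned thread leaks its resources until it exits on its own.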
Included in 1.02.07-3.0.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2006-0434.html