Description of problem:
I have a cluster running cluster mirroring. If I drive the I/O load high enough and fail the primary side of the mirror, the kernel generates so many messages that the machine cannot complete other critical tasks (such as heartbeating for the cluster). The message printed is:

device-mapper: incrementing error_count on 253:3
...

This message comes from drivers/md/dm-raid1.c:fail_mirror(). We already get messages from the device subsystem (e.g. "scsi0 (0:0): rejecting I/O to offline device"), so the message above is unnecessary. (RHEL 5 has already pulled this message out.)

The system becomes so busy processing this useless message that cluster members start to be removed. Once this happens, CLVM commands cannot continue, resulting in a hung recovery process. The mirror never gets recovered, and LVM commands stop.

Version-Release number of selected component (if applicable):
kernel-2.6.9-42.EL

How reproducible:
Always (with high enough load).

Steps to Reproduce:
1. Create a cluster mirror and put a file system on it.
2. Fail the primary leg of the mirror.

Additional info:
The kernel patch is a one-line fix that removes the unnecessary message.
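To illustrate why a message printed on every failed I/O scales so badly, here is a minimal user-space sketch of an alternative to the per-I/O print: count every failure, but log only the first one. This is not the attached patch (which simply removes the message) and it is not the code in drivers/md/dm-raid1.c. The struct `mirror_leg`, its fields, and the function `record_leg_failure()` are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for one mirror leg; NOT the kernel's struct mirror. */
struct mirror_leg {
    const char *dev_name;          /* device label, e.g. "253:3" */
    unsigned long error_count;     /* failed I/Os seen on this leg */
    unsigned long messages_logged; /* messages actually emitted */
};

/*
 * Record one failed I/O on a leg.
 * Every failure is counted, but a message is emitted only when the
 * count goes from 0 to 1, so logging cost stays constant no matter
 * how many I/Os fail afterwards.
 * Returns 1 if a message was emitted, 0 otherwise.
 */
static int record_leg_failure(struct mirror_leg *leg)
{
    assert(leg != NULL);

    leg->error_count++;
    if (leg->error_count != 1)
        return 0; /* already reported; stay quiet */

    fprintf(stderr, "mirror leg %s failed; further errors are counted silently\n",
            leg->dev_name);
    leg->messages_logged++;
    return 1;
}
```

With this shape, a burst of failed I/Os against an offline leg produces a single line of output rather than one line per I/O. The device subsystem's own messages still identify the failing device.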
Created attachment 146217
Patch to remove print statement
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Committed in stream U5, build 45. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0304.html