Bug 223865 - Unneeded message sprays to console causing nodes to leave cluster
Summary: Unneeded message sprays to console causing nodes to leave cluster
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Jonathan Earl Brassow
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2007-01-22 20:05 UTC by Jonathan Earl Brassow
Modified: 2007-11-30 22:07 UTC

Fixed In Version: RHBA-2007-0304
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-05-08 04:44:06 UTC
Target Upstream Version:
Embargoed:


Attachments
Patch to remove print statement (645 bytes, patch)
2007-01-22 20:06 UTC, Jonathan Earl Brassow


Links
System: Red Hat Product Errata
ID: RHBA-2007:0304
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Updated kernel packages available for Red Hat Enterprise Linux 4 Update 5
Last Updated: 2007-04-28 18:58:50 UTC

Description Jonathan Earl Brassow 2007-01-22 20:05:24 UTC
Description of problem:
I have a cluster running cluster mirroring.  If I raise the I/O load high enough
and fail the primary side of the mirror, it generates too many messages for the
machine to complete other critical tasks (like heartbeating for the cluster).

The messages printed are:
device-mapper: incrementing error_count on 253:3
...
This message is printed by fail_mirror() in drivers/md/dm-raid1.c.  We already
get messages from the device subsystem (e.g. scsi0 (0:0): rejecting I/O to
offline device), so the above is unnecessary.  (RHEL 5 has already removed this
message.)

The system becomes so busy printing this message that cluster members
start to be removed.  Once this happens, CLVM commands cannot continue,
which results in a hung recovery process.  The mirror never gets recovered, and
LVM commands stop.
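
To illustrate the pattern described above, here is a minimal user-space sketch,
not the kernel code and not the attached patch.  It assumes the handler is
entered once per failed I/O, so console output grows linearly with load.  The
names mirror_sketch, fail_mirror_noisy, fail_mirror_quiet and
simulate_failures are hypothetical; only the message text comes from this
report.

```c
#include <stdio.h>

/* Hypothetical stand-in for per-leg mirror state; not the kernel struct. */
struct mirror_sketch {
    const char *dev_name;    /* device label, e.g. "253:3" */
    unsigned error_count;    /* failed I/Os seen on this leg */
};

/* Noisy variant: counts the error and logs one line per failed I/O.
 * Returns the number of bytes written to the log. */
static int fail_mirror_noisy(struct mirror_sketch *m, FILE *log)
{
    m->error_count++;
    return fprintf(log, "device-mapper: incrementing error_count on %s\n",
                   m->dev_name);
}

/* Quiet variant: still counts the error, but prints nothing. */
static int fail_mirror_quiet(struct mirror_sketch *m)
{
    m->error_count++;
    return 0;
}

/* Push n failed I/Os through one variant and return total log bytes. */
static long simulate_failures(struct mirror_sketch *m, unsigned n,
                              int noisy, FILE *log)
{
    long bytes = 0;
    unsigned i;

    for (i = 0; i < n; i++)
        bytes += noisy ? fail_mirror_noisy(m, log) : fail_mirror_quiet(m);
    return bytes;
}
```

In this sketch both variants leave error_count at the same value after the
same number of failures; only the noisy one produces log traffic, and that
traffic scales with the number of failed I/Os.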

Version-Release number of selected component (if applicable):
kernel-2.6.9-42.EL

How reproducible:
Always (with high enough load).

Steps to Reproduce:
1. Create cluster mirror, put FS on it.
2. Fail primary leg of the mirror
3.

Additional info:
The kernel patch is a one-line fix that removes an unnecessary message.

Comment 1 Jonathan Earl Brassow 2007-01-22 20:06:49 UTC
Created attachment 146217
Patch to remove print statement

Comment 4 RHEL Program Management 2007-01-30 17:05:26 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 6 Jason Baron 2007-02-01 19:35:49 UTC
committed in stream U5 build 45. A test kernel with this patch is available from
http://people.redhat.com/~jbaron/rhel4/


Comment 9 Red Hat Bugzilla 2007-05-08 04:44:06 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0304.html

