|Summary:||kernel dm-multipath: Improve error logging|
|Product:||Red Hat Enterprise Linux 4||Reporter:||Jonathan Earl Brassow <jbrassow>|
|Component:||device-mapper-multipath||Assignee:||Tom Coughlan <coughlan>|
|Status:||CLOSED DEFERRED||QA Contact:|
|Version:||4.0||CC:||agk, christophe.varoqui, coughlan, dmo, dwysocha, egoggin, lmb, mbroz, rkenna, tranlan|
|Target Milestone:||---||Keywords:||FutureFeature, Reopened|
|Fixed In Version:||Doc Type:||Enhancement|
|Doc Text:||Story Points:||---|
|Last Closed:||2008-02-12 16:05:16 UTC||Type:||---|
|oVirt Team:||---||RHEL 7.3 requirements from Atomic Host:|
|Bug Depends On:||155428|
Description Jonathan Earl Brassow 2005-10-21 16:02:46 UTC
+++ This bug was initially created as a clone of Bug #155428 +++

Device Mapper has a slight communication problem if accidentally misconfigured, or if something goes wrong. It tends to silently throw an error upwards, or even report gems such as "Unknown error". The attached patch is a first step towards actually logging what went wrong and providing more information in the logs.

-- Additional comment from firstname.lastname@example.org on 2005-04-20 05:09 EST --
Created an attachment (id=113397)
Suggested patch to improve the logging situation

-- Additional comment from email@example.com on 2005-04-21 06:44 EST --
Created an attachment (id=113462)
Updated patch

Do not try to print pgpath->path.dev->name when pgpath == NULL. Smart idea, eh? ;)

-- Additional comment from firstname.lastname@example.org on 2005-04-21 15:07 EST --
I'll add this to 2.6.12-rc2-udm1 for now, but we need to tidy it more before sending it upstream. e.g.

DMWARN("dm-emc: emc_endio: pg_init error %d", error);
DMWARN("dm-emc: emc_endio: Found valid sense data %06x", sense);
DMWARN("dm-emc: emc_endio: Array Based Copy in progress");

could fit into a single line: maybe
"dm-emc: emc_endio: pg_init error %d (sense %06x): Array-based copy in progress"

-- Additional comment from email@example.com on 2005-04-21 15:44 EST --
Good point. Yes, this needs more cleaning up and in particular also:

a) Rate-limiting: right now it'll trigger once for every bio, even though they are part of the same SCSI request. If the bios have been merged into large requests, that could amount to a substantial volume of logging and flood the console. It'd be interesting if we could figure out a way to only print it _once_ for every request (i.e., once for every real error). (Without keeping a complete history, we could try to only print it if this bio belonged to a different request than the last bio we handled; that'll still cause some excessive logging, but only if end_io is interleaved, which will be much better already.)

The question is how to figure out which request a bio belongs to. Another alternative might be to only print if it's a new error on the same path or if the last error on that path was reported NNN jiffies back. Comments solicited; maybe we want to discuss on the list too.

b) Identify more "interesting" places to log from, from a support perspective: what information will we need to track down problems in the field?
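The second alternative in (a), logging only when the error on a path is new or when the last report for that path is at least NNN jiffies old, could be sketched roughly as below. This is a userspace illustration, not code from the attached patch: `struct path_log_state`, `should_log_error()` and `REPORT_INTERVAL` are hypothetical names, and the `now` argument stands in for the kernel's `jiffies` counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-path logging state; not an actual dm-multipath structure. */
struct path_log_state {
    int last_error;            /* error code most recently reported for this path */
    unsigned long last_report; /* tick count when that report was made */
    bool reported;             /* false until the first report on this path */
};

/* Hypothetical stand-in for the "NNN jiffies" interval from the comment. */
#define REPORT_INTERVAL 500UL

/*
 * Decide whether an error on a path should be logged.
 * Returns true if nothing has been reported yet, if the error code differs
 * from the last one reported, or if at least REPORT_INTERVAL ticks have
 * passed since the last report. Updates the state whenever it returns true.
 * Unsigned subtraction keeps the comparison correct across counter wraparound.
 */
static bool should_log_error(struct path_log_state *st, int error,
                             unsigned long now)
{
    assert(st != NULL);

    if (st->reported && st->last_error == error &&
        now - st->last_report < REPORT_INTERVAL)
        return false; /* same error, reported recently: suppress */

    st->last_error = error;
    st->last_report = now;
    st->reported = true;
    return true;
}
```

A caller in an end_io handler would keep one such state per path and only emit its warning when `should_log_error()` returns true. Repeated bios failing with the same error in quick succession then produce a single message, while a different error on the same path is still reported immediately.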
Comment 4 Dave Wysochanski 2006-12-14 16:56:54 UTC
I do not think this patch ever got upstream.
Comment 5 Alasdair Kergon 2006-12-14 17:05:04 UTC
Indeed - the patch is unfinished: it added *too many* messages, so certain events could flood the logs.
Comment 9 Tom Coughlan 2008-02-07 18:31:30 UTC
The upstream bug from which this one was cloned is still open. There has been no progress there since 4/2005. This suggests that, although improvements in kernel error messages are needed, the problem is not causing significant difficulty. I checked with Ben, and he agrees that the lack of better kernel messages has not been a significant impediment to solving problems. It is clearly getting late in RHEL 4 to address this issue. So, I will close this, and expect the problem will be addressed eventually upstream and inherited/backported into the appropriate RHEL release at that time.