Bug 199622
Summary: | failed mirror log causes mirror I/O errors | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | Corey Marthaler <cmarthal> | ||||
Component: | kernel | Assignee: | Jonathan Earl Brassow <jbrassow> | ||||
Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> | ||||
Severity: | medium | Docs Contact: | |||||
Priority: | high | ||||||
Version: | 4.0 | CC: | agk, dwysocha, jbaron, jbrassow, k.georgiou, mbroz, rkenna | ||||
Target Milestone: | --- | ||||||
Target Release: | --- | ||||||
Hardware: | All | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | RHBA-2007-0304 | Doc Type: | Bug Fix | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2007-05-08 02:52:40 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
Bug Depends On: | 218946 | ||||||
Bug Blocks: | |||||||
Attachments: |
|
Description
Corey Marthaler
2006-07-20 19:46:20 UTC
This is a known issue, and unfortunately that is the way it is supposed to work. The log entry _must_ be written before a write can happen. If the log cannot be written, the write has three options:
1) delay until the mirror is reconfigured. This would be the best, but it requires core device-mapper changes to allow requeuing of I/O.
2) continue before the mirror is reconfigured. This has the potential of causing pain: if the write continues, and the machine doing the write dies before both mirrors are written _and_ the log device comes back to life on reboot, there *might* be corruption.
3) return EIO.
Because #1 is not an option, and #3 is safer than #2, we default to #3. HOWEVER, since #2 is such a contrived case, we offer the ability to use #2. You need only specify 'dm_mirror_error_on_log_failure = 0' as a module parameter when loading dm-mirror. Then you will not see the EIOs.

Defect fix required in the base kernel device mapper module. Changing product to RHEL and kernel component. Also will set the appropriate 4.5 flags for inclusion in the next update release.

This cannot be fixed until the ability to resubmit bios to the device-mapper core exists. Agk has added this to rhel5, but I don't see it in rhel4. Until this happens, people will have to make do with the workaround in comment #1.

Moving this out to 4.6. We will need to figure out how to address this in a rhel4 environment as well as rhel5.

Okay, there is already a fix for this in rhel5, so moving this back onto the 4.5 list and making it a blocker. We need to do this for rhel4.5. So, reverting all of my previous changes.

IRC discussion regarding fix for rhel4.5:
<visegrips_> agk_: do we plan on putting the requeue I/O patch into RHEL4.5? If not, why not?
<agk_> so you can see what operation actually happened to trigger the failure - then we need to find where that happens in the code
<agk_> and check the locking there
<agk_> the requeue thing: it's up to mbroz if he has time after doing the other patches still to go to 4.5
<visegrips_> agk_: k. I was just having a conversation about it with kanderso on #cluster
<agk_> the interface change was backwardly compatible
<agk_> so I can think of no reason not to do it except for lack of time
<visegrips_> agk_: there is a mirror bug that hinges on it
<visegrips_> (a pretty important one)
<agk_> then create a bug for tracking getting those patches in and make it a 4.5 blocker?
<kanderso> okay - am putting it back on the blocker list - it was already there prior to our discussion
<agk_> it seems the multipath tools change to make use of it was only one new line of code, so that's easy to do too

Reassigning to mbroz because the fix depends on the requeue patch for core device-mapper (which is already done in RHEL5). When that part of the fix is submitted, I'll take the bug back and push the changes to dm-raid1.c.

Created attachment 142436
Patch to fix problem - pending dm-suspend-add-noflush-pushback.patch
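The three options for a write whose log update fails, and how the push-back support in the device-mapper core changes the picture, can be modeled as a small decision function. This is a minimal sketch in plain C, not the dm-raid1.c code: `on_write`, `completion_error`, `enum write_outcome` and `struct mirror_policy` are hypothetical names, and the assumption that a requeue-capable core takes precedence over `dm_mirror_error_on_log_failure` is an illustration, not something stated in this report.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model (not dm-raid1.c) of what a mirror target can do
 * with a write when the dirty-region log update fails.
 */
enum write_outcome {
    WRITE_PROCEED, /* issue the write to all mirror legs */
    WRITE_REQUEUE, /* option 1: push the bio back to the core, retry after reconfiguration */
    WRITE_FAIL     /* option 3: complete the bio with -EIO */
};

struct mirror_policy {
    bool core_supports_requeue; /* assumption: core can push back / resubmit bios */
    bool error_on_log_failure;  /* models dm_mirror_error_on_log_failure (default 1) */
};

static enum write_outcome on_write(const struct mirror_policy *p, bool log_write_ok)
{
    assert(p != NULL);
    if (log_write_ok)
        return WRITE_PROCEED;   /* normal path: region marked in the log first */
    if (p->core_supports_requeue)
        return WRITE_REQUEUE;   /* option 1: hold the write until the mirror is reconfigured */
    if (p->error_on_log_failure)
        return WRITE_FAIL;      /* option 3: the default without requeue support */
    return WRITE_PROCEED;       /* option 2: write without a log record (workaround) */
}

/* Error the caller sees; requeued or proceeding bios are not failed here. */
static int completion_error(enum write_outcome o)
{
    return o == WRITE_FAIL ? -EIO : 0;
}
```

With the default setting and no requeue support, a failed log write surfaces as EIO to the caller. Setting the module parameter to 0 lets the write through without a log record (option 2), with the corruption risk described above if the node dies before both legs are written and the log device comes back.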
Patches prepared and sent to Jon for review. They will be sent to rhkernel as soon as possible in the patchset resolving bug 218946 (making this bug depend on it). Reassigning back to jbrassow for the dm raid1 patch.

Yes, I have a patch that seems to work. I am continuing to test it and have sent it to mbroz and agk for review before posting. I plan to post on 12/13/2006 unless more problems are uncovered.

Committed in stream U5 build 42.39. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/

Fix verified.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHBA-2007-0304.html