Bug 596318
Summary: | Basic cmirror device failure (with I/O running) is broken | |
---|---|---|---
Product: | Red Hat Enterprise Linux 6 | Reporter: | Corey Marthaler <cmarthal>
Component: | lvm2 | Assignee: | Jonathan Earl Brassow <jbrassow>
Status: | CLOSED CURRENTRELEASE | QA Contact: | Corey Marthaler <cmarthal>
Severity: | urgent | Docs Contact: |
Priority: | high | |
Version: | 6.0 | CC: | agk, antillon.maurizio, dwysocha, heinzm, jbrassow, joe.thornber, mbroz, msnitzer, pkrul, prajnoha, prockai
Target Milestone: | rc | Keywords: | Regression, TestBlocker
Target Release: | --- | |
Hardware: | All | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | lvm2-2.02.72-6.el6 | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2010-11-10 21:07:58 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 599016 | |
Attachments: | | |

Description
Corey Marthaler
2010-05-26 15:28:01 UTC
I'll attach the full logs, but here's the bit about the repair:

taft-01:
May 26 15:15:38 taft-01 lvm[19586]: Couldn't find device with uuid OL4s9P-QZXm-Gezs-RbjI-9mUu-6mQy-EEYHT0.
May 26 15:15:39 taft-01 lvm[19586]: Repair of mirrored LV helter_skelter/syncd_secondary_core_2legs_1 finished successfully.

taft-02:
May 26 10:14:05 taft-02 lvm[17402]: Error locking on node taft-04: LV helter_skelter/syncd_secondary_core_2legs_1_mimage_1 in use: not deactivating
May 26 10:14:05 taft-02 lvm[17402]: Repair of mirrored LV helter_skelter/syncd_secondary_core_2legs_1 failed.
May 26 10:14:05 taft-02 lvm[17402]: Failed to remove faulty devices in helter_skelter-syncd_secondary_core_2legs_1.
May 26 10:14:07 taft-02 lvm[17402]: No longer monitoring mirror device helter_skelter-syncd_secondary_core_2legs_1 for events.

taft-03:

taft-04:

Created attachment 416882
log from taft-01

Created attachment 416883
log from taft-02

Created attachment 416884
log from taft-03

Created attachment 416885
log from taft-04

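For anyone triaging the attached logs, the per-node repair result can be pulled out mechanically. The following is only a sketch: it assumes the syslog line layout seen in the excerpts above, and `LOG_LINES`, `REPAIR_RE`, and `repair_outcomes` are illustrative names, not part of lvm2 or any test harness.

```python
import re

# Sample lines copied from the log excerpts in this report.
LOG_LINES = [
    "May 26 15:15:38 taft-01 lvm[19586]: Couldn't find device with uuid OL4s9P-QZXm-Gezs-RbjI-9mUu-6mQy-EEYHT0.",
    "May 26 15:15:39 taft-01 lvm[19586]: Repair of mirrored LV helter_skelter/syncd_secondary_core_2legs_1 finished successfully.",
    "May 26 10:14:05 taft-02 lvm[17402]: Error locking on node taft-04: LV helter_skelter/syncd_secondary_core_2legs_1_mimage_1 in use: not deactivating",
    "May 26 10:14:05 taft-02 lvm[17402]: Repair of mirrored LV helter_skelter/syncd_secondary_core_2legs_1 failed.",
]

# Assumed layout: "<Mon> <day> <time> <node> lvm[<pid>]: Repair of mirrored LV <lv> <result>."
REPAIR_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<node>\S+)\s+lvm\[\d+\]:\s+"
    r"Repair of mirrored LV (?P<lv>\S+) (?P<result>finished successfully|failed)\.$"
)


def repair_outcomes(lines):
    """Return {node: (lv, "ok" | "failed")} for each repair message found."""
    outcomes = {}
    for line in lines:
        match = REPAIR_RE.match(line.strip())
        if match:
            status = "ok" if match.group("result") == "finished successfully" else "failed"
            outcomes[match.group("node")] = (match.group("lv"), status)
    return outcomes


if __name__ == "__main__":
    for node, (lv, status) in sorted(repair_outcomes(LOG_LINES).items()):
        print(node, lv, status)
```

Nodes that logged no repair message (taft-03 and taft-04 here) simply do not appear in the result, which matches the empty entries above.
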
This appears to be a cluster mirror issue only. Local machine mirrors "work"; there are other issues, such as bug 596367, but the basic functionality is there.

Corey, please try again without udev running - we think udev is getting in the way. Once we know whose fault this is, we can proceed to fix.

cmirror creation doesn't appear to work without udev running, so I'm not sure how to tell if udev is the problem here.

I tried this same simple fault injection case with the latest patched build and saw the exact same results, both without killing udev before the failure and with killing udev before the failure. Not sure where to go from here...

2.6.32-25.el6.x86_64
lvm2-2.02.67-1.6.el6 BUILT: Thu Jun 17 10:54:32 CDT 2010
lvm2-libs-2.02.67-1.6.el6 BUILT: Thu Jun 17 10:54:32 CDT 2010
lvm2-cluster-2.02.67-1.6.el6 BUILT: Thu Jun 17 10:54:32 CDT 2010
device-mapper-1.02.49-1.6.el6 BUILT: Thu Jun 17 10:54:32 CDT 2010
device-mapper-libs-1.02.49-1.6.el6 BUILT: Thu Jun 17 10:54:32 CDT 2010
device-mapper-event-1.02.49-1.6.el6 BUILT: Thu Jun 17 10:54:32 CDT 2010
device-mapper-event-libs-1.02.49-1.6.el6 BUILT: Thu Jun 17 10:54:32 CDT 2010
cmirror-2.02.67-1.6.el6 BUILT: Thu Jun 17 10:54:32 CDT 2010

FYI - if I run this testcase w/o any I/O load (the only I/O being a dd in order to force the repair), then cmirror device failure works.

I *think* this is the same problem as bug 596453 and friends (from looking at the logs, although to confirm this I would need to have more of the logs). Jon, if you disagree, please flip this back to ASSIGNED.

There is now a basic level of device failure functionality wrt cmirrors in the latest build. Other, less basic device failure bugs still exist, however. Marking this bug verified.

2.6.32-59.1.el6.x86_64
lvm2-2.02.72-7.el6 BUILT: Wed Aug 11 17:12:24 CDT 2010
lvm2-libs-2.02.72-7.el6 BUILT: Wed Aug 11 17:12:24 CDT 2010
lvm2-cluster-2.02.72-7.el6 BUILT: Wed Aug 11 17:12:24 CDT 2010
udev-147-2.22.el6 BUILT: Fri Jul 23 07:21:33 CDT 2010
device-mapper-1.02.53-7.el6 BUILT: Wed Aug 11 17:12:24 CDT 2010
device-mapper-libs-1.02.53-7.el6 BUILT: Wed Aug 11 17:12:24 CDT 2010
device-mapper-event-1.02.53-7.el6 BUILT: Wed Aug 11 17:12:24 CDT 2010
device-mapper-event-libs-1.02.53-7.el6 BUILT: Wed Aug 11 17:12:24 CDT 2010
cmirror-2.02.72-7.el6 BUILT: Wed Aug 11 17:12:24 CDT 2010

Red Hat Enterprise Linux 6.0 is now available and should resolve the problem described in this bug report. This report is therefore being closed with a resolution of CURRENTRELEASE. You may reopen this bug report if the solution does not work for you.
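
To check whether a given lvm2 build is at or past the "Fixed In Version" recorded for this bug, a rough comparison like the one below may help. It is a sketch under a stated simplification: only the numeric dot-separated parts of the version and release are compared, which is not the full RPM version comparison algorithm. `FIXED_IN`, `nvr_key`, and `has_fix` are hypothetical helper names.

```python
FIXED_IN = "lvm2-2.02.72-6.el6"  # "Fixed In Version" field of this report


def nvr_key(nvr):
    """Split name-version-release and return a sortable (version, release) key.

    Simplification: non-numeric components such as "el6" are ignored, so this
    is only an approximation of how rpm orders packages.
    """
    _name, version, release = nvr.rsplit("-", 2)

    def nums(text):
        return tuple(int(part) for part in text.split(".") if part.isdigit())

    return nums(version), nums(release)


def has_fix(installed, fixed_in=FIXED_IN):
    """Return True if the installed build is at or past the fixed build."""
    return nvr_key(installed) >= nvr_key(fixed_in)


if __name__ == "__main__":
    # The two lvm2 builds quoted in the comments of this report.
    for build in ("lvm2-2.02.67-1.6.el6", "lvm2-2.02.72-7.el6"):
        print(build, has_fix(build))
```

Under this approximation, the build that reproduced the failure (lvm2-2.02.67-1.6.el6) sorts before the fixed version, and the build used for verification (lvm2-2.02.72-7.el6) sorts after it.
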