Bug 1380532
Summary: certain repair of cache raid volumes fails "Cannot convert internal LV"

Product: Red Hat Enterprise Linux 7
Reporter: Corey Marthaler <cmarthal>
Component: lvm2
Assignee: Heinz Mauelshagen <heinzm>
lvm2 sub component: Cache Logical Volumes
QA Contact: cluster-qe <cluster-qe>
Status: CLOSED ERRATA
Docs Contact: Milan Navratil <mnavrati>
Severity: medium
Priority: high
CC: agk, heinzm, jbrassow, lmanasko, msnitzer, mthacker, pasik, prajnoha, rbednar, salmy, tlavigne, yizhan, zkabelac
Version: 7.3
Keywords: Regression, ZStream
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: lvm2-2.02.169-1.el7
Doc Type: Bug Fix
Doc Text:
"lvconvert --repair" now works properly on cache logical volumes
Due to a regression in the lvm2-2.02.166-1.el7 package, released in Red Hat Enterprise Linux 7.3, the "lvconvert --repair" command could not be run properly on cache logical volumes. As a consequence, the `Cannot convert internal LV` error occurred. The underlying source code has been modified to fix this bug, and "lvconvert --repair" now works as expected.
Story Points: ---
Clone Of:
: 1383925 (view as bug list)
Environment:
Last Closed: 2017-08-01 21:47:18 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1383925
Attachments:
Description Corey Marthaler 2016-09-29 21:57:43 UTC
This appears to affect only cache RAID repair when the fault policy is "allocate"; "warn" with a manual repair attempt appears to succeed. Reproduced with raid_fault_policy="allocate": "lvconvert --repair black_bird/synced_primary_raid6_4legs_1_corig" fails here with "Cannot convert internal LV ...".

Is this a regression at all?

1. Verified that a manual repair is never actually attempted during the passing 'warn' cases, due to the restriction that cache and pool RAIDs need to be inactive, so disregard comment #1.
2. Verified that this same test case passed in 7.2.z as well as in a previous version of RHEL 7.3:

3.10.0-510.el7.x86_64
lvm2-2.02.161-3.el7                        BUILT: Thu Jul 28 09:31:24 CDT 2016
lvm2-libs-2.02.161-3.el7                   BUILT: Thu Jul 28 09:31:24 CDT 2016
lvm2-cluster-2.02.161-3.el7                BUILT: Thu Jul 28 09:31:24 CDT 2016
device-mapper-1.02.131-3.el7               BUILT: Thu Jul 28 09:31:24 CDT 2016
device-mapper-libs-1.02.131-3.el7          BUILT: Thu Jul 28 09:31:24 CDT 2016
device-mapper-event-1.02.131-3.el7         BUILT: Thu Jul 28 09:31:24 CDT 2016
device-mapper-event-libs-1.02.131-3.el7    BUILT: Thu Jul 28 09:31:24 CDT 2016
device-mapper-persistent-data-0.6.3-1.el7  BUILT: Fri Jul 22 05:29:13 CDT 2016

Oct 4 11:17:01 host-116 lvm[3815]: Device #0 of raid1 array, black_bird-synced_primary_raid1_2legs_1_cdata, has failed.
Oct 4 11:17:01 host-116 lvm[3815]: WARNING: Device for PV dlewYC-gbDe-njA2-GeRl-jnev-zJ0I-p92C5V not found or rejected by a filter.
Oct 4 11:17:01 host-116 lvm[3815]: WARNING: Couldn't find all devices for LV black_bird/synced_primary_raid1_2legs_1_cdata_rimage_0 while checking used and assumed devices.
Oct 4 11:17:01 host-116 lvm[3815]: WARNING: Couldn't find all devices for LV black_bird/synced_primary_raid1_2legs_1_cdata_rmeta_0 while checking used and assumed devices.
Oct 4 11:17:01 host-116 lvm[3815]: WARNING: Device for PV dlewYC-gbDe-njA2-GeRl-jnev-zJ0I-p92C5V already missing, skipping.
Oct 4 11:17:01 host-116 lvm[3815]: WARNING: Device for PV dlewYC-gbDe-njA2-GeRl-jnev-zJ0I-p92C5V not found or rejected by a filter.
Oct 4 11:17:01 host-116 lvm[3815]: WARNING: Couldn't find all devices for LV black_bird/synced_primary_raid1_2legs_1_cdata_rimage_0 while checking used and assumed devices.
Oct 4 11:17:01 host-116 lvm[3815]: WARNING: Couldn't find all devices for LV black_bird/synced_primary_raid1_2legs_1_cdata_rmeta_0 while checking used and assumed devices.
Oct 4 11:17:02 host-116 kernel: device-mapper: raid: Device 0 specified for rebuild; clearing superblock
Oct 4 11:17:02 host-116 kernel: md/raid1:mdX: active with 1 out of 2 mirrors
Oct 4 11:17:02 host-116 kernel: created bitmap (1 pages) for device mdX
Oct 4 11:17:03 host-116 kernel: mdX: bitmap initialized from disk: read 1 pages, set 0 of 1000 bits
Oct 4 11:17:03 host-116 kernel: md: recovery of RAID array mdX
Oct 4 11:17:03 host-116 kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Oct 4 11:17:03 host-116 kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Oct 4 11:17:03 host-116 kernel: md: using 128k window, over a total of 512000k.
Oct 4 11:17:03 host-116 multipathd: dm-2: remove map (uevent)
Oct 4 11:17:03 host-116 multipathd: dm-2: devmap not registered, can't remove
Oct 4 11:17:03 host-116 multipathd: dm-3: remove map (uevent)
Oct 4 11:17:03 host-116 multipathd: dm-3: devmap not registered, can't remove
Oct 4 11:17:03 host-116 multipathd: dm-13: remove map (uevent)
Oct 4 11:17:03 host-116 multipathd: dm-13: devmap not registered, can't remove
Oct 4 11:17:03 host-116 systemd: Started qarsh Per-Connection Server (10.15.80.224:51342).
Oct 4 11:17:03 host-116 systemd: Starting qarsh Per-Connection Server (10.15.80.224:51342)...
Oct 4 11:17:03 host-116 multipathd: dm-12: remove map (uevent)
Oct 4 11:17:03 host-116 multipathd: dm-12: devmap not registered, can't remove
Oct 4 11:17:03 host-116 multipathd: dm-2: remove map (uevent)
Oct 4 11:17:03 host-116 multipathd: dm-3: remove map (uevent)
Oct 4 11:17:03 host-116 qarshd[6698]: Talking to peer ::ffff:10.15.80.224:51342 (IPv6)
Oct 4 11:17:03 host-116 multipathd: dm-12: remove map (uevent)
Oct 4 11:17:03 host-116 multipathd: dm-13: remove map (uevent)
Oct 4 11:17:03 host-116 qarshd[6698]: Running cmdline: pvs -a
Oct 4 11:17:03 host-116 kernel: md/raid1:mdX: active with 1 out of 2 mirrors
Oct 4 11:17:03 host-116 kernel: created bitmap (1 pages) for device mdX
Oct 4 11:17:04 host-116 kernel: md: mdX: recovery interrupted.
Oct 4 11:17:04 host-116 kernel: mdX: bitmap initialized from disk: read 1 pages, set 0 of 1000 bits
Oct 4 11:17:04 host-116 kernel: md: recovery of RAID array mdX
Oct 4 11:17:04 host-116 kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Oct 4 11:17:04 host-116 kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Oct 4 11:17:04 host-116 kernel: md: using 128k window, over a total of 512000k.
Oct 4 11:17:04 host-116 kernel: md: resuming recovery of mdX from checkpoint.
Oct 4 11:17:04 host-116 lvm[3815]: Faulty devices in black_bird/synced_primary_raid1_2legs_1_cdata successfully replaced.
Oct 4 11:17:04 host-116 systemd: Started qarsh Per-Connection Server (10.15.80.224:51344).
Oct 4 11:17:04 host-116 systemd: Starting qarsh Per-Connection Server (10.15.80.224:51344)...
Oct 4 11:17:04 host-116 qarshd[6751]: Talking to peer ::ffff:10.15.80.224:51344 (IPv6)
Oct 4 11:17:04 host-116 qarshd[6751]: Running cmdline: tail -n1 /var/log/messages | cut -c 1-12
Oct 4 11:17:04 host-116 systemd: Started qarsh Per-Connection Server (10.15.80.224:51346).
Oct 4 11:17:04 host-116 systemd: Starting qarsh Per-Connection Server (10.15.80.224:51346)...
Oct 4 11:17:04 host-116 qarshd[6757]: Talking to peer ::ffff:10.15.80.224:51346 (IPv6)
Oct 4 11:17:05 host-116 qarshd[6757]: Running cmdline: dd if=/dev/zero of=/mnt/synced_primary_raid1_2legs_1/ddfile count=10 bs=4M
Oct 4 11:17:05 host-116 systemd: Started qarsh Per-Connection Server (10.15.80.224:51348).
Oct 4 11:17:05 host-116 systemd: Starting qarsh Per-Connection Server (10.15.80.224:51348)...
Oct 4 11:17:05 host-116 qarshd[6761]: Talking to peer ::ffff:10.15.80.224:51348 (IPv6)
Oct 4 11:17:05 host-116 qarshd[6761]: Running cmdline: sync
Oct 4 11:17:08 host-116 kernel: md: mdX: recovery done.
Oct 4 11:17:08 host-116 lvm[3815]: Device #0 of raid1 array, black_bird-synced_primary_raid1_2legs_1_cdata, has failed.
Oct 4 11:17:08 host-116 lvm[3815]: WARNING: Device for PV dlewYC-gbDe-njA2-GeRl-jnev-zJ0I-p92C5V not found or rejected by a filter.
Oct 4 11:17:08 host-116 lvm[3815]: WARNING: Device for PV dlewYC-gbDe-njA2-GeRl-jnev-zJ0I-p92C5V not found or rejected by a filter.
Oct 4 11:17:08 host-116 lvm[3815]: Faulty devices in black_bird/synced_primary_raid1_2legs_1_cdata successfully replaced.

There is a work-around for this issue (I'll elaborate on that in the next comment). It should not cause hangs or corruption, only inconvenience. As such, I am pushing for a 7.4 fix with 7.3.z inclusion.

A one-liner fix to lvconvert.c, allowing the rebuild of the corig internal LV, is in testing...

Fix pushed upstream: allow repair on cache origin RAID LVs, and "lvconvert --replace/--mirrors/--type {raid*|mirror|striped|linear}" as well. Allow the same lvconvert actions on any cache pool and metadata RAID sub-LVs.

Created attachment 1272293 [details]
test result
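For orientation, a minimal sketch of the configuration this bug hits, assuming a VG named black_bird with spare PVs available for reallocation; the LV and device names below are hypothetical stand-ins, not the exact test setup:

# raid_fault_policy setting that triggers the automatic repair path (lvm.conf):
#   activation {
#       raid_fault_policy = "allocate"
#   }

# A cache LV whose origin is a RAID LV (hypothetical names):
lvcreate --type raid1 -m 1 -L 500M -n origin_raid black_bird /dev/sdb1 /dev/sdc1
lvcreate --type cache-pool -L 256M -n cpool black_bird /dev/sdd1
lvconvert --cache --cachepool black_bird/cpool black_bird/origin_raid

# Caching hides the RAID as the internal sub-LV origin_raid_corig.
# After a leg of that RAID fails, the repair that regressed:
lvconvert --repair black_bird/origin_raid_corig
#   lvm2-2.02.166: fails with "Cannot convert internal LV ..."
#   lvm2-2.02.169 (fix): faulty devices are replaced, as in the 7.2.z log above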
Thank you for the explanation; I was not aware of this behaviour.
Marking verified.
3.10.0-640.el7.x86_64
lvm2-2.02.169-3.el7 BUILT: Wed Mar 29 16:17:46 CEST 2017
lvm2-libs-2.02.169-3.el7 BUILT: Wed Mar 29 16:17:46 CEST 2017
lvm2-cluster-2.02.169-3.el7 BUILT: Wed Mar 29 16:17:46 CEST 2017
device-mapper-1.02.138-3.el7 BUILT: Wed Mar 29 16:17:46 CEST 2017
device-mapper-libs-1.02.138-3.el7 BUILT: Wed Mar 29 16:17:46 CEST 2017
device-mapper-event-1.02.138-3.el7 BUILT: Wed Mar 29 16:17:46 CEST 2017
device-mapper-event-libs-1.02.138-3.el7 BUILT: Wed Mar 29 16:17:46 CEST 2017
device-mapper-persistent-data-0.7.0-0.1.rc6.el7 BUILT: Mon Mar 27 17:15:46 CEST 2017
cmirror-2.02.169-3.el7 BUILT: Wed Mar 29 16:17:46 CEST 2017
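A hedged sketch of the kind of manual re-check behind this verification (the recorded run is in the "test result" attachment; the LV name follows the report):

lvconvert --repair black_bird/synced_primary_raid6_4legs_1_corig
# accepted on the fixed build; on 2.02.166 it failed with "Cannot convert internal LV ..."
lvs -a -o name,segtype,attr,devices black_bird
# confirm the replaced rimage_*/rmeta_* sub-LVs of the cache origin are present and resyncing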
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2222