Bug 1399844
| Summary: | LVM RAID: Unable to refresh transiently failed device | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Jonathan Earl Brassow <jbrassow> |
| Component: | lvm2 | Assignee: | Heinz Mauelshagen <heinzm> |
| lvm2 sub component: | Mirroring and RAID | QA Contact: | cluster-qe <cluster-qe> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | unspecified | Priority: | unspecified |
| Version: | 7.2 | CC: | agk, cmarthal, heinzm, jbrassow, msnitzer, prajnoha, prockai, zkabelac |
| Target Milestone: | rc | Target Release: | --- |
| Hardware: | Unspecified | OS: | Unspecified |
| Whiteboard: | | Fixed In Version: | lvm2-2.02.169-1.el7 |
| Doc Type: | If docs needed, set a value | Doc Text: | |
| Story Points: | --- | Clone Of: | |
| Environment: | | Last Closed: | 2017-08-01 21:49:49 UTC |
| Type: | Bug | Regression: | --- |
| Mount Type: | --- | Documentation: | --- |
| CRM: | | Verified Versions: | |
| Category: | --- | oVirt Team: | --- |
| RHEL 7.3 requirements from Atomic Host: | | Cloudforms Team: | --- |
| Target Upstream Version: | | Embargoed: | |
| Bug Depends On: | 1430028 | Bug Blocks: | 1385242 |
| Attachments: | | | |
Description
Jonathan Earl Brassow
2016-11-29 21:36:38 UTC
Jon, I had to run "lvchange --refresh $lv" twice after your PV offline, vgchange -an, vgchange -ay, PV online scenario to make it happen on recent 7.3. Does the same apply to 7.2?
It works after two "lvchange --refresh ..." runs (why two are needed requires
further clarification) on RHEL 7.2 without lvmetad (with lvmetad,
"pvscan --cache /dev/sdb" is necessary to update the lvmetad cache),
lvm2 2.02.130(2)-RHEL7, kernel 3.10.0-327.el7.x86_64 with
dm-raid target 1.0.7:
[root@vm102 ~]# lvcreate -m1 --ty raid1 -L256 -nr --nosync ssd
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
WARNING: New raid1 won't be synchronised. Don't read what you didn't write!
WARNING: ext4 signature detected on /dev/ssd/r at offset 1080. Wipe it? [y/n]: y
Wiping ext4 signature on /dev/ssd/r.
Logical volume "r" created.
[root@vm102 ~]# lvs -a -o name,attr,size,segtype,syncpercent,devices ssd
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
LV Attr LSize Type Cpy%Sync Devices
r Rwi-a-r--- 256.00m raid1 100.00 r_rimage_0(0),r_rimage_1(0)
[r_rimage_0] iwi-aor--- 256.00m linear /dev/sda(1)
[r_rimage_1] iwi-aor--- 256.00m linear /dev/sdb(1)
[r_rmeta_0] ewi-aor--- 4.00m linear /dev/sda(0)
[r_rmeta_1] ewi-aor--- 4.00m linear /dev/sdb(0)
[root@vm102 ~]# mkfs -t ext4 /dev/ssd/r
mke2fs 1.42.9 (28-Dec-2013)
Discarding device blocks: done
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
Stride=0 blocks, Stripe width=0 blocks
65536 inodes, 262144 blocks
13107 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=33816576
32 block groups
8192 blocks per group, 8192 fragments per group
2048 inodes per group
Superblock backups stored on blocks:
8193, 24577, 40961, 57345, 73729, 204801, 221185
Allocating group tables: done
Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done
[root@vm102 ~]# echo offline > /sys/block/sdb/device/state
[root@vm102 ~]# lvs -a -o name,attr,size,segtype,syncpercent,devices ssd
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
Couldn't find device with uuid KC4elc-LlBl-jCVC-eBF9-lrZc-LWSd-veEOGW.
LV Attr LSize Type Cpy%Sync Devices
r Rwi-a-r-p- 256.00m raid1 100.00 r_rimage_0(0),r_rimage_1(0)
[r_rimage_0] iwi-aor--- 256.00m linear /dev/sda(1)
[r_rimage_1] iwi-aor-p- 256.00m linear unknown device(1)
[r_rmeta_0] ewi-aor--- 4.00m linear /dev/sda(0)
[r_rmeta_1] ewi-aor-p- 4.00m linear unknown device(0)
[root@vm102 ~]# fsck -fn /dev/ssd/r
fsck from util-linux 2.23.2
e2fsck 1.42.9 (28-Dec-2013)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/mapper/ssd-r: 11/65536 files (0.0% non-contiguous), 18535/262144 blocks
[root@vm102 ~]# dmsetup status ssd-r
0 524288 raid raid1 2 AD 524288/524288 idle 0
[root@vm102 ~]# dmsetup table ssd-r
0 524288 raid raid1 4 0 nosync region_size 1024 2 253:2 253:3 253:4 253:5
[root@vm102 ~]# vgchange -an ssd
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
Couldn't find device with uuid KC4elc-LlBl-jCVC-eBF9-lrZc-LWSd-veEOGW.
0 logical volume(s) in volume group "ssd" now active
[root@vm102 ~]# vgchange -ay ssd
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
Couldn't find device with uuid KC4elc-LlBl-jCVC-eBF9-lrZc-LWSd-veEOGW.
1 logical volume(s) in volume group "ssd" now active
[root@vm102 ~]# dmsetup status ssd-r
0 524288 raid raid1 2 AA 524288/524288 idle 0
[root@vm102 ~]# dmsetup table ssd-r
0 524288 raid raid1 3 0 region_size 1024 2 253:2 253:3 - -
[root@vm102 ~]# fsck -fn /dev/ssd/r
fsck from util-linux 2.23.2
e2fsck 1.42.9 (28-Dec-2013)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/mapper/ssd-r: 11/65536 files (0.0% non-contiguous), 18535/262144 blocks
[root@vm102 ~]# lvchange --refresh ssd/r
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
Couldn't find device with uuid KC4elc-LlBl-jCVC-eBF9-lrZc-LWSd-veEOGW.
Refusing refresh of partial LV ssd/r. Use '--activationmode partial' to override.
[root@vm102 ~]# for d in /dev/sd*;do echo running > /sys/block/`basename $d`/device/state;done
[root@vm102 ~]# lvchange --refresh ssd/r
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
[root@vm102 ~]# dmsetup table ssd-r
0 524288 raid raid1 3 0 region_size 1024 2 253:2 253:3 - -
[root@vm102 ~]# lvchange --refresh ssd/r
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
[root@vm102 ~]# dmsetup table ssd-r
0 524288 raid raid1 3 0 region_size 1024 2 253:2 253:3 253:5 253:7
So the workaround is two "lvchange --refresh ..." runs after the PV holding the RAID
components has been made accessible again, as mentioned before.
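Condensed, that recovery step looks like this (a sketch assuming the same VG/LV names (ssd/r) and failing disk (/dev/sdb) as in the transcript above; the pvscan step is only needed when lvmetad is enabled):
# Bring the transiently failed PV back and refresh the RaidLV twice
echo running > /sys/block/sdb/device/state   # device is reachable again
pvscan --cache /dev/sdb                      # only with lvmetad enabled
lvchange --refresh ssd/r                     # first run: reloads the SubLV mappings
lvchange --refresh ssd/r                     # second run: the raid table picks the legs up again
dmsetup table ssd-r                          # should list all four SubLV mappings again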
Analyzing "lvchange -vvvv --refresh ..." shows the linear tables for *_r{meta|rimage]_1" are being reloaded together with the raid1 table
"0 524288 raid raid1 3 0 region_size 1024 2 253:2 253:3 253:5 253:7"
are being processed in the _first_ refresh, still the table output is
"0 524288 raid raid1 3 0 region_size 1024 2 253:2 253:3 - -"
until after the _second_ refresh run.
Excerpt from "lvchange -vvvv --refresh ..." of first run:
#libdm-deptree.c:2732 Loading ssd-r table (253:8)
#libdm-deptree.c:2676 Adding target to (253:8): 0 524288 raid raid1 3 0 region_size 1024 2 253:2 253:3 253:5 253:7
#ioctl/libdm-iface.c:1832 dm table (253:8) OF [16384] (*1)
#ioctl/libdm-iface.c:1832 dm reload (253:8) NF [16384] (*1)
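For reference, one way to pull these lines out of the full verbose trace is a simple filter (a sketch; the exact message strings can vary between lvm2/libdevmapper versions):
# Reduce the -vvvv refresh trace to table loads, reloads and resumes
lvchange -vvvv --refresh ssd/r 2>&1 | grep -E 'Loading .* table|Adding target|dm (reload|resume|table)'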
Jon, can you confirm this behaviour?
Created attachment 1226348 [details]
First "lvchange -vvvv --refresh ... " run output
Created attachment 1226350 [details]
Second "lvchange -vvvv --refresh ..." run output
Transient device failures aren't supported by lvm yet.
In the first refresh run, the "*_r{meta|image}_1" SubLVs still contain mappings to error targets (the *_missing_* devices), which causes the dm-raid constructor to fail when reading the metadata. This is because the linear mappings are loaded into the inactive slots of the respective mapped devices but are only resumed _after_ the raid target has been loaded (they would need to be resumed prior to the raid1 target load).
In the second refresh run, the linear mappings are active so the raid constructor succeeds reading the RAID superblock.
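This state can be observed from the command line (a diagnostic sketch, assuming the usual vg-lv device-mapper naming of the SubLVs seen in the transcript above):
# After the first refresh the SubLV mappings are linear again ...
dmsetup table ssd-r_rimage_1   # maps onto the returned PV instead of an error target
dmsetup table ssd-r_rmeta_1
# ... but the top-level RaidLV still runs the degraded table
dmsetup table ssd-r            # "... 253:2 253:3 - -" until the second refresh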
Discussed the implications seen in the traces with Zdenek, who has ideas on how to improve this situation generically (I'll let him describe what he has in mind).
So the workaround is the double "lvchange --refresh ..." until we have generic enhancements for handling transient device failures.
Upstream commit 0b8bf73a63d8 avoids the need to run two "lvchange --refresh ..." until we have enhanced lvm to handle transient failures better.
The commit from Comment 7 is only a temporary hack which doesn't work with the locking mechanism we currently use: only the top-level LV has a lock, so any attempt to suspend and resume remotely active SubLVs will do nothing. The trouble we have here is that our locking/activation code is 'free' to change and replace missing segments of an LV without these changes being stored in the lvm2 metadata. So whenever the next command runs, it has no idea the table content differs from the lvm2 metadata state; at the moment no such deep revalidation of a device is done. But this is still not such a big issue when we consider that lvm2 has no plan yet to support transient device failures. State as of now: when lvm2 detects a missing PV, that PV should be marked MISSING_PV and its reattachment to the VG should happen only when the user requests 'vgextend --restoremissing'. So we really cannot claim that we support 'transient' disk failures; there needs to be a plan for it. We may even need to dedicate a PV as a raid leg for this, but then we are not much different from direct 'mdadm' usage.
(In reply to Zdenek Kabelac from comment #8)
> The commit from Comment 7 is only a temporary hack which doesn't work with
> the locking mechanism we currently use: only the top-level LV has a lock, so
> any attempt to suspend and resume remotely active SubLVs will do nothing.
The patch is aiming at convenience until we come up with a concept to deal with sane preload/resume sequencing. The cluster case, where a user requests a refresh on one node while the RaidLV is exclusively activated on another, is rather rare.
<SNIP>
Posted upstream commit 87117c2b2546 in addition to 0b8bf73a63d8 to cope with remotely active, clustered RaidLVs.
Upstream commit 95d68f1d0e16 (and kernel patch "[dm-devel][PATCH] dm raid: fix transient device failure processing").
Marking verified with the latest rpms.
3.10.0-688.el7.x86_64
lvm2-2.02.171-7.el7    BUILT: Thu Jun 22 08:35:15 CDT 2017
lvm2-libs-2.02.171-7.el7    BUILT: Thu Jun 22 08:35:15 CDT 2017
lvm2-cluster-2.02.171-7.el7    BUILT: Thu Jun 22 08:35:15 CDT 2017
device-mapper-1.02.140-7.el7    BUILT: Thu Jun 22 08:35:15 CDT 2017
device-mapper-libs-1.02.140-7.el7    BUILT: Thu Jun 22 08:35:15 CDT 2017
device-mapper-event-1.02.140-7.el7    BUILT: Thu Jun 22 08:35:15 CDT 2017
device-mapper-event-libs-1.02.140-7.el7    BUILT: Thu Jun 22 08:35:15 CDT 2017
device-mapper-persistent-data-0.7.0-0.1.rc6.el7    BUILT: Mon Mar 27 10:15:46 CDT 2017
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2017:2222
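As a footnote to the comment above about the currently supported recovery path for a missing PV (a sketch, assuming the same VG and device names as in the transcript; this reinstates a PV marked MISSING_PV rather than refreshing the RaidLV in place):
# Supported way to reattach a PV that was marked MISSING_PV once it returns
pvscan --cache /dev/sdb                  # with lvmetad: let it see the device again
vgextend --restoremissing ssd /dev/sdb   # restore the PV into the VG
lvs -a -o name,attr,devices ssd          # verify the legs point at real devices again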