Bug 1399844
| Summary: | LVM RAID: Unable to refresh transiently failed device | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Jonathan Earl Brassow <jbrassow> |
| Component: | lvm2 | Assignee: | Heinz Mauelshagen <heinzm> |
| lvm2 sub component: | Mirroring and RAID | QA Contact: | cluster-qe <cluster-qe> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | unspecified | Priority: | unspecified |
| Version: | 7.2 | CC: | agk, cmarthal, heinzm, jbrassow, msnitzer, prajnoha, prockai, zkabelac |
| Target Milestone: | rc | Target Release: | --- |
| Hardware: | Unspecified | OS: | Unspecified |
| Whiteboard: | | Fixed In Version: | lvm2-2.02.169-1.el7 |
| Doc Type: | If docs needed, set a value | Doc Text: | |
| Story Points: | --- | Clone Of: | |
| Environment: | | Last Closed: | 2017-08-01 21:49:49 UTC |
| Type: | Bug | Regression: | --- |
| Mount Type: | --- | Documentation: | --- |
| CRM: | | Verified Versions: | |
| Category: | --- | oVirt Team: | --- |
| RHEL 7.3 requirements from Atomic Host: | | Cloudforms Team: | --- |
| Target Upstream Version: | | Embargoed: | |
| Bug Depends On: | 1430028 | Bug Blocks: | 1385242 |
| Attachments: | | | |
Description
Jonathan Earl Brassow
2016-11-29 21:36:38 UTC
Jon, I had to run "lvchange --refresh $lv" twice after your PV offline, vgchange -an, vgchange -ay, PV online scenario to make it happen on recent 7.3. Does the same apply to 7.2?
It works after two "lvchange --refresh ..." runs (why two are needed requires
further clarification) on RHEL 7.2 without lvmetad (with lvmetad,
"pvscan --cache /dev/sdb" is necessary to update the lvmetad cache),
lvm2 2.02.130(2)-RHEL7, kernel 3.10.0-327.el7.x86_64 with
dm-raid target 1.0.7:
[root@vm102 ~]# lvcreate -m1 --ty raid1 -L256 -nr --nosync ssd
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
WARNING: New raid1 won't be synchronised. Don't read what you didn't write!
WARNING: ext4 signature detected on /dev/ssd/r at offset 1080. Wipe it? [y/n]: y
Wiping ext4 signature on /dev/ssd/r.
Logical volume "r" created.
[root@vm102 ~]# lvs -a -o name,attr,size,segtype,syncpercent,devices ssd
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
LV Attr LSize Type Cpy%Sync Devices
r Rwi-a-r--- 256.00m raid1 100.00 r_rimage_0(0),r_rimage_1(0)
[r_rimage_0] iwi-aor--- 256.00m linear /dev/sda(1)
[r_rimage_1] iwi-aor--- 256.00m linear /dev/sdb(1)
[r_rmeta_0] ewi-aor--- 4.00m linear /dev/sda(0)
[r_rmeta_1] ewi-aor--- 4.00m linear /dev/sdb(0)
[root@vm102 ~]# mkfs -t ext4 /dev/ssd/r
mke2fs 1.42.9 (28-Dec-2013)
Discarding device blocks: done
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
Stride=0 blocks, Stripe width=0 blocks
65536 inodes, 262144 blocks
13107 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=33816576
32 block groups
8192 blocks per group, 8192 fragments per group
2048 inodes per group
Superblock backups stored on blocks:
8193, 24577, 40961, 57345, 73729, 204801, 221185
Allocating group tables: done
Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done
[root@vm102 ~]# echo offline > /sys/block/sdb/device/state
[root@vm102 ~]# lvs -a -o name,attr,size,segtype,syncpercent,devices ssd
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
Couldn't find device with uuid KC4elc-LlBl-jCVC-eBF9-lrZc-LWSd-veEOGW.
LV Attr LSize Type Cpy%Sync Devices
r Rwi-a-r-p- 256.00m raid1 100.00 r_rimage_0(0),r_rimage_1(0)
[r_rimage_0] iwi-aor--- 256.00m linear /dev/sda(1)
[r_rimage_1] iwi-aor-p- 256.00m linear unknown device(1)
[r_rmeta_0] ewi-aor--- 4.00m linear /dev/sda(0)
[r_rmeta_1] ewi-aor-p- 4.00m linear unknown device(0)
[root@vm102 ~]# fsck -fn /dev/ssd/r
fsck from util-linux 2.23.2
e2fsck 1.42.9 (28-Dec-2013)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/mapper/ssd-r: 11/65536 files (0.0% non-contiguous), 18535/262144 blocks
[root@vm102 ~]# dmsetup status ssd-r
0 524288 raid raid1 2 AD 524288/524288 idle 0
[root@vm102 ~]# dmsetup table ssd-r
0 524288 raid raid1 4 0 nosync region_size 1024 2 253:2 253:3 253:4 253:5
[root@vm102 ~]# vgchange -an ssd
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
Couldn't find device with uuid KC4elc-LlBl-jCVC-eBF9-lrZc-LWSd-veEOGW.
0 logical volume(s) in volume group "ssd" now active
[root@vm102 ~]# vgchange -ay ssd
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
Couldn't find device with uuid KC4elc-LlBl-jCVC-eBF9-lrZc-LWSd-veEOGW.
1 logical volume(s) in volume group "ssd" now active
[root@vm102 ~]# dmsetup status ssd-r
0 524288 raid raid1 2 AA 524288/524288 idle 0
[root@vm102 ~]# dmsetup table ssd-r
0 524288 raid raid1 3 0 region_size 1024 2 253:2 253:3 - -
[root@vm102 ~]# fsck -fn /dev/ssd/r
fsck from util-linux 2.23.2
e2fsck 1.42.9 (28-Dec-2013)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/mapper/ssd-r: 11/65536 files (0.0% non-contiguous), 18535/262144 blocks
[root@vm102 ~]# lvchange --refresh ssd/r
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
Couldn't find device with uuid KC4elc-LlBl-jCVC-eBF9-lrZc-LWSd-veEOGW.
Refusing refresh of partial LV ssd/r. Use '--activationmode partial' to override.
[root@vm102 ~]# for d in /dev/sd*;do echo running > /sys/block/`basename $d`/device/state;done
[root@vm102 ~]# lvchange --refresh ssd/r
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
[root@vm102 ~]# dmsetup table ssd-r
0 524288 raid raid1 3 0 region_size 1024 2 253:2 253:3 - -
[root@vm102 ~]# lvchange --refresh ssd/r
WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!
[root@vm102 ~]# dmsetup table ssd-r
0 524288 raid raid1 3 0 region_size 1024 2 253:2 253:3 253:5 253:7
So the workaround is two "lvchange --refresh ..." runs after the PV holding the RAID
components has been made accessible again, as mentioned before.
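Condensed, that recovery step looks like this (a sketch assuming the same VG/LV names (ssd/r) and failing disk (/dev/sdb) as in the transcript above; the pvscan step is only needed when lvmetad is enabled):
# Bring the transiently failed PV back and refresh the RaidLV twice
echo running > /sys/block/sdb/device/state   # device is reachable again
pvscan --cache /dev/sdb                      # only with lvmetad enabled
lvchange --refresh ssd/r                     # first run: reloads the SubLV mappings
lvchange --refresh ssd/r                     # second run: the raid table picks the legs up again
dmsetup table ssd-r                          # should list all four SubLV mappings again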
Analyzing "lvchange -vvvv --refresh ..." shows the linear tables for *_r{meta|rimage]_1" are being reloaded together with the raid1 table
"0 524288 raid raid1 3 0 region_size 1024 2 253:2 253:3 253:5 253:7"
are being processed in the _first_ refresh, still the table output is
"0 524288 raid raid1 3 0 region_size 1024 2 253:2 253:3 - -"
until after the _second_ refresh run.
Excerpt from "lvchange -vvvv --refresh ..." of first run:
#libdm-deptree.c:2732 Loading ssd-r table (253:8)
#libdm-deptree.c:2676 Adding target to (253:8): 0 524288 raid raid1 3 0 region_size 1024 2 253:2 253:3 253:5 253:7
#ioctl/libdm-iface.c:1832 dm table (253:8) OF [16384] (*1)
#ioctl/libdm-iface.c:1832 dm reload (253:8) NF [16384] (*1)
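For reference, one way to pull these lines out of the full verbose trace is a simple filter (a sketch; the exact message strings can vary between lvm2/libdevmapper versions):
# Reduce the -vvvv refresh trace to table loads, reloads and resumes
lvchange -vvvv --refresh ssd/r 2>&1 | grep -E 'Loading .* table|Adding target|dm (reload|resume|table)'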
Jon, can you confirm this behaviour?
Created attachment 1226348 [details]
First "lvchange -vvvv --refresh ... " run output
Created attachment 1226350 [details]
Second "lvchange -vvvv --refresh ..." run output
Transient device failures aren't supported by lvm yet.
In the first refresh run, the "*_r{meta|image}_1" SubLVs still contain mappings to error targets (the *_missing_* devices), which causes the dm-raid constructor to fail when reading the metadata. This is because the linear mappings are loaded into the inactive slots of the respective mapped devices but are only resumed _after_ the raid target has been loaded (they would need to be resumed prior to the raid1 target load).
In the second refresh run, the linear mappings are active so the raid constructor succeeds reading the RAID superblock.
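This state can be observed from the command line (a diagnostic sketch, assuming the usual vg-lv device-mapper naming of the SubLVs seen in the transcript above):
# After the first refresh the SubLV mappings are linear again ...
dmsetup table ssd-r_rimage_1   # maps onto the returned PV instead of an error target
dmsetup table ssd-r_rmeta_1
# ... but the top-level RaidLV still runs the degraded table
dmsetup table ssd-r            # "... 253:2 253:3 - -" until the second refresh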
Discussed the implications seen in the traces with Zdenek, who has ideas on how to improve this situation generically (I'll let him describe what he has in mind).
So the workaround is the double "lvchange --refresh ..." until we have generic enhancements for handling transient device failures.
Upstream commit 0b8bf73a63d8 avoids the need to run two "lvchange --refresh ..." until we have enhanced lvm to handle transient failures better.
The commit from Comment 7 is only a temporary hack which doesn't work with the locking mechanism we currently use: only the top-level LV has a lock, so any attempt to suspend and resume remotely active SubLVs will do nothing. The trouble we have here is that our locking/activation code is 'free' to change and replace missing segments of an LV without these changes being stored in the lvm2 metadata. So whenever the next command runs, it has no idea the table content differs from the lvm2 metadata state; at the moment no such deep revalidation of a device is done. But this is still not such a big issue when we consider that lvm2 has no plan yet to support transient device failures. State as of now: when lvm2 detects a missing PV, that PV should be marked MISSING_PV and its reattachment to the VG should happen only when the user requests 'vgextend --restoremissing'. So we really cannot claim that we support 'transient' disk failures; there needs to be a plan for it. We may even need to dedicate a PV as a raid leg for this, but then we are not much different from direct 'mdadm' usage.
(In reply to Zdenek Kabelac from comment #8)
> The commit from Comment 7 is only a temporary hack which doesn't work with
> the locking mechanism we currently use: only the top-level LV has a lock, so
> any attempt to suspend and resume remotely active SubLVs will do nothing.
The patch is aiming at convenience until we come up with a concept to deal with sane preload/resume sequencing. The cluster case, where a user requests a refresh on one node while the RaidLV is exclusively activated on another, is rather rare.
<SNIP>
Posted upstream commit 87117c2b2546 in addition to 0b8bf73a63d8 to cope with remotely active, clustered RaidLVs.
Upstream commit 95d68f1d0e16 (and kernel patch "[dm-devel][PATCH] dm raid: fix transient device failure processing").
Marking verified with the latest rpms.
3.10.0-688.el7.x86_64
lvm2-2.02.171-7.el7    BUILT: Thu Jun 22 08:35:15 CDT 2017
lvm2-libs-2.02.171-7.el7    BUILT: Thu Jun 22 08:35:15 CDT 2017
lvm2-cluster-2.02.171-7.el7    BUILT: Thu Jun 22 08:35:15 CDT 2017
device-mapper-1.02.140-7.el7    BUILT: Thu Jun 22 08:35:15 CDT 2017
device-mapper-libs-1.02.140-7.el7    BUILT: Thu Jun 22 08:35:15 CDT 2017
device-mapper-event-1.02.140-7.el7    BUILT: Thu Jun 22 08:35:15 CDT 2017
device-mapper-event-libs-1.02.140-7.el7    BUILT: Thu Jun 22 08:35:15 CDT 2017
device-mapper-persistent-data-0.7.0-0.1.rc6.el7    BUILT: Mon Mar 27 10:15:46 CDT 2017
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2017:2222
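As a footnote to the comment above about the currently supported recovery path for a missing PV (a sketch, assuming the same VG and device names as in the transcript; this reinstates a PV marked MISSING_PV rather than refreshing the RaidLV in place):
# Supported way to reattach a PV that was marked MISSING_PV once it returns
pvscan --cache /dev/sdb                  # with lvmetad: let it see the device again
vgextend --restoremissing ssd /dev/sdb   # restore the PV into the VG
lvs -a -o name,attr,devices ssd          # verify the legs point at real devices again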