Bug 1030121

Summary: request for clarification: if the pmspare volume is failed, should it still exist in the dm tree?
Product: Red Hat Enterprise Linux 6
Component: lvm2
Version: 6.5
Hardware: x86_64
OS: Linux
Reporter: Corey Marthaler <cmarthal>
Assignee: Zdenek Kabelac <zkabelac>
QA Contact: cluster-qe <cluster-qe>
CC: agk, dwysocha, heinzm, jbrassow, msnitzer, prajnoha, prockai, thornber, zkabelac
Status: CLOSED NOTABUG
Severity: medium
Priority: unspecified
Target Milestone: rc
Type: Bug
Doc Type: Bug Fix
Last Closed: 2015-10-15 20:17:27 UTC

Description Corey Marthaler 2013-11-14 00:06:26 UTC
Description of problem:
# When the device backing a single-PV volume fails, a dm entry for the LV still exists

[root@taft-01 ~]# lvcreate -L 100M -n test TEST
  Logical volume "test" created
[root@taft-01 ~]# lvs -a -o +devices
  LV      VG        Attr       LSize    Devices         
  test    TEST      -wi-a----- 100.00m  /dev/sdb1(0)    
[root@taft-01 ~]# echo offline > /sys/block/sdb/device/state
[root@taft-01 ~]# lvs -a -o +devices
  /dev/TEST/test: read failed after 0 of 4096 at 104792064: Input/output error
  [...]
  /dev/sdb1: read failed after 0 of 512 at 4096: Input/output error
  Couldn't find device with uuid Q5XL0S-9LPI-ctEi-FD5h-R31H-TX1T-AsnB7e.
  LV      VG        Attr       LSize   Devices
  test    TEST      -wi-a---p- 100.00m unknown device(0)
[root@taft-01 ~]# dmsetup ls
TEST-test       (253:3)


# However, when the device backing the lvol0_pmspare volume fails, the dm entry is gone. Is this expected behavior? Should my test not care here and move on?

Current mirror/raid device structure(s):
  LV                                           Attr       LSize   Cpy%Sync Devices
  [lvol0_pmspare]                              ewi------- 200.00m          /dev/sdc1(126)
  synced_random_raid1_3legs_1                  twi-a-tz-- 500.00m          synced_random_raid1_3legs_1_tdata(0)
  [synced_random_raid1_3legs_1_tdata]          rwi-aor--- 500.00m   100.00 synced_random_raid1_3legs_1_tdata_rimage_0(0),synced_random_raid1_3legs_1_tdata_rimage_1(0),synced_random_raid1_3legs_1_tdata_rimage_2(0),synced_random_raid1_3legs_1_tdata_rimage_3(0)
  [synced_random_raid1_3legs_1_tdata_rimage_0] iwi-aor--- 500.00m          /dev/sde1(1)
  [synced_random_raid1_3legs_1_tdata_rimage_1] iwi-aor--- 500.00m          /dev/sdb1(1)
  [synced_random_raid1_3legs_1_tdata_rimage_2] iwi-aor--- 500.00m          /dev/sdc1(1)
  [synced_random_raid1_3legs_1_tdata_rimage_3] iwi-aor--- 500.00m          /dev/sdd1(1)
  [synced_random_raid1_3legs_1_tdata_rmeta_0]  ewi-aor---   4.00m          /dev/sde1(0)
  [synced_random_raid1_3legs_1_tdata_rmeta_1]  ewi-aor---   4.00m          /dev/sdb1(0)
  [synced_random_raid1_3legs_1_tdata_rmeta_2]  ewi-aor---   4.00m          /dev/sdc1(0)
  [synced_random_raid1_3legs_1_tdata_rmeta_3]  ewi-aor---   4.00m          /dev/sdd1(0)
  [synced_random_raid1_3legs_1_tmeta]          ewi-ao---- 200.00m          /dev/sde1(126)
  virt_synced_random_raid1_3legs_1             Vwi-aotz-- 200.00m

Disabling device sdc on taft-01

Getting recovery check start time from /var/log/messages: Nov 13 16:26
Attempting I/O to cause mirror down conversion(s) on taft-01
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.62023 s, 67.6 MB/s

Verifying current sanity of lvm after the failure

Current mirror/raid device structure(s):
  /dev/sdc1: read failed after 0 of 2048 at 0: Input/output error
  [...]
  /dev/sdc1: read failed after 0 of 512 at 4096: Input/output error
  Couldn't find device with uuid 3O6Ulm-NPhO-eDGX-A8in-68rS-3AKx-1Y7gFd.
  LV                                           Attr       LSize   Cpy%Sync Devices
  [lvol0_pmspare]                              ewi-----p- 200.00m          unknown device(126)
  snap1_synced_random_raid1_3legs_1            Vwi---tzpk 200.00m
  snap2_synced_random_raid1_3legs_1            Vwi---tzpk 200.00m
  snap3_synced_random_raid1_3legs_1            Vwi---tzpk 200.00m
  synced_random_raid1_3legs_1                  twi-a-tzp- 500.00m          synced_random_raid1_3legs_1_tdata(0)
  [synced_random_raid1_3legs_1_tdata]          rwi-aor-p- 500.00m   100.00 synced_random_raid1_3legs_1_tdata_rimage_0(0),synced_random_raid1_3legs_1_tdata_rimage_1(0),synced_random_raid1_3legs_1_tdata_rimage_2(0),synced_random_raid1_3legs_1_tdata_rimage_3(0)
  [synced_random_raid1_3legs_1_tdata_rimage_0] iwi-aor--- 500.00m          /dev/sde1(1)
  [synced_random_raid1_3legs_1_tdata_rimage_1] iwi-aor--- 500.00m          /dev/sdb1(1)
  [synced_random_raid1_3legs_1_tdata_rimage_2] iwi-aor-p- 500.00m          unknown device(1)
  [synced_random_raid1_3legs_1_tdata_rimage_3] iwi-aor--- 500.00m          /dev/sdd1(1)
  [synced_random_raid1_3legs_1_tdata_rmeta_0]  ewi-aor---   4.00m          /dev/sde1(0)
  [synced_random_raid1_3legs_1_tdata_rmeta_1]  ewi-aor---   4.00m          /dev/sdb1(0)
  [synced_random_raid1_3legs_1_tdata_rmeta_2]  ewi-aor-p-   4.00m          unknown device(0)
  [synced_random_raid1_3legs_1_tdata_rmeta_3]  ewi-aor---   4.00m          /dev/sdd1(0)
  [synced_random_raid1_3legs_1_tmeta]          ewi-ao---- 200.00m          /dev/sde1(126)
  virt_synced_random_raid1_3legs_1             Vwi-aotzp- 200.00m

Verifying FAILED device /dev/sdc1 is *NOT* in the volume(s)
Verifying IMAGE device /dev/sde1 *IS* in the volume(s)
Verifying IMAGE device /dev/sdb1 *IS* in the volume(s)
Verifying IMAGE device /dev/sdd1 *IS* in the volume(s)
verify the rimage/rmeta dm devices remain after the failures
Checking EXISTENCE and STATE of lvol0_pmspare on: taft-01 
lvol0_pmspare on taft-01 should still exist


[root@taft-01 ~]# dmsetup ls
black_bird-synced_random_raid1_3legs_1_tdata_rmeta_1    (253:6)
black_bird-synced_random_raid1_3legs_1_tdata_rmeta_0    (253:4)
black_bird-synced_random_raid1_3legs_1_tdata_rimage_3   (253:11)
black_bird-synced_random_raid1_3legs_1_tdata_rimage_2   (253:9)
black_bird-virt_synced_random_raid1_3legs_1     (253:15)
black_bird-synced_random_raid1_3legs_1_tdata_rimage_1   (253:7)
black_bird-synced_random_raid1_3legs_1_tdata_rimage_0   (253:5)
black_bird-synced_random_raid1_3legs_1  (253:14)
black_bird-synced_random_raid1_3legs_1-tpool    (253:13)
black_bird-synced_random_raid1_3legs_1_tdata    (253:12)
black_bird-synced_random_raid1_3legs_1_tdata_rmeta_3    (253:10)
black_bird-synced_random_raid1_3legs_1_tdata_rmeta_2    (253:8)
black_bird-synced_random_raid1_3legs_1_tmeta    (253:3)


Version-Release number of selected component (if applicable):
2.6.32-425.el6.x86_64
lvm2-2.02.100-8.el6    BUILT: Wed Oct 30 03:10:56 CDT 2013
lvm2-libs-2.02.100-8.el6    BUILT: Wed Oct 30 03:10:56 CDT 2013
lvm2-cluster-2.02.100-8.el6    BUILT: Wed Oct 30 03:10:56 CDT 2013
udev-147-2.51.el6    BUILT: Thu Oct 17 06:14:34 CDT 2013
device-mapper-1.02.79-8.el6    BUILT: Wed Oct 30 03:10:56 CDT 2013
device-mapper-libs-1.02.79-8.el6    BUILT: Wed Oct 30 03:10:56 CDT 2013
device-mapper-event-1.02.79-8.el6    BUILT: Wed Oct 30 03:10:56 CDT 2013
device-mapper-event-libs-1.02.79-8.el6    BUILT: Wed Oct 30 03:10:56 CDT 2013
device-mapper-persistent-data-0.2.8-2.el6    BUILT: Mon Oct 21 09:14:25 CDT 2013
cmirror-2.02.100-8.el6    BUILT: Wed Oct 30 03:10:56 CDT 2013


How reproducible:
Every time

Comment 1 Zdenek Kabelac 2013-11-14 09:31:18 UTC
I'm somewhat confused by this question.

From the 'lvs' output, the lvol0_pmspare volume appears to be inactive, and 'dmsetup' does not list it as active either.

The tool will likely not handle 'double faults' automatically; if multiple errors appear at once, the user needs to resolve the problems manually.
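
For reference, a minimal sketch of the kind of manual resolution meant here, using the VG/LV names from the transcripts above (the commands are standard lvm2, but treat the exact invocation as an assumption; the appropriate repair depends on what actually failed):

  # Replace the failed image of the raid LV backing the pool's data
  # (free extents must be available in the VG); whether the hidden
  # _tdata sub-LV can be repaired directly depends on the release:
  lvconvert --repair black_bird/synced_random_raid1_3legs_1_tdata

  # Or, if the missing PV is gone for good and no replacement space exists,
  # drop it (and any LVs depending on it) from the VG metadata:
  vgreduce --removemissing --force black_bird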

Comment 2 Corey Marthaler 2014-10-20 16:31:06 UTC
Comment 1 makes a valid point. After looking at this more, the pmspare device *never* appears to be active (even before the failure), so it should never show up in dmsetup, correct? If that's the case, then this can be closed NOTABUG.
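
For the test, one way to confirm this independently of dmsetup is to check the attribute bits reported for the spare; a minimal sketch (lv_name, lv_attr and devices are standard lvs reporting fields, and the VG name TEST comes from the run below):

  # The 5th lv_attr character shows the activation state: 'a' = active,
  # '-' = not active. The pmspare LV is expected to report '-' here.
  lvs -a -o lv_name,lv_attr,devices TEST | grep pmspare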



[root@host-109 ~]# lvs -a -o +devices
 LV                     Attr       LSize   Pool  Data%  Meta% Cpy%Sync Devices
 [lvol0_pmspare]        ewi------- 500.00m                             /dev/sda1(126)
 raid1                  twi-a-tz-- 500.00m       0.00   0.01           raid1_tdata(0)
 [raid1_tdata]          rwi-aor--- 500.00m                    100.00   raid1_tdata_rimage_0(0),raid1_tdata_rimage_1(0)
 [raid1_tdata_rimage_0] iwi-aor--- 500.00m                             /dev/sda1(1)
 [raid1_tdata_rimage_1] iwi-aor--- 500.00m                             /dev/sdb1(1)
 [raid1_tdata_rmeta_0]  ewi-aor---   4.00m                             /dev/sda1(0)
 [raid1_tdata_rmeta_1]  ewi-aor---   4.00m                             /dev/sdb1(0)
 [raid1_tmeta]          ewi-ao---- 500.00m                             /dev/sdd1(0)
 virt_1                 Vwi-a-tz-- 100.00m raid1 0.00
 virt_2                 Vwi-a-tz-- 100.00m raid1 0.00

[root@host-109 ~]# lvchange -ay TEST/lvol0_pmspare
 Unable to change internal LV lvol0_pmspare directly

[root@host-109 ~]# lvs -a -o +devices
 LV                     Attr       LSize   Pool  Data%  Meta% Cpy%Sync Devices
 [lvol0_pmspare]        ewi------- 500.00m                             /dev/sda1(126)
 raid1                  twi-a-tz-- 500.00m       0.00   0.01           raid1_tdata(0)
 [raid1_tdata]          rwi-aor--- 500.00m                    100.00   raid1_tdata_rimage_0(0),raid1_tdata_rimage_1(0)
 [raid1_tdata_rimage_0] iwi-aor--- 500.00m                             /dev/sda1(1)
 [raid1_tdata_rimage_1] iwi-aor--- 500.00m                             /dev/sdb1(1)
 [raid1_tdata_rmeta_0]  ewi-aor---   4.00m                             /dev/sda1(0)
 [raid1_tdata_rmeta_1]  ewi-aor---   4.00m                             /dev/sdb1(0)
 [raid1_tmeta]          ewi-ao---- 500.00m                             /dev/sdd1(0)
 virt_1                 Vwi-a-tz-- 100.00m raid1 0.00
 virt_2                 Vwi-a-tz-- 100.00m raid1 0.00

[root@host-109 ~]# dmsetup ls
TEST-raid1      (253:9)
TEST-raid1-tpool        (253:8)
TEST-raid1_tdata        (253:7)
TEST-raid1_tmeta        (253:2)
TEST-virt_2     (253:11)
TEST-raid1_tdata_rimage_1       (253:6)
TEST-virt_1     (253:10)
TEST-raid1_tdata_rimage_0       (253:4)
TEST-raid1_tdata_rmeta_1        (253:5)
TEST-raid1_tdata_rmeta_0        (253:3)

[root@host-109 ~]# echo offline > /sys/block/sda/device/state
[root@host-109 ~]# pvscan --cache /dev/sda1
 /dev/sda1: read failed after 0 of 2048 at 0: Input/output error
 No PV label found on /dev/sda1.

[root@host-109 ~]# lvs -a -o +devices
 PV 85qEBV-5uad-iH4L-KAjW-DMgj-EvHJ-fL2VfE not recognised. Is the device missing?
 PV 85qEBV-5uad-iH4L-KAjW-DMgj-EvHJ-fL2VfE not recognised. Is the device missing?
 LV                     Attr       LSize   Pool  Data%  Meta% Cpy%Sync Devices
 [lvol0_pmspare]        ewi-----p- 500.00m                             unknown device(126)
 raid1                  twi-a-tzp- 500.00m       0.00   0.01           raid1_tdata(0)
 [raid1_tdata]          rwi-aor-p- 500.00m                    100.00   raid1_tdata_rimage_0(0),raid1_tdata_rimage_1(0)
 [raid1_tdata_rimage_0] iwi-aor-p- 500.00m                             unknown device(1)
 [raid1_tdata_rimage_1] iwi-aor--- 500.00m                             /dev/sdb1(1)
 [raid1_tdata_rmeta_0]  ewi-aor-p-   4.00m                             unknown device(0)
 [raid1_tdata_rmeta_1]  ewi-aor---   4.00m                             /dev/sdb1(0)
 [raid1_tmeta]          ewi-ao---- 500.00m                             /dev/sdd1(0)
 virt_1                 Vwi-a-tzp- 100.00m raid1 0.00
 virt_2                 Vwi-a-tzp- 100.00m raid1 0.00

[root@host-109 ~]# dmsetup ls
TEST-raid1      (253:9)
TEST-raid1-tpool        (253:8)
TEST-raid1_tdata        (253:7)
TEST-raid1_tmeta        (253:2)
TEST-virt_2     (253:11)
TEST-raid1_tdata_rimage_1       (253:6)
TEST-virt_1     (253:10)
TEST-raid1_tdata_rimage_0       (253:4)
TEST-raid1_tdata_rmeta_1        (253:5)
TEST-raid1_tdata_rmeta_0        (253:3)


3.10.0-163.el7.x86_64

lvm2-2.02.111-1.el7    BUILT: Mon Sep 29 09:18:07 CDT 2014
lvm2-libs-2.02.111-1.el7    BUILT: Mon Sep 29 09:18:07 CDT 2014
lvm2-cluster-2.02.111-1.el7    BUILT: Mon Sep 29 09:18:07 CDT 2014
device-mapper-1.02.90-1.el7    BUILT: Mon Sep 29 09:18:07 CDT 2014
device-mapper-libs-1.02.90-1.el7    BUILT: Mon Sep 29 09:18:07 CDT 2014
device-mapper-event-1.02.90-1.el7    BUILT: Mon Sep 29 09:18:07 CDT 2014
device-mapper-event-libs-1.02.90-1.el7    BUILT: Mon Sep 29 09:18:07 CDT 2014
device-mapper-persistent-data-0.3.2-1.el7    BUILT: Thu Apr  3 09:58:51 CDT 2014
cmirror-2.02.111-1.el7    BUILT: Mon Sep 29 09:18:07 CDT 2014

Comment 3 Zdenek Kabelac 2015-10-15 20:17:27 UTC
The _pmspare volume, as such, will appear in the dm table in certain cases.

One of them is 'pvmove', where we so far don't know how to move extents assigned to an LV while it is offline.

Another case is a repair operation in progress.

But apart from these limited cases, it should stay inactive.
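
For context, a minimal sketch of one such repair operation during which the spare gets used (names taken from the transcripts above; this illustrates the standard thin pool metadata repair flow and was not run as part of this report):

  # The pool must be inactive before metadata repair; lvconvert --repair
  # then swaps the pool's damaged _tmeta with the _pmspare volume,
  # activating the spare for the duration of the operation.
  lvchange -an black_bird/synced_random_raid1_3legs_1
  lvconvert --repair black_bird/synced_random_raid1_3legs_1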