Bug 1420514

Summary: after a failure, raid1 "span" images do not maintain proper D kernel state across reactivation
Product: Red Hat Enterprise Linux 6 Reporter: Corey Marthaler <cmarthal>
Component: lvm2 Assignee: LVM and device-mapper development team <lvm-team>
lvm2 sub component: Mirroring and RAID (RHEL6) QA Contact: cluster-qe <cluster-qe>
Status: CLOSED NOTABUG Docs Contact:
Severity: low    
Priority: unspecified CC: agk, heinzm, jbrassow, msnitzer, prajnoha, prockai, zkabelac
Version: 6.9   
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Last Closed: 2017-02-13 17:14:38 UTC Type: Bug

Description Corey Marthaler 2017-02-08 21:01:00 UTC
Description of problem:
This is one of the individual checks listed for the transient failure feature, bug 1265191. After reactivation, the kernel state can improperly revert to "A" when it should remain "D", correct? Sometimes, if you wait a while or run pvscan enough times, it goes back to the proper "D".

From 1265191, comment #22:
2) kernel sets leg device to dead ('D' device status char) on
        access to RaidLV and throws an event

[...]

5) "lvchange -an RaidLV;lvchange -ay RaidLV" causes the creation of
        transient SubLVs "*_r{meta|image}_*-missing_0_0" with error targets for
        "*_r{meta|image}_*"



EXPECTED BEHAVIOR (regular raid1 volume):

# RAID1 image, one complete image/device is failed

[root@host-112 ~]# lvs -a -o +devices
  /dev/sdd1: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdd1: read failed after 0 of 1024 at 22545367040: Input/output error
  /dev/sdd1: read failed after 0 of 1024 at 22545448960: Input/output error
  /dev/sdd1: read failed after 0 of 1024 at 0: Input/output error
  /dev/sdd1: read failed after 0 of 1024 at 4096: Input/output error
  Couldn't find device with uuid gQSZvQ-BWHp-444R-JXiE-lhiC-dkTT-NRn3Qb.
  Couldn't find device for segment belonging to black_bird/synced_primary_raid1_2legs_1_rimage_0 while checking used and assumed devices.
  LV                                      Attr       LSize   Origin                       Data% Cpy%Sync Devices
  bb_snap1                                swi-a-s--- 252.00m synced_primary_raid1_2legs_1 29.22          /dev/sdb1(126)
  synced_primary_raid1_2legs_1            owi-a-r-p- 500.00m                                    100.00   synced_primary_raid1_2legs_1_rimage_0(0),synced_primary_raid1_2legs_1_rimage_1(0),sy
  [synced_primary_raid1_2legs_1_rimage_0] iwi-aor-p- 500.00m                                             unknown device(1)
  [synced_primary_raid1_2legs_1_rimage_1] iwi-aor--- 500.00m                                             /dev/sdb1(1)
  [synced_primary_raid1_2legs_1_rimage_2] iwi-aor--- 500.00m                                             /dev/sdg1(1)
  [synced_primary_raid1_2legs_1_rmeta_0]  ewi-aor-p-   4.00m                                             unknown device(0)
  [synced_primary_raid1_2legs_1_rmeta_1]  ewi-aor---   4.00m                                             /dev/sdb1(0)
  [synced_primary_raid1_2legs_1_rmeta_2]  ewi-aor---   4.00m                                             /dev/sdg1(0)

[root@host-112 ~]# dmsetup status
black_bird-synced_primary_raid1_2legs_1_rmeta_2: 0 8192 linear 
black_bird-synced_primary_raid1_2legs_1_rmeta_1: 0 8192 linear 
black_bird-synced_primary_raid1_2legs_1_rmeta_0: 0 8192 linear 
black_bird-bb_snap1: 0 1024000 snapshot 150816/516096 600
black_bird-synced_primary_raid1_2legs_1: 0 1024000 snapshot-origin 
black_bird-bb_snap1-cow: 0 516096 linear 
black_bird-synced_primary_raid1_2legs_1_rimage_2: 0 1024000 linear 
black_bird-synced_primary_raid1_2legs_1_rimage_1: 0 1024000 linear 
black_bird-synced_primary_raid1_2legs_1-real: 0 1024000 raid raid1 3 DAA 1024000/1024000 idle 0    <- Proper DAA state
black_bird-synced_primary_raid1_2legs_1_rimage_0: 0 1024000 linear 

[root@host-112 ~]# lvchange -an black_bird/synced_primary_raid1_2legs_1
  /dev/sdd1: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdd1: read failed after 0 of 1024 at 22545367040: Input/output error
  /dev/sdd1: read failed after 0 of 1024 at 22545448960: Input/output error
  /dev/sdd1: read failed after 0 of 1024 at 0: Input/output error
  /dev/sdd1: read failed after 0 of 1024 at 4096: Input/output error
  Couldn't find device with uuid gQSZvQ-BWHp-444R-JXiE-lhiC-dkTT-NRn3Qb.
  Couldn't find device for segment belonging to black_bird/synced_primary_raid1_2legs_1_rimage_0 while checking used and assumed devices.

[root@host-112 ~]# lvchange -ay  black_bird/synced_primary_raid1_2legs_1
  /dev/sdd1: open failed: No such device or address
  Couldn't find device with uuid gQSZvQ-BWHp-444R-JXiE-lhiC-dkTT-NRn3Qb.

[root@host-112 ~]# dmsetup status
black_bird-synced_primary_raid1_2legs_1_rmeta_2: 0 8192 linear 
black_bird-synced_primary_raid1_2legs_1_rmeta_1: 0 8192 linear 
black_bird-synced_primary_raid1_2legs_1_rmeta_0: 0 8192 linear 
black_bird-bb_snap1: 0 1024000 snapshot 150816/516096 600
black_bird-synced_primary_raid1_2legs_1: 0 1024000 snapshot-origin 
black_bird-bb_snap1-cow: 0 516096 linear 
black_bird-synced_primary_raid1_2legs_1_rimage_2: 0 1024000 linear 
black_bird-synced_primary_raid1_2legs_1_rimage_1: 0 1024000 linear 
black_bird-synced_primary_raid1_2legs_1_rmeta_0-missing_0_0: 0 8192 error 
black_bird-synced_primary_raid1_2legs_1-real: 0 1024000 raid raid1 3 DAA 1024000/1024000 idle 0    <- Maintains proper DAA state
black_bird-synced_primary_raid1_2legs_1_rimage_0: 0 1024000 linear 
black_bird-synced_primary_raid1_2legs_1_rimage_0-missing_0_0: 0 1024000 error 




FAILING BEHAVIOR (spanned image raid1 volume):

# RAID1 image spanning multiple devices, one device is failed

[root@host-077 ~]# lvs -a -o +devices
  WARNING: Device for PV pTT8JI-XefI-NdaN-m2og-Bv1l-2LtL-Vs7E3b not found or rejected by a filter.
  Couldn't find device for segment belonging to black_bird/synced_spanned_primary_raid1_2legs_1_rimage_0 while checking used and assumed devices.
  LV                                              Attr       LSize   Cpy%Sync Devices
  synced_spanned_primary_raid1_2legs_1            rwi-a-r-p- 500.00m 100.00   synced_spanned_primary_raid1_2legs_1_rimage_0(0),synced_spanned_primary_raid1_2legs_1_rimage_1(0)
  [synced_spanned_primary_raid1_2legs_1_rimage_0] iwi-aor-p- 500.00m          /dev/sde1(1)
  [synced_spanned_primary_raid1_2legs_1_rimage_0] iwi-aor-p- 500.00m          unknown device(0)
  [synced_spanned_primary_raid1_2legs_1_rimage_1] iwi-aor--- 500.00m          /dev/sdc1(1)
  [synced_spanned_primary_raid1_2legs_1_rimage_1] iwi-aor--- 500.00m          /dev/sdh1(0)
  [synced_spanned_primary_raid1_2legs_1_rmeta_0]  ewi-aor-r-   4.00m          /dev/sde1(0)
  [synced_spanned_primary_raid1_2legs_1_rmeta_1]  ewi-aor---   4.00m          /dev/sdc1(0)

[root@host-077 ~]# dmsetup status 
black_bird-synced_spanned_primary_raid1_2legs_1_rmeta_1: 0 8192 linear 
black_bird-synced_spanned_primary_raid1_2legs_1_rmeta_0: 0 8192 linear 
black_bird-synced_spanned_primary_raid1_2legs_1_rimage_1: 0 507904 linear 
black_bird-synced_spanned_primary_raid1_2legs_1_rimage_1: 507904 516096 linear 
black_bird-synced_spanned_primary_raid1_2legs_1: 0 1024000 raid raid1 2 DA 1024000/1024000 idle 0   <- Proper DA state
black_bird-synced_spanned_primary_raid1_2legs_1_rimage_0: 0 507904 linear 
black_bird-synced_spanned_primary_raid1_2legs_1_rimage_0: 507904 516096 linear 

[root@host-077 ~]# lvchange -an black_bird/synced_spanned_primary_raid1_2legs_1
  WARNING: Device for PV pTT8JI-XefI-NdaN-m2og-Bv1l-2LtL-Vs7E3b not found or rejected by a filter.
  Couldn't find device for segment belonging to black_bird/synced_spanned_primary_raid1_2legs_1_rimage_0 while checking used and assumed devices.

[root@host-077 ~]# lvchange -ay black_bird/synced_spanned_primary_raid1_2legs_1
  WARNING: Device for PV pTT8JI-XefI-NdaN-m2og-Bv1l-2LtL-Vs7E3b not found or rejected by a filter.

[root@host-077 ~]# dmsetup status 
black_bird-synced_spanned_primary_raid1_2legs_1_rmeta_1: 0 8192 linear 
black_bird-synced_spanned_primary_raid1_2legs_1_rmeta_0: 0 8192 linear 
black_bird-synced_spanned_primary_raid1_2legs_1_rimage_1: 0 507904 linear 
black_bird-synced_spanned_primary_raid1_2legs_1_rimage_1: 507904 516096 linear 
black_bird-synced_spanned_primary_raid1_2legs_1: 0 1024000 raid raid1 2 AA 1024000/1024000 idle 0   <- Why is it back to AA?
black_bird-synced_spanned_primary_raid1_2legs_1_rimage_0: 0 507904 linear 
black_bird-synced_spanned_primary_raid1_2legs_1_rimage_0: 507904 516096 linear 
black_bird-synced_spanned_primary_raid1_2legs_1_rimage_0-missing_1_0: 0 516096 error 

[root@host-077 ~]# lvs -a -o +devices
  WARNING: Device for PV pTT8JI-XefI-NdaN-m2og-Bv1l-2LtL-Vs7E3b not found or rejected by a filter.
  LV                                              Attr       LSize   Cpy%Sync Devices
  synced_spanned_primary_raid1_2legs_1            rwi-a-r-p- 500.00m 100.00   synced_spanned_primary_raid1_2legs_1_rimage_0(0),synced_spanned_primary_raid1_2legs_1_rimage_1(0)
  [synced_spanned_primary_raid1_2legs_1_rimage_0] iwi-aor-p- 500.00m          /dev/sde1(1)
  [synced_spanned_primary_raid1_2legs_1_rimage_0] iwi-aor-p- 500.00m          unknown device(0)
  [synced_spanned_primary_raid1_2legs_1_rimage_1] iwi-aor--- 500.00m          /dev/sdc1(1)
  [synced_spanned_primary_raid1_2legs_1_rimage_1] iwi-aor--- 500.00m          /dev/sdh1(0)
  [synced_spanned_primary_raid1_2legs_1_rmeta_0]  ewi-aor---   4.00m          /dev/sde1(0)
  [synced_spanned_primary_raid1_2legs_1_rmeta_1]  ewi-aor---   4.00m          /dev/sdc1(0)
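
The DA-versus-AA comparison above was done by eye. A minimal Python sketch of the same check follows, assuming the raid status layout seen in the output here (start, length, "raid", raid type, device count, health characters, sync ratio, ...). The function and type names are hypothetical and are not part of lvm2 or dmsetup.

```python
from typing import List, NamedTuple


class RaidStatus(NamedTuple):
    name: str       # device-mapper device name
    raid_type: str  # e.g. "raid1"
    health: str     # one character per leg, e.g. "DA"


def parse_raid_status(line: str) -> RaidStatus:
    """Parse one 'dmsetup status' line for a raid target.

    The field positions are an assumption based on the output shown above.
    """
    fields = line.split()
    if len(fields) < 7 or fields[3] != "raid":
        raise ValueError("not a raid target status line")
    name = fields[0].rstrip(":")
    raid_type, ndevs, health = fields[4], int(fields[5]), fields[6]
    if len(health) != ndevs:
        raise ValueError("health string does not match device count")
    return RaidStatus(name, raid_type, health)


def dead_legs(status: RaidStatus) -> List[int]:
    """Return the indexes of legs whose health character is 'D'."""
    return [i for i, c in enumerate(status.health) if c == "D"]


# Status lines copied from the spanned-image case above.
before = parse_raid_status(
    "black_bird-synced_spanned_primary_raid1_2legs_1: "
    "0 1024000 raid raid1 2 DA 1024000/1024000 idle 0"
)
after = parse_raid_status(
    "black_bird-synced_spanned_primary_raid1_2legs_1: "
    "0 1024000 raid raid1 2 AA 1024000/1024000 idle 0"
)

# Legs that were dead before reactivation but are reported alive afterwards.
lost = sorted(set(dead_legs(before)) - set(dead_legs(after)))
print(lost)  # prints [0]
```

A non-empty result flags the case reported here, where leg 0 is no longer marked "D" after "lvchange -an; lvchange -ay".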



Version-Release number of selected component (if applicable):
2.6.32-688.el6.x86_64

lvm2-2.02.143-12.el6    BUILT: Wed Jan 11 09:35:04 CST 2017
lvm2-libs-2.02.143-12.el6    BUILT: Wed Jan 11 09:35:04 CST 2017
lvm2-cluster-2.02.143-12.el6    BUILT: Wed Jan 11 09:35:04 CST 2017
udev-147-2.73.el6_8.2    BUILT: Tue Aug 30 08:17:19 CDT 2016
device-mapper-1.02.117-12.el6    BUILT: Wed Jan 11 09:35:04 CST 2017
device-mapper-libs-1.02.117-12.el6    BUILT: Wed Jan 11 09:35:04 CST 2017
device-mapper-event-1.02.117-12.el6    BUILT: Wed Jan 11 09:35:04 CST 2017
device-mapper-event-libs-1.02.117-12.el6    BUILT: Wed Jan 11 09:35:04 CST 2017
device-mapper-persistent-data-0.6.2-0.1.rc7.el6    BUILT: Tue Mar 22 08:58:09 CDT 2016

Comment 2 Heinz Mauelshagen 2017-02-13 17:14:38 UTC
It is back to AA because the MD kernel driver has not yet accessed the failing segment in the second half of the first raid1 leg.
Once it does, the state will change to DA again.
This is expected.
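
The explanation above depends on which sectors of the first leg land on the missing device. A rough Python sketch of that mapping follows, assuming the rimage_0 segment table from the dmsetup output in the description: the first 507904 sectors on /dev/sde1, and the next 516096 sectors backed by the error target rimage_0-missing_1_0. The names are illustrative only.

```python
from bisect import bisect_right

# (start_sector, length_in_sectors, backing) for leg 0 after reactivation.
# The backing labels are a reading of the lvs/dmsetup output, not tool output.
SEGMENTS = [
    (0, 507904, "linear on /dev/sde1"),
    (507904, 516096, "error target (missing PV)"),
]


def segment_for(sector, segments=SEGMENTS):
    """Return the backing description for a sector offset within the leg."""
    starts = [start for start, _, _ in segments]
    idx = bisect_right(starts, sector) - 1
    if idx < 0:
        raise ValueError("sector is before the first segment")
    start, length, backing = segments[idx]
    if not (start <= sector < start + length):
        raise ValueError("sector is beyond the end of the leg")
    return backing


# The two segments together cover the whole 1024000-sector (500 MiB) image.
print(sum(length for _, length, _ in SEGMENTS))  # prints 1024000
print(segment_for(0))
print(segment_for(507904))
```

Under this reading, I/O that is served by leg 1, or that touches only sectors below 507904 of leg 0, never reaches the error target, which would be consistent with the leg staying "A" until the failing segment is accessed.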