Bug 1462028

Summary: when up converting linear to raid primary image fails, one set of warnings/instructions is enough
Product: Red Hat Enterprise Linux 7
Component: lvm2
lvm2 sub component: Mirroring and RAID
Version: 7.4
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: unspecified
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Reporter: Corey Marthaler <cmarthal>
Assignee: Heinz Mauelshagen <heinzm>
QA Contact: cluster-qe <cluster-qe>
Docs Contact:
CC: agk, heinzm, jbrassow, msnitzer, prajnoha, prockai, zkabelac
Last Closed: 2018-03-08 17:54:11 UTC
Type: Bug

Description Corey Marthaler 2017-06-15 22:48:15 UTC
Description of problem:
This is the test scenario for bug 1458239. When all hope is lost during this type of failure, there's no need to spam the syslog.



Scenario kill_primary_non_synced_converted_linear: Kill primary leg of NON synced 1 leg raid1 (upconverted linear) volume(s)

********* RAID hash info for this scenario *********
* names:              non_synced_primary_linear_convert_2legs_1
* sync:               0
* type:               linear
* -m |-i value:       1
* leg devices:        /dev/sdf1 /dev/sde1
* spanned legs:       0
* manual repair:      1
* no MDA devices:    
* failpv(s):          /dev/sdf1
* failnode(s):        host-094
* lvmetad:            1
* raid fault policy:  allocate
******************************************************
 
Creating raids(s) on host-094...
host-094: lvcreate  --type linear -n non_synced_primary_linear_convert_2legs_1 -L 3G black_bird /dev/sdf1:0-3600
host-094: lvconvert --yes --type raid1 -m 1 black_bird/non_synced_primary_linear_convert_2legs_1 /dev/sdf1:0-3600 /dev/sde1:0-3600
 
Current mirror/raid device structure(s):
  LV                                                   Attr       LSize Cpy%Sync Devices
  non_synced_primary_linear_convert_2legs_1            rwi-a-r--- 3.00g 100.00   non_synced_primary_linear_convert_2legs_1_rimage_0(0),non_synced_primary_linear_convert_2legs_1_rimage_1(0)
  [non_synced_primary_linear_convert_2legs_1_rimage_0] iwi-aor--- 3.00g          /dev/sdf1(0)
  [non_synced_primary_linear_convert_2legs_1_rimage_1] Iwi-aor--- 3.00g          /dev/sde1(1)
  [non_synced_primary_linear_convert_2legs_1_rmeta_0]  ewi-aor--- 4.00m          /dev/sdf1(768)
  [non_synced_primary_linear_convert_2legs_1_rmeta_1]  ewi-aor--- 4.00m          /dev/sde1(0)
 
Creating xfs on top of mirror(s) on host-094...
Mounting mirrored xfs filesystems on host-094...
 
PV=/dev/sdf1
        non_synced_primary_linear_convert_2legs_1_rimage_0: 1.0
        non_synced_primary_linear_convert_2legs_1_rmeta_0: 1.0
 
Writing verification files (checkit) to mirror(s) on...
        ---- host-094 ----
 
Verifying files (checkit) on mirror(s) on...
        ---- host-094 ----
 
Current sync percent just before failure
        ( 29.42% )
 
Disabling device sdf on host-094
rescan device...
  WARNING: Not using lvmetad because a repair command was run.
 
Getting recovery check start time from /var/log/messages: Jun 15 16:09
Attempting I/O to cause mirror down conversion(s) on host-094
dd if=/dev/zero of=/mnt/non_synced_primary_linear_convert_2legs_1/ddfile count=10 bs=4M
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.11141 s, 376 MB/s
 
<fail name="host-094_non_synced_primary_linear_convert_2legs_1"  pid="18729" time="Thu Jun 15 16:09:46 2017 -0500" type="cmd" duration="16" ec="1" />
ALL STOP!
Verifying current sanity of lvm after the failure
 
Current mirror/raid device structure(s):
  WARNING: Not using lvmetad because a repair command was run.
  /dev/black_bird/non_synced_primary_linear_convert_2legs_1: read failed after 0 of 512 at 0: Input/output error
  /dev/black_bird/non_synced_primary_linear_convert_2legs_1: read failed after 0 of 512 at 3221159936: Input/output error
  /dev/black_bird/non_synced_primary_linear_convert_2legs_1: read failed after 0 of 512 at 3221217280: Input/output error
  /dev/black_bird/non_synced_primary_linear_convert_2legs_1: read failed after 0 of 512 at 4096: Input/output error
  /dev/black_bird/non_synced_primary_linear_convert_2legs_1: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 1024 at 22545367040: Input/output error
  /dev/sdf1: read failed after 0 of 1024 at 22545448960: Input/output error
  /dev/sdf1: read failed after 0 of 1024 at 0: Input/output error
  /dev/sdf1: read failed after 0 of 1024 at 4096: Input/output error
  /dev/sdf1: read failed after 0 of 2048 at 0: Input/output error
  Couldn't find device with uuid mCtW2b-cn17-AaTY-sYl1-tYla-YDiW-IEE25j.
  WARNING: Couldn't find all devices for LV black_bird/non_synced_primary_linear_convert_2legs_1_rmeta_0 while checking used and assumed devices.
  WARNING: Couldn't find all devices for LV black_bird/non_synced_primary_linear_convert_2legs_1_rimage_0 while checking used and assumed devices.
  LV                                                   Attr       LSize   Cpy%Sync Devices
  non_synced_primary_linear_convert_2legs_1            rwi-aor-p-   3.00g 100.00   non_synced_primary_linear_convert_2legs_1_rimage_0(0),non_synced_primary_linear_convert_2legs_1_rimage_1(0)
  [non_synced_primary_linear_convert_2legs_1_rimage_0] iwi-aor-p-   3.00g          [unknown](0)
  [non_synced_primary_linear_convert_2legs_1_rimage_1] Iwi-aor---   3.00g          /dev/sde1(1)
  [non_synced_primary_linear_convert_2legs_1_rmeta_0]  ewi-aor-p-   4.00m          [unknown](768)
  [non_synced_primary_linear_convert_2legs_1_rmeta_1]  ewi-aor---   4.00m          /dev/sde1(0)
  root                                                 -wi-ao----  <6.20g          /dev/vda2(205)
  swap                                                 -wi-ao---- 820.00m          /dev/vda2(0)
 
 
Verifying FAILED device /dev/sdf1 is *NOT* in the volume(s)
Verifying IMAGE device /dev/sde1 *IS* in the volume(s)
Verify the rimage/rmeta dm devices remain after the failures
Checking EXISTENCE and STATE of non_synced_primary_linear_convert_2legs_1_rimage_0 on: host-094
        Since this failure occured on the primary image while the raid was not in sync, portions of the array may be unrecoverable.
        As such, an allocate policy will be unable to automatically repair it. An unknown device is expected here.
 
Checking EXISTENCE and STATE of non_synced_primary_linear_convert_2legs_1_rmeta_0 on: host-094
        Since this failure occured on the primary image while the raid was not in sync, portions of the array may be unrecoverable.
        As such, an allocate policy will be unable to automatically repair it. An unknown device is expected here.
 
Verify the raid image order is what's expected based on raid fault policy
EXPECTED LEG ORDER: unknown /dev/sde1
ACTUAL LEG ORDER: [unknown] /dev/sde1
unknown ne [unknown]
/dev/sde1 ne /dev/sde1
Manually repairing failed raid volumes
host-094: 'lvconvert --yes --force --repair black_bird/non_synced_primary_linear_convert_2legs_1'
  [...]
  WARNING: Couldn't find all devices for LV black_bird/non_synced_primary_linear_convert_2legs_1_rmeta_0 while checking used and assumed devices.
  WARNING: Couldn't find all devices for LV black_bird/non_synced_primary_linear_convert_2legs_1_rimage_0 while checking used and assumed devices.
  Unable to repair black_bird/non_synced_primary_linear_convert_2legs_1.  Source devices failed before the RAID could synchronize.
  You should choose one of the following:
    1) deactivate black_bird/non_synced_primary_linear_convert_2legs_1, revive failed device, re-activate LV, and proceed.
    2) remove the LV (all data is lost).
    3) Seek expert advice to attempt to salvage any data from remaining devices.
  Failed to replace faulty devices in black_bird/non_synced_primary_linear_convert_2legs_1.
 
 
[root@host-094 ~]# grep choose /var/log/messages
Jun 15 16:09:41 host-094 lvm[21736]: You should choose one of the following:
Jun 15 16:09:44 host-094 lvm[21736]: You should choose one of the following:
Jun 15 16:09:44 host-094 lvm[21736]: You should choose one of the following:
Jun 15 16:09:45 host-094 lvm[21736]: You should choose one of the following:
Jun 15 16:09:45 host-094 lvm[21736]: You should choose one of the following:
Jun 15 16:09:45 host-094 lvm[21736]: You should choose one of the following:
Jun 15 16:09:46 host-094 lvm[21736]: You should choose one of the following:
Jun 15 16:09:46 host-094 lvm[21736]: You should choose one of the following:
Jun 15 16:09:47 host-094 lvm[21736]: You should choose one of the following:
Jun 15 16:09:47 host-094 lvm[21736]: You should choose one of the following:
Jun 15 16:09:47 host-094 lvm[21736]: You should choose one of the following:
Jun 15 16:09:47 host-094 lvm[21736]: You should choose one of the following:
Jun 15 16:09:47 host-094 lvm[21736]: You should choose one of the following:
Jun 15 16:09:47 host-094 lvm[21736]: You should choose one of the following:
Jun 15 16:09:57 host-094 lvm[21736]: You should choose one of the following:
Jun 15 16:09:57 host-094 lvm[21736]: You should choose one of the following:
Jun 15 16:09:57 host-094 lvm[21736]: You should choose one of the following:
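
The grep above shows the same instruction header logged 17 times by a single lvm process within about 16 seconds. For anyone wanting to quantify this kind of repetition when verifying a fix, here is a minimal sketch that counts identical messages per PID. It assumes the classic "Mon DD HH:MM:SS host tag[pid]: message" syslog layout seen in the output above; the function name `count_repeats` and the three-line sample are illustrative only and not part of lvm2 or the test harness.

```python
import re
from collections import Counter

# Assumed layout: "Mon DD HH:MM:SS host tag[pid]: message"
SYSLOG_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"(?P<tag>[^\[\s:]+)(?:\[(?P<pid>\d+)\])?:\s+(?P<msg>.*)$"
)


def count_repeats(lines, needle):
    """Return a Counter mapping (pid, message) to how often it was logged.

    Only messages containing `needle` are counted; lines that do not
    match the assumed syslog layout are skipped.
    """
    counts = Counter()
    for line in lines:
        m = SYSLOG_LINE.match(line.rstrip("\n"))
        if m and needle in m.group("msg"):
            counts[(m.group("pid"), m.group("msg"))] += 1
    return counts


# Three lines copied from the grep output above, used as a small sample.
sample = [
    "Jun 15 16:09:41 host-094 lvm[21736]: You should choose one of the following:",
    "Jun 15 16:09:44 host-094 lvm[21736]: You should choose one of the following:",
    "Jun 15 16:09:44 host-094 lvm[21736]: You should choose one of the following:",
]

for (pid, msg), n in count_repeats(sample, "choose").items():
    print(f"pid {pid}: {n}x {msg}")
```

To run it against a real log, pass an open file object (for example `open("/var/log/messages")`) in place of `sample`. A count above 1 for the same PID and message would indicate that the instructions are still being repeated.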


Version-Release number of selected component (if applicable):
lvm2-2.02.171-5.el7    BUILT: Wed Jun 14 10:33:32 CDT 2017
lvm2-libs-2.02.171-5.el7    BUILT: Wed Jun 14 10:33:32 CDT 2017
lvm2-cluster-2.02.171-5.el7    BUILT: Wed Jun 14 10:33:32 CDT 2017
device-mapper-1.02.140-5.el7    BUILT: Wed Jun 14 10:33:32 CDT 2017
device-mapper-libs-1.02.140-5.el7    BUILT: Wed Jun 14 10:33:32 CDT 2017
device-mapper-event-1.02.140-5.el7    BUILT: Wed Jun 14 10:33:32 CDT 2017
device-mapper-event-libs-1.02.140-5.el7    BUILT: Wed Jun 14 10:33:32 CDT 2017
device-mapper-persistent-data-0.7.0-0.1.rc6.el7    BUILT: Mon Mar 27 10:15:46 CDT 2017

Comment 2 Heinz Mauelshagen 2018-03-08 17:54:11 UTC
Upstream commit 9d976c0002f06e97c50bca7dad35d647848ed60f reduced those multiple "WARNING: Couldn't find all devices for LV..." messages into one. But that commit already made it into lvm2-2.02.151.