Bug 1420550 - lvmetad PV confusion led to activation failure of transiently failed raid volume
Summary: lvmetad PV confusion led to activation failure of transiently failed raid volume
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.9
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: rc
Target Release: ---
Assignee: Heinz Mauelshagen
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-02-08 23:48 UTC by Corey Marthaler
Modified: 2017-12-06 11:57 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-06 11:57:44 UTC
Target Upstream Version:


Attachments (Terms of Use)

Description Corey Marthaler 2017-02-08 23:48:26 UTC
Description of problem:
================================================================================
Iteration 0.16 started at Wed Feb  8 16:45:00 CST 2017
================================================================================
Scenario kill_second_spanned_primary_synced_raid4_2legs: Kill primary leg of synced 2 leg raid4 volume(s)

********* RAID hash info for this scenario *********
* names:              synced_spanned_primary_raid4_2legs_1
* sync:               1
* type:               raid4
* -m |-i value:       2
* leg devices:        /dev/sda1 /dev/sdd1 /dev/sdf1 /dev/sdc1 /dev/sdb1 /dev/sdg1
* spanned legs:       1
* manual repair:      0
* failpv(s):          /dev/sdc1
* failnode(s):        host-082
* lvmetad:            1
* raid fault policy:  warn
******************************************************

Creating raids(s) on host-082...
host-082: lvcreate --type raid4 -i 2 -n synced_spanned_primary_raid4_2legs_1 -L 500M black_bird /dev/sda1:0-62 /dev/sdd1:0-62 /dev/sdf1:0-62 /dev/sdc1:0-62 /dev/sdb1:0-62 /dev/sdg1:0-62

Current mirror/raid device structure(s):
  LV                                              Attr       LSize   Cpy%Sync Devices
  synced_spanned_primary_raid4_2legs_1            rwi-a-r--- 504.00m 0.00     synced_spanned_primary_raid4_2legs_1_rimage_0(0),synced_spanned_primary_raid4_2legs_1_rimage_1(0),synced_spanned_primary_raid4_2legs_1_rimage_2(0)
  [synced_spanned_primary_raid4_2legs_1_rimage_0] Iwi-aor--- 252.00m          /dev/sda1(1)
  [synced_spanned_primary_raid4_2legs_1_rimage_0] Iwi-aor--- 252.00m          /dev/sdc1(0)
  [synced_spanned_primary_raid4_2legs_1_rimage_1] Iwi-aor--- 252.00m          /dev/sdd1(1)
  [synced_spanned_primary_raid4_2legs_1_rimage_1] Iwi-aor--- 252.00m          /dev/sdb1(0)
  [synced_spanned_primary_raid4_2legs_1_rimage_2] Iwi-aor--- 252.00m          /dev/sdf1(1)
  [synced_spanned_primary_raid4_2legs_1_rimage_2] Iwi-aor--- 252.00m          /dev/sdg1(0)
  [synced_spanned_primary_raid4_2legs_1_rmeta_0]  ewi-aor---   4.00m          /dev/sda1(0)
  [synced_spanned_primary_raid4_2legs_1_rmeta_1]  ewi-aor---   4.00m          /dev/sdd1(0)
  [synced_spanned_primary_raid4_2legs_1_rmeta_2]  ewi-aor---   4.00m          /dev/sdf1(0)


Waiting until all mirror|raid volumes become fully synced...
   1/1 mirror(s) are fully synced: ( 100.00% )

Creating gfs2 on top of mirror(s) on host-082...
mkfs.gfs2 -J 32M -j 1 -p lock_nolock /dev/black_bird/synced_spanned_primary_raid4_2legs_1 -O
Mounting mirrored gfs2 filesystems on host-082...

PV=/dev/sdc1
        synced_spanned_primary_raid4_2legs_1_rimage_0: 2

Writing verification files (checkit) to mirror(s) on...
        ---- host-082 ----


<start name="host-082_synced_spanned_primary_raid4_2legs_1"  pid="13692" time="Wed Feb  8 16:45:40 2017 -0600" type="cmd" />
Sleeping 15 seconds to get some outstanding I/O locks before the failure
Verifying files (checkit) on mirror(s) on...
        ---- host-082 ----


Disabling device sdc on host-082

Attempting I/O to cause mirror down conversion(s) on host-082
dd if=/dev/zero of=/mnt/synced_spanned_primary_raid4_2legs_1/ddfile count=10 bs=4M
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.207122 s, 203 MB/s
dd if=/dev/zero of=/mnt/synced_spanned_primary_raid4_2legs_1/ddfile seek=200 count=50 bs=1M
50+0 records in
50+0 records out
52428800 bytes (52 MB) copied, 2.1289 s, 24.6 MB/s

HACK TO KILL XDOIO...
<fail name="host-082_synced_spanned_primary_raid4_2legs_1"  pid="13692" time="Wed Feb  8 16:46:11 2017 -0600" type="cmd" duration="31" ec="143" />
ALL STOP!
Unmounting gfs and removing mnt point on host-082...

Verifying proper "D"ead kernel status state for failed raid images(s)
No dead "D" kernel state was found for this raid image
This is a known issue where triggering a repair in raid4|5 spanned volumes is difficult and inconsistent, moving on...

Reactivating the raids containing transiently failed raid images
lvchange -an black_bird/synced_spanned_primary_raid4_2legs_1

lvchange -ay  black_bird/synced_spanned_primary_raid4_2legs_1
  /dev/sdc1: open failed: No such device or address
  Device /dev/sdc1 has size of 0 sectors which is smaller than corresponding PV size of 44034102 sectors. Was device resized?
  One or more devices used as PVs in VG black_bird have changed sizes.
  device-mapper: reload ioctl on (253:3) failed: No such device or address
unable to re-activate black_bird/synced_spanned_primary_raid4_2legs_1




[root@host-082 ~]# lvs -a -o +devices
  /dev/sdc1: open failed: No such device or address
  Device /dev/sdc1 has size of 0 sectors which is smaller than corresponding PV size of 44034102 sectors. Was device resized?
  One or more devices used as PVs in VG black_bird have changed sizes.
  LV                                              Attr       LSize   Cpy%Sync Devices
  synced_spanned_primary_raid4_2legs_1            rwi---r--- 504.00m          synced_spanned_primary_raid4_2legs_1_rimage_0(0),synced_spanned_primary_raid4_2legs_1_rimage_1(0),synced_spanned_primary_raid4_2legs_1_rimage_2(0)
  [synced_spanned_primary_raid4_2legs_1_rimage_0] Iwi---r--- 252.00m          /dev/sda1(1)
  [synced_spanned_primary_raid4_2legs_1_rimage_0] Iwi---r--- 252.00m          /dev/sdc1(0)
  [synced_spanned_primary_raid4_2legs_1_rimage_1] Iwi---r--- 252.00m          /dev/sdd1(1)
  [synced_spanned_primary_raid4_2legs_1_rimage_1] Iwi---r--- 252.00m          /dev/sdb1(0)
  [synced_spanned_primary_raid4_2legs_1_rimage_2] Iwi---r--- 252.00m          /dev/sdf1(1)
  [synced_spanned_primary_raid4_2legs_1_rimage_2] Iwi---r--- 252.00m          /dev/sdg1(0)
  [synced_spanned_primary_raid4_2legs_1_rmeta_0]  ewi-a-r-r-   4.00m          /dev/sda1(0)
  [synced_spanned_primary_raid4_2legs_1_rmeta_1]  ewi---r---   4.00m          /dev/sdd1(0)
  [synced_spanned_primary_raid4_2legs_1_rmeta_2]  ewi---r---   4.00m          /dev/sdf1(0)

[root@host-082 ~]# lvchange -ay  black_bird/synced_spanned_primary_raid4_2legs_1
  /dev/sdc1: open failed: No such device or address
  Device /dev/sdc1 has size of 0 sectors which is smaller than corresponding PV size of 44034102 sectors. Was device resized?
  One or more devices used as PVs in VG black_bird have changed sizes.
  device-mapper: reload ioctl on (253:3) failed: No such device or address

[root@host-082 ~]# dmsetup ls
black_bird-synced_spanned_primary_raid4_2legs_1_rmeta_0 (253:2)
[root@host-082 ~]# dmsetup remove black_bird-synced_spanned_primary_raid4_2legs_1_rmeta_0

[root@host-082 ~]# lvchange -ay  black_bird/synced_spanned_primary_raid4_2legs_1
  /dev/sdc1: open failed: No such device or address
  Device /dev/sdc1 has size of 0 sectors which is smaller than corresponding PV size of 44034102 sectors. Was device resized?
  One or more devices used as PVs in VG black_bird have changed sizes.
  device-mapper: reload ioctl on (253:3) failed: No such device or address

[root@host-082 ~]# dmsetup remove black_bird-synced_spanned_primary_raid4_2legs_1_rmeta_0
[root@host-082 ~]# pvscan --cache

# Now it works after the pvscan
[root@host-082 ~]# lvchange -ay  black_bird/synced_spanned_primary_raid4_2legs_1
  WARNING: Device for PV G3DKdR-Jt8T-HeuK-L3HW-TCCy-SNg5-Qnibiu not found or rejected by a filter.


Version-Release number of selected component (if applicable):
2.6.32-688.el6.x86_64

lvm2-2.02.143-12.el6    BUILT: Wed Jan 11 09:35:04 CST 2017
lvm2-libs-2.02.143-12.el6    BUILT: Wed Jan 11 09:35:04 CST 2017
lvm2-cluster-2.02.143-12.el6    BUILT: Wed Jan 11 09:35:04 CST 2017
udev-147-2.73.el6_8.2    BUILT: Tue Aug 30 08:17:19 CDT 2016
device-mapper-1.02.117-12.el6    BUILT: Wed Jan 11 09:35:04 CST 2017
device-mapper-libs-1.02.117-12.el6    BUILT: Wed Jan 11 09:35:04 CST 2017
device-mapper-event-1.02.117-12.el6    BUILT: Wed Jan 11 09:35:04 CST 2017
device-mapper-event-libs-1.02.117-12.el6    BUILT: Wed Jan 11 09:35:04 CST 2017
device-mapper-persistent-data-0.6.2-0.1.rc7.el6    BUILT: Tue Mar 22 08:58:09 CDT 2016


How reproducible:
Only once so far

Comment 2 Jan Kurik 2017-12-06 11:57:44 UTC
Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available.

The official life cycle policy can be reviewed here:

http://redhat.com/rhel/lifecycle

This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL:

https://access.redhat.com/

