Bug 1420550 - lvmetad PV confusion led to activation failure of transiently failed raid volume
Summary: lvmetad PV confusion led to activation failure of transiently failed raid volume
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: lvm2
Version: 6.9
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: rc
Target Release: ---
Assignee: Heinz Mauelshagen
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-02-08 23:48 UTC by Corey Marthaler
Modified: 2017-12-06 11:57 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-06 11:57:44 UTC
Target Upstream Version:


Attachments (Terms of Use)

Description Corey Marthaler 2017-02-08 23:48:26 UTC
Description of problem:
================================================================================
Iteration 0.16 started at Wed Feb  8 16:45:00 CST 2017
================================================================================
Scenario kill_second_spanned_primary_synced_raid4_2legs: Kill primary leg of synced 2 leg raid4 volume(s)

********* RAID hash info for this scenario *********
* names:              synced_spanned_primary_raid4_2legs_1
* sync:               1
* type:               raid4
* -m |-i value:       2
* leg devices:        /dev/sda1 /dev/sdd1 /dev/sdf1 /dev/sdc1 /dev/sdb1 /dev/sdg1
* spanned legs:       1
* manual repair:      0
* failpv(s):          /dev/sdc1
* failnode(s):        host-082
* lvmetad:            1
* raid fault policy:  warn
******************************************************

Creating raids(s) on host-082...
host-082: lvcreate --type raid4 -i 2 -n synced_spanned_primary_raid4_2legs_1 -L 500M black_bird /dev/sda1:0-62 /dev/sdd1:0-62 /dev/sdf1:0-62 /dev/sdc1:0-62 /dev/sdb1:0-62 /dev/sdg1:0-62

Current mirror/raid device structure(s):
  LV                                              Attr       LSize   Cpy%Sync Devices
  synced_spanned_primary_raid4_2legs_1            rwi-a-r--- 504.00m 0.00     synced_spanned_primary_raid4_2legs_1_rimage_0(0),synced_spanned_primary_raid4_2legs_1_rimage_1(0),synced_spanned_primary_raid4_2legs_1_rimage_2(0)
  [synced_spanned_primary_raid4_2legs_1_rimage_0] Iwi-aor--- 252.00m          /dev/sda1(1)
  [synced_spanned_primary_raid4_2legs_1_rimage_0] Iwi-aor--- 252.00m          /dev/sdc1(0)
  [synced_spanned_primary_raid4_2legs_1_rimage_1] Iwi-aor--- 252.00m          /dev/sdd1(1)
  [synced_spanned_primary_raid4_2legs_1_rimage_1] Iwi-aor--- 252.00m          /dev/sdb1(0)
  [synced_spanned_primary_raid4_2legs_1_rimage_2] Iwi-aor--- 252.00m          /dev/sdf1(1)
  [synced_spanned_primary_raid4_2legs_1_rimage_2] Iwi-aor--- 252.00m          /dev/sdg1(0)
  [synced_spanned_primary_raid4_2legs_1_rmeta_0]  ewi-aor---   4.00m          /dev/sda1(0)
  [synced_spanned_primary_raid4_2legs_1_rmeta_1]  ewi-aor---   4.00m          /dev/sdd1(0)
  [synced_spanned_primary_raid4_2legs_1_rmeta_2]  ewi-aor---   4.00m          /dev/sdf1(0)


Waiting until all mirror|raid volumes become fully synced...
   1/1 mirror(s) are fully synced: ( 100.00% )

Creating gfs2 on top of mirror(s) on host-082...
mkfs.gfs2 -J 32M -j 1 -p lock_nolock /dev/black_bird/synced_spanned_primary_raid4_2legs_1 -O
Mounting mirrored gfs2 filesystems on host-082...

PV=/dev/sdc1
        synced_spanned_primary_raid4_2legs_1_rimage_0: 2

Writing verification files (checkit) to mirror(s) on...
        ---- host-082 ----


<start name="host-082_synced_spanned_primary_raid4_2legs_1"  pid="13692" time="Wed Feb  8 16:45:40 2017 -0600" type="cmd" />
Sleeping 15 seconds to get some outstanding I/O locks before the failure
Verifying files (checkit) on mirror(s) on...
        ---- host-082 ----


Disabling device sdc on host-082

Attempting I/O to cause mirror down conversion(s) on host-082
dd if=/dev/zero of=/mnt/synced_spanned_primary_raid4_2legs_1/ddfile count=10 bs=4M
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.207122 s, 203 MB/s
dd if=/dev/zero of=/mnt/synced_spanned_primary_raid4_2legs_1/ddfile seek=200 count=50 bs=1M
50+0 records in
50+0 records out
52428800 bytes (52 MB) copied, 2.1289 s, 24.6 MB/s

HACK TO KILL XDOIO...
<fail name="host-082_synced_spanned_primary_raid4_2legs_1"  pid="13692" time="Wed Feb  8 16:46:11 2017 -0600" type="cmd" duration="31" ec="143" />
ALL STOP!
Unmounting gfs and removing mnt point on host-082...

Verifying proper "D"ead kernel status state for failed raid images(s)
No dead "D" kernel state was found for this raid image
This is a known issue where triggering a repair in raid4|5 spanned volumes is difficult and inconsistent, moving on...

Reactivating the raids containing transiently failed raid images
lvchange -an black_bird/synced_spanned_primary_raid4_2legs_1

lvchange -ay  black_bird/synced_spanned_primary_raid4_2legs_1
  /dev/sdc1: open failed: No such device or address
  Device /dev/sdc1 has size of 0 sectors which is smaller than corresponding PV size of 44034102 sectors. Was device resized?
  One or more devices used as PVs in VG black_bird have changed sizes.
  device-mapper: reload ioctl on (253:3) failed: No such device or address
unable to re-activate black_bird/synced_spanned_primary_raid4_2legs_1




[root@host-082 ~]# lvs -a -o +devices
  /dev/sdc1: open failed: No such device or address
  Device /dev/sdc1 has size of 0 sectors which is smaller than corresponding PV size of 44034102 sectors. Was device resized?
  One or more devices used as PVs in VG black_bird have changed sizes.
  LV                                              Attr       LSize   Cpy%Sync Devices
  synced_spanned_primary_raid4_2legs_1            rwi---r--- 504.00m          synced_spanned_primary_raid4_2legs_1_rimage_0(0),synced_spanned_primary_raid4_2legs_1_rimage_1(0),synced_spanned_primary_raid4_2legs_1_rimage_2(0)
  [synced_spanned_primary_raid4_2legs_1_rimage_0] Iwi---r--- 252.00m          /dev/sda1(1)
  [synced_spanned_primary_raid4_2legs_1_rimage_0] Iwi---r--- 252.00m          /dev/sdc1(0)
  [synced_spanned_primary_raid4_2legs_1_rimage_1] Iwi---r--- 252.00m          /dev/sdd1(1)
  [synced_spanned_primary_raid4_2legs_1_rimage_1] Iwi---r--- 252.00m          /dev/sdb1(0)
  [synced_spanned_primary_raid4_2legs_1_rimage_2] Iwi---r--- 252.00m          /dev/sdf1(1)
  [synced_spanned_primary_raid4_2legs_1_rimage_2] Iwi---r--- 252.00m          /dev/sdg1(0)
  [synced_spanned_primary_raid4_2legs_1_rmeta_0]  ewi-a-r-r-   4.00m          /dev/sda1(0)
  [synced_spanned_primary_raid4_2legs_1_rmeta_1]  ewi---r---   4.00m          /dev/sdd1(0)
  [synced_spanned_primary_raid4_2legs_1_rmeta_2]  ewi---r---   4.00m          /dev/sdf1(0)

[root@host-082 ~]# lvchange -ay  black_bird/synced_spanned_primary_raid4_2legs_1
  /dev/sdc1: open failed: No such device or address
  Device /dev/sdc1 has size of 0 sectors which is smaller than corresponding PV size of 44034102 sectors. Was device resized?
  One or more devices used as PVs in VG black_bird have changed sizes.
  device-mapper: reload ioctl on (253:3) failed: No such device or address

[root@host-082 ~]# dmsetup ls
black_bird-synced_spanned_primary_raid4_2legs_1_rmeta_0 (253:2)
[root@host-082 ~]# dmsetup remove black_bird-synced_spanned_primary_raid4_2legs_1_rmeta_0

[root@host-082 ~]# lvchange -ay  black_bird/synced_spanned_primary_raid4_2legs_1
  /dev/sdc1: open failed: No such device or address
  Device /dev/sdc1 has size of 0 sectors which is smaller than corresponding PV size of 44034102 sectors. Was device resized?
  One or more devices used as PVs in VG black_bird have changed sizes.
  device-mapper: reload ioctl on (253:3) failed: No such device or address

[root@host-082 ~]# dmsetup remove black_bird-synced_spanned_primary_raid4_2legs_1_rmeta_0
[root@host-082 ~]# pvscan --cache

# Now it works after the pvscan
[root@host-082 ~]# lvchange -ay  black_bird/synced_spanned_primary_raid4_2legs_1
  WARNING: Device for PV G3DKdR-Jt8T-HeuK-L3HW-TCCy-SNg5-Qnibiu not found or rejected by a filter.


Version-Release number of selected component (if applicable):
2.6.32-688.el6.x86_64

lvm2-2.02.143-12.el6    BUILT: Wed Jan 11 09:35:04 CST 2017
lvm2-libs-2.02.143-12.el6    BUILT: Wed Jan 11 09:35:04 CST 2017
lvm2-cluster-2.02.143-12.el6    BUILT: Wed Jan 11 09:35:04 CST 2017
udev-147-2.73.el6_8.2    BUILT: Tue Aug 30 08:17:19 CDT 2016
device-mapper-1.02.117-12.el6    BUILT: Wed Jan 11 09:35:04 CST 2017
device-mapper-libs-1.02.117-12.el6    BUILT: Wed Jan 11 09:35:04 CST 2017
device-mapper-event-1.02.117-12.el6    BUILT: Wed Jan 11 09:35:04 CST 2017
device-mapper-event-libs-1.02.117-12.el6    BUILT: Wed Jan 11 09:35:04 CST 2017
device-mapper-persistent-data-0.6.2-0.1.rc7.el6    BUILT: Tue Mar 22 08:58:09 CDT 2016


How reproducible:
Only once so far

Comment 2 Jan Kurik 2017-12-06 11:57:44 UTC
Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available.

The official life cycle policy can be reviewed here:

http://redhat.com/rhel/lifecycle

This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL:

https://access.redhat.com/

