Bug 1075819 - thin pool stacked on raid doesn't appear to survive a _tmeta device failure
Summary: thin pool stacked on raid doesn't appear to survive a _tmeta device failure
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: LVM and device-mapper development team
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-03-12 22:06 UTC by Corey Marthaler
Modified: 2021-09-03 12:39 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-03-14 09:00:50 UTC
Target Upstream Version:
Embargoed:



Description Corey Marthaler 2014-03-12 22:06:44 UTC
Description of problem:
Scenario kill_primary_synced_raid1_2legs: Kill primary leg of synced 2 leg raid1 volume(s)

********* RAID hash info for this scenario *********
* names:              synced_primary_raid1_2legs_1
* sync:               1
* type:               raid1
* -m |-i value:       2
* leg devices:        /dev/sdd1 /dev/sde1 /dev/sdc1
* failpv(s):          /dev/sdd1
* failnode(s):        host-005.virt.lab.msp.redhat.com
* additional snap:    /dev/sde1
* lvmetad:             1
* thinpool stack:      1
* raid fault policy:   warn
******************************************************
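For reference, the "raid fault policy: warn" and "lvmetad: 1" settings above correspond roughly to the following lvm.conf excerpt. This is an assumed mapping for illustration only, not taken from the test harness:

  # /etc/lvm/lvm.conf (excerpt, illustrative)
  activation {
      # "warn" only logs when a raid leg fails; "allocate" would let
      # dmeventd replace the failed leg automatically from spare PVs.
      raid_fault_policy = "warn"
  }
  global {
      # the lvmetad metadata caching daemon is in use
      use_lvmetad = 1
  }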

Creating raids(s) on host-005.virt.lab.msp.redhat.com...
host-005.virt.lab.msp.redhat.com: lvcreate --type raid1 -m 2 -n synced_primary_raid1_2legs_1 -L 500M black_bird /dev/sdd1:0-2000 /dev/sde1:0-2000 /dev/sdc1:0-2000

Current mirror/raid device structure(s):
  LV                                      Attr       LSize   Cpy%Sync Devices
   synced_primary_raid1_2legs_1            rwi-a-r--- 500.00m    5.60 synced_primary_raid1_2legs_1_rimage_0(0),synced_primary_raid1_2legs_1_rimage_1(0),synced_primary_raid1_2legs_1_rimage_2(0)
   [synced_primary_raid1_2legs_1_rimage_0] Iwi-aor--- 500.00m         /dev/sdd1(1)
   [synced_primary_raid1_2legs_1_rimage_1] Iwi-aor--- 500.00m         /dev/sde1(1)
   [synced_primary_raid1_2legs_1_rimage_2] Iwi-aor--- 500.00m         /dev/sdc1(1)
   [synced_primary_raid1_2legs_1_rmeta_0]  ewi-aor---   4.00m         /dev/sdd1(0)
   [synced_primary_raid1_2legs_1_rmeta_1]  ewi-aor---   4.00m         /dev/sde1(0)
   [synced_primary_raid1_2legs_1_rmeta_2]  ewi-aor---   4.00m         /dev/sdc1(0)

Waiting until all mirror|raid volumes become fully syncd...
   0/1 mirror(s) are fully synced: ( 65.80% )
   1/1 mirror(s) are fully synced: ( 100.00% )

********* THIN POOL info for this scenario *********
* poolmetadatadevice:           /dev/sdd1
* kill meta device:             1
* poolmetadataspare:            0
******************************************************

Convert mirror/raid volume(s) to Thinpool volume(s) on host-005.virt.lab.msp.redhat.com...
lvcreate -n meta_synced_primary_raid1_2legs_1 -L 200M black_bird /dev/sdd1
lvconvert --thinpool black_bird/synced_primary_raid1_2legs_1 --poolmetadata meta_synced_primary_raid1_2legs_1 --poolmetadataspare n
  WARNING: recovery of pools without pool metadata spare LV is not automated.
lvcreate --virtualsize 200M --thinpool black_bird/synced_primary_raid1_2legs_1 -n virt_synced_primary_raid1_2legs_1
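For context, the WARNING above is a consequence of --poolmetadataspare n: without a hidden _pmspare LV there is nothing for the automated repair to rebuild the metadata into. A rough sketch of the spare-enabled variant follows, for comparison only; it reuses the names above, is not part of the test run, and in this scenario the metadata PV itself is killed, so even this path would have nothing left to read from:

  # Keep the default pool metadata spare (omit --poolmetadataspare n),
  # so LVM maintains a hidden _pmspare LV alongside the pool.
  lvconvert --thinpool black_bird/synced_primary_raid1_2legs_1 --poolmetadata meta_synced_primary_raid1_2legs_1

  # With a spare present, damaged pool metadata can later be handed to
  # the automated repair (the pool must be inactive):
  lvchange -an black_bird/synced_primary_raid1_2legs_1
  lvconvert --repair black_bird/synced_primary_raid1_2legs_1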

Creating ext on top of mirror(s) on host-005.virt.lab.msp.redhat.com...
mke2fs 1.42.9 (28-Dec-2013)
Mounting mirrored ext filesystems on host-005.virt.lab.msp.redhat.com...

Current mirror/raid device structure(s):
  LV                                            Attr       LSize   Cpy%Sync Devices
   synced_primary_raid1_2legs_1                  twi-a-tz-- 500.00m          synced_primary_raid1_2legs_1_tdata(0)
   [synced_primary_raid1_2legs_1_tdata]          rwi-aor--- 500.00m   100.00 synced_primary_raid1_2legs_1_tdata_rimage_0(0),synced_primary_raid1_2legs_1_tdata_rimage_1(0),synced_primary_raid1_2legs_1_tdata_rimage_2(0)
   [synced_primary_raid1_2legs_1_tdata_rimage_0] iwi-aor--- 500.00m          /dev/sdd1(1)
   [synced_primary_raid1_2legs_1_tdata_rimage_1] iwi-aor--- 500.00m          /dev/sde1(1)
   [synced_primary_raid1_2legs_1_tdata_rimage_2] iwi-aor--- 500.00m          /dev/sdc1(1)
   [synced_primary_raid1_2legs_1_tdata_rmeta_0]  ewi-aor---   4.00m          /dev/sdd1(0)
   [synced_primary_raid1_2legs_1_tdata_rmeta_1]  ewi-aor---   4.00m          /dev/sde1(0)
   [synced_primary_raid1_2legs_1_tdata_rmeta_2]  ewi-aor---   4.00m          /dev/sdc1(0)
   [synced_primary_raid1_2legs_1_tmeta]          ewi-ao---- 200.00m          /dev/sdd1(126)
   virt_synced_primary_raid1_2legs_1             Vwi-aotz-- 200.00m

PV=/dev/sdd1
        synced_primary_raid1_2legs_1_tdata_rimage_0: 2
        synced_primary_raid1_2legs_1_tdata_rmeta_0: 2
        synced_primary_raid1_2legs_1_tmeta: 2

Writing verification files (checkit) to mirror(s) on...
        ---- host-005.virt.lab.msp.redhat.com ----

Sleeping 15 seconds to get some outstanding EXT I/O locks before the failure
lvcreate -s /dev/black_bird/virt_synced_primary_raid1_2legs_1 -n snap1_synced_primary_raid1_2legs_1
lvcreate -s /dev/black_bird/virt_synced_primary_raid1_2legs_1 -n snap2_synced_primary_raid1_2legs_1
lvcreate -s /dev/black_bird/virt_synced_primary_raid1_2legs_1 -n snap3_synced_primary_raid1_2legs_1

Verifying files (checkit) on mirror(s) on...
        ---- host-005.virt.lab.msp.redhat.com ----

Disabling device sdd on host-005.virt.lab.msp.redhat.com  /dev/sdd1: read failed after 0 of 2048 at 0: Input/output error

Mar 12 16:49:16 host-005 kernel: [  366.378105] sd 8:0:0:1: rejecting I/O to offline device
Mar 12 16:49:16 host-005 kernel: [  366.378857] device-mapper: thin: 253:10: metadata operation 'dm_pool_commit_metadata' failed: error = -5
Mar 12 16:49:16 host-005 kernel: [  366.380079] device-mapper: thin: 253:10: switching pool to read-only mode
Mar 12 16:49:16 host-005 kernel: [  366.397336] sd 8:0:0:1: rejecting I/O to offline device
Mar 12 16:49:16 host-005 kernel: [  366.403215] device-mapper: thin: 253:10: aborting transaction failed
Mar 12 16:49:16 host-005 kernel: [  366.404049] device-mapper: thin: 253:10: switching pool to failure mode

Getting recovery check start time from /var/log/messages: Mar 12 16:49
Attempting I/O to cause mirror down conversion(s) on host-005.virt.lab.msp.redhat.com
10+0 records in
10+0 records out
41943040 bytes (42 MB) copied, 0.113346 s, 370 MB/s

Verifying current sanity of lvm after the failure
The down conversion didn't appear to work, as even a simple lvs fails

[root@host-005 ~]# lvs -a -o +devices
  PV hx01Xy-y902-Awu6-NabD-TY9K-q03L-eqrO72 not recognised. Is the device missing?
  PV hx01Xy-y902-Awu6-NabD-TY9K-q03L-eqrO72 not recognised. Is the device missing?
  Failed to parse thin pool params: Fail.
  Failed to parse thin pool params: Fail.
  dm_report_object: report function failed for field data_percent
  Failed to parse thin params: Fail.
  Failed to parse thin params: Fail.
  dm_report_object: report function failed for field data_percent
  LV                                            Attr       LSize   Pool                         Origin                            Cpy%Sync Devices
  snap1_synced_primary_raid1_2legs_1            Vwi---tzpk 200.00m synced_primary_raid1_2legs_1 virt_synced_primary_raid1_2legs_1
  snap2_synced_primary_raid1_2legs_1            Vwi---tzpk 200.00m synced_primary_raid1_2legs_1 virt_synced_primary_raid1_2legs_1
  snap3_synced_primary_raid1_2legs_1            Vwi---tzpk 200.00m synced_primary_raid1_2legs_1 virt_synced_primary_raid1_2legs_1
  synced_primary_raid1_2legs_1                  twi-a-tzp- 500.00m                                                             
  [synced_primary_raid1_2legs_1_tdata]          rwi-aor-p- 500.00m                                                                  100.00 synced_primary_raid1_2legs_1_tdata_rimage_0(0),synced_primary_raid1_2legs_1_tdata_rimage_1(0),synced_primary_raid1_2legs_1_tdata_rimage_2(0)
  [synced_primary_raid1_2legs_1_tdata_rimage_0] iwi-aor-p- 500.00m                                                                         unknown device(1)
  [synced_primary_raid1_2legs_1_tdata_rimage_1] iwi-aor--- 500.00m                                                                         /dev/sde1(1)
  [synced_primary_raid1_2legs_1_tdata_rimage_2] iwi-aor--- 500.00m                                                                         /dev/sdc1(1)
  [synced_primary_raid1_2legs_1_tdata_rmeta_0]  ewi-aor-p-   4.00m                                                                         unknown device(0)
  [synced_primary_raid1_2legs_1_tdata_rmeta_1]  ewi-aor---   4.00m                                                                         /dev/sde1(0)
  [synced_primary_raid1_2legs_1_tdata_rmeta_2]  ewi-aor---   4.00m                                                                         /dev/sdc1(0)
  [synced_primary_raid1_2legs_1_tmeta]          ewi-ao--p- 200.00m                                                                         unknown device(126)
  virt_synced_primary_raid1_2legs_1             Vwi-a-tzp- 200.00m synced_primary_raid1_2legs_1


  Mar 12 16:49:33 host-005 qarshd[4561]: Running cmdline: lvs > /dev/null 2>&1
  Mar 12 16:49:34 host-005 lvm[1592]: device-mapper: waitevent ioctl on  failed: Interrupted system call
  Mar 12 16:49:34 host-005 lvm[1592]: Failed to parse thin pool params: Fail.
  Mar 12 16:49:34 host-005 lvm[1592]: Failed to parse status.
  Mar 12 16:49:44 host-005 lvm[1592]: device-mapper: waitevent ioctl on  failed: Interrupted system call
  Mar 12 16:49:44 host-005 lvm[1592]: Failed to parse thin pool params: Fail.
  Mar 12 16:49:44 host-005 lvm[1592]: Failed to parse status.
  [...]
  Mar 12 16:56:14 host-005 lvm[1592]: device-mapper: waitevent ioctl on  failed: Interrupted system call
  Mar 12 16:56:14 host-005 lvm[1592]: Failed to parse thin pool params: Fail.
  Mar 12 16:56:14 host-005 lvm[1592]: Failed to parse status.
  Mar 12 16:56:24 host-005 lvm[1592]: device-mapper: waitevent ioctl on  failed: Interrupted system call
  Mar 12 16:56:24 host-005 lvm[1592]: Failed to parse thin pool params: Fail.
  Mar 12 16:56:24 host-005 lvm[1592]: Failed to parse status.


Version-Release number of selected component (if applicable):
3.10.0-90.el7.x86_64
lvm2-2.02.105-12.el7    BUILT: Wed Mar 12 10:49:52 CDT 2014
lvm2-libs-2.02.105-12.el7    BUILT: Wed Mar 12 10:49:52 CDT 2014
lvm2-cluster-2.02.105-12.el7    BUILT: Wed Mar 12 10:49:52 CDT 2014
device-mapper-1.02.84-12.el7    BUILT: Wed Mar 12 10:49:52 CDT 2014
device-mapper-libs-1.02.84-12.el7    BUILT: Wed Mar 12 10:49:52 CDT 2014
device-mapper-event-1.02.84-12.el7    BUILT: Wed Mar 12 10:49:52 CDT 2014
device-mapper-event-libs-1.02.84-12.el7    BUILT: Wed Mar 12 10:49:52 CDT 2014
device-mapper-persistent-data-0.2.8-4.el7    BUILT: Fri Jan 24 14:28:55 CST 2014
cmirror-2.02.105-12.el7    BUILT: Wed Mar 12 10:49:52 CDT 2014


How reproducible:
Every time

Comment 2 Marian Csontos 2014-03-14 09:00:50 UTC
_tmeta is not on RAID. I cannot imagine how it could survive losing its only metadata device, except perhaps if everything were kept in memory (which, IIUC, it is not) and then dumped to the pmspare.
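A sketch of how the metadata LV could have been made RAID-backed so that losing /dev/sdd1 only degrades it, rather than taking the pool metadata offline. This reuses the names and sizes from the description and is illustrative, not a verified reproduction:

  # Create the future pool metadata LV as a 2-leg raid1 across two PVs
  lvcreate --type raid1 -m 1 -n meta_synced_primary_raid1_2legs_1 -L 200M black_bird /dev/sdd1 /dev/sde1

  # Convert as before; the pool's _tmeta is now a raid1 LV, so killing
  # /dev/sdd1 degrades the metadata mirror instead of losing it.
  lvconvert --thinpool black_bird/synced_primary_raid1_2legs_1 --poolmetadata meta_synced_primary_raid1_2legs_1 --poolmetadataspare n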

