Red Hat Bugzilla – Bug 1079067
unable to restore pool _tmeta device from live thin_dump'ed file
Last modified: 2018-04-10 09:16:51 EDT
Description of problem:
This is one of the leftover issues from bug 970798. Bug 970798 basically says the only way to properly restore a corrupted metadata device is an inactive swap to a new device. However, lvm will still let you attempt to dump from a live device to a file and then attempt to restore from that dumped file. This doesn't work. If it's not supported then it shouldn't be allowed, and if it is supported then it should be fixed.

SCENARIO - [verify_io_between_online_mda_corruptions]
Create a snapshot and then verify its io contents in between ONLINE pool mda corruptions and restorations

Making origin volume
lvcreate --thinpool POOL --zero n -L 5G snapper_thinp
Sanity checking pool device metadata (thin_check /dev/mapper/snapper_thinp-POOL_tmeta)
examining superblock
examining devices tree
examining mapping tree
lvcreate --virtualsize 1G -T snapper_thinp/POOL -n origin
lvcreate --virtualsize 1G -T snapper_thinp/POOL -n other1
lvcreate -V 1G -T snapper_thinp/POOL -n other2
lvcreate -V 1G -T snapper_thinp/POOL -n other3
lvcreate --virtualsize 1G -T snapper_thinp/POOL -n other4
lvcreate -V 1G -T snapper_thinp/POOL -n other5
Placing an XFS filesystem on origin volume
Mounting origin volume
Writing files to /mnt/origin
Checking files on /mnt/origin

*** Pool MDA Corrupt/Restore iteration 1/5 ***
syncing before snap creation...
Creating thin snapshot of origin volume
lvcreate -K -s /dev/snapper_thinp/origin -n snap1
Mounting thin snapshot volume
mkdir -p /mnt/snap1
mount -o nouuid /dev/snapper_thinp/snap1 /mnt/snap1
Checking files on /mnt/snap1
Writing files to /mnt/origin
Checking files on /mnt/origin
Dumping current pool metadata to /tmp/snapper_thinp_dump_1.8910.2296
thin_dump /dev/mapper/snapper_thinp-POOL_tmeta > /tmp/snapper_thinp_dump_1.8910.2296
Corrupting pool meta device (/dev/mapper/snapper_thinp-POOL_tmeta)
dd if=/dev/zero of=/dev/mapper/snapper_thinp-POOL_tmeta count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.000413308 s, 1.2 MB/s
Verifying that pool meta device is now corrupt
thin_check /dev/mapper/snapper_thinp-POOL_tmeta
examining superblock
superblock is corrupt
bad checksum in superblock
Restoring /dev/mapper/snapper_thinp-POOL_tmeta using dumped file
thin_restore -i /tmp/snapper_thinp_dump_1.8910.2296 -o /dev/mapper/snapper_thinp-POOL_tmeta
Verifying that pool meta device is no longer corrupt
thin_check /dev/mapper/snapper_thinp-POOL_tmeta
examining superblock
examining devices tree
examining mapping tree

*** Pool MDA Corrupt/Restore iteration 2/5 ***
syncing before snap creation...
Creating thin snapshot of origin volume
lvcreate -K -s /dev/snapper_thinp/origin -n snap2
Mounting thin snapshot volume
mkdir -p /mnt/snap2
mount -o nouuid /dev/snapper_thinp/snap2 /mnt/snap2
Checking files on /mnt/snap2
Writing files to /mnt/origin
Checking files on /mnt/origin
Dumping current pool metadata to /tmp/snapper_thinp_dump_2.7659.2296
thin_dump /dev/mapper/snapper_thinp-POOL_tmeta > /tmp/snapper_thinp_dump_2.7659.2296
metadata contains errors (run thin_check for details).
perhaps you wanted to run with --repair
dump of current pool meta data failed

[root@harding-02 ~]# thin_check /dev/mapper/snapper_thinp-POOL_tmeta
examining superblock
examining devices tree
examining mapping tree
thin device 1 is missing mappings [16384, -]
too few entries in btree_node: 1, expected at least 84 (max_entries = 252)
too few entries in btree_node: 1, expected at least 84 (max_entries = 252)
too few entries in btree_node: 1, expected at least 84 (max_entries = 252)
too few entries in btree_node: 1, expected at least 84 (max_entries = 252)
too few entries in btree_node: 1, expected at least 84 (max_entries = 252)
too few entries in btree_node: 24, expected at least 84 (max_entries = 252)
parent key mismatch: parent was 1764, but lowest in node was 0
parent key mismatch: parent was 1890, but lowest in node was 126
the last key of the previous leaf was 16383 and the first key of this leaf is 2016
the last key of the previous leaf was 16383 and the first key of this leaf is 2142
the last key of the previous leaf was 16383 and the first key of this leaf is 2268
the last key of the previous leaf was 16383 and the first key of this leaf is 2396
the last key of the previous leaf was 16383 and the first key of this leaf is 2522
the last key of the previous leaf was 16383 and the first key of this leaf is 2648
the last key of the previous

[root@harding-02 ~]# umount /mnt/*
[root@harding-02 ~]# vgchange -an snapper_thinp
WARNING: Integrity check of metadata for thin pool snapper_thinp/POOL failed.
0 logical volume(s) in volume group "snapper_thinp" now active
[root@harding-02 ~]# lvremove snapper_thinp
Removing pool "POOL" will remove 8 dependent volume(s). Proceed? [y/n]: y
Check of thin pool snapper_thinp/POOL failed (status:1). Manual repair required (thin_dump --repair /dev/mapper/snapper_thinp-POOL_tmeta)!
Failed to update thin pool POOL.
Failed to update thin pool POOL.
Failed to update thin pool POOL.
Failed to update thin pool POOL.
Failed to update thin pool POOL.
Failed to update thin pool POOL.
Failed to update thin pool POOL.
Failed to update thin pool POOL.

Version-Release number of selected component (if applicable):
3.10.0-110.el7.x86_64

lvm2-2.02.105-13.el7    BUILT: Wed Mar 19 05:38:19 CDT 2014
lvm2-libs-2.02.105-13.el7    BUILT: Wed Mar 19 05:38:19 CDT 2014
lvm2-cluster-2.02.105-13.el7    BUILT: Wed Mar 19 05:38:19 CDT 2014
device-mapper-1.02.84-13.el7    BUILT: Wed Mar 19 05:38:19 CDT 2014
device-mapper-libs-1.02.84-13.el7    BUILT: Wed Mar 19 05:38:19 CDT 2014
device-mapper-event-1.02.84-13.el7    BUILT: Wed Mar 19 05:38:19 CDT 2014
device-mapper-event-libs-1.02.84-13.el7    BUILT: Wed Mar 19 05:38:19 CDT 2014
device-mapper-persistent-data-0.2.8-4.el7    BUILT: Fri Jan 24 14:28:55 CST 2014
cmirror-2.02.105-13.el7    BUILT: Wed Mar 19 05:38:19 CDT 2014
This is related to (if not a dup of) bug 1023828 and bug 1038387.
Using thin_dump on live metadata is not a supported usage; however, it might be seen as equivalent to using 'dd' on a device. In general, lvm2/dm doesn't prohibit the root user from shooting himself in the foot; root could equally use commands like 'rm -rf /*' or 'dd if=/dev/zero of=/dev/vg/pool_tmeta'. So while we do not suggest using this approach at all (accessing live metadata), there may be situations where 'some data' is better than 'no data' at all.

So at this moment thin_dump is able to open a live _tmeta device (which is not opened by the thin-pool target in exclusive mode). In the future we would like to map usage of thin_dump onto an automated internal snapshot of the thin pool metadata (a feature supported by the thin-pool target).

I guess it would be really useful if thin_dump could detect that there could be a serious problem ahead when dumping live (and likely inconsistent) metadata, and print a bold WARNING to the user; but this operation will likely not be banned.
I guess I've always seen thin_dump as more of a read operation, and not at all like doing a "dd if=/dev/zero of=/dev/vg/pool_tmeta" or an "rm -rf *". I thought we used thin_dump to get the data off the meta device and use thin_repair to put the valid data back on. So maybe it's just me who needs to change his outlook on how potent this tool is.
(In reply to Corey Marthaler from comment #3)
> I guess I've always seen thin_dump as more of a read operation, and not
> at all like doing a "dd if=/dev/zero of=/dev/vg/pool_tmeta" or an "rm
> -rf *". I thought we used thin_dump to get the data off the meta device
> and use thin_repair to put the valid data back on. So maybe it's just me
> who needs to change his outlook on how potent this tool is.

The metadata is changing all the time in a live system, so any dump you made would likely be inconsistent as the kernel changed the metadata underneath thin_dump.

However, there is support for taking a *metadata* snapshot and dumping this from the live device (see the --metadata-snap switch for thin_dump). Of course the metadata snap could well be out of date, so it's mainly useful for getting the mappings of devices that we know aren't changing (e.g., a snap for backup).
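The metadata-snapshot path described above can be sketched roughly as follows. This is a hedged outline, not a tested procedure: the device names (vg-pool-tpool, vg-pool_tmeta) and the output path are placeholders, it requires root and a live thin pool, and reserve_metadata_snap/release_metadata_snap are the thin-pool target messages documented in the kernel's thin-provisioning notes.

```shell
# 1. Ask the thin-pool target to take an internal snapshot of its metadata.
#    (Sector 0 is the conventional target offset for dmsetup message.)
dmsetup message vg-pool-tpool 0 reserve_metadata_snap

# 2. Dump from the metadata snapshot instead of the live metadata root,
#    so the kernel cannot change the tree underneath thin_dump.
thin_dump --metadata-snap /dev/mapper/vg-pool_tmeta > /tmp/pool_meta.xml

# 3. Release the snapshot so the pool can reclaim the metadata space.
dmsetup message vg-pool-tpool 0 release_metadata_snap
```

The dump is consistent but frozen at reserve time, which matches the caveat above: it is mainly useful for mappings that are known not to be changing.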
Perhaps thin_dump should warn if it was unable to open the device O_EXCL (unless it's doing a 'safe' dump from a metadata snap)? "on Linux 2.6 and later, O_EXCL can be used without O_CREAT if pathname refers to a block device. If the block device is in use by the system (e.g., mounted), open() fails with the error EBUSY."
(In reply to Alasdair Kergon from comment #5)
> Perhaps thin_dump should warn if it was unable to open the device O_EXCL
> (unless it's doing a 'safe' dump from a metadata snap)?
>
> "on Linux 2.6 and later, O_EXCL can be used without O_CREAT if pathname
> refers to a block device. If the block device is in use by the system
> (e.g., mounted), open() fails with the error EBUSY."

thin_dump does now warn and won't run if it can't get O_EXCL.
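The O_EXCL probe described in comment #5 can be sketched as below. This is a minimal illustration, not thin_dump's actual implementation (thin_dump is written in C/C++): per open(2), O_EXCL without O_CREAT makes open() fail with EBUSY only when the path is a block device that is in use; on regular files the flag is simply ignored.

```python
import errno
import os
import sys

def try_exclusive_open(path):
    """Open `path` with O_EXCL (no O_CREAT).

    Returns an open fd if the exclusive open succeeded, or None if the
    kernel reported EBUSY, i.e. the block device is already held open
    (for example by the device-mapper thin-pool target), in which case
    a raw dump would race the kernel's metadata updates.
    """
    try:
        return os.open(path, os.O_RDONLY | os.O_EXCL)
    except OSError as e:
        if e.errno == errno.EBUSY:
            return None
        raise

if __name__ == "__main__" and len(sys.argv) > 1:
    fd = try_exclusive_open(sys.argv[1])
    if fd is None:
        print("WARNING: device busy; refusing to dump live metadata")
    else:
        print("exclusive open OK; metadata is quiescent")
        os.close(fd)
```

Running this against an active pool's _tmeta device would hit the None branch, which mirrors the "Device or resource busy" refusal quoted in the next comment.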
Fixed:

syscall 'open' failed: Device or resource busy
Note: you cannot run this tool with these options on live metadata.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0776