Bug 1242623
| Summary: | thin_[check\|dump] is unable to open meta data device | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Corey Marthaler <cmarthal> |
| Component: | lvm2 | Assignee: | Joe Thornber <thornber> |
| lvm2 sub component: | Thin Provisioning | QA Contact: | cluster-qe <cluster-qe> |
| Status: | CLOSED WONTFIX | Docs Contact: | |
| Severity: | medium | | |
| Priority: | medium | CC: | agk, cmarthal, heinzm, jbrassow, msnitzer, prajnoha, prockai, thornber, zkabelac |
| Version: | 7.2 | Keywords: | Regression, TestBlocker, Triaged |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-08-25 13:39:20 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description (Corey Marthaler, 2015-07-13 18:58:34 UTC)
The thin-pool itself has the metadata device open and in use. You need to deactivate VG/POOL.

(In reply to Corey Marthaler from comment #0)

```
> Description of problem:
> [root@host-110 ~]# pvscan
>   PV /dev/vda2   VG rhel_host-110   lvm2 [7.51 GiB / 40.00 MiB free]
>   PV /dev/sda1   VG VG              lvm2 [24.99 GiB / 24.99 GiB free]
>   PV /dev/sdb1   VG VG              lvm2 [24.99 GiB / 24.99 GiB free]
>   PV /dev/sdc1   VG VG              lvm2 [24.99 GiB / 24.99 GiB free]
>   PV /dev/sdd1   VG VG              lvm2 [24.99 GiB / 24.99 GiB free]
>   PV /dev/sde1   VG VG              lvm2 [24.99 GiB / 24.99 GiB free]
>   PV /dev/sdf1   VG VG              lvm2 [24.99 GiB / 24.99 GiB free]
>   PV /dev/sdg1   VG VG              lvm2 [24.99 GiB / 24.99 GiB free]
>   PV /dev/sdh1   VG VG              lvm2 [24.99 GiB / 24.99 GiB free]
>   Total: 9 [207.45 GiB] / in use: 9 [207.45 GiB] / in no VG: 0 [0 ]
>
> [root@host-110 ~]# lvcreate --thinpool POOL --zero n -L 500M --poolmetadatasize 4M VG
>   Logical volume "POOL" created.
>
> [root@host-110 ~]# lvs -a -o +devices
>   LV              VG  Attr       LSize   Pool Origin Data%  Meta%  Devices
>   POOL            VG  twi-a-t--- 500.00m             0.00   0.88   POOL_tdata(0)
>   [POOL_tdata]    VG  Twi-ao---- 500.00m                           /dev/sda1(1)
>   [POOL_tmeta]    VG  ewi-ao----   4.00m                           /dev/sdh1(0)
>   [lvol0_pmspare] VG  ewi-------   4.00m                           /dev/sda1(0)
>
> [root@host-110 ~]# thin_check /dev/mapper/VG-POOL_tmeta
> syscall 'open' failed: Device or resource busy
```

The thin-pool (VG/POOL) has the metadata device open and in use. You need to deactivate VG/POOL. This should probably be closed as NOTABUG, but I'll give you the benefit of the doubt and leave it open for now... Please elaborate on why you think this is a bug.

In 6.7 (and 7.1 for that matter) this worked, assuming other pool metadata operations weren't being run simultaneously. Plus, with the volume inactive, what /dev device is present to be checked?

```
# RHEL6.7
[root@mckinley-01 ~]# pvscan
  PV /dev/mapper/mpathbp1   VG VG              lvm2 [249.99 GiB / 249.99 GiB free]
  PV /dev/mapper/mpathcp1   VG VG              lvm2 [249.99 GiB / 249.99 GiB free]
  PV /dev/mapper/mpathdp1   VG VG              lvm2 [249.99 GiB / 249.99 GiB free]
  PV /dev/mapper/mpathep1   VG VG              lvm2 [249.99 GiB / 249.99 GiB free]
  PV /dev/mapper/mpathfp1   VG VG              lvm2 [249.99 GiB / 249.99 GiB free]
  PV /dev/mapper/mpathgp1   VG VG              lvm2 [249.99 GiB / 249.99 GiB free]
  PV /dev/mapper/mpathhp1   VG VG              lvm2 [249.99 GiB / 249.99 GiB free]
  PV /dev/sda2              VG vg_mckinley01   lvm2 [557.26 GiB / 0    free]
  Total: 8 [2.25 TiB] / in use: 8 [2.25 TiB] / in no VG: 0 [0 ]

[root@mckinley-01 ~]# lvcreate --thinpool POOL --zero n -L 500M --poolmetadatasize 4M VG
  Logical volume "POOL" created.

[root@mckinley-01 ~]# lvs -a -o +devices
  LV              VG  Attr       LSize   Pool Origin Data%  Meta%  Devices
  POOL            VG  twi-a-t--- 500.00m             0.00   0.88   POOL_tdata(0)
  [POOL_tdata]    VG  Twi-ao---- 500.00m                           /dev/mapper/mpathbp1(1)
  [POOL_tmeta]    VG  ewi-ao----   4.00m                           /dev/mapper/mpathhp1(0)
  [lvol0_pmspare] VG  ewi-------   4.00m                           /dev/mapper/mpathbp1(0)

[root@mckinley-01 ~]# thin_check /dev/mapper/VG-POOL_tmeta
examining superblock
examining devices tree
examining mapping tree

[root@mckinley-01 ~]# lvchange -an VG/POOL
[root@mckinley-01 ~]# thin_check /dev/mapper/VG-POOL_tmeta
Couldn't stat dev path
```

(In reply to Corey Marthaler from comment #2)

> In 6.7 (and 7.1 for that matter) this worked, assuming other pool meta data
> operations weren't being run simultaneously. Plus, with the volume inactive,
> what /dev device is present to be checked?

Shouldn't you be using lvm to initiate the check? e.g. `lvconvert --repair VG/POOL` (because yeah, only lvm knows/manages/activates the _hidden_ metadata device).

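
A rough sketch of that lvm-driven flow, assuming the VG/POOL names used in this report (the thin pool has to be inactive before lvm will attempt the metadata repair, and the exact messages printed vary by release):

```
# Sketch only: the VG/POOL names are taken from this report.
lvchange -an VG/POOL          # the pool must be inactive before a repair
lvconvert --repair VG/POOL    # lvm drives the metadata repair and swaps in the spare
lvchange -ay VG/POOL          # reactivate the pool once the repair succeeds
```
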

To access the _tmeta content, the user has to deactivate the thin pool first, 'swap' the _tmeta device with another LV, activate that LV, and then thin_check it. Neither LVM2 nor DM wants to allow parallel access to a live _tmeta device (it could only give you misleading data). So this newly enforced protection within the kernel and the thin_* tools is there to protect users from taking bad steps (i.e. exploring a live device). A future LVM2 version may support the 'snapshot' feature of the thin-pool target for accessing live _tmeta content.

'Swapping':

```
# create some tmp LV
lvcreate -L2 -n tmp vg

# swap the tmp LV with the tmeta of the inactive thin-pool
lvconvert --thinpool vg/pool --poolmetadata tmp

# activate the 'tmp' LV, which now carries the content of _tmeta
lvchange -ay vg/tmp

# thin_check it
thin_check /dev/vg/tmp

# deactivate & swap back
lvchange -an vg/tmp
lvconvert --thinpool vg/pool --poolmetadata tmp

# use the pool again
```

These steps are considered to be for 'skilled' users. For the remaining cases, 'lvconvert --repair' is the way to go, and we are slowly extending the cases which --repair can handle.

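
The same swapped-metadata approach works for thin_dump as well. A minimal sketch, assuming the vg/tmp LV from the steps above and an illustrative output path:

```
# Sketch only: vg/tmp comes from the swap steps above, and the output
# path is illustrative. With the pool inactive and its metadata swapped
# onto vg/tmp, the offline copy can be dumped and checked safely.
lvchange -ay vg/tmp
thin_dump -f xml /dev/vg/tmp > /tmp/pool_metadata.xml
thin_check /dev/vg/tmp
lvchange -an vg/tmp
```
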

This bug can be closed if we are all in agreement on the following: if a user wants to check the current state of their pool metadata device, even if they have quiesced all activity to that pool volume, no thin_check of the live device is allowed after 7.1/6.7; instead, the user needs to follow the steps in comment #4. I'll change what my tests do and file a new bug about having a better error when attempting to check a live pool metadata device.

Also, I've got a dumb question in response to comment #3... You can only run 'lvconvert --repair VG/POOL' on a non-corrupted pool metadata device, correct? If your pool metadata device is corrupt, then you're out of luck? There's no way to repair/swap in a new metadata device based on the kernel's current view of the metadata if you didn't already know to have restored the metadata to a tmp device?

(In reply to Corey Marthaler from comment #6)

> Also, I've got a dumb question in response to comment #3...
>
> You can only run 'lvconvert --repair VG/POOL' on a non corrupted
> poolmetadata device correct? If your poolmetadata device is corrupt, then
> you're out of luck?
>
> There's no way to repair/swap in a new metadata device based on the kernel's
> current view of the metadata if you didn't already know to have restored the
> metadata to a tmp device?

No idea if I'm dense or something, but your question really does seem "dumb". The entire point is to be able to repair broken ("corrupt") metadata. So I have no idea why you think 'lvconvert --repair VG/POOL' is only for perfectly healthy metadata. There is a spare metadata area (there should be anyway, by default) that is intended to allow for repairing the corrupt metadata. Once repaired, lvm2 pivots to the repaired metadata. :)

It doesn't seem to work for me unless it's healthy.

```
[root@host-115 ~]# lvcreate --thinpool POOL --zero y -L 1G --poolmetadatasize 4M VG
  Logical volume "POOL" created.

[root@host-115 ~]# lvs -a -o +devices
  LV              VG  Attr       LSize  Pool Origin Data%  Meta%  Devices
  POOL            VG  twi-a-tz--  1.00g             0.00   0.98   POOL_tdata(0)
  [POOL_tdata]    VG  Twi-ao----  1.00g                           /dev/sda1(1)
  [POOL_tmeta]    VG  ewi-ao----  4.00m                           /dev/sdh1(0)
  [lvol0_pmspare] VG  ewi-------  4.00m                           /dev/sda1(0)

[root@host-115 ~]# dd if=/dev/urandom of=/dev/mapper/VG-POOL_tmeta count=1 bs=1 skip=2
1+0 records in
1+0 records out
1 byte (1 B) copied, 0.00100156 s, 1.0 kB/s

[root@host-115 ~]# lvchange -an VG/POOL
  WARNING: Integrity check of metadata for pool VG/POOL failed.

[root@host-115 ~]# lvconvert --yes --repair VG/POOL
bad checksum in superblock
  Repair of thin metadata volume of thin pool VG/POOL failed (status:1). Manual repair required!
```

The "corruption" case you're focusing on is a bit of a bizarre/simple case. If the metadata's superblock is modified, I'm not sure the tools can handle fixing that (probably can't). The only way to recover from a bad superblock is to write multiple copies and recover from one of the backups... filesystems do stuff like that AFAIK. Anyway, IIRC Joe has tools for breaking metadata in a more controlled and fixable way.

That raises the question: do we want to write multiple copies of the superblock, so that a repair operation would be able to find one in the event that the original was corrupted?

Just a comment from the lvm2 POV. The only thing lvm2 does in this case is run `thin_repair -i -o`. So about the only thing it repairs for now is basically clearing the NEEDS_CHECK flag and fixing some 'minor' byte loss in the metadata. There is absolutely no checking/matching between the lvm2 metadata and the kernel pool metadata, so don't expect any sort of fixes from the lvm2 side for now. AFAIK thin_repair is not yet able to deal with the loss of root btree nodes.

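
For illustration, a hedged sketch of that underlying invocation; the input and output paths below are assumptions based on this report's LV names, not something copied from lvm2's output:

```
# Sketch only: device paths are assumed from this report's LV names.
# thin_repair reads the damaged metadata and writes a cleaned-up copy to
# the spare LV, which lvm2 then swaps in as the pool's new _tmeta.
thin_repair -i /dev/mapper/VG-POOL_tmeta -o /dev/mapper/VG-lvol0_pmspare
```
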

thin_dump doesn't work either.

```
[root@host-115 ~]# lvs -a -o +devices
  LV              Attr       LSize  Pool Origin Data%  Meta%  Devices
  POOL            twi---tz--  1.00g             0.00   1.56   POOL_tdata(0)
  [POOL_tdata]    Twi-ao----  1.00g                           /dev/sdc1(1)
  [POOL_tmeta]    ewi-ao----  4.00m                           /dev/sde1(0)
  [lvol0_pmspare] ewi-------  4.00m                           /dev/sdc1(0)
  newtmeta        -wi-a-----  8.00m                           /dev/sdc1(257)
  origin          Vwi-a-tz--  1.00g POOL         0.00
  other1          Vwi-a-tz--  1.00g POOL         0.00
  other2          Vwi-a-tz--  1.00g POOL         0.00
  other3          Vwi-a-tz--  1.00g POOL         0.00
  other4          Vwi-a-tz--  1.00g POOL         0.00
  other5          Vwi-a-tz--  1.00g POOL         0.00
  snap            Vwi-a-tz-k  1.00g POOL origin  0.00

[root@host-115 ~]# thin_dump /dev/mapper/snapper_thinp-POOL_tmeta > /tmp/snapper_thinp_dump.1376.28760
syscall 'open' failed: Device or resource busy

[root@host-115 ~]# thin_dump -f human_readable /dev/mapper/snapper_thinp-POOL_tmeta
syscall 'open' failed: Device or resource busy
```

(In reply to Jonathan Earl Brassow from comment #11)

> That raises the question, do we want to write multiple copies of the
> superblock that a repair operation would be able to find in the event that
> the original was corrupted?

I'm not sure. I split the metadata off into a separate device so people could provide resilience via raid, rather than me duplicating high-level nodes in the btree (the superblock is effectively the root of all the btrees). But this doesn't protect against user error that wipes the superblock, or indeed against a kernel bug that wipes the superblock. Any thoughts, Alasdair?

Having multiple copies of the super block would be a nice feature in the future. However, this bug is currently about how all thin_* commands (even just thin_dump) no longer work on the "live" pool metadata volume when they used to in past releases.

In comment #4 Zdenek mentions that for all thin_* operations you now need to deactivate the pool volume, create a tmp volume, swap the tmp volume in for the metadata volume (a command that has now actually "corrupted" your thin pool), activate the tmp volume (which still contains the valid pool metadata), do whatever dumps/checks you'd like, and then swap that volume back in as the pool metadata volume before reactivating.

So, is comment #4 the new correct procedure? If so, then we (A) need a better error than just "syscall 'open' failed:" when this is attempted on the live metadata device, and (B) need to document this carefully everywhere. Take the man page for thin_dump: "thin_dump - dump thin provisioning metadata from device or file to standard output". It doesn't say "dump thin provisioning metadata from a former metadata device that has been deactivated and then swapped onto a tmp volume not currently associated with the origin pool volume". The same goes for the thin_check man page.

If, however, comment #4 is not the new correct procedure, then when I run thin_dump on the live metadata device (as the thin_dump man page implies I can), I should be able to see the thin metadata and not a "syscall 'open' failed" warning.

Looks like this is another version of bugs 1038387/1023828. You must never ask userland to examine metadata that is potentially changing (i.e. for an active pool). This has always been the case; the recent patch just started enforcing it using a Linux-only extension to the O_EXCL flag on the open() call.

For thin_dump and thin_delta you can use the -m flag to examine a metadata snapshot in a live pool (the snapshot is unchanging, so this is safe). Very few people use this.

As for the procedure from comment #4 that you mention: it does indeed seem convoluted. As far as the thin tools are concerned, you can run them on the metadata device so long as the pool isn't active. So I'd expect the procedure to be:

i) deactivate the pool
ii) run thin_check
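
A hedged sketch of the metadata-snapshot route mentioned above; the dm device name and the -m usage below are assumptions (lvm2-created pools usually expose the thin-pool target as a -tpool device, and the exact -m syntax depends on the thin-provisioning-tools version):

```
# Sketch only: the -tpool device name and -m usage are assumptions.
# Reserve a read-only snapshot of the pool metadata on the live pool...
dmsetup message /dev/mapper/VG-POOL-tpool 0 reserve_metadata_snap

# ...dump from the metadata snapshot instead of the live metadata...
thin_dump -m /dev/mapper/VG-POOL_tmeta

# ...and release the snapshot when finished.
dmsetup message /dev/mapper/VG-POOL-tpool 0 release_metadata_snap
```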