Bug 1007074
Summary: the pool *_tmeta device can only be corrupted and then restored once

| Field | Value |
|---|---|
| Product | Red Hat Enterprise Linux 6 |
| Reporter | Corey Marthaler <cmarthal> |
| Component | lvm2 |
| Assignee | Zdenek Kabelac <zkabelac> |
| Status | CLOSED ERRATA |
| QA Contact | Cluster QE <mspqa-list> |
| Severity | urgent |
| Priority | urgent |
| Version | 6.5 |
| CC | agk, cmarthal, dwysocha, heinzm, jbrassow, msnitzer, prajnoha, prockai, thornber, tlavigne, zkabelac |
| Target Milestone | rc |
| Target Release | --- |
| Hardware | x86_64 |
| OS | Linux |
| Fixed In Version | lvm2-2.02.100-6.el6 |
| Doc Type | Enhancement |
| Story Points | --- |
| Clone Of | 970798 |
| Last Closed | 2013-11-21 23:27:56 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Bug Depends On | 970798 |
| Bug Blocks | 1015096 |

Doc Text:
Support for repairing corrupted thin-pool metadata is needed for cases where the pool metadata becomes damaged. The user may attempt an automated repair via 'lvconvert --repair vg/pool', or a low-level manual repair: swap the metadata volume out of the thin-pool LV via 'lvconvert --poolmetadata swapLV vg/pool' and run a manual recovery using the thin_check, thin_dump and thin_repair commands. Once the repaired metadata volume is ready, it can be swapped back in. This low-level repair should only be used by users fully familiar with thin-pool functionality.
Description
Corey Marthaler
2013-09-11 21:00:19 UTC
So, given all the included discussion, what is this 6.5 bug *actually* asking to be fixed in 6.5? Is this a case of thin_check and thin_restore proceeding blindly without checking whether or not the metadata is in use and warning first of the dangers? By analogy, what does fsck do if run on a mounted filesystem?

So, except for the 'usability' of the thin-repair utility itself (see bug #1019217), there is a user-oriented way to repair thin-pool metadata devices:

    lvconvert --repair vg/poolname

Before its use, the thin pool being repaired must be inactive. The command uses the 'recovery' _pmspare device to create a new, repaired metadata device, then swaps this repaired device (wherever it is located) back into the thin pool. The original bad metadata will appear in the VG as poolname_tmeta0 (or the next higher free digit) for further analysis in case of problems. Another pool metadata spare (_pmspare) volume is then allocated again. There are a couple of surrounding WARNING messages for the user, e.g.:

    WARNING: If everything works, remove "@PREFIX@vg/pool_tmeta0".
    WARNING: Use pvmove command to move "@PREFIX@vg/pool_tmeta" on the best fitting PV.

I'll add a couple more patches to enable the use of swapping for the manual repair operation. For the lvconvert --repair functionality no further patches should be needed.

Bug #970798 comment 6 shows the patches that resolve the swapping issue:

https://www.redhat.com/archives/lvm-devel/2013-October/msg00050.html
https://www.redhat.com/archives/lvm-devel/2013-October/msg00053.html

Usage of swapping needs to be well documented. The preferred way to repair is lvconvert --repair vgname/poolname.

This fix is needed to avoid the possibility of swapping out the metadata device of an active thin pool which is in use by active thin volumes. The patch adds the missing check to ensure the thin-pool volume is not active during pool metadata device swapping.

*** Bug 1006062 has been marked as a duplicate of this bug. ***

With a bit more testing I'll probably feel confident enough to mark the basic swap and repair case verified, as it now works for me. However, there are still issues, each probably requiring its own new bug.

ISSUE 1.
The restore case continues to fail if the POOL is inactive (as it's supposed to be), but appears to work fine if the POOL remains active throughout the whole process.

DEACTIVATED AND CORRUPTED POOL RESTORE:

    Restoring /dev/mapper/snapper_thinp-POOL_tmeta using dumped file
    thin_restore -i /tmp/snapper_thinp_dump_1.5583.28170 -o /dev/mapper/snapper_thinp-POOL_tmeta
    transaction_manager::new_block() couldn't allocate new block

ACTIVE AND CORRUPTED POOL RESTORE:

    Restoring /dev/mapper/snapper_thinp-POOL_tmeta using dumped file
    thin_restore -i /tmp/snapper_thinp_dump_15.9095.862 -o /dev/mapper/snapper_thinp-POOL_tmeta
    Verifying that pool meta device is no longer corrupt
    thin_check /dev/mapper/snapper_thinp-POOL_tmeta
    examining superblock
    examining devices tree
    examining mapping tree

(And this continues to work fine over and over, each time I re-corrupt it.)

ISSUE 2.

The removal of thin snapshot volumes continues to fail even after successful swap and repair cases.

    [root@harding-02 ~]# lvconvert --yes --poolmetadata snapper_thinp/newtmeta --thinpool snapper_thinp/POOL
      Converted snapper_thinp/POOL to thin pool.
    [root@harding-02 ~]# lvs -a -o +devices
      LV              VG            Attr       LSize Pool Origin Devices
      POOL            snapper_thinp twi---t--- 1.00g             POOL_tdata(0)
      [POOL_tdata]    snapper_thinp Twi------- 1.00g             /dev/sdb3(1)
      [POOL_tmeta]    snapper_thinp ewi------- 8.00m             /dev/sdb3(257)
      [lvol0_pmspare] snapper_thinp ewi------- 4.00m             /dev/sdb3(0)
      origin          snapper_thinp Vwi---t--- 1.00g POOL
      other1          snapper_thinp Vwi---t--- 1.00g POOL
      other2          snapper_thinp Vwi---t--- 1.00g POOL
      other3          snapper_thinp Vwi---t--- 1.00g POOL
      other4          snapper_thinp Vwi---t--- 1.00g POOL
      other5          snapper_thinp Vwi---t--- 1.00g POOL
      snap1           snapper_thinp Vwi---t--k 1.00g POOL origin
    [root@harding-02 ~]# lvremove snapper_thinp/tmeta_snap1
      Thin pool transaction_id=0, while expected: 7.
      Unable to deactivate open snapper_thinp-POOL_tmeta (253:2)
      Unable to deactivate open snapper_thinp-POOL_tdata (253:3)
      Failed to deactivate snapper_thinp-POOL-tpool
      Failed to update thin pool POOL.

ISSUE 3.

This is pretty minor: a repair attempt after a swap could use a better error message when it fails.

    thin_check /dev/mapper/snapper_thinp-POOL_tmeta
    examining superblock
    superblock is corrupt
    bad checksum in superblock
    WARNING: Integrity check of metadata for thin pool snapper_thinp/POOL failed.

Swap in new _tmeta device:

    [root@harding-02 ~]# lvconvert --yes --poolmetadata snapper_thinp/newtmeta --thinpool snapper_thinp/POOL
      Converted snapper_thinp/POOL to thin pool.
    [root@harding-02 ~]# lvconvert --repair snapper_thinp/POOL
      Internal error: Missing allocatable pvs.

ISSUE 4.

Again minor: when dumping mda from the tmeta device, the pool volume has to be active. This may not even be an issue, but I thought I was told in one of these bugs to attempt to have the pool inactive for all thin_* commands.

Cut comment from the above comment #11: "With a bit more testing I'll probably feel confident enough to mark the basic swap and repair case verified as they now work for me. However, there are still issues, each probably requiring its own new bug."

The thin_repair certainly needs a new version of the device-mapper-persistent-data package (2.8 ?) (bug #1019217). There are problems with the current version 2.7, which doesn't correctly detect spacemap corruptions.

Destruction/removal of a damaged pool needs more work and thinking; currently it's somewhat obscure. If the thin pool is broken, there is probably no other way than to remove the metadata by hand via vgcfgbackup/vgcfgrestore, since the code will insist on removing each individual thin volume from the pool before removing the whole pool. This can't succeed if the pool is damaged, and there is currently no way to force this process to move on.
This will need a new BZ to handle this case in some more usable way.

(In reply to Zdenek Kabelac from comment #13)
> The thin_repair certainly needs a new version of
> device-mapper-persistent-data package (2.8 ?) (bug #1019217)

Then once the new dmpd build is in, we should file an lvm2 blocker bug to require this new version from the lvm2 package!

*** Bug 1006065 has been marked as a duplicate of this bug. ***

With the caveats listed in comment #16, this bug can be marked verified, as the basic corrupt and swap case does now work.

    2.6.32-410.el6.x86_64
    lvm2-2.02.100-7.el6                      BUILT: Wed Oct 23 10:19:11 CDT 2013
    lvm2-libs-2.02.100-7.el6                 BUILT: Wed Oct 23 10:19:11 CDT 2013
    lvm2-cluster-2.02.100-7.el6              BUILT: Wed Oct 23 10:19:11 CDT 2013
    udev-147-2.50.el6                        BUILT: Fri Oct 11 05:58:10 CDT 2013
    device-mapper-1.02.79-7.el6              BUILT: Wed Oct 23 10:19:11 CDT 2013
    device-mapper-libs-1.02.79-7.el6         BUILT: Wed Oct 23 10:19:11 CDT 2013
    device-mapper-event-1.02.79-7.el6        BUILT: Wed Oct 23 10:19:11 CDT 2013
    device-mapper-event-libs-1.02.79-7.el6   BUILT: Wed Oct 23 10:19:11 CDT 2013
    cmirror-2.02.100-7.el6                   BUILT: Wed Oct 23 10:19:11 CDT 2013

    ============================================================
    Iteration 10 of 10 started at Thu Oct 24 05:57:05 CDT 2013
    ============================================================
    SCENARIO - [swap_deactive_thin_pool_meta_device_w_linear]
    Swap a _tmeta device with newly created linear LV while pool is deactivated

    Making origin volume
    lvcreate --thinpool POOL --zero n -L 1G snapper_thinp
    Sanity checking pool device metadata (thin_check /dev/mapper/snapper_thinp-POOL_tmeta)
    examining superblock
    examining devices tree
    examining mapping tree
    lvcreate --virtualsize 1G -T snapper_thinp/POOL -n origin
    lvcreate -V 1G -T snapper_thinp/POOL -n other1
    lvcreate -V 1G -T snapper_thinp/POOL -n other2
    lvcreate --virtualsize 1G -T snapper_thinp/POOL -n other3
    lvcreate --virtualsize 1G -T snapper_thinp/POOL -n other4
    lvcreate --virtualsize 1G -T snapper_thinp/POOL -n other5

    Making snapshot of origin volume
    lvcreate -K -s /dev/snapper_thinp/origin -n snap

    Create new device to swap in as the new _tmeta device
    Dumping current pool metadata to /tmp/snapper_thinp_dump.8009.16462
    thin_dump /dev/mapper/snapper_thinp-POOL_tmeta > /tmp/snapper_thinp_dump.8009.16462
    Current tmeta device: /dev/sdc2
    Restoring valid mda to new device
    thin_restore -i /tmp/snapper_thinp_dump.8009.16462 -o /dev/snapper_thinp/newtmeta

    Corrupting pool meta device (/dev/mapper/snapper_thinp-POOL_tmeta)
    dd if=/dev/zero of=/dev/mapper/snapper_thinp-POOL_tmeta count=1
    1+0 records in
    1+0 records out
    512 bytes (512 B) copied, 0.00417424 s, 123 kB/s
    Verifying that pool meta device is now corrupt
    thin_check /dev/mapper/snapper_thinp-POOL_tmeta
    examining superblock
    superblock is corrupt
    bad checksum in superblock

    Swap in new _tmeta device
    lvconvert --yes --poolmetadata snapper_thinp/newtmeta --thinpool snapper_thinp/POOL
    New swapped tmeta device: /dev/sdb3
    lvremove snapper_thinp/newtmeta
    Removing volume snapper_thinp/snap
    lvremove -f /dev/snapper_thinp/snap
    Removing thin origin and other virtual thin volumes
    Removing thinpool snapper_thinp/POOL

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1704.html
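For reference, the swap-based manual repair workflow exercised in the scenario above can be sketched end to end as follows. This is a hypothetical outline, not a transcript from this report: the VG name "vg", pool name "pool", LV names "swaplv"/"newtmeta", the 8M size, and the dump path are all illustrative, and the preferred automated path remains simply `lvconvert --repair vg/pool`.

```shell
# Hypothetical sketch of the manual swap-based thin-pool metadata
# repair discussed in this bug. Requires root, lvm2, and the
# device-mapper-persistent-data tools; the function is defined for
# illustration and is not executed here.
manual_tmeta_repair() {
    # The pool (and its thin volumes) must be inactive for the swap.
    lvchange -an vg/pool

    # Create a placeholder LV and swap it in; afterwards vg/swaplv
    # holds the old (damaged) metadata for off-line analysis.
    lvcreate -L 8M -n swaplv vg
    lvconvert --yes --poolmetadata vg/swaplv --thinpool vg/pool

    # Recover what thin_dump can still read and restore it into a
    # fresh LV.
    lvchange -ay vg/swaplv
    thin_dump --repair /dev/vg/swaplv > /tmp/pool_metadata.xml
    lvcreate -L 8M -n newtmeta vg
    thin_restore -i /tmp/pool_metadata.xml -o /dev/vg/newtmeta

    # Swap the repaired metadata back into the pool and reactivate.
    lvchange -an vg/swaplv
    lvconvert --yes --poolmetadata vg/newtmeta --thinpool vg/pool
    lvchange -ay vg/pool
}
```

After a successful repair, the LV holding the damaged metadata can be inspected with thin_check and then removed, mirroring the WARNING messages quoted earlier in this bug.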