Red Hat Bugzilla – Bug 1007074
the pool *_tmeta device can only be corrupted and then restored once
Last modified: 2013-11-21 18:27:56 EST
This exists in rhel6.5 as well.

+++ This bug was initially created as a clone of Bug #970798 +++

Description of problem:
I was trying to loop through the following: thin_dump, corrupt _tmeta, thin_check, thin_restore, and thin_check again; and found that those operations only work once or twice. After that, problems occur.

[root@qalvm-01 ~]# lvs -a -o +devices
  LV            VG            Attr      LSize  Pool  Origin  Data%  Devices
  POOL          snapper_thinp twi-a-tz-  5.00g                5.86  POOL_tdata(0)
  [POOL_tdata]  snapper_thinp Twi-aot--  5.00g                      /dev/vdh1(0)
  [POOL_tmeta]  snapper_thinp ewi-aot--  8.00m                      /dev/vdd1(0)
  origin        snapper_thinp Vwi-aotz-  1.00g POOL          29.05
  other1        snapper_thinp Vwi-a-tz-  1.00g POOL           0.00
  other2        snapper_thinp Vwi-a-tz-  1.00g POOL           0.00
  other3        snapper_thinp Vwi-a-tz-  1.00g POOL           0.00
  other4        snapper_thinp Vwi-a-tz-  1.00g POOL           0.00
  other5        snapper_thinp Vwi-a-tz-  1.00g POOL           0.00
  snap1         snapper_thinp Vwi-aotz-  1.00g POOL  origin  14.66

[root@qalvm-01 ~]# df -h
Filesystem                        Size  Used Avail Use% Mounted on
/dev/mapper/snapper_thinp-origin 1014M  320M  695M  32% /mnt/origin
/dev/mapper/snapper_thinp-snap1  1014M  172M  843M  17% /mnt/snap1

[root@qalvm-01 ~]# thin_dump /dev/mapper/snapper_thinp-POOL_tmeta > /tmp/snapper_thinp_dump_1.61.9264
[root@qalvm-01 ~]# dd if=/dev/zero of=/dev/mapper/snapper_thinp-POOL_tmeta count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 2.6939e-05 s, 19.0 MB/s
[root@qalvm-01 ~]# thin_check /dev/mapper/snapper_thinp-POOL_tmeta
bad checksum in superblock
[root@qalvm-01 ~]# thin_restore -i /tmp/snapper_thinp_dump_1.61.9264 -o /dev/mapper/snapper_thinp-POOL_tmeta
superblock 0
flags 0
blocknr 0
transaction id 0
data mapping root 7
details root 6
data block size 128
metadata block size 4096
metadata nr blocks 2048
[root@qalvm-01 ~]# thin_check /dev/mapper/snapper_thinp-POOL_tmeta
[root@qalvm-01 ~]# sync
[root@qalvm-01 ~]# thin_dump /dev/mapper/snapper_thinp-POOL_tmeta > /tmp/snapper_thinp_dump_1.61.9265
[root@qalvm-01 ~]# dd if=/dev/zero of=/dev/mapper/snapper_thinp-POOL_tmeta count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 2.6454e-05 s, 19.4 MB/s
[root@qalvm-01 ~]# thin_check /dev/mapper/snapper_thinp-POOL_tmeta
bad checksum in superblock
[root@qalvm-01 ~]# thin_restore -i /tmp/snapper_thinp_dump_1.61.9264 -o /dev/mapper/snapper_thinp-POOL_tmeta
superblock 0
flags 0
blocknr 0
transaction id 0
data mapping root 7
details root 6
data block size 128
metadata block size 4096
metadata nr blocks 2048
[root@qalvm-01 ~]# thin_check /dev/mapper/snapper_thinp-POOL_tmeta
[root@qalvm-01 ~]#
[root@qalvm-01 ~]# umount /mnt/*
[root@qalvm-01 ~]# vgchange -an snapper_thinp
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Failed to get state of mapped device
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Failed to get state of mapped device
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Failed to get state of mapped device
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Failed to get state of mapped device
  [...]
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  0 logical volume(s) in volume group "snapper_thinp" now active

Jun  4 17:19:27 qalvm-01 kernel: [11658.571259] device-mapper: block manager: btree_node validator check failed for block 3
Jun  4 17:19:27 qalvm-01 kernel: [11658.572468] device-mapper: btree spine: node_check failed: csum 2848471990 != wanted 2848607790
Jun  4 17:19:27 qalvm-01 kernel: [11658.573669] device-mapper: block manager: btree_node validator check failed for block 3
Jun  4 17:19:27 qalvm-01 kernel: [11658.574887] device-mapper: btree spine: node_check failed: csum 2848471990 != wanted 2848607790
Jun  4 17:19:27 qalvm-01 kernel: [11658.576042] device-mapper: block manager: btree_node validator check failed for block 3
Jun  4 17:19:27 qalvm-01 kernel: [11658.577291] device-mapper: btree spine: node_check failed: csum 2848471990 != wanted 2848607790
Jun  4 17:19:27 qalvm-01 kernel: [11658.578528] device-mapper: block manager: btree_node validator check failed for block 3
Jun  4 17:19:27 qalvm-01 kernel: [11658.579898] device-mapper: btree spine: node_check failed: csum 2848471990 != wanted 2848607790

[root@qalvm-01 ~]# lvremove snapper_thinp
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Failed to get state of mapped device
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Failed to get state of mapped device
  [...]
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Unable to deactivate logical volume "snap1"

Version-Release number of selected component (if applicable):
3.8.0-0.40.el7.x86_64
lvm2-2.02.99-0.39.el7                      BUILT: Wed May 29 08:12:36 CDT 2013
lvm2-libs-2.02.99-0.39.el7                 BUILT: Wed May 29 08:12:36 CDT 2013
lvm2-cluster-2.02.99-0.39.el7              BUILT: Wed May 29 08:12:36 CDT 2013
device-mapper-1.02.78-0.39.el7             BUILT: Wed May 29 08:12:36 CDT 2013
device-mapper-libs-1.02.78-0.39.el7        BUILT: Wed May 29 08:12:36 CDT 2013
device-mapper-event-1.02.78-0.39.el7       BUILT: Wed May 29 08:12:36 CDT 2013
device-mapper-event-libs-1.02.78-0.39.el7  BUILT: Wed May 29 08:12:36 CDT 2013
cmirror-2.02.99-0.39.el7                   BUILT: Wed May 29 08:12:36 CDT 2013

How reproducible:
Often

--- Additional comment from Zdenek Kabelac on 2013-06-04 19:02:05 EDT ---

I assume you've tried to use them on a live thin pool device - which is not how it works. If you are attempting to 'repair' the metadata volume, you do it into a separate new LV (which should be at least the size of the original _tmeta device). (2nd note: this way you could also offline-resize a _tmeta which is running out of free space.)

So this is not a bug in the thin tools, but rather a lack of clear documentation dealing with this. In general, it's prohibited to write anything to the _tmeta device while the dm-thin target is using that device - it will lead to the cruel death of the data locations on such a device.

The proper way for recovery goes along this path: create a new empty LV with the size of _tmeta. Preferably keep the thin pool device unused (when thin_check detects an error, you are left with an active _tmeta but an inactive pool - this is the best-case scenario for recovery). The other way around is to combine/play with 'swapping' - explained later.
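[Editor's note: a minimal shell sketch of this recovery path, assuming the pool is deactivated, its _tmeta LV is still activatable on its own, and the LV names from this report; it uses a temporary XML file rather than the pipe, and the final swap step is detailed just below.]

  # hedged sketch, not verbatim from this report: recover into a fresh LV,
  # never into the in-use _tmeta device itself
  lvcreate -n recovered_meta -L 8M snapper_thinp        # at least as big as POOL_tmeta
  thin_dump --repair /dev/mapper/snapper_thinp-POOL_tmeta > /tmp/pool_meta.xml
  thin_restore -i /tmp/pool_meta.xml -o /dev/snapper_thinp/recovered_meta
  # with the pool inactive, swap the recovered LV in (see the next comment):
  lvconvert --poolmetadata snapper_thinp/recovered_meta --thinpool snapper_thinp/POOL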
Now when you are recovering _tmeta, you could pipe thin_dump --repair | thin_restore from one LV to another. Once finished, you could swap this 'recovered' _tmeta LV into the thin pool via:

lvconvert --poolmetadata NewLV --thinpool ExistingThinPoolLV

(see man lvconvert(8)). (You can swap in any LV - activation of a thin pool with incompatible metadata will fail or make horrible damage - so be careful here...)

Can you confirm this is the issue - I guess this bug can be closed notabug.

--- Additional comment from Corey Marthaler on 2013-06-04 19:12:14 EDT ---

I'll rewrite the test case and see what happens. Thanks for the info!

--- Additional comment from Corey Marthaler on 2013-06-05 14:58:06 EDT ---

I can't seem to make this work. What am I doing wrong here, and why is lvm letting me do it wrong? :)

[root@qalvm-01 ~]# lvs -a -o +devices
  LV            VG            Attr      LSize  Pool  Origin  Data%  Devices
  POOL          snapper_thinp twi-a-tz-  5.00g                6.10  POOL_tdata(0)
  [POOL_tdata]  snapper_thinp Twi-aot--  5.00g                      /dev/vdh1(0)
  [POOL_tmeta]  snapper_thinp ewi-aot--  8.00m                      /dev/vdd1(0)
  origin        snapper_thinp Vwi-aotz-  1.00g POOL          30.27
  other1        snapper_thinp Vwi-a-tz-  1.00g POOL           0.00
  other2        snapper_thinp Vwi-a-tz-  1.00g POOL           0.00
  other3        snapper_thinp Vwi-a-tz-  1.00g POOL           0.00
  other4        snapper_thinp Vwi-a-tz-  1.00g POOL           0.00
  other5        snapper_thinp Vwi-a-tz-  1.00g POOL           0.00
  snap1         snapper_thinp Vwi-aotz-  1.00g POOL  origin  15.52

[root@qalvm-01 ~]# df -h
Filesystem                        Size  Used Avail Use% Mounted on
[...]
/dev/mapper/snapper_thinp-origin 1014M  332M  683M  33% /mnt/origin
/dev/mapper/snapper_thinp-snap1  1014M  181M  834M  18% /mnt/snap1

[root@qalvm-01 ~]# dmsetup ls
snapper_thinp-origin      (253:6)
snapper_thinp-POOL        (253:5)
snapper_thinp-snap1       (253:12)
snapper_thinp-other5      (253:11)
snapper_thinp-other4      (253:10)
snapper_thinp-other3      (253:9)
snapper_thinp-POOL-tpool  (253:4)
snapper_thinp-POOL_tdata  (253:3)
snapper_thinp-other2      (253:8)
snapper_thinp-POOL_tmeta  (253:2)
snapper_thinp-other1      (253:7)

[root@qalvm-01 ~]# thin_dump /dev/mapper/snapper_thinp-POOL_tmeta > /tmp/snapper_thinp_dump_1.408.19872
[root@qalvm-01 ~]# umount /mnt/*
[root@qalvm-01 ~]# vgchange -an snapper_thinp
  0 logical volume(s) in volume group "snapper_thinp" now active

[root@qalvm-01 ~]# lvs -a -o +devices
  LV            VG            Attr      LSize  Pool  Origin  Devices
  POOL          snapper_thinp twi---tz-  5.00g                POOL_tdata(0)
  [POOL_tdata]  snapper_thinp Twi---t--  5.00g                /dev/vdh1(0)
  [POOL_tmeta]  snapper_thinp ewi---t--  8.00m                /dev/vdd1(0)
  origin        snapper_thinp Vwi---tz-  1.00g POOL
  other1        snapper_thinp Vwi---tz-  1.00g POOL
  other2        snapper_thinp Vwi---tz-  1.00g POOL
  other3        snapper_thinp Vwi---tz-  1.00g POOL
  other4        snapper_thinp Vwi---tz-  1.00g POOL
  other5        snapper_thinp Vwi---tz-  1.00g POOL
  snap1         snapper_thinp Vwi---tz-  1.00g POOL  origin

[root@qalvm-01 ~]# dmsetup ls
[root@qalvm-01 ~]# lvcreate -n meta -L 8M snapper_thinp
  Logical volume "meta" created

[root@qalvm-01 ~]# lvs -a -o +devices
  LV            VG            Attr      LSize  Pool  Origin  Devices
  POOL          snapper_thinp twi---tz-  5.00g                POOL_tdata(0)
  [POOL_tdata]  snapper_thinp Twi---t--  5.00g                /dev/vdh1(0)
  [POOL_tmeta]  snapper_thinp ewi---t--  8.00m                /dev/vdd1(0)
  meta          snapper_thinp -wi-a----  8.00m                /dev/vdh1(1280)
  origin        snapper_thinp Vwi---tz-  1.00g POOL
  other1        snapper_thinp Vwi---tz-  1.00g POOL
  other2        snapper_thinp Vwi---tz-  1.00g POOL
  other3        snapper_thinp Vwi---tz-  1.00g POOL
  other4        snapper_thinp Vwi---tz-  1.00g POOL
  other5        snapper_thinp Vwi---tz-  1.00g POOL
  snap1         snapper_thinp Vwi---tz-  1.00g POOL  origin

[root@qalvm-01 ~]# thin_restore -i /tmp/snapper_thinp_dump_1.408.19872 -o /dev/snapper_thinp/meta
superblock 0
flags 0
blocknr 0
transaction id 0
data mapping root 7
details root 6
data block size 128
metadata block size 4096
metadata nr blocks 2048

[root@qalvm-01 ~]# lvconvert --poolmetadata /dev/snapper_thinp/meta --thinpool /dev/snapper_thinp/POOL
Do you want to swap metadata of snapper_thinp/POOL pool with volume snapper_thinp/meta? [y/n]: y
  Thin pool transaction_id=0, while expected: 6.
  Unable to deactivate open snapper_thinp-POOL_tmeta (253:2)
  Unable to deactivate open snapper_thinp-POOL_tdata (253:3)
  Failed to deactivate snapper_thinp-POOL-tpool
  Failed to activate pool logical volume snapper_thinp/POOL.
  Device snapper_thinp-POOL_tdata (253:3) is used by another device.
  Failed to deactivate pool data logical volume.
  Device snapper_thinp-POOL_tmeta (253:2) is used by another device.
  Failed to deactivate pool metadata logical volume.

[root@qalvm-01 ~]# dmsetup ls
snapper_thinp-POOL        (253:5)
snapper_thinp-POOL-tpool  (253:4)
snapper_thinp-POOL_tdata  (253:3)
snapper_thinp-POOL_tmeta  (253:2)

[root@qalvm-01 ~]# lvchange -an snapper_thinp
[root@qalvm-01 ~]# lvchange -ay snapper_thinp
  Thin pool transaction_id=0, while expected: 6.
  Unable to deactivate open snapper_thinp-POOL_tmeta (253:2)
  Unable to deactivate open snapper_thinp-POOL_tdata (253:3)
  Failed to deactivate snapper_thinp-POOL-tpool
  device-mapper: reload ioctl on failed: No data available
  device-mapper: reload ioctl on failed: No data available
  device-mapper: reload ioctl on failed: No data available
  device-mapper: reload ioctl on failed: No data available
  device-mapper: reload ioctl on failed: No data available
  device-mapper: reload ioctl on failed: No data available
  device-mapper: reload ioctl on failed: No data available

[root@qalvm-01 ~]# lvconvert --poolmetadata /dev/snapper_thinp/meta --thinpool /dev/snapper_thinp/POOL
Do you want to swap metadata of snapper_thinp/POOL pool with volume snapper_thinp/meta? [y/n]: y
  Converted snapper_thinp/POOL to thin pool.

# the convert above supposedly worked, but the tmeta device is still on /dev/vdd1, and not on /dev/vdh1?

[root@qalvm-01 ~]# lvs -a -o +devices
  dm_report_object: report function failed for field data_percent
  dm_report_object: report function failed for field data_percent
  dm_report_object: report function failed for field data_percent
  dm_report_object: report function failed for field data_percent
  dm_report_object: report function failed for field data_percent
  dm_report_object: report function failed for field data_percent
  dm_report_object: report function failed for field data_percent
  One or more specified logical volume(s) not found.
  LV            VG            Attr      LSize  Pool  Origin  Data%  Devices
  POOL          snapper_thinp twi-a-tz-  5.00g                6.10  POOL_tdata(0)
  [POOL_tdata]  snapper_thinp Twi-aot--  5.00g                      /dev/vdh1(0)
  [POOL_tmeta]  snapper_thinp ewi-aot--  8.00m                      /dev/vdd1(0)
  meta          snapper_thinp -wi------  8.00m                      /dev/vdh1(1280)
  origin        snapper_thinp Vwi-d-tz-  1.00g POOL
  other1        snapper_thinp Vwi-d-tz-  1.00g POOL
  other2        snapper_thinp Vwi-d-tz-  1.00g POOL
  other3        snapper_thinp Vwi-d-tz-  1.00g POOL
  other4        snapper_thinp Vwi-d-tz-  1.00g POOL
  other5        snapper_thinp Vwi-d-tz-  1.00g POOL
  snap1         snapper_thinp Vwi-d-tz-  1.00g POOL  origin

--- Additional comment from Zdenek Kabelac on 2013-06-05 16:16:57 EDT ---

(In reply to Corey Marthaler from comment #3)
> I can't seem to make this work. What am I doing wrong here, and why is lvm
> letting me do it wrong? :)

In some cases I don't know ;) - but in general, the 'intelligent' recovery is yet to be written. So far we rather have helping tools allowing 'assisted' recovery - and this exposes some 'dangerous to modify' internals of the thin pool.

> [root@qalvm-01 ~]# lvs -a -o +devices
> LV            VG            Attr      LSize  Pool  Origin  Data%  Devices
> POOL          snapper_thinp twi-a-tz-  5.00g                6.10  POOL_tdata(0)
> [POOL_tdata]  snapper_thinp Twi-aot--  5.00g                      /dev/vdh1(0)
> [POOL_tmeta]  snapper_thinp ewi-aot--  8.00m                      /dev/vdd1(0)
> origin        snapper_thinp Vwi-aotz-  1.00g POOL          30.27
> other1        snapper_thinp Vwi-a-tz-  1.00g POOL           0.00
> other2        snapper_thinp Vwi-a-tz-  1.00g POOL           0.00
> other3        snapper_thinp Vwi-a-tz-  1.00g POOL           0.00
> other4        snapper_thinp Vwi-a-tz-  1.00g POOL           0.00
> other5        snapper_thinp Vwi-a-tz-  1.00g POOL           0.00
> snap1         snapper_thinp Vwi-aotz-  1.00g POOL  origin  15.52
>
> [root@qalvm-01 ~]# thin_dump /dev/mapper/snapper_thinp-POOL_tmeta > /tmp/snapper_thinp_dump_1.408.19872

Running thin_dump on a 'live' thin pool isn't really a good idea, especially if there is something causing changes to the content of the metadata. The easiest way to access the metadata is to deactivate the thin pool, swap _tmeta with some temporary volume, then activate this swapped LV as a normal LV and read the data from there.

> [root@qalvm-01 ~]# lvconvert --poolmetadata /dev/snapper_thinp/meta
> --thinpool /dev/snapper_thinp/POOL
> Do you want to swap metadata of snapper_thinp/POOL pool with volume
> snapper_thinp/meta? [y/n]: y
>   Thin pool transaction_id=0, while expected: 6.
>   Unable to deactivate open snapper_thinp-POOL_tmeta (253:2)
>   Unable to deactivate open snapper_thinp-POOL_tdata (253:3)
>   Failed to deactivate snapper_thinp-POOL-tpool
>   Failed to activate pool logical volume snapper_thinp/POOL.
>   Device snapper_thinp-POOL_tdata (253:3) is used by another device.
>   Failed to deactivate pool data logical volume.
>   Device snapper_thinp-POOL_tmeta (253:2) is used by another device.
>   Failed to deactivate pool metadata logical volume.

Hmmm - and this is interesting: your recovered data on the meta volume are not in 'sync' with the lvm metadata. The kernel dm-thin data have transaction_id 0, but the lvm2 metadata expects 6. (Intelligent recovery here would have a number of ways to resolve this issue.) You must select which version needs to be fixed to match - either the kernel data or the lvm data.

But specifically in this case it looks like your thin_restore simply recovered 'empty' data (looking at its output), so I guess it's related to your thin_dump of the live thin pool device. If you look into /tmp/snapper_thinp_dump_1.408.19872 you should see at least some devices listed there, i.e.:

<superblock uuid="" time="1" transaction="3" data_block_size="256" nr_data_blocks="80">
  <device dev_id="1" mapped_blocks="0" transaction="0" creation_time="0" snap_time="1">
  </device>
  <device dev_id="2" mapped_blocks="0" transaction="1" creation_time="1" snap_time="1">
  </device>
  <device dev_id="3" mapped_blocks="0" transaction="2" creation_time="1" snap_time="1">
  </device>
</superblock>

But I think in your case it will be empty.

> [root@qalvm-01 ~]# lvchange -an snapper_thinp
> [root@qalvm-01 ~]# lvchange -ay snapper_thinp
>   Thin pool transaction_id=0, while expected: 6.
>   Unable to deactivate open snapper_thinp-POOL_tmeta (253:2)
>   Unable to deactivate open snapper_thinp-POOL_tdata (253:3)
>   Failed to deactivate snapper_thinp-POOL-tpool
>   device-mapper: reload ioctl on failed: No data available
>   device-mapper: reload ioctl on failed: No data available
>   device-mapper: reload ioctl on failed: No data available
>   device-mapper: reload ioctl on failed: No data available
>   device-mapper: reload ioctl on failed: No data available
>   device-mapper: reload ioctl on failed: No data available
>   device-mapper: reload ioctl on failed: No data available

Yes, in this case the 'dmsetup' commands now need to step in. There is a serious incompatibility error between the 'kernel' and 'lvm' data, and it needs clever recovery.

> [root@qalvm-01 ~]# lvconvert --poolmetadata /dev/snapper_thinp/meta
> --thinpool /dev/snapper_thinp/POOL
> Do you want to swap metadata of snapper_thinp/POOL pool with volume
> snapper_thinp/meta? [y/n]: y
>   Converted snapper_thinp/POOL to thin pool.

Yes - you've put the valid old metadata back.

> [root@qalvm-01 ~]# lvs -a -o +devices
>   dm_report_object: report function failed for field data_percent
>   dm_report_object: report function failed for field data_percent
>   dm_report_object: report function failed for field data_percent
>   dm_report_object: report function failed for field data_percent
>   dm_report_object: report function failed for field data_percent
>   dm_report_object: report function failed for field data_percent
>   dm_report_object: report function failed for field data_percent
>   One or more specified logical volume(s) not found.
>   LV            VG            Attr      LSize  Pool  Origin  Data%  Devices

I believe this has already been fixed upstream - the percent reporting missed a check for the presence of an active thin LV for the queried device.

--- Additional comment from Corey Marthaler on 2013-06-11 18:04:30 EDT ---

> The easiest way to access metadata - is to deactivate thin pool,
> swap _tmeta with some temporary volume - and active this swapped
> LV as normal LV and read data from this place.

Swapping the meta device doesn't currently work (see bug 973419 - thin pool mda device swapping doesn't work).
So, given all the included discussion, what is this 6.5 bug *actually* asking to be fixed in 6.5?
So is this a case of thin_check and thin_restore proceeding blindly without checking whether or not the metadata is in use and warning first of the dangers? By analogy, what does fsck do if run on a mounted filesystem?
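[Editor's note: for comparison, fsck refuses or warns before touching a mounted filesystem. A hypothetical guard of that kind is sketched below using the dmsetup open count; this is not behavior thin_check had at the time, just an illustration of the missing check.]

  # hypothetical wrapper, not part of thin_check: refuse to run while the
  # metadata device is held open, the way fsck refuses a mounted filesystem
  open=$(dmsetup info -c --noheadings -o open snapper_thinp-POOL_tmeta)
  if [ "$open" != "0" ]; then
      echo "POOL_tmeta is in use (open count $open); deactivate the pool first" >&2
      exit 1
  fi
  thin_check /dev/mapper/snapper_thinp-POOL_tmeta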
So, except for the 'usability' of the thin-repair utility itself (see bug #1019217), there is a user-oriented 'way' to repair thin-pool metadata devices:

lvconvert --repair vg/poolname

Before its use, the thin pool device being repaired must be inactive. The command will use the 'recovery' _pmspare device to create a new 'repaired' device, and will then 'swap' this repaired device (wherever it's located) back into the thin pool. The original 'bad' metadata will appear in the VG as poolname_tmeta0 (or the next free digit) for further analysis in case of problems, and another new pool metadata spare _pmspare volume is allocated.

There are a couple of surrounding WARNING messages for the user, i.e.:

WARNING: If everything works, remove "vg/pool_tmeta0".
WARNING: Use pvmove command to move "vg/pool_tmeta" on the best fitting PV.

I'll add a couple more patches to enable the use of swapping for manual repair operations. For the lvconvert --repair functionality there should be no more patches needed.
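[Editor's note: put together, the repair flow described above might look like the following minimal sketch, reusing the VG and pool names from this report; the umount targets are assumptions from the earlier comments.]

  umount /mnt/origin /mnt/snap1           # stop all users of the thin volumes
  vgchange -an snapper_thinp              # the pool must be inactive for --repair
  lvconvert --repair snapper_thinp/POOL   # repairs into _pmspare and swaps it in
  # the old metadata remains as POOL_tmeta0 for analysis; once satisfied:
  lvremove snapper_thinp/POOL_tmeta0
  vgchange -ay snapper_thinp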
Bug #970798 comment 6 shows the patches that resolve the swapping issue:

https://www.redhat.com/archives/lvm-devel/2013-October/msg00050.html
https://www.redhat.com/archives/lvm-devel/2013-October/msg00053.html

Usage of swapping needs to be well documented. The preferred way to repair is lvconvert --repair vgname/poolname.
This fix is needed to avoid the possibility of swapping the metadata device of an active thin pool which is in use by active thin volumes. The patch adds the missing check to ensure the thin-pool volume is not active during pool metadata device swapping.
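[Editor's note: on a build without the fix, a user could approximate that check by hand. A hedged sketch, checking the fifth lv_attr character ('a' when active, as seen in the lvs output throughout this report); this is an illustration, not the patch itself.]

  # manual stand-in for the new check: refuse the swap while the pool is active
  attr=$(lvs --noheadings -o lv_attr snapper_thinp/POOL | tr -d ' ')
  case "$attr" in
      ????a*) echo "snapper_thinp/POOL is active; deactivate it before swapping" >&2 ;;
      *)      lvconvert --yes --poolmetadata snapper_thinp/newtmeta --thinpool snapper_thinp/POOL ;;
  esac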
*** Bug 1006062 has been marked as a duplicate of this bug. ***
ISSUE 1. The restore case continues to fail if the POOL is inactive (as it's supposed to be), but appears to work fine if the POOL remains active throughout the whole process.

DEACTIVATED AND CORRUPTED POOL RESTORE:
Restoring /dev/mapper/snapper_thinp-POOL_tmeta using dumped file
thin_restore -i /tmp/snapper_thinp_dump_1.5583.28170 -o /dev/mapper/snapper_thinp-POOL_tmeta
transaction_manager::new_block() couldn't allocate new block

ACTIVE AND CORRUPTED POOL RESTORE:
Restoring /dev/mapper/snapper_thinp-POOL_tmeta using dumped file
thin_restore -i /tmp/snapper_thinp_dump_15.9095.862 -o /dev/mapper/snapper_thinp-POOL_tmeta
Verifying that pool meta device is no longer corrupt
thin_check /dev/mapper/snapper_thinp-POOL_tmeta
examining superblock
examining devices tree
examining mapping tree
(* And this continues to work fine over and over each time I re-corrupt it *)

ISSUE 2. The removal of thin snapshot volumes continues to fail even after successful swap and repair cases.

[root@harding-02 ~]# lvconvert --yes --poolmetadata snapper_thinp/newtmeta --thinpool snapper_thinp/POOL
  Converted snapper_thinp/POOL to thin pool.

[root@harding-02 ~]# lvs -a -o +devices
  LV               VG            Attr       LSize  Pool  Origin  Devices
  POOL             snapper_thinp twi---t---  1.00g                POOL_tdata(0)
  [POOL_tdata]     snapper_thinp Twi-------  1.00g                /dev/sdb3(1)
  [POOL_tmeta]     snapper_thinp ewi-------  8.00m                /dev/sdb3(257)
  [lvol0_pmspare]  snapper_thinp ewi-------  4.00m                /dev/sdb3(0)
  origin           snapper_thinp Vwi---t---  1.00g POOL
  other1           snapper_thinp Vwi---t---  1.00g POOL
  other2           snapper_thinp Vwi---t---  1.00g POOL
  other3           snapper_thinp Vwi---t---  1.00g POOL
  other4           snapper_thinp Vwi---t---  1.00g POOL
  other5           snapper_thinp Vwi---t---  1.00g POOL
  snap1            snapper_thinp Vwi---t--k  1.00g POOL  origin

[root@harding-02 ~]# lvremove snapper_thinp/tmeta_snap1
  Thin pool transaction_id=0, while expected: 7.
  Unable to deactivate open snapper_thinp-POOL_tmeta (253:2)
  Unable to deactivate open snapper_thinp-POOL_tdata (253:3)
  Failed to deactivate snapper_thinp-POOL-tpool
  Failed to update thin pool POOL.

ISSUE 3. This is pretty minor: a repair attempt after a swap could use a better error message when it fails.

thin_check /dev/mapper/snapper_thinp-POOL_tmeta
examining superblock
  superblock is corrupt
bad checksum in superblock
WARNING: Integrity check of metadata for thin pool snapper_thinp/POOL failed.
Swap in new _tmeta device
lvconvert --yes --poolmetadata snapper_thinp/newtmeta --thinpool snapper_thinp/POOL

[root@harding-02 ~]# lvconvert --yes --poolmetadata snapper_thinp/newtmeta --thinpool snapper_thinp/POOL
  Converted snapper_thinp/POOL to thin pool.
[root@harding-02 ~]# lvconvert --repair snapper_thinp/POOL
  Internal error: Missing allocatable pvs.

ISSUE 4. Again minor: when dumping mda from the tmeta device, the pool volume has to be active. This may not even be an issue, but I thought I was told in one of these bugs to attempt to have the pool inactive for all thin_* cmds.
Cut from the start of comment #11 above:

With a bit more testing I'll probably feel confident enough to mark the basic swap and repair case verified, as they now work for me. However, there are still issues, each probably requiring its own new bug.
The thin_repair certainly needs a new version of the device-mapper-persistent-data package (2.8?) (bug #1019217). There are problems with the current version 2.7, which doesn't correctly detect spacemap corruptions.

Destruction/removal of a damaged pool needs more work and thinking. Currently it's somewhat obscure. If the thin pool is broken, there is probably no other way than to remove the metadata by hand via vgcfgbackup/restore - since the code will insist on removing each individual thin volume from the pool before removing the whole pool. This can't succeed if the pool is damaged, and there is currently no way to force this process to move on. This will need a new BZ to handle this case in some more usable way.
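[Editor's note: a hedged sketch of that by-hand removal, assuming the VG is deactivated and its on-disk metadata is otherwise consistent; editing a backup file is dangerous, so keep an untouched copy.]

  vgcfgbackup -f /tmp/snapper_thinp.vg snapper_thinp
  cp /tmp/snapper_thinp.vg /tmp/snapper_thinp.vg.orig   # untouched copy
  # hand-edit /tmp/snapper_thinp.vg: delete the sections for the pool and
  # all of its thin volumes from the logical_volumes block
  vgcfgrestore -f /tmp/snapper_thinp.vg snapper_thinp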
(In reply to Zdenek Kabelac from comment #13)
> The thin_repair certainly needs a new version of
> device-mapper-persistent-data package (2.8 ?) (bug #1019217)

Then once the new dmpd build is in, we should file an lvm2 blocker bug requiring this new version from the lvm2 package!
*** Bug 1006065 has been marked as a duplicate of this bug. ***
With the caveats listed in comment #16, this bug can be marked verified, as the basic corrupt and swap case does now work.

2.6.32-410.el6.x86_64
lvm2-2.02.100-7.el6                      BUILT: Wed Oct 23 10:19:11 CDT 2013
lvm2-libs-2.02.100-7.el6                 BUILT: Wed Oct 23 10:19:11 CDT 2013
lvm2-cluster-2.02.100-7.el6              BUILT: Wed Oct 23 10:19:11 CDT 2013
udev-147-2.50.el6                        BUILT: Fri Oct 11 05:58:10 CDT 2013
device-mapper-1.02.79-7.el6              BUILT: Wed Oct 23 10:19:11 CDT 2013
device-mapper-libs-1.02.79-7.el6         BUILT: Wed Oct 23 10:19:11 CDT 2013
device-mapper-event-1.02.79-7.el6        BUILT: Wed Oct 23 10:19:11 CDT 2013
device-mapper-event-libs-1.02.79-7.el6   BUILT: Wed Oct 23 10:19:11 CDT 2013
cmirror-2.02.100-7.el6                   BUILT: Wed Oct 23 10:19:11 CDT 2013

============================================================
Iteration 10 of 10 started at Thu Oct 24 05:57:05 CDT 2013
============================================================
SCENARIO - [swap_deactive_thin_pool_meta_device_w_linear]
Swap a _tmeta device with newly created linear LV while pool is deactivated

Making origin volume
lvcreate --thinpool POOL --zero n -L 1G snapper_thinp
Sanity checking pool device metadata (thin_check /dev/mapper/snapper_thinp-POOL_tmeta)
examining superblock
examining devices tree
examining mapping tree
lvcreate --virtualsize 1G -T snapper_thinp/POOL -n origin
lvcreate -V 1G -T snapper_thinp/POOL -n other1
lvcreate -V 1G -T snapper_thinp/POOL -n other2
lvcreate --virtualsize 1G -T snapper_thinp/POOL -n other3
lvcreate --virtualsize 1G -T snapper_thinp/POOL -n other4
lvcreate --virtualsize 1G -T snapper_thinp/POOL -n other5

Making snapshot of origin volume
lvcreate -K -s /dev/snapper_thinp/origin -n snap

Create new device to swap in as the new _tmeta device
Dumping current pool metadata to /tmp/snapper_thinp_dump.8009.16462
thin_dump /dev/mapper/snapper_thinp-POOL_tmeta > /tmp/snapper_thinp_dump.8009.16462
Current tmeta device: /dev/sdc2
Restoring valid mda to new device
thin_restore -i /tmp/snapper_thinp_dump.8009.16462 -o /dev/snapper_thinp/newtmeta

Corrupting pool meta device (/dev/mapper/snapper_thinp-POOL_tmeta)
dd if=/dev/zero of=/dev/mapper/snapper_thinp-POOL_tmeta count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.00417424 s, 123 kB/s

Verifying that pool meta device is now corrupt
thin_check /dev/mapper/snapper_thinp-POOL_tmeta
examining superblock
  superblock is corrupt
bad checksum in superblock

Swap in new _tmeta device
lvconvert --yes --poolmetadata snapper_thinp/newtmeta --thinpool snapper_thinp/POOL
New swapped tmeta device: /dev/sdb3
lvremove snapper_thinp/newtmeta

Removing volume snapper_thinp/snap
lvremove -f /dev/snapper_thinp/snap
Removing thin origin and other virtual thin volumes
Removing thinpool snapper_thinp/POOL
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1704.html