Bug 1007074

Summary: the pool *_tmeta device can only be corrupted and then restored once
Product: Red Hat Enterprise Linux 6
Reporter: Corey Marthaler <cmarthal>
Component: lvm2
Assignee: Zdenek Kabelac <zkabelac>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 6.5
CC: agk, cmarthal, dwysocha, heinzm, jbrassow, msnitzer, prajnoha, prockai, thornber, tlavigne, zkabelac
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: lvm2-2.02.100-6.el6
Doc Type: Enhancement
Doc Text:
Support for repairing corrupted thin pool metadata is needed when the thin pool metadata (t-p-m) volume gets broken. In this case the user may try an automated repair via 'lvconvert --repair vg/pool', or a low-level manual repair: swap the t-p-m volume out of the thin-pool LV via 'lvconvert --poolmetadata swapLV vg/pool' and run a manual recovery using the thin_check, thin_dump, and thin_repair commands. Once the repaired t-p-m volume is ready, it can be swapped back in. This low-level repair should only be attempted by users who are fully aware of thin-pool internals.
Story Points: ---
Clone Of: 970798
Environment:
Last Closed: 2013-11-21 23:27:56 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 970798    
Bug Blocks: 1015096    

Description Corey Marthaler 2013-09-11 21:00:19 UTC
This exists in rhel6.5 as well.

+++ This bug was initially created as a clone of Bug #970798 +++

Description of problem:
I was trying to loop through the following: thin_dump, corrupt _tmeta, thin_check, thin_restore, and thin_check again, and found that those operations only work once or twice. After that, problems occur.

[root@qalvm-01 ~]# lvs -a -o +devices
  LV           VG            Attr      LSize  Pool Origin Data%  Devices
  POOL         snapper_thinp twi-a-tz-  5.00g               5.86 POOL_tdata(0)
  [POOL_tdata] snapper_thinp Twi-aot--  5.00g                    /dev/vdh1(0)
  [POOL_tmeta] snapper_thinp ewi-aot--  8.00m                    /dev/vdd1(0)
  origin       snapper_thinp Vwi-aotz-  1.00g POOL         29.05
  other1       snapper_thinp Vwi-a-tz-  1.00g POOL          0.00
  other2       snapper_thinp Vwi-a-tz-  1.00g POOL          0.00
  other3       snapper_thinp Vwi-a-tz-  1.00g POOL          0.00
  other4       snapper_thinp Vwi-a-tz-  1.00g POOL          0.00
  other5       snapper_thinp Vwi-a-tz-  1.00g POOL          0.00
  snap1        snapper_thinp Vwi-aotz-  1.00g POOL origin  14.66

[root@qalvm-01 ~]# df -h
Filesystem                        Size  Used Avail Use% Mounted on
/dev/mapper/snapper_thinp-origin 1014M  320M  695M  32% /mnt/origin
/dev/mapper/snapper_thinp-snap1  1014M  172M  843M  17% /mnt/snap1

[root@qalvm-01 ~]# thin_dump /dev/mapper/snapper_thinp-POOL_tmeta > /tmp/snapper_thinp_dump_1.61.9264
[root@qalvm-01 ~]# dd if=/dev/zero of=/dev/mapper/snapper_thinp-POOL_tmeta count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 2.6939e-05 s, 19.0 MB/s
[root@qalvm-01 ~]# thin_check /dev/mapper/snapper_thinp-POOL_tmeta
bad checksum in superblock
[root@qalvm-01 ~]# thin_restore -i /tmp/snapper_thinp_dump_1.61.9264 -o /dev/mapper/snapper_thinp-POOL_tmeta
superblock 0
flags 0
blocknr 0
transaction id 0
data mapping root 7
details root 6
data block size 128
metadata block size 4096
metadata nr blocks 2048
[root@qalvm-01 ~]# thin_check /dev/mapper/snapper_thinp-POOL_tmeta
[root@qalvm-01 ~]# sync


[root@qalvm-01 ~]# thin_dump /dev/mapper/snapper_thinp-POOL_tmeta > /tmp/snapper_thinp_dump_1.61.9265
[root@qalvm-01 ~]# dd if=/dev/zero of=/dev/mapper/snapper_thinp-POOL_tmeta count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 2.6454e-05 s, 19.4 MB/s
[root@qalvm-01 ~]# thin_check /dev/mapper/snapper_thinp-POOL_tmeta
bad checksum in superblock
[root@qalvm-01 ~]# thin_restore -i /tmp/snapper_thinp_dump_1.61.9264 -o /dev/mapper/snapper_thinp-POOL_tmeta
superblock 0
flags 0
blocknr 0
transaction id 0
data mapping root 7
details root 6
data block size 128
metadata block size 4096
metadata nr blocks 2048
[root@qalvm-01 ~]# thin_check /dev/mapper/snapper_thinp-POOL_tmeta
[root@qalvm-01 ~]# 
[root@qalvm-01 ~]# umount /mnt/*
[root@qalvm-01 ~]# vgchange -an snapper_thinp
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Failed to get state of mapped device
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Failed to get state of mapped device
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Failed to get state of mapped device
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Failed to get state of mapped device
[...]
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  0 logical volume(s) in volume group "snapper_thinp" now active


Jun  4 17:19:27 qalvm-01 kernel: [11658.571259] device-mapper: block manager: btree_node validator check failed for block 3
Jun  4 17:19:27 qalvm-01 kernel: [11658.572468] device-mapper: btree spine: node_check failed: csum 2848471990 != wanted 2848607790
Jun  4 17:19:27 qalvm-01 kernel: [11658.573669] device-mapper: block manager: btree_node validator check failed for block 3
Jun  4 17:19:27 qalvm-01 kernel: [11658.574887] device-mapper: btree spine: node_check failed: csum 2848471990 != wanted 2848607790
Jun  4 17:19:27 qalvm-01 kernel: [11658.576042] device-mapper: block manager: btree_node validator check failed for block 3
Jun  4 17:19:27 qalvm-01 kernel: [11658.577291] device-mapper: btree spine: node_check failed: csum 2848471990 != wanted 2848607790
Jun  4 17:19:27 qalvm-01 kernel: [11658.578528] device-mapper: block manager: btree_node validator check failed for block 3
Jun  4 17:19:27 qalvm-01 kernel: [11658.579898] device-mapper: btree spine: node_check failed: csum 2848471990 != wanted 2848607790


[root@qalvm-01 ~]# lvremove snapper_thinp
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Failed to get state of mapped device
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Failed to get state of mapped device
[...]
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Huge memory allocation (size 67108864) rejected - metadata corruption?
  Couldn't create ioctl argument.
  Unable to deactivate logical volume "snap1"


Version-Release number of selected component (if applicable):
3.8.0-0.40.el7.x86_64

lvm2-2.02.99-0.39.el7    BUILT: Wed May 29 08:12:36 CDT 2013
lvm2-libs-2.02.99-0.39.el7    BUILT: Wed May 29 08:12:36 CDT 2013
lvm2-cluster-2.02.99-0.39.el7    BUILT: Wed May 29 08:12:36 CDT 2013
device-mapper-1.02.78-0.39.el7    BUILT: Wed May 29 08:12:36 CDT 2013
device-mapper-libs-1.02.78-0.39.el7    BUILT: Wed May 29 08:12:36 CDT 2013
device-mapper-event-1.02.78-0.39.el7    BUILT: Wed May 29 08:12:36 CDT 2013
device-mapper-event-libs-1.02.78-0.39.el7    BUILT: Wed May 29 08:12:36 CDT 2013
cmirror-2.02.99-0.39.el7    BUILT: Wed May 29 08:12:36 CDT 2013

How reproducible:
Often

--- Additional comment from Zdenek Kabelac on 2013-06-04 19:02:05 EDT ---

I assume you've tried to use them on a live thin pool device - which is not how they work.

If you are attempting to 'repair' the metadata volume, you restore into a separate new LV (which should be at least the size of the original _tmeta device).
(2nd note: this way you can also offline-resize a _tmeta device that is running out of free space.)

So this is not a bug in the thin tools, but rather a lack of clear documentation for dealing with this.

In general, it is prohibited to write anything to the _tmeta device while the dm thin target is using it - doing so will destroy the data mappings stored on that device.

The proper recovery path goes like this:

Create a new empty LV the size of _tmeta.
Preferably keep the thin pool device unused (when thin_check detects an error, you are left with an active _tmeta but no active pool - this is the best-case scenario for recovery).
The other approach is to combine this with 'swapping', explained later.

Now, when recovering _tmeta, you can pipe 'thin_dump --repair | thin_restore' from one LV to another.

Once finished, you can swap this 'recovered' _tmeta LV into the thin pool via:

lvconvert --poolmetadata NewLV  --thinpool ExistingThinPoolLV  (see man lvconvert(8)).

(You can swap in any LV - activating a thin pool with incompatible metadata will fail or cause severe damage - so be careful here...)
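
To make that path concrete, here is a minimal sketch of the manual recovery sequence, using the snapper_thinp VG from this report. It assumes an lvm2 in which the metadata swap works (see comment 7); the 'tmp_meta' and 'fixed_meta' LV names and their sizes are hypothetical:

lvchange -an snapper_thinp                    # pool and all thin LVs must be inactive
lvcreate -n tmp_meta -L 8M snapper_thinp      # empty LV, at least the size of _tmeta (name is hypothetical)
lvconvert --yes --poolmetadata snapper_thinp/tmp_meta --thinpool snapper_thinp/POOL
                                              # swap: the damaged metadata now lives in tmp_meta
lvcreate -n fixed_meta -L 16M snapper_thinp   # restore target; may be larger, to grow metadata offline
lvchange -ay snapper_thinp/tmp_meta           # the damaged metadata is now readable as a normal LV
thin_dump --repair /dev/snapper_thinp/tmp_meta > /tmp/pool_meta.xml
thin_restore -i /tmp/pool_meta.xml -o /dev/snapper_thinp/fixed_meta
thin_check /dev/snapper_thinp/fixed_meta      # verify before swapping back
lvchange -an snapper_thinp/tmp_meta
lvchange -an snapper_thinp/fixed_meta
lvconvert --yes --poolmetadata snapper_thinp/fixed_meta --thinpool snapper_thinp/POOL
lvchange -ay snapper_thinp                    # reactivate; keep tmp_meta around for analysis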

Can you confirm this is the issue? I guess this bug can be closed NOTABUG.

--- Additional comment from Corey Marthaler on 2013-06-04 19:12:14 EDT ---

I'll rewrite the test case and see what happens. Thanks for the info!

--- Additional comment from Corey Marthaler on 2013-06-05 14:58:06 EDT ---

I can't seem to make this work. What am I doing wrong here, and why is lvm letting me do it wrong? :)

[root@qalvm-01 ~]# lvs -a -o +devices
  LV           VG            Attr      LSize  Pool Origin Data%  Devices
  POOL         snapper_thinp twi-a-tz-  5.00g               6.10 POOL_tdata(0)
  [POOL_tdata] snapper_thinp Twi-aot--  5.00g                    /dev/vdh1(0)
  [POOL_tmeta] snapper_thinp ewi-aot--  8.00m                    /dev/vdd1(0)
  origin       snapper_thinp Vwi-aotz-  1.00g POOL         30.27
  other1       snapper_thinp Vwi-a-tz-  1.00g POOL          0.00
  other2       snapper_thinp Vwi-a-tz-  1.00g POOL          0.00
  other3       snapper_thinp Vwi-a-tz-  1.00g POOL          0.00
  other4       snapper_thinp Vwi-a-tz-  1.00g POOL          0.00
  other5       snapper_thinp Vwi-a-tz-  1.00g POOL          0.00
  snap1        snapper_thinp Vwi-aotz-  1.00g POOL origin  15.52

[root@qalvm-01 ~]# df -h 
Filesystem                        Size  Used Avail Use% Mounted on
[...]
/dev/mapper/snapper_thinp-origin 1014M  332M  683M  33% /mnt/origin
/dev/mapper/snapper_thinp-snap1  1014M  181M  834M  18% /mnt/snap1

[root@qalvm-01 ~]# dmsetup ls
snapper_thinp-origin    (253:6)
snapper_thinp-POOL      (253:5)
snapper_thinp-snap1     (253:12)
snapper_thinp-other5    (253:11)
snapper_thinp-other4    (253:10)
snapper_thinp-other3    (253:9)
snapper_thinp-POOL-tpool        (253:4)
snapper_thinp-POOL_tdata        (253:3)
snapper_thinp-other2    (253:8)
snapper_thinp-POOL_tmeta        (253:2)
snapper_thinp-other1    (253:7)

[root@qalvm-01 ~]# thin_dump /dev/mapper/snapper_thinp-POOL_tmeta > /tmp/snapper_thinp_dump_1.408.19872

[root@qalvm-01 ~]# umount /mnt/*

[root@qalvm-01 ~]# vgchange -an snapper_thinp
  0 logical volume(s) in volume group "snapper_thinp" now active

[root@qalvm-01 ~]# lvs -a -o +devices
  LV           VG            Attr      LSize  Pool Origin Devices
  POOL         snapper_thinp twi---tz-  5.00g             POOL_tdata(0)
  [POOL_tdata] snapper_thinp Twi---t--  5.00g             /dev/vdh1(0)
  [POOL_tmeta] snapper_thinp ewi---t--  8.00m             /dev/vdd1(0)
  origin       snapper_thinp Vwi---tz-  1.00g POOL
  other1       snapper_thinp Vwi---tz-  1.00g POOL
  other2       snapper_thinp Vwi---tz-  1.00g POOL
  other3       snapper_thinp Vwi---tz-  1.00g POOL
  other4       snapper_thinp Vwi---tz-  1.00g POOL
  other5       snapper_thinp Vwi---tz-  1.00g POOL
  snap1        snapper_thinp Vwi---tz-  1.00g POOL origin

[root@qalvm-01 ~]# dmsetup ls

[root@qalvm-01 ~]# lvcreate -n meta -L 8M snapper_thinp
  Logical volume "meta" created

[root@qalvm-01 ~]# lvs -a -o +devices
  LV           VG            Attr      LSize  Pool Origin Devices
  POOL         snapper_thinp twi---tz-  5.00g             POOL_tdata(0)
  [POOL_tdata] snapper_thinp Twi---t--  5.00g             /dev/vdh1(0)
  [POOL_tmeta] snapper_thinp ewi---t--  8.00m             /dev/vdd1(0)
  meta         snapper_thinp -wi-a----  8.00m             /dev/vdh1(1280)
  origin       snapper_thinp Vwi---tz-  1.00g POOL
  other1       snapper_thinp Vwi---tz-  1.00g POOL
  other2       snapper_thinp Vwi---tz-  1.00g POOL
  other3       snapper_thinp Vwi---tz-  1.00g POOL
  other4       snapper_thinp Vwi---tz-  1.00g POOL
  other5       snapper_thinp Vwi---tz-  1.00g POOL
  snap1        snapper_thinp Vwi---tz-  1.00g POOL origin

[root@qalvm-01 ~]#  thin_restore -i /tmp/snapper_thinp_dump_1.408.19872 -o /dev/snapper_thinp/meta
superblock 0
flags 0
blocknr 0
transaction id 0
data mapping root 7
details root 6
data block size 128
metadata block size 4096
metadata nr blocks 2048

[root@qalvm-01 ~]# lvconvert --poolmetadata /dev/snapper_thinp/meta --thinpool /dev/snapper_thinp/POOL
Do you want to swap metadata of snapper_thinp/POOL pool with volume snapper_thinp/meta? [y/n]: y
  Thin pool transaction_id=0, while expected: 6.
  Unable to deactivate open snapper_thinp-POOL_tmeta (253:2)
  Unable to deactivate open snapper_thinp-POOL_tdata (253:3)
  Failed to deactivate snapper_thinp-POOL-tpool
  Failed to activate pool logical volume snapper_thinp/POOL.
  Device snapper_thinp-POOL_tdata (253:3) is used by another device.
  Failed to deactivate pool data logical volume.
  Device snapper_thinp-POOL_tmeta (253:2) is used by another device.
  Failed to deactivate pool metadata logical volume.

[root@qalvm-01 ~]# dmsetup ls
snapper_thinp-POOL      (253:5)
snapper_thinp-POOL-tpool        (253:4)
snapper_thinp-POOL_tdata        (253:3)
snapper_thinp-POOL_tmeta        (253:2)

[root@qalvm-01 ~]# lvchange -an snapper_thinp
[root@qalvm-01 ~]# lvchange -ay snapper_thinp
  Thin pool transaction_id=0, while expected: 6.
  Unable to deactivate open snapper_thinp-POOL_tmeta (253:2)
  Unable to deactivate open snapper_thinp-POOL_tdata (253:3)
  Failed to deactivate snapper_thinp-POOL-tpool
  device-mapper: reload ioctl on  failed: No data available
  device-mapper: reload ioctl on  failed: No data available
  device-mapper: reload ioctl on  failed: No data available
  device-mapper: reload ioctl on  failed: No data available
  device-mapper: reload ioctl on  failed: No data available
  device-mapper: reload ioctl on  failed: No data available
  device-mapper: reload ioctl on  failed: No data available

[root@qalvm-01 ~]#  lvconvert --poolmetadata /dev/snapper_thinp/meta --thinpool /dev/snapper_thinp/POOL
Do you want to swap metadata of snapper_thinp/POOL pool with volume snapper_thinp/meta? [y/n]: y
  Converted snapper_thinp/POOL to thin pool.

# the convert above supposedly worked, but the tmeta device is still on /dev/vdd1, and not on /dev/vdh1?
[root@qalvm-01 ~]# lvs -a -o +devices
  dm_report_object: report function failed for field data_percent
  dm_report_object: report function failed for field data_percent
  dm_report_object: report function failed for field data_percent
  dm_report_object: report function failed for field data_percent
  dm_report_object: report function failed for field data_percent
  dm_report_object: report function failed for field data_percent
  dm_report_object: report function failed for field data_percent
  One or more specified logical volume(s) not found.
  LV           VG            Attr      LSize  Pool Origin Data%  Devices
  POOL         snapper_thinp twi-a-tz-  5.00g               6.10 POOL_tdata(0)
  [POOL_tdata] snapper_thinp Twi-aot--  5.00g                    /dev/vdh1(0)
  [POOL_tmeta] snapper_thinp ewi-aot--  8.00m                    /dev/vdd1(0)
  meta         snapper_thinp -wi------  8.00m                    /dev/vdh1(1280)
  origin       snapper_thinp Vwi-d-tz-  1.00g POOL
  other1       snapper_thinp Vwi-d-tz-  1.00g POOL
  other2       snapper_thinp Vwi-d-tz-  1.00g POOL
  other3       snapper_thinp Vwi-d-tz-  1.00g POOL
  other4       snapper_thinp Vwi-d-tz-  1.00g POOL
  other5       snapper_thinp Vwi-d-tz-  1.00g POOL
  snap1        snapper_thinp Vwi-d-tz-  1.00g POOL origin

--- Additional comment from Zdenek Kabelac on 2013-06-05 16:16:57 EDT ---

(In reply to Corey Marthaler from comment #3)
> I can't seem to make this work. What am I doing wrong here, and why is lvm
> letting me do it wrong? :)

In some cases I don't know ;) but in general, the 'intelligent' recovery is yet to be written. So far we have helper tools that allow 'assisted' recovery, and this exposes some 'dangerous to modify' internals of the thin pool.

> 
> [root@qalvm-01 ~]# lvs -a -o +devices
>   LV           VG            Attr      LSize  Pool Origin Data%  Devices
>   POOL         snapper_thinp twi-a-tz-  5.00g               6.10
> POOL_tdata(0)
>   [POOL_tdata] snapper_thinp Twi-aot--  5.00g                    /dev/vdh1(0)
>   [POOL_tmeta] snapper_thinp ewi-aot--  8.00m                    /dev/vdd1(0)
>   origin       snapper_thinp Vwi-aotz-  1.00g POOL         30.27
>   other1       snapper_thinp Vwi-a-tz-  1.00g POOL          0.00
>   other2       snapper_thinp Vwi-a-tz-  1.00g POOL          0.00
>   other3       snapper_thinp Vwi-a-tz-  1.00g POOL          0.00
>   other4       snapper_thinp Vwi-a-tz-  1.00g POOL          0.00
>   other5       snapper_thinp Vwi-a-tz-  1.00g POOL          0.00
>   snap1        snapper_thinp Vwi-aotz-  1.00g POOL origin  15.52
> 
> [root@qalvm-01 ~]# thin_dump /dev/mapper/snapper_thinp-POOL_tmeta >
> /tmp/snapper_thinp_dump_1.408.19872

Running thin_dump on a 'live' thin pool isn't really a good idea,
especially if something is changing the content of the metadata.

The easiest way to access the metadata is to deactivate the thin pool,
swap _tmeta with some temporary volume, and then activate this swapped
LV as a normal LV and read the data from there.
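
A short sketch of that read-only inspection path, mirroring the repair sketch earlier in this bug (the 'inspect_meta' LV name is hypothetical; it assumes the fixed swap from comment 7):

lvchange -an snapper_thinp
lvcreate -n inspect_meta -L 8M snapper_thinp
lvconvert --yes --poolmetadata snapper_thinp/inspect_meta --thinpool snapper_thinp/POOL
lvchange -ay snapper_thinp/inspect_meta       # the pool's metadata, now a normal LV
thin_dump /dev/snapper_thinp/inspect_meta > /tmp/pool_meta.xml
lvchange -an snapper_thinp/inspect_meta       # when done, swap the unmodified metadata back
lvconvert --yes --poolmetadata snapper_thinp/inspect_meta --thinpool snapper_thinp/POOL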


> [root@qalvm-01 ~]# lvconvert --poolmetadata /dev/snapper_thinp/meta
> --thinpool /dev/snapper_thinp/POOL
> Do you want to swap metadata of snapper_thinp/POOL pool with volume
> snapper_thinp/meta? [y/n]: y
>   Thin pool transaction_id=0, while expected: 6.
>   Unable to deactivate open snapper_thinp-POOL_tmeta (253:2)
>   Unable to deactivate open snapper_thinp-POOL_tdata (253:3)
>   Failed to deactivate snapper_thinp-POOL-tpool
>   Failed to activate pool logical volume snapper_thinp/POOL.
>   Device snapper_thinp-POOL_tdata (253:3) is used by another device.
>   Failed to deactivate pool data logical volume.
>   Device snapper_thinp-POOL_tmeta (253:2) is used by another device.
>   Failed to deactivate pool metadata logical volume.

Hmmm - and this is interesting - the recovered data on your meta volume
is not in sync with the lvm metadata.
The kernel dm-thin data has transaction_id 0, but the lvm2 metadata expects 6.
(Intelligent recovery would have a number of ways to resolve this issue.)
You must select which side needs to be fixed to match:
either the kernel data or the lvm data.

But specifically in this case it looks like your thin_restore simply restored 'empty' data (judging by its output),
so I guess it's related to your thin_dump of the live thin pool device.

If you look into /tmp/snapper_thinp_dump_1.408.19872
you should see at least some devices listed, i.e.:

<superblock uuid="" time="1" transaction="3" data_block_size="256" nr_data_blocks="80">
  <device dev_id="1" mapped_blocks="0" transaction="0" creation_time="0" snap_time="1">
  </device>
  <device dev_id="2" mapped_blocks="0" transaction="1" creation_time="1" snap_time="1">
  </device>
  <device dev_id="3" mapped_blocks="0" transaction="2" creation_time="1" snap_time="1">
  </device>
</superblock>

But I think in your case it will be empty.


> [root@qalvm-01 ~]# lvchange -an snapper_thinp
> [root@qalvm-01 ~]# lvchange -ay snapper_thinp
>   Thin pool transaction_id=0, while expected: 6.
>   Unable to deactivate open snapper_thinp-POOL_tmeta (253:2)
>   Unable to deactivate open snapper_thinp-POOL_tdata (253:3)
>   Failed to deactivate snapper_thinp-POOL-tpool
>   device-mapper: reload ioctl on  failed: No data available
>   device-mapper: reload ioctl on  failed: No data available
>   device-mapper: reload ioctl on  failed: No data available
>   device-mapper: reload ioctl on  failed: No data available
>   device-mapper: reload ioctl on  failed: No data available
>   device-mapper: reload ioctl on  failed: No data available
>   device-mapper: reload ioctl on  failed: No data available

Yes - in this case the 'dmsetup' commands now need to step in.
There is a serious incompatibility between the 'kernel' and the 'lvm'
data, and it needs clever recovery.

> 
> [root@qalvm-01 ~]#  lvconvert --poolmetadata /dev/snapper_thinp/meta
> --thinpool /dev/snapper_thinp/POOL
> Do you want to swap metadata of snapper_thinp/POOL pool with volume
> snapper_thinp/meta? [y/n]: y
>   Converted snapper_thinp/POOL to thin pool.

Yes - you've put valid old metadata back.

> [root@qalvm-01 ~]# lvs -a -o +devices
>   dm_report_object: report function failed for field data_percent
>   dm_report_object: report function failed for field data_percent
>   dm_report_object: report function failed for field data_percent
>   dm_report_object: report function failed for field data_percent
>   dm_report_object: report function failed for field data_percent
>   dm_report_object: report function failed for field data_percent
>   dm_report_object: report function failed for field data_percent
>   One or more specified logical volume(s) not found.
>   LV           VG            Attr      LSize  Pool Origin Data%  Devices

I believe this has already been fixed upstream - the percent reporting was missing a check for whether the queried thin LV is active.

--- Additional comment from Corey Marthaler on 2013-06-11 18:04:30 EDT ---

> The easiest way to access metadata - is to deactivate thin pool,
> swap _tmeta with some temporary volume - and active this swapped
> LV as normal LV and read data from this place.

Swapping the meta device doesn't currently work (See bug 973419 - thin pool mda device swapping doesn't work).

Comment 1 Alasdair Kergon 2013-10-07 22:03:01 UTC
So, given all the included discussion, what is this 6.5 bug *actually* asking to be fixed in 6.5?

Comment 5 Alasdair Kergon 2013-10-14 16:28:32 UTC
So is this a case of thin_check and thin_restore proceeding blindly without checking whether or not the metadata is in use and warning first of the dangers?

By analogy, what does fsck do if run on a mounted filesystem?

Comment 6 Zdenek Kabelac 2013-10-15 15:26:32 UTC
So, aside from the 'usability' of the thin-repair utility itself (see bug #1019217), there is a user-oriented way to repair thin-pool metadata devices:

lvconvert --repair  vg/poolname 

Before using it, the thin pool being repaired must be inactive.

It uses the 'recovery' _pmspare device to create a new 'repaired' metadata device.
It then 'swaps' this repaired device (wherever it is located) back into the thin pool. The original 'bad' metadata appears in the VG as poolname_tmeta0 (or the next free digit) for further analysis in case of problems.
Another pool metadata spare (_pmspare) volume is then allocated again.

There are a couple of surrounding WARNING messages for the user, e.g.:

WARNING: If everything works, remove "@PREFIX@vg/pool_tmeta0".
WARNING: Use pvmove command to move "@PREFIX@vg/pool_tmeta" on the best fitting PV.
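
For completeness, a minimal sketch of this automated path (volume names from this report; the reporting output is omitted, and the _tmeta0 cleanup should happen only after verifying the pool):

lvchange -an snapper_thinp              # the pool to be repaired must be inactive
lvconvert --repair snapper_thinp/POOL
lvs -a snapper_thinp                    # old metadata kept as POOL_tmeta0; a new _pmspare is allocated
lvchange -ay snapper_thinp
lvremove snapper_thinp/POOL_tmeta0      # only once satisfied that everything works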


I'll add a couple more patches to enable the use of swapping for the manual repair operation.

For the lvconvert --repair functionality, no further patches should be needed.

Comment 7 Zdenek Kabelac 2013-10-16 11:07:21 UTC
Bug #970798 comment 6 shows the patches that resolve the swapping issue.

https://www.redhat.com/archives/lvm-devel/2013-October/msg00050.html
https://www.redhat.com/archives/lvm-devel/2013-October/msg00053.html

Use of swapping needs to be well documented.

The preferred way to repair is 'lvconvert --repair vgname/poolname'.

Comment 8 Zdenek Kabelac 2013-10-16 11:12:27 UTC
This fix is needed to avoid the possibility of swapping the metadata device of an active thin pool that is in use by active thin volumes.

The patch adds the missing check to ensure the thin-pool volume is not active during pool metadata device swapping.

Comment 9 Zdenek Kabelac 2013-10-16 12:17:11 UTC
*** Bug 1006062 has been marked as a duplicate of this bug. ***

Comment 11 Corey Marthaler 2013-10-16 23:16:39 UTC
With a bit more testing I'll probably feel confident enough to mark the basic swap and repair case verified as they now work for me. However, there are still issues, each probably requiring its own new bug.

ISSUE 1. The restore case continues to fail if the POOL is inactive (as it's supposed to be), but appears to work fine if the POOL remains active throughout the whole process.

DEACTIVATED AND CORRUPTED POOL RESTORE:
Restoring /dev/mapper/snapper_thinp-POOL_tmeta using dumped file
thin_restore -i /tmp/snapper_thinp_dump_1.5583.28170 -o /dev/mapper/snapper_thinp-POOL_tmeta
transaction_manager::new_block() couldn't allocate new block


ACTIVE AND CORRUPTED POOL RESTORE:
Restoring /dev/mapper/snapper_thinp-POOL_tmeta using dumped file
thin_restore -i /tmp/snapper_thinp_dump_15.9095.862 -o /dev/mapper/snapper_thinp-POOL_tmeta
Verifying that pool meta device is no longer corrupt
thin_check /dev/mapper/snapper_thinp-POOL_tmeta
examining superblock
examining devices tree
examining mapping tree
(* And this continues to work fine over and over each time I re-corrupt it *)


ISSUE 2. The removal of thin snapshot volumes continues to fail even after successful swap and repair cases.

[root@harding-02 ~]# lvconvert --yes --poolmetadata snapper_thinp/newtmeta --thinpool snapper_thinp/POOL
  Converted snapper_thinp/POOL to thin pool.
[root@harding-02 ~]# lvs -a -o +devices
  LV              VG            Attr       LSize  Pool Origin Devices
  POOL            snapper_thinp twi---t---  1.00g             POOL_tdata(0)
  [POOL_tdata]    snapper_thinp Twi-------  1.00g             /dev/sdb3(1)
  [POOL_tmeta]    snapper_thinp ewi-------  8.00m             /dev/sdb3(257)
  [lvol0_pmspare] snapper_thinp ewi-------  4.00m             /dev/sdb3(0)
  origin          snapper_thinp Vwi---t---  1.00g POOL
  other1          snapper_thinp Vwi---t---  1.00g POOL
  other2          snapper_thinp Vwi---t---  1.00g POOL
  other3          snapper_thinp Vwi---t---  1.00g POOL
  other4          snapper_thinp Vwi---t---  1.00g POOL
  other5          snapper_thinp Vwi---t---  1.00g POOL
  snap1           snapper_thinp Vwi---t--k  1.00g POOL origin
[root@harding-02 ~]# lvremove snapper_thinp/tmeta_snap1
  Thin pool transaction_id=0, while expected: 7.
  Unable to deactivate open snapper_thinp-POOL_tmeta (253:2)
  Unable to deactivate open snapper_thinp-POOL_tdata (253:3)
  Failed to deactivate snapper_thinp-POOL-tpool
  Failed to update thin pool POOL.



ISSUE 3. This is pretty minor: a repair attempt after a swap could use a better error message when it fails.

thin_check /dev/mapper/snapper_thinp-POOL_tmeta
examining superblock
  superblock is corrupt
    bad checksum in superblock

  WARNING: Integrity check of metadata for thin pool snapper_thinp/POOL failed.
Swap in new _tmeta device
lvconvert --yes --poolmetadata snapper_thinp/newtmeta --thinpool snapper_thinp/POOL
[root@harding-02 ~]# lvconvert --yes --poolmetadata snapper_thinp/newtmeta --thinpool snapper_thinp/POOL
  Converted snapper_thinp/POOL to thin pool.

[root@harding-02 ~]# lvconvert --repair  snapper_thinp/POOL
  Internal error: Missing allocatable pvs.


ISSUE 4. Again minor: when dumping the mda from the tmeta device, the pool volume has to be active. This may not even be an issue, but I thought I was told in one of these bugs to try to keep the pool inactive for all thin_* commands.

Comment 12 Corey Marthaler 2013-10-16 23:18:12 UTC
Cut comment from above comment #11:

With a bit more testing I'll probably feel confident enough to mark the basic swap and repair case verified as they now work for me. However, there are still issues, each probably requiring its own new bug.

Comment 13 Zdenek Kabelac 2013-10-17 07:57:43 UTC
The thin_repair certainly needs a new version of the device-mapper-persistent-data package (2.8?) (bug #1019217).
There are problems with the current version, 2.7, which doesn't correctly detect spacemap corruption.

Destruction/removal of a damaged pool needs more work and thinking; currently it's somewhat obscure.

If the thin pool is broken, there is probably no other way than to remove the metadata by hand via vgcfgbackup/vgcfgrestore, since the code insists on removing each individual thin volume from the pool before removing the whole pool. This can't succeed if the pool is damaged, and there is currently no way to force the process to move on. This will need a new BZ to handle this case in a more usable way.
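
A rough sketch of that by-hand path, offered only as an assumption of what it might look like (the edit step is entirely manual, and some versions may require --force when the VG contains thin volumes):

vgcfgbackup -f /tmp/snapper_thinp.vg snapper_thinp
# manually edit /tmp/snapper_thinp.vg: remove the pool and thin volume definitions
vgcfgrestore -f /tmp/snapper_thinp.vg snapper_thinp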

Comment 14 Peter Rajnoha 2013-10-17 08:40:29 UTC
(In reply to Zdenek Kabelac from comment #13)
> The thin_repair certainly needs a new version of
> device-mapper-persistent-data package (2.8 ?)  (bug #1019217)

Then, once the new dmpd build is in, we should file an lvm2 blocker bug requiring this new version from the lvm2 package!

Comment 15 Zdenek Kabelac 2013-10-17 13:39:39 UTC
*** Bug 1006065 has been marked as a duplicate of this bug. ***

Comment 17 Corey Marthaler 2013-10-24 19:08:28 UTC
With the caveats listed in comment #16, this bug can be marked verified, as the basic corrupt-and-swap case now works.



2.6.32-410.el6.x86_64
lvm2-2.02.100-7.el6    BUILT: Wed Oct 23 10:19:11 CDT 2013
lvm2-libs-2.02.100-7.el6    BUILT: Wed Oct 23 10:19:11 CDT 2013
lvm2-cluster-2.02.100-7.el6    BUILT: Wed Oct 23 10:19:11 CDT 2013
udev-147-2.50.el6    BUILT: Fri Oct 11 05:58:10 CDT 2013
device-mapper-1.02.79-7.el6    BUILT: Wed Oct 23 10:19:11 CDT 2013
device-mapper-libs-1.02.79-7.el6    BUILT: Wed Oct 23 10:19:11 CDT 2013
device-mapper-event-1.02.79-7.el6    BUILT: Wed Oct 23 10:19:11 CDT 2013
device-mapper-event-libs-1.02.79-7.el6    BUILT: Wed Oct 23 10:19:11 CDT 2013
cmirror-2.02.100-7.el6    BUILT: Wed Oct 23 10:19:11 CDT 2013



============================================================
Iteration 10 of 10 started at Thu Oct 24 05:57:05 CDT 2013
============================================================
SCENARIO - [swap_deactive_thin_pool_meta_device_w_linear]
Swap a _tmeta device with newly created linear LV while pool is deactivated
Making origin volume
lvcreate --thinpool POOL --zero n -L 1G snapper_thinp
Sanity checking pool device metadata
(thin_check /dev/mapper/snapper_thinp-POOL_tmeta)
examining superblock
examining devices tree
examining mapping tree
lvcreate --virtualsize 1G -T snapper_thinp/POOL -n origin
lvcreate -V 1G -T snapper_thinp/POOL -n other1
lvcreate -V 1G -T snapper_thinp/POOL -n other2
lvcreate --virtualsize 1G -T snapper_thinp/POOL -n other3
lvcreate --virtualsize 1G -T snapper_thinp/POOL -n other4
lvcreate --virtualsize 1G -T snapper_thinp/POOL -n other5
Making snapshot of origin volume
lvcreate -K -s /dev/snapper_thinp/origin -n snap
Create new device to swap in as the new _tmeta device
Dumping current pool metadata to /tmp/snapper_thinp_dump.8009.16462
thin_dump /dev/mapper/snapper_thinp-POOL_tmeta > /tmp/snapper_thinp_dump.8009.16462

Current tmeta device: /dev/sdc2
Restoring valid mda to new device
thin_restore -i /tmp/snapper_thinp_dump.8009.16462 -o /dev/snapper_thinp/newtmeta
Corrupting pool meta device (/dev/mapper/snapper_thinp-POOL_tmeta)
dd if=/dev/zero of=/dev/mapper/snapper_thinp-POOL_tmeta count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.00417424 s, 123 kB/s
Verifying that pool meta device is now corrupt
thin_check /dev/mapper/snapper_thinp-POOL_tmeta
examining superblock
  superblock is corrupt
    bad checksum in superblock

Swap in new _tmeta device
lvconvert --yes --poolmetadata snapper_thinp/newtmeta --thinpool snapper_thinp/POOL
New swapped tmeta device: /dev/sdb3
lvremove snapper_thinp/newtmeta
Removing volume snapper_thinp/snap
lvremove -f /dev/snapper_thinp/snap
Removing thin origin and other virtual thin volumes
Removing thinpool snapper_thinp/POOL

Comment 18 errata-xmlrpc 2013-11-21 23:27:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1704.html