Bug 1302454 - thin pool meta device can only be corrupted and repaired once [NEEDINFO]
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: device-mapper-persistent-data
Version: 6.8
Hardware: x86_64 Linux
Priority: unspecified   Severity: medium
Target Milestone: rc
Assigned To: Peter Rajnoha
QA Contact: Bruno Goncalves
Reported: 2016-01-27 18:19 EST by Corey Marthaler
Modified: 2016-05-10 21:12 EDT

Fixed In Version: device-mapper-persistent-data-0.6.2-0.1.rc5.el6
Doc Type: Bug Fix
Doc Text:
Intra-release bug, no documentation needed.
Last Closed: 2016-05-10 21:12:40 EDT
Type: Bug
cmarthal: needinfo? (thornber)


Description Corey Marthaler 2016-01-27 18:19:01 EST
Description of problem:
In RHEL 7.2, you can corrupt and then repair the thin pool metadata device over and over. In RHEL 6.8, however, it appears you can only do it once.

SCENARIO - [swap_inactive_thin_pool_meta_device_using_lvconvert]
Swap _tmeta devices with newly created volumes while pool is inactive multiple times
Making pool volume
lvcreate  --thinpool POOL --profile thin-performance --zero n -L 1G --poolmetadatasize 4M snapper_thinp

Making origin volume
lvcreate  --virtualsize 1G -T snapper_thinp/POOL -n origin
lvcreate  --virtualsize 1G -T snapper_thinp/POOL -n other1
  WARNING: Sum of all thin volume sizes (2.00 GiB) exceeds the size of thin pool snapper_thinp/POOL (1.00 GiB)!
lvcreate  -V 1G -T snapper_thinp/POOL -n other2
  WARNING: Sum of all thin volume sizes (3.00 GiB) exceeds the size of thin pool snapper_thinp/POOL (1.00 GiB)!
lvcreate  --virtualsize 1G -T snapper_thinp/POOL -n other3
  WARNING: Sum of all thin volume sizes (4.00 GiB) exceeds the size of thin pool snapper_thinp/POOL (1.00 GiB)!
lvcreate  --virtualsize 1G -T snapper_thinp/POOL -n other4
  WARNING: Sum of all thin volume sizes (5.00 GiB) exceeds the size of thin pool snapper_thinp/POOL (1.00 GiB)!
lvcreate  --virtualsize 1G -T snapper_thinp/POOL -n other5
  WARNING: Sum of all thin volume sizes (6.00 GiB) exceeds the size of thin pool snapper_thinp/POOL (1.00 GiB)!
Making snapshot of origin volume
lvcreate  -k n -s /dev/snapper_thinp/origin -n snap


*** Swap corrupt pool metadata iteration 1 ***
Current tmeta device: /dev/sdf1
Corrupting pool meta device (/dev/mapper/snapper_thinp-POOL_tmeta)
dd if=/dev/urandom of=/dev/mapper/snapper_thinp-POOL_tmeta count=512 seek=4096 bs=1
512+0 records in
512+0 records out
512 bytes (512 B) copied, 0.00347229 s, 147 kB/s

Sanity checking pool device (POOL) metadata
examining superblock
examining devices tree
examining mapping tree
checking space map counts
bad checksum in space map bitmap
meta data appears corrupt
  Check of pool snapper_thinp/POOL failed (status:1). Manual repair required!
couldn't reactivate all volumes associated with pool device

Swap in new _tmeta device using lvconvert --repair
lvconvert --yes --repair snapper_thinp/POOL /dev/sdb1
  WARNING: Sum of all thin volume sizes (7.00 GiB) exceeds the size of thin pools (1.00 GiB)!
  WARNING: If everything works, remove "snapper_thinp/POOL_meta0".
  WARNING: Use pvmove command to move "snapper_thinp/POOL_tmeta" on the best fitting PV.

New swapped tmeta device: /dev/sda1
Sanity checking pool device (POOL) metadata
  WARNING: Sum of all thin volume sizes (7.00 GiB) exceeds the size of thin pools (1.00 GiB)!
thin_check /dev/mapper/snapper_thinp-meta_swap
examining superblock
examining devices tree
examining mapping tree
checking space map counts

Removing snap volume snapper_thinp/POOL_meta0
lvremove -f /dev/snapper_thinp/POOL_meta0


*** Swap corrupt pool metadata iteration 2 ***
Current tmeta device: /dev/sda1
Corrupting pool meta device (/dev/mapper/snapper_thinp-POOL_tmeta)
dd if=/dev/urandom of=/dev/mapper/snapper_thinp-POOL_tmeta count=512 seek=4096 bs=1
512+0 records in
512+0 records out
512 bytes (512 B) copied, 0.00321505 s, 159 kB/s

Sanity checking pool device (POOL) metadata
  WARNING: Sum of all thin volume sizes (7.00 GiB) exceeds the size of thin pools (1.00 GiB)!
thin_check /dev/mapper/snapper_thinp-meta_swap
examining superblock
examining devices tree
examining mapping tree
checking space map counts
bad checksum in metadata index block
meta data appears corrupt
  Check of pool snapper_thinp/POOL failed (status:1). Manual repair required!
couldn't reactivate all volumes associated with pool device

Swap in new _tmeta device using lvconvert --repair
lvconvert --yes --repair snapper_thinp/POOL /dev/sdd1
bad checksum in metadata index block
  Repair of thin metadata volume of thin pool snapper_thinp/POOL failed (status:1). Manual repair required!
lvconvert repair failed


[root@host-116 ~]# lvs -a -o +devices
  LV              VG            Attr       LSize   Pool Origin Data%  Meta% Devices
  POOL            snapper_thinp twi---t---   1.00g                          POOL_tdata(0)
  [POOL_tdata]    snapper_thinp Twi-------   1.00g                          /dev/sda1(1)
  [POOL_tmeta]    snapper_thinp ewi-------   4.00m                          /dev/sda1(0)
  [lvol1_pmspare] snapper_thinp ewi-------   4.00m                          /dev/sdb1(0)
  origin          snapper_thinp Vwi---t---   1.00g POOL
  other1          snapper_thinp Vwi---t---   1.00g POOL
  other2          snapper_thinp Vwi---t---   1.00g POOL
  other3          snapper_thinp Vwi---t---   1.00g POOL
  other4          snapper_thinp Vwi---t---   1.00g POOL
  other5          snapper_thinp Vwi---t---   1.00g POOL
  snap            snapper_thinp Vwi---t---   1.00g POOL origin

[root@host-116 ~]# lvconvert --yes --repair snapper_thinp/POOL /dev/sdd1
bad checksum in metadata index block
  Repair of thin metadata volume of thin pool snapper_thinp/POOL failed (status:1). Manual repair required!
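
For context, the "Manual repair required" message refers to doing the thin_repair and metadata swap by hand instead of via 'lvconvert --repair'. A minimal sketch of that path follows, with the pool kept inactive; the LV names meta_old and meta_new are placeholders, not taken from this report, and with the broken thin_repair in this device-mapper-persistent-data build the thin_repair step below would fail with the same checksum error:

lvcreate -L 4M -n meta_old snapper_thinp                  # placeholder LV to swap with the damaged _tmeta
lvconvert --yes --thinpool snapper_thinp/POOL --poolmetadata snapper_thinp/meta_old
                                                          # after the swap, meta_old holds the damaged metadata
lvcreate -L 4M -n meta_new snapper_thinp /dev/sdd1        # destination for the repaired copy, on a healthy PV
lvchange -ay snapper_thinp/meta_old snapper_thinp/meta_new
thin_repair -i /dev/mapper/snapper_thinp-meta_old -o /dev/mapper/snapper_thinp-meta_new
lvchange -an snapper_thinp/meta_old snapper_thinp/meta_new
lvconvert --yes --thinpool snapper_thinp/POOL --poolmetadata snapper_thinp/meta_new
                                                          # swap the repaired copy in as the pool's _tmeta
lvremove -f snapper_thinp/meta_old snapper_thinp/meta_new # remove the damaged copy and old placeholder once verified

This is essentially what 'lvconvert --repair' automates (using the pmspare LV as the destination and keeping the old metadata as POOL_meta0), so a manual run only helps once thin_repair itself handles the second corruption.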


Version-Release number of selected component (if applicable):
2.6.32-604.el6.x86_64

lvm2-2.02.140-3.el6    BUILT: Thu Jan 21 05:40:10 CST 2016
lvm2-libs-2.02.140-3.el6    BUILT: Thu Jan 21 05:40:10 CST 2016
lvm2-cluster-2.02.140-3.el6    BUILT: Thu Jan 21 05:40:10 CST 2016
udev-147-2.66.el6    BUILT: Mon Jan 18 02:42:20 CST 2016
device-mapper-1.02.114-3.el6    BUILT: Thu Jan 21 05:40:10 CST 2016
device-mapper-libs-1.02.114-3.el6    BUILT: Thu Jan 21 05:40:10 CST 2016
device-mapper-event-1.02.114-3.el6    BUILT: Thu Jan 21 05:40:10 CST 2016
device-mapper-event-libs-1.02.114-3.el6    BUILT: Thu Jan 21 05:40:10 CST 2016
device-mapper-persistent-data-0.6.0-2.el6    BUILT: Thu Jan 21 02:40:25 CST 2016
cmirror-2.02.140-3.el6    BUILT: Thu Jan 21 05:40:10 CST 2016


How reproducible:
Every time
Comment 1 Joe Thornber 2016-02-16 10:28:25 EST
This was a regression in the v0.6.x series.

Test case in dmtest:

https://github.com/jthornber/device-mapper-test-suite/commit/a4cee3b2844441cbd6f9b3f331242105f3b94299


Fix to thinp-tools:

https://github.com/jthornber/thin-provisioning-tools/commit/2815aeace9df510814c2e5d78b3a2ef398440501

The fix is in v0.6.2-rc2 onwards.
Comment 3 Corey Marthaler 2016-02-24 17:58:17 EST
The corruption no longer appears to be detected by thin_check on the second iteration. Let me know if I'm doing something wrong here, but the test still fails for me, just not in the same way: the corruption isn't detected, and (I assume as a result of that) the repair to a new device doesn't happen.


# Second iteration of this test...

*** Swap corrupt pool metadata iteration 2 ***
Current tmeta device: /dev/sdb1
Corrupting pool meta device (/dev/mapper/snapper_thinp-POOL_tmeta)
dd if=/dev/urandom of=/dev/mapper/snapper_thinp-POOL_tmeta count=512 seek=4096 bs=1
512+0 records in
512+0 records out
512 bytes (512 B) copied, 0.00182352 s, 281 kB/s

Sanity checking pool device (POOL) metadata
lvcreate -L 4M -n meta_swap snapper_thinp
  WARNING: Sum of all thin volume sizes (7.25 GiB) exceeds the size of thin pools (1.00 GiB)!
lvconvert --yes --thinpool snapper_thinp/POOL --poolmetadata snapper_thinp/meta_swap
lvchange -ay snapper_thinp/meta_swap

### NOT FOUND TO BE CORRUPT
thin_check /dev/mapper/snapper_thinp-meta_swap
examining superblock
examining devices tree
examining mapping tree
checking space map counts
meta data NOT corrupt
lvchange -an snapper_thinp/meta_swap
lvconvert --yes --thinpool snapper_thinp/POOL --poolmetadata snapper_thinp/meta_swap
lvremove snapper_thinp/meta_swap
lvchange -ay --yes --select 'lv_name=POOL || pool_lv=POOL'

Swap in new _tmeta device using lvconvert --repair
lvconvert --yes --repair snapper_thinp/POOL /dev/sda1
  Only inactive pool can be repaired.

[root@host-118 ~]# vgchange -an snapper_thinp
  0 logical volume(s) in volume group "snapper_thinp" now active

[root@host-118 ~]# lvs -a -o +devices
  LV              VG            Attr       LSize   Pool Origin Devices
  POOL            snapper_thinp twi---tz--   1.00g             POOL_tdata(0)
  [POOL_tdata]    snapper_thinp Twi-------   1.00g             /dev/sdb1(1)
  [POOL_tmeta]    snapper_thinp ewi-------   4.00m             /dev/sdb1(0)
  [lvol0_pmspare] snapper_thinp ewi-------   4.00m             /dev/sdb1(258)
  origin          snapper_thinp Vwi---tz--   1.00g POOL
  other1          snapper_thinp Vwi---tz--   1.00g POOL
  other2          snapper_thinp Vwi---tz--   1.00g POOL
  other3          snapper_thinp Vwi---tz--   1.00g POOL
  other4          snapper_thinp Vwi---tz--   1.00g POOL
  other5          snapper_thinp Vwi---tz--   1.00g POOL
  snap            snapper_thinp Vwi---tz--   1.25g POOL origin

[root@host-118 ~]# lvconvert --yes --repair snapper_thinp/POOL /dev/sda1
  WARNING: recovery of pools without pool metadata spare LV is not automated.
  WARNING: If everything works, remove "snapper_thinp/POOL_meta0".
  WARNING: Use pvmove command to move "snapper_thinp/POOL_tmeta" on the best fitting PV.


### Still on the same device sdb1
[root@host-118 ~]# lvs -a -o +devices
  LV           VG            Attr       LSize   Pool Origin Devices
  POOL         snapper_thinp twi---tz--   1.00g             POOL_tdata(0)
  POOL_meta0   snapper_thinp -wi-------   4.00m             /dev/sdb1(0)
  [POOL_tdata] snapper_thinp Twi-------   1.00g             /dev/sdb1(1)
  [POOL_tmeta] snapper_thinp ewi-------   4.00m             /dev/sdb1(258)
  origin       snapper_thinp Vwi---tz--   1.00g POOL
  other1       snapper_thinp Vwi---tz--   1.00g POOL
  other2       snapper_thinp Vwi---tz--   1.00g POOL
  other3       snapper_thinp Vwi---tz--   1.00g POOL
  other4       snapper_thinp Vwi---tz--   1.00g POOL
  other5       snapper_thinp Vwi---tz--   1.00g POOL
  snap         snapper_thinp Vwi---tz--   1.25g POOL origin
Comment 4 Zdenek Kabelac 2016-02-25 05:19:17 EST
I think the test is not very deterministic.

The metadata has a layout that is determined at runtime, so where the various btree blocks land in sectors can vary.

If thin_repair writes the repaired metadata to a completely new device with a new layout of btree blocks, you then have a different data set.

So applying exactly the 'same' corruption might or might not hit a btree block that is actually in use.

You may also have noticed that 'lvconvert --repair' does no cross-validation between kernel and lvm2 metadata, so if the kernel loses some device, this is not reflected in the lvm2 metadata.

So lvm2 may reference a device ID that has been 'erased' from the kernel metadata. It will take some time before all of this works automatically; for now it is a task for a human operator to figure out.

This could explain why your 2nd corruption trial ends with:
### NOT FOUND TO BE CORRUPT

The data set after the 1st repair (and possibly after removing something) is different, so your 'dd' may be overwriting something that is unused.

To apply exactly the same corruption you would need a tool that exposes the metadata layout, i.e. a second level of thin_dump that dumps the layout itself, so you would know where all the btree blocks related to device ID X are located and could deliberately 'hit' something.

Someone on the linux-lvm list has done something in that direction, creating an extension to the existing thin tools, so we will see how that could be integrated.

As of now, thin_repair has limited repair capabilities, especially if you damage core pool blocks (there are no backups like in the extX filesystems).
Comment 5 Bruno Goncalves 2016-03-03 05:30:07 EST
It did pass in our tests with:
device-mapper-persistent-data-0.6.2-0.1.rc5.el6
kernel 2.6.32-621.el6
device-mapper-multipath-0.4.9-92.el6
lvm2-2.02.143-1.el6


# dmtest run --profile mytest --suite thin-provisioning -t /ToolsTests/
Loaded suite thin-provisioning
ToolsTests
  metadata_snap_stress1...PASS
  metadata_snap_stress2...iteration 0
iteration 1
iteration 2
iteration 3
iteration 4
iteration 5
iteration 6
iteration 7
iteration 8
iteration 9
PASS
  thin_dump_a_metadata_snap_of_an_active_pool...PASS
  thin_ls...PASS
  thin_repair_repeatable...PASS
  you_can_dump_a_metadata_snapshot...PASS
  you_cannot_dump_live_metadata...PASS
  you_cannot_run_thin_check_on_live_metadata...PASS
  you_cannot_run_thin_restore_on_a_live_metadata...PASS
Comment 10 Zdenek Kabelac 2016-03-12 13:54:52 EST
I'm starting to think that, since Joe is writing the new tool 'thin_generate_metadata', maybe such a tool could be enhanced with 'intelligent' error placement in the generated metadata.

Then it would be possible to create repairable metadata in a repeatable way.
Comment 12 Alasdair Kergon 2016-03-16 07:37:54 EDT
> dd if=/dev/urandom of=/dev/mapper/snapper_thinp-POOL_tmeta count=512 seek=4096 bs=1

Is this the 'corruption' every time?

You need to check that the area you are changing (count=512 seek=4096 bs=1) holds actual metadata and is not just unused space.  If it is unused, then it does not count as corruption.  If it is in use then you would hope the checksum covering that area changes and gets noticed.
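
A minimal sketch of one way to make that check, assuming (as elsewhere in this test) that the pool is inactive and its metadata has been swapped out to an ordinary LV named meta_swap; the LV name is a placeholder and the offsets are simply the ones used above:

thin_check /dev/mapper/snapper_thinp-meta_swap    # should pass before the write
dd if=/dev/urandom of=/dev/mapper/snapper_thinp-meta_swap count=512 seek=4096 bs=1
thin_check /dev/mapper/snapper_thinp-meta_swap    # fails only if the bytes landed in live metadata blocks

If the second thin_check still passes, the bytes landed in unused space and that iteration did not actually corrupt anything.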
Comment 13 Corey Marthaler 2016-03-16 18:58:18 EDT
Let's mark this bug verified for the case where an mda backup is taken prior to the corruption. In that case, repair appears to work over and over.
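
For reference, a minimal sketch of what such a backup path can look like, assuming the metadata has been swapped out to an LV named meta_swap as in the test above; the XML path is a placeholder and these exact steps are not quoted from the test:

thin_dump /dev/mapper/snapper_thinp-meta_swap > /tmp/POOL_tmeta.xml       # back up before corrupting
# ... corrupt the metadata and run the checks ...
thin_restore -i /tmp/POOL_tmeta.xml -o /dev/mapper/snapper_thinp-meta_swap
thin_check /dev/mapper/snapper_thinp-meta_swap                            # verify the restored copy

Because thin_restore rebuilds the whole metadata device from the XML, this path does not depend on thin_repair and can be repeated on every iteration.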



For the case where a backup isn't taken and 'lvconvert --repair' is used (the original intent of this bz), there are still two remaining issues:

1. We need a reliable way to find and corrupt the exact area where the mda resides for each iteration.

2. A bug should be filed about the fact that without lvmetad running, and with the mda verified to be corrupted, the repair will still fail on the first iteration (see comment #11).
Comment 14 Corey Marthaler 2016-03-21 17:18:09 EDT
Filed bug 1319937 for issue 2 listed above.
Comment 16 errata-xmlrpc 2016-05-10 21:12:40 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0960.html
