Bug 1319937

Summary: a pool created without zeroing the first 4 KiB (--zero n) cannot have its metadata corrupted and then repaired
Product: Red Hat Enterprise Linux 6
Reporter: Corey Marthaler <cmarthal>
Component: device-mapper-persistent-data
Assignee: LVM and device-mapper development team <lvm-team>
Status: CLOSED ERRATA
QA Contact: Bruno Goncalves <bgoncalv>
Severity: medium
Priority: unspecified
Version: 6.8
CC: agk, bgoncalv, heinzm, jbrassow, msnitzer, prajnoha, prockai, thornber, tlavigne, zkabelac
Target Milestone: rc
Hardware: x86_64
OS: Linux
Fixed In Version: device-mapper-persistent-data-0.6.2-0.1.rc7.el6
Doc Type: No Doc Update
Doc Text: Intra-release bug, no documentation needed.
Last Closed: 2016-05-11 01:13:04 UTC
Type: Bug
Attachments: step by step reproducer

Description Corey Marthaler 2016-03-21 21:16:54 UTC
Description of problem:
This is the second issue raised in comment #13 of bug 1302454.

It appears that repair of corrupted pool metadata doesn't work if the pool was created with '--zero n'.



### No lvm2-lvmetad, pool created w/ --zero y ###

[root@host-113 ~]# service lvm2-lvmetad status
lvmetad is stopped

[...]

============================================================
Iteration 10 of 10 started at Mon Mar 21 15:19:29 CDT 2016
============================================================
SCENARIO - [swap_inactive_thin_pool_meta_device_using_lvconvert]
Swap _tmeta devices with newly created volumes while pool is inactive multiple times
Making pool volume
lvcreate  --thinpool POOL -L 1G --profile thin-performance --zero y --poolmetadatasize 4M snapper_thinp

Making origin volume
lvcreate  --virtualsize 1G -T snapper_thinp/POOL -n origin
Making snapshot of origin volume
lvcreate  -k n -s /dev/snapper_thinp/origin -n snap


*** Swap corrupt pool metadata iteration 1 ***
Current tmeta device: /dev/sdb1
Corrupting pool meta device (/dev/mapper/snapper_thinp-POOL_tmeta)
dd if=/dev/urandom of=/dev/mapper/snapper_thinp-POOL_tmeta count=512 seek=4096 bs=1
512+0 records in
512+0 records out
512 bytes (512 B) copied, 0.00324429 s, 158 kB/s
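For reference, `bs=1 count=512 seek=4096` writes 512 random bytes starting at byte offset 4096, i.e. immediately after the first 4 KiB of the metadata device. The same offset arithmetic can be sketched safely on a scratch file instead of a live _tmeta device (the file name is illustrative):

```shell
# Stand-in for the _tmeta device: a 1 MiB zero-filled scratch file.
truncate -s 1M meta.img
# Same dd invocation as in the test log, aimed at the scratch file;
# conv=notrunc keeps the rest of the file intact.
dd if=/dev/urandom of=meta.img bs=1 count=512 seek=4096 conv=notrunc 2>/dev/null
# The first 4 KiB (bytes 0..4095) remain untouched zeroes, while
# bytes 4096..4607 now hold the random data.
echo "nonzero bytes in first 4KiB: $(head -c 4096 meta.img | tr -d '\000' | wc -c)"
```

With `bs=1`, `seek` counts single-byte blocks, so the write lands exactly at byte 4096; `conv=notrunc` matters only for regular files (block devices are never truncated by dd).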

Sanity checking pool device (POOL) metadata
  WARNING: Sum of all thin volume sizes (2.00 GiB) exceeds the size of thin pools (1.00 GiB)!
thin_check /dev/mapper/snapper_thinp-meta_swap
examining superblock
examining devices tree
examining mapping tree
  thin device 1 is missing mappings [0, -]
    bad checksum in btree node (block 1)
  thin device 2 is missing mappings [0, -]
    bad checksum in btree node (block 1)

  Check of pool snapper_thinp/POOL failed (status:1). Manual repair required!
couldn't reactivate all volumes associated with pool device

Swap in new _tmeta device using lvconvert --repair
lvconvert --yes --repair snapper_thinp/POOL /dev/sdc1
  WARNING: recovery of pools without pool metadata spare LV is not automated.
  WARNING: If everything works, remove "snapper_thinp/POOL_meta0".
  WARNING: Use pvmove command to move "snapper_thinp/POOL_tmeta" on the best fitting PV.

New swapped tmeta device: /dev/sda1
vgchange -ay snapper_thinp
Sanity checking pool device (POOL) metadata
  WARNING: Sum of all thin volume sizes (2.00 GiB) exceeds the size of thin pools (1.00 GiB)!
  WARNING: Sum of all thin volume sizes (2.00 GiB) exceeds the size of thin pools (1.00 GiB)!
thin_check /dev/mapper/snapper_thinp-meta_swap
examining superblock
examining devices tree
examining mapping tree
checking space map counts

Removing snap volume snapper_thinp/POOL_meta0
lvremove -f /dev/snapper_thinp/POOL_meta0


Removing snap volume snapper_thinp/snap
lvremove -f /dev/snapper_thinp/snap
Removing thin origin and other virtual thin volumes
Removing thinpool snapper_thinp/POOL





### Running lvm2-lvmetad, pool created w/ --zero n ###

[root@host-116 ~]# service lvm2-lvmetad status
lvmetad (pid  6213) is running...


============================================================
Iteration 1 of 10 started at Mon Mar 21 16:06:13 CDT 2016
============================================================
SCENARIO - [swap_inactive_thin_pool_meta_device_using_lvconvert]
Swap _tmeta devices with newly created volumes while pool is inactive multiple times
Making pool volume
lvcreate  --thinpool POOL -L 1G --profile thin-performance --zero n --poolmetadatasize 4M snapper_thinp

Making origin volume
lvcreate  --virtualsize 1G -T snapper_thinp/POOL -n origin
Making snapshot of origin volume
lvcreate  -k n -s /dev/snapper_thinp/origin -n snap


*** Swap corrupt pool metadata iteration 1 ***
Current tmeta device: /dev/sdf1
Corrupting pool meta device (/dev/mapper/snapper_thinp-POOL_tmeta)
dd if=/dev/urandom of=/dev/mapper/snapper_thinp-POOL_tmeta count=512 seek=4096 bs=1
512+0 records in
512+0 records out
512 bytes (512 B) copied, 0.00342302 s, 150 kB/s

Sanity checking pool device (POOL) metadata
thin_check /dev/mapper/snapper_thinp-meta_swap
examining superblock
examining devices tree
examining mapping tree
checking space map counts
bad checksum in space map bitmap

  Check of pool snapper_thinp/POOL failed (status:1). Manual repair required!
couldn't reactivate all volumes associated with pool device

Swap in new _tmeta device using lvconvert --repair
lvconvert --yes --repair snapper_thinp/POOL /dev/sdb1
  WARNING: recovery of pools without pool metadata spare LV is not automated.
  WARNING: If everything works, remove "snapper_thinp/POOL_meta0".
  WARNING: Use pvmove command to move "snapper_thinp/POOL_tmeta" on the best fitting PV.

New swapped tmeta device: /dev/sde1
vgchange -ay snapper_thinp
  Check of pool snapper_thinp/POOL failed (status:1). Manual repair required!
VG activation failed
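A hedged note: with '--zero n', lvcreate skips wiping the first 4 KiB of the new LV, so whatever previously occupied that region stays in place, and with the rc5 device-mapper-persistent-data build the check/repair path above trips over it (fixed in rc7, per comment 3). As a defensive sketch only (an assumption, not a procedure taken from this report), the first 4 KiB of a reused device could be zeroed explicitly, demonstrated here on a scratch file rather than a real device:

```shell
# Scratch file with "stale" random contents, standing in for a reused device.
dd if=/dev/urandom of=scratch.img bs=4K count=4 2>/dev/null
# Explicitly zero the first 4 KiB, as lvcreate would have done with --zero y;
# conv=notrunc preserves the remaining contents of the file.
dd if=/dev/zero of=scratch.img bs=4K count=1 conv=notrunc 2>/dev/null
echo "nonzero bytes in first 4KiB: $(head -c 4096 scratch.img | tr -d '\000' | wc -c)"
```

On a real metadata LV the target would be the device node (e.g. a /dev/mapper path), which requires the pool to be inactive; the fix shipped in rc7 makes this manual step unnecessary.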




Version-Release number of selected component (if applicable):
2.6.32-633.el6.x86_64

lvm2-2.02.143-2.el6    BUILT: Wed Mar 16 08:30:42 CDT 2016
lvm2-libs-2.02.143-2.el6    BUILT: Wed Mar 16 08:30:42 CDT 2016
lvm2-cluster-2.02.143-2.el6    BUILT: Wed Mar 16 08:30:42 CDT 2016
udev-147-2.72.el6    BUILT: Tue Mar  1 06:14:05 CST 2016
device-mapper-1.02.117-2.el6    BUILT: Wed Mar 16 08:30:42 CDT 2016
device-mapper-libs-1.02.117-2.el6    BUILT: Wed Mar 16 08:30:42 CDT 2016
device-mapper-event-1.02.117-2.el6    BUILT: Wed Mar 16 08:30:42 CDT 2016
device-mapper-event-libs-1.02.117-2.el6    BUILT: Wed Mar 16 08:30:42 CDT 2016
device-mapper-persistent-data-0.6.2-0.1.rc5.el6    BUILT: Wed Feb 24 07:07:09 CST 2016
cmirror-2.02.143-2.el6    BUILT: Wed Mar 16 08:30:42 CDT 2016

Comment 3 Corey Marthaler 2016-03-22 17:19:31 UTC
Fix verified in the latest rpms. Same test case now runs fine.


device-mapper-persistent-data-0.6.2-0.1.rc7.el6    BUILT: Tue Mar 22 08:58:09 CDT 2016



============================================================
Iteration 10 of 10 started at Tue Mar 22 12:13:05 CDT 2016
============================================================
SCENARIO - [swap_inactive_thin_pool_meta_device_using_lvconvert]
Swap _tmeta devices with newly created volumes while pool is inactive multiple times
Making pool volume
lvcreate  --thinpool POOL -L 1G  --zero n --poolmetadatasize 4M snapper_thinp

Sanity checking pool device (POOL) metadata
thin_check /dev/mapper/snapper_thinp-meta_swap
examining superblock
examining devices tree
examining mapping tree
checking space map counts

Making origin volume
lvcreate  --virtualsize 1G -T snapper_thinp/POOL -n origin
lvcreate  -V 1G -T snapper_thinp/POOL -n other1
lvcreate  --virtualsize 1G -T snapper_thinp/POOL -n other2
lvcreate  --virtualsize 1G -T snapper_thinp/POOL -n other3
lvcreate  --virtualsize 1G -T snapper_thinp/POOL -n other4
lvcreate  --virtualsize 1G -T snapper_thinp/POOL -n other5
Making snapshot of origin volume
lvcreate  -k n -s /dev/snapper_thinp/origin -n snap


*** Swap corrupt pool metadata iteration 1 ***
Current tmeta device: /dev/sda1
Corrupting pool meta device (/dev/mapper/snapper_thinp-POOL_tmeta)
dd if=/dev/urandom of=/dev/mapper/snapper_thinp-POOL_tmeta count=512 seek=4096 bs=1
512+0 records in
512+0 records out
512 bytes (512 B) copied, 0.0032798 s, 156 kB/s

Sanity checking pool device (POOL) metadata
thin_check /dev/mapper/snapper_thinp-meta_swap
examining superblock
examining devices tree
examining mapping tree
checking space map counts
bad checksum in space map bitmap
meta data appears corrupt
  Check of pool snapper_thinp/POOL failed (status:1). Manual repair required!
couldn't reactivate all volumes associated with pool device

Swap in new _tmeta device using lvconvert --repair
lvconvert --yes --repair snapper_thinp/POOL /dev/sdb1
  WARNING: recovery of pools without pool metadata spare LV is not automated.
  WARNING: If everything works, remove "snapper_thinp/POOL_meta0".
  WARNING: Use pvmove command to move "snapper_thinp/POOL_tmeta" on the best fitting PV.

New swapped tmeta device: /dev/sdc1
Sanity checking pool device (POOL) metadata
thin_check /dev/mapper/snapper_thinp-meta_swap
examining superblock
examining devices tree
examining mapping tree
checking space map counts

Removing snap volume snapper_thinp/POOL_meta0
lvremove -f /dev/snapper_thinp/POOL_meta0


Removing snap volume snapper_thinp/snap
lvremove -f /dev/snapper_thinp/snap
Removing thin origin and other virtual thin volumes
Removing thinpool snapper_thinp/POOL

Comment 4 Corey Marthaler 2016-04-05 15:22:46 UTC
Created attachment 1143865 [details]
step by step reproducer

Comment 6 errata-xmlrpc 2016-05-11 01:13:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0960.html