Bug 1029170

Summary: request for clarification: does lvm auto repair its corrupted pool tmeta devices
Product: Red Hat Enterprise Linux 6
Reporter: Corey Marthaler <cmarthal>
Component: lvm2
Assignee: Zdenek Kabelac <zkabelac>
Status: CLOSED NOTABUG
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 6.5
CC: agk, dwysocha, heinzm, jbrassow, msnitzer, prajnoha, prockai, thornber, zkabelac
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-01-28 13:26:41 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Corey Marthaler 2013-11-11 20:17:43 UTC
Description of problem:
I'm attempting to create more test cases for thin_check and thin_repair/thin_restore. However, I'm having a hard time finding a way to reliably corrupt the tmeta device. It appears that lvm is either automatically repairing itself or the device is not being truly corrupted in the first place. I've tried with lvmetad both on and off, and both with and without a spare poolmetadata device. I've also tried corrupting less than 512 bytes, but when I do that it never seems to show up as corrupt.


SCENARIO - [recover_corrupt_pool_tmeta_device]
Create a snapshot, corrupt its pool metadata (_tmeta) device, and then restore it using thin_restore
Making origin volume
lvcreate --thinpool POOL --zero n --poolmetadataspare n -L 1G snapper_thinp
  WARNING: recovery of pools without pool metadata spare LV is not automated.

Sanity checking pool device metadata
(thin_check /dev/mapper/snapper_thinp-POOL_tmeta)
examining superblock
examining devices tree
examining mapping tree

lvcreate --virtualsize 1G -T snapper_thinp/POOL -n origin
lvcreate -V 1G -T snapper_thinp/POOL -n other1
lvcreate --virtualsize 1G -T snapper_thinp/POOL -n other2
lvcreate --virtualsize 1G -T snapper_thinp/POOL -n other3
lvcreate --virtualsize 1G -T snapper_thinp/POOL -n other4
lvcreate --virtualsize 1G -T snapper_thinp/POOL -n other5
Making snapshot of origin volume
lvcreate -K -s /dev/snapper_thinp/origin -n restore

Dumping current pool metadata to /tmp/snapper_thinp_dump.2180.30275
thin_dump /dev/mapper/snapper_thinp-POOL_tmeta > /tmp/snapper_thinp_dump.2180.30275

Corrupting pool meta device (/dev/mapper/snapper_thinp-POOL_tmeta)
dd if=/dev/urandom of=/dev/mapper/snapper_thinp-POOL_tmeta count=512 bs=1
512+0 records in
512+0 records out
512 bytes (512 B) copied, 0.00308854 s, 166 kB/s

[root@harding-03 ~]# lvs -a -o +devices
  LV           VG            Attr       LSize  Pool Origin Data%  Devices
  POOL         snapper_thinp twi-a-t---  1.00g               0.04 POOL_tdata(0)
  [POOL_tdata] snapper_thinp Twi-ao----  1.00g                    /dev/sdb3(0)
  [POOL_tmeta] snapper_thinp ewi-ao----  4.00m                    /dev/sdc2(0)
  origin       snapper_thinp Vwi-a-t---  1.00g POOL          0.01
  other1       snapper_thinp Vwi-a-t---  1.00g POOL          0.01
  other2       snapper_thinp Vwi-a-t---  1.00g POOL          0.01
  other3       snapper_thinp Vwi-a-t---  1.00g POOL          0.01
  other4       snapper_thinp Vwi-a-t---  1.00g POOL          0.01
  other5       snapper_thinp Vwi-a-t---  1.00g POOL          0.01
  restore      snapper_thinp Vwi-a-t--k  1.00g POOL origin   0.01

[root@harding-03 ~]# thin_check /dev/mapper/snapper_thinp-POOL_tmeta
examining superblock
  superblock is corrupt
    bad checksum in superblock

# And now it's automatically fixed?
[root@harding-03 ~]# thin_check /dev/mapper/snapper_thinp-POOL_tmeta
examining superblock
examining devices tree
examining mapping tree


# Corrupt it again
[root@harding-03 ~]# dd if=/dev/urandom of=/dev/mapper/snapper_thinp-POOL_tmeta count=512 bs=1
512+0 records in
512+0 records out
512 bytes (512 B) copied, 0.00279119 s, 183 kB/s

# Corrupt
[root@harding-03 ~]# thin_check /dev/mapper/snapper_thinp-POOL_tmeta
examining superblock
  superblock is corrupt
    bad checksum in superblock

# Corrupt
[root@harding-03 ~]# thin_check /dev/mapper/snapper_thinp-POOL_tmeta
examining superblock
  superblock is corrupt
    bad checksum in superblock

# Not Corrupt (and all I've done is run this cmd)
[root@harding-03 ~]# thin_check /dev/mapper/snapper_thinp-POOL_tmeta
examining superblock
examining devices tree
examining mapping tree


# Corrupt it again
[root@harding-03 ~]# dd if=/dev/urandom of=/dev/mapper/snapper_thinp-POOL_tmeta count=512 bs=1
512+0 records in
512+0 records out
512 bytes (512 B) copied, 0.00279241 s, 183 kB/s

# Run a sync
[root@harding-03 ~]# sync

# Not Corrupt
[root@harding-03 ~]# thin_check /dev/mapper/snapper_thinp-POOL_tmeta
examining superblock
examining devices tree
examining mapping tree

[root@harding-03 ~]# lvs -a -o +devices
  LV           VG            Attr       LSize  Pool Origin Data%  Devices
  POOL         snapper_thinp twi-a-t---  1.00g               0.04 POOL_tdata(0)
  [POOL_tdata] snapper_thinp Twi-ao----  1.00g                    /dev/sdb3(0)
  [POOL_tmeta] snapper_thinp ewi-ao----  4.00m                    /dev/sdc2(0)
  origin       snapper_thinp Vwi-a-t---  1.00g POOL          0.01
  other1       snapper_thinp Vwi-a-t---  1.00g POOL          0.01
  other2       snapper_thinp Vwi-a-t---  1.00g POOL          0.01
  other3       snapper_thinp Vwi-a-t---  1.00g POOL          0.01
  other4       snapper_thinp Vwi-a-t---  1.00g POOL          0.01
  other5       snapper_thinp Vwi-a-t---  1.00g POOL          0.01
  restore      snapper_thinp Vwi-a-t--k  1.00g POOL origin   0.01



Version-Release number of selected component (if applicable):
2.6.32-424.el6.x86_64
lvm2-2.02.100-8.el6    BUILT: Wed Oct 30 03:10:56 CDT 2013
lvm2-libs-2.02.100-8.el6    BUILT: Wed Oct 30 03:10:56 CDT 2013
lvm2-cluster-2.02.100-8.el6    BUILT: Wed Oct 30 03:10:56 CDT 2013
udev-147-2.50.el6    BUILT: Fri Oct 11 05:58:10 CDT 2013
device-mapper-1.02.79-8.el6    BUILT: Wed Oct 30 03:10:56 CDT 2013
device-mapper-libs-1.02.79-8.el6    BUILT: Wed Oct 30 03:10:56 CDT 2013
device-mapper-event-1.02.79-8.el6    BUILT: Wed Oct 30 03:10:56 CDT 2013
device-mapper-event-libs-1.02.79-8.el6    BUILT: Wed Oct 30 03:10:56 CDT 2013
cmirror-2.02.100-8.el6    BUILT: Wed Oct 30 03:10:56 CDT 2013


How reproducible:
Every time

Comment 2 RHEL Program Management 2013-11-14 21:06:23 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 3 Zdenek Kabelac 2014-01-28 13:26:41 UTC
This test is not correct.

thin_check is not supposed to be run on a live metadata volume (one that belongs to an active thin pool). Bug 1023828 and Bug 1038387 now cover making thin_check warn before such use. With the current state of the tools, lvm2 by default only detects errors during activation (and deactivation), using the thin_check tool.

To actively repair the metadata, the user should run: lvconvert --repair vg/pool
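
The advice above could be sketched as a small script that takes the pool offline before repairing it, so that nothing is checked or rewritten while the kernel is using the metadata. This is only an illustration, not a procedure taken from this report: the helper names (run, repair_pool), the DRY_RUN switch, and the choice to deactivate the whole volume group with vgchange are assumptions, and the VG and pool names default to the ones from the description. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: repair thin-pool metadata with the pool inactive, rather than
# running thin_check against a live _tmeta device.
# Assumptions: VG/POOL default to the names used in this report; the whole
# VG may be deactivated; DRY_RUN=1 (default) prints commands instead of
# executing them.
VG="${VG:-snapper_thinp}"
POOL="${POOL:-POOL}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

repair_pool() {
    # 1. Deactivate every LV in the VG so the pool metadata is no longer in use.
    run vgchange -an "$VG" || return 1
    # 2. Ask lvm2 to repair the pool metadata (the command named in comment 3).
    run lvconvert --repair "$VG/$POOL" || return 1
    # 3. Reactivate; per comment 3, activation is where lvm2 runs thin_check.
    run vgchange -ay "$VG"
}

repair_pool
```

Running it with DRY_RUN=0 as root would execute the three commands for real. Note that the pool in the description was created with --poolmetadataspare n, so, as the lvcreate warning there says, recovery is not automated for that pool.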