Bug 1596313

Summary: [GSS] Corrupted LVM logical volume
Product: Red Hat Enterprise Linux 7 Reporter: Cal Calhoun <ccalhoun>
Component: device-mapper-persistent-data Assignee: Joe Thornber <thornber>
Status: CLOSED ERRATA QA Contact: Jakub Krysl <jkrysl>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 7.5 CC: agk, akaiser, bkunal, bmarzins, cmarthal, heinzm, jbrassow, jpittman, loberman, lvm-team, mcsontos, msnitzer, nravinas, prajnoha, rhandlin, thornber, zkabelac
Target Milestone: rc Keywords: Rebase
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: device-mapper-persistent-data-0.8.1-1.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-08-06 13:17:59 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1614151    
Bug Blocks: 1577173    
Attachments:
Description Flags
repaired metadata none

Comment 6 Joe Thornber 2018-06-29 14:35:33 UTC
Have you tried running thin_repair as it suggested?

Comment 7 John Pittman 2018-06-29 14:44:24 UTC
Joe, they ran 'lvconvert --repair' against the volume, then tried to activate the volumes afterwards, but could not.  The meta0 volume was created, so we've attached that here in hopes manual repair is possible.

Or are you saying they should have run thin_repair in addition to the 'lvconvert --repair'?

Comment 9 Anderson Kaiser 2018-07-03 11:15:17 UTC
// EMT //

Since this case has been escalated for a while and the customer is pushing us, I am setting "Customer Escalation = Yes" on this BZ. For now, I can see that a NEEDINFO is set.

--
Anderson Kaiser
Escalation Manager

Comment 19 Ben Marzinski 2018-07-20 22:32:37 UTC
Looking at the data here, the superblock appears to be fine.  There is something wrong with the device details root node pointed to by the superblock. Its header doesn't look corrupted:

csum: 2423521636
flags: 2
blocknr: 2121
nr_entries: 206
max_entries: 252
value_size: 8
padding: 0

However, it doesn't look like it's holding device detail entries.  Device detail entries are 24 bytes in size, but this node's value_size is 8.  The first 8 keys look like this:

keys: 301910, 301911, 301912, 301913, 301914, 301915, 301916, 301917

And the corresponding values look like this:

values: 0x0000006063040000 0x0000006163040000 0x0000006363040000 0x0000006563040000 0x0000006663040000 0x0000006263040000 0x0000006463040000 0x0000006763040000

These look a lot more like block_time values. There are two blocks in the metadata, #2265 and #2272, that do look like device_details root nodes. I'm not sure yet if these are old nodes.
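The size mismatch described above can be checked mechanically. The following is a minimal sketch, not the tooling used in this analysis. It assumes a 4096-byte metadata block, a 32-byte little-endian node header with the fields in the order dumped above, leaf/internal flag bits of 2 and 1, a max_entries rule that rounds down to a multiple of 3, and a block_time packing of (block << 24) | time. All of these come from my reading of the kernel persistent-data code and should be verified against the source. The comment does not say in which byte order the values were printed, so the sketch decodes both readings without claiming which one is right. All function names are hypothetical.

```python
import struct

BLOCK_SIZE = 4096  # assumed metadata block size
# Assumed header layout: csum, flags, blocknr, nr_entries, max_entries,
# value_size, padding (all little-endian, 32 bytes in total).
NODE_HEADER = struct.Struct("<IIQIIII")
FIELDS = ("csum", "flags", "blocknr", "nr_entries",
          "max_entries", "value_size", "padding")
INTERNAL_NODE, LEAF_NODE = 1, 2  # assumed flag bits


def parse_node_header(block: bytes) -> dict:
    """Decode the header at the start of a btree node block."""
    return dict(zip(FIELDS, NODE_HEADER.unpack_from(block)))


def expected_max_entries(value_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Entries that fit after the header, rounded down to a multiple of 3
    (assumed to mirror the kernel's calculation)."""
    per_entry = 8 + value_size  # 64-bit key plus value
    n = (block_size - NODE_HEADER.size) // per_entry
    return 3 * (n // 3)


def unpack_block_time(v: int) -> tuple:
    """Split a 64-bit value into (data block, time), assuming block << 24 | time."""
    return v >> 24, v & 0xFFFFFF


def byteswap64(v: int) -> int:
    """Reinterpret a 64-bit value with its bytes in the opposite order."""
    return int.from_bytes(v.to_bytes(8, "big"), "little")


if __name__ == "__main__":
    # Header values copied from the dump in this comment.
    raw = NODE_HEADER.pack(2423521636, 2, 2121, 206, 252, 8, 0)
    hdr = parse_node_header(raw.ljust(BLOCK_SIZE, b"\0"))
    print("leaf node:", bool(hdr["flags"] & LEAF_NODE))
    print("max_entries for 8-byte values:", expected_max_entries(8))
    print("max_entries for 24-byte values:", expected_max_entries(24))
    for v in (0x0000006063040000, 0x0000006163040000):
        print(hex(v), "as printed:", unpack_block_time(v),
              "byte-swapped:", unpack_block_time(byteswap64(v)))
```

Under these assumptions, a max_entries of 252 matches 8-byte values, while a leaf holding 24-byte device detail entries would report a smaller limit, which is consistent with the observation that this node is not a device details node.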

The data mapping root block pointed to by the superblock is, on the other hand, completely corrupted. Right now, I'm not sure if this is simply a bad superblock, or if there really is corruption across multiple data structures in the metadata device.

Comment 20 Ben Marzinski 2018-07-23 21:02:33 UTC
I was mistaken in my last comment. The block that is listed as the data mapping root node in the superblock is not corrupted. It just isn't a btree node. It's a valid bitmap block. In fact, all of the blocks with a non-zero block number/checksum have a valid checksum for some type of block. However, I don't believe that there should be any way for the blocks being pointed to by the superblock to change type without the superblock being updated. First they would have to be freed, and that shouldn't happen until after the superblock is updated to no longer point to them.
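A block's type can be inferred from its checksum because each block type mixes a different constant into the stored value. The following is a rough sketch of that classification, not the procedure used in this comment. It assumes the stored checksum is the first little-endian 32-bit word of the block, that it covers the rest of the block, and that it equals an uninverted CRC32C XORed with a per-type constant. The constants below are recalled from the kernel and thin-provisioning-tools sources and should be verified before relying on them. The helper names are hypothetical.

```python
import struct

BLOCK_SIZE = 4096  # assumed metadata block size


def _make_table():
    """Build the lookup table for reflected CRC32C (Castagnoli polynomial)."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data: bytes) -> int:
    """Standard CRC32C: initial value and final XOR are both 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


# Assumed per-type XOR constants (verify against the kernel source).
XOR_TAGS = {
    160774: "thin superblock",
    240779: "space map bitmap",
    160478: "space map index",
    121107: "btree node",
}


def checksum_base(block: bytes) -> int:
    """CRC over everything after the 4-byte csum field, without the final
    inversion (assumed to match the kernel's checksum helper)."""
    return crc32c(block[4:]) ^ 0xFFFFFFFF


def classify(block: bytes) -> str:
    """Name the block type whose constant explains the stored checksum."""
    (stored,) = struct.unpack_from("<I", block)
    return XOR_TAGS.get(stored ^ checksum_base(block), "unknown")


def seal(body: bytes, tag: int) -> bytes:
    """Test helper: prepend a checksum for the given type constant."""
    block = b"\0\0\0\0" + body
    return struct.pack("<I", checksum_base(block) ^ tag) + body


if __name__ == "__main__":
    body = bytes(range(256)) * 16          # 4096 bytes of filler
    block = seal(body[:BLOCK_SIZE - 4], 240779)
    print(classify(block))                 # type recovered from the checksum
    print(classify(bytes(BLOCK_SIZE)))     # an all-zero block
```

Running a classifier of this kind over every block in the metadata device is one way to observe that a block referenced as a btree root actually carries a valid checksum for a different block type.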

Comment 28 Jonathan Earl Brassow 2019-04-29 14:03:04 UTC
Our upstream code now appears able to repair the metadata provided in this case.  We feel confident that this issue is resolved with the latest changes we have made.

Comment 29 Joe Thornber 2019-04-29 14:05:13 UTC
Created attachment 1559932 [details]
repaired metadata

0.8 release of thin_dump produces this output.

Comment 32 Marian Csontos 2019-05-02 13:18:00 UTC
Given that there were many improvements in the thin tools, and that backporting is as likely to break things as to improve them, this will be a rebase.

Comment 34 Jakub Krysl 2019-06-27 14:35:05 UTC
Testing found no issues with:
device-mapper-1.02.158-2.el7.x86_64
device-mapper-persistent-data-0.8.5-1.el7.x86_64
kernel-3.10.0-1058.el7.x86_64

Comment 36 errata-xmlrpc 2019-08-06 13:17:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2320