Bug 1688451
| Field | Value |
|---|---|
| Summary | Thinpool metadata of OCS corrupted |
| Product | Red Hat Enterprise Linux 7 |
| Component | lvm2 |
| lvm2 sub component | Thin Provisioning |
| Version | 7.5 |
| Hardware | x86_64 |
| OS | Unspecified |
| Status | CLOSED NOTABUG |
| Severity | high |
| Priority | unspecified |
| Reporter | Alex Wang <alex.wang> |
| Assignee | Joe Thornber <thornber> |
| QA Contact | cluster-qe <cluster-qe> |
| CC | agk, akaiser, amark, bmarzins, bmr, cmarthal, heinzm, jbrassow, jmagrini, jraju, loberman, mpatocka, msnitzer, mszczewski, nkshirsa, prajnoha, revers, sankarshan, tcarlin, thornber, zkabelac |
| Target Milestone | rc |
| Doc Type | If docs needed, set a value |
| Type | Bug |
| Last Closed | 2019-07-17 15:01:36 UTC |
Even with the latest tools, the metadata just reports 'bad checksum in metadata index block', so it requires Joe's eye. However, how was this state reached?

Looking at the attached logs, it appears the system started experiencing many "Medium errors" and various iscsi failures from Mar 6, 17:50, and there is a tremendous number of multipath failures. So far this BZ does not provide any information about the device layout (even the lvm2 VG metadata is missing, and that is mandatory for a manual metadata repair attempt, so please attach it). But if the failing storage was meant to be used for the thin-provisioning metadata device, this can cause a very unusual and hard-to-recover state.

Also note the lvm2 version, 2.02.177(2)-RHEL7 - a bit old? See https://access.redhat.com/downloads/content/rhel---7/x86_64/2456/lvm2/2.02.180-10.el7_6.3/x86_64/fd431d51/package-changelog

Trying to distill a couple of great questions that I've seen asked but have not yet been answered:

comment 10: Why is multipathd adding the <_tdata> as a map? We are also getting mpath read errors - are they related?

comment 3: Why do we only see 'vgs' and 'vgdisplay' in the LVM metadata archives? We seem to be jumping from one state to another (comment 11).

comment 12: It is odd that the affected pools come from the virtio devices - is this exclusively so? Also, what's with all the multipath strangeness?

Restating interesting, unresolved questions:

comment 10: Why is multipathd adding the <_tdata> as a map? We are also getting mpath read errors - are they related? [Answered in comment 21 and comment 22]

comment 3: Why do we only see 'vgs' and 'vgdisplay' in the LVM metadata archives? We seem to be jumping from one state to another (comment 11).

comment 12: It is odd that the affected pools come from the virtio devices - is this exclusively so?

comment 22: Is there a way to distinguish between multipath medium errors and connection errors? (And why are we seeing these errors?)
comment 24: Are there multipath medium/connection errors affecting the other 15 of 16 clusters that didn't have a problem?

(In reply to Jonathan Earl Brassow from comment #25)
> Restating interesting, unresolved questions:
>
> comment 10: Why is multipathd adding the <_tdata> as a map? We are also getting mpath read errors - are they related? [Answered in comment 21 and comment 22]

Having been pointed to comment#10, this observation struck me as interesting: [gluster] "bricks are not mounted on the node but there are plenty of containers". The reason for that would seem obvious to me: no bricks were active because gluster's bricks could not be activated due to the thinp corruption.

> comment 12: It is odd that the affected pools come from the virtio devices - is this exclusively so?

Aren't virtio devices common for the upper layers? (comment#23 seems to say KVM guests are in use, and the arch overview diagram shows virtio devices. By the way, that arch diagram is woefully lacking; it doesn't even show which layers are in play, e.g. no mention of dm-multipath, yet gluster's bricks are apparently thinp layered on dm-multipath.) I think we need the Gluster people to give us more context on how they are using thinp (is gluster managing dm-multipath, or did the customer set up multipath for the gluster bricks?).

> comment 22: Is there a way to distinguish between multipath medium errors and connection errors? (and why are we seeing these errors?)

As Ben said in comment#22, dm-multipath treats a medium error (BLK_STS_MEDIUM) as not retryable:

```c
static inline bool blk_path_error(blk_status_t error)
{
	switch (error) {
	case BLK_STS_NOTSUPP:
	case BLK_STS_NOSPC:
	case BLK_STS_TARGET:
	case BLK_STS_NEXUS:
	case BLK_STS_MEDIUM:
	case BLK_STS_PROTECTION:
		return false;
	}

	/* Anything else could be a path failure, so should be retried */
	return true;
}
```

If DM thinp gets an error, it will pass that error up the IO stack. If the dm-thin-pool metadata gets an error, it will switch into fail-IO mode.
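Those pool-mode transitions leave kernel messages behind, so the logs should show them if they happened. A minimal sketch of pulling them out, using a fabricated log excerpt for illustration: on the affected node you would point the grep at /var/log/messages (or the journal) from the failure window, and the exact message wording may differ across kernel versions.

```shell
# Hypothetical log excerpt; real triage would read /var/log/messages
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
Mar  6 17:52:01 infra-1 kernel: device-mapper: thin: 253:4: switching pool to read-only mode
Mar  6 17:52:01 infra-1 kernel: device-mapper: thin: 253:4: aborting current metadata transaction
Mar  6 17:53:10 infra-1 multipathd: mpatha: sdb - path offline
EOF

# Keep only the dm-thin mode/transaction messages discussed above
grep -E 'switching pool to .* mode|aborting current metadata transaction' "$LOG"
rm -f "$LOG"
```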
I've not looked at the logs yet, but at the time of the failure do the logs show any dm thin-pool metadata IO mode transitions? Or messages of the form: "%s: aborting current metadata transaction", dev_name?

Comment#11 shows that thin-pool data and metadata are on top of the same virtio device (e.g. /dev/vdd). Who created these virtio devices? Which IO mode are they configured to use over the virtio protocol? What is the KVM host's backing storage for these virtio devices? Are we certain proper flushing occurs before virtio completes IO (what caching mode is virtio using? Is it passing through the underlying host storage's FLUSH/FUA flags?)

(In reply to Jonathan Earl Brassow from comment #25)
> comment 3: Why do we only see 'vgs' and 'vgdisplay' in the LVM metadata archives? We seem to be jumping from one state to another (comment 11).

Because heketi is controlling the creation of the volumes, and it does:

cmd == "lvcreate -qq --autobackup=<probably='n'>...

From the man page:

-A|--autobackup y|n
    Specifies if metadata should be backed up automatically after a change. Enabling this is strongly advised! See vgcfgbackup(8) for more information.

Created attachment 1546711 [details]
infra-1: tmeta mountinfo lvmdump

infra-1-tmeta.tar.bz2 contains a dd of tmeta, the output of /proc/mountinfo, and also an lvmdump -am.

The needinfo request[s] on this closed bug have been removed, as they have been unresolved for 500 days.
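The heketi behaviour quoted above (autobackup likely disabled) would explain the thin /etc/lvm/archive history. A dry-run sketch of the difference, with hypothetical VG/size values; the commands are echoed rather than executed, since they need a real VG:

```shell
VG=vg_brick1               # hypothetical VG name
run() { echo "+ $*"; }     # dry-run helper: print instead of executing

# heketi-style: no automatic archive entry is written after the change
run lvcreate -qq --autobackup=n -L 2T -T "${VG}/tp_pool"

# with autobackup enabled, each change lands in /etc/lvm/archive
run lvcreate --autobackup=y -L 2T -T "${VG}/tp_pool"

# or take a backup explicitly, which this BZ asks for anyway
run vgcfgbackup "${VG}"
```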
Created attachment 1543717 [details]
one of the metadata from a corrupted volume

This bug is opened at the request of Nikhil to analyse and repair corrupted metadata from an OCS 3.11 volume. The customer is seeing this issue on multiple volumes, and the standard repair-and-swap method has failed.
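For context, the "repair and swap" method mentioned above usually looks something like the sketch below. VG and pool names are hypothetical, and the commands are only echoed (dry-run) since they need a real, inactive pool; exact sizes and flags depend on the environment.

```shell
VG=vg_ocs; POOL=tp_brick   # hypothetical names
run() { echo "+ $*"; }     # dry-run helper: print instead of executing

run lvchange -an "${VG}/${POOL}"                        # pool must be inactive
run lvcreate -an -L 4G -n "${POOL}_meta_swap" "${VG}"   # scratch LV sized like tmeta
# swap the (corrupt) metadata out into the scratch LV
run lvconvert --thinpool "${VG}/${POOL}" --poolmetadata "${VG}/${POOL}_meta_swap"
run lvchange -ay "${VG}/${POOL}_meta_swap"
run thin_check "/dev/${VG}/${POOL}_meta_swap"           # this is where 'bad checksum in metadata index block' shows up
run thin_repair -i "/dev/${VG}/${POOL}_meta_swap" -o "/dev/${VG}/${POOL}_meta_new"
```

When thin_repair itself fails on the checksum error, as here, the remaining option is manual analysis of the dumped metadata, which is exactly what this BZ requests.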