Bug 966776

Summary: pool create attempt failure: Thin pool transaction_id=7, while expected: 18446744073709551615
Product: Red Hat Enterprise Linux 7
Component: lvm2
lvm2 sub component: Default / Unclassified
Status: CLOSED WORKSFORME
Severity: high
Priority: high
Version: 7.0
Target Milestone: rc
Hardware: x86_64
OS: Linux
Reporter: Corey Marthaler <cmarthal>
Assignee: LVM and device-mapper development team <lvm-team>
QA Contact: cluster-qe <cluster-qe>
CC: agk, cmarthal, heinzm, jbrassow, msnitzer, prajnoha, prockai, thornber, zkabelac
Keywords: Triaged
Doc Type: Bug Fix
Type: Bug
Last Closed: 2013-10-09 09:35:15 UTC
Attachments:
  pool create attempt (flags: none)

Description Corey Marthaler 2013-05-23 22:55:56 UTC
Created attachment 752417 [details]
pool create attempt

Description of problem:
While doing thin_restore testing, I've seen pool create attempts fail from time to time.

lvcreate --thinpool POOL -L 1G snapper_thinp
  Thin pool transaction_id=7, while expected: 18446744073709551615.
  Unable to deactivate open snapper_thinp-POOL_tmeta (253:2)
  Unable to deactivate open snapper_thinp-POOL_tdata (253:3)
  Failed to deactivate snapper_thinp-POOL-tpool
  Aborting. Failed to activate thin POOL.
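
If the create fails like this, the half-built pool devices can be left active in the device-mapper table. A minimal cleanup sketch, assuming the leftover mappings match the names in the error output above (check open counts with dmsetup info before removing anything):

  dmsetup ls | grep snapper_thinp            # list any leftover snapper_thinp-* mappings
  dmsetup remove snapper_thinp-POOL-tpool    # tear down top-down: tpool first
  dmsetup remove snapper_thinp-POOL_tmeta
  dmsetup remove snapper_thinp-POOL_tdata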

Oddly, this appears to happen only when the VG contains 4 PVs, not when it contains 3 or 5, and it never happens on the first iteration of the test, so I wonder whether something is being left on disk from a prior corruption-and-restore attempt. I'll attach the verbose output from a failed command.
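
One way to rule out leftovers between iterations is to scrub the PVs before recreating the VG. A rough sketch, with hypothetical device names:

  vgchange -an snapper_thinp                  # deactivate anything still live
  vgremove -ff snapper_thinp
  for dev in /dev/sdb /dev/sdc /dev/sdd /dev/sde; do
      wipefs -a $dev                          # clear all on-disk signatures
      dd if=/dev/zero of=$dev bs=1M count=4   # and zero the first 4MB for good measure
  done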


Version-Release number of selected component (if applicable):
kernel-3.8.0-0.40.el7.x86_64

lvm2-2.02.99-0.34.el7    BUILT: Thu May 16 19:28:08 CDT 2013
lvm2-libs-2.02.99-0.34.el7    BUILT: Thu May 16 19:28:08 CDT 2013
lvm2-cluster-2.02.99-0.34.el7    BUILT: Thu May 16 19:28:08 CDT 2013
device-mapper-1.02.78-0.34.el7    BUILT: Thu May 16 19:28:08 CDT 2013
device-mapper-libs-1.02.78-0.34.el7    BUILT: Thu May 16 19:28:08 CDT 2013
device-mapper-event-1.02.78-0.34.el7    BUILT: Thu May 16 19:28:08 CDT 2013
device-mapper-event-libs-1.02.78-0.34.el7    BUILT: Thu May 16 19:28:08 CDT 2013
cmirror-2.02.99-0.34.el7    BUILT: Thu May 16 19:28:08 CDT 2013


How reproducible:
Often

Comment 1 Zdenek Kabelac 2013-05-24 07:54:34 UTC
Hmmm - this looks more or less like the thin pool metadata was initialized incorrectly.

From the attachment trace:

#metadata/lv_manip.c:4314     Clearing start of logical volume "POOL"
#device/dev-cache.c:332         /dev/snapper_thinp/POOL: Added to device cache
#device/dev-io.c:526         Opened /dev/snapper_thinp/POOL RW O_DIRECT
#device/dev-io.c:766         Wiping /dev/snapper_thinp/POOL at sector 0 length 8 sectors
#device/dev-io.c:137         /dev/snapper_thinp/POOL: block size is 4096 bytes
#device/dev-io.c:579         Closed /dev/snapper_thinp/POOL

The metadata header clearing should have zeroed the first 4KB of data on the device before it's passed into the thin-pool target construction.
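
Whether that wipe actually landed on disk can be verified by reading the first eight sectors back with O_DIRECT, bypassing any caching. A quick sketch, using the device path from the trace:

  dd if=/dev/snapper_thinp/POOL bs=512 count=8 iflag=direct 2>/dev/null | hexdump -C
  # a fully zeroed header prints a single row of 00s followed by '*'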

id: 18446744073709551615 is clearly 0xffffffffffffffff, which should be the valid transaction_id for the initial start - but 7 is being returned.
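
For reference, the decimal/hex correspondence is easy to confirm from a bash shell:

  printf '%x\n' 18446744073709551615   # -> ffffffffffffffff, i.e. (uint64_t)-1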

Since you've mentioned it happens only with one exact PV configuration - isn't it possible you're getting PV data from some already-running thin pool metadata LV, i.e. some bad caching of the data device for the virtual storage?
Or does this happen on real hardware?
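
If host page caching is the suspect, one diagnostic sketch would be to flush it on the virt host between test iterations and see whether the failure pattern changes:

  sync
  echo 3 > /proc/sys/vm/drop_caches   # drop page cache, dentries, and inodes on the host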


Anyway - Mike, do you have any idea how that could happen?

Comment 2 Corey Marthaler 2013-05-30 22:35:56 UTC
It's my understanding (which is pretty limited when it comes to virt storage) that with the <shareable/> element in the storage definition, caching should be disabled for that device.
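
One way to double-check what the running definition actually uses (the domain name here is hypothetical):

  virsh dumpxml rhel7-guest | grep -E 'shareable|cache='
  # expect <shareable/> and an explicit <driver ... cache='none'/> on the shared disk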

Comment 3 Zdenek Kabelac 2013-09-03 08:38:28 UTC
Do you still hit this bug?
Or was this possibly some storage issue?

Comment 4 Zdenek Kabelac 2013-10-09 09:35:15 UTC
Closing as unreproducible, on the assumption of a storage fault.
In case the problem somehow reoccurs, please reopen this bug; we will then need a closer inspection of the hardware.

Comment 5 Zdenek Kabelac 2014-02-24 20:56:53 UTC
This might be related to this bugfix:

https://www.redhat.com/archives/lvm-devel/2014-January/msg00056.html