Bug 966776

Summary: pool create attempt failure: Thin pool transaction_id=7, while expected: 18446744073709551615
Product: Red Hat Enterprise Linux 7
Component: lvm2
lvm2 sub component: Default / Unclassified
Status: CLOSED WORKSFORME
Severity: high
Priority: high
Version: 7.0
Target Milestone: rc
Hardware: x86_64
OS: Linux
Reporter: Corey Marthaler <cmarthal>
Assignee: LVM and device-mapper development team <lvm-team>
QA Contact: cluster-qe <cluster-qe>
CC: agk, cmarthal, heinzm, jbrassow, msnitzer, prajnoha, prockai, thornber, zkabelac
Keywords: Triaged
Doc Type: Bug Fix
Type: Bug
Last Closed: 2013-10-09 09:35:15 UTC
Attachments:
  pool create attempt (flags: none)

Description Corey Marthaler 2013-05-23 22:55:56 UTC
Created attachment 752417 [details]
pool create attempt

Description of problem:
While doing thin_restore testing, I've seen pool create attempts fail from time to time.

lvcreate --thinpool POOL -L 1G snapper_thinp
  Thin pool transaction_id=7, while expected: 18446744073709551615.
  Unable to deactivate open snapper_thinp-POOL_tmeta (253:2)
  Unable to deactivate open snapper_thinp-POOL_tdata (253:3)
  Failed to deactivate snapper_thinp-POOL-tpool
  Aborting. Failed to activate thin POOL.
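
If the create fails like this, the half-built pool devices can be left active in the device-mapper table. A minimal cleanup sketch, assuming the leftover mappings match the names in the error output above (check open counts with dmsetup info before removing anything):

  dmsetup ls | grep snapper_thinp            # list any leftover snapper_thinp-* mappings
  dmsetup remove snapper_thinp-POOL-tpool    # tear down top-down: tpool first
  dmsetup remove snapper_thinp-POOL_tmeta
  dmsetup remove snapper_thinp-POOL_tdata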

Oddly, this appears to happen only when the VG contains 4 PVs, not when it contains 3 or 5, and it never happens on the first iteration of the test, so I wonder whether something is being left on disk from a prior corruption-and-restore attempt. I'll attach the verbose output from a failed command.
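
One way to rule out leftovers between iterations is to scrub the PVs before recreating the VG. A rough sketch, with hypothetical device names:

  vgchange -an snapper_thinp                  # deactivate anything still live
  vgremove -ff snapper_thinp
  for dev in /dev/sdb /dev/sdc /dev/sdd /dev/sde; do
      wipefs -a $dev                          # clear all on-disk signatures
      dd if=/dev/zero of=$dev bs=1M count=4   # and zero the first 4MB for good measure
  done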


Version-Release number of selected component (if applicable):
kernel-3.8.0-0.40.el7.x86_64

lvm2-2.02.99-0.34.el7    BUILT: Thu May 16 19:28:08 CDT 2013
lvm2-libs-2.02.99-0.34.el7    BUILT: Thu May 16 19:28:08 CDT 2013
lvm2-cluster-2.02.99-0.34.el7    BUILT: Thu May 16 19:28:08 CDT 2013
device-mapper-1.02.78-0.34.el7    BUILT: Thu May 16 19:28:08 CDT 2013
device-mapper-libs-1.02.78-0.34.el7    BUILT: Thu May 16 19:28:08 CDT 2013
device-mapper-event-1.02.78-0.34.el7    BUILT: Thu May 16 19:28:08 CDT 2013
device-mapper-event-libs-1.02.78-0.34.el7    BUILT: Thu May 16 19:28:08 CDT 2013
cmirror-2.02.99-0.34.el7    BUILT: Thu May 16 19:28:08 CDT 2013


How reproducible:
Often

Comment 1 Zdenek Kabelac 2013-05-24 07:54:34 UTC
Hmmm - this looks more or less like the thin pool metadata was initialized incorrectly.

From the attachment trace:

#metadata/lv_manip.c:4314     Clearing start of logical volume "POOL"
#device/dev-cache.c:332         /dev/snapper_thinp/POOL: Added to device cache
#device/dev-io.c:526         Opened /dev/snapper_thinp/POOL RW O_DIRECT
#device/dev-io.c:766         Wiping /dev/snapper_thinp/POOL at sector 0 length 8 sectors
#device/dev-io.c:137         /dev/snapper_thinp/POOL: block size is 4096 bytes
#device/dev-io.c:579         Closed /dev/snapper_thinp/POOL

The metadata header clearing should have zeroed the first 4KB of data on the device before it's passed into the thin-pool target construction.
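
Whether that wipe actually landed on disk can be verified by reading the first eight sectors back with O_DIRECT, bypassing any caching. A quick sketch, using the device path from the trace:

  dd if=/dev/snapper_thinp/POOL bs=512 count=8 iflag=direct 2>/dev/null | hexdump -C
  # a fully zeroed header prints a single row of 00s followed by '*'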

id: 18446744073709551615 is clearly 0xffffffffffffffff, which should be the valid transaction_id for the initial start - but 7 is being returned.
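
For reference, the decimal/hex correspondence is easy to confirm from a bash shell:

  printf '%x\n' 18446744073709551615   # -> ffffffffffffffff, i.e. (uint64_t)-1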

Since you've mentioned it happens only with one exact PV configuration - isn't it possible you're getting PV data from some already-running thin pool metadata LV, i.e. some bad caching of the data device for the virtual storage?
Or does this happen on real hardware?
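
If host page caching is the suspect, one diagnostic sketch would be to flush it on the virt host between test iterations and see whether the failure pattern changes:

  sync
  echo 3 > /proc/sys/vm/drop_caches   # drop page cache, dentries, and inodes on the host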


Anyway - Mike, do you have any idea how that could happen?

Comment 2 Corey Marthaler 2013-05-30 22:35:56 UTC
It's my understanding (which is pretty limited when it comes to virt storage) that with the <shareable/> element in the storage definition, caching should be disabled for that device.
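
One way to double-check what the running definition actually uses (the domain name here is hypothetical):

  virsh dumpxml rhel7-guest | grep -E 'shareable|cache='
  # expect <shareable/> and an explicit <driver ... cache='none'/> on the shared disk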

Comment 3 Zdenek Kabelac 2013-09-03 08:38:28 UTC
Do you still hit this bug?
Or was this possibly some storage issue?

Comment 4 Zdenek Kabelac 2013-10-09 09:35:15 UTC
Closing as unreproducible, on the assumption of a storage fault.
In case the problem somehow reoccurs, please reopen this bug; we will then need a closer inspection of the hardware.

Comment 5 Zdenek Kabelac 2014-02-24 20:56:53 UTC
This might be related to this bugfix:

https://www.redhat.com/archives/lvm-devel/2014-January/msg00056.html