Bug 966776 - pool create attempt failure: Thin pool transaction_id=7, while expected: 18446744073709551615
Status: CLOSED WORKSFORME
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.0
Hardware: x86_64 Linux
Priority: high    Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Zdenek Kabelac
QA Contact: Cluster QE
Depends On:
Blocks:
Reported: 2013-05-23 18:55 EDT by Corey Marthaler
Modified: 2015-05-11 17:11 EDT (History)
CC List: 9 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-10-09 05:35:15 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
pool create attempt (162.94 KB, text/plain)
2013-05-23 18:55 EDT, Corey Marthaler

Description Corey Marthaler 2013-05-23 18:55:56 EDT
Created attachment 752417
pool create attempt

Description of problem:
While doing thin_restore testing, I've seen pool create attempts fail from time to time.
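For context, the dump/restore cycle being exercised presumably looks something like the following (a sketch only - the exact test steps and device paths are assumed, not taken from this report):

  # Dump the thin pool metadata to XML, then restore it back to the device.
  thin_dump /dev/mapper/snapper_thinp-POOL_tmeta > /tmp/meta.xml
  thin_restore -i /tmp/meta.xml -o /dev/mapper/snapper_thinp-POOL_tmeta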

lvcreate --thinpool POOL -L 1G snapper_thinp
  Thin pool transaction_id=7, while expected: 18446744073709551615.
  Unable to deactivate open snapper_thinp-POOL_tmeta (253:2)
  Unable to deactivate open snapper_thinp-POOL_tdata (253:3)
  Failed to deactivate snapper_thinp-POOL-tpool
  Aborting. Failed to activate thin POOL.

Oddly, this appears to happen only when the VG contains 4 PVs (not 3 or 5), and it never happens on the first iteration of the test, so I wonder whether something is being left on disk from a prior corruption-and-restore attempt. I'll attach the verbose output from a failed command.
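Since deactivation of the _tmeta device failed (253:2 above is still live), one way to test the stale-data theory is to dump the start of that leftover device and see whether anything non-zero survives where lvcreate should have wiped. A rough sketch:

  # Non-zero bytes in the first 4KB would mean old metadata survived the wipe.
  dd if=/dev/mapper/snapper_thinp-POOL_tmeta bs=4K count=1 2>/dev/null | hexdump -C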


Version-Release number of selected component (if applicable):
3.8.0-0.40.el7.x86_64

lvm2-2.02.99-0.34.el7    BUILT: Thu May 16 19:28:08 CDT 2013
lvm2-libs-2.02.99-0.34.el7    BUILT: Thu May 16 19:28:08 CDT 2013
lvm2-cluster-2.02.99-0.34.el7    BUILT: Thu May 16 19:28:08 CDT 2013
device-mapper-1.02.78-0.34.el7    BUILT: Thu May 16 19:28:08 CDT 2013
device-mapper-libs-1.02.78-0.34.el7    BUILT: Thu May 16 19:28:08 CDT 2013
device-mapper-event-1.02.78-0.34.el7    BUILT: Thu May 16 19:28:08 CDT 2013
device-mapper-event-libs-1.02.78-0.34.el7    BUILT: Thu May 16 19:28:08 CDT 2013
cmirror-2.02.99-0.34.el7    BUILT: Thu May 16 19:28:08 CDT 2013


How reproducible:
Often
Comment 1 Zdenek Kabelac 2013-05-24 03:54:34 EDT
Hmmm - this more or less looks like the initialization of the thin pool metadata went wrong.

From the attachment trace:

#metadata/lv_manip.c:4314     Clearing start of logical volume "POOL"
#device/dev-cache.c:332         /dev/snapper_thinp/POOL: Added to device cache
#device/dev-io.c:526         Opened /dev/snapper_thinp/POOL RW O_DIRECT
#device/dev-io.c:766         Wiping /dev/snapper_thinp/POOL at sector 0 length 8 sectors
#device/dev-io.c:137         /dev/snapper_thinp/POOL: block size is 4096 bytes
#device/dev-io.c:579         Closed /dev/snapper_thinp/POOL

The start of the metadata device should have been zeroed (the first 4KB) before it's passed into thin-pool target construction.
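For reference, the wipe shown in the trace (sector 0, length 8 sectors, opened with O_DIRECT) is roughly equivalent to this manual command (a sketch only - never run it against a pool holding live data):

  # Zero the first 8 x 512-byte sectors (4KB) of the future metadata device,
  # mirroring the "Clearing start of logical volume" step above.
  dd if=/dev/zero of=/dev/snapper_thinp/POOL bs=512 count=8 oflag=direct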

The id 18446744073709551615 is clearly 0xffffffffffffffff,
which should be the valid id for the initial start - but 7 is being returned.
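A quick sanity check that this value is 2^64 - 1, i.e. (uint64_t)-1:

  printf '0x%x\n' 18446744073709551615
  # prints: 0xffffffffffffffff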

Since you've mentioned it happens only with an exact configuration of PVs - isn't it possible you're getting PV data from some already-running thin pool metadata LV,
i.e. some bad caching of the data device for the virtual storage?
Or does this happen on real hw?
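When it next reproduces, the transaction_id the kernel actually sees can be read from the device-mapper status of the -tpool device; it is the first field of the thin-pool status line (assuming the device is still active):

  # First field of the thin-pool status line is the transaction id.
  dmsetup status snapper_thinp-POOL-tpool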


Anyway - Mike, do you have any idea how that could happen?
Comment 2 Corey Marthaler 2013-05-30 18:35:56 EDT
It's my understanding (which is pretty limited when it comes to virt storage) that with the <shareable/> element in the storage definition, it shouldn't be using cache.
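For reference, a shareable disk in a libvirt domain definition looks roughly like the following (a sketch - the source path and target name here are made up); per the libvirt docs, host caching should be deactivated for a device marked <shareable/>:

  <disk type='block' device='disk'>
    <driver name='qemu' type='raw' cache='none'/>
    <source dev='/dev/mapper/test-pv1'/>
    <target dev='vdb' bus='virtio'/>
    <shareable/>
  </disk>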
Comment 3 Zdenek Kabelac 2013-09-03 04:38:28 EDT
Do you still hit this bug?
Or was this possibly some storage issue?
Comment 4 Zdenek Kabelac 2013-10-09 05:35:15 EDT
Closing as unreproducible, on the assumption of a storage fault.
In case the problem somehow reoccurs, please reopen this bug; we will need a closer inspection of the hw.
Comment 5 Zdenek Kabelac 2014-02-24 15:56:53 EST
It might be related to this bugfix:

https://www.redhat.com/archives/lvm-devel/2014-January/msg00056.html
