Bug 1199837 - lvm jump TransactionID without getting confirmation from kernel
Summary: lvm jump TransactionID without getting confirmation from kernel
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.1
Hardware: All
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: rc
Target Release: ---
Assignee: Zdenek Kabelac
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On: 1199940 1243913
Blocks: 1295577 1313485
Reported: 2015-03-08 20:38 UTC by Jack Waterworth
Modified: 2021-09-03 12:52 UTC
CC List: 9 users

Fixed In Version: lvm2-2.02.160-1.el7
Doc Type: Bug Fix
Doc Text:
Cause: In some complex code paths, lvm2 failed to commit metadata, and if a further problem then occurred, the transaction_id in lvm2 metadata could diverge by 1 from the transaction_id in kernel metadata. Consequence: A mismatched transaction_id blocks further use of the thin pool and requires a manual repair operation, even though the user would not expect such a problem. Fix: lvm2 improved the logic for updating transaction_id. Result: Common lvm2 operations should no longer result in a thin pool with a mismatched transaction_id.
Clone Of:
Environment:
Last Closed: 2016-11-04 04:09:29 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:1445 0 normal SHIPPED_LIVE lvm2 bug fix and enhancement update 2016-11-03 13:46:41 UTC

Description Jack Waterworth 2015-03-08 20:38:51 UTC
Description of problem:
After space in a thin pool was exhausted, extending the pool and rebooting the machine caused a transaction_id mismatch.

Version-Release number of selected component (if applicable):
lvm2-2.02.115-3.el7.x86_64
kernel-3.10.0-123.4.4.el7.x86_64
kernel-3.10.0-123.8.1.el7.x86_64
kernel-3.10.0-229.el7.x86_64

How reproducible:
unknown

Steps to Reproduce:
1. Fill thinpool to 100% data usage
2. Extend thinpool
3. Reboot server

Actual results:
The thin pool cannot be activated:
[root@jack-tank ~]# vgchange -ay data
  Thin pool transaction_id is 7345, while expected 7346.
  Thin pool transaction_id is 7345, while expected 7346.
  Thin pool transaction_id is 7345, while expected 7346.
  Thin pool transaction_id is 7345, while expected 7346.
  Thin pool transaction_id is 7345, while expected 7346.
  Thin pool transaction_id is 7345, while expected 7346.
  Thin pool transaction_id is 7345, while expected 7346.
  Thin pool transaction_id is 7345, while expected 7346.
  Thin pool transaction_id is 7345, while expected 7346.
  Thin pool transaction_id is 7345, while expected 7346.
  Thin pool transaction_id is 7345, while expected 7346.
  Thin pool transaction_id is 7345, while expected 7346.
  Thin pool transaction_id is 7345, while expected 7346.
  Thin pool transaction_id is 7345, while expected 7346.
  Thin pool transaction_id is 7345, while expected 7346.
  Thin pool transaction_id is 7345, while expected 7346.
  Thin pool transaction_id is 7345, while expected 7346.
  Thin pool transaction_id is 7345, while expected 7346.
  Thin pool transaction_id is 7345, while expected 7346.
  Thin pool transaction_id is 7345, while expected 7346.
  Thin pool transaction_id is 7345, while expected 7346.
  Thin pool transaction_id is 7345, while expected 7346.
  Thin pool transaction_id is 7345, while expected 7346.
  Thin pool transaction_id is 7345, while expected 7346.
  Thin pool transaction_id is 7345, while expected 7346.
  Thin pool transaction_id is 7345, while expected 7346.
  Thin pool transaction_id is 7345, while expected 7346.
  Thin pool transaction_id is 7345, while expected 7346.
  11 logical volume(s) in volume group "data" now active
[root@jack-tank ~]# lvs /dev/data/thin_pool
  LV        VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  thin_pool data twi---tz-- 150.00g                                                    
[root@jack-tank ~]# lvchange -ay data/thin_pool
  Thin pool transaction_id is 7345, while expected 7346.
[root@jack-tank ~]# lvs /dev/data/thin_pool
  LV        VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  thin_pool data twi---tz-- 150.00g                                                    
[root@jack-tank ~]#


Expected results:
transaction_id should match expected id.

Additional info:
I'm not sure what I did to get into this state. I attempted to increase the size of the thin_pool volume by adding 50G, but this did not decrease the Data%. I rebooted to see if that would clear up the discrepancy. After that I was unable to activate my volumes.
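
The mismatch message in the output above carries both numbers needed for triage. A minimal sketch, assuming only the message wording shown in this report (the helper name `parse_transaction_mismatch` is hypothetical and not part of lvm2), of pulling the distinct pairs out of captured command output:

```python
import re

# Matches the message wording seen in this report; other lvm2 versions
# may phrase it differently.
MISMATCH_RE = re.compile(
    r"Thin pool transaction_id is (\d+), while expected (\d+)\."
)


def parse_transaction_mismatch(output):
    """Return the set of distinct (reported_id, expected_id) pairs in `output`."""
    return {(int(a), int(b)) for a, b in MISMATCH_RE.findall(output)}


sample = """\
  Thin pool transaction_id is 7345, while expected 7346.
  Thin pool transaction_id is 7345, while expected 7346.
  11 logical volume(s) in volume group "data" now active
"""

for reported_id, expected_id in sorted(parse_transaction_mismatch(sample)):
    # The gap between the two ids is what matters for diagnosis.
    print(reported_id, expected_id, expected_id - reported_id)
```

Collapsing the repeated lines into a set makes it easy to see that every volume in the group reports the same single-step gap.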

Comment 3 Zdenek Kabelac 2015-03-09 08:20:15 UTC
From the metadata archive it looks like lvm2 didn't properly validate the removal of the data/vm_f21_server LV and jumped one step further while the thin pool was already in an overfilled state.

I'll try to write a reproducer.

(As a side note, it's also unclear why there are archived files from the vgdisplay command.)

Comment 4 Zdenek Kabelac 2015-03-09 14:21:54 UTC
So the reason for the 'vgdisplay' archiving is a missing 'backup()' call in an error path. lvm2 updates the metadata but then likely fails in lvremove and omits the backup in this error path, so the next lvm command (here vgdisplay) notices the missing backup and performs the archiving and backup itself. This needs a fix.
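
The error path described in Comment 4 can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyVG`, `remove_lv`, and the sequence-number fields are hypothetical names invented for illustration, and the code does not show how lvm2 was actually changed.

```python
class ToyVG:
    """Toy stand-in for a volume group; not lvm2's real data structures."""

    def __init__(self):
        self.seqno = 0         # sequence number of the committed metadata
        self.backup_seqno = 0  # sequence number captured by the last backup

    def commit(self):
        self.seqno += 1

    def backup(self):
        self.backup_seqno = self.seqno


def remove_lv(vg, fail_after_commit):
    """Commit metadata, then run a step that may fail; back up either way."""
    vg.commit()
    try:
        if fail_after_commit:
            raise RuntimeError("simulated failure after metadata commit")
    finally:
        # Without this, the failing path would leave backup_seqno behind
        # seqno, and the next command reading the VG would have to catch up.
        vg.backup()


vg = ToyVG()
try:
    remove_lv(vg, fail_after_commit=True)
except RuntimeError:
    pass
print(vg.seqno, vg.backup_seqno)
```

The point of the sketch is only that the backup step sits on every exit from the operation, so a read-only command never finds a committed but unbacked metadata version.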

Comment 5 Zdenek Kabelac 2016-07-01 07:06:00 UTC
Any transaction_id update is now followed by an immediate check that confirms the transaction_id really has the expected value.

While the source code is a moving target, and the possibility of a missed update cannot be fully eliminated, we should now be able to spot such a forgotten update and stop further actions thanks to 2 new extra levels of validation.

That said, I've not seen any problems with unexpected transaction_id changes for a while, so I hope we have eliminated the primary source of trouble.

So hopefully there is no reproducer for this bug.
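
The "update, then confirm" idea from Comment 5 can be sketched as follows. This is a toy model, not lvm2 code: `ToyKernelPool` and `bump_transaction_id` are hypothetical names, and the error text simply reuses the wording from the report.

```python
class ToyKernelPool:
    """Toy stand-in for the kernel side of a thin pool; not a real API."""

    def __init__(self, transaction_id, accept_updates=True):
        self.transaction_id = transaction_id
        self.accept_updates = accept_updates

    def set_transaction_id(self, current, new):
        # Apply the update only if the caller's view of the current id is
        # right and the pool is able to accept it.
        if self.accept_updates and current == self.transaction_id:
            self.transaction_id = new


def bump_transaction_id(pool, metadata_id):
    """Return the new id for the lvm2-side metadata, or raise on mismatch."""
    new_id = metadata_id + 1
    pool.set_transaction_id(metadata_id, new_id)
    # Immediate check: read the id back before recording it anywhere else.
    if pool.transaction_id != new_id:
        raise RuntimeError(
            f"Thin pool transaction_id is {pool.transaction_id}, "
            f"while expected {new_id}."
        )
    return new_id


print(bump_transaction_id(ToyKernelPool(7345), 7345))
try:
    bump_transaction_id(ToyKernelPool(7345, accept_updates=False), 7345)
except RuntimeError as err:
    print(err)
```

Failing at the moment of the update, rather than at the next activation, is what keeps the two ids from silently drifting apart in this model.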

Comment 7 Roman Bednář 2016-09-15 14:59:44 UTC
Verified. No transaction_id error occurred when trying to reproduce the bug.

# lvs -a
  LV              VG            Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root            rhel_virt-151 -wi-ao----   6.74g                                                    
  swap            rhel_virt-151 -wi-ao---- 828.00m                                                    
  POOL            vg            twi-aotzD-  68.00m             100.00 1.76                            
  [POOL_tdata]    vg            Twi-ao----  68.00m                                                    
  [POOL_tmeta]    vg            ewi-ao----   4.00m                                                    
  [lvol0_pmspare] vg            ewi-------   4.00m                                                    
  test_lv         vg            Vwi-a-tz-- 240.00m POOL        28.33  
                                
# vgs
  VG            #PV #LV #SN Attr   VSize   VFree  
  rhel_virt-151   1   2   0 wz--n-   7.59g  40.00m
  vg              2   2   0 wz--n- 192.00m 116.00m

# lvextend -r vg/POOL -L+50M
  Ignoring --resizefs as volume vg/POOL does not have a filesystem.
  Rounding size to boundary between physical extents: 52.00 MiB.
  WARNING: Sum of all thin volume sizes (240.00 MiB) exceeds the size of thin pools and the size of whole volume group (192.00 MiB)!
  Size of logical volume vg/POOL_tdata changed from 68.00 MiB (17 extents) to 120.00 MiB (30 extents).
  Logical volume vg/POOL_tdata successfully resized.

# reboot
...
# lvs -a
  LV              VG            Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root            rhel_virt-151 -wi-ao----   6.74g                                                    
  swap            rhel_virt-151 -wi-ao---- 828.00m                                                    
  POOL            vg            twi-aotz-- 120.00m             56.67  1.76                            
  [POOL_tdata]    vg            Twi-ao---- 120.00m                                                    
  [POOL_tmeta]    vg            ewi-ao----   4.00m                                                    
  [lvol0_pmspare] vg            ewi-------   4.00m                                                    
  test_lv         vg            Vwi-a-tz-- 240.00m POOL        28.33     



3.10.0-501.el7.x86_64

lvm2-2.02.165-1.el7    BUILT: Wed Sep  7 18:04:22 CEST 2016
lvm2-libs-2.02.165-1.el7    BUILT: Wed Sep  7 18:04:22 CEST 2016
lvm2-cluster-2.02.165-1.el7    BUILT: Wed Sep  7 18:04:22 CEST 2016
device-mapper-1.02.134-1.el7    BUILT: Wed Sep  7 18:04:22 CEST 2016
device-mapper-libs-1.02.134-1.el7    BUILT: Wed Sep  7 18:04:22 CEST 2016
device-mapper-event-1.02.134-1.el7    BUILT: Wed Sep  7 18:04:22 CEST 2016
device-mapper-event-libs-1.02.134-1.el7    BUILT: Wed Sep  7 18:04:22 CEST 2016
device-mapper-persistent-data-0.6.3-1.el7    BUILT: Fri Jul 22 12:29:13 CEST 2016
cmirror-2.02.165-1.el7    BUILT: Wed Sep  7 18:04:22 CEST 2016
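
The numbers in the verification output above are consistent with each other, which can be checked with a little arithmetic. The sketch assumes a 4 MiB extent size (inferred from 68.00 MiB being 17 extents), that the request is rounded up to a whole extent (inferred from the "Rounding size" message), and that the amount of used data did not change across the resize and reboot.

```python
import math

EXTENT_MIB = 4       # 68 MiB / 17 extents, from the lvextend output
old_extents = 17
requested_mib = 50   # from -L+50M

# Round the request up to a whole number of extents.
added_extents = math.ceil(requested_mib / EXTENT_MIB)
new_extents = old_extents + added_extents
new_size_mib = new_extents * EXTENT_MIB

# The pool was at 100.00% of 68 MiB before the resize.
used_mib = old_extents * EXTENT_MIB
data_percent = round(used_mib / new_size_mib * 100, 2)

print(added_extents * EXTENT_MIB, new_extents, new_size_mib, data_percent)
```

The result matches the 52.00 MiB rounding, the 30 extents and 120.00 MiB size, and the 56.67 Data% shown after the reboot.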

Comment 9 errata-xmlrpc 2016-11-04 04:09:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1445.html

