Bug 1199837
| Summary: | lvm jump TransactionID without getting confirmation from kernel | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Jack Waterworth <jwaterwo> |
| Component: | lvm2 | Assignee: | Zdenek Kabelac <zkabelac> |
| lvm2 sub component: | LVM Metadata / lvmetad | QA Contact: | cluster-qe <cluster-qe> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | low | ||
| Priority: | unspecified | CC: | agk, coughlan, heinzm, jbrassow, msnitzer, prajnoha, prockai, rbednar, zkabelac |
| Version: | 7.1 | ||
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | lvm2-2.02.160-1.el7 | Doc Type: | Bug Fix |
| Doc Text: |
Cause:
In a complex code path, lvm2 failed to commit metadata. If a further problem then occurred, the transaction_id in lvm2 metadata could diverge by 1 from the transaction_id in kernel metadata.
Consequence:
A mismatched transaction_id prevents further use of the thin pool and requires a manual repair operation, even though the user would not expect such a problem.
Fix:
lvm2 has improved the logic for updating the transaction_id.
Result:
Common lvm2 operations should no longer leave a thin pool with a mismatched transaction_id.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2016-11-04 04:09:29 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1199940, 1243913 | ||
| Bug Blocks: | 1295577, 1313485 | ||
|
Description
Jack Waterworth
2015-03-08 20:38:51 UTC
From the metadata archive, it looks like lvm2 did not properly validate the removal of the data/vm_f21_server LV and jumped one step further while the thin pool was already in an overfilled state. I'll try to write a reproducer. (As a side note, it is also unclear why there are archived files from the vgdisplay command.)

The reason for the 'vgdisplay' archiving is a missing 'backup()' call in the error path. lvm2 updates the metadata but likely fails to complete the lvremove and omits the backup in this error path, so the next lvm command (here, vgdisplay) notices the missing backup and performs the archiving and backup itself. This needs a fix.

Every transaction_id update is now followed by an immediate check that confirms the transaction_id really has the expected value. The source code is a moving target and the possibility of a missed update cannot be fully eliminated, but two new levels of validation should now let us spot such a forgotten update and stop further actions. That said, I have not seen any problems with an unexpected transaction_id change for a while, so I hope we have eliminated the primary source of trouble. Hopefully there is no reproducer for this bug.

Verified. No transaction_id error occurred when trying to reproduce the bug.
# lvs -a
  LV              VG            Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root            rhel_virt-151 -wi-ao----   6.74g
  swap            rhel_virt-151 -wi-ao---- 828.00m
  POOL            vg            twi-aotzD-  68.00m             100.00 1.76
  [POOL_tdata]    vg            Twi-ao----  68.00m
  [POOL_tmeta]    vg            ewi-ao----   4.00m
  [lvol0_pmspare] vg            ewi-------   4.00m
  test_lv         vg            Vwi-a-tz-- 240.00m POOL        28.33
# vgs
  VG            #PV #LV #SN Attr   VSize   VFree
  rhel_virt-151   1   2   0 wz--n-   7.59g  40.00m
  vg              2   2   0 wz--n- 192.00m 116.00m
# lvextend -r vg/POOL -L+50M
Ignoring --resizefs as volume vg/POOL does not have a filesystem.
Rounding size to boundary between physical extents: 52.00 MiB.
WARNING: Sum of all thin volume sizes (240.00 MiB) exceeds the size of thin pools and the size of whole volume group (192.00 MiB)!
Size of logical volume vg/POOL_tdata changed from 68.00 MiB (17 extents) to 120.00 MiB (30 extents).
Logical volume vg/POOL_tdata successfully resized.
# reboot
...
# lvs -a
  LV              VG            Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root            rhel_virt-151 -wi-ao----   6.74g
  swap            rhel_virt-151 -wi-ao---- 828.00m
  POOL            vg            twi-aotz-- 120.00m             56.67  1.76
  [POOL_tdata]    vg            Twi-ao---- 120.00m
  [POOL_tmeta]    vg            ewi-ao----   4.00m
  [lvol0_pmspare] vg            ewi-------   4.00m
  test_lv         vg            Vwi-a-tz-- 240.00m POOL        28.33
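
The figures in the transcript above can be cross-checked with a little arithmetic. The sketch below is only an illustration: the 4 MiB extent size is inferred from "68.00 MiB (17 extents)", the 68 MiB of used data is inferred from the pool being 100.00% full at 68.00m, and the function names are hypothetical helpers, not part of lvm2.

```shell
#!/bin/sh
# Assumption: 4 MiB physical extents, inferred from "68.00 MiB (17 extents)".
extent_mib=4

# Round a requested size in MiB up to the next extent boundary,
# the way lvextend reports rounding 50M up to 52.00 MiB.
round_up_to_extents() {
    awk -v req="$1" -v ext="$extent_mib" \
        'BEGIN { n = int((req + ext - 1) / ext); print n * ext }'
}

# Percentage of a volume that is in use, printed with two decimals
# like the Data% column of lvs.
data_percent() {
    awk -v used="$1" -v size="$2" \
        'BEGIN { printf "%.2f\n", used * 100 / size }'
}
```

Under these assumptions, `round_up_to_extents 50` gives 52, `data_percent 68 120` gives 56.67 (the pool after the extension), and `data_percent 68 240` gives 28.33 (test_lv), which agrees with the values lvs reports.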
3.10.0-501.el7.x86_64
lvm2-2.02.165-1.el7 BUILT: Wed Sep 7 18:04:22 CEST 2016
lvm2-libs-2.02.165-1.el7 BUILT: Wed Sep 7 18:04:22 CEST 2016
lvm2-cluster-2.02.165-1.el7 BUILT: Wed Sep 7 18:04:22 CEST 2016
device-mapper-1.02.134-1.el7 BUILT: Wed Sep 7 18:04:22 CEST 2016
device-mapper-libs-1.02.134-1.el7 BUILT: Wed Sep 7 18:04:22 CEST 2016
device-mapper-event-1.02.134-1.el7 BUILT: Wed Sep 7 18:04:22 CEST 2016
device-mapper-event-libs-1.02.134-1.el7 BUILT: Wed Sep 7 18:04:22 CEST 2016
device-mapper-persistent-data-0.6.3-1.el7 BUILT: Fri Jul 22 12:29:13 CEST 2016
cmirror-2.02.165-1.el7 BUILT: Wed Sep 7 18:04:22 CEST 2016
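
The transaction_id check described in the comments above happens inside lvm2, but a rough equivalent can be run by hand to see whether a pool is affected. This is a minimal sketch, not the lvm2 implementation: the function names are hypothetical, and it relies on the kernel's thin-pool status line having the form `<start> <length> thin-pool <transaction id> ...`.

```shell
#!/bin/sh
# Read one `dmsetup status` line for a thin-pool target on stdin and
# print the kernel's transaction id (the field right after "thin-pool").
thin_pool_kernel_tid() {
    awk '$3 == "thin-pool" { print $4 }'
}

# Compare the id recorded in lvm2 metadata ($1) with the id reported
# by the kernel ($2). Returns 0 when they match, 1 otherwise.
tid_in_sync() {
    lvm_tid=$1
    kernel_tid=$2
    if [ "$lvm_tid" = "$kernel_tid" ]; then
        echo "in sync: transaction_id $lvm_tid"
        return 0
    fi
    echo "MISMATCH: lvm2 metadata has $lvm_tid, kernel reports $kernel_tid" >&2
    return 1
}
```

For the pool in this report, the two values could be gathered with `lvs --noheadings -o transaction_id vg/POOL | tr -d ' '` and `dmsetup status vg-POOL-tpool | thin_pool_kernel_tid`, then passed to `tid_in_sync`. The device-mapper name `vg-POOL-tpool` is an assumption about how this pool is mapped and should be checked against `dmsetup ls` first.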
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-1445.html |