Bug 1289486
| Summary: | RHEL7: lvresize on root volume hangs or does not complete, lvmetad blocked due to suspended LVM device | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Simon Reber <sreber> |
| Component: | lvm2 | Assignee: | LVM and device-mapper development team <lvm-team> |
| lvm2 sub component: | LVM Metadata / lvmetad | QA Contact: | cluster-qe <cluster-qe> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | | |
| Priority: | high | CC: | agk, cmarthal, dwysocha, heinzm, jbrassow, lvm-team, mkarg, msnitzer, prajnoha, prockai, rbednar, teigland, zkabelac |
| Version: | 7.1 | | |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | lvm2-2.02.160-1.el7 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-11-04 04:13:22 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
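Given the "Fixed In Version" above (lvm2-2.02.160-1.el7), a quick way to tell whether a host already carries the fix is to version-compare the installed lvm2 against 2.02.160. This is a minimal sketch, not part of the original report; it assumes GNU `sort -V` is available, and the `has_fix` helper name is made up for illustration. On a real host the version string would come from `rpm -q --qf '%{VERSION}\n' lvm2`.

```shell
# Sketch: does an installed lvm2 version contain the fix from
# "Fixed In Version: lvm2-2.02.160-1.el7"?
# Uses GNU `sort -V` for version ordering.
has_fix() {
    fixed="2.02.160"
    installed="$1"
    # sort the two version strings; if the installed one sorts last
    # (or equals the fixed one), it is >= the fixed version
    newest=$(printf '%s\n%s\n' "$fixed" "$installed" | sort -V | tail -n1)
    [ "$newest" = "$installed" ]
}

has_fix "2.02.165" && echo "contains fix"   # version from the verification comment
```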
Description (Simon Reber, 2015-12-08 09:42:21 UTC)
What is the sequence of commands that are run when there's a problem? Is the lvextend/lvresize command deadlocking, or xfs_growfs? Is xfs_growfs being run after the lvextend/lvresize command has finished? Can we tell whether the dm device is suspended?

(In reply to David Teigland from comment #2)
> What is the sequence of commands that are run when there's a problem?
> Is the lvextend/lvresize command deadlocking or xfs_growfs?
> Is xfs_growfs being run after the lvextend/lvresize command is finished?
> Can we tell if the dm device is suspended?

They simply run `lvresize -L +6G -r /dev/mapper/path_to_lv`, so I am unable to tell at which command the problem starts. I am not sure whether the dm device is suspended; maybe we can find this information in the provided vmcore.

It appears that lvresize wrote the VG with the new size, suspended the dm device to apply the change to the kernel, then vanished somehow, leaving the dm device suspended. lvresize didn't leave a core file, did it? Capturing the output of lvresize with -vvvv would probably help.

I think we've identified the problem. Until a fix is ready, disabling the use of lvmetad should avoid this problem: set "use_lvmetad=0" in lvm.conf. lvmetad does not provide much, if any, benefit in environments like this with so few devices.

The problem seems to be that lvresize communicates with lvmetad while the dm device is suspended. The communication and lvmetad involve allocating memory, which in this case triggers memory reclaim on the suspended device. Reclaim involves writing to the suspended device, and that write blocks because the device is suspended, causing a deadlock.

(In reply to David Teigland from comment #12)
> I think we've identified the problem. Until a fix is ready, disabling the
> use of lvmetad should avoid this problem. Set "use_lvmetad=0" in lvm.conf.
> lvmetad does not provide much or any benefit in environments like this with
> so few devices.
> The problem seems to be that lvresize communicates with lvmetad while the
> dm device is suspended. The communication and lvmetad involve allocating
> memory, which in this case triggers memory reclaim on the suspended device,
> which involves writing to the suspended device, which blocks because the
> device is suspended, causing a deadlock.

The customer confirmed today that, after applying the suggested workaround ("use_lvmetad=0"), the problem did not recur (they resized the root filesystems on 10 systems today). So I am wondering how we could solve this permanently, without the need for the workaround. I assume it will take time, but I was wondering whether we already have an idea of how to do it.

The solution is to send the message to lvmetad (lvmetad_vg_update()) after the LVs have been resumed. Unfortunately, the way LVs are resumed is not very straightforward: it is a hidden side effect of VG locking. The update should happen after resume and before unlock.

Fixed here: https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=886da20c4dce3f3ab5c9fdc22580a98b49c6e269

Marking verified with the latest rpms. No errors or system hangs occurred during several lvextend runs on an XFS root filesystem while lvmetad was running.

3.10.0-506.el7.x86_64
lvm2-2.02.165-2.el7.x86_64.rpm

```
# dmidecode | grep -i vmware
        Manufacturer: VMware, Inc.
        Product Name: VMware Virtual Platform
        Serial Number: VMware-42 36 bc 33 87 ca f8 12-54 25 53 c1 fc 49 c9 55
        Description: VMware SVGA II

# systemctl is-active lvm2-lvmetad.service
active

# vgs
  VG   #PV #LV #SN Attr   VSize  VFree
  rhel   1   2   0 wz--n- 15.00g 600.00m

# lvs
  LV   VG   Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root rhel -wi-ao---- 13.39g
  swap rhel -wi-a-----  1.02g

# lvextend -L+100M -r rhel/root
  Size of logical volume rhel/root changed from 13.39 GiB (3429 extents) to 13.49 GiB (3454 extents).
  Logical volume rhel/root successfully resized.
```
```
meta-data=/dev/mapper/rhel-root  isize=512    agcount=4, agsize=877824 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=3511296, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 3511296 to 3536896

# lvs
  LV   VG   Attr       LSize  ......
  root rhel -wi-ao---- 13.49g
  swap rhel -wi-a-----  1.02g

# lvextend -L+100M -r rhel/root
  Size of logical volume rhel/root changed from 13.49 GiB (3454 extents) to 13.59 GiB (3479 extents).
  Logical volume rhel/root successfully resized.
meta-data=/dev/mapper/rhel-root  isize=512    agcount=5, agsize=877824 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=3536896, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 3536896 to 3562496

# lvs rhel/root
  LV   VG   Attr       LSize  .....
  root rhel -wi-ao---- 13.59g

# lvextend -L+100M -r rhel/root
  Size of logical volume rhel/root changed from 13.59 GiB (3479 extents) to 13.69 GiB (3504 extents).
  Logical volume rhel/root successfully resized.
meta-data=/dev/mapper/rhel-root  isize=512    agcount=5, agsize=877824 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=3562496, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 3562496 to 3588096

# lvs rhel/root
  LV   VG   Attr       LSize  ......
  root rhel -wi-ao---- 13.69g
```

Since the problem described in this bug report should be resolved by a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1445.html
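For reference, the interim workaround from comment 12 ("use_lvmetad=0") can be applied mechanically. This is a minimal sketch, not from the original report: it assumes the default RHEL 7 config path /etc/lvm/lvm.conf and GNU sed, and the `disable_lvmetad` helper name is made up for illustration (an alternative path can be passed in for testing).

```shell
# Sketch of the interim workaround from comment 12: flip use_lvmetad to 0
# in lvm.conf.  Path defaults to the RHEL 7 location /etc/lvm/lvm.conf.
disable_lvmetad() {
    conf="${1:-/etc/lvm/lvm.conf}"
    cp "$conf" "$conf.bak"   # keep a backup before editing
    # whitespace-tolerant rewrite of "use_lvmetad = 1" to "use_lvmetad = 0"
    sed -i 's/^\([[:space:]]*use_lvmetad[[:space:]]*=[[:space:]]*\)1/\10/' "$conf"
}
# After the edit, the daemon itself is no longer needed:
#   systemctl stop lvm2-lvmetad.service lvm2-lvmetad.socket
```

Note that this only avoids the deadlock; the actual fix (moving the lvmetad_vg_update() call after LV resume) shipped in lvm2-2.02.160-1.el7.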