Red Hat Bugzilla – Bug 1278920
deadlock when removing snapshot of root LV from lvremove failing to mlock() itself into memory
Last modified: 2016-05-10 21:18:55 EDT
A customer periodically takes a snapshot of the logical volume containing the root filesystem. Several RHEL 6.7 systems have hung and had all applications become unresponsive when using lvremove to remove the snapshot.
A vmcore was captured, and lvremove was found to be stuck waiting for a page fault. At the time lvremove triggered the page fault, it had already suspended the DM devices for the root logical volume and snapshot. lvremove was deadlocked because the page fault needed data from the root filesystem, but the root filesystem couldn't be read until lvremove finished its operations and resumed the root logical volume.
The issue appears to be a regression starting with the 2.02.118 releases of lvm2 for RHEL 6.7. lvremove did not have any of its memory mlocked into physical memory. Under a test with strace running, lvremove was found to be passing a length of 0 to all calls to mlock():
mlock(0x7fc06c125000, 0) = 0
mlock(0x7fc06c33b000, 0) = 0
mlock(0x7fc06c33c000, 0) = 0
Older versions that were tested, including lvm2-2.02.111-2.el6_6.6, did not show this behavior. With RHEL 6.6 and older versions of lvm2, mlock() was instead passed proper lengths for the regions to lock into memory:
mlock(0x400000, 1044480) = 0
mlock(0x6fe000, 49152) = 0
mlock(0x70a000, 98304) = 0
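To illustrate why a zero length matters, here is a minimal sketch (not lvm2 code) that calls mlock() on a page-aligned buffer with a caller-chosen length and then reads the process's VmLck counter. The helper names locked_kb() and lock_with_length() are made up for this example, and reading /proc/self/status assumes Linux.

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: VmLck (in kB) from /proc/self/status, or -1 if unavailable. */
long locked_kb(void)
{
	FILE *f = fopen("/proc/self/status", "r");
	char line[256];
	long kb = -1;

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f))
		if (sscanf(line, "VmLck: %ld kB", &kb) == 1)
			break;
	fclose(f);
	return kb;
}

/*
 * Hypothetical helper: mlock() one page-aligned page using the given
 * length, record VmLck right after the call, then clean up.
 * Returns mlock()'s return value, or -1 if the buffer could not be allocated.
 */
int lock_with_length(size_t len, long *kb_after)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	void *buf = NULL;
	int rc;

	if (posix_memalign(&buf, page, page))
		return -1;
	memset(buf, 0, page);      /* touch the page so it is really mapped */
	rc = mlock(buf, len);      /* len == 0 asks the kernel to lock nothing */
	*kb_after = locked_kb();
	munlock(buf, page);
	free(buf);
	return rc;
}
```

Calling lock_with_length(0, &kb) is expected to succeed, matching the "= 0" results in the strace output above, while leaving VmLck where it was. The call reports success but pins no memory.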
The bug appears to be from a change to _maps_line() in lib/mm/memlock.c related to valgrind defines, specifically the code shortly before mlock() is called:
/*
 * Valgrind is continually eating memory while executing code
 * so we need to deactivate check of locked memory size
 */
sz -= sz; /* = 0, but avoids getting warning about dead assigment */
With HAVE_VALGRIND defined, and VALGRIND_POOL now defined by an option passed to ./configure in lvm2.spec, the "sz -= sz;" line is always executed and sets the size to 0. This 0 size is then passed to mlock(), which defeats the purpose of calling mlock(). With lvremove not locked into memory, it can page fault in the middle of its critical section, deadlock itself, and hang anything else that needs the root filesystem.
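The effect of the two defines can be sketched as follows. This is a reconstruction for illustration only: the exact preprocessor guards in lib/mm/memlock.c are an assumption based on the description above, and running_under_valgrind() and length_passed_to_mlock() are hypothetical names, not lvm2 functions.

```c
#include <stddef.h>

/* Assumed build configuration, mirroring the description above. */
#define HAVE_VALGRIND 1
#define VALGRIND_POOL 1

/* Hypothetical stand-in for a runtime "are we under valgrind?" check. */
int running_under_valgrind(void)
{
	return 0; /* pretend we are NOT running under valgrind */
}

/* Returns the length that would be handed to mlock() for a region of sz bytes. */
size_t length_passed_to_mlock(size_t sz)
{
#ifdef HAVE_VALGRIND
#ifndef VALGRIND_POOL
	/* Assumed guard: only compiled in when VALGRIND_POOL is NOT defined. */
	if (running_under_valgrind())
#endif
		sz -= sz; /* unconditional once VALGRIND_POOL is defined */
#endif
	return sz;
}
```

Under these assumptions, length_passed_to_mlock(1044480) yields 0 even though the runtime check says valgrind is absent. If VALGRIND_POOL were left undefined, the same function would return the size unchanged.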
Version-Release number of selected component (if applicable):
How reproducible:
The deadlock is intermittent, since it depends on lvremove needing a page fault at a critical time.
Steps to Reproduce:
1. Create a snapshot of a logical volume
2. Run "lvremove" under strace to remove the snapshot.
3. strace data will show mlock() calls with a length parameter of 0 when the bug occurs.
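Checking the strace output for step 3 could be automated with something like the sketch below. It assumes each line has the plain form shown in this report (for example, no PID prefix as added by strace -f). The function names are made up for the example.

```c
#include <stdio.h>

/*
 * Parse one strace line of the form "mlock(0x7fc06c125000, 0) = 0".
 * Returns 1 and stores the length argument on success, or 0 if the
 * line is not an mlock() call in that format.
 */
int mlock_length_from_strace(const char *line, unsigned long long *len)
{
	unsigned long long addr;

	/* %llx accepts the optional "0x" prefix that strace prints. */
	return sscanf(line, "mlock(%llx, %llu)", &addr, len) == 2;
}

/* Returns 1 if the line is an mlock() call that locks nothing. */
int is_zero_length_mlock(const char *line)
{
	unsigned long long len;

	return mlock_length_from_strace(line, &len) && len == 0;
}
```

Feeding every line of the strace log to is_zero_length_mlock() and counting the hits gives a quick pass/fail signal: any hit means the affected build is in use.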
Actual results:
lvremove can deadlock when removing a snapshot for a logical volume containing the root filesystem.
Expected results:
lvremove should remove the snapshot without risk of a deadlock.
I'm quite confused about what this BZ is about.
Running 'lvm2' code within 'valgrind' MUST not mlock any memory.
Thus it reduces the locking size to 0 - this is 'expected' and 'wanted'.
Using 0 is not 'breaking' mlock - it disables mlock.
So passing 0 is not a problem - it's the behaviour for lvm2 binary executed from valgrind.
The spec file should not have set that option.
--enable-valgrind-pool somehow slipped to the build.
This option must not appear in the final build, as its current implementation eats memory (even in the critical section) and it's not protected by runtime detection.
To be clear, this is a straightforward rebuild with a corrected spec file. No code change. "Steps to reproduce" in the original description no longer showing zero will be sufficient to show the problem has gone away.
A temporary workaround of setting activation/use_mlockall=1 in lvm.conf has been provided. This is not ideal, as it uses more memory and can be slower, and it should be reverted once the fixed package is available.
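In lvm.conf syntax, the workaround would look roughly like the fragment below. This is a sketch that assumes the stock file layout, where use_mlockall lives in the existing activation section. Edit the setting already in the file rather than adding a second section.

```
activation {
    # Temporary workaround: lock the whole process with mlockall()
    # instead of relying on the per-region mlock() calls.
    use_mlockall = 1
}
```

Once the fixed package is installed, the setting should go back to its previous value, as noted above.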
Marking as verified.
lvremove strace output:
old version: lvm2-2.02.118-3.el6_7.3
mlock(0x7fc530929000, 0) = 0
mlock(0x7fc53092a000, 0) = 0
mlock(0x7fc530b41000, 0) = 0
new version: lvm2-2.02.118-3.el6_7.4
mlock(0x7f0a7a024000, 4096) = 0
mlock(0x7f0a7a025000, 94208) = 0
mlock(0x7f0a7a23c000, 4096) = 0
Marking as verified.
lvm2-2.02.140-3.el6 BUILT: Thu Jan 21 12:40:10 CET 2016
lvm2-libs-2.02.140-3.el6 BUILT: Thu Jan 21 12:40:10 CET 2016
lvm2-cluster-2.02.140-3.el6 BUILT: Thu Jan 21 12:40:10 CET 2016
udev-147-2.69.el6 BUILT: Thu Jan 28 15:41:45 CET 2016
device-mapper-1.02.114-3.el6 BUILT: Thu Jan 21 12:40:10 CET 2016
device-mapper-libs-1.02.114-3.el6 BUILT: Thu Jan 21 12:40:10 CET 2016
device-mapper-event-1.02.114-3.el6 BUILT: Thu Jan 21 12:40:10 CET 2016
device-mapper-event-libs-1.02.114-3.el6 BUILT: Thu Jan 21 12:40:10 CET 2016
device-mapper-persistent-data-0.6.0-2.el6 BUILT: Thu Jan 21 09:40:25 CET 2016
cmirror-2.02.140-3.el6 BUILT: Thu Jan 21 12:40:10 CET 2016
lvremove strace output:
mlock(0x7f81a38fb000, 4096) = 0
mlock(0x7f81a38fc000, 536576) = 0
mlock(0x7f81a3b7e000, 4096) = 0
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.