Bug 862253
Summary: | [lvmetad] deadlock | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Marian Csontos <mcsontos> |
Component: | lvm2 | Assignee: | Petr Rockai <prockai> |
Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 6.4 | CC: | agk, cmarthal, coughlan, dwysocha, heinzm, jbrassow, msnitzer, prajnoha, prockai, thornber, zkabelac |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | lvm2-2.02.98-1.el6 | Doc Type: | Bug Fix |
Doc Text: | Under relatively heavy load (many parallel LVM commands running), lvmetad could deadlock and cause other LVM commands to stop responding. The problem was tracked down to a race condition in lvmetad's multi-threaded code and has been fixed. |
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2013-02-21 08:14:12 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Marian Csontos
2012-10-02 13:04:50 UTC
Should be fixed upstream now.

What was wrong? How was it fixed?

commit 6e312c56adb04e709819868c6fa89f5984013a65
lvmetad: Avoid overlapping locks that could cause a deadlock (BZ 862253).

The bug was present in 6.3 as well, but apparently never tripped. The fix was relatively easy (see the patch: git log 6e312c56adb04e709819868c6fa89f5984013a65 -p -n 1).

To QA: The reproducer is the original bug description, i.e. run "pvs" in parallel while lvmetad is running. When lvmetad stops responding, you have hit the bug. (I didn't manage to reproduce the problem myself, but I could see it in the source code. Marian should be able to reproduce it. I'd say that if you stress-test lvmetad with parallel access from multiple LVM commands for a while and don't hit a deadlock, it's as verified as it gets.)

I was unable to hit a deadlock, but I was able to hit what appears to be bug 889361 fairly easily.

# run 20 pvs in parallel:
[...]
  Skipping volume group raid_sanity
  Volume group "raid_sanity" not found
  Request to list PVs in lvmetad gave response Connection reset by peer.

Jan 24 15:30:10 hayes-03 kernel: lvmetad[25889] general protection ip:3a97a76205 sp:7f1b1cdf9b90 error:0 in libc-2.12.so[3a97a00000+18a000]

[root@hayes-03 ~]# pvscan
  WARNING: Failed to connect to lvmetad: Connection refused. Falling back to internal scanning.
  PV /dev/etherd/e1.1p1    VG raid_sanity   lvm2 [908.23 GiB / 907.93 GiB free]
  PV /dev/etherd/e1.1p10   VG raid_sanity   lvm2 [908.23 GiB / 908.23 GiB free]
  PV /dev/etherd/e1.1p2    VG raid_sanity   lvm2 [908.23 GiB / 908.23 GiB free]

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with the resolution ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, please open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0501.html
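The fix commit's subject, "Avoid overlapping locks that could cause a deadlock", points at a classic lock-ordering problem. As an illustration of the general pattern only (this is not lvmetad's actual code, and all names here are hypothetical), a minimal Python sketch: a deadlock forms when two threads each hold one lock while waiting for the other's, and enforcing a single global acquisition order removes the circular wait.

```python
import threading

# Two shared locks, standing in for two internal daemon locks.
lock_a = threading.Lock()
lock_b = threading.Lock()

# Deadlock-prone (overlapping) pattern: thread 1 takes A then B while
# thread 2 takes B then A. If each grabs its first lock before the
# other releases, both block forever on their second acquire.
#
# Fixed pattern, used below: every thread acquires the locks in the
# same global order (A before B), so no cycle of waiters can form.

counter = 0

def worker(iterations):
    global counter
    for _ in range(iterations):
        with lock_a:        # always A first...
            with lock_b:    # ...then B: consistent order, no overlap cycle
                counter += 1

threads = [threading.Thread(target=worker, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000: all threads completed without deadlocking
```

The same reasoning applies regardless of language: the cure is not fewer locks but a fixed acquisition order (or never holding one lock while taking the other).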
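The QA stress test suggested above ("run pvs in parallel, watch for lvmetad to stop responding") could be scripted along the lines below. This is a sketch, not the harness actually used in this report: the command, parallelism, round count, and timeout are assumptions, and a timed-out invocation only hints at a hang rather than proving this specific deadlock.

```python
import subprocess
import concurrent.futures

def run_once(cmd, timeout):
    """Run one command; return True if it finished within the timeout."""
    try:
        subprocess.run(cmd, timeout=timeout,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return True
    except subprocess.TimeoutExpired:
        return False

def stress(cmd, parallel=20, rounds=50, timeout=30):
    """Run `cmd` `parallel` times at once, for `rounds` rounds.

    Returns the number of invocations that timed out; a timed-out
    `pvs` would suggest lvmetad has stopped responding."""
    hung = 0
    with concurrent.futures.ThreadPoolExecutor(max_workers=parallel) as pool:
        for _ in range(rounds):
            results = list(pool.map(lambda _: run_once(cmd, timeout),
                                    range(parallel)))
            hung += results.count(False)
    return hung

# Against a real system this would be roughly: stress(["pvs"])
# Demonstrated here with a harmless command instead:
print(stress(["true"], parallel=5, rounds=2, timeout=10))  # 0 hangs
```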