Bug 1006945

Summary: Udev resource limit introduced in recent versions of udev causing deadlock with lvm2+lvmetad+autoactivation
Product: Red Hat Enterprise Linux 6
Reporter: Peter Rajnoha <prajnoha>
Component: lvm2
Assignee: Peter Rajnoha <prajnoha>
Status: CLOSED ERRATA
QA Contact: Cluster QE <mspqa-list>
Severity: urgent
Priority: high
Version: 6.5
CC: agk, cmarthal, dwysocha, heinzm, jbrassow, msnitzer, nperic, prajnoha, prockai, thornber, zkabelac
Target Milestone: rc
Hardware: x86_64
OS: Linux
Fixed In Version: lvm2-2.02.100-3.el6
Doc Type: Bug Fix
Doc Text: This bug appeared during testing of an unreleased version and has been fixed. No doc text needed.
Last Closed: 2013-11-21 23:27:49 UTC
Type: Bug

Description Peter Rajnoha 2013-09-11 14:32:05 UTC
Recent versions of udev introduced a limit on the number of processes forked to handle udev events. This was done to avoid a different problem where timeouts occurred on systems with numerous devices activated/changed at once (see bug #918511 for more info).

However, with such a limit in place, we can hit a deadlock when LVM2 is used with lvmetad and LVM autoactivation takes place. The LVM autoactivation is triggered directly in udev rules as part of the pvscan --cache -aay call (/lib/udev/rules.d/69-dm-lvm-metad.rules). In addition, LVM activation is itself synchronized with udev: pvscan waits for the additional udev processing of the activated volumes and does not finish until that processing is done.

In this situation, we can easily reach a deadlock: pvscan --cache -aay autoactivates LVM volumes and then needs to wait for udev to process the just-activated volumes, but the udev process limit has already been reached. The waiting pvscan occupies a worker slot and does not release it until its autoactivation finishes; the autoactivation cannot finish until udev processes the newly activated LVM volumes; and that processing cannot start because we are at the limit -> DEADLOCK!

The limit can be reached quite easily: on a system with 1 CPU, at most 10 udev processes run in parallel.
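The resource cycle above can be sketched as a toy model (plain shell arithmetic, not real udev code; the worker counts are illustrative):

```shell
#!/bin/sh
# Toy model of the deadlock, NOT real udev code: one worker slot is
# held by the blocked pvscan while the LV change event it waits for
# needs a second, free slot.
max_childs=1   # worker limit (the QA test below configures exactly this)
busy=1         # the pvscan --cache -aay worker, holding its slot
free=$((max_childs - busy))

if [ "$free" -eq 0 ]; then
  echo "DEADLOCK: no free worker for the LV change event pvscan waits on"
else
  echo "OK: $free free worker slot(s) remain"
fi
```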

This is already patched upstream:

https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=008c33a21ba0e0fb27319b63c7bad8fb8136f804

And accompanying patches:

https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=5face2010dfb4e9fef43b867b4736a90abe51c00

https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=96880102a3b792cadbfc97e44c09c5714a96ca9e

https://git.fedorahosted.org/cgit/lvm2.git/commit/?id=10a5838a600bbadfe9e20f686f86ffcd9d0c473e

Comment 2 Peter Rajnoha 2013-10-31 13:11:04 UTC
To QA: to test this issue, the following steps should prove the fix:

-> change /lib/udev/rules.d/69-dm-lvm-metad.rules and replace:
RUN+="$env{LVM_SBIN_PATH}/lvm pvscan --cache --background --activate ay --major $major --minor $minor", ENV{LVM_SCANNED}="1"
with
RUN+="$env{LVM_SBIN_PATH}/lvm pvscan --cache --activate ay --major $major --minor $minor", ENV{LVM_SCANNED}="1"
(in other words, remove the --background option)
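The manual edit above can be sketched with sed (shown here on a copy of the rule string; on a real system you would edit /lib/udev/rules.d/69-dm-lvm-metad.rules itself, ideally with a backup first):

```shell
#!/bin/sh
# Demonstrates the substitution: drop only the --background option,
# leaving the rest of the RUN+= rule intact.
rule='RUN+="$env{LVM_SBIN_PATH}/lvm pvscan --cache --background --activate ay --major $major --minor $minor", ENV{LVM_SCANNED}="1"'
printf '%s\n' "$rule" | sed 's/ --background//'
```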


-> kill existing udevd and run a new one with debug log
[root@rhel6-a ~]# killall udevd
[root@rhel6-a ~]# udevd --debug


-> change max children to 1 (to create the tightest udev processing bottleneck)
[root@rhel6-a ~]# udevadm control --max-childs=1


-> check in the udev log that max children is really 1
1383224102.414987 [2088] handle_ctrl_msg: udevd message (SET_MAX_CHILDS) received, max_childs=1


-> create VG/LV
[root@rhel6-a ~]# vgcreate vg /dev/sda
  Volume group "vg" successfully created
[root@rhel6-a ~]# lvcreate -l1 vg
  Logical volume "lvol0" created
[root@rhel6-a ~]# lvchange -an vg


-> check there are no dm udev_sync cookies in the system
[root@rhel6-a ~]# dmsetup udevcookies
Cookie       Semid      Value      Last semop time           Last change time


-> trigger the add event (like it happens on device hotplug or coldplug)
[root@rhel6-a ~]# echo add > /sys/block/sda/uevent 


-> now, the LV gets activated via the pvscan --cache called in udev rules, BUT
there's a dangling udev_sync cookie...


[root@rhel6-a ~]# dmsetup udevcookies
Cookie       Semid      Value      Last semop time           Last change time
0xd4d08b2    589831     1          Thu Oct 31 13:50:51 2013  Thu Oct 31 13:50:51 2013

-> also, there's a blocked pvscan --cache call (from the udev rule) because udev can't run more than one worker at a time
   (the pvscan --cache waits for udev to notify it that the LV activation has completed, but the LV activation is *another* event
    and that event can't be handled because of the udev limit)


[root@rhel6-a ~]# ps aux | grep lvm
root      1028  0.0  1.5 184144 16296 ?        Ssl  13:41   0:00 lvmetad
root      2036  0.0  1.7  36148 17396 pts/0    S<+  13:50   0:00 /sbin/lvm pvscan --cache --activate ay --major 8 --minor 0
root      2062  0.0  0.0   6392   664 pts/1    S+   13:52   0:00 grep lvm

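The two symptoms above (a dangling cookie and a blocked pvscan) can be spotted together with a small helper; this is a sketch assuming dmsetup and ps are available, and the grep pattern is only illustrative:

```shell
#!/bin/sh
# Sketch: report the stuck state described above. The [p]vscan pattern
# keeps grep from matching its own command line.
check_stuck() {
  if dmsetup udevcookies 2>/dev/null | tail -n +2 | grep -q .; then
    echo "dangling udev_sync cookie(s) present"
  fi
  if ps -eo args 2>/dev/null | grep -q '[p]vscan --cache'; then
    echo "pvscan --cache still blocked in udev rule"
  fi
}
check_stuck
```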

-> after ~3 minutes, the timeout happens (the worker returns status 0x0100)
1383224031.389869 [1921] handle_signal: worker [1927] exit
1383224031.389889 [1921] handle_signal: worker [1927] unexpectedly returned with status 0x0100
1383224031.389895 [1921] handle_signal: worker [1927] failed while handling '/devices/pci0000:00/0000:00:07.0/host2/target2:0:0/2:0:0:0/block/sda'
1383224031.390329 [1921] event_queue_delete: seq 1516 done with -32
1383224031.390339 [1921] worker_unref: worker [1927] cleaned up


-> leaving dangling cookies in the system, as a consequence and as evidence:
Cookie       Semid      Value      Last semop time           Last change time
0xd4d08b2    589831     1          Thu Oct 31 13:50:51 2013  Thu Oct 31 13:50:51 2013


[root@rhel6-a ~]# rpm -q udev
udev-147-2.51.el6.x86_64


Now, to test the fix, do the same with the new --background option back in the
udev rule mentioned at the beginning. This time, there should be no dangling
dm udev_sync cookie in the dmsetup udevcookies output (not counting any old
ones left over from the previous test) and no timeout in udev...

Comment 3 Peter Rajnoha 2013-10-31 13:27:37 UTC
(Or you can do the same with echo 1 > /sys/block/sda/device/delete and then scsi-rescan --forcescan, so the device is re-plugged "more officially" than by issuing the "echo add" artificially.)

Comment 4 Nenad Peric 2013-10-31 14:39:29 UTC
Testing with and without the fix produced the expected results.
Without the fix, a background process is stuck and a semaphore/cookie is still present and waiting:
[root@virt-008 ~]# dmsetup udevcookies
Cookie       Semid      Value      Last semop time           Last change time
0xd4d1bdb    294916     4          Thu Oct 31 15:12:16 2013  Thu Oct 31 15:12:16 2013

[root@virt-008 ~]# ps -ef | grep lvm
root       743     1  0 14:56 ?        00:00:00 lvmetad
root      2371     1  0 15:12 ttyS0    00:00:00 /sbin/lvm pvscan --cache --activate ay --major 8 --minor 209

After a few mins (udev debug log):

1383228916.230809 [2310] handle_signal: worker [2329] unexpectedly returned with status 0x0100


With the fix applied, it works differently:

[root@virt-008 ~]# dmsetup udevcookies
Cookie       Semid      Value      Last semop time           Last change time
[root@virt-008 ~]# ps -ef | grep lvm
root       743     1  0 14:56 ?        00:00:00 lvmetad



Marking VERIFIED with lvm2-2.02.100-8.el6.x86_64

Comment 5 errata-xmlrpc 2013-11-21 23:27:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1704.html