Bug 696651 - libvirtd segfaults on startup inside udev_enumerate_get_list_entry()
Summary: libvirtd segfaults on startup inside udev_enumerate_get_list_entry()
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: udev
Version: 6.0
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: high
Target Milestone: rc
Assignee: Harald Hoyer
QA Contact: qe-baseos-daemons
Duplicates: 664962
Depends On:
Blocks: 737469 743047
Reported: 2011-04-14 14:26 UTC by Marco d'Itri
Modified: 2018-11-26 19:11 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-12-06 16:26:56 UTC
Links
System: Red Hat Product Errata
ID: RHBA-2011:1649
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: udev bug fix and enhancement update
Last Updated: 2011-12-06 00:50:29 UTC

Description Marco d'Itri 2011-04-14 14:26:50 UTC
Description of problem:

libvirtd segfaults immediately.
This has been happening since the host was rebooted; it used to work.
The bug is a showstopper for us, since no guests can be started on the server.

Version-Release number of selected component (if applicable):

libvirt-0.8.1-27.el6_0.5.x86_64
libudev-147-2.29.el6.x86_64


How reproducible:

100%

Steps to Reproduce:

root@cs10-a:/var/log# gdb /usr/sbin/libvirtd
GNU gdb (GDB) Red Hat Enterprise Linux (7.1-29.el6_0.1)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/libvirtd...Reading symbols from /usr/lib/debug/usr/sbin/libvirtd.debug...done.
done.
(gdb) run --listen
Starting program: /usr/sbin/libvirtd --listen
warning: "/usr/lib/debug/usr/lib64/libavahi-common.so.3.5.1.debug": separate debug info file has no debug info
warning: "/usr/lib/debug/usr/lib64/libavahi-client.so.3.2.5.debug": separate debug info file has no debug info
[Thread debugging using libthread_db enabled]
[New Thread 0x7ffff0289700 (LWP 47778)]
[New Thread 0x7fffef888700 (LWP 47779)]
[New Thread 0x7fffeee87700 (LWP 47780)]
[New Thread 0x7fffeda85700 (LWP 47782)]
[New Thread 0x7fffed084700 (LWP 47783)]

Program received signal SIGSEGV, Segmentation fault.
udev_enumerate_get_list_entry (udev_enumerate=0x70ae70)
    at libudev/libudev-enumerate.c:264
264                             if (prev != NULL &&
Missing separate debuginfos, use: debuginfo-install dbus-libs-1.2.24-4.el6_0.x86_64 keyutils-libs-1.4-1.el6.x86_64 krb5-libs-1.8.2-3.el6_0.6.x86_64 libcom_err-1.41.12-3.el6.x86_64 libgpg-error-1.7-3.el6.x86_64 libidn-1.18-2.el6.x86_64 libsepol-2.0.41-3.el6.x86_64 libssh2-1.2.2-7.el6.x86_64 libtasn1-2.3-3.el6.x86_64 nspr-4.8.6-1.el6.x86_64 nss-3.12.8-1.el6_0.x86_64 nss-softokn-freebl-3.12.8-1.el6_0.x86_64 nss-util-3.12.8-1.el6_0.x86_64 openldap-2.4.19-15.el6_0.2.x86_64 openssl-1.0.0-4.el6_0.2.x86_64
(gdb) where
#0  udev_enumerate_get_list_entry (udev_enumerate=0x70ae70)
    at libudev/libudev-enumerate.c:264
#1  0x000000000049cf8b in udevEnumerateDevices (
    privileged=<value optimized out>) at node_device/node_device_udev.c:1375
#2  udevDeviceMonitorStartup (privileged=<value optimized out>)
    at node_device/node_device_udev.c:1666
#3  0x00007ffff74e3180 in virStateInitialize (privileged=1) at libvirt.c:980
#4  0x000000000041d141 in main (argc=<value optimized out>,
    argv=<value optimized out>) at libvirtd.c:3250
(gdb) print *udev_enumerate
$2 = {udev = 0x7088c0, refcount = 1, sysattr_match_list = {next = 0x70ae80, 
    prev = 0x70ae80}, sysattr_nomatch_list = {next = 0x70ae90, 
    prev = 0x70ae90}, subsystem_match_list = {next = 0x70aea0, 
    prev = 0x70aea0}, subsystem_nomatch_list = {next = 0x70aeb0, 
    prev = 0x70aeb0}, sysname_match_list = {next = 0x70aec0, prev = 0x70aec0}, 
  properties_match_list = {next = 0x70aed0, prev = 0x70aed0}, devices_list = {
    next = 0x70af10, prev = 0x904660}, devices = 0x7ffff7f78010, 
  devices_cur = 8193, devices_max = 16384, devices_uptodate = false}
(gdb) print *entry
$4 = {syspath = 0x7bbaf0 "/sys/devices/virtual/block/dm-423", len = 33}
(gdb) print prev
$5 = <value optimized out>

Comment 2 Dave Allan 2011-04-14 17:51:48 UTC
Please do not reboot the machine until you've tried the following, as we've seen a similar crash that could not be reproduced after the host was rebooted.  (BZ 664962)

Can you enable debugging of node_device_udev, reproduce the crash, and upload
the libvirtd.log?

The commands to enable udev debugging are:

service libvirtd stop
export LIBVIRT_DEBUG=3
export LIBVIRT_LOG_OUTPUTS="1:file:/var/log/libvirt_debug.log"
export LIBVIRT_LOG_FILTERS="1:udev"
/sbin/libvirtd

See also:

http://libvirt.org/logging.html
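
Editorial note: the three variables above interact, which may not be obvious at first glance. As documented on the libvirt logging page, levels run from 1 (debug) to 4 (error); LIBVIRT_DEBUG sets the global minimum level, a filter such as "1:udev" overrides that level for matching sources, and an output such as "1:file:..." has its own minimum level. The following is a minimal sketch of that decision in C, not libvirt code. The names `log_filter`, `effective_level` and `would_log` are hypothetical, and substring matching against the source name is an assumption about how the filter is applied.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One "level:match" filter, e.g. "1:udev" -> { 1, "udev" }. */
struct log_filter {
    int level;          /* 1 debug, 2 info, 3 warning, 4 error */
    const char *match;  /* text looked for in the message source */
};

/* Minimum level that applies to a source: the first matching filter
 * wins, otherwise the global level (LIBVIRT_DEBUG) is used.
 * Substring matching is an assumption made for this sketch. */
static int effective_level(const char *source, int global_level,
                           const struct log_filter *filters, size_t n)
{
    size_t i;

    assert(source != NULL);
    for (i = 0; i < n; i++)
        if (strstr(source, filters[i].match) != NULL)
            return filters[i].level;
    return global_level;
}

/* A message reaches an output only if it passes both the effective
 * level for its source and the output's own minimum level. */
static int would_log(const char *source, int msg_level, int global_level,
                     const struct log_filter *filters, size_t n,
                     int output_level)
{
    return msg_level >= effective_level(source, global_level, filters, n) &&
           msg_level >= output_level;
}
```

Under these assumptions, with a global level of 3, the filter "1:udev" and an output level of 1, debug messages from node_device/node_device_udev.c are written to the file while debug messages from other sources are dropped. That is consistent with the log in Comment 3 containing only udev lines.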

Comment 3 Marco d'Itri 2011-04-15 00:07:22 UTC
This is the whole content of the log file (I redacted the serial numbers):

02:02:39.196: debug : udevNodeRegister:1727 : Registering udev node device backend
02:02:39.202: debug : udevGetDeviceSysfsAttr:225 : Found sysfs attribute 'product_name' value 'ProLiant BL685c G7' for device with sysname 'id'
02:02:39.202: debug : udevGetDeviceSysfsAttr:225 : Found sysfs attribute 'sys_vendor' value 'HP' for device with sysname 'id'
02:02:39.202: debug : udevGetDeviceSysfsAttr:225 : Found sysfs attribute 'product_version' value '' for device with sysname 'id'
02:02:39.202: debug : udevGetDeviceSysfsAttr:225 : Found sysfs attribute 'product_serial' value '(elided)      ' for device with sysname 'id'
02:02:39.202: debug : udevGetDeviceSysfsAttr:225 : Found sysfs attribute 'bios_vendor' value 'HP' for device with sysname 'id'
02:02:39.202: debug : udevGetDeviceSysfsAttr:225 : Found sysfs attribute 'bios_version' value 'A20' for device with sysname 'id'
02:02:39.202: debug : udevGetDeviceSysfsAttr:225 : Found sysfs attribute 'bios_date' value '09/30/2010' for device with sysname 'id'


And these are the attributes involved:

root@cs10-a:/sys/devices/virtual/dmi/id#  head *
==> bios_date <==
09/30/2010

==> bios_vendor <==
HP

==> bios_version <==
A20

==> chassis_asset_tag <==
        

==> chassis_serial <==
(elided)   

==> chassis_type <==
23

==> chassis_vendor <==
HP

==> chassis_version <==


==> modalias <==
dmi:bvnHP:bvrA20:bd09/30/2010:svnHP:pnProLiantBL685cG7:pvr:cvnHP:ct23:cvr:

==> power <==
head: error reading `power': Is a directory

==> product_name <==
ProLiant BL685c G7

==> product_serial <==
(elided)      

==> product_uuid <==
(elided)

==> product_version <==


==> subsystem <==
head: error reading `subsystem': Is a directory

==> sys_vendor <==
HP

==> uevent <==
MODALIAS=dmi:bvnHP:bvrA20:bd09/30/2010:svnHP:pnProLiantBL685cG7:pvr:cvnHP:ct23:cvr:

Comment 4 Dave Allan 2011-04-15 21:11:31 UTC
The segfault is happening while traversing udev's internal list of devices, and the debug log doesn't indicate any failures in parsing your devices or anything unusual in your configuration.  Would you be willing to provide a core dump?  Sending it non-publicly would be fine.
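
Editorial note: the gdb output in the description shows an array of entries (each with `syspath` and `len`) plus the counters `devices_cur = 8193` and `devices_max = 16384`, and the crash is on a comparison against `prev`. This report does not state the root cause, so the following is only a simplified model of a sort-and-deduplicate walk over a growable array. It is not the libudev source and not the content of the patch. The type `dev_array` and the functions `dev_array_add`, `dev_array_walk`, `delay_dm` and `dev_array_free` are hypothetical. The sketch illustrates one general hazard for such loops: if the array can grow during the walk, a saved pointer into it can go stale, whereas a saved index cannot.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Entry type as printed by gdb in the description: a path and its length. */
struct syspath {
    char *syspath;
    size_t len;
};

/* Hypothetical model of the growable array; field names follow the gdb dump. */
struct dev_array {
    struct syspath *devices;
    unsigned int devices_cur;   /* entries in use */
    unsigned int devices_max;   /* entries allocated */
};

/* Append a copy of path, doubling the array when it is full.
 * realloc() may move the array, so any struct syspath * obtained
 * before this call must be treated as invalid afterwards. */
static int dev_array_add(struct dev_array *a, const char *path)
{
    size_t len = strlen(path);
    char *copy;

    assert(a != NULL);
    if (a->devices_cur >= a->devices_max) {
        unsigned int new_max = a->devices_max ? a->devices_max * 2 : 4;
        struct syspath *buf = realloc(a->devices, new_max * sizeof(*buf));
        if (buf == NULL)
            return -1;
        a->devices = buf;
        a->devices_max = new_max;
    }
    copy = malloc(len + 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, path, len + 1);
    a->devices[a->devices_cur].syspath = copy;
    a->devices[a->devices_cur].len = len;
    a->devices_cur++;
    return 0;
}

static int syspath_cmp(const void *p1, const void *p2)
{
    const struct syspath *x = p1, *y = p2;
    return strcmp(x->syspath, y->syspath);
}

/* Hypothetical "delay" rule for the example: device-mapper nodes go last. */
static int delay_dm(const char *path)
{
    return strstr(path, "/block/dm-") != NULL;
}

/* Sort, skip duplicates, and re-append delayed entries at the end.
 * The previous entry is remembered by index, which stays valid even
 * if dev_array_add() reallocates the array in the middle of the loop.
 * Returns the number of entries that would be reported. */
static unsigned int dev_array_walk(struct dev_array *a,
                                   int (*delay)(const char *))
{
    unsigned int i, max = a->devices_cur, reported = 0;
    unsigned int prev = 0;
    int have_prev = 0;

    if (max > 1)
        qsort(a->devices, max, sizeof(struct syspath), syspath_cmp);
    for (i = 0; i < max; i++) {
        if (have_prev &&
            a->devices[i].len == a->devices[prev].len &&
            memcmp(a->devices[i].syspath, a->devices[prev].syspath,
                   a->devices[i].len) == 0)
            continue;
        prev = i;
        have_prev = 1;
        if (delay != NULL && delay(a->devices[i].syspath)) {
            if (dev_array_add(a, a->devices[i].syspath) != 0)
                break;
            continue;
        }
        reported++;
    }
    /* Entries appended during the loop sit after the original ones. */
    reported += a->devices_cur - max;
    return reported;
}

static void dev_array_free(struct dev_array *a)
{
    unsigned int i;

    for (i = 0; i < a->devices_cur; i++)
        free(a->devices[i].syspath);
    free(a->devices);
    a->devices = NULL;
    a->devices_cur = a->devices_max = 0;
}
```

In this model, filling the array to capacity and then delaying one entry forces a reallocation inside the loop. A variant that kept `struct syspath *prev` instead of an index would read through a pointer that `realloc()` may have invalidated.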

Comment 5 Marco d'Itri 2011-04-15 22:23:33 UTC
I mailed it to you.

Comment 6 Dave Allan 2011-04-18 20:48:06 UTC
Thanks.  As I mentioned earlier, I think rebooting the machine will fix the problem, so if you need to, please do and let us know the result.

Comment 8 Marco d'Itri 2011-04-18 21:19:54 UTC
Confirmed: libvirtd works again after a reboot.

Comment 12 Harald Hoyer 2011-04-20 16:28:40 UTC
Can you reproduce this? And could you try this patch here?

http://git.kernel.org/?p=linux/hotplug/udev.git;a=commitdiff;h=c54b43e2c233e724f840c4f6a0a81bdd549e40bb

Comment 13 Marco d'Itri 2011-04-20 16:34:03 UTC
I could reproduce the bug at will before rebooting, but not anymore after rebooting the system.
It could be the same issue since these systems have over 2500 devices+partitions.
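
Editorial note: for anyone wanting to check how many block devices and partitions a host exposes, a minimal sketch in C follows. It assumes that /sys/class/block contains one entry per block device and partition, which holds on kernels of this era but is not stated in this report. The function name `count_dir_entries` is hypothetical.

```c
#include <assert.h>
#include <dirent.h>
#include <string.h>

/* Count the entries in a directory, skipping "." and "..".
 * Returns -1 if the directory cannot be opened. */
static long count_dir_entries(const char *dir)
{
    DIR *d;
    struct dirent *e;
    long n = 0;

    assert(dir != NULL);
    d = opendir(dir);
    if (d == NULL)
        return -1;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        n++;
    }
    closedir(d);
    return n;
}
```

Calling `count_dir_entries("/sys/class/block")` from a small `main()` and printing the result gives a rough device and partition count to compare against the figure quoted above.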

Comment 15 Marco d'Itri 2011-04-21 20:02:30 UTC
Confirmed, Harald's patch fixes the bug.

Comment 16 Marco d'Itri 2011-04-25 23:35:12 UTC
libvirtd has now segfaulted on me twice at boot time, but I have always been able to restart it once the system has booted.

Comment 17 Dave Allan 2011-04-28 02:19:04 UTC
Marco, do you mean that even with Harald's patch, you're seeing a segfault?

Comment 18 Marco d'Itri 2011-05-02 01:35:31 UTC
Yes, but now it has happened only at boot time, and again not every time. I have limited data points, though, since I cannot reboot these servers at will.
At least with Harald's patch I can restart libvirtd after the boot.

Comment 19 Marco d'Itri 2011-05-02 17:25:41 UTC
Actually, I am not so sure right now that the problem is still present since I noticed that I did not install the new libudev package on some servers. Let's wait a few days and sorry for the noise.

Comment 20 Marco d'Itri 2011-05-07 22:51:43 UTC
False alarm, I can confirm that the patch fully fixes the bug.

Comment 21 Phil Knirsch 2011-05-09 08:39:25 UTC
Thanks for the update, Marco!

Regards, Phil

Comment 22 Dave Allan 2011-05-10 19:09:38 UTC
*** Bug 664962 has been marked as a duplicate of this bug. ***

Comment 31 errata-xmlrpc 2011-12-06 16:26:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1649.html

