Description of problem:
libvirtd segfaults immediately. This has been happening since a reboot of the host; it used to work. The bug is a showstopper for us, since no guests can be started on the server.

Version-Release number of selected component (if applicable):
libvirt-0.8.1-27.el6_0.5.x86_64
libudev-147-2.29.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
root@cs10-a:/var/log# gdb /usr/sbin/libvirtd
GNU gdb (GDB) Red Hat Enterprise Linux (7.1-29.el6_0.1)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/libvirtd...Reading symbols from /usr/lib/debug/usr/sbin/libvirtd.debug...done.
done.
(gdb) run --listen
Starting program: /usr/sbin/libvirtd --listen
warning: "/usr/lib/debug/usr/lib64/libavahi-common.so.3.5.1.debug": separate debug info file has no debug info
warning: "/usr/lib/debug/usr/lib64/libavahi-client.so.3.2.5.debug": separate debug info file has no debug info
[Thread debugging using libthread_db enabled]
[New Thread 0x7ffff0289700 (LWP 47778)]
[New Thread 0x7fffef888700 (LWP 47779)]
[New Thread 0x7fffeee87700 (LWP 47780)]
[New Thread 0x7fffeda85700 (LWP 47782)]
[New Thread 0x7fffed084700 (LWP 47783)]

Program received signal SIGSEGV, Segmentation fault.
udev_enumerate_get_list_entry (udev_enumerate=0x70ae70) at libudev/libudev-enumerate.c:264
264 if (prev != NULL &&
Missing separate debuginfos, use: debuginfo-install dbus-libs-1.2.24-4.el6_0.x86_64 keyutils-libs-1.4-1.el6.x86_64 krb5-libs-1.8.2-3.el6_0.6.x86_64 libcom_err-1.41.12-3.el6.x86_64 libgpg-error-1.7-3.el6.x86_64 libidn-1.18-2.el6.x86_64 libsepol-2.0.41-3.el6.x86_64 libssh2-1.2.2-7.el6.x86_64 libtasn1-2.3-3.el6.x86_64 nspr-4.8.6-1.el6.x86_64 nss-3.12.8-1.el6_0.x86_64 nss-softokn-freebl-3.12.8-1.el6_0.x86_64 nss-util-3.12.8-1.el6_0.x86_64 openldap-2.4.19-15.el6_0.2.x86_64 openssl-1.0.0-4.el6_0.2.x86_64
(gdb) where
#0 udev_enumerate_get_list_entry (udev_enumerate=0x70ae70) at libudev/libudev-enumerate.c:264
#1 0x000000000049cf8b in udevEnumerateDevices (privileged=<value optimized out>) at node_device/node_device_udev.c:1375
#2 udevDeviceMonitorStartup (privileged=<value optimized out>) at node_device/node_device_udev.c:1666
#3 0x00007ffff74e3180 in virStateInitialize (privileged=1) at libvirt.c:980
#4 0x000000000041d141 in main (argc=<value optimized out>, argv=<value optimized out>) at libvirtd.c:3250
(gdb) print *udev_enumerate
$2 = {udev = 0x7088c0, refcount = 1,
  sysattr_match_list = {next = 0x70ae80, prev = 0x70ae80},
  sysattr_nomatch_list = {next = 0x70ae90, prev = 0x70ae90},
  subsystem_match_list = {next = 0x70aea0, prev = 0x70aea0},
  subsystem_nomatch_list = {next = 0x70aeb0, prev = 0x70aeb0},
  sysname_match_list = {next = 0x70aec0, prev = 0x70aec0},
  properties_match_list = {next = 0x70aed0, prev = 0x70aed0},
  devices_list = {next = 0x70af10, prev = 0x904660},
  devices = 0x7ffff7f78010, devices_cur = 8193, devices_max = 16384,
  devices_uptodate = false}
(gdb) print *entry
$4 = {syspath = 0x7bbaf0 "/sys/devices/virtual/block/dm-423", len = 33}
(gdb) print prev
$5 = <value optimized out>
Please do not reboot the machine until you've tried the following, as we've seen a similar crash that could not be reproduced after the host was rebooted. (BZ 664962)

Can you enable debugging of node_device_udev and reproduce and upload the libvirtd.log? The commands to enable udev debugging are:

service libvirtd stop
export LIBVIRT_DEBUG=3
export LIBVIRT_LOG_OUTPUTS="1:file:/var/log/libvirt_debug.log"
export LIBVIRT_LOG_FILTERS="1:udev"
/sbin/libvirtd

See also: http://libvirt.org/logging.html
This is the whole content of the log file (I redacted the serial numbers):

02:02:39.196: debug : udevNodeRegister:1727 : Registering udev node device backend
02:02:39.202: debug : udevGetDeviceSysfsAttr:225 : Found sysfs attribute 'product_name' value 'ProLiant BL685c G7' for device with sysname 'id'
02:02:39.202: debug : udevGetDeviceSysfsAttr:225 : Found sysfs attribute 'sys_vendor' value 'HP' for device with sysname 'id'
02:02:39.202: debug : udevGetDeviceSysfsAttr:225 : Found sysfs attribute 'product_version' value '' for device with sysname 'id'
02:02:39.202: debug : udevGetDeviceSysfsAttr:225 : Found sysfs attribute 'product_serial' value '(elided) ' for device with sysname 'id'
02:02:39.202: debug : udevGetDeviceSysfsAttr:225 : Found sysfs attribute 'bios_vendor' value 'HP' for device with sysname 'id'
02:02:39.202: debug : udevGetDeviceSysfsAttr:225 : Found sysfs attribute 'bios_version' value 'A20' for device with sysname 'id'
02:02:39.202: debug : udevGetDeviceSysfsAttr:225 : Found sysfs attribute 'bios_date' value '09/30/2010' for device with sysname 'id'

And these are the attributes involved:

root@cs10-a:/sys/devices/virtual/dmi/id# head *
==> bios_date <==
09/30/2010

==> bios_vendor <==
HP

==> bios_version <==
A20

==> chassis_asset_tag <==

==> chassis_serial <==
(elided)

==> chassis_type <==
23

==> chassis_vendor <==
HP

==> chassis_version <==

==> modalias <==
dmi:bvnHP:bvrA20:bd09/30/2010:svnHP:pnProLiantBL685cG7:pvr:cvnHP:ct23:cvr:

==> power <==
head: error reading `power': Is a directory

==> product_name <==
ProLiant BL685c G7

==> product_serial <==
(elided)

==> product_uuid <==
(elided)

==> product_version <==

==> subsystem <==
head: error reading `subsystem': Is a directory

==> sys_vendor <==
HP

==> uevent <==
MODALIAS=dmi:bvnHP:bvrA20:bd09/30/2010:svnHP:pnProLiantBL685cG7:pvr:cvnHP:ct23:cvr:
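For anyone comparing such a debug log against the sysfs values by hand, here is a minimal Python sketch that pulls the attribute/value pairs out of the "Found sysfs attribute" lines. The regular expression is inferred only from the lines quoted above and is an assumption about the format; the function name parse_sysfs_attrs is hypothetical and not part of libvirt.

```python
import re

# Assumed line layout, inferred from the quoted log only:
# HH:MM:SS.mmm: debug : udevGetDeviceSysfsAttr:<line> : Found sysfs attribute ...
ATTR_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}): debug : "
    r"udevGetDeviceSysfsAttr:\d+ : "
    r"Found sysfs attribute '(?P<name>[^']*)' value '(?P<value>[^']*)' "
    r"for device with sysname '(?P<sysname>[^']*)'$"
)


def parse_sysfs_attrs(lines):
    """Return {sysname: {attribute: value}} for the lines that match.

    Lines in any other format (for example the udevNodeRegister line)
    are skipped silently.
    """
    result = {}
    for line in lines:
        match = ATTR_RE.match(line.strip())
        if match:
            attrs = result.setdefault(match.group("sysname"), {})
            attrs[match.group("name")] = match.group("value")
    return result
```

Called on the lines of /var/log/libvirt_debug.log, this returns one dictionary per sysname, which can then be compared against the files under /sys/devices/virtual/dmi/id.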
The segfault is happening when traversing the list of devices internal to udev, and the debug log doesn't indicate any failures of parsing your devices or anything unusual in your configuration. Would you be willing to provide a core dump? Non-publicly would be fine.
I mailed it to you.
Thanks. As I mentioned earlier, I think rebooting the machine will fix the problem, so if you need to, please do and let us know the result.
Confirmed: libvirtd works again after a reboot.
Can you reproduce this? And could you try this patch here? http://git.kernel.org/?p=linux/hotplug/udev.git;a=commitdiff;h=c54b43e2c233e724f840c4f6a0a81bdd549e40bb
I could reproduce the bug at will before rebooting, but not anymore after rebooting the system. It could be the same issue since these systems have over 2500 devices+partitions.
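To check whether another host is in the same range, a rough count can be taken from sysfs. The Python sketch below assumes the figure refers mainly to block devices (the crashing entry in the backtrace was /sys/devices/virtual/block/dm-423) and that /sys/class/block lists whole disks, partitions and dm-* devices together; the function name count_block_devices is hypothetical.

```python
import os


def count_block_devices(sys_class_block="/sys/class/block"):
    """Count the entries under a sysfs block class directory.

    Returns (total, dm_count), where dm_count is the number of
    device-mapper entries (names starting with "dm-"). A missing
    directory is reported as (0, 0).
    """
    try:
        names = os.listdir(sys_class_block)
    except FileNotFoundError:
        return 0, 0
    dm_count = sum(1 for name in names if name.startswith("dm-"))
    return len(names), dm_count
```

Running count_block_devices() on an affected and an unaffected host gives a quick comparison of how many entries udev has to enumerate on each.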
Confirmed, Harald's patch fixes the bug.
Now libvirtd segfaulted on me two times at boot time, but I have always been able to restart it after the system boot.
Marco, do you mean that even with Harald's patch, you're seeing a segfault?
Yes, but now it happened only at boot time and again not every time. But I have limited data points since I cannot reboot these servers at will. At least with Harald's patch I can restart libvirtd after the boot.
Actually, I am not so sure right now that the problem is still present since I noticed that I did not install the new libudev package on some servers. Let's wait a few days and sorry for the noise.
False alarm, I can confirm that the patch fully fixes the bug.
Thanks for the update, Marco! Regards, Phil
*** Bug 664962 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2011-1649.html