Created attachment 804059 [details] Full logs from /var/log/messages Description of problem: When starting a Xen domain with libvirt + libxl, it crashes after creating the domain. The domain is left in a paused state, and works fine if manually unpaused with xl unpause. virt-manager never shows the domain as running. Version-Release number of selected component (if applicable): $ rpm -qa libvirt* libvirt-sandbox-0.2.0-1.fc19.x86_64 libvirt-daemon-1.1.2-4.fc19.x86_64 libvirt-python-1.1.2-4.fc19.x86_64 libvirt-daemon-driver-qemu-1.1.2-4.fc19.x86_64 libvirt-daemon-driver-uml-1.1.2-4.fc19.x86_64 libvirt-daemon-driver-libxl-1.1.2-4.fc19.x86_64 libvirt-daemon-lxc-1.1.2-4.fc19.x86_64 libvirt-sandbox-libs-0.2.0-1.fc19.x86_64 libvirt-daemon-driver-nwfilter-1.1.2-4.fc19.x86_64 libvirt-daemon-qemu-1.1.2-4.fc19.x86_64 libvirt-glib-0.1.6-1.fc19.x86_64 libvirt-daemon-config-network-1.1.2-4.fc19.x86_64 libvirt-daemon-driver-storage-1.1.2-4.fc19.x86_64 libvirt-daemon-config-nwfilter-1.1.2-4.fc19.x86_64 libvirt-daemon-driver-nodedev-1.1.2-4.fc19.x86_64 libvirt-daemon-driver-xen-1.1.2-4.fc19.x86_64 libvirt-daemon-driver-interface-1.1.2-4.fc19.x86_64 libvirt-daemon-kvm-1.1.2-4.fc19.x86_64 libvirt-gobject-0.1.6-1.fc19.x86_64 libvirt-gconfig-0.1.6-1.fc19.x86_64 libvirt-daemon-driver-secret-1.1.2-4.fc19.x86_64 libvirt-1.1.2-4.fc19.x86_64 libvirt-daemon-driver-network-1.1.2-4.fc19.x86_64 libvirt-daemon-driver-lxc-1.1.2-4.fc19.x86_64 libvirt-daemon-xen-1.1.2-4.fc19.x86_64 libvirt-client-1.1.2-4.fc19.x86_64 libvirt-daemon-driver-vbox-1.1.2-4.fc19.x86_64 How reproducible: Always Steps to Reproduce: 1. Open virt-manager 2. Connect to localhost (xen) 3. Start a domain Actual results: Domain is created in a paused state, virt-manager shows errors about losing connection to the daemon. Logs show libvirtd crashed. Expected results: Domain creation. Additional info: Sep 27 09:08:30 saboo libvirtd[24880]: *** Error in `/usr/sbin/libvirtd': free(): invalid next size (fast): 0x00007f82c8003210 *** Sep 27 09:08:30 saboo libvirtd[24880]: ======= Backtrace: ========= Sep 27 09:08:30 saboo libvirtd[24880]: /lib64/libc.so.6(+0x365b27d0e8)[0x7f82f5a7a0e8] Sep 27 09:08:30 saboo libvirtd[24880]: /lib64/libvirt.so.0(virFree+0x1a)[0x7f82f8f07d5a] Sep 27 09:08:30 saboo libvirtd[24880]: /usr/lib64/libvirt/connection-driver/libvirt_driver_libxl.so(+0x14b6c)[0x7f82e032bb6c] Sep 27 09:08:30 saboo libvirtd[24880]: /usr/lib64/libvirt/connection-driver/libvirt_driver_libxl.so(+0x154d4)[0x7f82e032c4d4] Sep 27 09:08:30 saboo libvirtd[24880]: /lib64/libvirt.so.0(virDomainCreate+0xf7)[0x7f82f8fdb6b7] Sep 27 09:08:30 saboo libvirtd[24880]: /usr/sbin/libvirtd(+0x350c7)[0x7f82f9a1a0c7] Sep 27 09:08:30 saboo libvirtd[24880]: /lib64/libvirt.so.0(virNetServerProgramDispatch+0x3ba)[0x7f82f90314aa] Sep 27 09:08:30 saboo libvirtd[24880]: /lib64/libvirt.so.0(+0x3a33f822d8)[0x7f82f902c2d8] Sep 27 09:08:30 saboo libvirtd[24880]: /lib64/libvirt.so.0(+0x3a33ea0c15)[0x7f82f8f4ac15] Sep 27 09:08:30 saboo libvirtd[24880]: /lib64/libvirt.so.0(+0x3a33ea0691)[0x7f82f8f4a691] Sep 27 09:08:30 saboo libvirtd[24880]: /lib64/libpthread.so.0(+0x365ba07c53)[0x7f82f61ccc53] Sep 27 09:08:30 saboo libvirtd[24880]: /lib64/libc.so.6(clone+0x6d)[0x7f82f5af2d3d] more details attached
Created attachment 804060 [details] libxl log
Created attachment 804063 [details] Domain config xml
Ping?
Hi Jeremy, Would you mind mentioning this BZ on the upstream libvir-list? I think that's the best way to get the attention of the folks who know the libxl driver the best. Thanks, Dave
OK, done.
If you are able to run libvirtd under valgrind, that'd be very helpful - it ought to tell us just what memory alloc is being done wrong. eg just do this as root $ systemctl stop libvirtd.service $ valgrind /usr/sbin/libvirtd and then reproduce the issue with virt-manager.
Created attachment 815612 [details] Valgrind output I think the "Conditional jump or move depends on uninitialised value(s)" are generally spurious because Valgrind doesn't know about the Xen syscalls/hypercalls, but these look real: ==3945== Invalid read of size 4 ==3945== at 0x1E44FD2D: libxl__get_numa_candidate (libxl_numa.c:202) ==3945== by 0x1E443FA0: libxl__build_pre (libxl_dom.c:167) ==3945== by 0x1E43A675: libxl__domain_build (libxl_create.c:340) ==3945== by 0x1E43B748: domcreate_bootloader_done (libxl_create.c:765) ==3945== by 0x1E457496: bootloader_local_detached_cb (libxl_bootloader.c:281) ==3945== by 0x1E42CFBC: local_device_detach_cb (libxl.c:2480) ==3945== by 0x1E431F86: libxl__device_disk_local_initiate_detach (libxl.c:2455) ==3945== by 0x1E457B55: bootloader_callback (libxl_bootloader.c:265) ==3945== by 0x1E458A8A: libxl__bootloader_run (libxl_bootloader.c:392) ==3945== by 0x1E43BCF7: do_domain_create (libxl_create.c:704) ==3945== by 0x1E43BE30: libxl_domain_create_new (libxl_create.c:1226) ==3945== by 0x1E1FA7C0: libxlVmStart (libxl_driver.c:617) ==3945== Address 0x21e8c208 is 8 bytes after a block of size 48 alloc'd ==3945== at 0x4A08121: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==3945== by 0x1E44A923: libxl__zalloc (libxl_internal.c:88) ==3945== by 0x1E436A3D: libxl_get_cpu_topology (libxl.c:3762) ==3945== by 0x1E44F71C: libxl__get_numa_candidate (libxl_numa.c:310) ==3945== by 0x1E443FA0: libxl__build_pre (libxl_dom.c:167) ==3945== by 0x1E43A675: libxl__domain_build (libxl_create.c:340) ==3945== by 0x1E43B748: domcreate_bootloader_done (libxl_create.c:765) ==3945== by 0x1E457496: bootloader_local_detached_cb (libxl_bootloader.c:281) ==3945== by 0x1E42CFBC: local_device_detach_cb (libxl.c:2480) ==3945== by 0x1E431F86: libxl__device_disk_local_initiate_detach (libxl.c:2455) ==3945== by 0x1E457B55: bootloader_callback (libxl_bootloader.c:265) ==3945== by 0x1E458A8A: libxl__bootloader_run (libxl_bootloader.c:392) ==3945== Invalid read of size 1 ==3945== at 0x1E1FA410: libxlVmStart (libxl_driver.c:475) ==3945== by 0x1E1FAD9A: libxlDomainCreateWithFlags (libxl_driver.c:2633) ==3945== by 0x5187D46: virDomainCreate (libvirt.c:9439) ==3945== by 0x13BAA6: remoteDispatchDomainCreateHelper (remote_dispatch.h:2910) ==3945== by 0x51DE5B9: virNetServerProgramDispatch (virnetserverprogram.c:435) ==3945== by 0x51D93E7: virNetServerHandleJob (virnetserver.c:165) ==3945== by 0x50F5BF4: virThreadPoolWorker (virthreadpool.c:144) ==3945== by 0x50F5670: virThreadHelper (virthreadpthread.c:161) ==3945== by 0x8046C52: start_thread (pthread_create.c:308) ==3945== by 0x8758E1C: clone (clone.S:113) ==3945== Address 0x23424d81 is 0 bytes after a block of size 1 alloc'd ==3945== at 0x4A08121: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==3945== by 0x50B1F8C: virAllocN (viralloc.c:189) ==3945== by 0x1E1FA3CA: libxlVmStart (libxl_driver.c:468) ==3945== by 0x1E1FAD9A: libxlDomainCreateWithFlags (libxl_driver.c:2633) ==3945== by 0x5187D46: virDomainCreate (libvirt.c:9439) ==3945== by 0x13BAA6: remoteDispatchDomainCreateHelper (remote_dispatch.h:2910) ==3945== by 0x51DE5B9: virNetServerProgramDispatch (virnetserverprogram.c:435) ==3945== by 0x51D93E7: virNetServerHandleJob (virnetserver.c:165) ==3945== by 0x50F5BF4: virThreadPoolWorker (virthreadpool.c:144) ==3945== by 0x50F5670: virThreadHelper (virthreadpthread.c:161) ==3945== by 0x8046C52: start_thread (pthread_create.c:308) ==3945== by 0x8758E1C: clone (clone.S:113) ==3945== ==3945== Invalid read of size 1 ==3945== at 0x1E1FA3F0: libxlVmStart (libxl_driver.c:474) ==3945== by 0x1E1FAD9A: libxlDomainCreateWithFlags (libxl_driver.c:2633) ==3945== by 0x5187D46: virDomainCreate (libvirt.c:9439) ==3945== by 0x13BAA6: remoteDispatchDomainCreateHelper (remote_dispatch.h:2910) ==3945== by 0x51DE5B9: virNetServerProgramDispatch (virnetserverprogram.c:435) ==3945== by 0x51D93E7: virNetServerHandleJob (virnetserver.c:165) ==3945== by 0x50F5BF4: virThreadPoolWorker (virthreadpool.c:144) ==3945== by 0x50F5670: virThreadHelper (virthreadpthread.c:161) ==3945== by 0x8046C52: start_thread (pthread_create.c:308) ==3945== by 0x8758E1C: clone (clone.S:113) ==3945== Address 0x22933028 is 0 bytes after a block of size 24 alloc'd ==3945== at 0x4A08121: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==3945== by 0x50B1F23: virAlloc (viralloc.c:142) ==3945== by 0x50B39EF: virBitmapNew (virbitmap.c:78) ==3945== by 0x51224E6: virDomainDefParseXML (domain_conf.c:11199) ==3945== by 0x512957A: virDomainDefParseNode (domain_conf.c:12648) ==3945== by 0x51296A1: virDomainDefParse (domain_conf.c:12590) ==3945== by 0x5129ED1: virDomainObjListLoadAllConfigs (domain_conf.c:17209) ==3945== by 0x1E1FB494: libxlStateInitialize.part.7 (libxl_driver.c:907) ==3945== by 0x1E1FB7F9: libxlStateInitialize (libxl_driver.c:823) ==3945== by 0x51739A9: virStateInitialize (libvirt.c:834) ==3945== by 0x11C36A: daemonRunStateInit (libvirtd.c:906) ==3945== by 0x50F5670: virThreadHelper (virthreadpthread.c:161)
I posted a patch that might help with this: https://www.redhat.com/archives/libvir-list/2013-October/msg01075.html
Created attachment 815860 [details] Updated valgrind output I updated Valgrind to current svn and tried again. I don't think I reproduced the original problem, but it does show very many instances of use-after-free - it looks like there's some problem cleaning up on shutdown perhaps?
The use-after-free seems to correlate with: # valgrind '--track-origins=yes' '--log-file=virtd-vg.log' /usr/sbin/libvirtd 2013-10-24 17:00:53.308+0000: 18149: info : libvirt version: 1.1.3, package: 2.fc19 (Unknown, 2013-10-07-03:34:07, colepc.home) 2013-10-24 17:00:53.308+0000: 18149: error : libxlVmStart:610 : internal error: libxenlight failed to get free memory for domain 'Wheezy' 2013-10-24 17:00:53.340+0000: 18149: error : libxlAutostartDomain:124 : Failed to autostart VM 'Wheezy': internal error: libxenlight failed to get free memory for domain 'Wheezy' 2013-10-24 17:03:02.030+0000: 18139: error : libxlVmStart:610 : internal error: libxenlight failed to get free memory for domain 'Wheezy' ^C 2013-10-24 17:09:18.681+0000: 18138: error : netcfStateCleanup:112 : internal error: Attempt to close netcf state driver with open connections I'm not sure why the memory allocations are failing - I wonder if that's an artifact of the new Valgrind - but having failed it puts libvirtd into a bad state.
Jeremy, does the patch Martin mentioned in comment 8 resolve the crash?
I haven't had a chance to test it yet.
Created attachment 817548 [details] Patch fixing the problem Unfortunately Martin's patch doesn't help, because the original code was seriously messed up and not a simple array bounds check from being correct. This patch seems to fix the problem for me.
Can you please also post your patch to libvir-list? Patch review goes faster when all patches go through the same interface where a simple 'git am' imports the patch, rather than making developers have to manually download from bugzilla.
Patch is upstream now: commit ba1bf10063a0205c1de12b209b0282833710214f Author: Jeremy Fitzhardinge <jeremy> Date: Wed Oct 30 10:38:08 2013 -0700 libxl: fix dubious cpumask handling in libxlDomainSetVcpuAffinities
libvirt-1.0.5.7-2.fc19 has been submitted as an update for Fedora 19. https://admin.fedoraproject.org/updates/libvirt-1.0.5.7-2.fc19
Package libvirt-1.0.5.7-2.fc19: * should fix your issue, * was pushed to the Fedora 19 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing libvirt-1.0.5.7-2.fc19' as soon as you are able to. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2013-21644/libvirt-1.0.5.7-2.fc19 then log in and leave karma (feedback).
Looks good.
libvirt-1.0.5.7-2.fc19 has been pushed to the Fedora 19 stable repository. If problems still persist, please make note of it in this bug report.