Bug 1013045

Summary: Crash on xen domain startup: *** Error in `/usr/sbin/libvirtd': free(): invalid next size (fast): 0x00007f82c8003210 ***
Product: [Fedora] Fedora Reporter: Jeremy Fitzhardinge <jeremy>
Component: libvirtAssignee: Libvirt Maintainers <libvirt-maint>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: unspecified    
Version: 19CC: berrange, clalancette, crobinso, dallan, eblake, itamar, jeremy, jfehlig, jforbes, jyang, laine, libvirt-maint, mkletzan, veillard, virt-maint
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: libvirt-1.0.5.7-2.fc19 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-11-21 04:41:24 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Description Flags
Full logs from /var/log/messages
none
libxl log
none
Domain config xml
none
Valgrind output
none
Updated valgrind output
none
Patch fixing the problem none

Description Jeremy Fitzhardinge 2013-09-27 16:55:27 UTC
Created attachment 804059 [details]
Full logs from /var/log/messages

Description of problem:
When starting a Xen domain with libvirt + libxl, it crashes after creating the domain.  The domain is left in a paused state, and works fine if manually unpaused with xl unpause. virt-manager never shows the domain as running.

Version-Release number of selected component (if applicable):
$ rpm -qa libvirt*
libvirt-sandbox-0.2.0-1.fc19.x86_64
libvirt-daemon-1.1.2-4.fc19.x86_64
libvirt-python-1.1.2-4.fc19.x86_64
libvirt-daemon-driver-qemu-1.1.2-4.fc19.x86_64
libvirt-daemon-driver-uml-1.1.2-4.fc19.x86_64
libvirt-daemon-driver-libxl-1.1.2-4.fc19.x86_64
libvirt-daemon-lxc-1.1.2-4.fc19.x86_64
libvirt-sandbox-libs-0.2.0-1.fc19.x86_64
libvirt-daemon-driver-nwfilter-1.1.2-4.fc19.x86_64
libvirt-daemon-qemu-1.1.2-4.fc19.x86_64
libvirt-glib-0.1.6-1.fc19.x86_64
libvirt-daemon-config-network-1.1.2-4.fc19.x86_64
libvirt-daemon-driver-storage-1.1.2-4.fc19.x86_64
libvirt-daemon-config-nwfilter-1.1.2-4.fc19.x86_64
libvirt-daemon-driver-nodedev-1.1.2-4.fc19.x86_64
libvirt-daemon-driver-xen-1.1.2-4.fc19.x86_64
libvirt-daemon-driver-interface-1.1.2-4.fc19.x86_64
libvirt-daemon-kvm-1.1.2-4.fc19.x86_64
libvirt-gobject-0.1.6-1.fc19.x86_64
libvirt-gconfig-0.1.6-1.fc19.x86_64
libvirt-daemon-driver-secret-1.1.2-4.fc19.x86_64
libvirt-1.1.2-4.fc19.x86_64
libvirt-daemon-driver-network-1.1.2-4.fc19.x86_64
libvirt-daemon-driver-lxc-1.1.2-4.fc19.x86_64
libvirt-daemon-xen-1.1.2-4.fc19.x86_64
libvirt-client-1.1.2-4.fc19.x86_64
libvirt-daemon-driver-vbox-1.1.2-4.fc19.x86_64


How reproducible:
Always

Steps to Reproduce:
1. Open virt-manager
2. Connect to localhost (xen)
3. Start a domain

Actual results:
Domain is created in a paused state, virt-manager shows errors about losing connection to the daemon. Logs show libvirtd crashed.

Expected results:
Domain creation.

Additional info:
Sep 27 09:08:30 saboo libvirtd[24880]: *** Error in `/usr/sbin/libvirtd': free(): invalid next size (fast): 0x00007f82c8003210 ***
Sep 27 09:08:30 saboo libvirtd[24880]: ======= Backtrace: =========
Sep 27 09:08:30 saboo libvirtd[24880]: /lib64/libc.so.6(+0x365b27d0e8)[0x7f82f5a7a0e8]
Sep 27 09:08:30 saboo libvirtd[24880]: /lib64/libvirt.so.0(virFree+0x1a)[0x7f82f8f07d5a]
Sep 27 09:08:30 saboo libvirtd[24880]: /usr/lib64/libvirt/connection-driver/libvirt_driver_libxl.so(+0x14b6c)[0x7f82e032bb6c]
Sep 27 09:08:30 saboo libvirtd[24880]: /usr/lib64/libvirt/connection-driver/libvirt_driver_libxl.so(+0x154d4)[0x7f82e032c4d4]
Sep 27 09:08:30 saboo libvirtd[24880]: /lib64/libvirt.so.0(virDomainCreate+0xf7)[0x7f82f8fdb6b7]
Sep 27 09:08:30 saboo libvirtd[24880]: /usr/sbin/libvirtd(+0x350c7)[0x7f82f9a1a0c7]
Sep 27 09:08:30 saboo libvirtd[24880]: /lib64/libvirt.so.0(virNetServerProgramDispatch+0x3ba)[0x7f82f90314aa]
Sep 27 09:08:30 saboo libvirtd[24880]: /lib64/libvirt.so.0(+0x3a33f822d8)[0x7f82f902c2d8]
Sep 27 09:08:30 saboo libvirtd[24880]: /lib64/libvirt.so.0(+0x3a33ea0c15)[0x7f82f8f4ac15]
Sep 27 09:08:30 saboo libvirtd[24880]: /lib64/libvirt.so.0(+0x3a33ea0691)[0x7f82f8f4a691]
Sep 27 09:08:30 saboo libvirtd[24880]: /lib64/libpthread.so.0(+0x365ba07c53)[0x7f82f61ccc53]
Sep 27 09:08:30 saboo libvirtd[24880]: /lib64/libc.so.6(clone+0x6d)[0x7f82f5af2d3d]

more details attached

Comment 1 Jeremy Fitzhardinge 2013-09-27 16:59:01 UTC
Created attachment 804060 [details]
libxl log

Comment 2 Jeremy Fitzhardinge 2013-09-27 17:02:01 UTC
Created attachment 804063 [details]
Domain config xml

Comment 3 Jeremy Fitzhardinge 2013-10-23 17:00:26 UTC
Ping?

Comment 4 Dave Allan 2013-10-23 17:31:13 UTC
Hi Jeremy, Would you mind mentioning this BZ on the upstream libvir-list?  I think that's the best way to get the attention of the folks who know the libxl driver the best.  Thanks, Dave

Comment 5 Jeremy Fitzhardinge 2013-10-23 17:53:14 UTC
OK, done.

Comment 6 Daniel Berrangé 2013-10-23 23:25:01 UTC
If you are able to run libvirtd under valgrind, that'd be very helpful - it ought to tell us just what memory alloc is being done wrong.

eg just do this as root

 $ systemctl stop libvirtd.service
 $ valgrind /usr/sbin/libvirtd

and then reproduce the issue with virt-manager.

Comment 7 Jeremy Fitzhardinge 2013-10-24 05:44:18 UTC
Created attachment 815612 [details]
Valgrind output

I think the "Conditional jump or move depends on uninitialised value(s)" are generally spurious because Valgrind doesn't know about the Xen syscalls/hypercalls, but these look real:

==3945== Invalid read of size 4
==3945==    at 0x1E44FD2D: libxl__get_numa_candidate (libxl_numa.c:202)
==3945==    by 0x1E443FA0: libxl__build_pre (libxl_dom.c:167)
==3945==    by 0x1E43A675: libxl__domain_build (libxl_create.c:340)
==3945==    by 0x1E43B748: domcreate_bootloader_done (libxl_create.c:765)
==3945==    by 0x1E457496: bootloader_local_detached_cb (libxl_bootloader.c:281)
==3945==    by 0x1E42CFBC: local_device_detach_cb (libxl.c:2480)
==3945==    by 0x1E431F86: libxl__device_disk_local_initiate_detach (libxl.c:2455)
==3945==    by 0x1E457B55: bootloader_callback (libxl_bootloader.c:265)
==3945==    by 0x1E458A8A: libxl__bootloader_run (libxl_bootloader.c:392)
==3945==    by 0x1E43BCF7: do_domain_create (libxl_create.c:704)
==3945==    by 0x1E43BE30: libxl_domain_create_new (libxl_create.c:1226)
==3945==    by 0x1E1FA7C0: libxlVmStart (libxl_driver.c:617)
==3945==  Address 0x21e8c208 is 8 bytes after a block of size 48 alloc'd
==3945==    at 0x4A08121: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==3945==    by 0x1E44A923: libxl__zalloc (libxl_internal.c:88)
==3945==    by 0x1E436A3D: libxl_get_cpu_topology (libxl.c:3762)
==3945==    by 0x1E44F71C: libxl__get_numa_candidate (libxl_numa.c:310)
==3945==    by 0x1E443FA0: libxl__build_pre (libxl_dom.c:167)
==3945==    by 0x1E43A675: libxl__domain_build (libxl_create.c:340)
==3945==    by 0x1E43B748: domcreate_bootloader_done (libxl_create.c:765)
==3945==    by 0x1E457496: bootloader_local_detached_cb (libxl_bootloader.c:281)
==3945==    by 0x1E42CFBC: local_device_detach_cb (libxl.c:2480)
==3945==    by 0x1E431F86: libxl__device_disk_local_initiate_detach (libxl.c:2455)
==3945==    by 0x1E457B55: bootloader_callback (libxl_bootloader.c:265)
==3945==    by 0x1E458A8A: libxl__bootloader_run (libxl_bootloader.c:392)

==3945== Invalid read of size 1
==3945==    at 0x1E1FA410: libxlVmStart (libxl_driver.c:475)
==3945==    by 0x1E1FAD9A: libxlDomainCreateWithFlags (libxl_driver.c:2633)
==3945==    by 0x5187D46: virDomainCreate (libvirt.c:9439)
==3945==    by 0x13BAA6: remoteDispatchDomainCreateHelper (remote_dispatch.h:2910)
==3945==    by 0x51DE5B9: virNetServerProgramDispatch (virnetserverprogram.c:435)
==3945==    by 0x51D93E7: virNetServerHandleJob (virnetserver.c:165)
==3945==    by 0x50F5BF4: virThreadPoolWorker (virthreadpool.c:144)
==3945==    by 0x50F5670: virThreadHelper (virthreadpthread.c:161)
==3945==    by 0x8046C52: start_thread (pthread_create.c:308)
==3945==    by 0x8758E1C: clone (clone.S:113)
==3945==  Address 0x23424d81 is 0 bytes after a block of size 1 alloc'd
==3945==    at 0x4A08121: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==3945==    by 0x50B1F8C: virAllocN (viralloc.c:189)
==3945==    by 0x1E1FA3CA: libxlVmStart (libxl_driver.c:468)
==3945==    by 0x1E1FAD9A: libxlDomainCreateWithFlags (libxl_driver.c:2633)
==3945==    by 0x5187D46: virDomainCreate (libvirt.c:9439)
==3945==    by 0x13BAA6: remoteDispatchDomainCreateHelper (remote_dispatch.h:2910)
==3945==    by 0x51DE5B9: virNetServerProgramDispatch (virnetserverprogram.c:435)
==3945==    by 0x51D93E7: virNetServerHandleJob (virnetserver.c:165)
==3945==    by 0x50F5BF4: virThreadPoolWorker (virthreadpool.c:144)
==3945==    by 0x50F5670: virThreadHelper (virthreadpthread.c:161)
==3945==    by 0x8046C52: start_thread (pthread_create.c:308)
==3945==    by 0x8758E1C: clone (clone.S:113)
==3945== 
==3945== Invalid read of size 1
==3945==    at 0x1E1FA3F0: libxlVmStart (libxl_driver.c:474)
==3945==    by 0x1E1FAD9A: libxlDomainCreateWithFlags (libxl_driver.c:2633)
==3945==    by 0x5187D46: virDomainCreate (libvirt.c:9439)
==3945==    by 0x13BAA6: remoteDispatchDomainCreateHelper (remote_dispatch.h:2910)
==3945==    by 0x51DE5B9: virNetServerProgramDispatch (virnetserverprogram.c:435)
==3945==    by 0x51D93E7: virNetServerHandleJob (virnetserver.c:165)
==3945==    by 0x50F5BF4: virThreadPoolWorker (virthreadpool.c:144)
==3945==    by 0x50F5670: virThreadHelper (virthreadpthread.c:161)
==3945==    by 0x8046C52: start_thread (pthread_create.c:308)
==3945==    by 0x8758E1C: clone (clone.S:113)
==3945==  Address 0x22933028 is 0 bytes after a block of size 24 alloc'd
==3945==    at 0x4A08121: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==3945==    by 0x50B1F23: virAlloc (viralloc.c:142)
==3945==    by 0x50B39EF: virBitmapNew (virbitmap.c:78)
==3945==    by 0x51224E6: virDomainDefParseXML (domain_conf.c:11199)
==3945==    by 0x512957A: virDomainDefParseNode (domain_conf.c:12648)
==3945==    by 0x51296A1: virDomainDefParse (domain_conf.c:12590)
==3945==    by 0x5129ED1: virDomainObjListLoadAllConfigs (domain_conf.c:17209)
==3945==    by 0x1E1FB494: libxlStateInitialize.part.7 (libxl_driver.c:907)
==3945==    by 0x1E1FB7F9: libxlStateInitialize (libxl_driver.c:823)
==3945==    by 0x51739A9: virStateInitialize (libvirt.c:834)
==3945==    by 0x11C36A: daemonRunStateInit (libvirtd.c:906)
==3945==    by 0x50F5670: virThreadHelper (virthreadpthread.c:161)

Comment 8 Martin Kletzander 2013-10-24 09:51:45 UTC
I posted a patch that might help with this:

https://www.redhat.com/archives/libvir-list/2013-October/msg01075.html

Comment 9 Jeremy Fitzhardinge 2013-10-24 17:09:56 UTC
Created attachment 815860 [details]
Updated valgrind output

I updated Valgrind to current svn and tried again. I don't think I reproduced the original problem, but it does show very many instances of use-after-free - it looks like there's some problem cleaning up on shutdown perhaps?

Comment 10 Jeremy Fitzhardinge 2013-10-24 17:14:42 UTC
The use-after-free seems to correlate with:

# valgrind '--track-origins=yes' '--log-file=virtd-vg.log' /usr/sbin/libvirtd
2013-10-24 17:00:53.308+0000: 18149: info : libvirt version: 1.1.3, package: 2.fc19 (Unknown, 2013-10-07-03:34:07, colepc.home)
2013-10-24 17:00:53.308+0000: 18149: error : libxlVmStart:610 : internal error: libxenlight failed to get free memory for domain 'Wheezy'
2013-10-24 17:00:53.340+0000: 18149: error : libxlAutostartDomain:124 : Failed to autostart VM 'Wheezy': internal error: libxenlight failed to get free memory for domain 'Wheezy'
2013-10-24 17:03:02.030+0000: 18139: error : libxlVmStart:610 : internal error: libxenlight failed to get free memory for domain 'Wheezy'
^C
2013-10-24 17:09:18.681+0000: 18138: error : netcfStateCleanup:112 : internal error: Attempt to close netcf state driver with open connections

I'm not sure why the memory allocations are failing - I wonder if that's an artifact of the new Valgrind - but having failed it puts libvirtd into a bad state.

Comment 11 Dave Allan 2013-10-28 18:15:58 UTC
Jeremy, does the patch Martin mentioned in comment 8 resolve the crash?

Comment 12 Jeremy Fitzhardinge 2013-10-29 00:14:16 UTC
I haven't had a chance to test it yet.

Comment 13 Jeremy Fitzhardinge 2013-10-30 17:40:57 UTC
Created attachment 817548 [details]
Patch fixing the problem

Unfortunately Martin's patch doesn't help, because the original code was seriously messed up and not a simple array bounds check from being correct. This patch seems to fix the problem for me.

Comment 14 Eric Blake 2013-10-30 18:18:52 UTC
Can you please also post your patch to libvir-list?  Patch review goes faster when all patches go through the same interface where a simple 'git am' imports the patch, rather than making developers have to manually download from bugzilla.

Comment 15 Cole Robinson 2013-11-17 18:13:07 UTC
Patch is upstream now:

commit ba1bf10063a0205c1de12b209b0282833710214f
Author: Jeremy Fitzhardinge <jeremy>
Date:   Wed Oct 30 10:38:08 2013 -0700

    libxl: fix dubious cpumask handling in libxlDomainSetVcpuAffinities

Comment 16 Fedora Update System 2013-11-17 23:29:33 UTC
libvirt-1.0.5.7-2.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/libvirt-1.0.5.7-2.fc19

Comment 17 Fedora Update System 2013-11-19 05:22:59 UTC
Package libvirt-1.0.5.7-2.fc19:
* should fix your issue,
* was pushed to the Fedora 19 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing libvirt-1.0.5.7-2.fc19'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-21644/libvirt-1.0.5.7-2.fc19
then log in and leave karma (feedback).

Comment 18 Jeremy Fitzhardinge 2013-11-19 18:36:37 UTC
Looks good.

Comment 19 Fedora Update System 2013-11-21 04:41:24 UTC
libvirt-1.0.5.7-2.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.