Bug 1013045 - Crash on xen domain startup: *** Error in `/usr/sbin/libvirtd': free(): invalid next size (fast): 0x00007f82c8003210 ***
Crash on xen domain startup: *** Error in `/usr/sbin/libvirtd': free(): inval...
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: libvirt (Show other bugs)
19
x86_64 Linux
unspecified Severity high
: ---
: ---
Assigned To: Libvirt Maintainers
Fedora Extras Quality Assurance
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2013-09-27 12:55 EDT by Jeremy Fitzhardinge
Modified: 2013-11-20 23:41 EST (History)
15 users (show)

See Also:
Fixed In Version: libvirt-1.0.5.7-2.fc19
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-11-20 23:41:24 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
Full logs from /var/log/messages (162.20 KB, text/plain)
2013-09-27 12:55 EDT, Jeremy Fitzhardinge
no flags Details
libxl log (6.19 KB, text/plain)
2013-09-27 12:59 EDT, Jeremy Fitzhardinge
no flags Details
Domain config xml (1.45 KB, text/xml)
2013-09-27 13:02 EDT, Jeremy Fitzhardinge
no flags Details
Valgrind output (247.40 KB, text/x-log)
2013-10-24 01:44 EDT, Jeremy Fitzhardinge
no flags Details
Updated valgrind output (306.15 KB, text/plain)
2013-10-24 13:09 EDT, Jeremy Fitzhardinge
no flags Details
Patch fixing the problem (1.73 KB, patch)
2013-10-30 13:40 EDT, Jeremy Fitzhardinge
no flags Details | Diff

  None (edit)
Description Jeremy Fitzhardinge 2013-09-27 12:55:27 EDT
Created attachment 804059 [details]
Full logs from /var/log/messages

Description of problem:
When starting a Xen domain with libvirt + libxl, it crashes after creating the domain.  The domain is left in a paused state, and works fine if manually unpaused with xl unpause. virt-manager never shows the domain as running.

Version-Release number of selected component (if applicable):
$ rpm -qa libvirt*
libvirt-sandbox-0.2.0-1.fc19.x86_64
libvirt-daemon-1.1.2-4.fc19.x86_64
libvirt-python-1.1.2-4.fc19.x86_64
libvirt-daemon-driver-qemu-1.1.2-4.fc19.x86_64
libvirt-daemon-driver-uml-1.1.2-4.fc19.x86_64
libvirt-daemon-driver-libxl-1.1.2-4.fc19.x86_64
libvirt-daemon-lxc-1.1.2-4.fc19.x86_64
libvirt-sandbox-libs-0.2.0-1.fc19.x86_64
libvirt-daemon-driver-nwfilter-1.1.2-4.fc19.x86_64
libvirt-daemon-qemu-1.1.2-4.fc19.x86_64
libvirt-glib-0.1.6-1.fc19.x86_64
libvirt-daemon-config-network-1.1.2-4.fc19.x86_64
libvirt-daemon-driver-storage-1.1.2-4.fc19.x86_64
libvirt-daemon-config-nwfilter-1.1.2-4.fc19.x86_64
libvirt-daemon-driver-nodedev-1.1.2-4.fc19.x86_64
libvirt-daemon-driver-xen-1.1.2-4.fc19.x86_64
libvirt-daemon-driver-interface-1.1.2-4.fc19.x86_64
libvirt-daemon-kvm-1.1.2-4.fc19.x86_64
libvirt-gobject-0.1.6-1.fc19.x86_64
libvirt-gconfig-0.1.6-1.fc19.x86_64
libvirt-daemon-driver-secret-1.1.2-4.fc19.x86_64
libvirt-1.1.2-4.fc19.x86_64
libvirt-daemon-driver-network-1.1.2-4.fc19.x86_64
libvirt-daemon-driver-lxc-1.1.2-4.fc19.x86_64
libvirt-daemon-xen-1.1.2-4.fc19.x86_64
libvirt-client-1.1.2-4.fc19.x86_64
libvirt-daemon-driver-vbox-1.1.2-4.fc19.x86_64


How reproducible:
Always

Steps to Reproduce:
1. Open virt-manager
2. Connect to localhost (xen)
3. Start a domain

Actual results:
Domain is created in a paused state, virt-manager shows errors about losing connection to the daemon. Logs show libvirtd crashed.

Expected results:
Domain creation.

Additional info:
Sep 27 09:08:30 saboo libvirtd[24880]: *** Error in `/usr/sbin/libvirtd': free(): invalid next size (fast): 0x00007f82c8003210 ***
Sep 27 09:08:30 saboo libvirtd[24880]: ======= Backtrace: =========
Sep 27 09:08:30 saboo libvirtd[24880]: /lib64/libc.so.6(+0x365b27d0e8)[0x7f82f5a7a0e8]
Sep 27 09:08:30 saboo libvirtd[24880]: /lib64/libvirt.so.0(virFree+0x1a)[0x7f82f8f07d5a]
Sep 27 09:08:30 saboo libvirtd[24880]: /usr/lib64/libvirt/connection-driver/libvirt_driver_libxl.so(+0x14b6c)[0x7f82e032bb6c]
Sep 27 09:08:30 saboo libvirtd[24880]: /usr/lib64/libvirt/connection-driver/libvirt_driver_libxl.so(+0x154d4)[0x7f82e032c4d4]
Sep 27 09:08:30 saboo libvirtd[24880]: /lib64/libvirt.so.0(virDomainCreate+0xf7)[0x7f82f8fdb6b7]
Sep 27 09:08:30 saboo libvirtd[24880]: /usr/sbin/libvirtd(+0x350c7)[0x7f82f9a1a0c7]
Sep 27 09:08:30 saboo libvirtd[24880]: /lib64/libvirt.so.0(virNetServerProgramDispatch+0x3ba)[0x7f82f90314aa]
Sep 27 09:08:30 saboo libvirtd[24880]: /lib64/libvirt.so.0(+0x3a33f822d8)[0x7f82f902c2d8]
Sep 27 09:08:30 saboo libvirtd[24880]: /lib64/libvirt.so.0(+0x3a33ea0c15)[0x7f82f8f4ac15]
Sep 27 09:08:30 saboo libvirtd[24880]: /lib64/libvirt.so.0(+0x3a33ea0691)[0x7f82f8f4a691]
Sep 27 09:08:30 saboo libvirtd[24880]: /lib64/libpthread.so.0(+0x365ba07c53)[0x7f82f61ccc53]
Sep 27 09:08:30 saboo libvirtd[24880]: /lib64/libc.so.6(clone+0x6d)[0x7f82f5af2d3d]

more details attached
Comment 1 Jeremy Fitzhardinge 2013-09-27 12:59:01 EDT
Created attachment 804060 [details]
libxl log
Comment 2 Jeremy Fitzhardinge 2013-09-27 13:02:01 EDT
Created attachment 804063 [details]
Domain config xml
Comment 3 Jeremy Fitzhardinge 2013-10-23 13:00:26 EDT
Ping?
Comment 4 Dave Allan 2013-10-23 13:31:13 EDT
Hi Jeremy, Would you mind mentioning this BZ on the upstream libvir-list?  I think that's the best way to get the attention of the folks who know the libxl driver the best.  Thanks, Dave
Comment 5 Jeremy Fitzhardinge 2013-10-23 13:53:14 EDT
OK, done.
Comment 6 Daniel Berrange 2013-10-23 19:25:01 EDT
If you are able to run libvirtd under valgrind, that'd be very helpful - it ought to tell us just what memory alloc is being done wrong.

eg just do this as root

 $ systemctl stop libvirtd.service
 $ valgrind /usr/sbin/libvirtd

and then reproduce the issue with virt-manager.
Comment 7 Jeremy Fitzhardinge 2013-10-24 01:44:18 EDT
Created attachment 815612 [details]
Valgrind output

I think the "Conditional jump or move depends on uninitialised value(s)" are generally spurious because Valgrind doesn't know about the Xen syscalls/hypercalls, but these look real:

==3945== Invalid read of size 4
==3945==    at 0x1E44FD2D: libxl__get_numa_candidate (libxl_numa.c:202)
==3945==    by 0x1E443FA0: libxl__build_pre (libxl_dom.c:167)
==3945==    by 0x1E43A675: libxl__domain_build (libxl_create.c:340)
==3945==    by 0x1E43B748: domcreate_bootloader_done (libxl_create.c:765)
==3945==    by 0x1E457496: bootloader_local_detached_cb (libxl_bootloader.c:281)
==3945==    by 0x1E42CFBC: local_device_detach_cb (libxl.c:2480)
==3945==    by 0x1E431F86: libxl__device_disk_local_initiate_detach (libxl.c:2455)
==3945==    by 0x1E457B55: bootloader_callback (libxl_bootloader.c:265)
==3945==    by 0x1E458A8A: libxl__bootloader_run (libxl_bootloader.c:392)
==3945==    by 0x1E43BCF7: do_domain_create (libxl_create.c:704)
==3945==    by 0x1E43BE30: libxl_domain_create_new (libxl_create.c:1226)
==3945==    by 0x1E1FA7C0: libxlVmStart (libxl_driver.c:617)
==3945==  Address 0x21e8c208 is 8 bytes after a block of size 48 alloc'd
==3945==    at 0x4A08121: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==3945==    by 0x1E44A923: libxl__zalloc (libxl_internal.c:88)
==3945==    by 0x1E436A3D: libxl_get_cpu_topology (libxl.c:3762)
==3945==    by 0x1E44F71C: libxl__get_numa_candidate (libxl_numa.c:310)
==3945==    by 0x1E443FA0: libxl__build_pre (libxl_dom.c:167)
==3945==    by 0x1E43A675: libxl__domain_build (libxl_create.c:340)
==3945==    by 0x1E43B748: domcreate_bootloader_done (libxl_create.c:765)
==3945==    by 0x1E457496: bootloader_local_detached_cb (libxl_bootloader.c:281)
==3945==    by 0x1E42CFBC: local_device_detach_cb (libxl.c:2480)
==3945==    by 0x1E431F86: libxl__device_disk_local_initiate_detach (libxl.c:2455)
==3945==    by 0x1E457B55: bootloader_callback (libxl_bootloader.c:265)
==3945==    by 0x1E458A8A: libxl__bootloader_run (libxl_bootloader.c:392)

==3945== Invalid read of size 1
==3945==    at 0x1E1FA410: libxlVmStart (libxl_driver.c:475)
==3945==    by 0x1E1FAD9A: libxlDomainCreateWithFlags (libxl_driver.c:2633)
==3945==    by 0x5187D46: virDomainCreate (libvirt.c:9439)
==3945==    by 0x13BAA6: remoteDispatchDomainCreateHelper (remote_dispatch.h:2910)
==3945==    by 0x51DE5B9: virNetServerProgramDispatch (virnetserverprogram.c:435)
==3945==    by 0x51D93E7: virNetServerHandleJob (virnetserver.c:165)
==3945==    by 0x50F5BF4: virThreadPoolWorker (virthreadpool.c:144)
==3945==    by 0x50F5670: virThreadHelper (virthreadpthread.c:161)
==3945==    by 0x8046C52: start_thread (pthread_create.c:308)
==3945==    by 0x8758E1C: clone (clone.S:113)
==3945==  Address 0x23424d81 is 0 bytes after a block of size 1 alloc'd
==3945==    at 0x4A08121: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==3945==    by 0x50B1F8C: virAllocN (viralloc.c:189)
==3945==    by 0x1E1FA3CA: libxlVmStart (libxl_driver.c:468)
==3945==    by 0x1E1FAD9A: libxlDomainCreateWithFlags (libxl_driver.c:2633)
==3945==    by 0x5187D46: virDomainCreate (libvirt.c:9439)
==3945==    by 0x13BAA6: remoteDispatchDomainCreateHelper (remote_dispatch.h:2910)
==3945==    by 0x51DE5B9: virNetServerProgramDispatch (virnetserverprogram.c:435)
==3945==    by 0x51D93E7: virNetServerHandleJob (virnetserver.c:165)
==3945==    by 0x50F5BF4: virThreadPoolWorker (virthreadpool.c:144)
==3945==    by 0x50F5670: virThreadHelper (virthreadpthread.c:161)
==3945==    by 0x8046C52: start_thread (pthread_create.c:308)
==3945==    by 0x8758E1C: clone (clone.S:113)
==3945== 
==3945== Invalid read of size 1
==3945==    at 0x1E1FA3F0: libxlVmStart (libxl_driver.c:474)
==3945==    by 0x1E1FAD9A: libxlDomainCreateWithFlags (libxl_driver.c:2633)
==3945==    by 0x5187D46: virDomainCreate (libvirt.c:9439)
==3945==    by 0x13BAA6: remoteDispatchDomainCreateHelper (remote_dispatch.h:2910)
==3945==    by 0x51DE5B9: virNetServerProgramDispatch (virnetserverprogram.c:435)
==3945==    by 0x51D93E7: virNetServerHandleJob (virnetserver.c:165)
==3945==    by 0x50F5BF4: virThreadPoolWorker (virthreadpool.c:144)
==3945==    by 0x50F5670: virThreadHelper (virthreadpthread.c:161)
==3945==    by 0x8046C52: start_thread (pthread_create.c:308)
==3945==    by 0x8758E1C: clone (clone.S:113)
==3945==  Address 0x22933028 is 0 bytes after a block of size 24 alloc'd
==3945==    at 0x4A08121: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==3945==    by 0x50B1F23: virAlloc (viralloc.c:142)
==3945==    by 0x50B39EF: virBitmapNew (virbitmap.c:78)
==3945==    by 0x51224E6: virDomainDefParseXML (domain_conf.c:11199)
==3945==    by 0x512957A: virDomainDefParseNode (domain_conf.c:12648)
==3945==    by 0x51296A1: virDomainDefParse (domain_conf.c:12590)
==3945==    by 0x5129ED1: virDomainObjListLoadAllConfigs (domain_conf.c:17209)
==3945==    by 0x1E1FB494: libxlStateInitialize.part.7 (libxl_driver.c:907)
==3945==    by 0x1E1FB7F9: libxlStateInitialize (libxl_driver.c:823)
==3945==    by 0x51739A9: virStateInitialize (libvirt.c:834)
==3945==    by 0x11C36A: daemonRunStateInit (libvirtd.c:906)
==3945==    by 0x50F5670: virThreadHelper (virthreadpthread.c:161)
Comment 8 Martin Kletzander 2013-10-24 05:51:45 EDT
I posted a patch that might help with this:

https://www.redhat.com/archives/libvir-list/2013-October/msg01075.html
Comment 9 Jeremy Fitzhardinge 2013-10-24 13:09:56 EDT
Created attachment 815860 [details]
Updated valgrind output

I updated Valgrind to current svn and tried again. I don't think I reproduced the original problem, but it does show very many instances of use-after-free - it looks like there's some problem cleaning up on shutdown perhaps?
Comment 10 Jeremy Fitzhardinge 2013-10-24 13:14:42 EDT
The use-after-free seems to correlate with:

# valgrind '--track-origins=yes' '--log-file=virtd-vg.log' /usr/sbin/libvirtd
2013-10-24 17:00:53.308+0000: 18149: info : libvirt version: 1.1.3, package: 2.fc19 (Unknown, 2013-10-07-03:34:07, colepc.home)
2013-10-24 17:00:53.308+0000: 18149: error : libxlVmStart:610 : internal error: libxenlight failed to get free memory for domain 'Wheezy'
2013-10-24 17:00:53.340+0000: 18149: error : libxlAutostartDomain:124 : Failed to autostart VM 'Wheezy': internal error: libxenlight failed to get free memory for domain 'Wheezy'
2013-10-24 17:03:02.030+0000: 18139: error : libxlVmStart:610 : internal error: libxenlight failed to get free memory for domain 'Wheezy'
^C
2013-10-24 17:09:18.681+0000: 18138: error : netcfStateCleanup:112 : internal error: Attempt to close netcf state driver with open connections

I'm not sure why the memory allocations are failing - I wonder if that's an artifact of the new Valgrind - but having failed it puts libvirtd into a bad state.
Comment 11 Dave Allan 2013-10-28 14:15:58 EDT
Jeremy, does the patch Martin mentioned in comment 8 resolve the crash?
Comment 12 Jeremy Fitzhardinge 2013-10-28 20:14:16 EDT
I haven't had a chance to test it yet.
Comment 13 Jeremy Fitzhardinge 2013-10-30 13:40:57 EDT
Created attachment 817548 [details]
Patch fixing the problem

Unfortunately Martin's patch doesn't help, because the original code was seriously messed up and not a simple array bounds check from being correct. This patch seems to fix the problem for me.
Comment 14 Eric Blake 2013-10-30 14:18:52 EDT
Can you please also post your patch to libvir-list@redhat.com?  Patch review goes faster when all patches go through the same interface where a simple 'git am' imports the patch, rather than making developers have to manually download from bugzilla.
Comment 15 Cole Robinson 2013-11-17 13:13:07 EST
Patch is upstream now:

commit ba1bf10063a0205c1de12b209b0282833710214f
Author: Jeremy Fitzhardinge <jeremy@goop.org>
Date:   Wed Oct 30 10:38:08 2013 -0700

    libxl: fix dubious cpumask handling in libxlDomainSetVcpuAffinities
Comment 16 Fedora Update System 2013-11-17 18:29:33 EST
libvirt-1.0.5.7-2.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/libvirt-1.0.5.7-2.fc19
Comment 17 Fedora Update System 2013-11-19 00:22:59 EST
Package libvirt-1.0.5.7-2.fc19:
* should fix your issue,
* was pushed to the Fedora 19 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing libvirt-1.0.5.7-2.fc19'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-21644/libvirt-1.0.5.7-2.fc19
then log in and leave karma (feedback).
Comment 18 Jeremy Fitzhardinge 2013-11-19 13:36:37 EST
Looks good.
Comment 19 Fedora Update System 2013-11-20 23:41:24 EST
libvirt-1.0.5.7-2.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.

Note You need to log in before you can comment on or make changes to this bug.