Bug 1339680 - libvirt CPU driver fails to translate a custom CPU model into something that QEMU recognizes
Status: CLOSED NEXTRELEASE
Alias: None
Product: Virtualization Tools
Classification: Community
Component: libvirt
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Jiri Denemark
Blocks: libvirtCPUconfig
Reported: 2016-05-25 15:14 UTC by Kashyap Chamarthy
Modified: 2016-09-22 13:55 UTC (History)

Last Closed: 2016-09-22 13:55:15 UTC


Attachments
libvirt guest XML that will trigger the failure (591 bytes, text/plain)
2016-05-25 15:14 UTC, Kashyap Chamarthy
libvirt debug log when the error was triggered (93.50 KB, text/plain)
2016-05-25 15:15 UTC, Kashyap Chamarthy
Modified libvirt cpu_map.xml (58.03 KB, text/plain)
2016-05-25 15:30 UTC, Kashyap Chamarthy
libvirtd log with libvirt-1.2.2, where 'gate64' model was successfully translated into something that QEMU can recognize (12.74 MB, text/plain)
2016-05-25 20:04 UTC, Kashyap Chamarthy
libvirt debug log obtained with a reproducer using `virt-install` (refer comment #4) (111.19 KB, text/plain)
2016-05-26 08:45 UTC, Kashyap Chamarthy

Description Kashyap Chamarthy 2016-05-25 15:14:21 UTC
Created attachment 1161480
libvirt guest XML that will trigger the failure

Description of problem
----------------------

Starting a guest defined with a custom CPU model called "gate64" (created
to allow live migration in its infrastructure) fails: libvirt does not
translate the custom model into a model that QEMU can recognize (i.e.
one of the models from `qemu-system-x86_64 -cpu \?`).


Version
-------

libvirt-1.3.4-2.fc25.x86_64
qemu-system-x86-2.6.0-2.fc25.x86_64


How reproducible: Consistently.


Steps to Reproduce
------------------

(0) Enable the libvirt CPU driver logging filter ("1:cpu" in 'log_filters'
    in the file /etc/libvirt/libvirtd.conf):

    $ sudo grep -v ^$ /etc/libvirt/libvirtd.conf | grep -v ^#
    log_filters="1:libvirt 1:qemu 1:conf 1:security 3:object 3:event 3:json 3:file 1:util 1:cpu"
    log_outputs="1:file:/var/log/libvirt/libvirtd.log"

(1) Fetch a script that'll create a custom CPU model called 'gate64':

    $ wget http://git.openstack.org/cgit/openstack-dev/devstack/plain/tools/cpu_map_update.py

(2) Update libvirt's CPU mapping file:

    $ sudo cpu_map_update.py /usr/share/libvirt/cpu_map.xml

(3) Restart the libvirt daemon:

    $ systemctl restart libvirtd

(4) Check whether the new CPU model ('gate64') is picked up by libvirt:

    $ sudo virsh cpu-models x86_64 | grep gate64
    gate64

(5) Define the attached test XML file (vm1.xml), and start it:

    $ virsh define vm1.xml
    $ virsh start vm1
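
For illustration, the effect of steps (1), (2) and (4) can be approximated
with the sketch below. It is not the devstack script itself: the XML string
is a tiny stand-in for /usr/share/libvirt/cpu_map.xml, and the feature list
given to 'gate64' is a hypothetical placeholder, as are the helper names.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for /usr/share/libvirt/cpu_map.xml (assumed, simplified layout).
CPU_MAP = """
<cpus>
  <arch name='x86'>
    <feature name='sse2'/>
    <feature name='lm'/>
    <model name='core2duo'>
      <feature name='sse2'/>
      <feature name='lm'/>
    </model>
  </arch>
</cpus>
"""


def add_custom_model(root, name, features, arch_name="x86"):
    """Append <model name=...> with the given features to one <arch>."""
    arch = next(
        (a for a in root.findall("arch") if a.get("name") == arch_name), None
    )
    if arch is None:
        raise ValueError("no <arch name=%r> in CPU map" % arch_name)
    if any(m.get("name") == name for m in arch.findall("model")):
        return  # already present, so running the update twice is harmless
    model = ET.SubElement(arch, "model", {"name": name})
    for feature in features:
        ET.SubElement(model, "feature", {"name": feature})


def model_names(root, arch_name="x86"):
    """List the model names of one arch (loosely what step (4) greps for)."""
    return [
        m.get("name")
        for a in root.findall("arch")
        if a.get("name") == arch_name
        for m in a.findall("model")
    ]


root = ET.fromstring(CPU_MAP)
add_custom_model(root, "gate64", ["sse2", "lm"])  # hypothetical feature set
print(model_names(root))
```

The point of the sketch is only that the custom model lives in libvirt's CPU
map and nowhere else; QEMU has no knowledge of it.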

Actual results
--------------

$ virsh start vm1
error: Failed to start domain vm1
error: internal error: process exited while connecting to monitor: 2016-05-25T14:33:59.497700Z qemu-system-x86_64: Unable to find CPU definition: gate64

Libvirt debug log attached.

Selected errors:

[...]
2016-05-25 14:53:30.535+0000: 27066: error : qemuProcessReportLogError:1813 : internal error: early end of file from monitor, possible problem: 2016-05-25T14:53:30.510513Z qemu-system-x86_64: Unable to find CPU definition: gate64
[...]
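
To pull such entries out of a large libvirtd debug log, something like the
following sketch can help. The regular expression is inferred only from the
log lines quoted in this report, so treat the field layout as an assumption;
the function name is made up for the example.

```python
import re

# Layout inferred from the log lines quoted in this bug:
# "<timestamp>: <thread id>: <level> : <function>:<line> : <message>"
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}\+\d{4}): "
    r"(?P<thread>\d+): (?P<level>\w+) : "
    r"(?P<function>\w+):(?P<line>\d+) : (?P<message>.*)$"
)


def error_entries(lines):
    """Yield a dict of fields for every line logged at 'error' level."""
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match and match.group("level") == "error":
            yield match.groupdict()


sample = [
    "2016-05-25 14:53:30.535+0000: 27066: error : "
    "qemuProcessReportLogError:1813 : internal error: early end of file "
    "from monitor, possible problem: 2016-05-25T14:53:30.510513Z "
    "qemu-system-x86_64: Unable to find CPU definition: gate64",
    "2016-05-25 14:00:36.267+0000: 27068: debug : cpuDataFree:299 : data=(nil)",
]

errors = list(error_entries(sample))
for entry in errors:
    print(entry["function"], entry["line"], entry["message"])
```

With a real log, `error_entries(open("/var/log/libvirt/libvirtd.log"))` would
iterate over the file configured in 'log_outputs' above.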

Expected results
----------------

libvirt should translate the custom CPU model into a CPU definition that
QEMU can understand, resulting in a successful start of the guest.


QEMU command-line
-----------------

2016-05-25 14:53:30.349+0000: 27069: debug : virCommandRunAsync:2429 : About to run LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm -name vm1,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-6-vm1/master-key.aes -machine pc-i440fx-2.6,accel=tcg,usb=off -cpu gate64 -m 2048 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 5b766c8d-25b2-4985-a05c-5839b1eaf8a8 -nographic -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-6-vm1/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -no-acpi -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -netdev tap,fd=27,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:a4:65:cf,bus=pci.0,addr=0x2 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 -msg timestamp=on
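
The relevant parts of this command line are '-cpu gate64' (the custom name is
passed through untranslated) and 'accel=tcg' in '-machine'. A small sketch for
extracting them from a logged command line follows; the string used here is an
abbreviated copy of the one above, and the helper name is invented for the
example.

```python
import shlex


def option_value(command_line, option):
    """Return the argument that follows `option`, or None if it is absent."""
    args = shlex.split(command_line)
    for index, arg in enumerate(args[:-1]):
        if arg == option:
            return args[index + 1]
    return None


# Abbreviated copy of the command line logged by virCommandRunAsync.
logged = (
    "LC_ALL=C QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm "
    "-name vm1,debug-threads=on -S "
    "-machine pc-i440fx-2.6,accel=tcg,usb=off -cpu gate64 -m 2048"
)

cpu = option_value(logged, "-cpu")
machine = option_value(logged, "-machine")
print("cpu model passed to QEMU:", cpu)
print("TCG in use:", "accel=tcg" in machine.split(","))
```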

Comment 1 Kashyap Chamarthy 2016-05-25 15:15:28 UTC
Created attachment 1161481
libvirt debug log when the error was triggered

Comment 2 Kashyap Chamarthy 2016-05-25 15:30:34 UTC
Created attachment 1161486
Modified libvirt cpu_map.xml

Comment 3 Jiri Denemark 2016-05-25 15:31:20 UTC
So it looks like the whole code which computes the right CPU model is skipped. The reason is <domain type='qemu'>. Our code avoids comparing guest CPU definition to host CPU for TCG mode (since the host CPU is irrelevant in this case). And as a side effect the code that would translate the gate64 CPU model into something that is supported by QEMU is skipped too.
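
A rough model of that side effect, as a sketch only: this is not libvirt's
code, and the function names, the set of QEMU models and the expansion of
gate64 are all hypothetical. It just shows how skipping the host comparison
for type='qemu' also skips the translation step that lives behind it.

```python
# Models QEMU itself knows about (assumed for the example).
QEMU_MODELS = {"qemu64", "core2duo"}

# Hypothetical CPU map entry: custom model -> (base model, feature tweaks).
CPU_MAP = {"gate64": ("core2duo", ["-vme"])}


def translate(model):
    """Turn a libvirt-only model into a base model plus feature flags."""
    if model in QEMU_MODELS:
        return model
    base, tweaks = CPU_MAP[model]
    return ",".join([base] + tweaks)


def cpu_argument(domain_type, model, skip_for_tcg=True):
    """Return the value that would end up after '-cpu'."""
    if domain_type == "qemu" and skip_for_tcg:
        # Host CPU is irrelevant for TCG, so the comparison is skipped,
        # and the translation that is part of the same path goes with it.
        return model
    return translate(model)


print(cpu_argument("qemu", "gate64"))                      # untranslated
print(cpu_argument("qemu", "gate64", skip_for_tcg=False))  # translated
```

In this model the fix amounts to running the translation regardless of the
domain type, while still leaving the host comparison out for TCG.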

Comment 4 Kashyap Chamarthy 2016-05-25 15:32:20 UTC
Additional info
---------------

When I try to reproduce the error via `virt-install`:

    $ sudo virsh cpu-models x86_64 | grep gate64
    gate64

    $ sudo virt-install --name cvm1 --ram 2048 \
        --disk path=/export/cirros-0.3.4.qcow2,format=qcow2 \
        --nographics --import --os-variant fedora22 --cpu gate64

    Starting install...
    ERROR    internal error: Unknown CPU model gate64
    Domain installation does not appear to have been successful.
    [...]

I get a specific error from the CPU driver in the libvirt debug log (the
reproducer provided in the Description does not produce such an error):

[...]
2016-05-25 14:00:36.267+0000: 27068: error : x86ModelFromCPU:889 : internal error: Unknown CPU model gate64
2016-05-25 14:00:36.267+0000: 27068: debug : cpuDataFree:299 : data=(nil)
2016-05-25 14:00:36.267+0000: 27068: debug : cpuDataFree:299 : data=(nil)
[...]

This seems to come from src/cpu/cpu_x86.c:

[...]
 878 static struct x86_model *
 879 x86ModelFromCPU(const virCPUDef *cpu,
 880                 const struct x86_map *map,
 881                 int policy)
 882 {
 883     struct x86_model *model = NULL;
 884     size_t i;
 885 
 886     if (policy == VIR_CPU_FEATURE_REQUIRE) {
 887         if ((model = x86ModelFind(map, cpu->model)) == NULL) {
 888             virReportError(VIR_ERR_INTERNAL_ERROR,
 889                            _("Unknown CPU model %s"), cpu->model);
 890             goto error;
 891         }
[...]

Comment 5 Kashyap Chamarthy 2016-05-25 20:04:49 UTC
Created attachment 1161612
libvirtd log with libvirt-1.2.2, where 'gate64' model was successfully translated into something that QEMU can recognize

Comment 6 Jiri Denemark 2016-05-26 07:10:33 UTC
(In reply to Kashyap Chamarthy from comment #4)
> [...]
> 2016-05-25 14:00:36.267+0000: 27068: error : x86ModelFromCPU:889 : internal
> error: Unknown CPU model gate64
> 2016-05-25 14:00:36.267+0000: 27068: debug : cpuDataFree:299 : data=(nil)
> 2016-05-25 14:00:36.267+0000: 27068: debug : cpuDataFree:299 : data=(nil)
> [...]

This is not very useful. Would you mind attaching a complete log with your virt-install reproducer so that we can check the difference?

Comment 7 Kashyap Chamarthy 2016-05-26 07:38:20 UTC
(In reply to Jiri Denemark from comment #6)
> (In reply to Kashyap Chamarthy from comment #4)
> > [...]
> > 2016-05-25 14:00:36.267+0000: 27068: error : x86ModelFromCPU:889 : internal
> > error: Unknown CPU model gate64
> > 2016-05-25 14:00:36.267+0000: 27068: debug : cpuDataFree:299 : data=(nil)
> > 2016-05-25 14:00:36.267+0000: 27068: debug : cpuDataFree:299 : data=(nil)
> > [...]
> 
> This is not very useful. Would you mind attaching a complete log with your
> virt-install reproducer so that we can check the difference?

I already did so yesterday, this is this attachment:

    https://bugzilla.redhat.com/attachment.cgi?id=1161612

    libvirtd log with libvirt-1.2.2, where 'gate64' model was 
    successfully translated into something that QEMU can recognize

Comment 8 Jiri Denemark 2016-05-26 08:30:51 UTC
(In reply to Kashyap Chamarthy from comment #7)
> 
> I already did so yesterday, this is this attachment:
> 
>     https://bugzilla.redhat.com/attachment.cgi?id=1161612
> 
>     libvirtd log with libvirt-1.2.2, where 'gate64' model was 
>     successfully translated into something that QEMU can recognize

No, this log just proves that it used to work with libvirt-1.2.2. But you mentioned that if you try with virt-install, you get an "Unknown CPU model gate64" error from libvirt. And a log showing this error is what I'm interested in.

BTW, this bug (in fact a regression) is caused by v1.2.9-31-g445a09b "qemu: Don't compare CPU against host for TCG".

Comment 9 Kashyap Chamarthy 2016-05-26 08:37:12 UTC
(In reply to Jiri Denemark from comment #8)
> (In reply to Kashyap Chamarthy from comment #7)
> > 
> > I already did so yesterday, this is this attachment:
> > 
> >     https://bugzilla.redhat.com/attachment.cgi?id=1161612
> > 
> >     libvirtd log with libvirt-1.2.2, where 'gate64' model was 
> >     successfully translated into something that QEMU can recognize
> 
> No, this log just proves that it used to work with libvirt-1.2.2. But you
> mentioned that if you try with virt-install, you get an "Unknown CPU model
> gate64" error from libvirt. And a log showing this error is what I'm
> interested in.

Duh, right -- I'll blame it on the coffee.  I have that log too; I forgot to attach it.

New attachment upcoming...


> BTW, this bug (in fact a regression) is caused by v1.2.9-31-g445a09b "qemu:
> Don't compare CPU against host for TCG".

Thanks for bisecting.  I'll go read the details.

Comment 10 Kashyap Chamarthy 2016-05-26 08:45:38 UTC
Created attachment 1161791
libvirt debug log obtained with a reproducer using `virt-install` (refer comment #4)

Comment 11 Jiri Denemark 2016-05-26 11:46:42 UTC
So the virt-install case is different because it uses <domain type='kvm'>. The only question is why it can't find gate64 model. But that's unrelated to this bug.

Comment 12 Kashyap Chamarthy 2016-05-26 12:25:49 UTC
(In reply to Jiri Denemark from comment #11)
> So the virt-install case is different because it uses <domain type='kvm'>.

Yep, you're right: It's a different code path with <domain type='kvm'>.

I just tested the `virt-install` case with the parameter '--virt-type qemu', and indeed, the errors match those from the reproducer in the bug description, which uses the test-case guest XML (with <domain type='qemu'>).

> The only question is why it can't find gate64 model. But that's unrelated to
> this bug.

Comment 13 Jiri Denemark 2016-09-22 13:55:15 UTC
This should be fixed as of

commit 7ce711a30eaf882ccd0217b2528362b563b6d670
Refs: v2.2.0-199-g7ce711a
Author:     Jiri Denemark <jdenemar>
AuthorDate: Wed Jun 22 15:53:48 2016 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Thu Sep 22 15:40:09 2016 +0200

    qemu: Update guest CPU def in live XML

    Storing the updated CPU definition in the live domain definition saves
    us from having to update it over and over when we need it. Not to
    mention that we will soon further update the CPU definition according to
    QEMU once it's started.

    A highly wanted side effect of this patch, libvirt will pass all CPU
    features explicitly specified in domain XML to QEMU, even those that are
    already included in the host model.

    This patch should fix the following bugs:
        https://bugzilla.redhat.com/show_bug.cgi?id=1207095
        https://bugzilla.redhat.com/show_bug.cgi?id=1339680
        https://bugzilla.redhat.com/show_bug.cgi?id=1371039
        https://bugzilla.redhat.com/show_bug.cgi?id=1373849
        https://bugzilla.redhat.com/show_bug.cgi?id=1375524
        https://bugzilla.redhat.com/show_bug.cgi?id=1377913

    Signed-off-by: Jiri Denemark <jdenemar>

