Bug 1795651 - Can't start or migrate s390x domains using old (~3-4 years) machine types
Summary: Can't start or migrate s390x domains using old (~3-4 years) machine types
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Virtualization Tools
Classification: Community
Component: libvirt
Version: unspecified
Hardware: s390x
OS: Linux
Target Milestone: ---
Assignee: Jiri Denemark
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-01-28 14:53 UTC by Christian Ehrhardt
Modified: 2020-02-07 12:34 UTC

Fixed In Version: libvirt-6.1.0
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-02-07 08:32:38 UTC
Embargoed:




Links
System ID          Private  Priority  Status  Summary  Last Updated
Launchpad 1861125  0        None      None    None     2020-01-28 14:53:06 UTC

Description Christian Ehrhardt 2020-01-28 14:53:06 UTC
Description of problem:
Libvirt defaults to modelling an s390x CPU, but some older machine types cannot consume a CPU model. As a result, starting guests defined with old XML, or migrating a guest from an old system to a new one, fails.

Version-Release number of selected component (if applicable):
- libvirt 6.0
- qemu 4.2

How reproducible:
100%

Steps to Reproduce:
1. Define a simple guest XML with an old machine type
$ cat breakme.xml
<domain type='kvm'>
  <name>breakme</name>
  <memory unit='KiB'>524288</memory>
  <os>
    <type arch='s390x' machine='s390-ccw-virtio-2.5'>hvm</type>
  </os>
</domain>
$ virsh define breakme.xml
Domain breakme defined from breakme.xml

2. Check the CPU model it got
The domain ends up with a host-model CPU on top of the old machine type:
root@testkvm-focal-to:~# virsh dumpxml breakme | grep cpu
  <cpu mode='host-model' check='partial'/>

3. That breaks on start:
# virsh start breakme
error: Failed to start domain breakme
error: internal error: qemu unexpectedly closed the monitor: 2020-01-28T14:37:27.477232Z qemu-system-s390x: CPU models are not available: KVM doesn't support CPU models


Actual results:
Fails to start the guest:
qemu-system-s390x: CPU models are not available: KVM doesn't support CPU models

Reason:
Libvirt generates a QEMU command line that cannot work with this machine type.


Expected results:
Guests with an old machine type can be started on and/or migrated to new systems.


Additional info:
I found this while testing migration across releases.
Migrating a guest that was initially started on Xenial (qemu 2.5) to a new virtualization stack (qemu 4.2, libvirt 6.0) fails.

The reason is that the old system defined the guest with the machine type current at the time, e.g.
  machine type: s390-ccw-virtio-2.5
The relevant part of the matching command line is:
  -machine s390-ccw-virtio-2.5,accel=kvm,usb=off
  -cpu (not present here)

But on the migration to the new stack this becomes:
  -machine s390-ccw-virtio-2.5,accel=kvm,usb=off (kept from the source of the migration)
  -cpu z13-base,aen=on,aefsi=on,... (modelled on the target)

The problem is that s390x machine types prior to qemu 2.8 had no CPU modelling.
Therefore it fails with:
  qemu-system-s390x: CPU models are not available: KVM doesn't support CPU models

Source XML:
<domain type='kvm' id='3'>
  <name>x-guest2</name>
  <uuid>c4545c20-5415-4f2c-86c7-df77eab1f6bb</uuid>
  <memory unit='KiB'>524288</memory>
  <currentMemory unit='KiB'>524288</currentMemory>
  <vcpu placement='static'>1</vcpu>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='s390x' machine='s390-ccw-virtio-2.5'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/qemu-system-s390x</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/uvtool/libvirt/images/x-guest2.qcow'/>
      <backingStore type='file' index='1'>
        <format type='qcow2'/>
        <source file='/var/lib/uvtool/libvirt/images/x-uvt-b64-Y29tLnVidW50dS5jbG91ZC5kYWlseTpzZXJ2ZXI6MTYuMDQ6czM5MHggMjAyMDAxMjE='/>
        <backingStore/>
      </backingStore>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0000'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/uvtool/libvirt/images/x-guest2-ds.qcow'/>
      <backingStore/>
      <target dev='vdb' bus='virtio'/>
      <alias name='virtio-disk1'/>
      <address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0001'/>
    </disk>
    <interface type='network'>
      <mac address='52:54:00:39:e7:fb'/>
      <source network='default' bridge='virbr0'/>
      <target dev='vnet1'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0002'/>
    </interface>
    <console type='pty' tty='/dev/pts/1'>
      <source path='/dev/pts/1'/>
      <target type='sclp' port='0'/>
      <alias name='console0'/>
    </console>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0003'/>
    </memballoon>
  </devices>
  <seclabel type='dynamic' model='apparmor' relabel='yes'>
    <label>libvirt-c4545c20-5415-4f2c-86c7-df77eab1f6bb</label>
    <imagelabel>libvirt-c4545c20-5415-4f2c-86c7-df77eab1f6bb</imagelabel>
  </seclabel>
</domain>

Source log/commandline:
2020-01-28 13:58:28.719+0000: starting up libvirt version: 1.3.1, package: 1ubuntu10.29 (Matthew Ruffell <matthew.ruffell> Thu, 31 Oct 2019 10:52:41 +1300), qemu version: 2.5.0 (Debian 1:2.5+dfsg-5ubuntu10.42), hostname: testkvm-xenial-from
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin QEMU_AUDIO_DRV=none /usr/bin/qemu-system-s390x -name x-guest2 -S -machine s390-ccw-virtio-2.5,accel=kvm,usb=off -m 512 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid c4545c20-5415-4f2c-86c7-df77eab1f6bb -nographic -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-x-guest2/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -drive file=/var/lib/uvtool/libvirt/images/x-guest2.qcow,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-ccw,scsi=off,devno=fe.0.0000,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/var/lib/uvtool/libvirt/images/x-guest2-ds.qcow,format=raw,if=none,id=drive-virtio-disk1 -device virtio-blk-ccw,scsi=off,devno=fe.0.0001,drive=drive-virtio-disk1,id=virtio-disk1 -netdev tap,fd=25,id=hostnet0 -device virtio-net-ccw,netdev=hostnet0,id=net0,mac=52:54:00:39:e7:fb,devno=fe.0.0002 -chardev pty,id=charconsole0 -device sclpconsole,chardev=charconsole0,id=console0 -device virtio-balloon-ccw,id=balloon0,devno=fe.0.0003 -msg timestamp=on
char device redirected to /dev/pts/1 (label charconsole0)

Target log/commandline:
2020-01-28 13:59:04.396+0000: starting up libvirt version: 6.0.0, package: 0ubuntu1~ppa2 (Christian Ehrhardt <christian.ehrhardt> Mon, 13 Jan 2020 13:14:14 +0100), qemu version: 4.2.0Debian 1:4.2-1ubuntu1~ppa4, kernel: 5.4.0-9-generic, hostname: testkvm-focal-to
LC_ALL=C \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin \
HOME=/var/lib/libvirt/qemu/domain-3-x-guest2 \
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-3-x-guest2/.local/share \
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-3-x-guest2/.cache \
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-3-x-guest2/.config \
QEMU_AUDIO_DRV=none \
/usr/bin/qemu-system-s390x \
-name guest=x-guest2,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-3-x-guest2/master-key.aes \
-machine s390-ccw-virtio-2.5,accel=kvm,usb=off,dump-guest-core=off \
-cpu z13-base,aen=on,aefsi=on,msa5=on,msa4=on,msa3=on,msa2=on,msa1=on,sthyi=on,edat=on,ri=on,edat2=on,vx=on,ipter=on,cei=on,ap=on,gpereh=on,esop=on,ib=on,siif=on,ibs=on,apqi=on,apft=on,sief2=on,apqci=on,cte=on,bpb=on,64bscao=on,ppa15=on,zpci=on,sea_esop2=on,te=on,cmm=on,gsls=on \
-m 512 \
-overcommit mem-lock=off \
-smp 1,sockets=1,cores=1,threads=1 \
-uuid c4545c20-5415-4f2c-86c7-df77eab1f6bb \
-display none \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=30,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc \
-no-shutdown \
-boot strict=on \
-blockdev '{"driver":"file","filename":"/var/lib/uvtool/libvirt/images/x-uvt-b64-Y29tLnVidW50dS5jbG91ZC5kYWlseTpzZXJ2ZXI6MTYuMDQ6czM5MHggMjAyMDAxMjE=","node-name":"libvirt-3-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-3-format","read-only":true,"driver":"qcow2","file":"libvirt-3-storage","backing":null}' \
-blockdev '{"driver":"file","filename":"/var/lib/uvtool/libvirt/images/x-guest2.qcow","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-2-format","read-only":false,"driver":"qcow2","file":"libvirt-2-storage","backing":"libvirt-3-format"}' \
-device virtio-blk-ccw,scsi=off,devno=fe.0.0000,drive=libvirt-2-format,id=virtio-disk0,bootindex=1 \
-blockdev '{"driver":"file","filename":"/var/lib/uvtool/libvirt/images/x-guest2-ds.qcow","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"driver":"raw","file":"libvirt-1-storage"}' \
-device virtio-blk-ccw,scsi=off,devno=fe.0.0001,drive=libvirt-1-format,id=virtio-disk1 \
-netdev tap,fd=32,id=hostnet0 \
-device virtio-net-ccw,netdev=hostnet0,id=net0,mac=52:54:00:39:e7:fb,devno=fe.0.0002 \
-chardev pty,id=charconsole0 \
-device sclpconsole,chardev=charconsole0,id=console0 \
-incoming defer \
-device virtio-balloon-ccw,id=balloon0,devno=fe.0.0003 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
char device redirected to /dev/pts/0 (label charconsole0)
2020-01-28T13:59:04.627166Z qemu-system-s390x: CPU models are not available: KVM doesn't support CPU models
2020-01-28 13:59:04.807+0000: shutting down, reason=failed

Migration command:
 $ virsh migrate --live x-guest2 qemu+ssh://10.226.99.253/system


---

I see no way to just remove the cpu type
  <cpu mode='host-model' check='partial'/>
as it is automatically added back.

On that note, host-passthrough already existed back then for the 2.5 machine type I use.
Changing the guest definition to
  <cpu mode='host-passthrough'/>
therefore makes it work.
So older machine types might want that as the default (instead of nothing).

Works:
$ cat works.xml
<domain type='kvm'>
  <name>breakme</name>
  <memory unit='KiB'>524288</memory>
  <cpu mode='host-passthrough'/>
  <os>
    <type arch='s390x' machine='s390-ccw-virtio-2.5'>hvm</type>
  </os>
</domain>
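
For existing definitions, the same workaround could be applied mechanically. The following is only a sketch using Python's standard xml.etree.ElementTree; the function name force_host_passthrough is hypothetical, and it assumes the domain XML is small enough to rewrite as a whole.

```python
import xml.etree.ElementTree as ET


def force_host_passthrough(domain_xml):
    """Return the domain XML with <cpu mode='host-passthrough'/> set explicitly."""
    root = ET.fromstring(domain_xml)
    cpu = root.find("cpu")
    if cpu is None:
        cpu = ET.SubElement(root, "cpu")
    # Drop leftovers from another CPU mode (e.g. check='partial', child elements).
    cpu.attrib.clear()
    for child in list(cpu):
        cpu.remove(child)
    cpu.set("mode", "host-passthrough")
    return ET.tostring(root, encoding="unicode")
```

The returned string can be written to a file and passed to virsh define, as with works.xml above.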


I guess we need some compat check on machine-type version vs the auto-addition of cpu host-model?

Comment 1 Jiri Denemark 2020-01-28 15:24:37 UTC
Libvirt should only add <cpu mode='host-model'/> default CPU to the domain
definition if query-machines QMP command returns "host-s390x-cpu" as
default-cpu-type for the machine type used by the domain.

Can you please check what QEMU started as

    /usr/bin/qemu-system-s390x -machine none,accel=kvm -nodefaults -nographic

tells about s390-ccw-virtio-2.5 machine type when you call the query-machines
QMP command?

I guess it says "default-cpu-type": "host-s390x-cpu". If this is the case, we
either need to fix QEMU not to advertise default CPUs for machine type for
which using CPU models would not work or we need to add some kind of hack to
libvirt. I'd prefer if QEMU could stop reporting the default cpu as it has the
best knowledge about machine types which support CPU modeling.

Comment 2 David Hildenbrand 2020-01-28 16:34:35 UTC
(In reply to Jiri Denemark from comment #1)
> Libvirt should only add <cpu mode='host-model'/> default CPU to the domain
> definition if query-machines QMP command returns "host-s390x-cpu" as
> default-cpu-type for the machine type used by the domain.
> 
> Can you please check what QEMU started as
> 
>     /usr/bin/qemu-system-s390x -machine none,accel=kvm -nodefaults -nographic
> 
> tells about s390-ccw-virtio-2.5 machine type when you call the query-machines
> QMP command?
> 
> I guess it says "default-cpu-type": "host-s390x-cpu". If this is the case, we
> either need to fix QEMU not to advertise default CPUs for machine type for
> which using CPU models would not work or we need to add some kind of hack to
> libvirt. I'd prefer if QEMU could stop reporting the default cpu as it has the
> best knowledge about machine types which support CPU modeling.

Well ... "-cpu host" works just fine under any machine (even without cpu model support). So I don't think "default-cpu-type=host-s390x-cpu" is wrong for older machine types.

Comment 3 David Hildenbrand 2020-01-28 16:45:16 UTC
I guess the issue here is that there is no way to identify if a QEMU machine supports CPU models - except trying to start something very basic like "-cpu z900" and getting told that it does not work.

Nobody ever stumbled over this, because with the old QEMU versions/machines, I guess there weren't any real enterprise users. Migration without CPU model support is not guaranteed to work either way - especially when migrating between different HW/Hypervisors.

Having that said, I don't think introducing new interfaces (e.g., exporting if a QEMU machine supports CPU models) is worth it. We could blacklist the affected QEMU machines in libvirt, but yeah, that's always hacky.

Comment 4 David Hildenbrand 2020-01-28 16:48:38 UTC
Oh, and the lack of CPU model support in KVM on rather old kernels adds another level of complexity. Any QEMU machine won't be able to start anything besides "-cpu host" in case KVM misses the right interfaces.

Comment 5 Christian Ehrhardt 2020-01-29 07:18:15 UTC
I think the discussion has already moved on, with David explaining why, in his opinion, we might just consider this unsupported/Won't-Fix. For completeness, here is the info that Jiri asked for.

Just running on QMP:
{ "execute": "qmp_capabilities" }
{ "execute": "query-machines" }
Below is the entry for s390-ccw-virtio-2.5, the machine type used in the reported case.

Old qemu 2.5:
{"name": "s390-ccw-virtio-2.5", "cpu-max": 255}

New qemu 4.2:
{"hotpluggable-cpus": true, "name": "s390-ccw-virtio-2.5", "numa-mem-supported": false, "default-cpu-type": "host-s390x-cpu", "cpu-max": 248, "deprecated": false}

So the assumption that the old type now reports "default-cpu-type": "host-s390x-cpu" was correct.
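
For anyone repeating this check, a small Python sketch for pulling the value out of a query-machines reply. The function name default_cpu_type is hypothetical, and the {"return": [...]} wrapper is the usual QMP reply envelope, which the excerpts above omit; the two entries are copied from the output above.

```python
import json


def default_cpu_type(response_text, machine):
    """Return the default-cpu-type reported for `machine`, or None if absent."""
    response = json.loads(response_text)
    for entry in response.get("return", []):
        if entry.get("name") == machine:
            return entry.get("default-cpu-type")
    return None


# Entries taken from the qemu 2.5 and qemu 4.2 replies quoted above.
OLD_QEMU = '{"return": [{"name": "s390-ccw-virtio-2.5", "cpu-max": 255}]}'
NEW_QEMU = (
    '{"return": [{"hotpluggable-cpus": true, "name": "s390-ccw-virtio-2.5", '
    '"numa-mem-supported": false, "default-cpu-type": "host-s390x-cpu", '
    '"cpu-max": 248, "deprecated": false}]}'
)

print(default_cpu_type(OLD_QEMU, "s390-ccw-virtio-2.5"))  # None
print(default_cpu_type(NEW_QEMU, "s390-ccw-virtio-2.5"))  # host-s390x-cpu
```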



@David: while "Migration without CPU model support is not guaranteed to work either way - especially when migrating between different HW/Hypervisors" is true, it was working quite well the last few years as long as you stayed on the same machine (which is common and trivial on s390x due to the LPARs as you know).
And about "weren't any real enterprise users", you know how this works on the mainframe - you have very few early adopters and if you burn them too much no one is left :-)


I agree that the current QMP answer of "host-s390x-cpu" seems correct and adding a new interface definitely seems too much for this.
But if a quirk in libvirt to detect the set of older machine-types to handle them differently wouldn't be too hard I'd appreciate that solution (vs a Won't Fix) for sure.

Comment 6 Jiri Denemark 2020-01-29 12:30:26 UTC
Yeah, that's what I figured too. The current "host-s390x-cpu" answer is
correct as using -cpu host will work and thus changing this in QEMU looks
inappropriate. On the other hand, I believe libvirt is the only consumer of
this API :-) Anyway, the question we need to answer is whether we should use
-cpu host or -cpu $HOST_MODEL for a given machine type. Do you have a complete
list of machine types which do not work with host-model?

Comment 7 David Hildenbrand 2020-01-31 13:42:52 UTC
Oh, that question was for me, missed it :)

Basically everything older than 2.8

s390-ccw-virtio-2.4
s390-ccw-virtio-2.5
s390-ccw-virtio-2.6
s390-ccw-virtio-2.7

+ some distro forks of that (Christian might want to comment on that).

The "missing KVM support" is harder to catch. It can strike with any machine (on fairly old kernels).

Comment 8 Christian Ehrhardt 2020-02-01 09:36:18 UTC
Ack - the list above is complete for the upstream source - thanks David.

On downstreams we would have to also match that to s390-ccw-virtio-xenial (and other distros might have other cases), but that obviously would not be part of an upstream commit.

And further for downstreams - Ubuntu as an example - >2.8 will actually turn into 2.11 as that is the next version still available and supported.
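
To illustrate the kind of quirk being discussed, here is a rough Python sketch of the decision between -cpu host (host-passthrough) and the host model. It is not libvirt's code; the names NO_CPU_MODEL_MACHINES, default_cpu_mode and the extra_old_machines parameter are hypothetical, and the upstream list is the one given in comment 7.

```python
# Upstream machine types without CPU model support, as listed in comment 7.
NO_CPU_MODEL_MACHINES = frozenset({
    "s390-ccw-virtio-2.4",
    "s390-ccw-virtio-2.5",
    "s390-ccw-virtio-2.6",
    "s390-ccw-virtio-2.7",
})


def default_cpu_mode(machine, default_cpu_type, extra_old_machines=()):
    """Pick a default CPU mode for a domain that configures no CPU.

    Returns None when QEMU reports no host CPU as the machine's default.
    `extra_old_machines` lets a downstream add its own names.
    """
    if default_cpu_type != "host-s390x-cpu":
        return None
    if machine in NO_CPU_MODEL_MACHINES or machine in extra_old_machines:
        return "host-passthrough"  # corresponds to -cpu host
    return "host-model"
```

A downstream such as Ubuntu would pass its own names, e.g. default_cpu_mode("s390-ccw-virtio-xenial", "host-s390x-cpu", extra_old_machines={"s390-ccw-virtio-xenial"}). The "missing KVM support" case from comment 7 is not covered by a list like this.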

Comment 9 Jiri Denemark 2020-02-06 10:25:56 UTC
Patches sent upstream for review: https://www.redhat.com/archives/libvir-list/2020-February/msg00241.html

Comment 10 Jiri Denemark 2020-02-07 08:32:38 UTC
This is now fixed by

commit c6ff3d1535480ef7fac8c9911ad7715f7d0e2acb
Refs: v6.0.0-322-gc6ff3d1535
Author:     Jiri Denemark <jdenemar>
AuthorDate: Thu Feb 6 10:22:23 2020 +0100
Commit:     Jiri Denemark <jdenemar>
CommitDate: Fri Feb 7 09:19:02 2020 +0100

    qemu_capabilities: Disable CPU models on old s390 machine types

    Starting a KVM domain on s390 with old machine type (such as
    s390-ccw-virtio-2.5) and without any guest CPU model configured fails
    with

        CPU models are not available: KVM doesn't support CPU models

    QEMU error. This is caused by libvirt using host-model CPU as the default
    CPU based on QEMU reporting "host" CPU model as being the default one
    (see commit v5.9.0-402-g24d8202294: qemu: Use host-model CPU on s390 by
    default). However, even though both QEMU and KVM support CPU models on
    s390 and QEMU can give us the host-model CPU, we can't use it with old
    machine types which only support -cpu host.

    https://bugzilla.redhat.com/show_bug.cgi?id=1795651

    Reported-by: Christian Ehrhardt <paelzer>
    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Ján Tomko <jtomko>

Comment 11 Christian Ehrhardt 2020-02-07 12:34:38 UTC
Tests:
- start old common type s390-ccw-virtio-2.5
- start old Ubuntu type s390-ccw-virtio-xenial
- migrate from old installation that was pre 2.8

Comparing:
a) libvirt 6.0 (6.0.0-0ubuntu2)
b) libvirt 6.0 + this series (6.0.0-0ubuntu3~test1)

a) failed in all cases with the expected
   qemu-system-s390x: CPU models are not available: KVM doesn't support CPU models
b) all three cases worked fine now


Special case:
The previously defined "breakme" domains had this element added when they were defined:
  <cpu mode='host-model' check='partial'/>
Therefore they now fail with:
  error: Failed to start domain breakme
  error: unsupported configuration: CPU mode 'host-model' for s390x kvm domain on s390x host is not supported by hypervisor

If I undefine the domain and define it again from the template as reported in the BZ
  <domain type='kvm'>
    <name>breakme</name>
    <memory unit='KiB'>524288</memory>
    <os>
      <type arch='s390x' machine='s390-ccw-virtio-2.5'>hvm</type>
    </os>
  </domain>
I now get after define:
  <cpu mode='host-passthrough' check='none'/>
So it correctly detected the old machine type.

A cross-check using a new type like s390-ccw-virtio-4.0 or s390-ccw-virtio-eoan worked fine and gave me the expected
  <cpu mode='host-model' check='partial'/>

Replying to the list with Tested-by.
Thanks for the patches.

