Bug 1495171 - Post libvirt upgrade to 3.2.0-14, migration fails with -- "can't apply global Haswell-noTSX-x86_64-cpu.cmt=off: Property '.cmt' not found"
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.2
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: pre-dev-freeze
Target Release: ---
Assignee: Jiri Denemark
QA Contact: zhe peng
URL:
Whiteboard:
Keywords: ZStream
Duplicates: 1497320
Depends On:
Blocks: libvirtCPUconfig 1508549
Reported: 2017-09-25 12:03 UTC by Sergii Mykhailushko
Modified: 2018-04-10 10:58 UTC (History)

Doc Text: Previously, the libvirt service in some cases added the "cmt" CPU feature, which is incompatible with the QEMU emulator, to KVM guest virtual machines with the CPU set to "host-model". As a consequence, migrating or restoring these guests failed. With this update, libvirt no longer adds "cmt" to domain features and automatically removes "cmt" from the guest configuration if present. As a result, the affected guests can be migrated and restored correctly.
Clone Of:
Clones: 1508010 1508549
Last Closed: 2018-04-10 10:57:19 UTC


Attachments
To be migrated Nova instance XML from source Compute node (5.58 KB, text/plain)
2017-09-28 17:58 UTC, Kashyap Chamarthy
Nova instance guest XML at the time of launch (2017-01-20 10:06:44) on the source host (4.60 KB, text/plain)
2017-09-29 09:28 UTC, Kashyap Chamarthy


External Trackers
Tracker: Red Hat Product Errata
ID: RHEA-2018:0704
Priority: None
Status: None
Summary: None
Last Updated: 2018-04-10 10:58 UTC

Description Sergii Mykhailushko 2017-09-25 12:03:22 UTC
Description of problem:

The customer is applying the latest updates to their RHOSP8 overcloud. Each compute node has to be rebooted to apply the updates, so they live-migrate the instances between the upgraded compute nodes to avoid guest downtime. The migration fails with libvirt errors about a disabled CPU property that the hypervisor does not recognize:

~~~
2017-09-22 16:29:52.457 50466 ERROR nova.virt.libvirt.driver [req-d8675da4-9c0d-4993-9850-beae5159f008 5ebbe90c089647d9a5a9ad55cb67fed5 1058aeb310b944929945a7b177fd66f3 - - -] [instance: 92a68a77-6e75-43e5-9e7f-1392838ef18f] Live Migration failure: internal error: qemu unexpectedly closed the monitor: 2017-09-22T15:29:52.179516Z qemu-kvm: -chardev pty,id=charserial1: char device redirected to /dev/pts/5 (label charserial1)
2017-09-22T15:29:52.198139Z qemu-kvm: can't apply global Haswell-noTSX-x86_64-cpu.cmt=off: Property '.cmt' not found
2017-09-22 16:29:52.577 50466 ERROR nova.virt.libvirt.driver [req-d8675da4-9c0d-4993-9850-beae5159f008 5ebbe90c089647d9a5a9ad55cb67fed5 1058aeb310b944929945a7b177fd66f3 - - -] [instance: 92a68a77-6e75-43e5-9e7f-1392838ef18f] Migration operation has aborted
~~~


Version-Release number of selected component (if applicable):

Same on both source and target nodes

libvirt-3.2.0-14.el7_4.3.x86_64 
openstack-nova-api-12.0.6-17.el7ost.noarch
openstack-nova-cert-12.0.6-17.el7ost.noarch
openstack-nova-common-12.0.6-17.el7ost.noarch
openstack-nova-compute-12.0.6-17.el7ost.noarch
openstack-nova-conductor-12.0.6-17.el7ost.noarch
openstack-nova-console-12.0.6-17.el7ost.noarch
openstack-nova-migration-12.0.6-17.el7ost.noarch
openstack-nova-novncproxy-12.0.6-17.el7ost.noarch
openstack-nova-scheduler-12.0.6-17.el7ost.noarch
python-nova-12.0.6-17.el7ost.noarch
python-novaclient-3.1.0-2.el7ost.noarch


Expected results:

Live migration succeeds, so instances can be moved off a node without downtime.

Comment 2 Kashyap Chamarthy 2017-09-28 17:36:11 UTC
First off, from looking at the logs, the source libvirt is NOT properly upgraded to libvirt-3.2.0-14.el7_4.3.x86_64.  Let's see why.

You see the libvirt version on the source host in the logs as 'libvirt-3.2.0-14.el7_4.3.x86_64' because the upgrade was performed _while_ the Nova instance was still running on the source host with the old libvirt / QEMU versions.

So the following are the libvirt and QEMU versions actually in use on the source and destination hosts for the Nova instance 92a68a77-6e75-43e5-9e7f-1392838ef18f ('instance-000010c8') that is being migrated.  (I found them by looking at the relevant QEMU command lines for the Nova instance in /var/log/libvirt/qemu/instance-000010c8.log on the source and destination hosts):

  - Source (cl-ra15-n8.mgt.cluster):
     - libvirt version: 1.2.17, package: 13.el7_2.4
     - qemu-kvm-rhev-2.3.0-31.el7_2.13

  - Destination (cl-ra15-n24.mgt.cluster):
     - libvirt version: 3.2.0, package: 14.el7_4.3
     - qemu-kvm-rhev-2.9.0-10.el7
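
The version check described above can be scripted. The following is a minimal sketch, assuming the per-domain log keeps the "starting up libvirt version: ..." line format quoted in this bug; the regular expression and the function name parse_startup_line are illustrative and not part of libvirt.

```python
import re

# Pattern modelled on the "starting up" lines quoted in this bug report.
# It is an assumption that other libvirt builds log the same layout.
STARTUP_RE = re.compile(
    r"starting up libvirt version: (?P<libvirt>[\d.]+), "
    r"package: (?P<package>\S+) .*?"
    r"qemu version: (?P<qemu>[\d.]+)\s*\((?P<qemu_pkg>[^)]+)\)"
)


def parse_startup_line(line):
    """Return the libvirt/QEMU versions found in one log line, or None."""
    match = STARTUP_RE.search(line)
    return match.groupdict() if match else None


def last_startup(log_text):
    """Return the versions from the most recent 'starting up' line in a log."""
    found = None
    for line in log_text.splitlines():
        parsed = parse_startup_line(line)
        if parsed is not None:
            found = parsed
    return found
```

Running last_startup() on the contents of the domain log from each host gives the versions the guest was really started with, which may differ from the installed packages if the host was upgraded while the guest kept running.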

Comment 3 Kashyap Chamarthy 2017-09-28 17:39:37 UTC
The following is the QEMU command-line for the Nova instance
'92a68a77-6e75-43e5-9e7f-1392838ef18f' ('instance-000010c8') from both
source and destination hosts.


QEMU invocation on source host:
-----------------------------------------------------------------------
2017-01-20 10:06:45.332+0000: starting up libvirt version: 1.2.17, package: 13.el7_2.4 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2016-03-02-11:10:27, x86-034.build.eng.bos.redhat.com), qemu version: 2.3.0 (qemu-kvm-rhev-2.3.0-31.el7_2.13)
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name instance-000010c8 -S -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off -cpu Haswell-noTSX,+abm,+pdpe1gb,+rdrand,+f16c,+osxsave,+dca,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme -m 4096 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 92a68a77-6e75-43e5-9e7f-1392838ef18f -smbios type=1,manufacturer=Red Hat,product=OpenStack Compute,version=12.0.3-1.el7ost,serial=d336b2a1-4abd-4602-9bc4-4aeb20f778e7,uuid=92a68a77-6e75-43e5-9e7f-1392838ef18f,family=Virtual Machine -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-instance-000010c8/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/nova/instances/92a68a77-6e75-43e5-9e7f-1392838ef18f/disk,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:12:82:a5,bus=pci.0,addr=0x3 -netdev tap,fd=28,id=hostnet1,vhost=on,vhostfd=29 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=fa:16:3e:df:df:79,bus=pci.0,addr=0x4 -chardev file,id=charserial0,path=/var/lib/nova/instances/92a68a77-6e75-43e5-9e7f-1392838ef18f/console.log -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -device usb-tablet,id=input0 -vnc 0.0.0.0:0 -k en-us -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -msg timestamp=on
char device redirected to /dev/pts/0 (label charserial1)
-----------------------------------------------------------------------

QEMU invocation on destination host:
-----------------------------------------------------------------------
2017-09-22 15:29:52.078+0000: starting up libvirt version: 3.2.0, package: 14.el7_4.3 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2017-08-22-08:54:01, x86-039.build.eng.bos.redhat.com), qemu version: 2.9.0(qemu-kvm-rhev-2.9.0-10.el7), hostname: cl-ra15-n24.mgt.cluster
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name guest=instance-000010c8,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-11-instance-000010c8/master-key.aes -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off,dump-guest-core=off -cpu Haswell-noTSX,vme=on,ds=off,acpi=off,ss=on,ht=off,tm=off,pbe=off,dtes64=off,monitor=off,ds_cpl=off,vmx=off,smx=off,est=off,tm2=off,xtpr=off,pdcm=off,dca=off,osxsave=off,f16c=on,rdrand=on,arat=off,tsc_adjust=off,cmt=off,xsaveopt=on,pdpe1gb=on,abm=on,hypervisor=on -m 4096 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 92a68a77-6e75-43e5-9e7f-1392838ef18f -smbios 'type=1,manufacturer=Red Hat,product=OpenStack Compute,version=12.0.3-1.el7ost,serial=d336b2a1-4abd-4602-9bc4-4aeb20f778e7,uuid=92a68a77-6e75-43e5-9e7f-1392838ef18f,family=Virtual Machine' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-11-instance-000010c8/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/var/lib/nova/instances/92a68a77-6e75-43e5-9e7f-1392838ef18f/disk,format=qcow2,if=none,id=drive-virtio-disk0,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=31,id=hostnet0,vhost=on,vhostfd=33 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:12:82:a5,bus=pci.0,addr=0x3 -netdev tap,fd=34,id=hostnet1,vhost=on,vhostfd=35 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=fa:16:3e:df:df:79,bus=pci.0,addr=0x4 -netdev tap,fd=36,id=hostnet2,vhost=on,vhostfd=37 -device virtio-net-pci,netdev=hostnet2,id=net2,mac=fa:16:3e:5a:b8:c4,bus=pci.0,addr=0x7 -add-fd set=6,fd=39 -chardev file,id=charserial0,path=/dev/fdset/6,append=on -device 
isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -device usb-tablet,id=input0 -vnc 0.0.0.0:5 -k en-us -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -incoming defer -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -msg timestamp=on
2017-09-22T15:29:52.179516Z qemu-kvm: -chardev pty,id=charserial1: char device redirected to /dev/pts/5 (label charserial1)
2017-09-22T15:29:52.198139Z qemu-kvm: can't apply global Haswell-noTSX-x86_64-cpu.cmt=off: Property '.cmt' not found
2017-09-22 15:29:52.228+0000: shutting down, reason=failed
-----------------------------------------------------------------------

Comment 4 Kashyap Chamarthy 2017-09-28 17:58 UTC
Created attachment 1332074 [details]
To be migrated Nova instance XML from source Compute node

Comment 5 Kashyap Chamarthy 2017-09-28 18:06:48 UTC
[Drilling down a bit further.  Also thanks to Daniel Berrangé for help 
in investigating this so far.]

If you look at the Nova instance XML
(https://bugzilla.redhat.com/attachment.cgi?id=1332074) prepared by
libvirt on the source host, you'll see:

    <cpu mode="custom" match="exact" check="partial">
      <model fallback="allow">Haswell-noTSX</model>
      [...]
      <feature policy="require" name="cmt"/>
      [...]

And the original arguments used to boot the guest did not say anything
about the 'cmt' feature; the guest was simply started with:

    [...]
    -cpu Haswell-noTSX,+abm,+pdpe1gb,+rdrand,+f16c,+osxsave,+dca,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme
    [...]
    

But when migrated to the destination host, libvirt is trying to run:

    [...]
    -cpu Haswell-noTSX,vme=on,ds=off,acpi=off,ss=on,ht=off,tm=off,pbe=off,dtes64=off,monitor=off,ds_cpl=off,vmx=off,smx=off,est=off,tm2=off,xtpr=off,pdcm=off,dca=off,osxsave=off,f16c=on,rdrand=on,arat=off,tsc_adjust=off,cmt=off,xsaveopt=on,pdpe1gb=on,abm=on,hypervisor=on
    [...]

Notice the "cmt=off" bit above.

So despite having "<feature policy="require" name="cmt"/>" in the XML,
libvirt still seems to set 'cmt' to 'off' on the destination.
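
To make the comparison of the two '-cpu' values less error-prone than reading them by eye, here is a small sketch. It assumes only the two syntaxes that appear in this bug ("+name" / "-name" and "name=on" / "name=off"); the helper names parse_cpu_arg and only_in_second are made up for illustration.

```python
def parse_cpu_arg(arg):
    """Split a QEMU -cpu value into (model, {feature_name: enabled})."""
    model, *flags = arg.split(",")
    features = {}
    for flag in flags:
        if flag.startswith("+"):
            # Old-style syntax: +feature enables it.
            features[flag[1:]] = True
        elif flag.startswith("-"):
            # Old-style syntax: -feature disables it.
            features[flag[1:]] = False
        else:
            # New-style syntax: feature=on / feature=off.
            name, _, value = flag.partition("=")
            features[name] = value == "on"
    return model, features


def only_in_second(first, second):
    """Feature names mentioned in the second feature map but not the first."""
    return sorted(set(second) - set(first))
```

Feeding the source and destination '-cpu' values quoted above through parse_cpu_arg() and then only_in_second() lists the features that only the destination command line mentions, and 'cmt' is one of them.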


Jiri: any insights here?

Comment 6 Kashyap Chamarthy 2017-09-29 09:08:20 UTC
From comment#5, it seems that the new libvirt on the source somehow thinks it needs 'cmt=on' (refer: https://bugzilla.redhat.com/attachment.cgi?id=1332074) in the guest XML when it wasn't there originally -- notice this from the '-cpu' QEMU command-line of the Nova instance on the source host.

---

Just noting for context, a related (fixed) bug:

    https://bugzilla.redhat.com/show_bug.cgi?id=1365500 -- 
    CPU feature cmt not found with 2.0.0-1

Comment 7 Kashyap Chamarthy 2017-09-29 09:28 UTC
Created attachment 1332325 [details]
Nova instance guest XML at the time of launch (2017-01-20 10:06:44) on the source host

This was obtained from:

   cl-ra15-n8.mgt.cluster/etc/libvirt/qemu/instance-000010c8.xml

Comment 8 Eduardo Habkost 2017-09-29 16:10:29 UTC
I was going to suggest shutting down the VM on the source and removing "cmt" from the domain XML.  However, this is interesting:

(In reply to Kashyap Chamarthy from comment #5)
[...]
> But when migrated to the destination host, libvirt is trying to run:
> 
>     [...]
>     -cpu
> Haswell-noTSX,vme=on,ds=off,acpi=off,ss=on,ht=off,tm=off,pbe=off,dtes64=off,
> monitor=off,ds_cpl=off,vmx=off,smx=off,est=off,tm2=off,xtpr=off,pdcm=off,
> dca=off,osxsave=off,f16c=on,rdrand=on,arat=off,tsc_adjust=off,cmt=off,
> xsaveopt=on,pdpe1gb=on,abm=on,hypervisor=on
>     [...]
> 
> Notice the "cmt=off" bit above.
> 
> So despite having "<feature policy="require" name="cmt"/>", libvirt 
> seems to be still setting 'cmt' to 'off' on the destination.
> 

It looks like libvirt knows it needs to remove "cmt" when migrating, which is really good news, so maybe we don't need to shut down the VM and edit the domain XML on the source to fix the problem.

But for some reason libvirt believes "cmt=off" is necessary on the destination.  I hope Jiri can help find a workaround or a fix for this.

Comment 10 Jiri Denemark 2017-10-05 14:27:54 UTC
The domain was started on libvirt-1.2.17-13.el7_2.4 and qemu-kvm-rhev-2.3.0-31.el7_2.13.

While the domain was running, the host was upgraded to RHEL 7.4, which means libvirt-3.2.0-14.el7_4.3 and qemu-kvm-rhev-2.9.0-10.el7.

At this point new libvirt reconnected to the running domain (which was still using qemu-kvm-rhev-2.3.0-31.el7_2.13). Because the old libvirt didn't replace the host-model CPU with the actual custom CPU definition which was used when starting a domain (old libvirt used to do this replacement on demand rather than doing it just once and storing the result in live definition), the new libvirt needs to do this replacement while reconnecting to the running domain (this was addressed in bug 1470582).

We mimic what the old libvirt did when starting the domain by using the host CPU from capabilities XML as a base. Since the new libvirt knows such CPU model would almost never match the CPU QEMU actually provided to a guest, we ask QEMU for all enabled CPUID bits and disable all features in the base CPU model which were not enabled by QEMU. This way we can get a CPU definition which matches the virtual CPU seen by the guest OS.
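
As a rough illustration of the reconnect logic described above (this is not libvirt's actual C code), the following sketch takes the host CPU features as the base and marks every feature QEMU did not report as enabled as disabled. The function name and the example feature sets are hypothetical.

```python
def reconcile_with_qemu(host_features, qemu_enabled):
    """Map each feature of the base (host) CPU to 'require' or 'disable'.

    host_features: feature names taken from the host CPU definition.
    qemu_enabled:  feature names QEMU reports as enabled for the running guest.
    """
    return {
        name: ("require" if name in qemu_enabled else "disable")
        for name in sorted(host_features)
    }


# Hypothetical example: the host CPU definition lists 'cmt', but QEMU never
# reports it as enabled, so it ends up with policy 'disable'.
example = reconcile_with_qemu({"vme", "vmx", "cmt"}, {"vme"})
```

Under these assumptions a feature such as 'cmt', which appears in the host CPU definition but is never reported by QEMU, receives an explicit 'disable' policy. That matches the "cmt=off" seen on the destination command line.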

The thing is, "To be migrated Nova instance XML from source Compute node" contains no disabled features, and yet several of them are disabled when the domain gets started on the destination.

I'll try to reproduce this issue once I get a machine with cmt.

Comment 11 Jiri Denemark 2017-10-05 16:12:58 UTC
I see, the "to be migrated XML" is a migratable XML rather than the active XML
of the domain. This XML is transferred during migration to the destination
host for backward compatibility. The libvirt from RHEL-7.4 on the destination
host will use the active XML which contains the disabled features and thus we
can see a lot of disabled features on the command line there.

So the main problem here is that the process of translating host-model to a
custom mode CPU when libvirt reconnects to existing domains started by old
libvirt does not account for the bloody cmt feature which is only known to
libvirt.

Comment 12 Jiri Denemark 2017-10-05 16:55:34 UTC
There is a simple, although very hackish workaround:

1. systemctl stop libvirtd
2. edit /var/run/libvirt/qemu/instance-000010c8.xml and remove all lines with <feature ... name='cmt'/>
3. systemctl start libvirtd
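
Step 2 can be done with a text editor. For anyone who prefers to script it, here is a minimal sketch that only transforms XML text passed in as a string; it does not touch any file and does not replace steps 1 and 3. The function name strip_feature is made up for illustration, and the result should be reviewed before it is written back.

```python
import xml.etree.ElementTree as ET


def strip_feature(xml_text, feature="cmt"):
    """Return xml_text with every <feature name='FEATURE'/> element removed."""
    root = ET.fromstring(xml_text)
    # Walk over a snapshot of all elements so removals do not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "feature" and child.get("name") == feature:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")
```

Note that ElementTree may normalise quoting and whitespace when serialising, so the output is equivalent XML rather than a byte-for-byte copy with lines deleted.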

Comment 13 Kashyap Chamarthy 2017-10-06 10:45:53 UTC
(In reply to Jiri Denemark from comment #11)
> I see, the "to be migrated XML" is a migratable XML rather than the active
> XML of the domain. 

Indeed, that's the correct interpretation.  I tried to be clear when describing that XML. :-)

> This XML is transferred during migration to the destination
> host for backward compatibility. The libvirt from RHEL-7.4 on the destination
> host will use the active XML which contains the disabled features and thus we
> can see a lot of disabled features on the command line there.

Ah-ha, thanks for confirming that.

> So the main problem here is that the process of translating host-model to a
> custom mode CPU when libvirt reconnects to existing domains started by old
> libvirt does not account for the bloody cmt feature which is only known to
> libvirt.

Heh, indeed. 

And indeed your workaround is what I recommended to the bug reporter on IRC.  That seems to be the only possible way to fix this _without_ guest down-time.

Thanks for the overall analysis.

Comment 14 Kashyap Chamarthy 2017-10-06 11:35:56 UTC
*** Bug 1497320 has been marked as a duplicate of this bug. ***

Comment 15 Jiri Denemark 2017-10-06 12:00:54 UTC
(In reply to Kashyap Chamarthy from comment #13)
> And indeed your workaround is what I recommended to the bug reporter on IRC.
> That seems to be the only possible way to fix this _without_ guest down-time.

Yeah, until this bug is fixed, in which case upgrading libvirt should solve the issue automatically.

Comment 17 Jiri Denemark 2017-10-11 10:12:54 UTC
Patches sent upstream for review: https://www.redhat.com/archives/libvir-list/2017-October/msg00432.html

Comment 18 Jiri Denemark 2017-10-17 13:21:05 UTC
This bug is now fixed upstream by

commit 4b87b3675ffd7794542de70da4391f3787ed76a2
Refs: v3.8.0-132-g4b87b3675f
Author:     Jiri Denemark <jdenemar@redhat.com>
AuthorDate: Fri Oct 6 12:57:15 2017 +0200
Commit:     Jiri Denemark <jdenemar@redhat.com>
CommitDate: Tue Oct 17 15:08:05 2017 +0200

    qemu: Separate CPU updating code from qemuProcessReconnect

    The new function is called qemuProcessRefreshCPU.

    Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
    Reviewed-by: Pavel Hrdina <phrdina@redhat.com>

commit 3276416904393a06df664c5d849ee805d07688d8
Refs: v3.8.0-133-g3276416904
Author:     Jiri Denemark <jdenemar@redhat.com>
AuthorDate: Mon Oct 9 16:20:43 2017 +0200
Commit:     Jiri Denemark <jdenemar@redhat.com>
CommitDate: Tue Oct 17 15:08:05 2017 +0200

    conf: Introduce virCPUDefFindFeature

    Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
    Reviewed-by: Pavel Hrdina <phrdina@redhat.com>

commit e26cc8f82ff346c9ec90409bac06581b64e42b20
Refs: v3.8.0-134-ge26cc8f82f
Author:     Jiri Denemark <jdenemar@redhat.com>
AuthorDate: Fri Oct 6 13:23:36 2017 +0200
Commit:     Jiri Denemark <jdenemar@redhat.com>
CommitDate: Tue Oct 17 15:08:05 2017 +0200

    qemu: Filter CPU features when using host CPU

    When reconnecting to a domain started with a host-model CPU which was
    started by old libvirt that did not replace host-model with the real CPU
    definition, libvirt replaces the host-model CPU with the CPU from
    capabilities (because this is what the old libvirt did when it started
    the domain). Without this patch libvirt could use features unknown to
    QEMU in the CPU definition which replaced the original host-model CPU.
    Such domain would keep running just fine, but any attempt to migrate it
    will fail and once the domain is saved or snapshotted, restoring it
    would fail too.

    In other words whenever we want to use the CPU definition from host
    capabilities as a guest CPU definition, we have to filter the unknown
    features.

    https://bugzilla.redhat.com/show_bug.cgi?id=1495171

    Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
    Reviewed-by: Pavel Hrdina <phrdina@redhat.com>

commit 6a6f6b91e0e76480ea961f83135efcb4faf3284a
Refs: v3.8.0-135-g6a6f6b91e0
Author:     Jiri Denemark <jdenemar@redhat.com>
AuthorDate: Fri Oct 6 14:49:07 2017 +0200
Commit:     Jiri Denemark <jdenemar@redhat.com>
CommitDate: Tue Oct 17 15:08:05 2017 +0200

    qemu: Fix CPU model broken by older libvirt

    When libvirt older than 3.9.0 reconnected to a running domain started by
    old libvirt it could have messed up the expansion of host-model by
    adding features QEMU does not support (such as cmt). Thus whenever we
    reconnect to a running domain, revert to an active snapshot, or restore
    a saved domain we need to check the guest CPU model and remove the
    CPU features unknown to QEMU. We can do this because we know the domain
    was successfully started, which means the CPU did not contain the
    features when libvirt started the domain.

    https://bugzilla.redhat.com/show_bug.cgi?id=1495171

    Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
    Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
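
The idea behind the last two commits, dropping CPU features that QEMU does not know, can be pictured with the following sketch. It is not the code from the commits (those are C changes inside libvirt); the function name and the way features are represented here are assumptions made for illustration only.

```python
def drop_unknown_features(cpu_features, qemu_known):
    """Split a guest CPU feature map into kept and dropped entries.

    cpu_features: {feature_name: policy} from the guest CPU definition.
    qemu_known:   feature names the QEMU binary understands.
    Returns (kept, dropped), where kept is a new dict and dropped is a
    sorted list of the feature names that were filtered out.
    """
    kept = {name: policy for name, policy in cpu_features.items()
            if name in qemu_known}
    dropped = sorted(set(cpu_features) - set(qemu_known))
    return kept, dropped
```

With a feature map containing 'cmt' and a QEMU feature list that lacks it, 'cmt' lands in the dropped list whatever its policy was, so neither "cmt=on" nor "cmt=off" would reach the QEMU command line.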

Comment 22 Paul Needle 2017-10-31 16:55:07 UTC
Hi all,

I have cloned this Bugzilla as follows, for RHEL 7.4.z:

  Bug 1508010 - Post libvirt upgrade to 3.2.0-14, migration fails with -- "can't apply global Haswell-noTSX-x86_64-cpu.cmt=off: Property '.cmt' not found" [RHEL 7.4.z] 
  https://bugzilla.redhat.com/show_bug.cgi?id=1508010

Kind regards,
Paul.

Comment 28 zhe peng 2017-11-03 10:15:09 UTC
I can reproduce this.
Steps:
1. Prepare a host with cmt.
2. Install rhel7.2.z (including libvirt and qemu-kvm-rhev).
3. Prepare a guest with the host-model CPU setting and start it.
Check the guest XML:
....
 <cpu mode='host-model'>
    <model fallback='allow'>Haswell-noTSX</model>
    <vendor>Intel</vendor>
    <topology sockets='2' cores='1' threads='1'/>
  </cpu>
....
4. Update libvirt to the rhel7.4.z build (libvirt-3.2.0-14.el7_4.3.x86_64).
5. Check the guest XML:
# virsh dumpxml $guest
...
 <cpu mode='custom' match='exact' check='full'>
    <model fallback='allow'>Haswell-noTSX</model>
    <vendor>Intel</vendor>
    <topology sockets='2' cores='1' threads='1'/>
    ...
    <feature policy='disable' name='cmt'/>
    ...
  </cpu>

...
6. Migrate the guest to the destination host (rhel7.4.z installed); this fails with the error:
# virsh migrate --live rhel7.2 qemu+ssh://$target/system --verbose
error: internal error: qemu unexpectedly closed the monitor: 2017-11-03T10:07:45.618330Z qemu-kvm: -chardev pty,id=charserial0: char device redirected to /dev/pts/3 (label charserial0)
2017-11-03T10:07:45.626584Z qemu-kvm: can't apply global Haswell-noTSX-x86_64-cpu.cmt=off: Property '.cmt' not found

Updating only libvirt and then migrating the VM is enough to trigger the error.
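
The check in step 5 can also be done programmatically. This is a minimal sketch that assumes the XML text comes from something like "virsh dumpxml" with a <domain> root element; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def cpu_feature_policies(domain_xml):
    """Return {feature_name: policy} for the <cpu> element of a domain XML."""
    root = ET.fromstring(domain_xml)
    cpu = root.find("cpu")
    if cpu is None:
        return {}
    return {f.get("name"): f.get("policy") for f in cpu.findall("feature")}


def mentions_cmt(domain_xml):
    """True if the guest CPU definition lists the 'cmt' feature at all."""
    return "cmt" in cpu_feature_policies(domain_xml)
```

A guest for which mentions_cmt() returns True would be expected to hit the migration error in step 6; a guest without the feature would not.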

Comment 29 zhe peng 2017-11-27 03:16:24 UTC
Verified with build:
libvirt-3.9.0-3.el7.x86_64

Steps are the same as in comment 28.

# virsh migrate --live rhel7.2 qemu+ssh://$target_host/system --verbose
Migration: [100 %]

Check the guest XML on the target host:
....
<cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>Haswell-noTSX</model>
    <vendor>Intel</vendor>
    <topology sockets='2' cores='1' threads='1'/>
    <feature policy='require' name='vme'/>
    <feature policy='disable' name='ds'/>
    <feature policy='disable' name='acpi'/>
    <feature policy='require' name='ss'/>
    <feature policy='disable' name='ht'/>
    <feature policy='disable' name='tm'/>
    <feature policy='disable' name='pbe'/>
    <feature policy='disable' name='dtes64'/>
    <feature policy='disable' name='monitor'/>
    <feature policy='disable' name='ds_cpl'/>
    <feature policy='disable' name='vmx'/>
    <feature policy='disable' name='smx'/>
    <feature policy='disable' name='est'/>
    <feature policy='disable' name='tm2'/>
    <feature policy='disable' name='xtpr'/>
    <feature policy='disable' name='pdcm'/>
    <feature policy='disable' name='dca'/>
    <feature policy='disable' name='osxsave'/>
    <feature policy='require' name='f16c'/>
    <feature policy='require' name='rdrand'/>
    <feature policy='disable' name='arat'/>
    <feature policy='disable' name='tsc_adjust'/>
    <feature policy='require' name='xsaveopt'/>
    <feature policy='require' name='pdpe1gb'/>
    <feature policy='require' name='abm'/>
    <feature policy='require' name='hypervisor'/>
  </cpu>

....

Moving to VERIFIED.

Comment 35 errata-xmlrpc 2018-04-10 10:57:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0704

