Bug 1497320 - When updating from RHOSP 10.0.3 to RHOSP 10.0.6, live migration with VM booted from volume fails
Keywords:
Status: CLOSED DUPLICATE of bug 1495171
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Virtualization Maintenance
QA Contact: Shai Revivo
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-09-29 18:42 UTC by David Hill
Modified: 2018-01-22 16:04 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-10-06 11:35:56 UTC
Target Upstream Version:
Embargoed:


Attachments
Attaching the complete QEMU log for 'instance-00000002.log' with all launches (from host: "clgrabguhv22") (38.34 KB, text/plain)
2017-10-06 11:30 UTC, Kashyap Chamarthy
Complete QEMU log for 'instance-00000002.log' with all launches (from host: "clgrabguhv21") (22.13 KB, text/plain)
2017-10-06 11:34 UTC, Kashyap Chamarthy

Description David Hill 2017-09-29 18:42:31 UTC
Description of problem:

Live migration of instances booted from a volume fails after upgrading from OSP 10.0.3 to 10.0.6.
1. Deploy an OSP 10.0.3 overcloud.
2. Create an instance that boots from a Cinder-backed volume (instead of booting from an image).
3. Upgrade the overcloud to 10.0.6.
4. Try to live migrate the instance from one hypervisor to another.
The migration fails.



2017-09-15 19:24:43.571+0000: starting up libvirt version: 3.2.0, package: 14.el7_4.3 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2017-08-22-08:54:01, x86-039.build.eng.bos.redhat.com), qemu version: 2.9.0(qemu-kvm-rhev-2.9.0-10.el7), hostname: clgrabguhv22.localdomain

2017-09-15T19:24:43.621464Z qemu-kvm: -chardev pty,id=charserial1: char device redirected to /dev/pts/2 (label charserial1)
2017-09-15T19:24:43.624218Z qemu-kvm: can't apply global Haswell-noTSX-x86_64-cpu.cmt=off: Property '.cmt' not found
2017-09-15 19:24:43.659+0000: shutting down, reason=failed





Comment 1 Kashyap Chamarthy 2017-10-06 11:11:16 UTC
From the error message in the description, this looks like a
duplicate of the following issue, which I'm triaging in parallel:

    https://bugzilla.redhat.com/show_bug.cgi?id=1495171 -- Post
    libvirt upgrade to 3.2.0-14, migration fails with -- "can't apply
    global Haswell-noTSX-x86_64-cpu.cmt=off: Property '.cmt' not found"
    
    
If the issue is urgent, there's a (hack-ish) workaround (from here:
https://bugzilla.redhat.com/show_bug.cgi?id=1495171#c12) to avoid this
error, without guest downtime:

  1. $ systemctl stop libvirtd

  2. Edit /var/run/libvirt/qemu/instance-XXXXXXXX.xml and remove all 
     lines with 

        <feature ... name='cmt'/>

  3. $ systemctl start libvirtd
  
Please be careful while doing the above not to mistakenly edit or
remove other elements of the guest XML.
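
Step 2 of the workaround above can be sketched as a small Python helper. This is only an illustration, not a tested procedure: the function name `strip_cmt_features` is hypothetical, and the exact attribute layout of the `<feature>` element in the status XML (for example a `policy` attribute before `name`) is an assumption. It works line by line so that everything else in the guest XML is left byte-for-byte unchanged, and it only transforms a string; reading and writing the file (with libvirtd stopped, as in steps 1 and 3) is left to the caller.

```python
import re

# Matches a whole line holding a self-closing <feature .../> element whose
# name attribute is exactly 'cmt' (single or double quotes).
# Assumption: each such element sits alone on its own line.
_CMT_FEATURE_LINE = re.compile(
    r"""^\s*<feature\b[^>]*\bname=(['"])cmt\1[^>]*/>\s*$"""
)


def strip_cmt_features(xml_text):
    """Return xml_text without the lines that declare the 'cmt' CPU feature.

    All other lines, including their whitespace and line endings, are kept
    exactly as they were.
    """
    kept = [
        line
        for line in xml_text.splitlines(keepends=True)
        if not _CMT_FEATURE_LINE.match(line)
    ]
    return "".join(kept)
```

Keeping a backup copy of the original XML before writing the result back makes it possible to compare the two files and confirm that only the 'cmt' lines differ.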

Comment 2 Kashyap Chamarthy 2017-10-06 11:21:09 UTC
Posting it here just for the record.

I see the 'cmt' failure for this instance: sosreport-20170918-174216/clgrabguhv22.localdomain/var/log/libvirt/qemu/instance-00000002.log

-----------------------------------------------------------------------
2017-09-15 19:24:43.571+0000: starting up libvirt version: 3.2.0, package: 14.el7_4.3 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2017-08-22-08:54:01, x86-039.build.eng.bos.redhat.com), qemu version: 2.9.0(qemu-kvm-rhev-2.9.0-10.el7), hostname: clgrabguhv22.localdomain
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name guest=instance-00000002,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-3-instance-00000002/master-key.aes -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off,dump-guest-core=off -cpu Haswell-noTSX,vme=on,ds=off,acpi=off,ss=on,ht=off,tm=off,pbe=off,dtes64=off,monitor=off,ds_cpl=off,vmx=off,smx=off,est=off,tm2=off,xtpr=off,pdcm=off,dca=off,osxsave=off,f16c=on,rdrand=on,arat=on,tsc_adjust=on,cmt=off,xsaveopt=on,pdpe1gb=on,abm=on,hypervisor=on -m 2048 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid e891c268-ca44-4dc4-ba66-69aab666305d -smbios 'type=1,manufacturer=Red Hat,product=OpenStack Compute,version=14.0.3-9.el7ost,serial=905ad38b-1ea0-4a46-ac9f-552a2fd7b4b5,uuid=e891c268-ca44-4dc4-ba66-69aab666305d,family=Virtual Machine' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-3-instance-00000002/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/dev/disk/by-path/ip-100.74.16.5:3260-iscsi-iqn.2010-01.com.solidfire:eh15.clgrabguye20-5dd74179-3f91-454d-a5c7-ea333eb589dc.11-lun-0,format=raw,if=none,id=drive-virtio-disk0,serial=5dd74179-3f91-454d-a5c7-ea333eb589dc,cache=none,discard=unmap,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=28,id=hostnet0,vhost=on,vhostfd=30 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:fd:ea:e7,bus=pci.0,addr=0x3 -add-fd set=2,fd=32 -chardev file,id=charserial0,path=/dev/fdset/2,append=on -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -device 
usb-tablet,id=input0,bus=usb.0,port=1 -vnc 100.71.2.141:1 -k en-us -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -incoming defer -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on
2017-09-15T19:24:43.621464Z qemu-kvm: -chardev pty,id=charserial1: char device redirected to /dev/pts/2 (label charserial1)
2017-09-15T19:24:43.624218Z qemu-kvm: can't apply global Haswell-noTSX-x86_64-cpu.cmt=off: Property '.cmt' not found
2017-09-15 19:24:43.659+0000: shutting down, reason=failed
-----------------------------------------------------------------------
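
To check other instance logs for the same failure, a sketch like the following could scan the log text for the error shown above. The function name `find_missing_cpu_properties` is hypothetical, and the pattern is derived only from the single message quoted in this bug, so other QEMU versions may word it differently.

```python
import re

# Pattern modelled on the one message quoted in this report:
#   can't apply global <cpu-type>.<prop>=<value>: Property '.<prop>' not found
_MISSING_PROP = re.compile(
    r"can't apply global (\S+)\.(\w+)=([^\s:]+): Property '\.(\w+)' not found"
)


def find_missing_cpu_properties(log_text):
    """Return (cpu_type, property, value) tuples for each matching log line."""
    hits = []
    for match in _MISSING_PROP.finditer(log_text):
        cpu_type, prop, value, reported = match.groups()
        # Only report lines where the property in the error text agrees
        # with the one QEMU was asked to set.
        if prop == reported:
            hits.append((cpu_type, prop, value))
    return hits
```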

Comment 3 Kashyap Chamarthy 2017-10-06 11:30:00 UTC
Created attachment 1335228
Attaching the complete QEMU log for 'instance-00000002.log' with all launches (from host: "clgrabguhv22")

In this log, you can see that the guest was *first* launched with older versions of libvirt (2.0.0-10.el7_3.9) / QEMU (qemu-kvm-rhev-2.6.0-28.el7_3.9):

-----------------------------------------------------------------------
2017-09-14 18:43:43.353+0000: starting up libvirt version: 2.0.0, package: 10.el7_3.9 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2017-05-04-06:48:37, x86-034.build.eng.bos.redhat.com), qemu version: 2.6.0 (qemu-kvm-rhev-2.6.0-28.el7_3.9)
-----------------------------------------------------------------------

And *then* for the instance with the 'cmt' failure, you see newer libvirt / QEMU:

-----------------------------------------------------------------------
2017-09-15 19:24:43.571+0000: starting up libvirt version: 3.2.0, package: 14.el7_4.3 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2017-08-22-08:54:01, x86-039.build.eng.bos.redhat.com), qemu version: 2.9.0(qemu-kvm-rhev-2.9.0-10.el7), hostname: clgrabguhv22.localdomain
-----------------------------------------------------------------------

Comment 4 Kashyap Chamarthy 2017-10-06 11:34:55 UTC
Created attachment 1335240
Complete QEMU log for 'instance-00000002.log' with all launches (from host: "clgrabguhv21")

Here too, one can observe the libvirt / QEMU upgrade:

From: libvirt (2.0.0-10.el7_3.9) and QEMU (qemu-kvm-rhev-2.6.0-28.el7_3.9)

To:   libvirt (3.2.0-14.el7_4.3) and QEMU (qemu-kvm-rhev-2.9.0-10.el7)
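
The version change across launches can also be pulled out of an instance log programmatically. The sketch below is a rough illustration: the names `Launch` and `parse_launches` are hypothetical, and the pattern assumes the "starting up libvirt version" line format seen in the logs quoted in this bug.

```python
import re
from typing import NamedTuple


class Launch(NamedTuple):
    libvirt_version: str
    libvirt_release: str
    qemu_version: str
    qemu_package: str


# Assumes the "starting up" line layout quoted in this report; the space
# between the QEMU version and its package name is optional.
_STARTUP = re.compile(
    r"starting up libvirt version: ([\d.]+), package: (\S+) "
    r".*?qemu version: ([\d.]+)\s*\(([^)]+)\)"
)


def parse_launches(log_text):
    """Return one Launch per 'starting up' line, in log order."""
    return [Launch(*m.groups()) for m in _STARTUP.finditer(log_text)]
```

Comparing the first and last entries returned for a given instance log shows whether the guest was started under one libvirt / QEMU pair and later launched under another.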

Comment 5 Kashyap Chamarthy 2017-10-06 11:35:56 UTC

*** This bug has been marked as a duplicate of bug 1495171 ***

