Bug 1497320

Summary: When updating from RHOSP 10.0.3 to RHOSP 10.0.6, live migration with VM booted from volume fails
Product: Red Hat OpenStack
Reporter: David Hill <dhill>
Component: qemu-kvm-rhev
Assignee: Virtualization Maintenance <virt-maint>
Status: CLOSED DUPLICATE
QA Contact: Shai Revivo <srevivo>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 10.0 (Newton)
CC: awaugama, dhill, kchamart, mburns, srevivo
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-10-06 11:35:56 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Attaching the complete QEMU log for 'instance-00000002.log' with all launches (from host: "clgrabguhv22") none
Complete QEMU log for 'instance-00000002.log' with all launches (from host: "clgrabguhv21") none

Description David Hill 2017-09-29 18:42:31 UTC
Description of problem:

Live migration of instances booted from a volume fails after upgrading from OSP 10.0.3 to 10.0.6.
1. Deploy an OSP 10.0.3 overcloud.
2. Create an instance that boots from a Cinder-backed volume (instead of booting from an image).
3. Upgrade the overcloud to 10.0.6.
4. Try to live migrate the instance from one hypervisor to another.
The migration fails.



2017-09-15 19:24:43.571+0000: starting up libvirt version: 3.2.0, package: 14.el7_4.3 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2017-08-22-08:54:01, x86-039.build.eng.bos.redhat.com), qemu version: 2.9.0(qemu-kvm-rhev-2.9.0-10.el7), hostname: clgrabguhv22.localdomain

2017-09-15T19:24:43.621464Z qemu-kvm: -chardev pty,id=charserial1: char device redirected to /dev/pts/2 (label charserial1)
2017-09-15T19:24:43.624218Z qemu-kvm: can't apply global Haswell-noTSX-x86_64-cpu.cmt=off: Property '.cmt' not found
2017-09-15 19:24:43.659+0000: shutting down, reason=failed





Comment 1 Kashyap Chamarthy 2017-10-06 11:11:16 UTC
From the error message in the description, this sounds like a
duplicate of the following issue that I'm triaging in parallel:

    https://bugzilla.redhat.com/show_bug.cgi?id=1495171 --  Post  
    libvirt upgrade to 3.2.0-14, migration fails with -- "can't apply
    global Haswell-noTSX-x86_64-cpu.cmt=off: Property '.cmt' not found"
    
    
If the issue is urgent, there's a (hack-ish) workaround (from here:
https://bugzilla.redhat.com/show_bug.cgi?id=1495171#c12) to avoid this
error, without guest downtime:

  1. $ systemctl stop libvirtd

  2. Edit /var/run/libvirt/qemu/instance-XXXXXXXX.xml and remove all 
     lines with 

        <feature ... name='cmt'/>

  3. $ systemctl start libvirtd
  
Please be careful while doing the above not to mistakenly edit or remove
other elements of the guest XML.
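
For illustration only, the edit in step 2 could be scripted along the
following lines. This is a minimal sketch, not a tested procedure: the
function name strip_cmt_features and the sample XML fragment are
hypothetical, and the sketch assumes each 'cmt' feature element sits on
its own line. It only transforms text in memory; reading and writing the
real file under /var/run/libvirt/qemu/ (with libvirtd stopped, and after
taking a backup) is left to the operator.

```python
import re

# Matches a line that consists solely of a self-closing <feature/> element
# whose name attribute is 'cmt' (single or double quotes).
CMT_FEATURE = re.compile(r"""^\s*<feature\b[^>]*\bname=['"]cmt['"][^>]*/>\s*$""")


def strip_cmt_features(xml_text):
    """Return xml_text with every line holding only a cmt <feature/> removed.

    All other lines, including other <feature/> elements, are kept untouched.
    """
    kept = [
        line
        for line in xml_text.splitlines(keepends=True)
        if not CMT_FEATURE.match(line)
    ]
    return "".join(kept)


# Hypothetical fragment, only to show the effect; a real status XML is larger.
sample = (
    "<cpu mode='custom' match='exact'>\n"
    "  <model fallback='forbid'>Haswell-noTSX</model>\n"
    "  <feature policy='require' name='vme'/>\n"
    "  <feature policy='disable' name='cmt'/>\n"
    "</cpu>\n"
)
cleaned = strip_cmt_features(sample)
print(cleaned)
```

Matching whole lines rather than rewriting the XML tree keeps every other
byte of the file unchanged, which fits the warning above about not touching
other elements.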

Comment 2 Kashyap Chamarthy 2017-10-06 11:21:09 UTC
Posting it here just for the record.

I see the 'cmt' failure for this instance: sosreport-20170918-174216/clgrabguhv22.localdomain/var/log/libvirt/qemu/instance-00000002.log

-----------------------------------------------------------------------
2017-09-15 19:24:43.571+0000: starting up libvirt version: 3.2.0, package: 14.el7_4.3 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2017-08-22-08:54:01, x86-039.build.eng.bos.redhat.com), qemu version: 2.9.0(qemu-kvm-rhev-2.9.0-10.el7), hostname: clgrabguhv22.localdomain
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name guest=instance-00000002,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-3-instance-00000002/master-key.aes -machine pc-i440fx-rhel7.3.0,accel=kvm,usb=off,dump-guest-core=off -cpu Haswell-noTSX,vme=on,ds=off,acpi=off,ss=on,ht=off,tm=off,pbe=off,dtes64=off,monitor=off,ds_cpl=off,vmx=off,smx=off,est=off,tm2=off,xtpr=off,pdcm=off,dca=off,osxsave=off,f16c=on,rdrand=on,arat=on,tsc_adjust=on,cmt=off,xsaveopt=on,pdpe1gb=on,abm=on,hypervisor=on -m 2048 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid e891c268-ca44-4dc4-ba66-69aab666305d -smbios 'type=1,manufacturer=Red Hat,product=OpenStack Compute,version=14.0.3-9.el7ost,serial=905ad38b-1ea0-4a46-ac9f-552a2fd7b4b5,uuid=e891c268-ca44-4dc4-ba66-69aab666305d,family=Virtual Machine' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-3-instance-00000002/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/dev/disk/by-path/ip-100.74.16.5:3260-iscsi-iqn.2010-01.com.solidfire:eh15.clgrabguye20-5dd74179-3f91-454d-a5c7-ea333eb589dc.11-lun-0,format=raw,if=none,id=drive-virtio-disk0,serial=5dd74179-3f91-454d-a5c7-ea333eb589dc,cache=none,discard=unmap,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=28,id=hostnet0,vhost=on,vhostfd=30 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:fd:ea:e7,bus=pci.0,addr=0x3 -add-fd set=2,fd=32 -chardev file,id=charserial0,path=/dev/fdset/2,append=on -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -device 
usb-tablet,id=input0,bus=usb.0,port=1 -vnc 100.71.2.141:1 -k en-us -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -incoming defer -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on
2017-09-15T19:24:43.621464Z qemu-kvm: -chardev pty,id=charserial1: char device redirected to /dev/pts/2 (label charserial1)
2017-09-15T19:24:43.624218Z qemu-kvm: can't apply global Haswell-noTSX-x86_64-cpu.cmt=off: Property '.cmt' not found
2017-09-15 19:24:43.659+0000: shutting down, reason=failed
-----------------------------------------------------------------------
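
When checking several instance logs for the same failure, a small script
can pull out the "Property ... not found" line. The sketch below is an
assumption-laden example, not part of any tool: the function name
find_missing_properties is hypothetical, and the pattern is derived only
from the single error line shown above.

```python
import re

# Pattern built from the error line in the log excerpt above:
#   can't apply global <cpu-type>.<prop>=<value>: Property '.<prop>' not found
PROPERTY_ERROR = re.compile(
    r"can't apply global (?P<cpu>\S+)\.(?P<prop>[\w-]+)=(?P<value>\S+): "
    r"Property '\.(?P=prop)' not found"
)


def find_missing_properties(log_text):
    """Return (cpu_type, property, value) for each matching error in log_text."""
    return [
        (m.group("cpu"), m.group("prop"), m.group("value"))
        for m in PROPERTY_ERROR.finditer(log_text)
    ]


log_excerpt = (
    "2017-09-15T19:24:43.624218Z qemu-kvm: can't apply global "
    "Haswell-noTSX-x86_64-cpu.cmt=off: Property '.cmt' not found\n"
    "2017-09-15 19:24:43.659+0000: shutting down, reason=failed\n"
)
errors = find_missing_properties(log_excerpt)
print(errors)
```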

Comment 3 Kashyap Chamarthy 2017-10-06 11:30:00 UTC
Created attachment 1335228 [details]
Attaching the complete QEMU log for 'instance-00000002.log' with all launches (from host: "clgrabguhv22")

In this log, you can see that the *first* time, the guest was launched with older versions of libvirt (2.0.0-10.el7_3.9) / QEMU (qemu-kvm-rhev-2.6.0-28.el7_3.9):

-----------------------------------------------------------------------
2017-09-14 18:43:43.353+0000: starting up libvirt version: 2.0.0, package: 10.el7_3.9 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2017-05-04-06:48:37, x86-034.build.eng.bos.redhat.com), qemu version: 2.6.0 (qemu-kvm-rhev-2.6.0-28.el7_3.9)
-----------------------------------------------------------------------

And *then* for the instance with the 'cmt' failure, you see newer libvirt / QEMU:

-----------------------------------------------------------------------
2017-09-15 19:24:43.571+0000: starting up libvirt version: 3.2.0, package: 14.el7_4.3 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2017-08-22-08:54:01, x86-039.build.eng.bos.redhat.com), qemu version: 2.9.0(qemu-kvm-rhev-2.9.0-10.el7), hostname: clgrabguhv22.localdomain
-----------------------------------------------------------------------

Comment 4 Kashyap Chamarthy 2017-10-06 11:34:55 UTC
Created attachment 1335240 [details]
Complete QEMU log for 'instance-00000002.log' with all launches (from host: "clgrabguhv21")

Here too, one can observe the libvirt / QEMU upgrade:

From: libvirt (2.0.0-10.el7_3.9) and QEMU (qemu-kvm-rhev-2.6.0-28.el7_3.9)

To:   libvirt (3.2.0-14.el7_4.3) and QEMU (qemu-kvm-rhev-2.9.0-10.el7)
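
The same From/To comparison can be extracted from a log automatically. The
following is a rough sketch under the assumption that every launch writes a
"starting up libvirt version: ..." line in the format seen in the excerpts
in this bug; the function name launch_versions is hypothetical.

```python
import re

# Based on the "starting up" lines quoted in this bug. The space before the
# parenthesised QEMU package is optional because both forms appear in the logs.
STARTUP = re.compile(
    r"starting up libvirt version: (?P<libvirt>[\d.]+), "
    r"package: (?P<release>\S+) .*?"
    r"qemu version: (?P<qemu>[\d.]+)\s*\((?P<qemu_pkg>[^)]+)\)"
)


def launch_versions(log_text):
    """Return one (libvirt_version_release, qemu_package) pair per guest launch."""
    return [
        (f"{m.group('libvirt')}-{m.group('release')}", m.group("qemu_pkg"))
        for m in STARTUP.finditer(log_text)
    ]


log_excerpt = (
    "2017-09-14 18:43:43.353+0000: starting up libvirt version: 2.0.0, "
    "package: 10.el7_3.9 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, "
    "2017-05-04-06:48:37, x86-034.build.eng.bos.redhat.com), "
    "qemu version: 2.6.0 (qemu-kvm-rhev-2.6.0-28.el7_3.9)\n"
    "2017-09-15 19:24:43.571+0000: starting up libvirt version: 3.2.0, "
    "package: 14.el7_4.3 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, "
    "2017-08-22-08:54:01, x86-039.build.eng.bos.redhat.com), "
    "qemu version: 2.9.0(qemu-kvm-rhev-2.9.0-10.el7), "
    "hostname: clgrabguhv22.localdomain\n"
)
versions = launch_versions(log_excerpt)
for libvirt_pkg, qemu_pkg in versions:
    print(libvirt_pkg, qemu_pkg)
```

A change between consecutive entries indicates that the guest was launched
under different libvirt / QEMU packages.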

Comment 5 Kashyap Chamarthy 2017-10-06 11:35:56 UTC

*** This bug has been marked as a duplicate of bug 1495171 ***