Bug 1088216
| Summary: | Fail restore/migrate after hot-unplug the vcpus from guest | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Xuesong Zhang <xuzhang> |
| Component: | qemu-kvm | Assignee: | Virtualization Maintenance <virt-maint> |
| Status: | CLOSED WONTFIX | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 6.6 | CC: | acathrow, bsarathy, chayang, dallan, dgilbert, dyuan, ehabkost, imammedo, juzhang, lersek, michen, mkenneth, mzhan, quintela, qzhang, virt-maint, xfu, zpeng |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-06-12 13:37:07 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Xuesong Zhang
2014-04-16 09:10:49 UTC
Migration fails if vCPUs have been hot-unplugged from the guest; there is no need to migrate back to hit the failure. Error message:

Migration: [ 96 %]error: operation failed: domain is no longer running

Another scenario fails for the same reason:

1. Hot-unplug vCPUs from the running guest.
2. Save the guest to the file guest.save.
3. Restore the guest from guest.save; the restore fails with the following error:

# virsh restore rhel6.5.save3
error: Failed to restore domain from rhel6.5.save3
error: Unable to read from monitor: Connection reset by peer

The relevant log output follows (qemu guest log, libvirtd log):

# tailf /var/log/libvirt/qemu/rhel6.5.log
2014-04-22 08:44:51.503+0000: starting up
LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name rhel6.5 -S -M rhel6.5.0 -enable-kvm -m 1024 -realtime mlock=off -smp 1,maxcpus=4,sockets=4,cores=1,threads=1 -uuid fe380b68-11c6-b7d0-e6a8-b466823497f8 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/rhel6.5.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive file=/var/lib/libvirt/images/rhel6.5-20140306.img,if=none,id=drive-ide0-0-0,format=raw,cache=none -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,fd=27,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:7a:c3:47,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/rhel6.5.agent,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -vnc 127.0.0.1:0 -vga cirrus -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -incoming fd:25 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
char device redirected to /dev/pts/6
Unknown savevm section or instance 'cpu_common' 1
load of migration failed
2014-04-22 08:44:52.977+0000: shutting down

# tailf /var/log/libvirt/libvirtd.log
2014-04-22 08:44:52.977+0000: 2325: error : qemuMonitorIORead:514 : Unable to read from monitor: Connection reset by peer
2014-04-22 08:44:53.353+0000: 2326: warning : qemuDomainSaveImageStartVM:5524 : failed to restore save state label on /root/rhel6.5.save3

Laszlo, I think this is one of the bugs that sits somewhere between libvirt and qemu. From my understanding, the problem is that libvirt does not create the same command line on the destination (prior to the vCPU hot-unplug: -smp 2,maxcpus=4,sockets=4,cores=1,threads=1; on the destination after the hot-unplug: -smp 1,maxcpus=4,sockets=4,cores=1,threads=1). However, the other solution might be that qemu does not transfer the stale vCPU thread state to the destination. In this particular case, when vCPU #2 is unplugged, the vCPU thread might be joined within qemu, resulting in no ABI breakage. What's your opinion on this?

This is the key message, from comment 2, on the target host:

> Unknown savevm section or instance 'cpu_common' 1

This corresponds to the VMStateDescription object called "vmstate_cpu_common", in file "exec.c", which is instantiated for each VCPU.
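For illustration only, here is a minimal, self-contained toy model of that mechanism (this is not qemu code; the registration table and helper names below are invented): each VCPU created at startup registers one "cpu_common" vmstate instance whose instance id is its cpu_index, and an incoming section can only be loaded if a matching (name, instance id) registration exists on the destination.

#include <stdio.h>
#include <string.h>

/* Toy stand-in for the per-VCPU "cpu_common" vmstate registrations.
 * Not qemu code: just enough to show why a destination started with
 * -smp 1 cannot accept the "cpu_common" section with instance id 1. */
#define MAX_VCPUS 4

struct registration {
    const char *name;   /* e.g. "cpu_common" */
    int instance_id;    /* cpu_index of the VCPU that registered it */
    int in_use;
};

static struct registration regs[MAX_VCPUS];

/* Called once per VCPU created at startup (cpu_index = 0 .. smp_cpus-1). */
static void register_vmstate(const char *name, int instance_id)
{
    regs[instance_id].name = name;
    regs[instance_id].instance_id = instance_id;
    regs[instance_id].in_use = 1;
}

/* Called for every section found in the incoming migration stream. */
static int load_section(const char *name, int instance_id)
{
    if (instance_id < 0 || instance_id >= MAX_VCPUS ||
        !regs[instance_id].in_use ||
        strcmp(regs[instance_id].name, name) != 0) {
        printf("Unknown savevm section or instance '%s' %d\n",
               name, instance_id);
        return -1;
    }
    printf("loaded '%s' %d\n", name, instance_id);
    return 0;
}

int main(void)
{
    int smp_cpus = 1;   /* destination was started with -smp 1 */

    for (int i = 0; i < smp_cpus; i++) {
        register_vmstate("cpu_common", i);  /* only instance 0 exists */
    }

    /* The source still emits state for both of its VCPU objects,
     * including the one that was merely disabled in ACPI. */
    load_section("cpu_common", 0);  /* ok */
    load_section("cpu_common", 1);  /* fails -> migration load aborts */
    return 0;
}

With smp_cpus = 1 on the "destination", the second lookup produces the same kind of "Unknown savevm section or instance" failure that appears in the qemu log above.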
On the source host, you start with -smp 2, hence two VCPUs are created (each with its VCPU thread etc.). When you unplug one, the VCPU object and the thread stay; the VCPU just gets disabled in ACPI. (See bug 1017858 comment 30.) However, the VCPU object (disabled in ACPI) will nonetheless be part of the migration stream. When you start qemu-kvm with -smp 1 on the target host, apparently no VCPU object exists with instance_id == 1 to load the vmstate into.

VCPUs are created on startup in:

pc_init1()                      [hw/pc.c]
  LOOP: 0 <= i < smp_cpus
    pc_new_cpu()
      cpu_init()                [target-i386/cpu.h]
        cpu_x86_init()          [target-i386/helper.c]
          cpu_exec_init()       [exec.c]
            vmstate_register(... &vmstate_cpu_common ...)   <--- "cpu_common"
          qemu_init_vcpu()      [vl.c]
            ...

Unfortunately, joining (i.e. killing) the VCPU thread and releasing the VCPU object on hot-unplug (in addition to disabling it in ACPI) is out of scope for RHEL-6. I'm not sure how far Igor (CC'd) got with this feature upstream, but I think it's too intrusive for RHEL-6.

One thing we might entertain is a vmstate_unregister() call on VCPU hot-unplug (i.e. at the time it is disabled in ACPI), in the disable_processor() branch of qemu_system_cpu_hot_add() [hw/acpi.c]. Mirroring that, the enable_processor() branch of the same function should re-register the vmstate, but *only* when we re-enable (re-plug) a preexistent VCPU (object & thread), not when we create a brand-new one. There's no telling, of course, what else this would regress :(

From a quick look, I don't think it would cause the cpu_index space to "collapse". That is, if you omit the "cpu_common" VMSD with instance_id == 1 from the stream, that still allows other such VMSDs to keep their instance IDs (which are the cpu_index values for VCPUs), hence probably keeping the mapping to topology information (APIC IDs) intact as well. But this reasoning is hardly a hard proof :( I guess I could hack up a proof-of-concept patch for the vmstate_unregister()-on-unplug idea, but it's really not my expertise.

--------o--------

Regarding the reverse direction (i.e. migrating from a lower SMP count to a higher one) -- I'm not overly familiar with migration, but I think in this case the VCPU states that are not present in the migration stream are simply not loaded into memory -- those VCPUs keep their original (halted) state from the time the target qemu instance was started.

Does this help?... :(
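To make the vmstate_unregister()-on-unplug idea above concrete, here is a rough toy sketch, not a patch against hw/acpi.c; the structures, helpers, and the acpi_enabled/vmstate_present flags are invented for illustration. The point it models: unplugging drops only that VCPU's "cpu_common" registration (so the source stops emitting that section), the remaining registrations keep their instance ids (the cpu_index values), and re-plugging a preexistent VCPU simply re-registers the same instance id.

#include <stdio.h>

/* Toy model of the vmstate_unregister()-on-unplug idea; not qemu code,
 * names and data structures are invented for illustration only. */
#define MAX_VCPUS 4

struct vcpu {
    int cpu_index;        /* also used as the vmstate instance id   */
    int acpi_enabled;     /* what hot-(un)plug actually toggles     */
    int vmstate_present;  /* would this VCPU's section be emitted?  */
};

static struct vcpu vcpus[MAX_VCPUS];

static void unplug_vcpu(struct vcpu *cpu)
{
    cpu->acpi_enabled = 0;
    /* proposed addition: drop the registration so the migration stream
     * no longer carries a "cpu_common" section for this VCPU */
    cpu->vmstate_present = 0;
}

static void replug_vcpu(struct vcpu *cpu)
{
    cpu->acpi_enabled = 1;
    /* re-register only for a preexistent VCPU object; its instance id
     * (cpu_index) is unchanged, so nothing else "collapses" */
    cpu->vmstate_present = 1;
}

static void dump_stream(int smp_cpus)
{
    for (int i = 0; i < smp_cpus; i++) {
        if (vcpus[i].vmstate_present) {
            printf("stream carries 'cpu_common' instance %d\n", i);
        }
    }
}

int main(void)
{
    int smp_cpus = 2;                       /* source started with -smp 2 */

    for (int i = 0; i < smp_cpus; i++) {
        vcpus[i] = (struct vcpu){ i, 1, 1 };
    }

    unplug_vcpu(&vcpus[1]);                 /* hot-unplug VCPU #1        */
    dump_stream(smp_cpus);                  /* only instance 0 emitted   */

    replug_vcpu(&vcpus[1]);                 /* re-plug the same VCPU     */
    dump_stream(smp_cpus);                  /* instances 0 and 1 again   */
    return 0;
}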
@Igor, so what's your thought on this? Would it be possible just to not send offline VCPU state to the other side?

@Laszlo, well, if the qemu fix turns out to be too invasive, is there something libvirt can do? For instance, start the domain with '-smp 2' but do the vCPU hot-unplug prior to issuing 'cont'?

What happens if you start the incoming domain with -smp 2 and don't do anything else? Basically, start the incoming domain with -smp N, where N is the maximum number of VCPUs that the source domain has ever seen during its lifetime. (That's the number of the source's VCPU threads / objects.) If the source domain provides a vmstate object in the stream for each of the N VCPUs, that's best; if not (because some have been unplugged on the source before migration), then those VCPUs will remain parked on the target.

(In reply to Michal Privoznik from comment #5)
> @Igor, so what's your thought on this? Would it be possible just to not send
> offline VCPU state to the other side?

That would be too ugly, and without actually trying to implement all that Laszlo said above it's hard to say whether anything might regress.

> @Laszlo, well if qemu fix turns out too invasive is there something libvirt
> can do? For instance, start domain with '-smp 2' but prior issuing 'cont' do
> vcpu hotunplug?

Starting the destination domain with the maximum number of VCPUs the source has ever had should work. But to cross out all of the above: why do we care about vCPU unplug on RHEL 6 if it's not supported? (And I'm not aware that we are going to support it there at all.) Perhaps it would be easy to just disable decreasing the VCPU count in libvirt and be done with it.