Bug 1815426

Summary:	Virsh managedsave never finished if guest have net failover element
Product:	Red Hat Enterprise Linux 9	Reporter:	Luyao Huang <lhuang>
Component:	libvirt	Assignee:	Laine Stump <laine>
libvirt sub component:	Networking	QA Contact:	yalzhang <yalzhang>
Status:	CLOSED MIGRATED	Docs Contact:
Severity:	medium
Priority:	medium	CC:	aadam, ailan, berrange, chayang, dyuan, jinzhao, jsuchane, juzhang, laine, lvivier, quintela, virt-maint, xuzhang, yalzhang, yama, yanghliu, yanqzhan, yicui
Version:	9.0	Keywords:	MigratedToJIRA, Triaged
Target Milestone:	rc
Target Release:	---
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2023-09-22 15:52:06 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Luyao Huang 2020-03-20 08:50:46 UTC

Description of problem:
Mangedsave never finished if guest have net failover element

Version-Release number of selected component (if applicable):
libvirt-daemon-6.0.0-13.module+el8.2.0+6048+0fa476b4.x86_64
qemu-kvm-4.2.0-15.module+el8.2.0+6029+618ef2ec.x86_64

How reproducible:
100%

Steps to Reproduce:
1. prepare a guest have net failover xml:

# virsh dumpxml vm1 --inactive

    <interface type='network'>
      <mac address='fe:54:00:18:0b:29'/>
      <source network='default'/>
      <model type='virtio'/>
      <teaming type='persistent'/>
      <alias name='ua-backup0'/>
      <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
    </interface>
    <interface type='network'>
      <mac address='fe:54:00:18:0b:29'/>
      <source network='hostdevnet'/>
      <model type='virtio'/>
      <teaming type='transient' persistent='ua-backup0'/>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </interface>

# virsh net-dumpxml hostdevnet
<network connections='2'>
  <name>hostdevnet</name>
  <uuid>d1d21152-ab38-4b66-9ddd-13d788b727fc</uuid>
  <forward mode='hostdev' managed='yes'>
    <address type='pci' domain='0x0000' bus='0x82' slot='0x10' function='0x1'/>
    <address type='pci' domain='0x0000' bus='0x82' slot='0x10' function='0x3'/>
    <address type='pci' domain='0x0000' bus='0x82' slot='0x10' function='0x5'/>
    <address type='pci' domain='0x0000' bus='0x82' slot='0x10' function='0x7'/>
  </forward>
</network>

2. start guest and login check guest vnic:
IN guest:

# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp8s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether fe:54:00:18:0b:29 brd ff:ff:ff:ff:ff:ff
    inet 10.73.33.191/23 brd 10.73.33.255 scope global dynamic noprefixroute enp8s0
       valid_lft 43087sec preferred_lft 43087sec
    inet6 2620:52:0:4920:a9a2:280c:adc8:5a1d/64 scope global dynamic noprefixroute 
       valid_lft 2591889sec preferred_lft 604689sec
    inet6 fe80::cd32:c409:d931:34da/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
3: enp8s0nsby: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master enp8s0 state UP group default qlen 1000
    link/ether fe:54:00:18:0b:29 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.96/24 brd 192.168.122.255 scope global dynamic noprefixroute enp8s0nsby
       valid_lft 3487sec preferred_lft 3487sec
    inet6 fe80::e25a:be03:b8ce:253/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
4: enp7s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master enp8s0 state UP group default qlen 1000
    link/ether fe:54:00:18:0b:29 brd ff:ff:ff:ff:ff:ff
    inet 10.73.33.191/23 brd 10.73.33.255 scope global dynamic noprefixroute enp7s0
       valid_lft 43087sec preferred_lft 43087sec
    inet6 fe80::e709:c37e:f302:9b6d/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever

3. run managedsave command:

# virsh managedsave vm1
(never finish, have to cancel it by user) 
^Cerror: Failed to save domain vm1 state
error: operation aborted: domain save job: canceled by client

4. login guest and vf not reattach back:

]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp8s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether fe:54:00:18:0b:29 brd ff:ff:ff:ff:ff:ff
    inet 10.73.33.191/23 brd 10.73.33.255 scope global dynamic noprefixroute enp8s0
       valid_lft 42880sec preferred_lft 42880sec
    inet6 2620:52:0:4920:a9a2:280c:adc8:5a1d/64 scope global dynamic noprefixroute 
       valid_lft 2591682sec preferred_lft 604482sec
    inet6 fe80::cd32:c409:d931:34da/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
3: enp8s0nsby: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master enp8s0 state UP group default qlen 1000
    link/ether fe:54:00:18:0b:29 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.96/24 brd 192.168.122.255 scope global dynamic noprefixroute enp8s0nsby
       valid_lft 3280sec preferred_lft 3280sec
    inet6 fe80::e25a:be03:b8ce:253/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever

Actual results:
managedsave cannot finish after set net failover elements in guest

Expected results:
managedsave command return success or report an error if this operation is not support 

Additional info:

guest full xml:
<domain type='kvm'>
  <name>vm1</name>
  <uuid>f98741e7-ddad-4be1-82c0-0ae3655d1c25</uuid>
  <memory unit='KiB'>1024000</memory>
  <currentMemory unit='KiB'>1024000</currentMemory>
  <vcpu placement='static' current='2'>4</vcpu>
  <resource>
    <partition>/machine</partition>
  </resource>
  <sysinfo type='smbios'/>
  <os>
    <type arch='x86_64' machine='pc-q35-rhel8.2.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <cpu mode='host-model' check='partial'>
    <numa>
      <cell id='0' cpus='0-1' memory='512000' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='512000' unit='KiB'/>
    </numa>
  </cpu>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/migrate/RHEL-8.2-lhuang.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </disk>
    <controller type='usb' index='0' model='qemu-xhci'>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x10'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-to-pci-bridge'>
      <model name='pcie-pci-bridge'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0x11'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0x12'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0x13'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0x14'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x4'/>
    </controller>
    <controller type='pci' index='7' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='7' port='0x15'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x5'/>
    </controller>
    <controller type='pci' index='8' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='8' port='0x16'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x6'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </controller>
    <interface type='network'>
      <mac address='fe:54:00:18:0b:29'/>
      <source network='default'/>
      <model type='virtio'/>
      <teaming type='persistent'/>
      <alias name='ua-backup0'/>
      <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
    </interface>
    <interface type='network'>
      <mac address='fe:54:00:18:0b:29'/>
      <source network='hostdevnet'/>
      <model type='virtio'/>
      <teaming type='transient' persistent='ua-backup0'/>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='spicevmc'>
      <target type='virtio' name='com.redhat.spice.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='9'/>
    </channel>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/r6.agent'/>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'>
      <address type='usb' bus='0' port='2'/>
    </input>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='spice' autoport='yes' listen='0.0.0.0' keymap='en-us'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <video>
      <model type='qxl' ram='131072' vram='65536' vgamem='16384' heads='1' primary='yes'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <stats period='2'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </memballoon>
  </devices>
</domain>


qemu command line:

/usr/libexec/qemu-kvm -name guest=vm1,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-10-vm1/master-key.aes -machine pc-q35-rhel8.2.0,accel=kvm,usb=off,dump-guest-core=off -cpu Haswell-noTSX-IBRS,vme=on,ss=on,vmx=on,f16c=on,rdrand=on,hypervisor=on,arat=on,tsc-adjust=on,umip=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaveopt=on,pdpe1gb=on,abm=on,ibpb=on,amd-ssbd=on,skip-l1dfl-vmentry=on -m 1000 -overcommit mem-lock=off -smp 2,maxcpus=4,sockets=4,cores=1,threads=1 -numa node,nodeid=0,cpus=0-1,mem=500 -numa node,nodeid=1,cpus=2-3,mem=500 -uuid f98741e7-ddad-4be1-82c0-0ae3655d1c25 -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=41,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 -device pcie-pci-bridge,id=pci.2,bus=pci.1,addr=0x0 -device pcie-root-port,port=0x11,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x1 -device pcie-root-port,port=0x12,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x2 -device pcie-root-port,port=0x13,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x3 -device pcie-root-port,port=0x14,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x4 -device pcie-root-port,port=0x15,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x5 -device pcie-root-port,port=0x16,chassis=8,id=pci.8,bus=pcie.0,addr=0x2.0x6 -device qemu-xhci,id=usb,bus=pci.4,addr=0x0 -device virtio-serial-pci,id=virtio-serial0,bus=pci.5,addr=0x0 -blockdev {"driver":"file","filename":"/var/lib/libvirt/migrate/RHEL-8.2-lhuang.qcow2","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-1-format","read-only":false,"driver":"qcow2","file":"libvirt-1-storage","backing":null} -device virtio-blk-pci,scsi=off,bus=pci.3,addr=0x0,drive=libvirt-1-format,id=virtio-disk0,bootindex=1 -netdev tap,fd=43,id=hostua-backup0,vhost=on,vhostfd=44 -device virtio-net-pci,failover=on,netdev=hostua-backup0,id=ua-backup0,mac=fe:54:00:18:0b:29,bus=pci.8,addr=0x0 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=9,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -chardev socket,id=charchannel1,fd=45,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -device usb-tablet,id=input0,bus=usb.0,port=2 -spice port=5901,addr=0.0.0.0,disable-ticketing,seamless-migration=on -k en-us -device qxl-vga,id=video0,ram_size=134217728,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pcie.0,addr=0x1 -device vfio-pci,host=0000:82:10.3,id=hostdev0,bus=pci.7,addr=0x0,failover_pair_id=ua-backup0 -device virtio-balloon-pci,id=balloon0,bus=pci.6,addr=0x0 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on

Comment 4 Juan Quintela 2020-04-07 08:40:52 UTC

hi

I am going to change qemu savevm (the part that implements managedsave) to return one error if failover is setup.
It is not clear to me of anything reasonable to do if we are using failover.  There is nothing that requires that when we return there is a device available to do the failover back of the assigned device.

We can add later the capability of just disabling the device assignment if there is a managed save.
Adding a NeedInfo to lain to see what he thinks about it from the libvirt point of view.

Comment 5 Laine Stump 2020-04-09 02:07:11 UTC

libvirt doesn't use the savevm command. It implements its managedsave by migrating to an open fd, then stopping the qemu process. Here is a trace of the monitor commands leading up to the "hang" (I produced this with "stap examples/systemtap/qemu-monitor.stp"):

 92.664 > 0x7f8d60037a60 {"execute":"stop","id":"libvirt-362"}
 92.669 ! 0x7f8d60037a60 {"timestamp": {"seconds": 1586397518, "microseconds": 481155}, "event": "STOP"}
 93.326 < 0x7f8d60037a60 {"return": {}, "id": "libvirt-362"}
 93.333 > 0x7f8d60037a60 {"execute":"migrate_set_speed","arguments":{"value":9223372036853727232},"id":"libvirt-363"}
 93.335 < 0x7f8d60037a60 {"return": {}, "id": "libvirt-363"}
 93.336 > 0x7f8d60037a60 {"execute":"getfd","arguments":{"fdname":"migrate"},"id":"libvirt-364"} (fd=36)
 93.337 < 0x7f8d60037a60 {"return": {}, "id": "libvirt-364"}
 93.337 > 0x7f8d60037a60 {"execute":"migrate","arguments":{"detach":true,"blk":false,"inc":false,"uri":"fd:migrate"},"id":"libvirt-365"}
 93.339 ! 0x7f8d60037a60 {"timestamp": {"seconds": 1586397519, "microseconds": 150972}, "event": "MIGRATION", "data": {"status": "setup"}}
 93.339 ! 0x7f8d60037a60 {"timestamp": {"seconds": 1586397519, "microseconds": 151133}, "event": "UNPLUG_PRIMARY", "data": {"device-id": "hostdev0"}}
 93.339 < 0x7f8d60037a60 {"return": {}, "id": "libvirt-365"}
 93.339 > 0x7f8d60037a60 {"execute":"query-migrate","id":"libvirt-366"}
 93.358 ! 0x7f8d60037a60 {"timestamp": {"seconds": 1586397519, "microseconds": 170526}, "event": "MIGRATION_PASS", "data": {"pass": 1}}
 93.358 ! 0x7f8d60037a60 {"timestamp": {"seconds": 1586397519, "microseconds": 170708}, "event": "MIGRATION", "data": {"status": "wait-unplug"}}
 93.358 < 0x7f8d60037a60 {"return": {"status": "wait-unplug"}, "id": "libvirt-366"}


So I don't think disabling the savevm command will have the effect you're expecting.

Also, I don't agree that disabling managedsave when there is a failover device is the ideal solution - even though there is no guarantee there will be an assigned device available at the time of the restore, that shouldn't be an issue - if that's the case then libvirt will just refuse to restore; the user can do whatever is necessary to make the resource available, then try again (we could also enhance it to allow restore without the assigned device (similar to how we allow restore with a missing USB device), in which case the restored guest would be operating with the backup device.

Comment 6 Juan Quintela 2020-05-20 07:57:48 UTC

Hi

I found the problem, not yet the solution:


(gdb) bt
#0  0x00007f9d4f98eb18 in futex_abstimed_wait_cancelable (private=0, 
    abstime=0x7f9cb5ee4690, clockid=0, expected=0, futex_word=0x55764f1e96a8)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:208
#1  do_futex_wait (sem=sem@entry=0x55764f1e96a8, 
    abstime=abstime@entry=0x7f9cb5ee4690, clockid=0) at sem_waitcommon.c:112
#2  0x00007f9d4f98ec43 in __new_sem_wait_slow (sem=sem@entry=0x55764f1e96a8, 
    abstime=abstime@entry=0x7f9cb5ee4690, clockid=0) at sem_waitcommon.c:184
#3  0x00007f9d4f98ecd2 in sem_timedwait (sem=sem@entry=0x55764f1e96a8, 
    abstime=abstime@entry=0x7f9cb5ee4690) at sem_timedwait.c:39
#4  0x000055764cf62e4f in qemu_sem_timedwait (sem=sem@entry=0x55764f1e96a8, 
    ms=ms@entry=250)
    at /usr/src/debug/qemu-5.0.0-2.fc31.x86_64/util/qemu-thread-posix.c:306
#5  0x000055764cdfc345 in migration_thread (opaque=0x55764f1e9420)
    at /usr/src/debug/qemu-5.0.0-2.fc31.x86_64/migration/migration.c:3424
#6  0x000055764cf62813 in qemu_thread_start (args=0x55764f23f420)
    at /usr/src/debug/qemu-5.0.0-2.fc31.x86_64/util/qemu-thread-posix.c:519
#7  0x00007f9d4f9854e2 in start_thread (arg=<optimized out>)
    at pthread_create.c:479
#8  0x00007f9d4f8b46a3 in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) list
3419	        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
3420	                          MIGRATION_STATUS_WAIT_UNPLUG);
3421	
3422	        while (s->state == MIGRATION_STATUS_WAIT_UNPLUG &&
3423	               qemu_savevm_state_guest_unplug_pending()) {
3424	            qemu_sem_timedwait(&s->wait_unplug_sem, 250);
3425	        }
3426	
3427	        migrate_set_state(&s->state, MIGRATION_STATUS_WAIT_UNPLUG,
3428	                MIGRATION_STATUS_ACTIVE);

Trying to understand why it is not unpluging the network card on time.

I can reproduce it easily now.

Comment 11 Juan Quintela 2020-09-17 10:47:38 UTC

Hi

Just got back to this bugzilla.  And there is no way to proper fix it.

virsh managedsave

libvirt just does:
* pause guest
* dump memory with live migration

and what network failure does inside qemu is:
* hotunplug VF
* do the proper migration

So, libvirt stops the guest, and qemu after that waits for the guest to answer to the hotunplug event, but the guest is paused.

So, we can't do anything inside qemu, the only two things that I can think of is changing qemu to:
* just fail if we have network failure, and give an error that is not possible
* do the hot-unplug inside libvirt, and be careful with cancelations and errors.

Laine, what do you think?

Comment 12 Juan Quintela 2020-09-17 11:26:58 UTC

Hi again

I am changing the code on qemu to give one error if we request a migration of guest that is paused and that needs hot-unplug.  This fixes the hang part of this bug, but not the managedsave bit, for that we need "colaboration" and changes on libvirt itself.

My current better idea is creating a new migration parameter:
- tentatively named pause-during-migration
- that do the hot-unplug, pause the guest, and then do the migration, and after it, continue the guest
- why? otherwise, we need multiple changes in libvirt, that do the right thing, i.e. unplug the network card, do the migration, and handle all the cancel/error cases correctly.

Later, Juan.

Comment 13 Laine Stump 2020-09-17 14:26:07 UTC

Could the pause have the same code that's been added to the first stage of migration? Or is that command synchronous? If it's synchronous, maybe a new asynchronous version of the pause is needed.

Comment 18 Laurent Vivier 2021-06-18 14:12:30 UTC

Could you re-test with RHEL-AV-8.5.0 to see if the problem has been fixed by the rebase?

Thanks

Comment 19 Yanghang Liu 2021-06-22 09:41:16 UTC

Hi Laurent,


This problem can still be reproduced in the following test env:

  host:
    4.18.0-315.el8.x86_64
    qemu-kvm-6.0.0-20.module+el8.5.0+11499+199527ef.x86_64

  guest:
    4.18.0-314.el8.x86_64



Related qmp log when running the "virsh managedsave $domain" cmd:

> {"execute":"stop","id":"libvirt-391"}
!  {"timestamp": {"seconds": 1624354140, "microseconds": 406765}, "event": "STOP"}
<  {"return": {}, "id": "libvirt-391"}
> {"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"xbzrle","state":false},{"capability":"auto-converge","state":false},{"capability":"rdma-pin-all","state":false},{"capability":"postcopy-ram","state":false},{"capability":"compress","state":false},{"capability":"pause-before-switchover","state":false},{"capability":"late-block-activate","state":false},{"capability":"multifd","state":false},{"capability":"dirty-bitmaps","state":false}]},"id":"libvirt-392"}
<  {"return": {}, "id": "libvirt-392"}
> {"execute":"migrate-set-parameters","arguments":{"max-bandwidth":9223372036853727232},"id":"libvirt-393"}
<  {"return": {}, "id": "libvirt-393"}
> {"execute":"getfd","arguments":{"fdname":"migrate"},"id":"libvirt-394"} (fd=41)
<  {"return": {}, "id": "libvirt-394"}
> {"execute":"migrate","arguments":{"detach":true,"blk":false,"inc":false,"uri":"fd:migrate"},"id":"libvirt-395"}
<  {"return": {}, "id": "libvirt-395"}
!  {"timestamp": {"seconds": 1624354140, "microseconds": 451949}, "event": "MIGRATION", "data": {"status": "setup"}}
!  {"timestamp": {"seconds": 1624354140, "microseconds": 452059}, "event": "UNPLUG_PRIMARY", "data": {"device-id": "hostdev0"}}
!  {"timestamp": {"seconds": 1624354140, "microseconds": 457243}, "event": "MIGRATION_PASS", "data": {"pass": 1}}
!  {"timestamp": {"seconds": 1624354140, "microseconds": 457339}, "event": "MIGRATION", "data": {"status": "wait-unplug"}}
> {"execute":"query-migrate","id":"libvirt-396"}
<  {"return": {"blocked": false, "status": "wait-unplug"}, "id": "libvirt-396"}   <---- the "virsh managedsave $domain" cmd can not finish.

Comment 20 Yanghang Liu 2021-06-24 02:43:11 UTC

Hi Laurent,

If this bug will be fixed in RHEL8.5, could you please help setup the ITR and the DTM ?

Comment 21 John Ferlan 2021-09-08 21:21:16 UTC

Move RHEL-AV bugs to RHEL9. If necessary to resolve in RHEL8, then clone to the current RHEL8 release.  Removed the ITR from all bugs as part of the change.

Comment 26 Laurent Vivier 2021-09-28 13:13:17 UTC

Laine,

from libvirt point of view, do you agree with the idea proposed by Juan in comment #12 ?

To create a new migration parameter named "pause-during-migration" (any suggestion?), do the hot-unplug, pause the guest, and then do the migration, replug the card and after it, continue the guest.

Comment 27 Laurent Vivier 2021-09-30 17:18:43 UTC

I've proposed a patch upstream:

https://patchew.org/QEMU/20210930170926.1298118-1-lvivier@redhat.com/

Author: Laurent Vivier <lvivier>
Date:   Mon Sep 27 14:53:25 2021 +0200

    failover: allow to pause the VM during the migration
    
    If we want to save a snapshot of a VM to a file, we used to follow the
    following steps:
    
    1- stop the VM:
       (qemu) stop
    
    2- migrate the VM to a file:
       (qemu) migrate "exec:cat > snapshot"
    
    3- resume the VM:
       (qemu) cont
    
    After that we can restore the snapshot with:
      qemu-system-x86_64 ... -incoming "exec:cat snapshot"
      (qemu) cont
    
    But when failover is configured, it doesn't work anymore.
    
    As the failover needs to ask the guest OS to unplug the card
    the machine cannot be paused.
    
    This patch introduces a new migration parameter, "pause-vm", that
    asks the migration to pause the VM during the migration startup
    phase after the the card is unplugged.
    
    Once the migration is done, we only need to resume the VM with
    "cont" and the card is plugged back:
    
    1- set the parameter:
       (qemu) migrate_set_parameter pause-vm on
    
    2- migrate the VM to a file:
       (qemu) migrate "exec:cat > snapshot"
    
       The primary failover card (VFIO) is unplugged and the VM is paused.
    
    3- resume the VM:
       (qemu) cont
    
       The VM restarts and the primary failover card is plugged back
    
    The VM state sent in the migration stream is "paused", it means
    when the snapshot is loaded or if the stream is sent to a destination
    QEMU, the VM needs to be resumed manually.
    
    Signed-off-by: Laurent Vivier <lvivier>

Comment 28 Laine Stump 2021-09-30 20:04:57 UTC

Okay, with the concrete example I have a better idea how to respond to your question from Comment 26 :-).


libvirt migrates to a file in 3 places (that I see):

1) qemuSnapshotCreateActiveExternal
2) qemuDomainSaveInternal

In both of these cases, the CPUs are always paused (qemuProcessStopCPUs()) prior to the migrate-to-file (qemuMigrationSrcToFile())

2) doCoreDump

In *some* cases the CPUs are paused prior to migrate-to-file, but in other cases (a) when VIR_DUMP_LIVE is set and b) when the coredump is in response to a watchdog event) the CPUs are *NOT* paused.

So if we can easily determine that this new parameter is available (I'm guessing we'll be able to detect and map it to a qemu capability flag just as with other version-specific things) then the call to qemuProcessStopCPUs() could be _replaced_ with setting this parameter in (1) and (2) (I haven't looked through the error recovery paths, but likely there will be places where we'll need to change behavior).

But I don't know what to do in the  cases of (2) where we are currently doing the migrate without pausing CPUs. (Maybe it's unimportant, and we can just fail in those cases? Dan?)

Comment 29 Daniel Berrangé 2021-10-01 08:30:59 UTC

I'm not sure that we need a new parameter in QEMU at all.  In these cases we have the issue because we do "stop" and then "migrate", but QEMU has long supported running "migrate" and then "stop".  The only tricky bit here is that we need to wait for the failover unplug to complete before invoking "stop".  QEMU emits events when the state of the migration changes.  IIUC, with failover, we start in "wait-unplug" and then transition to "active" when unplug is done.

IOW, it looks like we can already solve this in livirt

 - If failover
     - migrate
     - wait for event signalling "active" state
     - stop
 - else
     - stop
     - migrate


The only downside to this is that we have a tiny window where migration started transferring memory and the CPUs havent been paused by libirt yet. AFAICT, this is harmless.

Comment 30 Laurent Vivier 2021-10-01 13:00:29 UTC

Daniel,

if you think the problem can/must be solved in libvirt, please re-assign this BZ to libvirt component.

Comment 31 Daniel Berrangé 2021-10-01 13:06:50 UTC

I repeated my comments in the upstream thread now, so lets see where that discussion takes us upstream. My preference is to find a solution that works with existing QEMU releases, if that is viable.

Comment 33 RHEL Program Management 2021-11-17 07:27:11 UTC

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 34 yalzhang@redhat.com 2021-11-23 07:58:55 UTC

I guess this bug is closed by accident. I have tried on latest rhel 9, the behavior changes but still may need some fix. Please help to evaluate, Thank you! 

Managedsave for guest with failover setting succeed, but the hostdev interface will never register back. 

# rpm -q libvirt qemu-kvm kernel
libvirt-7.9.0-1.module+el8.6.0+13150+28339563.x86_64
qemu-kvm-6.1.0-4.module+el8.6.0+13039+4b81a1dc.x86_64
kernel-4.18.0-350.el8.x86_64

1. Start a vm with failover setting, and check in the vm, there are 3 interfaces and all looks good;

2. managedsave succeed
# virsh managedsave rhel 

Domain 'rhel' state saved by libvirt

after managedsave, the guest shuytdown, and check the inactive xml, there are 2 interfaces.
# virsh dumpxml rhel | grep /interface -B12
    <interface type='network'>
      <mac address='52:54:00:aa:1c:ef'/>
      <source network='host-bridge'/>
      <model type='virtio'/>
      <teaming type='persistent'/>
      <alias name='ua-backup0'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
    <interface type='network'>
      <mac address='52:54:00:aa:1c:ef'/>
      <source network='hostdev-net'/>
      <teaming type='transient' persistent='ua-backup0'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </interface>

3. start the vm again, there are only 1 bridge interface in the live xml, and hostdev interface is gone
# virsh start rhel 
Domain 'rhel' started

# virsh dumpxml rhel | grep /interface -B12
    <interface type='bridge'>
      <mac address='52:54:00:aa:1c:ef'/>
      <source network='host-bridge' portid='1248188e-c8c9-4101-bf43-11514d71ed9e' bridge='br0'/>
      <target dev='vnet8'/>
      <model type='virtio'/>
      <teaming type='persistent'/>
      <alias name='ua-backup0'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>

check on the vm, there are only 2 interfaces: the master interface with net_failover driver and the bridge interface with virtio_net.

Comment 35 Laurent Vivier 2021-11-23 08:38:58 UTC

Dan,

what is the conclusion?

Do we need to manage the move to PAUSED state in QEMU failover or do you think libvirt can rely on the migration state to stop machine after the card has been unplugged ?

Comment 37 Daniel Berrangé 2021-11-23 09:40:59 UTC

Re-opened because it was prematurely closed by the auto-closer.

I still believe that we ought to be able to solve this exclusively in libvirt, so moving the bug to libvirt.

Comment 39 yalzhang@redhat.com 2022-09-15 05:51:31 UTC

Test on libvirt-8.5.0-6.el9.x86_64, managedsave can not finished, and after the it canceled, there are only 2 interfaces in the vm.
The VF can not register back. It's the same with comment 0.
# virsh managedsave rhel 
^C^C^C
error: Failed to save domain 'rhel' state
error: operation aborted: job 'domain save' canceled by client

On vm:
# ip l 
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:aa:1a:ef brd ff:ff:ff:ff:ff:ff
3: enp1s0nsby: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master enp1s0 state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:aa:1a:ef brd ff:ff:ff:ff:ff:ff
# ethtool -i  enp1s0 | grep driver
driver: net_failover
# ethtool -i  enp1s0nsby | grep driver
driver: virtio_net

Comment 41 RHEL Program Management 2023-09-22 15:50:41 UTC

Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 42 RHEL Program Management 2023-09-22 15:52:06 UTC

This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated.  Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.