Bug 1815426
Summary: | Virsh managedsave never finishes if guest has a net failover element | |
---|---|---|---
Product: | Red Hat Enterprise Linux 9 | Reporter: | Luyao Huang <lhuang>
Component: | libvirt | Assignee: | Laine Stump <laine>
libvirt sub component: | Networking | QA Contact: | yalzhang <yalzhang>
Status: | CLOSED MIGRATED | Docs Contact: |
Severity: | medium | |
Priority: | medium | CC: | aadam, ailan, berrange, chayang, dyuan, jinzhao, jsuchane, juzhang, laine, lvivier, quintela, virt-maint, xuzhang, yalzhang, yama, yanghliu, yanqzhan, yicui
Version: | 9.0 | Keywords: | MigratedToJIRA, Triaged
Target Milestone: | rc | |
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2023-09-22 15:52:06 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
Luyao Huang
2020-03-20 08:50:46 UTC
hi, I am going to change qemu savevm (the part that implements managedsave) to return an error if failover is set up. It is not clear to me that there is anything reasonable to do if we are using failover. There is nothing that requires that, when we return, there is a device available to fail back to the assigned device. We can add later the capability of just disabling the device assignment if there is a managed save. Adding a NeedInfo to Laine to see what he thinks about it from the libvirt point of view.

libvirt doesn't use the savevm command. It implements its managedsave by migrating to an open fd, then stopping the qemu process. Here is a trace of the monitor commands leading up to the "hang" (I produced this with "stap examples/systemtap/qemu-monitor.stp"):

```
92.664 > 0x7f8d60037a60 {"execute":"stop","id":"libvirt-362"}
92.669 ! 0x7f8d60037a60 {"timestamp": {"seconds": 1586397518, "microseconds": 481155}, "event": "STOP"}
93.326 < 0x7f8d60037a60 {"return": {}, "id": "libvirt-362"}
93.333 > 0x7f8d60037a60 {"execute":"migrate_set_speed","arguments":{"value":9223372036853727232},"id":"libvirt-363"}
93.335 < 0x7f8d60037a60 {"return": {}, "id": "libvirt-363"}
93.336 > 0x7f8d60037a60 {"execute":"getfd","arguments":{"fdname":"migrate"},"id":"libvirt-364"} (fd=36)
93.337 < 0x7f8d60037a60 {"return": {}, "id": "libvirt-364"}
93.337 > 0x7f8d60037a60 {"execute":"migrate","arguments":{"detach":true,"blk":false,"inc":false,"uri":"fd:migrate"},"id":"libvirt-365"}
93.339 ! 0x7f8d60037a60 {"timestamp": {"seconds": 1586397519, "microseconds": 150972}, "event": "MIGRATION", "data": {"status": "setup"}}
93.339 ! 0x7f8d60037a60 {"timestamp": {"seconds": 1586397519, "microseconds": 151133}, "event": "UNPLUG_PRIMARY", "data": {"device-id": "hostdev0"}}
93.339 < 0x7f8d60037a60 {"return": {}, "id": "libvirt-365"}
93.339 > 0x7f8d60037a60 {"execute":"query-migrate","id":"libvirt-366"}
93.358 ! 0x7f8d60037a60 {"timestamp": {"seconds": 1586397519, "microseconds": 170526}, "event": "MIGRATION_PASS", "data": {"pass": 1}}
93.358 ! 0x7f8d60037a60 {"timestamp": {"seconds": 1586397519, "microseconds": 170708}, "event": "MIGRATION", "data": {"status": "wait-unplug"}}
93.358 < 0x7f8d60037a60 {"return": {"status": "wait-unplug"}, "id": "libvirt-366"}
```

So I don't think disabling the savevm command will have the effect you're expecting.

Also, I don't agree that disabling managedsave when there is a failover device is the ideal solution. Even though there is no guarantee there will be an assigned device available at the time of the restore, that shouldn't be an issue: if that's the case then libvirt will just refuse to restore; the user can do whatever is necessary to make the resource available, then try again. (We could also enhance it to allow restore without the assigned device, similar to how we allow restore with a missing USB device, in which case the restored guest would be operating with the backup device.)
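For reference, the ordering shown in the trace (pause the CPUs first, then migrate) can be reproduced outside of libvirt against a standalone QEMU that has a virtio-net failover pair configured. The following is only an illustrative sketch: it assumes QEMU was started with something like -qmp unix:/tmp/qmp.sock,server=on,wait=off and a QEMU version still affected by this hang; the socket and snapshot paths are placeholders, not values from this report.

```python
#!/usr/bin/env python3
# Sketch: reproduce the problematic "stop, then migrate" ordering over a raw
# QMP socket. SOCK and the snapshot path are placeholders (assumptions).
import json
import socket
import time

SOCK = "/tmp/qmp.sock"   # placeholder QMP socket path

s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
s.connect(SOCK)
f = s.makefile("rw")

def cmd(name, **args):
    """Send one QMP command and wait for its return, skipping async events."""
    msg = {"execute": name}
    if args:
        msg["arguments"] = args
    f.write(json.dumps(msg) + "\n")
    f.flush()
    while True:
        reply = json.loads(f.readline())
        if "return" in reply or "error" in reply:
            return reply

json.loads(f.readline())                        # consume the QMP greeting
cmd("qmp_capabilities")

cmd("stop")                                     # pause the CPUs first (what libvirt does) ...
cmd("migrate", uri="exec:cat > /tmp/snapshot")  # ... then migrate to a file

# The paused guest can never acknowledge the UNPLUG_PRIMARY request, so the
# migration never leaves "wait-unplug".
for _ in range(10):
    print(cmd("query-migrate")["return"].get("status"))
    time.sleep(1)
```

With the guest paused, the unplug request is never acknowledged, so every query-migrate in the loop reports "wait-unplug", matching the trace above.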
Hi, I found the problem, not yet the solution:

```
(gdb) bt
#0  0x00007f9d4f98eb18 in futex_abstimed_wait_cancelable (private=0, abstime=0x7f9cb5ee4690, clockid=0, expected=0, futex_word=0x55764f1e96a8) at ../sysdeps/unix/sysv/linux/futex-internal.h:208
#1  do_futex_wait (sem=sem@entry=0x55764f1e96a8, abstime=abstime@entry=0x7f9cb5ee4690, clockid=0) at sem_waitcommon.c:112
#2  0x00007f9d4f98ec43 in __new_sem_wait_slow (sem=sem@entry=0x55764f1e96a8, abstime=abstime@entry=0x7f9cb5ee4690, clockid=0) at sem_waitcommon.c:184
#3  0x00007f9d4f98ecd2 in sem_timedwait (sem=sem@entry=0x55764f1e96a8, abstime=abstime@entry=0x7f9cb5ee4690) at sem_timedwait.c:39
#4  0x000055764cf62e4f in qemu_sem_timedwait (sem=sem@entry=0x55764f1e96a8, ms=ms@entry=250) at /usr/src/debug/qemu-5.0.0-2.fc31.x86_64/util/qemu-thread-posix.c:306
#5  0x000055764cdfc345 in migration_thread (opaque=0x55764f1e9420) at /usr/src/debug/qemu-5.0.0-2.fc31.x86_64/migration/migration.c:3424
#6  0x000055764cf62813 in qemu_thread_start (args=0x55764f23f420) at /usr/src/debug/qemu-5.0.0-2.fc31.x86_64/util/qemu-thread-posix.c:519
#7  0x00007f9d4f9854e2 in start_thread (arg=<optimized out>) at pthread_create.c:479
#8  0x00007f9d4f8b46a3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) list
3419            migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
3420                              MIGRATION_STATUS_WAIT_UNPLUG);
3421
3422            while (s->state == MIGRATION_STATUS_WAIT_UNPLUG &&
3423                   qemu_savevm_state_guest_unplug_pending()) {
3424                qemu_sem_timedwait(&s->wait_unplug_sem, 250);
3425            }
3426
3427            migrate_set_state(&s->state, MIGRATION_STATUS_WAIT_UNPLUG,
3428                              MIGRATION_STATUS_ACTIVE);
```

Trying to understand why it is not unplugging the network card in time. I can reproduce it easily now.

Hi, just got back to this bugzilla, and there is no way to properly fix it. For virsh managedsave, libvirt just does:

* pause the guest
* dump memory with live migration

and what network failover does inside qemu is:

* hot-unplug the VF
* do the proper migration

So libvirt stops the guest, and qemu after that waits for the guest to answer the hot-unplug event, but the guest is paused. So we can't do anything inside qemu; the only two things that I can think of are changing qemu to:

* just fail if we have network failover, and give an error that it is not possible
* do the hot-unplug inside libvirt, and be careful with cancellations and errors

Laine, what do you think?

Hi again, I am changing the code in qemu to give an error if we request a migration of a guest that is paused and that needs hot-unplug. This fixes the hang part of this bug, but not the managedsave bit; for that we need "collaboration" and changes in libvirt itself.

My current better idea is creating a new migration parameter:
- tentatively named pause-during-migration
- that does the hot-unplug, pauses the guest, then does the migration, and after it, continues the guest
- why? Otherwise we need multiple changes in libvirt to do the right thing, i.e. unplug the network card, do the migration, and handle all the cancel/error cases correctly.

Later, Juan.

Could the pause have the same code that's been added to the first stage of migration? Or is that command synchronous? If it's synchronous, maybe a new asynchronous version of the pause is needed.

Could you re-test with RHEL-AV-8.5.0 to see if the problem has been fixed by the rebase? Thanks
Hi Laurent,

This problem can still be reproduced in the following test env:

host: 4.18.0-315.el8.x86_64, qemu-kvm-6.0.0-20.module+el8.5.0+11499+199527ef.x86_64
guest: 4.18.0-314.el8.x86_64

Related qmp log when running the "virsh managedsave $domain" cmd:

```
> {"execute":"stop","id":"libvirt-391"}
! {"timestamp": {"seconds": 1624354140, "microseconds": 406765}, "event": "STOP"}
< {"return": {}, "id": "libvirt-391"}
> {"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"xbzrle","state":false},{"capability":"auto-converge","state":false},{"capability":"rdma-pin-all","state":false},{"capability":"postcopy-ram","state":false},{"capability":"compress","state":false},{"capability":"pause-before-switchover","state":false},{"capability":"late-block-activate","state":false},{"capability":"multifd","state":false},{"capability":"dirty-bitmaps","state":false}]},"id":"libvirt-392"}
< {"return": {}, "id": "libvirt-392"}
> {"execute":"migrate-set-parameters","arguments":{"max-bandwidth":9223372036853727232},"id":"libvirt-393"}
< {"return": {}, "id": "libvirt-393"}
> {"execute":"getfd","arguments":{"fdname":"migrate"},"id":"libvirt-394"} (fd=41)
< {"return": {}, "id": "libvirt-394"}
> {"execute":"migrate","arguments":{"detach":true,"blk":false,"inc":false,"uri":"fd:migrate"},"id":"libvirt-395"}
< {"return": {}, "id": "libvirt-395"}
! {"timestamp": {"seconds": 1624354140, "microseconds": 451949}, "event": "MIGRATION", "data": {"status": "setup"}}
! {"timestamp": {"seconds": 1624354140, "microseconds": 452059}, "event": "UNPLUG_PRIMARY", "data": {"device-id": "hostdev0"}}
! {"timestamp": {"seconds": 1624354140, "microseconds": 457243}, "event": "MIGRATION_PASS", "data": {"pass": 1}}
! {"timestamp": {"seconds": 1624354140, "microseconds": 457339}, "event": "MIGRATION", "data": {"status": "wait-unplug"}}
> {"execute":"query-migrate","id":"libvirt-396"}
< {"return": {"blocked": false, "status": "wait-unplug"}, "id": "libvirt-396"}   <---- the "virsh managedsave $domain" cmd cannot finish.
```

Hi Laurent,

If this bug will be fixed in RHEL 8.5, could you please help set up the ITR and the DTM?

Move RHEL-AV bugs to RHEL9. If necessary to resolve in RHEL8, then clone to the current RHEL8 release. Removed the ITR from all bugs as part of the change.

Laine, from the libvirt point of view, do you agree with the idea proposed by Juan in comment #12? That is, to create a new migration parameter named "pause-during-migration" (any suggestion?) that does the hot-unplug, pauses the guest, then does the migration, re-plugs the card, and after it continues the guest.

I've proposed a patch upstream: https://patchew.org/QEMU/20210930170926.1298118-1-lvivier@redhat.com/

Author: Laurent Vivier <lvivier>
Date: Mon Sep 27 14:53:25 2021 +0200

failover: allow to pause the VM during the migration

If we want to save a snapshot of a VM to a file, we used to follow the following steps:

1- stop the VM:
(qemu) stop
2- migrate the VM to a file:
(qemu) migrate "exec:cat > snapshot"
3- resume the VM:
(qemu) cont

After that we can restore the snapshot with:
qemu-system-x86_64 ... -incoming "exec:cat snapshot"
(qemu) cont

But when failover is configured, it doesn't work anymore. As the failover needs to ask the guest OS to unplug the card, the machine cannot be paused.

This patch introduces a new migration parameter, "pause-vm", that asks the migration to pause the VM during the migration startup phase after the card is unplugged.
Once the migration is done, we only need to resume the VM with "cont" and the card is plugged back:

1- set the parameter:
(qemu) migrate_set_parameter pause-vm on
2- migrate the VM to a file:
(qemu) migrate "exec:cat > snapshot"
The primary failover card (VFIO) is unplugged and the VM is paused.
3- resume the VM:
(qemu) cont
The VM restarts and the primary failover card is plugged back.

The VM state sent in the migration stream is "paused"; it means that when the snapshot is loaded, or if the stream is sent to a destination QEMU, the VM needs to be resumed manually.

Signed-off-by: Laurent Vivier <lvivier>

Okay, with the concrete example I have a better idea how to respond to your question from Comment 26 :-). libvirt migrates to a file in 3 places (that I see):

1) qemuSnapshotCreateActiveExternal
2) qemuDomainSaveInternal

In both of these cases, the CPUs are always paused (qemuProcessStopCPUs()) prior to the migrate-to-file (qemuMigrationSrcToFile()).

3) doCoreDump

In *some* cases the CPUs are paused prior to migrate-to-file, but in other cases (a) when VIR_DUMP_LIVE is set and (b) when the coredump is in response to a watchdog event, the CPUs are *NOT* paused.

So if we can easily determine that this new parameter is available (I'm guessing we'll be able to detect and map it to a qemu capability flag just as with other version-specific things), then the call to qemuProcessStopCPUs() could be _replaced_ with setting this parameter in (1) and (2) (I haven't looked through the error recovery paths, but likely there will be places where we'll need to change behavior). But I don't know what to do in the cases of (3) where we are currently doing the migrate without pausing CPUs. (Maybe it's unimportant, and we can just fail in those cases? Dan?)

I'm not sure that we need a new parameter in QEMU at all. In these cases we have the issue because we do "stop" and then "migrate", but QEMU has long supported running "migrate" and then "stop". The only tricky bit here is that we need to wait for the failover unplug to complete before invoking "stop". QEMU emits events when the state of the migration changes. IIUC, with failover, we start in "wait-unplug" and then transition to "active" when unplug is done. IOW, it looks like we can already solve this in libvirt:

- If failover
  - migrate
  - wait for event signalling "active" state
  - stop
- else
  - stop
  - migrate

The only downside to this is that we have a tiny window where migration has started transferring memory and the CPUs haven't been paused by libvirt yet. AFAICT, this is harmless.

Daniel, if you think the problem can/must be solved in libvirt, please re-assign this BZ to the libvirt component.

I repeated my comments in the upstream thread now, so let's see where that discussion takes us upstream. My preference is to find a solution that works with existing QEMU releases, if that is viable.
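For illustration, here is a minimal sketch of the ordering Daniel describes, driven over a raw QMP socket against a standalone QEMU rather than through libvirt. The socket and snapshot paths are placeholders, and this is not how libvirt itself would implement the change.

```python
#!/usr/bin/env python3
# Sketch: "migrate first, stop once unplug is done" ordering over a raw QMP
# socket. SOCK and the snapshot path are placeholders (assumptions).
import json
import socket

SOCK = "/tmp/qmp.sock"   # placeholder QMP socket path

s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
s.connect(SOCK)
f = s.makefile("rw")

def send(name, **args):
    msg = {"execute": name}
    if args:
        msg["arguments"] = args
    f.write(json.dumps(msg) + "\n")
    f.flush()

json.loads(f.readline())                        # consume the QMP greeting
send("qmp_capabilities")

# Start the migration first, so the running guest can acknowledge the VF unplug.
send("migrate", uri="exec:cat > /tmp/snapshot")

stopped = False
while True:
    msg = json.loads(f.readline())
    if msg.get("event") != "MIGRATION":
        continue                                # skip command returns and other events
    status = msg["data"]["status"]
    print("migration:", status)
    if status == "active" and not stopped:
        send("stop")                            # unplug acknowledged; now pause the CPUs
        stopped = True
    elif status in ("completed", "failed", "cancelled"):
        break
```

The sketch only issues "stop" once the MIGRATION event reports "active", i.e. after the guest has acknowledged the unplug; the brief window where memory is transferred while the CPUs are still running is the downside Daniel calls harmless.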
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

I guess this bug was closed by accident. I have tried on the latest RHEL 9; the behavior changes but may still need some fix. Please help to evaluate, thank you! Managedsave for a guest with a failover setting succeeds, but the hostdev interface will never register back.

```
# rpm -q libvirt qemu-kvm kernel
libvirt-7.9.0-1.module+el8.6.0+13150+28339563.x86_64
qemu-kvm-6.1.0-4.module+el8.6.0+13039+4b81a1dc.x86_64
kernel-4.18.0-350.el8.x86_64
```

1. Start a vm with failover setting, and check in the vm: there are 3 interfaces and all look good.

2. managedsave succeeds:

```
# virsh managedsave rhel
Domain 'rhel' state saved by libvirt
```

After managedsave, the guest shuts down; check the inactive xml, there are 2 interfaces:

```
# virsh dumpxml rhel | grep /interface -B12
    <interface type='network'>
      <mac address='52:54:00:aa:1c:ef'/>
      <source network='host-bridge'/>
      <model type='virtio'/>
      <teaming type='persistent'/>
      <alias name='ua-backup0'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
    <interface type='network'>
      <mac address='52:54:00:aa:1c:ef'/>
      <source network='hostdev-net'/>
      <teaming type='transient' persistent='ua-backup0'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </interface>
```

3. Start the vm again; there is only 1 bridge interface in the live xml, and the hostdev interface is gone:

```
# virsh start rhel
Domain 'rhel' started

# virsh dumpxml rhel | grep /interface -B12
    <interface type='bridge'>
      <mac address='52:54:00:aa:1c:ef'/>
      <source network='host-bridge' portid='1248188e-c8c9-4101-bf43-11514d71ed9e' bridge='br0'/>
      <target dev='vnet8'/>
      <model type='virtio'/>
      <teaming type='persistent'/>
      <alias name='ua-backup0'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
```

Check in the vm, there are only 2 interfaces: the master interface with the net_failover driver and the bridge interface with virtio_net.

Dan, what is the conclusion? Do we need to manage the move to the PAUSED state in QEMU failover, or do you think libvirt can rely on the migration state to stop the machine after the card has been unplugged?

Re-opened because it was prematurely closed by the auto-closer. I still believe that we ought to be able to solve this exclusively in libvirt, so moving the bug to libvirt.

Test on libvirt-8.5.0-6.el9.x86_64: managedsave cannot finish, and after it is canceled, there are only 2 interfaces in the vm. The VF cannot register back. It's the same as comment 0.

```
# virsh managedsave rhel
^C^C^C
error: Failed to save domain 'rhel' state
error: operation aborted: job 'domain save' canceled by client
```

On the vm:

```
# ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:aa:1a:ef brd ff:ff:ff:ff:ff:ff
3: enp1s0nsby: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master enp1s0 state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:aa:1a:ef brd ff:ff:ff:ff:ff:ff

# ethtool -i enp1s0 | grep driver
driver: net_failover
# ethtool -i enp1s0nsby | grep driver
driver: virtio_net
```
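As an aside, the failover pair is visible in the domain XML above through the <teaming> elements, so a management script can spot domains exposed to this hang before attempting a managedsave. A minimal sketch, assuming the domain name "rhel" from the report; this is only an illustration, not behavior provided by libvirt:

```python
#!/usr/bin/env python3
# Sketch: detect a net-failover ("teaming") pair in a domain's XML via
# "virsh dumpxml". DOMAIN is taken from the report and is an assumption.
import subprocess
import xml.etree.ElementTree as ET

DOMAIN = "rhel"   # adjust to the domain under test

xml = subprocess.run(["virsh", "dumpxml", DOMAIN],
                     capture_output=True, text=True, check=True).stdout
root = ET.fromstring(xml)

# A failover pair shows up as an <interface> carrying <teaming type='transient'>,
# paired with the persistent virtio interface.
transient = [
    iface for iface in root.findall("./devices/interface")
    if iface.find("teaming[@type='transient']") is not None
]

if transient:
    print("%s has %d transient failover interface(s); managedsave is affected "
          "by bug 1815426" % (DOMAIN, len(transient)))
else:
    print("%s has no failover interfaces" % DOMAIN)
```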
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated. Be sure to add yourself to the Jira issue's "Watchers" field to continue receiving updates, and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer. You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.