Bug 1956897 - RFE: Allow killing stuck migration connection
Summary: RFE: Allow killing stuck migration connection
Keywords:
Status: POST
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: ---
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 8.4
Assignee: Virtualization Maintenance
QA Contact: Li Xiaohui
URL:
Whiteboard:
Depends On:
Blocks: 1955195
Reported: 2021-05-04 15:42 UTC by Michal Privoznik
Modified: 2021-05-17 08:26 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1955195
Environment:
Last Closed:
Type: Bug
Target Upstream Version:



Description Michal Privoznik 2021-05-04 15:42:34 UTC
+++ This bug was initially created as a clone of Bug #1955195 +++

Description of problem:

Libvirt is planning to adopt the 'yank' command (see bug 1955195), which was implemented in upstream QEMU in commit v6.0.0-rc0~150^2~6 (and related commits).

This is an RFE to do whatever is needed to get the command into RHEL-AV.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Michal Privoznik 2021-05-04 15:46:38 UTC
One way to test the 'yank' command is to start a migration and then inject firewall rules that drop packets silently (that is, DROP instead of REJECT, so that the source doesn't get notified).

Comment 4 Li Xiaohui 2021-05-12 13:03:41 UTC
Tested 'yank' on rhelav-8.5.0 (kernel-4.18.0-304.3.el8.x86_64 & qemu-img-6.0.0-16.module+el8.5.0+10848+2dccc46d.x86_64)


Test scenario:
1. Add a firewall DROP rule on the dst host while migration is active; once the migration hangs, use yank to fail the migration.


Test steps:
1. Boot a guest on the src host.
2. Boot a guest on the dst host with '-incoming defer'.
3. Set up the incoming migration on the dst host via QMP:
{"execute":"migrate-incoming","arguments":{"uri":"tcp:[::]:1234"}}
4. Start the migration on the src host via QMP:
{"execute": "migrate","arguments":{"uri": "tcp:10.73.130.69:1234"}}
5. While the migration is active, add a firewall DROP rule on the dst host:
# iptables -A INPUT -p tcp --dport 1234 -j DROP
6. After the migration hangs (in the query-migrate output, only the total migration time keeps increasing while the other migration parameters stay unchanged), use the yank command on the src host to fail the migration:
{ "execute": "query-yank" }
{"return": [{"type": "chardev", "id": "qmp_id_qmpmonitor1"}, {"type": "chardev", "id": "qmp_id_catch_monitor"}, {"type": "chardev", "id": "compat_monitor0"}, {"type": "chardev", "id": "serial0"}, {"type": "migration"}]}
{"execute":"yank","arguments":{"instances":[{"type":"migration"}]}}
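The hang check and the yank call in step 6 could be scripted roughly as follows. This is only a minimal Python sketch, not part of the tested procedure: the function names are hypothetical, and it assumes that the 'total-time' and 'ram' -> 'transferred' fields of the query-migrate reply are the right ones to compare.

```python
from typing import Optional


def migration_looks_hung(prev: dict, curr: dict) -> bool:
    """Compare the 'return' payloads of two successive query-migrate calls.

    Heuristic from step 6: the migration is still 'active', the total
    time keeps increasing, but the amount of transferred RAM does not.
    """
    if prev.get("status") != "active" or curr.get("status") != "active":
        return False
    time_advanced = curr.get("total-time", 0) > prev.get("total-time", 0)
    prev_ram = prev.get("ram", {})
    curr_ram = curr.get("ram", {})
    no_progress = (
        "transferred" in curr_ram
        and prev_ram.get("transferred") == curr_ram.get("transferred")
    )
    return time_advanced and no_progress


def yank_migration_command(query_yank_return: list) -> Optional[dict]:
    """Build the yank command if query-yank lists a migration instance."""
    for instance in query_yank_return:
        if instance.get("type") == "migration":
            return {
                "execute": "yank",
                "arguments": {"instances": [{"type": "migration"}]},
            }
    # No migration instance registered, so there is nothing to yank.
    return None
```

Two query-migrate samples taken a few seconds apart would be passed to migration_looks_hung(); if it returns True, the dict from yank_migration_command() can be serialized with json.dumps() and sent over the src host's QMP socket.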


Actual result:
After step 6, querying the migration status on the src host via QMP returns a failed status, which I think is the expected result:
{"execute":"query-migrate"}
{"return": {"blocked": false, "status": "failed", "error-desc": "Unable to write to socket: Broken pipe"}}

Next, we need to quit the qemu process on the dst host manually and then remove the firewall rule on the dst host. After that, we can start the migration again, and the VM works well on the dst host after migration.
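For repeated test runs, adding and removing the firewall rule could be wrapped in a small helper. The sketch below is an assumption about how one might automate it, not something that was run as part of this test: the function name is hypothetical, and it only builds the iptables argument list (the same rule as in step 5, with -D instead of -A for removal).

```python
def drop_rule_argv(port: int, remove: bool = False) -> list:
    """Build the iptables argv that adds (-A) or deletes (-D) the DROP rule."""
    action = "-D" if remove else "-A"
    return [
        "iptables", action, "INPUT",
        "-p", "tcp", "--dport", str(port),
        "-j", "DROP",
    ]
```

On the dst host, the list could be passed to subprocess.run(argv, check=True) as root: once with remove=False while the migration is active, and once with remove=True before retrying the migration.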



BTW, could someone help answer the following two questions?
Question 1) Do we need to test 'yank' with a network failure scenario (a failure of the migration network while migration is active)?
I think the network failure scenario is closer to the requirement for 'yank', as it would hit a qemu hang, whereas injecting a firewall rule only hangs the migration (qemu and qmp still work well):
*******************************************************************************
+# A yank instance can be yanked with the @yank qmp command to recover from a hanging QEMU.
+#
+# Currently implemented yank instances:
+#  - nbd block device:
+#    Yanking it will shut down the connection to the nbd server without
+#    attempting to reconnect.
+#  - socket chardev:
+#    Yanking it will shut down the connected socket.
+#  - migration:
+#    Yanking it will shut down all migration connections. Unlike
+#    @migrate_cancel, it will not notify the migration process, so migration
+#    will go into @failed state, instead of @cancelled state. @yank should be
+#    used to recover from hangs.

Question 2) Should qemu on the dst host quit automatically after the 'yank' command is executed, since the migration will fail?

Comment 5 Li Xiaohui 2021-05-12 13:12:48 UTC
Besides migration, yank also applies to nbd block devices and chardevs (see the end of Comment 4, or the details in downstream qemu-kvm-6.0 commit 50186051f425da3ace2425371c5271d0b64e7122).
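To cover the other instance types in a test, the 'instances' argument of yank could be built per type. This is a hedged Python sketch with a hypothetical helper name: the 'chardev' and 'migration' shapes match the query-yank output in Comment 4, while the 'block-node' type and its 'node-name' field are my reading of the upstream QAPI schema and should be checked against the QEMU build under test.

```python
from typing import Optional


def yank_instance(kind: str, name: Optional[str] = None) -> dict:
    """Return one entry for the 'instances' list of the yank command."""
    if kind == "migration":
        return {"type": "migration"}
    if name is None:
        raise ValueError(f"{kind} instance needs a name")
    if kind == "chardev":
        return {"type": "chardev", "id": name}
    if kind == "block-node":
        # Assumption: nbd block devices are addressed by node name.
        return {"type": "block-node", "node-name": name}
    raise ValueError(f"unknown yank instance type: {kind}")


def yank_command(instances: list) -> dict:
    """Wrap a list of instances into a yank QMP command."""
    return {"execute": "yank", "arguments": {"instances": instances}}
```

For example, yank_command([yank_instance("chardev", "serial0")]) would target the serial0 chardev listed in the query-yank output of Comment 4.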

Comment 6 Dr. David Alan Gilbert 2021-05-12 13:58:34 UTC
Thanks, that's a good test.  It would be better to use the "oob" capability (see https://github.com/qemu/qemu/blob/master/docs/interop/qmp-spec.txt#L116 )
that way even if there is a currently executing QMP command that's blocked, the 'yank' command should still execute.
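As a rough illustration of the out-of-band variant, the Python sketch below builds the two QMP messages involved. It is an assumption-laden sketch rather than a verified recipe: the helper names are hypothetical, it assumes the monitor's greeting advertises the "oob" capability, and it assumes 'yank' is permitted as an out-of-band command in the QEMU build under test. Per the QMP spec, the capability is enabled during negotiation and the command is then sent with the "exec-oob" key instead of "execute".

```python
import json


def negotiate_with_oob() -> dict:
    """Capabilities negotiation that enables out-of-band execution."""
    return {"execute": "qmp_capabilities", "arguments": {"enable": ["oob"]}}


def yank_migration_oob(cmd_id: str) -> dict:
    """Yank the migration instance out-of-band.

    The 'id' lets the client match the reply, since out-of-band replies
    can overtake replies to in-band commands that are still executing.
    """
    return {
        "exec-oob": "yank",
        "arguments": {"instances": [{"type": "migration"}]},
        "id": cmd_id,
    }


def to_wire(command: dict) -> bytes:
    """Serialize one QMP command as a single newline-terminated JSON line."""
    return (json.dumps(command) + "\n").encode("utf-8")
```

The negotiation message has to be the first command sent on the monitor connection; after that, to_wire(yank_migration_oob("yank-1")) can be written to the socket even while an in-band command is blocked.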

Comment 7 John Ferlan 2021-05-14 14:34:01 UTC
Moving this to POST since this was included in qemu-6.0 and as noted in comment 4 is already testable.

I set ITM=14 mainly to get the release+ - feel free to use a later one for completion of "new" tests though
I did not set DTM, theoretically it could be 10 as that's about when the code was built, but that'll probably anger the dev missed bot since 10 already passed.

Danilo - I'll let you do the rest of the magic to move to ON_QA

Comment 8 Li Xiaohui 2021-05-17 08:26:05 UTC
(In reply to Dr. David Alan Gilbert from comment #6)
> Thanks, that's a good test.  It would be better to use the "oob" capability
> (see https://github.com/qemu/qemu/blob/master/docs/interop/qmp-spec.txt#L116
> )
> that way even if there is a currently executing QMP command that's blocked,
> the 'yank' command should still execute.

Thanks for the reminder.

I will test the network failure scenario when the machines are available.

