Bug 1974366

Summary: Fail to set migrate incoming for 2nd time after the first time failed
Product: Red Hat Enterprise Linux Advanced Virtualization Reporter: Li Xiaohui <xiaohli>
Component: qemu-kvm Assignee: Leonardo Bras <leobras>
qemu-kvm sub component: Live Migration QA Contact: Li Xiaohui <xiaohli>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium CC: chayang, ddepaula, dgilbert, fjin, jinzhao, leobras, mdean, peterx, qzhang, virt-maint
Version: 8.5 Keywords: Triaged
Target Milestone: rc   
Target Release: 8.5   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: qemu-kvm-6.0.0-29.module+el8.5.0+12386+43574bac Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1974683 (view as bug list) Environment:
Last Closed: 2021-11-16 07:54:24 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1974683    

Description Li Xiaohui 2021-06-21 13:25:20 UTC
Description of problem:
The first migrate-incoming command failed because of a wrong listening address. Retrying with the correct address then also failed, so migrate-incoming could not be set the 2nd time:
{"execute": "migrate-incoming", "arguments": {"uri": "tcp:10.73.130.69:4000"}, "id": "iyPg3lJW"}
{"id": "iyPg3lJW", "error": {"class": "GenericError", "desc": "duplicate yank instance"}}


Version-Release number of selected component (if applicable):
hosts info: kernel-4.18.0-310.el8.x86_64 & qemu-kvm-6.0.0-19.module+el8.5.0+11385+6e7d542e.x86_64


How reproducible:
100%


Steps to Reproduce:
1. Boot a VM on the dst host with "-incoming defer"
2. Set migrate-incoming on the dst host using the src host IP. This fails as expected, because the URI should use $dst_host_ip:
{"execute": "migrate-incoming", "arguments": {"uri": "tcp:$src_host_ip:4000"}, "id": "iyPg3lJW"}
{"timestamp": {"seconds": 1624272064, "microseconds": 65049}, "event": "MIGRATION", "data": {"status": "setup"}}
{"id": "iyPg3lJW", "error": {"class": "GenericError", "desc": "Failed to bind socket: Cannot assign requested address"}}
3. Repeat step 2 with the correct address:
{"execute": "migrate-incoming", "arguments": {"uri": "tcp:$dst_host_ip:4000"}, "id": "iyPg3lJW"}
{"id": "iyPg3lJW", "error": {"class": "GenericError", "desc": "duplicate yank instance"}}
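
The steps above could be scripted roughly as follows. This is a minimal sketch, assuming the destination QEMU exposes a QMP monitor on a UNIX socket (for example "-qmp unix:/tmp/qmp.sock,server=on,wait=off"); the socket path, the helper names, and the command ids are assumptions for illustration and are not part of the original report.

```python
import json
import socket


def build_migrate_incoming(uri, cmd_id):
    """Build the QMP 'migrate-incoming' command used in the report."""
    return {"execute": "migrate-incoming", "arguments": {"uri": uri}, "id": cmd_id}


def error_desc(reply):
    """Return the error description of a QMP reply, or None on success."""
    err = reply.get("error")
    return err.get("desc") if err else None


def is_duplicate_yank(reply):
    """True if the reply carries the 'duplicate yank instance' error."""
    return error_desc(reply) == "duplicate yank instance"


def qmp_command(stream, cmd):
    """Send one command and return its reply, skipping asynchronous events."""
    stream.write(json.dumps(cmd) + "\n")
    stream.flush()
    while True:
        line = stream.readline()
        if not line:
            raise ConnectionError("QMP connection closed")
        msg = json.loads(line)
        if "event" in msg:  # e.g. the MIGRATION "setup" event
            continue
        return msg


def reproduce(sock_path, wrong_uri, right_uri):
    """Steps 2 and 3: a failing attempt, then a retry with the right URI."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(sock_path)
        stream = sock.makefile("rw", encoding="utf-8")
        json.loads(stream.readline())  # QMP greeting banner
        qmp_command(stream, {"execute": "qmp_capabilities"})
        first = qmp_command(stream, build_migrate_incoming(wrong_uri, "first"))
        second = qmp_command(stream, build_migrate_incoming(right_uri, "second"))
        return first, second
```

On an affected build, is_duplicate_yank() would be expected to return True for the second reply; on a fixed build the second reply should be a success.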


Actual results:
Setting migrate-incoming fails the 2nd time after the first attempt failed


Expected results:
Setting migrate-incoming succeeds the 2nd time after the first attempt failed


Additional info:
1. Didn't hit this issue on a host with qemu-kvm-4.2.0-52.module+el8.5.0+11386+ef5875dd.x86_64
2. This issue is probably related to yank (Bug 1956897)
3. Didn't reproduce the bz with libvirt-daemon-7.4.0-1.module+el8.5.0+11218+83343022.x86_64, since libvirt hasn't implemented yank (Bug 1955195)

Comment 1 Fangge Jin 2021-06-21 13:54:53 UTC
I guess we won't hit this issue in libvirt, because when the first "migrate-incoming" fails, the qemu process on the dest host is stopped by libvirt.

Comment 2 Li Xiaohui 2021-06-21 13:59:50 UTC
(In reply to Fangge Jin from comment #1)
> I guess we won't meet such issue in libvirt, because when the first
> "migrate-incoming" fails, qemu process on dest host will be stopped by
> libvirt

Got it, thanks for the explanation.

Comment 3 Dr. David Alan Gilbert 2021-06-21 14:11:19 UTC
Yes, I agree with Fangge's comment, but it is a valid bug that we should fix.

Comment 4 Leonardo Bras 2021-06-22 02:20:47 UTC
This bug reproduces upstream.

Comment 5 Li Xiaohui 2021-06-22 02:34:37 UTC
Hi Leonardo, shall we clone this bz for RHEL 9, since the problem also happens there?

Comment 6 Leonardo Bras 2021-06-22 03:31:19 UTC
(In reply to Li Xiaohui from comment #5)
> Hi leonardo, shall we clone this bz on rhel9 since the problem also happens.

Sure!


I have sent a v1 patch upstream fixing this issue, but it is taking a while for the archives / patchwork to update.
I will soon update with a patchwork link.

Comment 8 Li Xiaohui 2021-06-22 10:12:21 UTC
(In reply to Leonardo Bras from comment #6)
> Sure(In reply to Li Xiaohui from comment #5)
> > Hi leonardo, shall we clone this bz on rhel9 since the problem also happens.
> 
> Sure!

Thank you. I have filed a bz on rhel9:
Bug 1974683 - Fail to set migrate incoming for 2nd time after the first time failed

> 
> 
> I have sent a v1 patch upstream fixing this issue, but it is taking a while
> for the archives / patchwork to update.
> I will soon update with a patchwork link.

Comment 9 Leonardo Bras 2021-07-20 04:37:27 UTC
Updates:
- I sent a v2 fixing a few issues from v1 (for reference only)
http://patchwork.ozlabs.org/project/qemu-devel/patch/20210629050522.147057-1-leobras@redhat.com/

But as seen in comment #2 from Peter Xu on that patch, a broader solution could fix more potential bugs.
The patch series proposed by Peter can be seen here:
http://patchwork.ozlabs.org/project/qemu-devel/list/?series=251186&state=%2A&archive=both

This solution was accepted, which made my v2 unnecessary.
The commit IDs for this patch series are:

cc48c587d25ff5dd7dddb4e5072de9ca8464c832 migration: Move yank outside qemu_start_incoming_migration()
b7f9afd48e7bc5c341e55348f2c2eed08314be7d migration: Allow reset of postcopy_recover_triggered when failed
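
As the first commit title suggests, the fix concerns where the yank instance is registered relative to starting the incoming migration. The following is a simplified Python model of that kind of failure, not QEMU's actual C code: every class and function name here is hypothetical, and it only illustrates how a registration that is not undone on an error path would lead to "duplicate yank instance" on retry.

```python
class YankRegistry:
    """Toy stand-in for a registry that rejects duplicate instances."""

    def __init__(self):
        self._instances = set()

    def register(self, name):
        if name in self._instances:
            raise RuntimeError("duplicate yank instance")
        self._instances.add(name)

    def unregister(self, name):
        self._instances.discard(name)


def start_listening(uri, local_ip):
    """Hypothetical bind step: fails unless the URI names the local address."""
    if uri.split(":")[1] != local_ip:
        raise OSError("Failed to bind socket: Cannot assign requested address")


def incoming_leaky(registry, uri, local_ip):
    """Registers, then leaves the registration behind if binding fails."""
    registry.register("migration")
    start_listening(uri, local_ip)


def incoming_with_cleanup(registry, uri, local_ip):
    """Same flow, but undoes the registration on the error path."""
    registry.register("migration")
    try:
        start_listening(uri, local_ip)
    except OSError:
        registry.unregister("migration")
        raise


def attempt(func, registry, uri, local_ip):
    """Run one attempt and return the error text, or None on success."""
    try:
        func(registry, uri, local_ip)
        return None
    except (OSError, RuntimeError) as exc:
        return str(exc)
```

In this model, two calls to attempt() with incoming_leaky (wrong address, then right address) mirror the two errors from the report, while the same sequence with incoming_with_cleanup lets the retry succeed.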

Comment 12 Li Xiaohui 2021-08-17 06:46:17 UTC
Hi Leonardo,
Could we get this bz to ON_QA before ITM 26 (Aug 30)? I need several days to verify it.

If we can't, we'd better move the ITM from 8.5.0 to 8.6.0, thanks.

Comment 18 John Ferlan 2021-08-24 15:09:08 UTC
NB: changed ITM=26 because the RHEL rule to apply release+ requires it...

Comment 23 Yanan Fu 2021-08-26 01:46:59 UTC
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 24 Li Xiaohui 2021-08-26 07:14:21 UTC
Tested this bz on hosts (kernel-4.18.0-330.el8.x86_64 & qemu-kvm-6.0.0-29.module+el8.5.0+12386+43574bac.x86_64); the test passes, so marking this bz as verified.

Comment 26 errata-xmlrpc 2021-11-16 07:54:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4684