Bug 2167657

Summary: Migration hang when trying to cancel multifd migration sometimes
Product: Red Hat Enterprise Linux 9
Reporter: yafu <yafu>
Component: qemu-kvm
Assignee: Juan Quintela <quintela>
qemu-kvm sub component: Live Migration
QA Contact: Li Xiaohui <xiaohli>
Status: CLOSED DUPLICATE
Docs Contact:
Severity: medium
Priority: medium
CC: chayang, jinzhao, leobras, nilal, peterx, quintela, virt-maint, xiaohli
Version: 9.2
Keywords: Triaged
Target Milestone: rc
Flags: xiaohli: needinfo-
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-04-17 08:29:04 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 2160929
Bug Blocks:

Description yafu 2023-02-07 08:06:45 UTC
Description of problem:
Migration sometimes hangs when trying to cancel a multifd migration. It is easier to reproduce at the final stage of migration.

Version-Release number of selected component (if applicable):
libvirt-daemon-9.0.0-3.el9.x86_64
qemu-kvm-7.2.0-7.el9.x86_64

How reproducible:
10%

Steps to Reproduce:
1. Start a multifd migration:
# virsh migrate test2 qemu+ssh://*.*.com/system --live --verbose --persistent  --parallel --parallel-connections 255

2. Cancel the migration with Ctrl+C when the migration is almost complete (the hang is not limited to the final stage, but happens most easily there). The migration sometimes hangs:
# virsh migrate test2 qemu+ssh://*.*.com/system --live --verbose --persistent --parallel --parallel-connections 255
Migration: [ 99 %]^C
Migration: [ 99 %]
Migration: [ 99 %]

3. Check domjobinfo on the source host:
# virsh domjobinfo test2 
Job type:         Unbounded   
Operation:        Outgoing migration
Time elapsed:     202769       ms
Data processed:   528.429 MiB
Data remaining:   256.953 MiB
Data total:       2.013 GiB
Memory processed: 528.429 MiB
Memory remaining: 256.953 MiB
Memory total:     2.013 GiB
Memory bandwidth: 50.588 MiB/s
Dirty rate:       0            pages/s
Page size:        4096         bytes
Iteration:        1           
Postcopy requests: 0           
Constant pages:   327905      
Normal pages:     133919      
Normal data:      523.121 MiB
Expected downtime: 300          ms
Setup time:       158          ms

4. Check the connections between the source and target hosts:
# netstat -tuxnap|grep qemu-kvm|grep -E "4915|unix" 
unix  2      [ ACC ]     STREAM     LISTENING     115323316 3154235/qemu-kvm     /var/lib/libvirt/qemu/domain-20-test2/monitor.sock
unix  2      [ ACC ]     STREAM     LISTENING     115323315 3154235/qemu-kvm     /var/lib/libvirt/qemu/channel/target/domain-20-test2/org.qemu.guest_agent.0
unix  3      [ ]         STREAM     CONNECTED     115284933 3154235/qemu-kvm     
unix  3      [ ]         STREAM     CONNECTED     115045317 3154235/qemu-kvm     /var/lib/libvirt/qemu/domain-20-test2/monitor.sock
unix  3      [ ]         STREAM     CONNECTED     115284869 3154235/qemu-kvm     
unix  3      [ ]         STREAM     CONNECTED     115284942 3154235/qemu-kvm     /var/lib/libvirt/qemu/channel/target/domain-20-test2/org.qemu.guest_agent.0

5. Check the guest on the target host:
# virsh list
(no output)
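
The domjobinfo check in step 3 can be automated to tell a hung cancel from a slow one. The following is a minimal sketch, not something used in this report: the function names (parse_domjobinfo, to_bytes, looks_stalled, sample) are hypothetical, the unit table is an assumption about how virsh prints sizes, and "Data processed did not grow between two samples while the job is still an outgoing migration" is only an assumed heuristic for a hang.

```python
# Sketch only: the helper names and the stall heuristic are assumptions,
# not part of libvirt or QEMU.
import subprocess

# Assumed set of size units that `virsh domjobinfo` may print.
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def parse_domjobinfo(text):
    """Turn 'Field:   value' lines into a dict of field -> value string."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def to_bytes(value):
    """Convert a string such as '528.429 MiB' to a number of bytes."""
    number, unit = value.split()
    return float(number) * UNITS[unit]


def looks_stalled(earlier, later):
    """Return True if the job is still an outgoing migration and the
    'Data processed' counter did not grow between the two samples."""
    a = parse_domjobinfo(earlier)
    b = parse_domjobinfo(later)
    if b.get("Operation") != "Outgoing migration":
        return False
    if "Data processed" not in a or "Data processed" not in b:
        return False
    return to_bytes(b["Data processed"]) <= to_bytes(a["Data processed"])


def sample(domain):
    """Run `virsh domjobinfo <domain>` on the source host and return its output."""
    result = subprocess.run(
        ["virsh", "domjobinfo", domain],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

To use it, call sample("test2") twice a few seconds apart after pressing Ctrl+C and pass both outputs to looks_stalled(). A True result only suggests a hang; it does not prove one.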

Actual results:
Migration sometimes hangs when trying to cancel a multifd migration. The hang can occur in different phases (in the middle of the migration or at the final stage).

Expected results:
Migration is cancelled successfully.

Additional info:

Comment 2 Li Xiaohui 2023-02-13 09:37:21 UTC
Hi Yan, do we have a libvirt test case that corresponds to this bug? If yes, please add the Polarion link for that case, thanks.

Comment 3 Li Xiaohui 2023-02-17 12:29:55 UTC
Hi Juan, 

I also reproduced this bug through libvirt on RHEL 9.2.0 (qemu-kvm-7.2.0-8.el9.x86_64 && libvirt-9.0.0-4.el9.x86_64).

I found that this bug happens not only in the final stage but also in the iterative stage. See my test result:
[root@dell-per7525-25 home]# virsh migrate rhel920 qemu+ssh://$dst_host_ip/system --live --verbose --persistent --parallel  --parallel-connections 4 --zerocopy 
Migration: [ 41 %]^C
Migration: [ 41 %]
Migration: [ 41 %]^C
Migration: [ 41 %]

The above test scenario is the same as in Bug 2160929, but the above test ran on x86.

I think this bug should be marked as a duplicate of Bug 2160929, and Bug 2160929 should be adjusted to cover all hardware (aarch64, s390x, x86). Is that right?

Comment 4 Li Xiaohui 2023-02-23 07:22:22 UTC
(In reply to Li Xiaohui from comment #3)
> Hi Juan, 
> 
> I also reproduce this bug through libvirt on RHEL 9.2.0
> (qemu-kvm-7.2.0-8.el9.x86_64 && libvirt-9.0.0-4.el9.x86_64)
> 
> And I found this bug not only happens on the final stage, but also on the
> iterative stage. You can see my test result:
> [root@dell-per7525-25 home]# virsh migrate rhel920
> qemu+ssh://$dst_host_ip/system --live --verbose --persistent --parallel 
> --parallel-connections 4 --zerocopy 
> Migration: [ 41 %]^C
> Migration: [ 41 %]
> Migration: [ 41 %]^C
> Migration: [ 41 %]
> 
> Above test scenario is same with Bug 2160929, but above test runs on x86. 
> 
> I think this bug should be duplicated with Bug 2160929. And Bug 2160929
> should be adjusted to all hardware (aarch64, s390x, x86). Is right?

OK, I can also reproduce Bug 2160929 on x86 at a rate of 1/100.

So perhaps Bug 2160929 and this bug are the same.

Comment 5 yafu 2023-02-28 03:47:31 UTC
(In reply to Li Xiaohui from comment #2)
> Hi Yan, do we have libvirt case that corresponds to this bug? If yes, please
> help add the polarion link of this case, thanks

Hi, Xiaohui

Sorry, there is no test case in Polarion for this bug.

Comment 8 Li Xiaohui 2023-04-17 08:29:04 UTC
Hi all,
I will close this bug as a duplicate of bug 2160929 according to https://bugzilla.redhat.com/show_bug.cgi?id=2160929#c13


Let me know if you have a different opinion.

Comment 9 Li Xiaohui 2023-04-17 08:30:35 UTC

*** This bug has been marked as a duplicate of bug 2160929 ***