Bug 1256213
Summary: | "Virsh migrate" hangs after virsh keepalive times out. | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux Advanced Virtualization | Reporter: | Fangge Jin <fjin> |
Component: | libvirt | Assignee: | Virtualization Maintenance <virt-maint> |
Status: | CLOSED WONTFIX | QA Contact: | Fangge Jin <fjin> |
Severity: | medium | Priority: | low |
Version: | 8.0 | CC: | dyuan, dzheng, fjin, jdenemar, jsuchane, knoel, mvanderw, mzhan, xuzhang, zpeng |
Target Milestone: | rc | Keywords: | Triaged |
Target Release: | 8.1 | Hardware: | x86_64 |
OS: | Linux | Type: | Bug |
Last Closed: | 2021-05-15 07:30:42 UTC | Bug Blocks: | 1288337 |

Description
Fangge Jin
2015-08-24 05:01:31 UTC
Created attachment 1066202: libvirtd debug log
Created attachment 1066203: gdb output
What's the behaviour when you press Ctrl-C after that message about keepalive timeout is printed out?

(In reply to Martin Kletzander from comment #4)
> What's the behaviour when you press Ctrl-C after that message about
> keepalive timeout is printed out?

After pressing Ctrl-C, it prints out "migration job: canceled by client" and exits:

    [root@fjin-4-141 test]# virsh -k2 -K20 migrate rhel6.6-GUI --live --verbose qemu+ssh://10.66.4.208/system
    root.4.208's password:
    Migration: [ 71 %]2015-08-25 01:55:44.625+0000: 22951: info : libvirt version: 1.2.17, package: 6.el7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2015-08-21-20:23:32, x86-035.build.eng.bos.redhat.com)
    2015-08-25 01:55:44.625+0000: 22951: warning : virKeepAliveTimerInternal:143 : No response from client 0x7fe50206bf70 after 20 keepalive messages in 42 seconds
    Migration: [ 71 %]^Cerror: operation aborted: migration job: canceled by client
    [root@fjin-4-141 test]#

I tried build libvirt-1.2.17-3.el7+bz1256213.x86_64; the behaviour is the same as before.

I'm still trying to figure out how it is possible for you to get that kind of output. Looking at everything, the most interesting part I notice is that it really looks like you're getting disconnections from the target host, but virsh does not set up keepalive for the destination when migrating. And I can't reproduce it following the steps that you have. What's the output of 'virsh uri'?

I can still get the same results as before by following the steps. The output of 'virsh uri' on both source and target is:

    # virsh uri
    qemu:///system

And why do you say "virsh does not set up keepalive for the destination when migrating"? I think the message virsh printed after the disconnection indicates that it had keepalive for the destination:

    warning : virKeepAliveTimerInternal:143 : No response from client 0x7fe50206bf70 after 20 keepalive messages in 42 seconds

I meant that no matter which way I look at the source, virsh only sets up client keepalive on the connection to the source, not the destination. And that's not even considering p2p migrations and the like.

This may already be fixed by patches I pushed upstream some time ago. Could you please retest this with the current version of libvirt?

Tried on build libvirt-1.3.5-1.el7.x86_64; it seems the problem still exists.

Steps:

1. Do migration:

    # time virsh -k2 -K20 migrate rhel6 qemu+ssh://hp-dl385g7-06.lab.eng.pek2.redhat.com/system --verbose --live
    Migration: [ 1 %]

2. Before migration completes, on the target host, do:

    # iptables -A OUTPUT -s <source ip> -j DROP
    # iptables -A INPUT -s <source ip> -j DROP

3. Wait more than 1 minute; virsh doesn't exit and prints no error message:

    # time virsh -k2 -K20 migrate rhel6 qemu+ssh://hp-dl385g7-06.lab.eng.pek2.redhat.com/system --verbose --live
    Migration: [ 1 %]

4. On the target host:

    # iptables -F

5. Wait a while; virsh exits:

    # time virsh -k2 -K20 migrate rhel6 qemu+ssh://hp-dl385g7-06.lab.eng.pek2.redhat.com/system --verbose --live
    Migration: [ 1 %]error: operation failed: migration job: unexpectedly failed

    real 2m1.763s
    user 0m0.043s
    sys 0m0.062s

It works fine with peer-to-peer migration controlled by the source libvirtd. This issue only affects non-p2p migration, where the client controls the migration by calling several APIs on each side of the migration. The source libvirtd cannot see when the connection between the client and the destination host breaks, and thus it cannot automatically abort the migration. The client itself will need to do this.

*** Bug 1367620 has been marked as a duplicate of this bug. ***

This bug is going to be addressed in the next major release.

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.
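
Since the thread concludes that a client driving a non-p2p migration has to abort it itself when the destination connection breaks, here is a minimal sketch of what that could look like with the libvirt Python bindings. It is an illustration under assumptions, not something taken from this bug or tested against a real daemon: the helper names `keepalive_timeout` and `abort_migration_if_destination_lost` are hypothetical, `dst_conn` is assumed to behave like a `libvirt.virConnect` opened on the destination, `src_dom` like a `libvirt.virDomain` on the source, and the caller is assumed to have a libvirt event loop running already (keepalive and close callbacks need one). The timeout helper reflects the documented keepalive behaviour of roughly interval * (count + 1) seconds, which matches the "20 keepalive messages in 42 seconds" warning seen above with `-k2 -K20`.

```python
import logging

log = logging.getLogger("migration-watchdog")


def keepalive_timeout(interval, count):
    """Approximate seconds of silence before a connection is declared dead."""
    return interval * (count + 1)


def abort_migration_if_destination_lost(dst_conn, src_dom, interval=2, count=20):
    """Abort the job on src_dom once dst_conn is reported closed.

    Hypothetical helper: dst_conn is expected to offer setKeepAlive() and
    registerCloseCallback() like libvirt.virConnect, and src_dom is expected
    to offer abortJob() like libvirt.virDomain.
    """

    def on_close(conn, reason, opaque):
        # Runs in the event loop when the destination connection is closed,
        # for example after a keepalive timeout.
        try:
            src_dom.abortJob()
        except Exception:
            # Assumption: a failure here (e.g. no job is active any more)
            # should not propagate into the event loop.
            log.exception("could not abort migration job")

    # Enable keepalive on the destination connection as well, then ask to be
    # notified when that connection goes away.
    dst_conn.setKeepAlive(interval, count)
    dst_conn.registerCloseCallback(on_close, None)
    return on_close
```

Aborting the job from the callback mirrors what pressing Ctrl-C did in the transcript above ("migration job: canceled by client"). Where peer-to-peer migration is an option, the thread indicates it avoids the problem altogether because the source libvirtd controls the migration.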