Bug 983350
Summary: | The running Guest was paused while cancel the migration on the third machine | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | zhenfeng wang <zhwang> |
Component: | libvirt | Assignee: | Peter Krempa <pkrempa> |
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | medium | Docs Contact: | |
Priority: | medium | | |
Version: | 7.0 | CC: | ajia, dyuan, gsun, mzhan, pkrempa, rbalakri, vivianzhang, ydu, zpeng |
Target Milestone: | rc | | |
Target Release: | --- | | |
Hardware: | x86_64 | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | libvirt-1.2.7-1.el7 | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | 983348 | Environment: | |
Last Closed: | 2015-03-05 07:20:50 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | 983348 | | |
Bug Blocks: | | | |
Attachments: | | | |

Description
zhenfeng wang
2013-07-11 03:35:31 UTC
The guest is not always paused on RHEL 7. The problem shows up when the migration is cancelled after it is more than 90% complete, for example:

# virsh -c qemu+ssh://xx.xx.xx.xx/system migrate --live rhel73 qemu+ssh://yy.yy.yy.yy/system --verbose
root.xx.xx's password:
root.yy.yy's password:
Migration: [ 96 %]^Cerror: internal error received hangup / error event on socket
error: One or more references were leaked after disconnect from the hypervisor
root.xx.xx's password:
error: Reconnected to the hypervisor

Fixed upstream with:

commit b46c4787dde79b015dad67dedda4ccf6ff1a3082
Author: Peter Krempa <pkrempa>
Date: Thu Aug 29 15:18:20 2013 +0200

    virsh-domain: Avoid killing ssh transport tunnels when cancelling job

    The vshWatchJob function registers a SIGINT handler that is used to abort the active job and does not terminate virsh. Unfortunately, this breaks when using the ssh transport as SIGINT is sent to the foreground process group including the ssh transport processes which terminate. This breaks the connection and migration is left in an insane state.

    With this patch the terminal is modified to ignore key binding that sends SIGINT and does the handling manually.

    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=983348

commit ebef68936396f7eab077e883ac48c4ce0508afa2
Author: Peter Krempa <pkrempa>
Date: Thu Aug 29 10:36:00 2013 +0200

    virsh: Remember terminal state when starting and add helpers

    This patch adds instrumentation to allow modification of config of the terminal in virsh and successful reset of the state afterwards.

    The added helpers allow to disable receiving of SIGINT when pressing the key sequence (Ctrl+C usually). This normally sends SIGINT to the foreground process group which kills ssh processes used for transport of the data.

commit 8c725cc10daa666d47ab5a4f2ccc0b196ab608d8
Author: Peter Krempa <pkrempa>
Date: Mon Aug 26 12:31:51 2013 +0200

    virsh-domain: rename print_job_progress to vshPrintJobProgress

Verified this issue with libvirt-1.2.7-1.el7.x86_64:

1. Set setenforce 1 && virt_use_nfs 1 (on both source and target)
2. Prepare a guest whose image file is on the NFS server, and mount the NFS server on both source and target
3. Start the guest on the source machine
4. Start the migration on the third machine, and cancel it at about 96%

[root@rhel7-c ~]# virsh -c qemu+ssh://10.66.6.xx/system migrate rhel7 --live qemu+ssh://10.66.4.xx/system --verbose
root.6.xx's password:
root.4.xx's password:
Migration: [ 1 %] Migration: [ 61 %] Migration: [ 73 %] Migration: [ 73 %]^[[A Migration: [ 74 %] Migration: [ 76 %] Migration: [ 81 %] Migration: [ 95 %] Migration: [ 96 %]error: operation aborted: migration job: canceled by client

5. The guest is still in Running status on the source side, and is not displayed on the target side.

Hello, Peter

When I did regression testing for this bug on RHEL 7.1, I found that after cancelling the migration the reported error is still not accurate, although the guest stays in running status. Could you please help me check whether this is a known issue for this bug?

Version-Release number of selected component (if applicable):
libvirt-1.2.8-6.el7.x86_64
qemu-kvm-rhev-2.1.2-6.el7.x86_64
kernel-3.10.0-195.el7.x86_64

How reproducible:
100%

Steps to Reproduce:

1. set setenforce 1 && virt_use_nfs 1 (on both source and target)

2. prepare a guest whose image file is on the NFS server, and mount the NFS server on both source and target
start the guest on the source machine
# virsh list
 Id    Name                           State
----------------------------------------------------
 80    vm2                            running

3. start migration on the third machine, and press ctrl+c to cancel the migration
# virsh -c qemu+ssh://10.66.7.206/system migrate vm2 --live qemu+ssh://10.66.6.205/system --verbose
root.7.206's password:
root.6.205's password:
Migration: [ 3 %]^Cerror: internal error: received hangup / error event on socket
root.7.206's password:
error: Reconnected to the hypervisor

4. check the guest status again
# virsh list
 Id    Name                           State
----------------------------------------------------
 80    vm2                            running

You can see that after pressing ctrl+c during the migration, the reported error still does not seem accurate, and virsh also asks me to input the source host password again. I think it would be better to show the result as "error: operation aborted: migration job: canceled by client".

Hope for your reply, thanks
vivian zhang

(In reply to vivian zhang from comment #8)
> How reproducible:
> 100%
>
> Steps to Reproduce:
>
> 1. set setenforce 1 && virt_use_nfs 1 (on both source and target)
>
> 2. prepare a guest whose image file is on the NFS server, and mount the
> NFS server on both source and target
> start the guest on the source machine
> # virsh list
> Id Name State
> ----------------------------------------------------
> 80 vm2 running
>
> 3. start migration on the third machine, and press ctrl+c to cancel the migration
> # virsh -c qemu+ssh://10.66.7.206/system migrate vm2 --live

Did you also upgrade libvirt on the machine running this command? As the issue was caused on the client side, it's necessary to also upgrade the host running the virsh command.

To make sure, please run "virsh version"

(In reply to Peter Krempa from comment #9)
> (In reply to vivian zhang from comment #8)
>
> > How reproducible:
> > 100%
> >
> > Steps to Reproduce:
> >
> > 1. set setenforce 1 && virt_use_nfs 1 (on both source and target)
> >
> > 2. prepare a guest whose image file is on the NFS server, and mount the
> > NFS server on both source and target
> > start the guest on the source machine
> > # virsh list
> > Id Name State
> > ----------------------------------------------------
> > 80 vm2 running
> >
> > 3. start migration on the third machine, and press ctrl+c to cancel the migration
> > # virsh -c qemu+ssh://10.66.7.206/system migrate vm2 --live
>
> Did you also upgrade libvirt on the machine running this command? As the
> issue was caused on the client side, it's necessary to also upgrade the
> host running the virsh command.
>
> To make sure, please run "virsh version"

hi, Peter
the libvirt version has been updated, as shown below

# virsh version
Compiled against library: libvirt 1.2.8
Using library: libvirt 1.2.8
Using API: QEMU 1.2.8
Running hypervisor: QEMU 2.1.2

(In reply to vivian zhang from comment #10)
> (In reply to Peter Krempa from comment #9)
> > (In reply to vivian zhang from comment #8)
...
>
> hi, Peter
> the libvirt version has been updated, as shown below
>
> # virsh version
> Compiled against library: libvirt 1.2.8
> Using library: libvirt 1.2.8
> Using API: QEMU 1.2.8
> Running hypervisor: QEMU 2.1.2

In that case this should not happen. Can you please provide debug logs from both the client and the daemon that would show the issue happening.

(In reply to Peter Krempa from comment #11)
> (In reply to vivian zhang from comment #10)
> > (In reply to Peter Krempa from comment #9)
> > > (In reply to vivian zhang from comment #8)
> ...
> >
> > hi, Peter
> > the libvirt version has been updated, as shown below
> >
> > # virsh version
> > Compiled against library: libvirt 1.2.8
> > Using library: libvirt 1.2.8
> > Using API: QEMU 1.2.8
> > Running hypervisor: QEMU 2.1.2
>
> In that case this should not happen. Can you please provide debug logs from
> both the client and the daemon that would show the issue happening.

hi, Peter
I captured 3 logs:

1. the client log, named client1113.log, captured by running the command with debugging enabled on the third machine
# LIBVIRT_DEBUG=1 virsh -c qemu+ssh://10.66.7.206/system migrate rhel6new --live qemu+ssh://10.66.6.205/system --verbose

2. the source and target libvirtd.log, captured with log_level=1

Please check them first; if anything is unclear, please contact me.

thanks
vivianzhang

Created attachment 956908 [details]
client debug log
Created attachment 956909 [details]
libvirtd source tar log
Created attachment 956910 [details]
libvirtd target tar log
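
For readers following the client-side discussion above, here is a minimal sketch of the idea in the upstream commit messages: remember the terminal state, turn off the key binding that sends SIGINT, and treat the Ctrl+C byte as an ordinary input to be handled manually. It is written in Python for brevity and is not virsh's code (virsh is written in C). The names attrs_without_intr and intr_key_disabled are hypothetical, and clearing ICANON so the byte can be read without waiting for Enter is an assumption of this sketch, not something stated in the commits.

```python
import contextlib
import copy
import os
import termios


def attrs_without_intr(attrs, vdisable=0):
    """Return a copy of a tcgetattr() list with the INTR key binding disabled.

    vdisable is the platform's "no character" value (0 on Linux).
    """
    new = copy.deepcopy(attrs)
    # Index 3 is lflag: leave canonical mode so bytes arrive without Enter
    # (an assumption of this sketch).
    new[3] &= ~termios.ICANON
    # Index 6 is the control-character list: no key sends SIGINT any more.
    new[6][termios.VINTR] = bytes([vdisable])
    return new


@contextlib.contextmanager
def intr_key_disabled(fd):
    """Disable the INTR key on terminal fd; restore the saved state on exit.

    Yields the original INTR character (usually b"\\x03", Ctrl+C) so the
    caller can look for it in the bytes it reads and cancel the job itself.
    """
    saved = termios.tcgetattr(fd)
    vdisable = os.fpathconf(fd, "PC_VDISABLE")
    termios.tcsetattr(fd, termios.TCSADRAIN, attrs_without_intr(saved, vdisable))
    try:
        yield saved[6][termios.VINTR]
    finally:
        # Always put the terminal back the way it was found.
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)
```

With the binding disabled, pressing Ctrl+C no longer signals the foreground process group, so ssh transport processes started by the client are not killed. The client reads the byte from the terminal and aborts the job through the still-open connection.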
I can reproduce this bug on build
libvirt-1.1.1-29.el7.x86_64
qemu-kvm-rhev-1.5.3-60.el7_0.9.x86_64

I could not reproduce the issue described in comment 8 anymore, so I verified the fix on the latest build
libvirt-1.2.8-11.el7.x86_64
qemu-kvm-rhev-2.1.2-17.el7.x86_64

verify steps:

1. prepare a migration environment with the image on an NFS server mounted on both source and target host

2. setenforce 1 and virt_use_nfs on

3. prepare the third machine, start the migration, and cancel it at nearly 90%
# virsh -c qemu+ssh://xx.xx.xx.xx/system migrate rhel7 --live qemu+ssh://xx.xx.xx.xx/system --verbose
root.xx.xx's password:
root.xx.xx's password:
Migration: [ 45 %] Migration: [ 47 %] Migration: [ 55 %] Migration: [ 62 %] Migration: [ 71 %] Migration: [ 82 %] Migration: [ 88 %] Migration: [ 90 %] Migration: [ 92 %] Migration: [ 94 %] Migration: [ 95 %]^Cerror: operation aborted: migration job: canceled by client

4. check the guest on the source host: it is still running and works well
# virsh list
 Id    Name                           State
----------------------------------------------------
 10    rhel7                          running

5. configure the guest with a spice connection, open it using virt-viewer, and repeat steps 3-4; the result is the same
# virsh -c qemu+ssh://xx.xx.xx.xx/system migrate rhel7 --live qemu+ssh://xx.xx.xx.xx/system --verbose
root.xx.xx's password:
root.xx.xx's password:
Migration: [ 95 %]^Cerror: operation aborted: migration job: canceled by client

move to verified

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0323.html
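
The verification steps in this report check the guest state by reading `virsh list` output by eye. A small sketch of how that check could be scripted follows. The function names parse_virsh_list and guest_survived_cancel are hypothetical, the parser assumes domain names contain no whitespace, and the pass condition (running on the source, not listed on the target) is taken from the expected result stated in the verification comments.

```python
def parse_virsh_list(output):
    """Map domain name -> state from the text printed by `virsh list`."""
    states = {}
    for line in output.splitlines():
        fields = line.split(None, 2)
        # Data rows have three fields and start with a numeric Id or "-";
        # the header and the dashed separator line are skipped.
        if len(fields) == 3 and (fields[0].isdigit() or fields[0] == "-"):
            states[fields[1]] = fields[2].strip()
    return states


def guest_survived_cancel(source_list, target_list, name):
    """True if the guest is running on the source and absent on the target."""
    return (parse_virsh_list(source_list).get(name) == "running"
            and name not in parse_virsh_list(target_list))
```

The two text arguments would be the captured output of `virsh list` run against the source and the target host after the migration has been cancelled.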