Bug 983350 - The running Guest was paused while cancel the migration on the third machine
Summary: The running Guest was paused while cancel the migration on the third machine
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Peter Krempa
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On: 983348
Blocks:
Reported: 2013-07-11 03:35 UTC by zhenfeng wang
Modified: 2016-04-26 14:21 UTC (History)

Fixed In Version: libvirt-1.2.7-1.el7
Doc Type: Bug Fix
Doc Text:
Clone Of: 983348
Environment:
Last Closed: 2015-03-05 07:20:50 UTC
Target Upstream Version:
Embargoed:


Attachments
client debug log (807.97 KB, text/plain)
2014-11-13 02:11 UTC, vivian zhang
libvirtd source tar log (1.30 MB, application/x-gzip)
2014-11-13 02:18 UTC, vivian zhang
libvirtd target tar log (86.62 KB, application/x-gzip)
2014-11-13 02:18 UTC, vivian zhang


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2015:0323 0 normal SHIPPED_LIVE Low: libvirt security, bug fix, and enhancement update 2015-03-05 12:10:54 UTC

Description zhenfeng wang 2013-07-11 03:35:31 UTC
+++ This bug was initially created as a clone of Bug #983348 +++


Description of problem:
A running guest is paused when its migration is cancelled from a third machine that connects to the source host remotely.

Version-Release number of selected component (if applicable):
kernel-2.6.32-358.2.1.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.355.el6_4.2.x86_64
libvirt-0.10.2-19.el6.x86_64
How reproducible:
100%

Steps to Reproduce:
1. Set SELinux to enforcing (setenforce 1) and enable the virt_use_nfs boolean (on both source and target).

2. Prepare a guest whose image file is on an NFS server, mount the NFS share on both source and target, and start the guest on the source machine:
# virsh start rhelguest1
# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     rhelguest1                         running
3. Start the migration from the third machine:
# virsh -c qemu+ssh://xx.xx.xx.xx/system migrate rhelguest1 --live qemu+ssh://yy.yy.yy.yy/system --verbose
The authenticity of host 'xx.xx.xx.xx (xx.xx.xx.xx)' can't be established.
RSA key fingerprint is ce:52:b1:64:6c:0c:23:25:1d:9c:22:17:7b:66:0b:68.
Are you sure you want to continue connecting (yes/no)? yes
root.83.194's password:
root.83.191's password:
Migration: [ 31 %]^Cerror: internal error received hangup / error event on socket

4. Check the guest's status on the source host; the guest is paused:
# virsh list
 Id    Name                           State
----------------------------------------------------
 7     rhelguest1                     paused

5. The guest is not paused when the migration is cancelled directly on the source host.

Actual results:
A running guest is paused when its migration is cancelled from a third machine that connects to the source host remotely.
Expected results:
The guest should remain in the running state.

Comment 1 zhenfeng wang 2013-07-11 03:39:56 UTC
On RHEL 7 the guest is not always paused; the pause reliably occurs when the migration is cancelled after it is more than 90% complete, for example:

# virsh -c qemu+ssh://xx.xx.xx.xx/system  migrate --live rhel73 qemu+ssh://yy.yy.yy.yy/system --verbose --unsafe
root.xx.xx's password: 
root.yy.yy's password: 
Migration: [ 96 %]^Cerror: internal error received hangup / error event on socket
error: One or more references were leaked after disconnect from the hypervisor
root.xx.xx's password: 
error: Reconnected to the hypervisor

Comment 4 Peter Krempa 2013-09-03 08:04:58 UTC
Fixed upstream with:

commit b46c4787dde79b015dad67dedda4ccf6ff1a3082
Author: Peter Krempa <pkrempa>
Date:   Thu Aug 29 15:18:20 2013 +0200

    virsh-domain: Avoid killing ssh transport tunnels when cancelling job
    
    The vshWatchJob function registers a SIGINT handler that is used to
    abort the active job and does not terminate virsh. Unfortunately, this
    breaks when using the ssh transport as SIGINT is sent to the foreground
    process group including the ssh transport processes which terminate.
    This breaks the connection and migration is left in an insane state.
    
    With this patch the terminal is modified to ignore key binding that
    sends SIGINT and does the handling manually.
    
    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=983348

commit ebef68936396f7eab077e883ac48c4ce0508afa2
Author: Peter Krempa <pkrempa>
Date:   Thu Aug 29 10:36:00 2013 +0200

    virsh: Remember terminal state when starting and add helpers
    
    This patch adds instrumentation to allow modification of config of the
    terminal in virsh and successful reset of the state afterwards.
    
    The added helpers allow to disable receiving of SIGINT when pressing the
    key sequence (Ctrl+C usually). This normally sends SIGINT to the
    foreground process group which kills ssh processes used for transport of
    the data.

commit 8c725cc10daa666d47ab5a4f2ccc0b196ab608d8
Author: Peter Krempa <pkrempa>
Date:   Mon Aug 26 12:31:51 2013 +0200

    virsh-domain: rename print_job_progress to vshPrintJobProgress
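
The approach described in the commit messages above (stop the terminal from turning the interrupt key into SIGINT, then watch for that key by hand) can be illustrated with a minimal sketch. This is not the libvirt code: the helper names term_without_intr_key, is_cancel_key and wait_for_cancel_key are hypothetical and introduced only for illustration. Only standard POSIX termios calls are assumed.

```c
#include <stdbool.h>
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: return a copy of `saved` in which the interrupt key
 * (usually Ctrl+C) no longer makes the terminal driver raise SIGINT. */
static struct termios term_without_intr_key(const struct termios *saved)
{
    struct termios t = *saved;

    t.c_cc[VINTR] = _POSIX_VDISABLE;  /* the key is now delivered as plain input */
    t.c_lflag &= ~(tcflag_t)ICANON;   /* deliver bytes immediately, not per line */
    t.c_cc[VMIN] = 1;                 /* read() returns once one byte is available */
    t.c_cc[VTIME] = 0;                /* no inter-byte timeout */
    return t;
}

/* Hypothetical helper: true if byte `c` is the interrupt key that was
 * configured in the saved (original) terminal settings. */
static bool is_cancel_key(const struct termios *saved, unsigned char c)
{
    return saved->c_cc[VINTR] != _POSIX_VDISABLE && c == saved->c_cc[VINTR];
}

/* Hypothetical helper: block until the user presses the interrupt key on
 * terminal `fd`. Returns 0 when the key was seen, -1 on error or EOF.
 * The original terminal settings are restored before returning. */
static int wait_for_cancel_key(int fd)
{
    struct termios saved;
    struct termios raw;
    unsigned char c;
    int ret = -1;

    if (tcgetattr(fd, &saved) < 0)
        return -1;                    /* fd is not a terminal */

    raw = term_without_intr_key(&saved);
    if (tcsetattr(fd, TCSANOW, &raw) < 0)
        return -1;

    while (read(fd, &c, 1) == 1) {
        if (is_cancel_key(&saved, c)) {
            ret = 0;                  /* caller would abort the job here */
            break;
        }
    }

    tcsetattr(fd, TCSANOW, &saved);   /* always restore the saved state */
    return ret;
}
```

Because no SIGINT is generated, other processes in the foreground process group (such as ssh transport processes) are not signalled when the key is pressed. A caller would invoke wait_for_cancel_key(STDIN_FILENO), or poll the descriptor alongside its other work, and cancel the job itself when it returns 0.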

Comment 7 zhengqin 2014-08-26 09:09:21 UTC
Verified this issue with libvirt-1.2.7-1.el7.x86_64:


1. Set SELinux to enforcing (setenforce 1) and enable the virt_use_nfs boolean (on both source and target).

2. Prepare a guest whose image file is on an NFS server, and mount the NFS share on both source and target.

3. Start the guest on the source machine.

4. Start the migration from the third machine, and cancel it at about 96%:

[root@rhel7-c ~]# virsh -c qemu+ssh://10.66.6.xx/system migrate rhel7 --live qemu+ssh://10.66.4.xx/system --verbose
root.6.xx's password: 
root.4.xx's password: 



Migration: [  1 %]
Migration: [ 61 %]
Migration: [ 73 %]
Migration: [ 73 %]^[[A
Migration: [ 74 %]
Migration: [ 76 %]
Migration: [ 81 %]
Migration: [ 95 %]
Migration: [ 96 %]error: operation aborted: migration job: canceled by client

5. The guest is still in the running state on the source side and is not listed on the target side.

Comment 8 vivian zhang 2014-11-10 08:37:08 UTC
Hello Peter,
While running regression tests for this bug on RHEL 7.1, I found that after cancelling the migration the reported error is still inaccurate, although the guest remains in the running state. Could you please check whether this is a known issue for this bug?

Version-Release number of selected component (if applicable):
libvirt-1.2.8-6.el7.x86_64
qemu-kvm-rhev-2.1.2-6.el7.x86_64
kernel-3.10.0-195.el7.x86_64


How reproducible:
100%

Steps to Reproduce:

1. Set SELinux to enforcing (setenforce 1) and enable the virt_use_nfs boolean (on both source and target).

2. Prepare a guest whose image file is on an NFS server, mount the NFS share on both source and target, and start the guest on the source machine:
# virsh list
 Id    Name                           State
----------------------------------------------------
80    vm2                            running

3. Start the migration from the third machine, and press Ctrl+C to cancel it:
# virsh -c qemu+ssh://10.66.7.206/system migrate vm2 --live qemu+ssh://10.66.6.205/system --verbose
root.7.206's password: 
root.6.205's password: 
Migration: [  3 %]^Cerror: internal error: received hangup / error event on socket
root.7.206's password: 
error: Reconnected to the hypervisor

4. Check the guest status again:
# virsh list
 Id    Name                           State
----------------------------------------------------
 80    vm2                            running

As shown above, after cancelling the migration with Ctrl+C the reported error is still inaccurate, and virsh prompts for the source host password again.
I think it would be better to report "error: operation aborted: migration job: canceled by client".

Looking forward to your reply, thanks.
vivian zhang

Comment 9 Peter Krempa 2014-11-10 12:56:48 UTC
(In reply to vivian zhang from comment #8)

> How reproducible:
> 100%
> 
> Steps to Reproduce:
> 
> 1. set setenforce 1 && virt_use_nfs 1 (on both source and target)
> 
> 2.prepare a guest which the image file is on the NFS server,and mount the
> nfs server on both source and target
> start the guest on the source machine
> # virsh list
>  Id    Name                           State
> ----------------------------------------------------
> 80    vm2                            running
> 
> 3. start migration on the third machine, and ctrl+c to cancel the migration
> # virsh -c qemu+ssh://10.66.7.206/system migrate vm2 --live

Did you also upgrade libvirt on the machine running this command? As the issue was caused on the client side, it is necessary to upgrade the host running the virsh command specifically.

To make sure, please run "virsh version"

Comment 10 vivian zhang 2014-11-11 01:40:08 UTC
(In reply to Peter Krempa from comment #9)
> (In reply to vivian zhang from comment #8)
> 
> > How reproducible:
> > 100%
> > 
> > Steps to Reproduce:
> > 
> > 1. set setenforce 1 && virt_use_nfs 1 (on both source and target)
> > 
> > 2.prepare a guest which the image file is on the NFS server,and mount the
> > nfs server on both source and target
> > start the guest on the source machine
> > # virsh list
> >  Id    Name                           State
> > ----------------------------------------------------
> > 80    vm2                            running
> > 
> > 3. start migration on the third machine, and ctrl+c to cancel the migration
> > # virsh -c qemu+ssh://10.66.7.206/system migrate vm2 --live
> 
> Did you also upgrade libvirt on the machine running this command? As the
> issue was caused on the client side, it's necessery to specially upgrade the
> host running the virsh command.
> 
> To make sure, please run "virsh version"

Hi Peter,
The libvirt version has been updated, as shown below:

# virsh version
Compiled against library: libvirt 1.2.8
Using library: libvirt 1.2.8
Using API: QEMU 1.2.8
Running hypervisor: QEMU 2.1.2

Comment 11 Peter Krempa 2014-11-12 15:54:00 UTC
(In reply to vivian zhang from comment #10)
> (In reply to Peter Krempa from comment #9)
> > (In reply to vivian zhang from comment #8)

...

> 
> hi, Peter
> the libvirt version has been updated to as below
> 
> # virsh version
> Compiled against library: libvirt 1.2.8
> Using library: libvirt 1.2.8
> Using API: QEMU 1.2.8
> Running hypervisor: QEMU 2.1.2

In that case this should not happen. Can you please provide debug logs from both the client and the daemon that show the issue happening?

Comment 12 vivian zhang 2014-11-13 02:09:28 UTC
(In reply to Peter Krempa from comment #11)
> (In reply to vivian zhang from comment #10)
> > (In reply to Peter Krempa from comment #9)
> > > (In reply to vivian zhang from comment #8)
> 
> ...
> 
> > 
> > hi, Peter
> > the libvirt version has been updated to as below
> > 
> > # virsh version
> > Compiled against library: libvirt 1.2.8
> > Using library: libvirt 1.2.8
> > Using API: QEMU 1.2.8
> > Running hypervisor: QEMU 2.1.2
> 
> In that case this should not happen. Can you please provide debug logs from
> both the client and the daemon that would show the issue happening.


Hi Peter,
I captured 3 logs:
1. The client log (client1113.log), captured on the third machine by running the command with debugging enabled:
# LIBVIRT_DEBUG=1 virsh -c qemu+ssh://10.66.7.206/system migrate rhel6new --live qemu+ssh://10.66.6.205/system --verbose 

2. The libvirtd.log from both the source and the target, with log_level=1 set.

Please take a look first; if anything is unclear, please contact me.

thanks

vivianzhang

Comment 13 vivian zhang 2014-11-13 02:11:34 UTC
Created attachment 956908 [details]
client debug log

Comment 14 vivian zhang 2014-11-13 02:18:04 UTC
Created attachment 956909 [details]
libvirtd source tar log

Comment 15 vivian zhang 2014-11-13 02:18:49 UTC
Created attachment 956910 [details]
libvirtd target tar log

Comment 16 vivian zhang 2014-12-23 02:47:36 UTC
I can reproduce this bug on build
libvirt-1.1.1-29.el7.x86_64
qemu-kvm-rhev-1.5.3-60.el7_0.9.x86_64

I can no longer reproduce the issue described in comment 8, so I verified the fix on the latest build
libvirt-1.2.8-11.el7.x86_64
qemu-kvm-rhev-2.1.2-17.el7.x86_64

Verification steps:

1. Prepare a migration environment with the image on an NFS share mounted on both the source and target hosts.

2. Set SELinux to enforcing (setenforce 1) and turn virt_use_nfs on.

3. From the third machine, start the migration and cancel it at around 90%:
# virsh -c qemu+ssh://xx.xx.xx.xx/system migrate rhel7 --live qemu+ssh://xx.xx.xx.xx/system --verbose
root.xx.xx's password: 
root.xx.xx's password: 
Migration: [ 45 %]
Migration: [ 47 %]
Migration: [ 55 %]
Migration: [ 62 %]
Migration: [ 71 %]
Migration: [ 82 %]
Migration: [ 88 %]
Migration: [ 90 %]
Migration: [ 92 %]
Migration: [ 94 %]
Migration: [ 95 %]^Cerror: operation aborted: migration job: canceled by client

4. Check the guest on the source host; it is still running and works well:
# virsh list
 Id    Name                           State
----------------------------------------------------
 10    rhel7                          running


5. Configure the guest with a SPICE connection, open it using virt-viewer, and repeat steps 3-4; the result is the same:
# virsh -c qemu+ssh://xx.xx.xx.xx/system migrate rhel7 --live qemu+ssh://xx.xx.xx.xx/system --verbose
root.xx.xx's password: 
root.xx.xx's password: 
Migration: [ 95 %]^Cerror: operation aborted: migration job: canceled by client


Moving to VERIFIED.

Comment 18 errata-xmlrpc 2015-03-05 07:20:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0323.html

