Bug 799478

Summary: libvirt emits inappropriate error when using domjobabort to abort stuck migration
Product: Red Hat Enterprise Linux 6 Reporter: Jiri Denemark <jdenemar>
Component: libvirt Assignee: Jiri Denemark <jdenemar>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: high    
Version: 6.3 CC: abaron, acathrow, berrange, dallan, danken, dnaori, dyuan, hateya, mgoldboi, mjenner, mzhan, rwu, syeghiay, vbian, veillard, weizhan, ykaul, yupzhang
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: libvirt-0.9.10-15.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 725373 Environment:
Last Closed: 2012-06-20 06:49:33 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 669581, 725373    
Bug Blocks: 723198, 773650, 773651, 773677, 773696    
Attachments:
Description Flags
the backtrace for libvirtd hang none

Description Jiri Denemark 2012-03-02 17:39:41 UTC
+++ This bug was initially created as a clone of Bug #725373 +++

Created attachment 515009
libvirtd log

Description of problem:
When using domjobabort to abort a stuck migration (destination host blocked during migration via iptables), the migration command reports an inappropriate error.

--- Additional comment from weizhan on 2012-01-09 22:31:35 EST ---

Test with 
kernel-2.6.32-223.el6.x86_64
qemu-kvm-0.12.1.2-2.213.el6.x86_64
libvirt-0.9.9-1.el6.x86_64

1. Start the migration:
# virsh migrate --live --p2p kvm-rhel6u2-x86_64-new qemu+tls://10.66.83.197/system
2. At the same time, on another console, block traffic to the destination:
# iptables -A OUTPUT -d 10.66.83.197 -j REJECT
3. Then abort the job:
# virsh domjobabort kvm-rhel6u2-x86_64-new

The migration command does not return immediately; it waits for a while and then returns:
error: An error occurred, but the cause is unknown
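
The reproduction steps above could be collected into a single script. The following is only a sketch, not something taken from the original report: the helper names (run, run_bg, pause, reproduce), the DRY_RUN switch, the 5-second pauses, and the final iptables -D cleanup are assumptions added for illustration. Only the virsh and iptables -A invocations come from the steps above. By default it just prints the commands.

```shell
# Sketch only: helper names, DRY_RUN, the pauses and the cleanup step are
# illustrative assumptions, not part of the original report.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = really run them
BG_PID=""

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

run_bg() {
    # Same as run, but start the command in the background and remember its PID.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@" &
        BG_PID=$!
    fi
}

pause() {
    # Sleep only when commands are really being executed.
    if [ "$DRY_RUN" = "0" ]; then sleep "$1"; fi
}

reproduce() {
    guest=$1
    dest=$2
    BG_PID=""

    # Step 1: start a live peer-to-peer migration in the background.
    run_bg virsh migrate --live --p2p "$guest" "qemu+tls://$dest/system"

    # Step 2: reject outgoing traffic to the destination while it is running.
    pause 5
    run iptables -A OUTPUT -d "$dest" -j REJECT

    # Step 3: abort the stuck migration job.
    pause 5
    run virsh domjobabort "$guest"

    # Wait for "virsh migrate" to return; its error message goes to stderr.
    if [ -n "$BG_PID" ]; then
        status=0
        wait "$BG_PID" || status=$?
        echo "virsh migrate exited with status $status"
    fi

    # Cleanup (assumed addition): delete the REJECT rule again.
    run iptables -D OUTPUT -d "$dest" -j REJECT
}
```

After sourcing the file as root on the source host, a real run would look like: DRY_RUN=0 reproduce kvm-rhel6u2-x86_64-new 10.66.83.197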

--- Additional comment from weizhan on 2012-02-29 06:49:56 EST ---

I can still reproduce the phenomenon from comment 14 with
qemu-kvm-0.12.1.2-2.232.el6.x86_64
kernel-2.6.32-225.el6.x86_64
libvirt-0.9.10-3.el6.x86_64

so I am re-assigning this bug.

--- Additional comment from jdenemar on 2012-02-29 10:46:32 EST ---

Oh, I see what it is. virsh domjobabort correctly cancels the migration, and the source libvirtd tries to call the Finish API on the destination libvirtd to tell it about the abort. Since all packets to the destination libvirtd are discarded, the API call waits for some time (~30 seconds) until the broken connection is detected; the source libvirtd then correctly resumes the domain and reports that the migration API finished. So far, everything works as expected. However, it seems the error message was eaten somewhere along the way, so virsh has nothing useful to report.

--- Additional comment from jdenemar on 2012-03-02 12:30:57 EST ---

Actually, the eaten error message is a minor issue that doesn't need to block this bug. I'm moving this bz back to ON_QA and I'll create a new bug requesting that the error message be fixed.

Comment 1 Jiri Denemark 2012-05-11 16:21:54 UTC
This bug was fixed as part of a patch series required to fix bug 807907, namely by the first patch in the series: http://post-office.corp.redhat.com/archives/rhvirt-patches/2012-April/msg00856.html

Comment 3 weizhang 2012-05-16 10:13:27 UTC
Tested with
qemu-kvm-0.12.1.2-2.292.el6.x86_64
kernel-2.6.32-269.el6.x86_64
libvirt-0.9.10-20.el6.x86_64

After running the migration and domjobabort once and waiting several minutes, it reports the error:
error: operation aborted: migration job: canceled by client
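
The two messages quoted in this bug could be told apart mechanically during verification. This is a hedged sketch: the function name check_abort_message and its return codes are hypothetical, and only the two message strings are taken from the comments in this report.

```shell
# Classify the error text printed by "virsh migrate" after an aborted migration.
# The message strings are quoted from this bug; the function name and the
# return codes are illustrative assumptions.
check_abort_message() {
    case "$1" in
        *"An error occurred, but the cause is unknown"*)
            # The original symptom: the real error was lost on the way to virsh.
            echo "FAIL: error message was lost"
            return 1
            ;;
        *"operation aborted: migration job: canceled by client"*)
            # The message seen when verifying the fix.
            echo "PASS: abort reported correctly"
            return 0
            ;;
        *)
            echo "UNKNOWN: unexpected message"
            return 2
            ;;
    esac
}
```

It takes the captured stderr of the migrate command as its only argument, for example the text saved with 2>&1 from the virsh migrate invocation.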

But if the abort is issued several times, libvirt hangs. Do I need to file a new bug?

Comment 4 weizhang 2012-05-17 06:07:00 UTC
Created attachment 585107
the backtrace for libvirtd hang

Comment 5 Jiri Denemark 2012-05-17 11:56:33 UTC
Oh, as confirmed with another backtrace, it looks like we have a reproducer for bug 821468. I'll move the backtraces and investigation there and leave this bug for the error message fix.

Comment 6 weizhang 2012-05-18 08:45:33 UTC
The new issue will be followed up in bug 821468, so this bug is verified as PASS, as comment 3 shows.

Comment 8 errata-xmlrpc 2012-06-20 06:49:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0748.html