Bug 1131755

Summary: virsh prints an error instead of the success message when SIGINT interrupts a "virsh save" whose job has already finished successfully
Product: Red Hat Enterprise Linux 7 Reporter: zhengqin <zsong>
Component: libvirt Assignee: Jiri Denemark <jdenemar>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.0 CC: dyuan, mzhan, rbalakri, zhwang, zpeng
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: libvirt-1.2.17-1.el7 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-11-19 05:47:44 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description zhengqin 2014-08-20 02:55:50 UTC
Description of problem:

This issue has been confirmed with Jiri Denemark <jdenemar>; the mail exchange follows:
--------------------------------------------------------------------------
++My Question:
On RHEL 7, pressing Ctrl+C to interrupt the save job leaves the guest shut off and prints the error:
"error: Requested operation is not valid: domain is not running". This is not right.

On RHEL 6.6, pressing Ctrl+C to interrupt the save job leaves the guest running and prints the error:
"error: Failed to save domain rhel6 to savefile
error: operation aborted: domain save job: canceled by client". This is right.


++Jiri's reply:
This would suggest that saving was very fast and finished before you
tried to cancel it. Is the savefile correct at this point (i.e., can you
restore the domain from it)?

++My Response:
Yes, I could restore the domain from the savefile.
But normally, if the "virsh save" process finishes successfully, the string "Domain xx saved to savefile" should be printed.
For example:
-----------------------------------------------------------
[root@rhel7 ~]# virsh list
 Id    Name                           State
------------------------------------------------
 76    rhel6u5_D                      running
[root@rhel7 ~]# virsh  save rhel6u5_D  savefile

Domain rhel6u5_D saved to savefile
-----------------------------------------------------------
Currently, if Ctrl+C is pressed to interrupt the save job, the string "Domain xx saved to savefile" is not printed.
My guess is that the SIGINT signal arrives after the save job has finished but before the "virsh save" process has exited.
That is to say, the savefile was generated successfully, but the above error is printed instead of "Domain xx saved to savefile".
The "error: Requested operation is not valid: domain is not running" message is misleading and not useful here.


++Jiri's reply:
I see, that's probably a bug in virsh which reports the error from
virDomainAbortJob instead of reporting success from the original API.
---------------------------------------------------------------------------


Version-Release number of selected component (if applicable):
libvirt-1.2.7-1.el7.x86_64
qemu-kvm-rhev-2.1.0-1.el7.x86_64
kernel-3.10.0-142.el7.x86_64


How reproducible:
100%

Steps to Reproduce:
1. Prepare a running guest named "rhel6-new" and execute the following commands:

[root@rhel7-a ~]# virsh list
   Id    Name                           State
----------------------------------------------------
   13    rhel6-new                      running

2. Run the virsh save command, wait half a second to a second, then press Ctrl+C to interrupt the save job:
[root@rhel7-a ~]# virsh save rhel6-new /tmp/rhel6.save
^Cerror: Requested operation is not valid: domain is not running

3. Check the guest's status.
[root@rhel7-a ~]# ^C
[root@rhel7-a ~]# virsh list
   Id    Name                           State
----------------------------------------------------

[root@rhel7-a ~]# virsh list --all
   Id    Name                           State
----------------------------------------------------
   -     rhel6-new                      shut off
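
To reproduce step 2 without relying on manual timing, something like the sketch below could be used. It starts a command, sends it SIGINT after a fixed delay, and collects the exit status and output. The helper name `run_and_interrupt` and the delay value are illustrative assumptions, not part of virsh or libvirt, and the sketch assumes virsh reacts to a plain SIGINT the same way it reacts to Ctrl+C typed on the terminal.

```python
import signal
import subprocess
import time


def run_and_interrupt(cmd, delay):
    """Start cmd, send SIGINT after `delay` seconds, return (rc, stdout, stderr).

    Hypothetical helper for timing experiments; not part of virsh or libvirt.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    time.sleep(delay)
    # Only signal the process if it has not already exited on its own.
    if proc.poll() is None:
        proc.send_signal(signal.SIGINT)
    out, err = proc.communicate()
    return proc.returncode, out, err
```

For example, `run_and_interrupt(["virsh", "save", "rhel6-new", "/tmp/rhel6.save"], 0.5)` mirrors step 2. Varying the delay should make it possible to hit both the "canceled by client" case and the case where the save has already finished.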




Actual results:
1. The guest is shut off and the error message is wrong.
2. The save file /tmp/rhel6.save is generated successfully and the guest can be restored from it.


Expected results:
1. The success message "Domain rhel6-new saved to /tmp/rhel6.save" should be printed, since the save file was generated successfully.
OR
2. The guest should keep running and the error message should read:
"error: Failed to save domain rhel6-new to /tmp/rhel6.save
error: operation aborted: domain save job: canceled by client"


Additional info:
This issue does not occur on RHEL 6.6.


On RHEL 6.6:
[root@rhel6 ~]# virsh list
   Id    Name                           State
----------------------------------------------------
   11    rhel6                          running

[root@rhel6 ~]# virsh save rhel6 savefile
^Cerror: Failed to save domain rhel6 to savefile
error: operation aborted: domain save job: canceled by client

[root@rhel6 ~]# ^C
[root@rhel6 ~]# virsh list
   Id    Name                           State
----------------------------------------------------
   11    rhel6                          running

Comment 3 Jiri Denemark 2015-06-02 12:51:13 UTC
Patch sent upstream for review: https://www.redhat.com/archives/libvir-list/2015-June/msg00035.html

Comment 4 Jiri Denemark 2015-06-03 08:26:30 UTC
Pushed upstream as v1.2.16-52-ge9507fd:

commit e9507fd41c9c6b73093cc0a4ce568bf0d8204854
Author: Jiri Denemark <jdenemar>
Date:   Mon Jun 1 15:06:16 2015 +0200

    virsh: Fix Ctrl-C behavior when watching a job
    
    When watching a job (save, managedsave, dump, migrate) virsh spawns a
    thread to call the appropriate API and waits for the result while
    watching for interruption signals (SIGINT, Ctrl-C on the terminal).
    Whenever such signal is caught, virsh calls virDomainAbortJob, stops
    waiting for the job, and returns the result of virDomainAbortJob.
    
    This is wrong because the job might have finished in the meantime or it
    might have been cancelled by someone else and virsh would just report
    the failure to abort the job. However, we are not interested in the
    virDomainAbortJob's result at all, we need to keep waiting for the main
    job to finish and report its result instead.
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1131755
    
    Signed-off-by: Jiri Denemark <jdenemar>
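
The flow described in the commit message can be sketched roughly as follows. This is a simplified Python model, not the virsh C code: `FakeDomain`, `watch_job`, and the `report_abort_error` switch are hypothetical names, a timed sleep stands in for Ctrl-C, and only the failed-abort path of the old behavior is modeled.

```python
import threading
import time

NOT_RUNNING = "Requested operation is not valid: domain is not running"
CANCELED = "operation aborted: domain save job: canceled by client"


class FakeDomain:
    """Hypothetical stand-in for a domain with a long-running save job."""

    def __init__(self, save_seconds):
        self.save_seconds = save_seconds
        self._lock = threading.Lock()
        self._abort = threading.Event()
        self._done = False

    def save(self):
        # Block until the simulated save time elapses or an abort arrives.
        self._abort.wait(self.save_seconds)
        with self._lock:
            if self._abort.is_set():
                raise RuntimeError(CANCELED)
            self._done = True

    def abort_job(self):
        with self._lock:
            if self._done:
                # The job already finished, so there is nothing to abort.
                raise RuntimeError(NOT_RUNNING)
            self._abort.set()


def watch_job(domain, interrupt_after, report_abort_error):
    """Run the save in a thread, simulate Ctrl-C, and return the message shown."""
    outcome = {}

    def worker():
        try:
            domain.save()
            outcome["job"] = "Domain saved"
        except RuntimeError as exc:
            outcome["job"] = f"error: {exc}"

    thread = threading.Thread(target=worker)
    thread.start()
    time.sleep(interrupt_after)  # the simulated Ctrl-C arrives here

    abort_error = None
    try:
        domain.abort_job()
    except RuntimeError as exc:
        abort_error = f"error: {exc}"

    thread.join()  # always joined so the sketch exits cleanly
    if report_abort_error and abort_error is not None:
        # Old behavior (modeled): the failed abort shadows the job's result.
        return abort_error
    # Fixed behavior: the abort result is ignored, the job's result is reported.
    return outcome["job"]
```

With `FakeDomain(0.0)` and an interrupt at 0.2 seconds, the old mode returns the "domain is not running" error even though the save succeeded, while the fixed mode returns the success message. With a slow save and an early interrupt, the fixed mode returns the "canceled by client" error from the job itself.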

Comment 6 zhenfeng wang 2015-07-08 06:08:25 UTC
I could reproduce this issue by following the bug description. The verification steps below were run with libvirt-1.2.17-1.el7.x86_64.

1. Prepare a running guest
# virsh list 
 Id    Name                           State
----------------------------------------------------
 4     rhel7.0                        running


2. Save the guest. virsh now reports clearly whether the job was aborted or had already finished when the abort was requested.

<1. The job is aborted successfully
# virsh save rhel7.0 rhel7.0.save
^Cerror: Failed to save domain rhel7.0 to rhel7.0.save
error: operation aborted: domain save job: canceled by client

<2. The job finished before the abort took effect; virsh now reports that the save job finished
# virsh save rhel7.0 rhel7.0.save
^C
Domain rhel7.0 saved to rhel7.0.save


3. Run managedsave on the guest; the result is the same as with save

<1. The job is aborted successfully
# virsh managedsave rhel7.0
^Cerror: Failed to save domain rhel7.0 state
error: operation aborted: domain save job: canceled by client


<2. The job finished before the abort took effect; virsh now reports that the managedsave job finished
# virsh managedsave rhel7.0
^C^C
Domain rhel7.0 state saved by libvirt


4. Migrate the guest; the result is similar to save

<1. The job is aborted successfully
# virsh migrate --live rhel7.0 qemu+ssh://10.66.6.6/system --verbose --unsafe
root@10.66.6.6's password: 
Migration: [ 89 %]^Cerror: operation aborted: migration job: canceled by client

<2. The job finished before the abort took effect; virsh now shows that the migration job finished
# virsh migrate --live rhel7.0 qemu+ssh://10.66.6.6/system --verbose --unsafe
root@10.66.6.6's password: 
Migration: [100 %]^C

5. Run dump on the guest; the result is similar to save
<1. The job is aborted successfully
# virsh dump rhel7.0 rhel7.dump --crash
^C^Cerror: Failed to core dump domain rhel7.0 to rhel7.dump
error: operation aborted: domain core dump job: canceled by client


<2. The job finished before the abort took effect; virsh now reports that the dump job finished
[root@zhwangrhel71 ~]# virsh dump rhel7.0 rhel7.dump --crash
^C^C
Domain rhel7.0 dumped to rhel7.dump

Based on the steps above, marking this bug as verified.

Comment 8 errata-xmlrpc 2015-11-19 05:47:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2202.html