Bug 1565552

Summary: Allow monitoring of migration done during external snapshot
Product: Red Hat Enterprise Linux Advanced Virtualization
Reporter: Yanqiu Zhang <yanqzhan>
Component: libvirt
Assignee: Virtualization Maintenance <virt-maint>
Status: CLOSED DEFERRED
QA Contact: Yanqiu Zhang <yanqzhan>
Severity: unspecified
Priority: unspecified
CC: dyuan, fjin, jdenemar, lmen, lrotenbe, mzhan, pkrempa, rbalakri, xuzhang, yalzhang, yanqzhan
Target Milestone: rc
Hardware: x86_64
OS: Linux
Doc Type: If docs needed, set a value
Last Closed: 2020-02-18 14:09:15 UTC
Type: Bug
Bug Blocks: 1749284
Attachments: gstack_n_libvirtd_log

Description Yanqiu Zhang 2018-04-10 09:48:18 UTC
Description of problem:
"domjobinfo/domjobabort" hangs if they are called at the beginning of snapshot creation 

Version-Release number of selected component (if applicable):
libvirt-3.9.0-14.el7_5.2.x86_64
qemu-kvm-rhev-2.10.0-21.el7_5.1.x86_64

How reproducible:
99%

Steps to Reproduce:
1. In terminal 1, abort the domain job (or query its status with domjobinfo) in a loop:
# while true; do date; virsh domjobabort avocado-vt-vm1; done

2. In terminal 2, start a guest and create a snapshot:
# virsh start avocado-vt-vm1
# virsh snapshot-create-as avocado-vt-vm1 sp1 --memspec file=/tmp/foo.image
3. Check the output in terminal 1 during the snapshot; domjobabort (or domjobinfo) hangs at 19:55:06 until snapshot creation finishes:

Mon Apr  9 19:55:05 CST 2018
error: Requested operation is not valid: no job is active on the domain

Mon Apr  9 19:55:06 CST 2018
error: Requested operation is not valid: no job is active on the domain

Mon Apr  9 19:55:18 CST 2018
error: Requested operation is not valid: no job is active on the domain

Mon Apr  9 19:55:19 CST 2018
error: Requested operation is not valid: no job is active on the domain

4. Check terminal 2; the snapshot was created successfully:
# virsh snapshot-create-as avocado-vt-vm1 sp1 --memspec file=/tmp/foo.image
Domain snapshot sp1 created


Actual results:
As in step 3, domjobabort (or domjobinfo) hangs until snapshot creation finishes; no job info is printed and the job is not cancelled.

Expected results:
"domjobabort/domjobinfo" should not hang, should functions well: print domain job info, or cancel job successfully.

Additional info:

Comment 1 Yanqiu Zhang 2018-04-10 09:53:23 UTC
Created attachment 1419792 [details]
gstack_n_libvirtd_log

Comment 2 Peter Krempa 2018-04-10 10:19:17 UTC
The long-running operation that can be monitored is the migration of guest memory to a file to capture the memory image.

The disk snapshot creation is synchronous and thus can't be monitored.
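In other words, during the memory-migration phase of an external snapshot with --memspec, the job statistics should be queryable through the libvirt API as well as through virsh. A minimal polling sketch, not part of this report, assuming the python3-libvirt bindings are installed and using an illustrative domain name (avocado-vt-vm1):

```python
def human_mib(nbytes):
    """Convert a byte count to MiB for readable progress output."""
    return nbytes / (1024.0 * 1024.0)

def poll_job(domain_name="avocado-vt-vm1", interval=0.5):
    """Poll job progress for a domain until no job is active.

    Imports happen inside the function so the helper above remains
    usable on hosts without the libvirt bindings.
    """
    import time
    import libvirt

    conn = libvirt.openReadOnly("qemu:///system")
    try:
        dom = conn.lookupByName(domain_name)
        while True:
            # jobInfo() returns [type, timeElapsed, timeRemaining,
            # dataTotal, dataProcessed, dataRemaining, ...].
            info = dom.jobInfo()
            if info[0] == libvirt.VIR_DOMAIN_JOB_NONE:
                break
            print("processed %.1f MiB of %.1f MiB"
                  % (human_mib(info[4]), human_mib(info[3])))
            time.sleep(interval)
    finally:
        conn.close()

# Usage (on a host with an active snapshot job):
#   poll_job("avocado-vt-vm1")
```

Per this bug, such a poll would be expected to block rather than return while the synchronous disk-snapshot portion of the job is in progress.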

Comment 3 yalzhang@redhat.com 2018-08-20 03:32:58 UTC
Tested the scenarios below on the following packages; the result is slightly different.

# rpm -q libvirt qemu-kvm-rhev
libvirt-4.5.0-6.el7.x86_64
qemu-kvm-rhev-2.12.0-10.el7.x86_64

Scenario 1: create a snapshot with "--memspec"; domjobinfo reports an error

1. In one terminal, start a guest and create a snapshot with the command below;
at the same time, open another terminal to check the job info:

[terminal 1] # virsh list
 Id    Name                           State
----------------------------------------------------
 38    domain3                        running
[terminal 1]# virsh snapshot-create-as domain3 sp2 --memspec file=/tmp/foo2.image

In another terminal at the same time, run the domjobinfo command; it returns immediately but reports an error:
[terminal 2]# virsh domjobinfo domain3  
error: internal error: invalid job statistics type

Scenario 2: create a snapshot without "--memspec"; domjobinfo hangs until the job finishes and reports None
[terminal 1]# virsh snapshot-create-as domain1 sp3
Domain snapshot sp3 created

[terminal 2]# virsh domjobinfo domain1  ====> hangs until job finished, reports None
Job type:         None        


Scenario 3: create a snapshot with "--memspec"; domjobabort works
[terminal 1]# virsh snapshot-create-as domain1 sp6 --memspec file=/tmp/foo9.image
error: operation aborted: snapshot job: canceled by client

[terminal 2]# virsh domjobabort domain1  ===> returns immediately


Scenario 4: create a snapshot without "--memspec"; domjobabort hangs and does not work
[terminal 1]# time virsh snapshot-create-as domain1 sp9
Domain snapshot sp9 created

real	0m16.862s
user	0m0.019s
sys	0m0.017s

[terminal 2]# time virsh domjobabort domain1  ==> hangs until job finishes
error: Requested operation is not valid: no job is active on the domain


real	0m16.579s
user	0m0.017s
sys	0m0.018s

Comment 4 Liran Rotenberg 2020-01-19 12:15:06 UTC
Hi,
While working on BZ 1749284 I encountered the error "internal error: invalid job statistics type".
It was on:
libvirt-libs-4.5.0-24.3.module_el8.0.0+189+f9babebb.x86_64
libvirt-daemon-driver-storage-gluster-4.5.0-24.3.module_el8.0.0+189+f9babebb.x86_64
libvirt-daemon-driver-nodedev-4.5.0-24.3.module_el8.0.0+189+f9babebb.x86_64
libvirt-daemon-driver-nwfilter-4.5.0-24.3.module_el8.0.0+189+f9babebb.x86_64
libvirt-daemon-driver-storage-scsi-4.5.0-24.3.module_el8.0.0+189+f9babebb.x86_64
libvirt-daemon-config-network-4.5.0-24.3.module_el8.0.0+189+f9babebb.x86_64
libvirt-bash-completion-4.5.0-24.3.module_el8.0.0+189+f9babebb.x86_64
libvirt-daemon-driver-storage-core-4.5.0-24.3.module_el8.0.0+189+f9babebb.x86_64
libvirt-daemon-config-nwfilter-4.5.0-24.3.module_el8.0.0+189+f9babebb.x86_64
libvirt-daemon-driver-storage-logical-4.5.0-24.3.module_el8.0.0+189+f9babebb.x86_64
libvirt-daemon-driver-storage-4.5.0-24.3.module_el8.0.0+189+f9babebb.x86_64
libvirt-lock-sanlock-4.5.0-24.3.module_el8.0.0+189+f9babebb.x86_64
libvirt-daemon-driver-network-4.5.0-24.3.module_el8.0.0+189+f9babebb.x86_64
libvirt-daemon-kvm-4.5.0-24.3.module_el8.0.0+189+f9babebb.x86_64
python3-libvirt-4.5.0-2.module_el8.0.0+189+f9babebb.x86_64
libvirt-daemon-driver-storage-disk-4.5.0-24.3.module_el8.0.0+189+f9babebb.x86_64
libvirt-daemon-driver-storage-mpath-4.5.0-24.3.module_el8.0.0+189+f9babebb.x86_64
libvirt-daemon-driver-interface-4.5.0-24.3.module_el8.0.0+189+f9babebb.x86_64
libvirt-admin-4.5.0-24.3.module_el8.0.0+189+f9babebb.x86_64
libvirt-daemon-driver-qemu-4.5.0-24.3.module_el8.0.0+189+f9babebb.x86_64
libvirt-daemon-driver-storage-rbd-4.5.0-24.3.module_el8.0.0+189+f9babebb.x86_64
libvirt-daemon-4.5.0-24.3.module_el8.0.0+189+f9babebb.x86_64
libvirt-daemon-driver-storage-iscsi-4.5.0-24.3.module_el8.0.0+189+f9babebb.x86_64
libvirt-daemon-driver-secret-4.5.0-24.3.module_el8.0.0+189+f9babebb.x86_64
libvirt-client-4.5.0-24.3.module_el8.0.0+189+f9babebb.x86_64

Now I upgraded libvirt to:
libvirt-daemon-driver-storage-core-5.6.0-6.el8.x86_64
libvirt-daemon-driver-storage-mpath-5.6.0-6.el8.x86_64
libvirt-daemon-driver-interface-5.6.0-6.el8.x86_64
libvirt-daemon-driver-qemu-5.6.0-6.el8.x86_64
libvirt-daemon-driver-secret-5.6.0-6.el8.x86_64
libvirt-daemon-driver-storage-gluster-5.6.0-6.el8.x86_64
libvirt-daemon-driver-network-5.6.0-6.el8.x86_64
libvirt-daemon-driver-storage-iscsi-5.6.0-6.el8.x86_64
libvirt-daemon-driver-storage-scsi-5.6.0-6.el8.x86_64
libvirt-daemon-driver-nwfilter-5.6.0-6.el8.x86_64
libvirt-daemon-config-network-5.6.0-6.el8.x86_64
libvirt-admin-5.6.0-6.el8.x86_64
python3-libvirt-5.6.0-2.el8.x86_64
libvirt-libs-5.6.0-6.el8.x86_64
libvirt-daemon-kvm-5.6.0-6.el8.x86_64
libvirt-daemon-driver-storage-iscsi-direct-5.6.0-6.el8.x86_64
libvirt-daemon-driver-storage-logical-5.6.0-6.el8.x86_64
libvirt-client-5.6.0-6.el8.x86_64
libvirt-daemon-driver-nodedev-5.6.0-6.el8.x86_64
libvirt-daemon-5.6.0-6.el8.x86_64
libvirt-daemon-driver-storage-disk-5.6.0-6.el8.x86_64
libvirt-daemon-driver-storage-5.6.0-6.el8.x86_64
libvirt-bash-completion-5.6.0-6.el8.x86_64
libvirt-daemon-driver-storage-rbd-5.6.0-6.el8.x86_64
libvirt-daemon-config-nwfilter-5.6.0-6.el8.x86_64
libvirt-lock-sanlock-5.6.0-6.el8.x86_64

# virsh -r domjobinfo 26
Job type:         Unbounded   
Operation:        Snapshot    
Time elapsed:     530          ms
Data processed:   92.460 MiB
Data remaining:   11.131 GiB
Data total:       13.071 GiB
Memory processed: 92.460 MiB
Memory remaining: 11.131 GiB
Memory total:     13.071 GiB
Memory bandwidth: 292.233 MiB/s
Dirty rate:       0            pages/s
Page size:        4096         bytes
Iteration:        1           
Postcopy requests: 0           
Constant pages:   485946      
Normal pages:     22558       
Normal data:      88.117 MiB
Expected downtime: 300          ms
Setup time:       131          ms

Is it fixed on 5.6.0-6? Can you confirm and maybe close the bug?

Comment 5 Yanqiu Zhang 2020-01-20 03:35:38 UTC
(In reply to Liran Rotenberg from comment #4)
> Is it fixed on 5.6.0-6? Can you confirm and maybe close the bug?

Hi, 
It still reproduces per the latest test results on:
libvirt-daemon-5.6.0-10.module+el8.1.1+5309+6d656f05.x86_64
qemu-kvm-4.1.0-23.module+el8.1.1+5467+ba2d821b.x86_64

Steps:
1.  # virsh snapshot-create-as avocado-vt-vm1 sp5 --memspec file=/tmp/foo.image5
Domain snapshot sp5 created

# while true; do date>>domjobinfo.txt; virsh -r domjobinfo avocado-vt-vm1>>domjobinfo.txt; done
Sun Jan 19 22:15:07 EST 2020
Job type:         None
...
Sun Jan 19 22:15:07 EST 2020
Job type:         None
              <====hang 7s here
Sun Jan 19 22:15:14 EST 2020
Job type:         None
...
Sun Jan 19 22:15:14 EST 2020
Job type:         None

Sun Jan 19 22:15:15 EST 2020
Job type:         None
...

2.# virsh snapshot-create-as avocado-vt-vm1 sp8 --memspec file=/tmp/foo.image8
Domain snapshot sp8 created
# while true; do date>>domjobabort.txt; virsh domjobabort avocado-vt-vm1>>domjobabort.txt; done
Sun Jan 19 22:30:05 EST 2020
error: Requested operation is not valid: no job is active on the domain
...
Sun Jan 19 22:30:06 EST 2020
       <====hang 5s
Sun Jan 19 22:30:11 EST 2020
...
Sun Jan 19 22:30:12 EST 2020


Additional info: 
The "internal error: invalid job statistics type" error you saw is a separate issue, fixed in bz1688774.

Comment 6 Jaroslav Suchanek 2020-02-18 14:09:15 UTC
This bug was closed deferred as a result of bug triage.

Please reopen if you disagree and provide justification why this bug should
get enough priority. Most important would be information about impact on
customer or layered product. Please indicate requested target release.