Bug 1688774 - Fails to query domjobinfo when do snapshot
Summary: Fails to query domjobinfo when do snapshot
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: rc
Target Release: 8.0
Assignee: Jiri Denemark
QA Contact: yisun
URL:
Whiteboard:
Depends On:
Blocks: 1690703
Reported: 2019-03-14 12:12 UTC by Fangge Jin
Modified: 2020-11-14 06:01 UTC (History)

Fixed In Version: libvirt-5.0.0-7.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1690703
Environment:
Last Closed: 2019-05-29 16:05:30 UTC
Type: Bug
Target Upstream Version:
Embargoed:
pm-rhel: mirror+


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:1293 0 None None None 2019-05-29 16:05:42 UTC

Description Fangge Jin 2019-03-14 12:12:20 UTC
Description of problem:
Querying domjobinfo for a snapshot job fails, both while the snapshot is running and after it completes.

Version-Release number of selected component (if applicable):
libvirt-5.0.0-6.module+el8+2860+4e0fe96a.x86_64
qemu-kvm-3.1.0-18.module+el8+2834+fa8bb6e2.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Start a VM.

2. Take a snapshot with an external memory file:
# virsh snapshot-create-as df899f5c-db94-48b2-867a-e0c266b59b7a sp8 --memspec file=/tmp/foo.image.8

3. Query the domain job info while the snapshot is in progress:
# virsh domjobinfo df899f5c-db94-48b2-867a-e0c266b59b7a
error: internal error: invalid job statistics type

4. After the snapshot completes, query the domain job info again:
# virsh domjobinfo df899f5c-db94-48b2-867a-e0c266b59b7a --completed
error: internal error: invalid job statistics type


Actual results:
As shown in steps 3 and 4, querying the domain job info for the snapshot fails.

Expected results:
The domain job info for the snapshot can be queried successfully.

Additional info:

Comment 1 Jiri Denemark 2019-03-14 12:55:09 UTC
Hmm, looks like the snapshot code does not properly set jobInfo->statsType:

(gdb) p jobInfo
$1 = {
    status = QEMU_DOMAIN_JOB_STATUS_COMPLETED,
    operation = VIR_DOMAIN_JOB_OPERATION_SNAPSHOT,
    started = 1552567278659,
    stopped = 1552567278667,
    sent = 0,
    received = 0,
    timeElapsed = 30,
    timeDelta = 0,
    timeDeltaSet = false,
    statsType = QEMU_DOMAIN_JOB_STATS_TYPE_NONE,
    stats = {
        mig = {
            status = 6,
            total_time = 17,
            downtime_set = true,
            downtime = 22,
            setup_time_set = true,
            setup_time = 1,
            ram_transferred = 3510941,
            ram_remaining = 0,
            ram_total = 168370176,
            ram_bps = 228062937,
            ram_duplicate_set = true,
            ram_duplicate = 40339,
            ram_normal = 767,
            ram_normal_bytes = 3141632,
            ram_dirty_rate = 0,
            ram_page_size = 4096,
            ram_iteration = 2,
            ...
        },
        dump = {
            status = 6,
            completed = 17,
            total = 1
        }
    },
    mirrorStats = {
        transferred = 0,
        total = 0
    }
}

Comment 2 Jiri Denemark 2019-03-14 15:29:39 UTC
The patch was sent upstream for review: https://www.redhat.com/archives/libvir-list/2019-March/msg00971.html

Comment 3 Jiri Denemark 2019-03-15 08:47:49 UTC
This bug is fixed upstream by

commit 1c2a9260e865af8ad7dde9cdd21515800d1864e7
Refs: v5.1.0-237-g1c2a9260e8
Author:     Jiri Denemark <jdenemar>
AuthorDate: Thu Mar 14 15:33:26 2019 +0100
Commit:     Jiri Denemark <jdenemar>
CommitDate: Fri Mar 15 09:39:19 2019 +0100

    qemu: Set job statsType for external memory snapshot

    Any job which is able to provide statistics that can be queried via
    virDomainGetJob{Stats,Info} has to set an appropriate statsType.

    Without a proper statsType qemuDomainJobInfoToParams and
    qemuDomainJobInfoToInfo have no idea what statistics should be sent to
    the API caller.

    https://bugzilla.redhat.com/show_bug.cgi?id=1688774

    Signed-off-by: Jiri Denemark <jdenemar>
    Reviewed-by: Erik Skultety <eskultet>
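
To illustrate the commit message above, here is a minimal Python sketch (not libvirt's C code) of a converter that dispatches on statsType. The class and function names are hypothetical, and the MIG and DUMP members simply mirror the "mig" and "dump" members of the stats union in the gdb output from comment 1. Only the NONE value and the error string are taken from this report.

```python
from dataclasses import dataclass, field
from enum import Enum


class StatsType(Enum):
    # Hypothetical stand-ins: NONE appears in the gdb dump; MIG and DUMP
    # mirror the names of the union members shown there.
    NONE = "none"
    MIG = "mig"
    DUMP = "dump"


@dataclass
class JobInfo:
    operation: str
    # A job that never sets stats_type stays at NONE, as in the bug.
    stats_type: StatsType = StatsType.NONE
    stats: dict = field(default_factory=dict)


def job_info_to_params(job: JobInfo) -> dict:
    """Pick which statistics to report based on the job's stats_type."""
    if job.stats_type is StatsType.MIG:
        keys = ("ram_transferred", "ram_remaining", "ram_total")
    elif job.stats_type is StatsType.DUMP:
        keys = ("completed", "total")
    else:
        # Without a stats type the converter cannot tell which union
        # member is valid, so it can only fail.
        raise RuntimeError("internal error: invalid job statistics type")
    params = {"operation": job.operation}
    params.update({k: job.stats[k] for k in keys if k in job.stats})
    return params


if __name__ == "__main__":
    try:
        job_info_to_params(JobInfo("snapshot"))
    except RuntimeError as err:
        print(err)
```

In this model the stats dictionary may be fully populated and the call still fails, which matches the gdb output: the migration statistics were present, but statsType was left at NONE.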

Comment 6 yisun 2019-03-20 05:08:16 UTC
I hit a new problem after the fix; please help to confirm.

=========================
Do the following at the same time in different consoles:
=========================
a. (IN VM) Generate some data to make the snapshot take longer:
# dd if=/dev/urandom of=/tmp/1.file bs=1K count=5000000

b. (IN HOST CONSOLE 1) Create the snapshot as follows:
[root@hp-z220-02 tmp]# virsh snapshot-create-as  61a1423f-5591-4b36-94ad-be0d982c34e5 s1 --memspec file=/tmp/snap.1

c. (IN HOST CONSOLE 2) Check the domjobinfo:
[root@hp-z220-02 ~]# time virsh domjobinfo 61a1423f-5591-4b36-94ad-be0d982c34e5


=========================
RESULT:
=========================
At step c above, the command hangs and returns only after step b has finished, as follows:

b1:
[root@hp-z220-02 tmp]# virsh snapshot-create-as  61a1423f-5591-4b36-94ad-be0d982c34e5 s1 --memspec file=/tmp/snap.1
Domain snapshot s1 created

c1:
[root@hp-z220-02 ~]# time virsh domjobinfo 61a1423f-5591-4b36-94ad-be0d982c34e5
Job type:         None        

real	0m27.377s
user	0m0.011s
sys	0m0.005s

=========================
Problem:
=========================
While the snapshot is being taken, domjobinfo should display information about the running job, something like the following:
Tue Apr 10 07:03:56 EDT 2018
Job type:         Unbounded   
Operation:        Snapshot
Time elapsed:     1027         ms
Data processed:   110.118 MiB
Data remaining:   151.859 MiB
Data total:       1.126 GiB
Memory processed: 110.118 MiB
Memory remaining: 151.859 MiB
Memory total:     1.126 GiB
Memory bandwidth: 223.485 MiB/s
Dirty rate:       0            pages/s
Iteration:        1           
Constant pages:   228613      
Normal pages:     27634       
Normal data:      107.945 MiB
Expected downtime: 300          ms
Setup time:       8            ms

Comment 7 Fangge Jin 2019-03-20 05:42:21 UTC
Hi yisun

The "hang" you hit is a different issue and is tracked in this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1565552#c2

Actually, you can only get the domjobinfo during the memory snapshot phase, which is very short, but you could increase the guest memory size to make that phase longer. Note: adding memory load in the guest will not help, because the guest is paused at the beginning.

Also note that you can monitor the libvirtd log during the test: once you see libvirtd send the "migrate" command to the QEMU monitor, try querying domjobinfo immediately.
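
The log-watching tip above could be automated roughly as follows. This is only a sketch under stated assumptions: it assumes libvirtd debug logging is enabled and that each command sent to the QEMU monitor is logged on one line containing "QEMU_MONITOR_SEND_MSG:" followed by "msg=" and the JSON command. The function names are hypothetical, and the log shape should be checked against your own libvirtd log before relying on it.

```python
import json
import re

# Assumed log shape (verify against your own libvirtd log):
#   ... QEMU_MONITOR_SEND_MSG: mon=0x... msg={"execute":"migrate",...}
SEND_RE = re.compile(r"QEMU_MONITOR_SEND_MSG:.*?msg=(\{.*\})")


def sent_command(line):
    """Return the QMP command name sent on this log line, or None."""
    match = SEND_RE.search(line)
    if match is None:
        return None
    try:
        message = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    return message.get("execute")


def first_migrate_line(lines):
    """Return the index of the first line that sends "migrate", or None."""
    for index, line in enumerate(lines):
        if sent_command(line) == "migrate":
            return index
    return None
```

A test script could feed this the log lines as they arrive and run "virsh domjobinfo" as soon as first_migrate_line returns an index, which is the window comment 7 describes.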

Comment 8 yisun 2019-03-20 11:15:12 UTC
(In reply to Fangge Jin from comment #7)
> Hi yisun
> 
> The "hang" problem you met is a different issue and is tracked in this bug:
> https://bugzilla.redhat.com/show_bug.cgi?id=1565552#c2
> 
> Actually you can only get the domjobinfo in memory snapshot phase which is
> very short, but you could increase the guest memory size to make the phase
> longer. Note: adding memory load in guest will not help, because guest is
> paused at the beginning.
> 
> And note that you can monitor libvirtd log when do the test, after you see
> libvirtd sends command "migrate" to qemu monitor, you could try to query
> domjobinfo immediately.

Thanks for the info.
Verified with:
libvirt-5.0.0-7.module+el8+2887+effa3c42.x86_64

1. In the VM, run:
[root@localhost ~]# dd if=/dev/urandom of=/tmp/1.file bs=1K count=5000000

2. In host console 1, run:
[root@localhost ~]# virsh snapshot-create-as 5d9e1bfb-9342-4530-82b1-f8c2dac70e8d s1 --memspec file=/tmp/snap.1

3. In host console 2, run:
[root@localhost ~]# virsh domjobinfo 5d9e1bfb-9342-4530-82b1-f8c2dac70e8d

4. Watch console 1 until the snapshot has finished:
[root@localhost ~]# virsh snapshot-create-as 5d9e1bfb-9342-4530-82b1-f8c2dac70e8d s1 --memspec file=/tmp/snap.1
Domain snapshot s1 created

5. Check console 2 and make sure there is no error:
[root@localhost ~]# virsh domjobinfo 5d9e1bfb-9342-4530-82b1-f8c2dac70e8d
Job type:         None  
<=== This hangs until the snapshot has finished; tracked by bz565552


[root@localhost ~]# virsh domjobinfo 5d9e1bfb-9342-4530-82b1-f8c2dac70e8d --completed
Job type:         Completed   
Operation:        Snapshot    
Time elapsed:     2298         ms
Data processed:   700.835 MiB
Data remaining:   0.000 B
Data total:       1.126 GiB
Memory processed: 700.835 MiB
Memory remaining: 0.000 B
Memory total:     1.126 GiB
Memory bandwidth: 1.340 GiB/s
Dirty rate:       0            pages/s
Page size:        4096         bytes
Iteration:        3           
Postcopy requests: 0           
Constant pages:   116251      
Normal pages:     178809      
Normal data:      698.473 MiB
Total downtime:   526          ms
Setup time:       6            ms
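
For automated verification, the "Label: value" output shown above can be turned into a dictionary. The following is a minimal sketch that assumes only the layout visible in this report (a label of letters and spaces, a colon, then the value). The function name is hypothetical, and values are kept as strings with their units rather than converted.

```python
import re

# A label made of letters and spaces, a colon, then the value.
# Lines that do not fit (prompts, timestamps, blank lines) are skipped.
FIELD_RE = re.compile(r"^([A-Za-z][A-Za-z ]*?):\s*(.*?)\s*$")


def parse_domjobinfo(text):
    """Parse `virsh domjobinfo` style output into a {label: value} dict."""
    info = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match is None:
            continue
        label, value = match.groups()
        # Collapse the column padding between a number and its unit.
        info[label] = " ".join(value.split())
    return info


if __name__ == "__main__":
    sample = (
        "Job type:         Completed   \n"
        "Operation:        Snapshot    \n"
        "Time elapsed:     2298         ms\n"
    )
    print(parse_domjobinfo(sample))
```

A verification script could then assert, for example, that the "Operation" entry is "Snapshot" and that "Data remaining" is "0.000 B" once the job has completed.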

Comment 9 yisun 2019-04-26 03:48:15 UTC
(In reply to yisun from comment #8)
> [root@localhost ~]# virsh domjobinfo 5d9e1bfb-9342-4530-82b1-f8c2dac70e8d
> Job type:         None  
> <=== This is hang until snapshot finished, tracked by bz565552
<==== typo here, should be bz1565552

Comment 11 errata-xmlrpc 2019-05-29 16:05:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1293

