Bug 1470035

Summary: [qmp] Load internal snapshot failed on Power9
Product: Red Hat Enterprise Linux 7
Reporter: yilzhang
Component: qemu-kvm
Assignee: Laurent Vivier <lvivier>
Status: CLOSED ERRATA
QA Contact: yilzhang
Severity: high
Docs Contact:
Priority: high
Version: 7.4-Alt
CC: areis, bugproxy, fnovak, gsun, haizhao, hannsj_uhl, knoel, lvivier, mrezanin, qzhang, rbalakri, virt-maint, yilzhang
Target Milestone: rc
Keywords: Patch
Target Release: 7.4-Alt
Hardware: ppc64le
OS: Linux
Whiteboard:
Fixed In Version: qemu-kvm-2.9.0-19.el7a
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-11-09 11:31:17 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1440030

Description yilzhang 2017-07-12 10:18:21 UTC
Description of problem:
Loading an internal snapshot via QMP fails on Power9.

Version-Release number of selected component (if applicable):
Host:  kernel: 4.11.0-10.el7a.ppc64le
       qemu-kvm-2.9.0-16.el7a.ppc64le
Guest: 4.11.0-10.el7a.ppc64le


How reproducible: 100%


Steps to Reproduce:
1. Boot the guest with QMP enabled
2. Connect to QMP
# telnet $qmp_server_host_ip 5555
3. Issue the qmp_capabilities command in the telnet client
{"execute":"qmp_capabilities"}

4. Create internal snapshot with command 'savevm'
{"execute":"human-monitor-command","arguments":{"command-line":"savevm sn1"}}

5. Create some files inside the guest with the 'dd' command, then create
another internal snapshot.
[guest]# dd if=/dev/zero of=/home/file102M bs=1M count=102 oflag=sync
{"execute":"human-monitor-command","arguments":{"command-line":"savevm sn2"}}

6. Check the internal snapshot list.
{"execute":"human-monitor-command","arguments":{"command-line":"info snapshots"}} 
{"return": "List of snapshots present on all disks:\r\nID        TAG                 VM SIZE                DATE       VM CLOCK\r\n--        sn1                    313M 2017-07-12 08:43:49   00:02:32.537\r\n--        sn2                    283M 2017-07-12 08:45:13   00:03:28.939\r\n"}

7. Load the snapshot "sn1"
{"execute":"human-monitor-command","arguments":{"command-line":"loadvm sn1"}}
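
The steps above can also be scripted instead of typed into telnet. This is a minimal sketch, assuming the QMP server listens on TCP port 5555 as in step 2 and emits one JSON message per line. The helper names (qmp_cmd, hmp, read_reply, run_repro) are hypothetical, and the guest-side dd from step 5 is not performed by the script.

```python
import json
import socket


def qmp_cmd(name, arguments=None):
    """Build one QMP command as a JSON string (hypothetical helper)."""
    msg = {"execute": name}
    if arguments is not None:
        msg["arguments"] = arguments
    return json.dumps(msg)


def hmp(command_line):
    """Wrap an HMP command in human-monitor-command, as in steps 4 to 7."""
    return qmp_cmd("human-monitor-command", {"command-line": command_line})


def read_reply(reader):
    """Skip asynchronous events; return the first 'return' or 'error' reply."""
    for line in reader:
        if not line.strip():
            continue
        msg = json.loads(line)
        if "return" in msg or "error" in msg:
            return msg
    raise ConnectionError("QMP connection closed before a reply arrived")


def run_repro(host, port=5555):
    """Send the QMP commands from steps 3 to 7 and collect the replies.

    The dd inside the guest (step 5) has to be run separately.
    """
    commands = (
        qmp_cmd("qmp_capabilities"),
        hmp("savevm sn1"),
        hmp("savevm sn2"),
        hmp("info snapshots"),
        hmp("loadvm sn1"),
    )
    with socket.create_connection((host, port)) as sock:
        reader = sock.makefile("r", encoding="utf-8")
        json.loads(reader.readline())  # consume the QMP greeting banner
        replies = []
        for cmd in commands:
            sock.sendall((cmd + "\n").encode("utf-8"))
            replies.append(read_reply(reader))
        return replies
```

Calling run_repro("$qmp_server_host_ip") with the real host address returns the five replies in order; the last one corresponds to the "loadvm sn1" result.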


Actual results:
In step 7, loading the snapshot failed:
{"timestamp": {"seconds": 1499863548, "microseconds": 570991}, "event": "STOP"}
{"return": "htab_load() bad index 2113929216 (9474+0 entries) in htab stream (htab_shift=0)\r\nerror while loading state for instance 0x0 of device 'spapr/htab'\r\nError -22 while loading VM state\r\n"}
{"execute":"query-status"}
{"return": {"status": "restore-vm", "singlestep": false, "running": false}}
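
Note that the failure arrives inside an ordinary "return" string rather than as a QMP "error" object, so a test script has to inspect the text. Here is a rough sketch, assuming the message format quoted above. The function names are illustrative, the field names n_valid and n_invalid are a guess at what the two counts mean, and the rule "any output from loadvm means failure" is an assumption.

```python
import re

# Pattern modelled on the single htab_load() message quoted in this report.
# The group names n_valid / n_invalid are an assumption about the two counts.
HTAB_ERR = re.compile(
    r"htab_load\(\) bad index (?P<index>\d+) "
    r"\((?P<n_valid>\d+)\+(?P<n_invalid>\d+) entries\) "
    r"in htab stream \(htab_shift=(?P<htab_shift>\d+)\)"
)


def loadvm_failed(reply):
    """Return True if a human-monitor-command reply to loadvm looks failed.

    Assumption: loadvm prints nothing on success, so any text is an error.
    """
    if "error" in reply:
        return True
    text = reply.get("return", "")
    return isinstance(text, str) and bool(text.strip())


def parse_htab_error(text):
    """Extract the numbers from an htab_load() error line, or return None."""
    match = HTAB_ERR.search(text)
    if match is None:
        return None
    return {key: int(value) for key, value in match.groupdict().items()}
```

Applied to the reply above, parse_htab_error picks out htab_shift=0, which is the detail the later comments focus on.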


Expected results:
In step 7, loading the snapshot should succeed, and the VM should be in "running" status.

Additional info:
Power8 doesn't have this issue

Comment 2 Laurent Vivier 2017-07-12 11:35:32 UTC
According to the error message:

{"return": "htab_load() bad index 2113929216 (9474+0 entries) in htab stream (htab_shift=0)\r\nerror while loading state for instance 0x0 of device 'spapr/htab'\r\nError -22 while loading VM state\r\n"}

This is a duplicate of BZ1456287. Please re-test with qemu-kvm-2.9.0-17.el7a.

Comment 3 yilzhang 2017-07-17 07:45:28 UTC
After installing qemu-kvm-2.9.0-17.el7a on the Power9 host, the test results are as follows:

1. BZ 1456287 cannot be reproduced.
2. This bug still exists, but the actual result is now slightly different:
   step 4 (create internal snapshot with command 'savevm') hangs.

Comment 4 Laurent Vivier 2017-07-17 19:59:03 UTC
(In reply to yilzhang from comment #3)
> After applying qemu-kvm-2.9.0-17.el7a on Power9 host, the testing result is
> as follows:
> 
> 1. BZ 1456287 cannot be reproduced
> 2. But this bug still exists, and its actual result is a little different
> now:
>    step4 ( Create internal snapshot with command 'savevm' ) hangs there


I think there is a bug in the bugfix patch.

When htab_shift is 0, htab_save_iterate() and htab_save_complete() should mark the end of the sequence by returning a positive value, not 0.

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 970093e..fa01511 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1827,7 +1827,7 @@ static int htab_save_iterate(QEMUFile *f, void *opaque)
     /* Iteration header */
     if (!spapr->htab_shift) {
         qemu_put_be32(f, -1);
-        return 0;
+        return 1;
     } else {
         qemu_put_be32(f, 0);
     }
@@ -1866,7 +1866,7 @@ static int htab_save_complete(QEMUFile *f, void *opaque)
     /* Iteration header */
     if (!spapr->htab_shift) {
         qemu_put_be32(f, -1);
-        return 0;
+        return 1;
     } else {
         qemu_put_be32(f, 0);
     }
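
To illustrate why returning 0 can make savevm hang, here is a toy model. It is not QEMU code. It assumes the savevm caller keeps invoking the iterate handler until the handler returns a positive value, which is how the one-line description above reads. All names (iterate_before_patch, iterate_after_patch, savevm_loop, max_rounds) are made up for the illustration, and the round cap exists only so the example terminates.

```python
def iterate_before_patch():
    """Models the htab_shift == 0 branch before the patch: returns 0."""
    return 0


def iterate_after_patch():
    """Models the htab_shift == 0 branch after the patch: returns 1."""
    return 1


def savevm_loop(iterate, max_rounds=1000):
    """Call iterate() until it returns a positive value.

    Returns the number of rounds needed, or None if the handler never
    reported completion within max_rounds (standing in for the hang).
    """
    for rounds in range(1, max_rounds + 1):
        if iterate() > 0:
            return rounds
    return None
```

In this model, savevm_loop(iterate_before_patch) never sees completion and gives up at the cap, while savevm_loop(iterate_after_patch) finishes on the first round.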

Comment 6 yilzhang 2017-07-18 03:42:55 UTC
Hi Laurent,

With your latest build, this test case passes without error.
[host]# rpm -qa | grep qemu-kvm
qemu-kvm-tools-rhev-2.9.0-17.el7a.lvivier201707172158.ppc64le
qemu-kvm-common-rhev-2.9.0-17.el7a.lvivier201707172158.ppc64le
qemu-kvm-rhev-2.9.0-17.el7a.lvivier201707172158.ppc64le
qemu-kvm-rhev-debuginfo-2.9.0-17.el7a.lvivier201707172158.ppc64le

Comment 7 Laurent Vivier 2017-07-18 08:22:38 UTC
Patch sent upstream:

https://lists.nongnu.org/archive/html/qemu-devel/2017-07/msg05494.html

Comment 8 IBM Bug Proxy 2017-08-01 12:30:30 UTC
------- Comment From sthoufee.com 2017-08-01 08:21 EDT-------
Patch accepted upstream:
http://git.qemu.org/?p=qemu.git;a=commit;h=e8cd4247e96bb2158ef0ae0ff20e72746b9dd32d

Comment 9 Miroslav Rezanina 2017-08-01 13:34:16 UTC
Fix included in qemu-kvm-2.9.0-19.el7a

Comment 11 Laurent Vivier 2017-08-04 09:06:45 UTC
*** Bug 1473121 has been marked as a duplicate of this bug. ***

Comment 12 yilzhang 2017-08-08 08:27:13 UTC
This bug has been verified against the following component versions:

host:  kernel-4.11.0-22.el7a.ppc64le
qemu-kvm-2.9.0-19.el7a.ppc64le
SLOF-20170303-4.git66d250e.el7.noarch
guest kernel: 4.11.0-16.el7a.ppc64le


Actual results: loading the snapshot succeeds without error.
So, this bug is fixed.

Comment 14 errata-xmlrpc 2017-11-09 11:31:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3169