Bug 1470035 - [qmp] Load internal snapshot failed on Power9
Status: VERIFIED
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm
Version: 7.4-Alt
Hardware: ppc64le Linux
Priority: high  Severity: high
Target Milestone: rc
Target Release: 7.4-Alt
Assigned To: Laurent Vivier
QA Contact: yilzhang
Keywords: Patch
Duplicates: 1473121
Depends On:
Blocks: 1440030
Reported: 2017-07-12 06:18 EDT by yilzhang
Modified: 2017-08-14 01:37 EDT

See Also:
Fixed In Version: qemu-kvm-2.9.0-19.el7a
Type: Bug

External Trackers
Tracker                       ID      Priority  Status  Summary  Last Updated
IBM Linux Technology Center   156874  None      None    None     2017-07-20 07:40 EDT

Description yilzhang 2017-07-12 06:18:21 EDT
Description of problem:
Loading an internal snapshot via QMP fails on Power9.

Version-Release number of selected component (if applicable):
Host:  kernel: 4.11.0-10.el7a.ppc64le
       qemu-kvm-2.9.0-16.el7a.ppc64le
Guest: 4.11.0-10.el7a.ppc64le


How reproducible: 100%


Steps to Reproduce:
1. Boot the guest with QMP enabled.
2. Connect to QMP:
# telnet $qmp_server_host_ip 5555
3. Issue the qmp_capabilities command in the telnet client:
{"execute":"qmp_capabilities"}

4. Create an internal snapshot with the 'savevm' command:
{"execute":"human-monitor-command","arguments":{"command-line":"savevm sn1"}}

5. Create some files inside the guest with the 'dd' command, then create
another internal snapshot:
[guest]# dd if=/dev/zero of=/home/file102M bs=1M count=102 oflag=sync
{"execute":"human-monitor-command","arguments":{"command-line":"savevm sn2"}}

6. Check the internal snapshot list:
{"execute":"human-monitor-command","arguments":{"command-line":"info snapshots"}}
{"return": "List of snapshots present on all disks:\r\nID        TAG                 VM SIZE                DATE       VM CLOCK\r\n--        sn1                    313M 2017-07-12 08:43:49   00:02:32.537\r\n--        sn2                    283M 2017-07-12 08:45:13   00:03:28.939\r\n"}

7. Load the snapshot "sn1":
{"execute":"human-monitor-command","arguments":{"command-line":"loadvm sn1"}}
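The steps above can also be scripted instead of typed into telnet. The following is a minimal Python sketch, not part of the original report: `hmp`, `QmpClient` and `reproduce` are hypothetical helper names, and the code assumes the QMP server sends each JSON message on a single line (the usual non-pretty monitor output). The host address and port are whatever the guest was started with.

```python
import json
import socket


def hmp(command_line):
    """Wrap an HMP command line in a QMP human-monitor-command request."""
    return {
        "execute": "human-monitor-command",
        "arguments": {"command-line": command_line},
    }


class QmpClient:
    """Minimal line-oriented QMP client (hypothetical helper, not shipped with QEMU)."""

    def __init__(self, host, port):
        self.sock = socket.create_connection((host, port))
        self.reader = self.sock.makefile("r", encoding="utf-8")
        self._read_message()                           # server greeting
        self.execute({"execute": "qmp_capabilities"})  # step 3

    def _read_message(self):
        # Assumption: one JSON message per line.
        while True:
            line = self.reader.readline()
            if line == "":
                raise ConnectionError("QMP connection closed")
            if line.strip():
                return json.loads(line)

    def execute(self, request):
        self.sock.sendall(json.dumps(request).encode("utf-8") + b"\r\n")
        while True:
            message = self._read_message()
            if "event" not in message:  # skip asynchronous events such as STOP
                return message


def reproduce(client):
    """Run steps 4 to 7; the 'dd' of step 5 still has to be run in the guest."""
    client.execute(hmp("savevm sn1"))
    input("Run the dd command inside the guest, then press Enter... ")
    client.execute(hmp("savevm sn2"))
    print(client.execute(hmp("info snapshots"))["return"])
    print(client.execute(hmp("loadvm sn1"))["return"])
    return client.execute({"execute": "query-status"})["return"]["status"]
```

Calling `reproduce(QmpClient(host, 5555))` with the QMP server address should return "running" on a fixed build. The sketch does no error handling beyond a closed connection, so a QMP-level "error" reply would raise a KeyError.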


Actual results:
In step 7, loading the snapshot failed:
{"timestamp": {"seconds": 1499863548, "microseconds": 570991}, "event": "STOP"}
{"return": "htab_load() bad index 2113929216 (9474+0 entries) in htab stream (htab_shift=0)\r\nerror while loading state for instance 0x0 of device 'spapr/htab'\r\nError -22 while loading VM state\r\n"}
{"execute":"query-status"}
{"return": {"status": "restore-vm", "singlestep": false, "running": false}}
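Note that the failed 'loadvm' still comes back as a QMP "return", with the HMP error text as its value, so a test script has to inspect the string. The following Python sketch shows one way to do that. It is an illustration only: `loadvm_error` and `parse_htab_error` are hypothetical helpers, the field names `n_valid` and `n_invalid` are my labels for the two counts in the message, and the code assumes that a successful 'loadvm' produces empty HMP output.

```python
import re

# Pattern for the htab_load() message quoted in this report.
HTAB_ERROR = re.compile(
    r"htab_load\(\) bad index (?P<index>\d+) "
    r"\((?P<n_valid>\d+)\+(?P<n_invalid>\d+) entries\) "
    r"in htab stream \(htab_shift=(?P<htab_shift>\d+)\)"
)


def loadvm_error(reply):
    """Return the HMP error text if 'loadvm' printed anything, else None.

    Assumption: a successful 'loadvm' produces no HMP output.
    """
    text = reply.get("return", "")
    return text.strip() or None


def parse_htab_error(text):
    """Extract the numeric fields of an htab_load() error, or None if absent."""
    match = HTAB_ERROR.search(text)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}
```

Applied to the reply above, `parse_htab_error` yields index 2113929216 (0x7e000000), counts 9474 and 0, and htab_shift 0.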


Expected results:
In step 7, loading the snapshot should succeed, and the VM should be in "running" status.

Additional info:
Power8 doesn't have this issue
Comment 2 Laurent Vivier 2017-07-12 07:35:32 EDT
According to the error message:

{"return": "htab_load() bad index 2113929216 (9474+0 entries) in htab stream (htab_shift=0)\r\nerror while loading state for instance 0x0 of device 'spapr/htab'\r\nError -22 while loading VM state\r\n"}

This is a duplicate of BZ1456287; please re-test with qemu-kvm-2.9.0-17.el7a.
Comment 3 yilzhang 2017-07-17 03:45:28 EDT
After installing qemu-kvm-2.9.0-17.el7a on the Power9 host, the test results are as follows:

1. BZ 1456287 can no longer be reproduced.
2. This bug still exists, but its actual result is now slightly different:
   step 4 (create internal snapshot with command 'savevm') hangs.
Comment 4 Laurent Vivier 2017-07-17 15:59:03 EDT
(In reply to yilzhang from comment #3)
> After applying qemu-kvm-2.9.0-17.el7a on Power9 host, the testing result is
> as follows:
> 
> 1. BZ 1456287 cannot be reproduced
> 2. But this bug still exists, and its actual result is a little different
> now:
>    step4 ( Create internal snapshot with command 'savevm' ) hangs there


I think there is a bug in the bugfix patch.

The end of the sequence should be signalled by returning a positive value, not 0.

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 970093e..fa01511 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1827,7 +1827,7 @@ static int htab_save_iterate(QEMUFile *f, void *opaque)
     /* Iteration header */
     if (!spapr->htab_shift) {
         qemu_put_be32(f, -1);
-        return 0;
+        return 1;
     } else {
         qemu_put_be32(f, 0);
     }
@@ -1866,7 +1866,7 @@ static int htab_save_complete(QEMUFile *f, void *opaque)
     /* Iteration header */
     if (!spapr->htab_shift) {
         qemu_put_be32(f, -1);
-        return 0;
+        return 1;
     } else {
         qemu_put_be32(f, 0);
     }
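Why returning 0 leads to a hang can be illustrated with a toy model. The Python sketch below is not QEMU code: it only assumes, in line with the remark above, that the caller keeps invoking the iterate handler until it returns a positive value and treats a negative value as an error. All function names are hypothetical.

```python
def run_save_loop(iterate, max_rounds=1000):
    """Call iterate() until it reports completion.

    Returns the number of rounds needed, or None if the handler never
    reported completion within max_rounds (the model's stand-in for a hang).
    """
    for rounds in range(1, max_rounds + 1):
        ret = iterate()
        if ret < 0:
            raise RuntimeError("save handler failed with %d" % ret)
        if ret > 0:
            return rounds  # positive value: this section is finished
        # ret == 0: "not finished yet", so the caller asks again
    return None


def htab_iterate_before_fix():
    # Models the htab_shift == 0 branch before the patch: marker written, 0 returned.
    return 0


def htab_iterate_after_fix():
    # Models the same branch after the patch: marker written, 1 returned.
    return 1
```

In this model `run_save_loop(htab_iterate_before_fix)` never completes, which is consistent with the 'savevm' hang reported in comment 3, while `run_save_loop(htab_iterate_after_fix)` finishes after one round.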
Comment 6 yilzhang 2017-07-17 23:42:55 EDT
Hi Laurent,

Using your latest build, this case passed without error.
[host]#  rpm -qa| grep qemu-kvm
qemu-kvm-tools-rhev-2.9.0-17.el7a.lvivier201707172158.ppc64le
qemu-kvm-common-rhev-2.9.0-17.el7a.lvivier201707172158.ppc64le
qemu-kvm-rhev-2.9.0-17.el7a.lvivier201707172158.ppc64le
qemu-kvm-rhev-debuginfo-2.9.0-17.el7a.lvivier201707172158.ppc64le
Comment 7 Laurent Vivier 2017-07-18 04:22:38 EDT
Patch sent upstream:

https://lists.nongnu.org/archive/html/qemu-devel/2017-07/msg05494.html
Comment 8 IBM Bug Proxy 2017-08-01 08:30:30 EDT
------- Comment From sthoufee@in.ibm.com 2017-08-01 08:21 EDT-------
patch accepted upstream
http://git.qemu.org/?p=qemu.git;a=commit;h=e8cd4247e96bb2158ef0ae0ff20e72746b9dd32d
Comment 9 Miroslav Rezanina 2017-08-01 09:34:16 EDT
Fix included in qemu-kvm-2.9.0-19.el7a
Comment 11 Laurent Vivier 2017-08-04 05:06:45 EDT
*** Bug 1473121 has been marked as a duplicate of this bug. ***
Comment 12 yilzhang 2017-08-08 04:27:13 EDT
This bug has been verified against the following component versions:

host:  kernel-4.11.0-22.el7a.ppc64le
qemu-kvm-2.9.0-19.el7a.ppc64le
SLOF-20170303-4.git66d250e.el7.noarch
guest kernel: 4.11.0-16.el7a.ppc64le


Actual results: loading the snapshot succeeds without error.
So, this bug is fixed.
