Bug 1373600

Summary: virtio-balloon stats virtqueue does not migrate properly
Product: Red Hat Enterprise Linux 7
Reporter: Stefan Hajnoczi <stefanha>
Component: qemu-kvm-rhev
Assignee: Ladi Prosek <lprosek>
Status: CLOSED ERRATA
QA Contact: Yumei Huang <yuhuang>
Severity: high
Docs Contact:
Priority: high
Version: 7.3
CC: ailan, chayang, huding, jherrman, juzhang, mrezanin, mtessun, qizhu, qzhang, rcadova, snagar, virt-maint
Target Milestone: rc
Keywords: Regression, ZStream
Target Release: 7.4
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.8.0-1
Doc Type: Bug Fix
Doc Text:
Prior to this update, migrated guest virtual machines in some cases entered an inconsistent state and terminated unexpectedly after the migration finished due to incorrect handling of the virtqueue. With this update, virtqueue handling on migration is fixed, and no longer causes problems after guest migration.
Story Points: ---
Clone Of:
: 1402509
Environment:
Last Closed: 2017-08-01 23:34:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1395265, 1401400, 1402509

Description Stefan Hajnoczi 2016-09-06 17:04:54 UTC
Several issues related to the virtio-balloon stats virtqueue have been or are being addressed upstream.

One was already fixed by Ladi Prosek in March:
4eae2a657d1ff5ada56eb9b4966eae0eff333b0b "balloon: fix segfault and harden the stats queue"

He is also currently working on fixing a stats virtqueue hang after migration.

These fixes all need to be backported to RHEL 7.3 and possibly 7.2.z.

Comment 1 Stefan Hajnoczi 2016-09-06 17:06:41 UTC
As discussed with Ladi and Michael Tsirkin on IRC, assigning to Ladi.

Comment 3 Ladi Prosek 2016-09-13 13:07:43 UTC
The fixes have been merged upstream. Adding Romana to see if it's worth trying to get them into 7.3 proper or if we should defer to 7.3.z.

This is a medium impact issue, not a regression. When migrating a VM that has the virtio-balloon QEMU device with stats collection enabled, the stats queue stops working at the destination (i.e. stats collection stops after migration). It was independently found by several members of the community, but we're not aware of any customers hitting this so far.

Upstream commits to cherry-pick:
297a75e virtio: add virtqueue_rewind()
4a1e48b virtio-balloon: fix stats vq migration

The commit mentioned in the description:
4eae2a6 balloon: fix segfault and harden the stats queue
doesn't have to be ported. It is already included in the RHEV-7.3 tree, and it is not needed in RHEL-7.3 because that tree doesn't contain the problematic commit that introduced the segfault.
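
As a rough illustration of why a rewind operation helps here, the following is a toy Python model, not QEMU code. It assumes the hang comes from the device holding one stats buffer that has been popped but not yet returned to the guest when migration happens, so the destination sees the buffer as taken but has nothing to complete. The class, the helper names and the migrate() step are hypothetical simplifications made up for this sketch.

```python
class ToyVirtQueue:
    """Toy stand-in for a virtqueue (hypothetical, not QEMU's VirtQueue)."""

    def __init__(self):
        self.avail = []          # buffers the guest driver has made available
        self.last_avail_idx = 0  # index of the next buffer the device will pop
        self.inuse = 0           # buffers popped by the device, not yet returned

    def guest_add(self, buf):
        """Guest driver makes a buffer available to the device."""
        self.avail.append(buf)

    def pop(self):
        """Device takes the next available buffer, or None if there is none."""
        if self.last_avail_idx == len(self.avail):
            return None
        buf = self.avail[self.last_avail_idx]
        self.last_avail_idx += 1
        self.inuse += 1
        return buf

    def rewind(self, num):
        """Undo the last `num` pops so those buffers can be popped again."""
        if num > self.inuse:
            return False
        self.last_avail_idx -= num
        self.inuse -= num
        return True


def migrate(src):
    """Copy only the queue indices; the buffer the device was holding is lost."""
    dst = ToyVirtQueue()
    dst.avail = list(src.avail)
    dst.last_avail_idx = src.last_avail_idx
    dst.inuse = src.inuse
    return dst


src = ToyVirtQueue()
src.guest_add("stats-buffer")
held = src.pop()         # source device holds the buffer between polls

dst = migrate(src)
stuck = dst.pop()        # None: the destination sees the buffer as already taken
rewound = dst.rewind(1)  # True: step back over the in-flight buffer
recovered = dst.pop()    # the destination can now pick the buffer up again
```

In this model the guest never adds a second buffer because it is still waiting for the first one to come back, which is why the destination stays stuck until it rewinds.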

Comment 11 Yumei Huang 2017-03-13 08:37:54 UTC
Verified with the same steps as https://bugzilla.redhat.com/show_bug.cgi?id=1402509#c7. After migration, stats collection works correctly on the destination host.

Details:
qemu-kvm-rhev-2.8.0-5.el7
kernel-3.10.0-558.el7.x86_64

QEMU cmdline:
# /usr/libexec/qemu-kvm -m 4G rhel73-64-virtio.qcow2  -netdev tap,id=hostnet1 -device virtio-net-pci,mac=42:ce:a9:d2:4d:d9,id=idlbq7eA,netdev=hostnet1 -vnc :2  -monitor stdio  -no-user-config -nodefaults  -usb -device usb-tablet,id=input0 -vga qxl    -qmp tcp:0:4444,server,nowait -device virtio-balloon-pci,id=balloon0,guest-stats-polling-interval=2

After migration:
{ "execute":"qom-get", "arguments":{"path":'/machine/peripheral/balloon0', "property": "guest-stats" } }
{"return": {"stats": {"stat-swap-out": 0, "stat-available-memory": 3261628416, "stat-free-memory": 3145887744, "stat-minor-faults": 1082832, "stat-major-faults": 1213, "stat-total-memory": 3975217152, "stat-swap-in": 0}, "last-update": 1489393955}}
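
The check above can also be scripted. The following is a minimal sketch, assuming the QMP socket from the command line above (tcp port 4444) and the balloon0 device id. The helper names are made up for this example, and query_guest_stats() is only defined here, not called, since it needs a running QEMU. The idea is to sample last-update twice, a little more than the 2-second polling interval apart, and treat an advancing value as a sign that the stats queue is still alive after migration.

```python
import json
import socket


def qom_get_guest_stats_cmd(path="/machine/peripheral/balloon0"):
    """Build the qom-get command shown above as a single JSON line."""
    return json.dumps({
        "execute": "qom-get",
        "arguments": {"path": path, "property": "guest-stats"},
    })


def parse_guest_stats(reply_line):
    """Return (last_update, stats) from a successful qom-get reply."""
    ret = json.loads(reply_line)["return"]
    return ret["last-update"], ret["stats"]


def stats_are_updating(first_update, second_update):
    """Stats collection is alive if last-update advanced between two samples."""
    return second_update > first_update


def _read_reply(f):
    """Read lines until a command reply arrives, skipping asynchronous events."""
    while True:
        line = f.readline()
        if not line:
            raise ConnectionError("QMP connection closed")
        msg = json.loads(line)
        if "return" in msg or "error" in msg:
            return line


def query_guest_stats(host="localhost", port=4444):
    """Query a live QEMU over QMP (assumes -qmp tcp:0:4444,server,nowait)."""
    with socket.create_connection((host, port)) as sock:
        f = sock.makefile("rw")
        f.readline()  # QMP greeting banner
        f.write(json.dumps({"execute": "qmp_capabilities"}) + "\n")
        f.flush()
        _read_reply(f)  # reply to capabilities negotiation
        f.write(qom_get_guest_stats_cmd() + "\n")
        f.flush()
        return parse_guest_stats(_read_reply(f))


# Offline example using the reply captured above.
reply = ('{"return": {"stats": {"stat-swap-out": 0, '
         '"stat-available-memory": 3261628416, "stat-free-memory": 3145887744, '
         '"stat-minor-faults": 1082832, "stat-major-faults": 1213, '
         '"stat-total-memory": 3975217152, "stat-swap-in": 0}, '
         '"last-update": 1489393955}}')
last_update, stats = parse_guest_stats(reply)
```

Against a live guest, calling query_guest_stats() twice with a short sleep in between and passing the two last-update values to stats_are_updating() would give a pass/fail result for the post-migration check.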

Comment 12 Yumei Huang 2017-03-13 08:47:38 UTC
Moving to verified per comment 11.

Comment 14 errata-xmlrpc 2017-08-01 23:34:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392
