Bug 1380258 - ppc64le: > 1024GiB of guest RAM will conflict with IO
Summary: ppc64le: > 1024GiB of guest RAM will conflict with IO
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev   
Version: 7.3
Hardware: ppc64le
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 7.4
Assignee: David Gibson
QA Contact: Qunfang Zhang
URL:
Whiteboard:
Keywords:
Duplicates: 1380618
Depends On:
Blocks: 1299988 1401400 1446211 RHV4.1PPC 1364562 1452455
 
Reported: 2016-09-29 06:36 UTC by Qunfang Zhang
Modified: 2017-11-28 03:42 UTC (History)
10 users

Fixed In Version: qemu-2.8
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-01 23:37:14 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Screenshot for guest (20.15 KB, image/png)
2016-09-29 06:37 UTC, Qunfang Zhang
guest xml (2.86 KB, text/plain)
2016-10-04 05:30 UTC, IBM Bug Proxy
guest console logs (2.79 KB, text/plain)
2016-10-04 05:30 UTC, IBM Bug Proxy


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:2392 normal SHIPPED_LIVE Important: qemu-kvm-rhev security, bug fix, and enhancement update 2017-08-01 20:04:36 UTC
IBM Linux Technology Center 147106 None None None 2018-12-11 11:11 UTC

Description Qunfang Zhang 2016-09-29 06:36:04 UTC
Description of problem:
Booting a guest with 2048G maxmem is very slow. In my experiment, it hadn't booted up after 40 mins. With 1024G maxmem, the guest works fine. We were aware of this before and discussed it in a related bug (bug 1263039 comment). Creating this bz to track the issue.

Version-Release number of selected component (if applicable):
kernel-3.10.0-510.el7.ppc64le
qemu-kvm-rhev-2.6.0-27.el7.ppc64le
SLOF-20160223-6.gitdbbfda4.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. Boot up a guest with 2048G maxmem, e.g.:

#  /usr/libexec/qemu-kvm -name test -machine pseries,accel=kvm,usb=off -m 32G,slots=4,maxmem=2048G -smp 4,sockets=1,cores=4,threads=1 -uuid 8aeab7e2-f341-4f8c-80e8-59e2968d85c2 -realtime mlock=off -nodefaults -monitor stdio -rtc base=utc -device spapr-vscsi,id=scsi0,reg=0x1000 -drive file=rhel72-ppc64le-virtio.qcow2,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0-0,bootindex=1,id=scsi0-0-0-0  -drive if=none,id=drive-scsi0-0-1-0,readonly=on -device scsi-cd,bus=scsi0.0,drive=drive-scsi0-0-1-0,bootindex=2,id=scsi0-0-1-0 -vnc :10 -msg timestamp=on -usb -device usb-tablet,id=tablet1  -vga std -qmp tcp:0:4666,server,nowait -netdev tap,id=hostnet1,script=/etc/qemu-ifup,vhost=on -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:54:5a:5f:5b:5c


Actual results:
Guest could not boot up after 40 mins... still waiting for it; not sure when it will boot up.

Expected results:
Guest should boot up within a few seconds.

Additional info:

Host info:
# free -m 
              total        used        free      shared  buff/cache   available
Mem:         519147       36950      480287          35        1908      480527
Swap:          4095           0        4095


# cat /proc/cpuinfo 

processor	: 0
cpu		: POWER8NVL (raw), altivec supported
clock		: 4023.000000MHz
revision	: 1.0 (pvr 004c 0100)

processor	: 8
cpu		: POWER8NVL (raw), altivec supported
clock		: 4023.000000MHz
revision	: 1.0 (pvr 004c 0100)

processor	: 16
cpu		: POWER8NVL (raw), altivec supported
clock		: 4023.000000MHz
revision	: 1.0 (pvr 004c 0100)

processor	: 24
cpu		: POWER8NVL (raw), altivec supported
clock		: 4023.000000MHz
revision	: 1.0 (pvr 004c 0100)

processor	: 32
cpu		: POWER8NVL (raw), altivec supported
clock		: 4023.000000MHz
revision	: 1.0 (pvr 004c 0100)

processor	: 40
cpu		: POWER8NVL (raw), altivec supported
clock		: 4023.000000MHz
revision	: 1.0 (pvr 004c 0100)

processor	: 48
cpu		: POWER8NVL (raw), altivec supported
clock		: 4023.000000MHz
revision	: 1.0 (pvr 004c 0100)

processor	: 56
cpu		: POWER8NVL (raw), altivec supported
clock		: 4023.000000MHz
revision	: 1.0 (pvr 004c 0100)

processor	: 64
cpu		: POWER8NVL (raw), altivec supported
clock		: 4023.000000MHz
revision	: 1.0 (pvr 004c 0100)

processor	: 72
cpu		: POWER8NVL (raw), altivec supported
clock		: 4023.000000MHz
revision	: 1.0 (pvr 004c 0100)

processor	: 80
cpu		: POWER8NVL (raw), altivec supported
clock		: 4023.000000MHz
revision	: 1.0 (pvr 004c 0100)

processor	: 88
cpu		: POWER8NVL (raw), altivec supported
clock		: 4023.000000MHz
revision	: 1.0 (pvr 004c 0100)

processor	: 96
cpu		: POWER8NVL (raw), altivec supported
clock		: 4023.000000MHz
revision	: 1.0 (pvr 004c 0100)

processor	: 104
cpu		: POWER8NVL (raw), altivec supported
clock		: 4023.000000MHz
revision	: 1.0 (pvr 004c 0100)

processor	: 112
cpu		: POWER8NVL (raw), altivec supported
clock		: 4023.000000MHz
revision	: 1.0 (pvr 004c 0100)

processor	: 120
cpu		: POWER8NVL (raw), altivec supported
clock		: 4023.000000MHz
revision	: 1.0 (pvr 004c 0100)

timebase	: 512000000
platform	: PowerNV
model		: 8335-GTB        
machine		: PowerNV 8335-GTB        
firmware	: OPAL v3

Comment 1 Qunfang Zhang 2016-09-29 06:37 UTC
Created attachment 1205815 [details]
Screenshot for guest

Comment 3 Qunfang Zhang 2016-09-30 02:47:55 UTC
Still hadn't booted up after nearly 1 day.

Comment 4 David Gibson 2016-09-30 03:16:01 UTC
Ouch.  That's much worse than I thought.

I'm investigating.

Comment 5 Qunfang Zhang 2016-09-30 05:20:10 UTC
Even 1024G plus a bit more maxmem reproduces it.

Comment 6 David Gibson 2016-09-30 05:26:44 UTC
Unfortunately, I haven't been able to reproduce this.  I'm unable to run a guest with maxmem > 256G on our machine, because it can't allocate enough contiguous memory for the guest's hash page table, so the guest doesn't start at all.

What are the symptoms when the system doesn't start?  Is there any output on the console at all?

For simplicity, can you also please try with no VGA or USB devices, just the spapr-vty console?

Comment 7 David Gibson 2016-09-30 05:48:28 UTC
Ok, after IRC discussion, I see that this is not the bug I thought it was.

There's a known problem with slow startup with large maxmem values, but that occurs before even the firmware executes.

This bug is a hang during actual kernel boot up, and only seems to occur with large enough maxmem *and* VGA+USB present.

Comment 8 Qunfang Zhang 2016-09-30 06:03:51 UTC
Yes, with VGA removed, the guest boots up successfully within a few seconds with 2048G maxmem:

 /usr/libexec/qemu-kvm -name test -machine pseries,accel=kvm,usb=off -m 32G,slots=4,maxmem=2048G -smp 4,sockets=1,cores=4,threads=1 -uuid 8aeab7e2-f341-4f8c-80e8-59e2968d85c2 -realtime mlock=off -nodefaults -serial stdio -rtc base=utc -device spapr-vscsi,id=scsi0,reg=0x1000 -drive file=rhel72-ppc64le-virtio.qcow2,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0-0,bootindex=1,id=scsi0-0-0-0  -drive if=none,id=drive-scsi0-0-1-0,readonly=on -device scsi-cd,bus=scsi0.0,drive=drive-scsi0-0-1-0,bootindex=2,id=scsi0-0-1-0  -netdev tap,id=hostnet1,script=/etc/qemu-ifup,vhost=on -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:54:5a:5f:5b:5c -usb -device usb-tablet,id=tablet1

Comment 9 David Gibson 2016-09-30 06:41:28 UTC
Some further observations:
  * Trips with just VGA, but not USB (in this case the guest doesn't use the vga as console, but we still see the hang on the vty console)
  * Doesn't trip with i6300esb (as a different emulated PCI device)

My best guess at this point would be somehow getting an overlap between the memory and some IO region, but I don't really know how.

Comment 10 Qunfang Zhang 2016-09-30 07:24:26 UTC
Re-tested with the following configurations; the results were no different:

(1) RHEL7.3 guest and RHEL7.2-z
(2) std vga and virtio-vga

All can reproduce this bug.

Comment 11 Laurent Vivier 2016-10-03 11:52:50 UTC
An "info mtree" from the qemu monitor could help.

It seems the base address of the PCI I/O window is at 0x0000010080000000, which is 1024 GiB + 2 GiB.

So I think memory is overlapping the PCI I/O address space.

(qemu) info mtree
address-space: memory
  0000000000000000-ffffffffffffffff (prio 0, RW): system
    0000000000000000-000000003fffffff (prio 0, RW): ppc_spapr.ram
    0000000040000000-0000000fffffffff (prio 0, RW): hotplug-memory
    0000010080000000-000001008000ffff (prio 0, RW): alias pci@800000020000000.io-alias @pci@800000020000000.io 0000000000000000-000000000000ffff
    00000100a0000000-000001101fffffff (prio 0, RW): alias pci@800000020000000.mmio-alias @pci@800000020000000.mmio 0000000080000000-0000000fffffffff

from include/hw/pci-host/spapr.h:

#define SPAPR_PCI_WINDOW_BASE        0x10000000000ULL
#define SPAPR_PCI_MEM_WIN_BUS_OFFSET 0x00080000000ULL
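The arithmetic behind this overlap can be sketched as follows. A minimal check, using the addresses from the mtree dump and the spapr.h constants above; the helper function and its name are illustrative, not actual QEMU code:

```python
# Addresses taken from the mtree dump and spapr.h constants in this
# comment; the helper function is illustrative, not QEMU code.
SPAPR_PCI_WINDOW_BASE = 0x10000000000         # 1 TiB
SPAPR_PCI_MEM_WIN_BUS_OFFSET = 0x00080000000  # 2 GiB
GIB = 1 << 30

# The io-alias in the mtree sits at window base + bus offset.
pci_io_base = SPAPR_PCI_WINDOW_BASE + SPAPR_PCI_MEM_WIN_BUS_OFFSET

def overlaps_pci(maxmem_gib):
    """True if RAM/hotplug space (0 .. maxmem) reaches the PCI windows."""
    return maxmem_gib * GIB > SPAPR_PCI_WINDOW_BASE

print(hex(pci_io_base))    # 0x10080000000, matching the io-alias above
print(overlaps_pci(1024))  # False: exactly 1 TiB still fits
print(overlaps_pci(2048))  # True: the reported maxmem collides
```

This is consistent with comment 5: exactly 1024G is fine, but "1024G plus a bit more" pushes guest memory into the PHB windows.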

Comment 12 David Gibson 2016-10-04 01:45:13 UTC
Ouch.  I suspected some kind of RAM / IO collision, but that's rather less subtle than I expected.  I guess 1TiB of RAM seemed like an enormous amount when I first wrote that constant.

Ok, so two things we need to do:
  1) In the short term, have qemu error out gracefully if >1TiB of RAM is requested.

  2) Longer term we need to change the default placement of the PHBs.  We wanted to change the spacing anyway to allow for more big-IO cards in each PCI domain (particularly for the nVidia cards which have enormous MMIO BARs).  We can fold these two changes together.

So, working out how to do the placement change without breaking compatibility or migration just took a big leap up my priority list.
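The short-term guard in (1) might look something like the sketch below. The limit constant and function name are assumptions for illustration; the actual fix is a C patch inside QEMU, not this code:

```python
# Hypothetical sketch of the short-term guard described in (1) above;
# the constant and names are assumptions, not the actual QEMU patch.
RAM_LIMIT = 1 << 40  # 1 TiB: where the default PHB windows begin

def check_maxmem(maxmem_bytes):
    """Refuse configurations whose RAM would reach the PCI windows."""
    if maxmem_bytes > RAM_LIMIT:
        raise ValueError(
            f"maxmem {maxmem_bytes >> 30}G exceeds 1024G and would "
            "collide with the default PCI host bridge windows")

check_maxmem(1024 << 30)  # OK: exactly 1 TiB is still accepted
```

The point is simply to fail at startup with a clear error instead of letting the guest hang during boot.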

Comment 13 David Gibson 2016-10-04 01:49:24 UTC
*** Bug 1380618 has been marked as a duplicate of this bug. ***

Comment 14 IBM Bug Proxy 2016-10-04 05:30:25 UTC
Created attachment 1207073 [details]
guest xml

Comment 15 IBM Bug Proxy 2016-10-04 05:30:32 UTC
Created attachment 1207074 [details]
guest console logs

Comment 16 IBM Bug Proxy 2016-10-04 05:40:43 UTC
------- Comment From ckumar27@in.ibm.com 2016-10-04 01:38 EDT-------
*** This bug has been marked as a duplicate of bug 147001 ***

Comment 17 David Gibson 2016-10-06 04:34:00 UTC
I've posted a series of patches to address this upstream.  It's structured as a minimal fix (2 patches) followed by a more extensive fix which addresses additional problems (2 more patches).

Once it's thrashed out upstream, my intention is to backport the minimal fix for 7.3.z.  7.4 should get the whole set via rebase.

Comment 18 David Gibson 2016-10-12 04:32:02 UTC
I've revised the series mentioned in comment 17 and posted it upstream.  I'm hoping this one will be good enough to merge.

Comment 20 David Gibson 2016-11-18 02:27:09 UTC
This is now merged upstream; we should get the fix via the rebase.

Comment 22 Qunfang Zhang 2017-04-11 08:41:39 UTC
Verified this bug with qemu-kvm-rhev-2.9.0-0.el7.patchwork201703291116.ppc64le.rpm using the same steps as in comment 0; the issue has been fixed. The guest now boots up successfully within around 1 min.

I'll re-test once the official qemu-kvm-rhev-2.9 comes out.

Comment 23 Qunfang Zhang 2017-05-05 09:13:29 UTC
This bug is verified as passing with qemu-kvm-rhev-2.9.0-2.el7.ppc64le.

Comment 25 errata-xmlrpc 2017-08-01 23:37:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392


