Red Hat Bugzilla – Bug 1380258
ppc64le: > 1024GiB of guest RAM will conflict with IO
Last modified: 2017-11-27 22:42:13 EST
Description of problem:
Booting a guest with maxmem=2048G is very slow. In my experiment it still had not booted after 40 minutes. With maxmem=1024G the guest works fine. We were aware of this before and discussed it in a related bz (bug 1263039); creating this bz to track the issue.

Version-Release number of selected component (if applicable):
kernel-3.10.0-510.el7.ppc64le
qemu-kvm-rhev-2.6.0-27.el7.ppc64le
SLOF-20160223-6.gitdbbfda4.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. Boot a guest with maxmem=2048G, e.g.:
# /usr/libexec/qemu-kvm -name test -machine pseries,accel=kvm,usb=off -m 32G,slots=4,maxmem=2048G \
    -smp 4,sockets=1,cores=4,threads=1 -uuid 8aeab7e2-f341-4f8c-80e8-59e2968d85c2 -realtime mlock=off \
    -nodefaults -monitor stdio -rtc base=utc -device spapr-vscsi,id=scsi0,reg=0x1000 \
    -drive file=rhel72-ppc64le-virtio.qcow2,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none \
    -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0-0,bootindex=1,id=scsi0-0-0-0 \
    -drive if=none,id=drive-scsi0-0-1-0,readonly=on \
    -device scsi-cd,bus=scsi0.0,drive=drive-scsi0-0-1-0,bootindex=2,id=scsi0-0-1-0 \
    -vnc :10 -msg timestamp=on -usb -device usb-tablet,id=tablet1 -vga std -qmp tcp:0:4666,server,nowait \
    -netdev tap,id=hostnet1,script=/etc/qemu-ifup,vhost=on \
    -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:54:5a:5f:5b:5c

Actual results:
The guest had not booted after 40 minutes; still waiting for it, and not sure when (or whether) it will come up.

Expected results:
The guest boots within a few seconds.

Additional info:
Host info:
# free -m
              total        used        free      shared  buff/cache   available
Mem:         519147       36950      480287          35        1908      480527
Swap:          4095           0        4095

# cat /proc/cpuinfo
processor : 0
cpu       : POWER8NVL (raw), altivec supported
clock     : 4023.000000MHz
revision  : 1.0 (pvr 004c 0100)
(identical entries repeat for processors 8, 16, 24, ..., 120)
timebase  : 512000000
platform  : PowerNV
model     : 8335-GTB
machine   : PowerNV 8335-GTB
firmware  : OPAL v3
Created attachment 1205815 [details] Screenshot of the guest
Still hadn't booted up after nearly 1 day.
Ouch. That's much worse than I thought. I'm investigating.
Even maxmem values just a bit above 1024G can reproduce it.
Unfortunately, I haven't been able to reproduce this. I'm unable to run a guest with maxmem > 256G on our machine, because it's not able to allocate enough contiguous memory for the guest's hash page table, so the guest doesn't start at all. What are the symptoms when the system doesn't start? Is there any output on the console at all? For simplicity, can you also please try with no VGA or USB devices, just the spapr-vty console?
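If it helps, a stripped-down invocation along those lines might look like the following (an illustrative sketch only; the drive and netdev options from comment 0 are omitted here and would need to be added back):

# /usr/libexec/qemu-kvm -machine pseries,accel=kvm,usb=off -m 32G,slots=4,maxmem=2048G \
    -smp 4 -nodefaults -nographic \
    -chardev stdio,id=console0 -device spapr-vty,chardev=console0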
Ok, after IRC discussion, I see that this is not the bug I thought it was. There's a known problem with slow startup with large maxmem values, but that occurs before even the firmware executes. This bug is a hang during actual kernel boot up, and only seems to occur with large enough maxmem *and* VGA+USB present.
Yes: with VGA removed, the guest boots up successfully within a few seconds with 2048G maxmem:

/usr/libexec/qemu-kvm -name test -machine pseries,accel=kvm,usb=off -m 32G,slots=4,maxmem=2048G \
    -smp 4,sockets=1,cores=4,threads=1 -uuid 8aeab7e2-f341-4f8c-80e8-59e2968d85c2 -realtime mlock=off \
    -nodefaults -serial stdio -rtc base=utc -device spapr-vscsi,id=scsi0,reg=0x1000 \
    -drive file=rhel72-ppc64le-virtio.qcow2,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none \
    -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0-0,bootindex=1,id=scsi0-0-0-0 \
    -drive if=none,id=drive-scsi0-0-1-0,readonly=on \
    -device scsi-cd,bus=scsi0.0,drive=drive-scsi0-0-1-0,bootindex=2,id=scsi0-0-1-0 \
    -netdev tap,id=hostnet1,script=/etc/qemu-ifup,vhost=on \
    -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:54:5a:5f:5b:5c \
    -usb -device usb-tablet,id=tablet1
Some further observations:
* Trips with just VGA, but not USB (in this case the guest doesn't use the VGA as console, but we still see the hang on the vty console)
* Doesn't trip with i6300esb (as a different emulated PCI device)

My best guess at this point is that we're somehow getting an overlap between the memory and some IO region, but I don't really know how.
Re-tested with the following configurations and got no different results:
(1) RHEL7.3 guest and RHEL7.2-z
(2) std VGA and virtio-vga
All of them reproduce this bug.
An "info mtree" from qemu monitor could help. It seems the base address of PCI interface is at 0x0000010080000000, which is 1024 GiB + 2 GiB. So I think memory is overlapping PCI I/O address space. (qemu) info mtree address-space: memory 0000000000000000-ffffffffffffffff (prio 0, RW): system 0000000000000000-000000003fffffff (prio 0, RW): ppc_spapr.ram 0000000040000000-0000000fffffffff (prio 0, RW): hotplug-memory 0000010080000000-000001008000ffff (prio 0, RW): alias pci@800000020000000.io-alias @pci@800000020000000.io 0000000000000000-000000000000ffff 00000100a0000000-000001101fffffff (prio 0, RW): alias pci@800000020000000.mmio-alias @pci@800000020000000.mmio 0000000080000000-0000000fffffffff from include/hw/pci-host/spapr.h: #define SPAPR_PCI_WINDOW_BASE 0x10000000000ULL #define SPAPR_PCI_MEM_WIN_BUS_OFFSET 0x00080000000ULL
Ouch. I suspected some kind of RAM / IO collision, but that's rather less subtle than I expected. I guess 1 TiB of RAM seemed like an enormous amount when I first wrote that constant. Ok, so there are two things we need to do:

1) In the short term, have qemu error out gracefully if more than 1 TiB of RAM is requested.

2) Longer term, change the default placement of the PHBs. We wanted to change the spacing anyway to allow for more big-IO cards in each PCI domain (particularly the nVidia cards, which have enormous MMIO BARs), so we can fold these two changes together.

So working out how to do the placement change without breaking compatibility or migration just took a big leap up my priority list.
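For illustration, a minimal sketch of what the short-term guard (item 1) could look like; the function name and its placement are hypothetical, not the actual patch, and a real check may need to account for where the hotplug-memory region actually ends rather than maxram_size alone:

/* Hypothetical sketch (in QEMU this would sit near the spapr machine init
 * code and need qapi/error.h and hw/boards.h).  Reject guests whose RAM
 * could reach the PCI windows that currently start at 1 TiB. */
#define SPAPR_PCI_WINDOW_BASE 0x10000000000ULL   /* 1 TiB, from spapr.h */

static void spapr_validate_ram_size(MachineState *machine, Error **errp)
{
    if (machine->maxram_size > SPAPR_PCI_WINDOW_BASE) {
        error_setg(errp, "maxmem above 1 TiB is not supported: "
                   "RAM would collide with the PCI IO/MMIO windows");
    }
}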
*** Bug 1380618 has been marked as a duplicate of this bug. ***
Created attachment 1207073 [details] guest xml
Created attachment 1207074 [details] guest console logs
------- Comment From ckumar27@in.ibm.com 2016-10-04 01:38 EDT -------
*** This bug has been marked as a duplicate of bug 147001 ***
I've posted a series of patches to address this upstream. It's structured as a minimal fix (2 patches) followed by a more extensive fix which addresses additional problems (2 more patches). Once it's thrashed out upstream, my intention is to backport the minimal fix for 7.3.z; 7.4 should get the whole set via rebase.
I've revised the series mentioned in comment 17 and posted it upstream. I'm hoping this one will be good enough to merge.
This is now merged upstream, we should get the fix in the rebase.
Verified this bug with qemu-kvm-rhev-2.9.0-0.el7.patchwork201703291116.ppc64le.rpm using the same steps as in comment 0; the issue has been fixed. The guest now boots up successfully in around 1 minute. I'll re-test once the official qemu-kvm-rhev-2.9 comes out.
This bug is verified as passing with qemu-kvm-rhev-2.9.0-2.el7.ppc64le.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:2392