Bug 1263039
Summary: | SLOF doesn't allow enough room for CAS response with large maxmem | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | David Gibson <dgibson> |
Component: | SLOF | Assignee: | David Gibson <dgibson> |
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 7.2 | CC: | dgibson, gklein, hannsj_uhl, knoel, michal.skrivanek, michen, mrezanin, ngu, qzhang, shuyu, tlavigne, xuhan, xuma, zhengtli |
Target Milestone: | rc | ||
Target Release: | 7.2 | ||
Hardware: | ppc64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | SLOF-20150313-4.gitc89b0df.el7 | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2015-11-19 09:21:23 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1201513, 1261812, 1262143, 1263563, 1277183, 1277184 |
Description
David Gibson
2015-09-15 02:09:19 UTC
Karen, Miya, acks please.

Draft build with fix at https://brewweb.devel.redhat.com/taskinfo?taskID=9835585

Hi, David

I just tested your build from comment 2. The results are:

1) Boot the guest with "-m 2G,slots=2,maxmem=512G -smp 2,sockets=1,cores=2,threads=1": the guest boots in about 17s. (With the buggy official build SLOF-20150313-3.gitc89b0df.el7.noarch, it could not boot and printed the error in comment 0.)

2) Boot the guest with "-m 2G,slots=2,maxmem=1024G -smp 2,sockets=1,cores=2,threads=1": the guest boots in about 30 mins.

3) Boot the guest with "-m 2G,slots=2,maxmem=2048G -smp 2,sockets=1,cores=2,threads=1": the issue still reproduces. (It takes about 27 mins to reproduce. The qemu process consumes 100% CPU at first, and after about 27 mins the guest fails to boot and prints the error.)

(qemu)
(qemu) qemu: error creating device tree: (spapr_populate_drconf_memory(spapr, fdt)): FDT_ERR_NOSPACE

Hi, David

Could you help confirm?

Thanks,
Qunfang

Please ignore the slow guest boot issue. After applying the qemu scratch build in bug 1262143 comment 5, the guest starts booting quickly. But with 2048G memory the guest still fails to boot and prints:

(qemu)
(qemu) qemu: error creating device tree: (spapr_populate_drconf_memory(spapr, fdt)): FDT_ERR_NOSPACE

I agree this is a blocker for ppc64le. This issue blocks support for large ppc64le guests, either lots of memory or many devices. The fix is low risk and fixes the issue.

Qunfang,

Thanks for the test. It's still crashing with maxmem=2T because the new buffer was sized for only 1T of maxmem. RHEV have already decided for other reasons that we should limit maxmem to 1T for the RHEL 7.2 release, so I think the remaining problem can be deferred. Once the patch is merged, please just verify up to 1T maxmem.

David,

Got it, thanks!

Fix included in SLOF-20150313-4.gitc89b0df.el7

Reproduced the bug on SLOF-20150313-3.gitc89b0df.el7.noarch.

# /usr/libexec/qemu-kvm -name test -machine pseries,accel=kvm,usb=off -m 4G,slots=4,maxmem=512G -smp 4,sockets=1,cores=4,threads=1 -uuid 8aeab7e2-f341-4f8c-80e8-59e2968d85c2 -realtime mlock=off -nodefaults -monitor stdio -rtc base=utc -device spapr-vscsi,id=scsi0,reg=0x1000 -drive file=RHEL-7.2-LE-new.qcow2,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0-0,bootindex=1,id=scsi0-0-0-0 -drive if=none,id=drive-scsi0-0-1-0,readonly=on,format=raw -device scsi-cd,bus=scsi0.0,drive=drive-scsi0-0-1-0,bootindex=2,id=scsi0-0-1-0 -vnc :10 -msg timestamp=on -usb -device usb-tablet,id=tablet1 -vga std -qmp tcp:0:4666,server,nowait -netdev tap,id=hostnet1,script=/etc/qemu-ifup,vhost=on -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:54:5a:5f:5b:5c
QEMU 2.3.0 monitor - type 'help' for more information
(qemu)
(qemu)
(qemu)
(qemu) qemu: error creating device tree: (spapr_populate_drconf_memory(spapr, fdt)): FDT_ERR_NOSPACE
/etc/qemu-ifdown: could not launch network script

Verified as passing on SLOF-20150313-4.gitc89b0df.el7.noarch.rpm. Booted the guest with "maxmem=512G" and "maxmem=1024G": the guest boots successfully, and reboot and shutdown both work. So this bug is fixed.

Changing hardware back to ppc64. SLOF is technically big-endian (ppc64) even if guest and host are little endian.

(In reply to David Gibson from comment #11)
> Changing hardware back to ppc64. SLOF is technically big-endian (ppc64)
> even if guest and host are little endian.

Oops, good to know, thanks for the correction.

Setting to VERIFIED according to comment 10.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2286.html
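
For readers wondering why the failure tracks maxmem, the following is a rough back-of-the-envelope sketch, not taken from the fix itself. It assumes the device tree carries one fixed-size entry per hot-pluggable memory block, with a 256 MiB block size and 24 bytes (six 32-bit cells) per entry plus one leading count cell. Those constants and the helper name `drconf_bytes` are assumptions for illustration; the report does not state them, nor the actual buffer size SLOF reserves.

```python
# Rough size estimate for the dynamic-reconfiguration memory property.
# ASSUMPTIONS (not stated in this bug report): 256 MiB per memory block
# and six 32-bit cells (24 bytes) per block entry, plus one count cell.

GIB = 1024 ** 3
LMB_SIZE = 256 * 1024 ** 2   # assumed memory block granularity
ENTRY_BYTES = 6 * 4          # assumed size of one block entry
COUNT_CELL_BYTES = 4         # assumed leading "number of entries" cell


def drconf_bytes(ram_gib, maxmem_gib):
    """Estimate the property size for a guest started with
    -m <ram_gib>G,maxmem=<maxmem_gib>G (hypothetical helper)."""
    hotplug_bytes = (maxmem_gib - ram_gib) * GIB
    nr_lmbs = hotplug_bytes // LMB_SIZE
    return COUNT_CELL_BYTES + nr_lmbs * ENTRY_BYTES


if __name__ == "__main__":
    # The three maxmem values exercised in the test comments above.
    for maxmem in (512, 1024, 2048):
        size = drconf_bytes(2, maxmem)
        print(f"maxmem={maxmem}G -> about {size / 1024:.0f} KiB")
```

Under these assumptions the estimate roughly doubles each time maxmem doubles, which is consistent with a buffer sized for 1T of maxmem still overflowing at 2T.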