Description of problem:

As described in the summary, the current Rawhide kernel does not boot on qemu-system-ppc64 on either ppc64 or ppc64le architectures. This worked only a few days ago, so it's a recent regression in something or other.

Version-Release number of selected component (if applicable):

Last tested versions which *worked*:

qemu 2:2.8.0-2.fc26
kernel 4.11.0-0.rc0.git2.1.fc26
libguestfs-1.35.27-1.fc26

Failing versions:

qemu 2:2.8.0-2.fc26
kernel 4.11.0-0.rc0.git3.1.fc26
libguestfs-1.35.28-1.fc26

How reproducible:

At least once.

Steps to Reproduce:
1. Boot Linux on qemu.

Please see the full command lines used when I attach the build logs.
Created attachment 1257483 [details] build.log for ppc64
Created attachment 1257484 [details] build.log for ppc64le
Proposed as a Blocker for 26-alpha by Fedora user michelmno using the blocker tracking app because: the current Linux kernel in the latest compose (20170302) makes the POWERPC boot hang in a qemu environment.
Neither ppc64 nor ppc64le is a release-blocking arch, so this cannot possibly be a release blocker. I'm gonna take the initiative to change this to a proposed freeze exception.
Discussed during the 2017-03-06 blocker review meeting: [1]

The decision was made to accept this bug as an Alpha Freeze Exception, as this would be a blocker on a blocking arch and it affects the usage of ppc64 and ppc64le significantly enough to warrant such action.

[1] https://meetbot.fedoraproject.org/fedora-blocker-review/2017-03-06/f26-blocker-review.2017-03-06-17.02.txt
Still failing in the same way with: kernel 4.11.0-0.rc1.git0.1.fc27 qemu 2:2.8.0-2.fc26
should be something KVM or qemu related, as the latest kernel boots on bare metal [root@ibm-p8-generic-01 ~]# uname -a Linux ibm-p8-generic-01.lab.eng.brq.redhat.com 4.11.0-0.rc1.git1.2.fc27.ppc64le #1 SMP Thu Mar 9 03:59:12 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux
Note that we have the same problem if the host is F25 with the following kernel and qemu versions (1) when trying to create a guest from the f26 compose (2), which has kernel 4.11.

(1)
===
$qemu-ppc64le --version
qemu-ppc64lef version 2.7.1(qemu-2.7.1-2.fc25), Copyright (c) 2003-2016 Fabrice Bellard and the QEMU Project developers
$uname -a
Linux fenix.test.toulouse-stg.fr.ibm.com 4.9.9-200.fc25.ppc64le #1 SMP Thu Feb 16 16:10:02 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux
===
(2)
https://kojipkgs.fedoraproject.org/compose/branched/latest-Fedora-26/compose/Server/ppc64le/iso/
===
Do you know if the same kernel boots under PowerVM or on bare metal? In other words, I'm trying to isolate if the problem is in the guest kernel or in qemu.
(In reply to David Gibson from comment #9)
> Do you know if the same kernel boots under PowerVM or on bare metal?
>
> In other words, I'm trying to isolate if the problem is in the guest kernel
> or in qemu.

above comment #7 reported boot success on bare metal
I tried to boot the Fedora-Server-dvd-ppc64le-26-20170309.n.0.iso image on an LPAR and got:

OF stdout device is: /vdevice/vty@30000000
Preparing to boot Linux version 4.11.0-0.rc1.git0.1.fc26.ppc64le (mockbuild.fedoraproject.org) (gcc version 7.0.1 20170225 (Red Hat 7.0.1-0.10) (GCC) ) #1 SMP Mon Mar 6 18:25:13 UTC 2017
Detected machine type: 0000000000000101
Max number of cores passed to firmware: 128 (NR_CPUS = 1024)
Calling ibm,client-architecture-support... done
command line: BOOT_IMAGE=/ppc/ppc64/vmlinuz ro
memory layout at init:
  memory_limit : 0000000000000000 (16 MB aligned)
  alloc_bottom : 000000000ec80000
  alloc_top    : 0000000010000000
  alloc_top_hi : 0000000010000000
  rmo_top      : 0000000010000000
  ram_top      : 0000000010000000
instantiating rtas at 0x000000000eca0000... done
prom_hold_cpus: skipped
copying OF device tree...
Building dt strings...
Building dt structure...
No memory for flatten_device_tree (no room)
EXIT called ok
0 >
(In reply to David Gibson from comment #9)
> Do you know if the same kernel boots under PowerVM or on bare metal?
>
> In other words, I'm trying to isolate if the problem is in the guest kernel
> or in qemu.

comment #7 reports successful boot using OPAL firmware

platform : PowerNV
model : 8247-21L
machine : PowerNV 8247-21L
firmware : OPAL

So isn't the kernel image too big (again)?
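
To put numbers on the "too big" question, the "memory layout at init" values from the LPAR boot log can be converted to sizes. This is only a minimal sketch: it assumes the printed values are plain hexadecimal byte addresses and that the gap between alloc_bottom and alloc_top is what was still available when flatten_device_tree ran out of room. The helper name is made up for illustration.

```python
# Values copied from the "memory layout at init" lines of the LPAR boot log.
alloc_bottom = 0x000000000EC80000
alloc_top = 0x0000000010000000
ram_top = 0x0000000010000000

MIB = 1024 * 1024


def to_mib(nbytes):
    """Convert a byte count to MiB (hypothetical helper for this sketch)."""
    return nbytes / MIB


# Assumption: allocations during early boot are carved out of the range
# [alloc_bottom, alloc_top), so this difference is the remaining room.
room = alloc_top - alloc_bottom

print(f"ram_top      = {to_mib(ram_top):.1f} MiB")
print(f"alloc_bottom = {to_mib(alloc_bottom):.1f} MiB")
print(f"room left    = {to_mib(room):.1f} MiB")
```

Under those assumptions the allocation bottom is already at 236.5 MiB of the 256 MiB visible at this stage, leaving 19.5 MiB, which would be consistent with the suspicion that the loaded image leaves too little room.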
(In reply to Richard W.M. Jones from comment #0)
> As described in the summary, the current Rawhide kernel does not
> boot on qemu-system-ppc64 on either ppc64 or ppc64le architectures.

Do you use TCG, KVM PR or KVM HV?
Please see the log file attached which shows exactly how the kernel is being booted, including the full qemu command line.

In brief - it's TCG.

You can try to reproduce the problem trivially by doing something like:

qemu-system-ppc64 -machine pseries-2.8,accel=tcg,usb=off,dump-guest-core=off -m 768 -kernel /boot/<name-of-vmlinuz> -append 'panic=1 console=hvc0 console=ttyS0 edd=off udevtimeout=6000 udev.event-timeout=6000 no_timer_check printk.time=1 cgroup_disable=memory usbcore.nousb cryptomgr.notests tsc=reliable 8250.nr_uarts=1 root=/dev/sdb selinux=0 guestfs_verbose=1 TERM=vt100'

As can also be seen if you look at the log file, it hung after printing:

Booting Linux via __start() @ 0x0000000000400000 ...
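
For anyone who wants to script the reproducer, here is a minimal sketch that assembles the same command line and checks captured console text for the symptom described above. The function names and the example kernel path are hypothetical. The qemu options and kernel arguments are copied from this comment, and nothing is launched here.

```python
import shlex

# Kernel arguments copied from the reproducer in the comment above.
KERNEL_ARGS = (
    "panic=1 console=hvc0 console=ttyS0 edd=off udevtimeout=6000 "
    "udev.event-timeout=6000 no_timer_check printk.time=1 "
    "cgroup_disable=memory usbcore.nousb cryptomgr.notests tsc=reliable "
    "8250.nr_uarts=1 root=/dev/sdb selinux=0 guestfs_verbose=1 TERM=vt100"
)

# Last message seen on the console before the reported hang.
START_MARKER = "Booting Linux via __start()"


def build_qemu_cmd(kernel_path, mem_mb=768):
    """Return the reproducer as an argument list (hypothetical helper)."""
    return [
        "qemu-system-ppc64",
        "-machine", "pseries-2.8,accel=tcg,usb=off,dump-guest-core=off",
        "-m", str(mem_mb),
        "-kernel", kernel_path,
        "-append", KERNEL_ARGS,
    ]


def stopped_at_start(console_text):
    """True if the last non-empty console line is the __start() hand-off."""
    lines = [line for line in console_text.splitlines() if line.strip()]
    return bool(lines) and START_MARKER in lines[-1]


# The kernel path below is a placeholder, not a real file name.
print(shlex.join(build_qemu_cmd("/boot/vmlinuz-EXAMPLE")))
```

The check only looks at text that has already been captured. How long to wait before declaring a hang is left to the caller.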
For information: with the latest Rawhide compose (20170310) and kernel 4.11.0-0.rc1.git3.1.fc27.ppc64le, the boot now completes without hanging, while it still fails on the F26 compose 20170311 with kernel 4.11.0-0.rc1.git0.1.fc26.ppc64le.

So I assume we need to update the kernel for the f26 branch in the git tree http://pkgs.fedoraproject.org/cgit/rpms/kernel.git/
Is there any commit between 4.11.0-0.rc1.git0 and 4.11.0-0.rc1.git3 that would explain the change? Or any other idea from anyone?
(In reply to Dan Horák from comment #16)
> Is there any commit between 4.11.0-0.rc1.git0 and 4.11.0-0.rc1.git3 that
> would explain the change? Or any other idea from anyone?

This is the rc1 -> rc2 changelog:
https://lwn.net/Articles/716899/

The one that stands out is:
powerpc/64: Avoid panic during boot due to divide by zero in init_cache_info()
But there's also some mm work, some bugs around OPAL, and some other bits around page tables and some other early boot stuff looking through that changelog
bcm283x-firmware-20170314-2.509beaa.fc26 kernel-4.11.0-0.rc2.git0.1.fc26 linux-firmware-20170313-72.git695f2d6d.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-51a9cd79a0
bcm283x-firmware-20170314-2.509beaa.fc26, kernel-4.11.0-0.rc2.git0.1.fc26, linux-firmware-20170313-72.git695f2d6d.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-51a9cd79a0
so testing this on a F-25 userspace (just updating the kernel) I get the following panic (only ppc64le tested):

OF stdout device is: /vdevice/vty@30000000
Preparing to boot Linux version 4.11.0-0.rc2.git0.1.fc26.ppc64le (mockbuild.fedoraproject.org) (gcc version 7.0.1 20170225 (Red Hat 7.0.1-0.10) (GCC) ) #1 SMP Mon Mar 13 16:51:13 UTC 2017
Detected machine type: 0000000000000101
command line: BOOT_IMAGE=/vmlinuz-4.11.0-0.rc2.git0.1.fc26.ppc64le root=UUID=9ffac9f6-7a7c-4034-8431-09439b815c3d ro net.ifnames=0 rhgb quiet console=ttyS0 LANG=en_US.UTF-8
Max number of cores passed to firmware: 1024 (NR_CPUS = 1024)
Calling ibm,client-architecture-support... done
memory layout at init:
  memory_limit : 0000000000000000 (16 MB aligned)
  alloc_bottom : 0000000004270000
  alloc_top    : 0000000030000000
  alloc_top_hi : 0000000280000000
  rmo_top      : 0000000030000000
  ram_top      : 0000000280000000
instantiating rtas at 0x000000002fff0000... done
prom_hold_cpus: skipped
copying OF device tree...
Building dt strings...
Building dt structure...
Device tree strings 0x0000000004280000 -> 0x0000000004280a9e
Device tree struct 0x0000000004290000 -> 0x00000000042a0000
Quiescing Open Firmware ...
Booting Linux via __start() @ 0x0000000002000000 ...
 -> smp_release_cpus()
spinning_secondaries = 3
 <- smp_release_cpus()
Linux ppc64le #1 SMP Mon Mar 1[ 0.465500] Warning: unable to open an initial console.
[ 0.967975] Unable to handle kernel paging request for data at address 0x2d326f6974726976
[ 0.969153] Faulting instruction address: 0xc00000000033e98c
[ 0.969638] Oops: Kernel access of bad area, sig: 11 [#1]
[ 0.970076] SMP NR_CPUS=1024
[ 0.970077] NUMA
[ 0.970322] pSeries
[ 0.970636] Modules linked in: virtio_console virtio_blk virtio_pci virtio_ring virtio
[ 0.971308] CPU: 1 PID: 227 Comm: systemd Not tainted 4.11.0-0.rc2.git0.1.fc26.ppc64le #1
[ 0.971937] task: c0000000034fb500 task.stack: c000000003460000
[ 0.972388] NIP: c00000000033e98c LR: c00000000033ea04 CTR: c00000000051c510
[ 0.972933] REGS: c0000000034639f0 TRAP: 0380 Not tainted (4.11.0-0.rc2.git0.1.fc26.ppc64le)
[ 0.973596] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>
[ 0.973601] CR: 22022884 XER: 20000000
[ 0.974312] CFAR: c00000000033e908 SOFTE: 1
[ 0.974312] GPR00: c00000000033ea04 c000000003463c70 c00000000134fa00 0000000000000000
[ 0.974312] GPR04: 00000000017000c0 000000000000001a 0000000000000001 c00000027fa63d00
[ 0.974312] GPR08: 000000027eb40000 0000000000000000 0000000000000000 0000000000000c00
[ 0.974312] GPR12: 0000000000002200 c00000000fdc0900 00003fffb3e97360 c0000000034dda80
[ 0.974312] GPR16: 00003fffb3e8d560 00003fffb3e973c8 00003fffb3e8cf60 00003fffb3e96f98
[ 0.974312] GPR20: 00003fffb3e96b98 c0000000034fb500 0000000000000000 0000000000000000
[ 0.974312] GPR24: ffffffffffffffff 0000000000000000 c0000000000e9bbc c00000027e01e880
[ 0.974312] GPR28: ffffffffffffffff 00000000017000c0 2d326f6974726976 c00000027e01e880
[ 0.979553] NIP [c00000000033e98c] kmem_cache_alloc_node+0x13c/0x330
[ 0.980037] LR [c00000000033ea04] kmem_cache_alloc_node+0x1b4/0x330
[ 0.980514] Call Trace:
[ 0.980701] [c000000003463c70] [c00000000033ea04] kmem_cache_alloc_node+0x1b4/0x330 (unreliable)
[ 0.981371] [c000000003463cd0] [c0000000000e9bbc] copy_process.isra.5.part.6+0x18c/0x19f0
[ 0.981993] [c000000003463db0] [c0000000000eb63c] _do_fork+0xec/0x490
[ 0.982486] [c000000003463e30] [c00000000000bb88] ppc_clone+0x8/0xc
[ 0.982972] Instruction dump:
[ 0.983199] 7c0803a6 eb61ffd8 eb81ffe0 7d908120 eba1ffe8 ebc1fff0 ebe1fff8 4e800020
[ 0.983793] 60420000 e95f0022 e93f0000 79290720 <7f1e502a> 0b090000 0b190000 39200000
[ 0.984400] ---[ end trace 89b319055f8dd539 ]---
[ 0.984752]
[ 1.000829] Unable to handle kernel paging request for data at address 0x2d326f6974726976
[ 1.001516] Faulting instruction address: 0xc00000000033e98c
[ 1.001972] Oops: Kernel access of bad area, sig: 11 [#2]
[ 1.002407] SMP NR_CPUS=1024
[ 1.002408] NUMA
[ 1.002638] pSeries
[ 1.002975] Modules linked in: virtio_console virtio_blk virtio_pci virtio_ring virtio
[ 1.003637] CPU: 1 PID: 1 Comm: systemd Tainted: G D 4.11.0-0.rc2.git0.1.fc26.ppc64le #1
[ 1.004354] task: c00000027bceda00 task.stack: c00000027e108000
[ 1.004814] NIP: c00000000033e98c LR: c00000000033ea04 CTR: c00000000051c510
[ 1.005363] REGS: c00000027e10b9f0 TRAP: 0380 Tainted: G D (4.11.0-0.rc2.git0.1.fc26.ppc64le)
[ 1.006132] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>
[ 1.006135] CR: 22022884 XER: 20000000
[ 1.006860] CFAR: c00000000033e908 SOFTE: 1
[ 1.006860] GPR00: c00000000033ea04 c00000027e10bc70 c00000000134fa00 0000000000000000
[ 1.006860] GPR04: 00000000017000c0 000000000000001c 0000000000000001 c00000027fa63d00
[ 1.006860] GPR08: 000000027eb40000 0000000000000000 0000000000000000 0000000000000c01
[ 1.006860] GPR12: 0000000000002200 c00000000fdc0900 0000000000000269 c0000000034f9080
[ 1.006860] GPR16: 000000003ea38d20 000000003ea189d0 00000100381a0cd0 0000000000000000
[ 1.006860] GPR20: 000000003ea34e38 c00000027bceda00 0000000000000000 0000000000000000
[ 1.006860] GPR24: ffffffffffffffff 0000000000000000 c0000000000e9bbc c00000027e01e880
[ 1.006860] GPR28: ffffffffffffffff 00000000017000c0 2d326f6974726976 c00000027e01e880
[ 1.012187] NIP [c00000000033e98c] kmem_cache_alloc_node+0x13c/0x330
[ 1.012683] LR [c00000000033ea04] kmem_cache_alloc_node+0x1b4/0x330
[ 1.013175] Call Trace:
[ 1.013366] [c00000027e10bc70] [c00000000033ea04] kmem_cache_alloc_node+0x1b4/0x330 (unreliable)
[ 1.014049] [c00000027e10bcd0] [c0000000000e9bbc] copy_process.isra.5.part.6+0x18c/0x19f0
[ 1.014684] [c00000027e10bdb0] [c0000000000eb63c] _do_fork+0xec/0x490
[ 1.015186] [c00000027e10be30] [c00000000000bb88] ppc_clone+0x8/0xc
[ 1.015672] Instruction dump:
[ 1.015903] 7c0803a6 eb61ffd8 eb81ffe0 7d908120 eba1ffe8 ebc1fff0 ebe1fff8 4e800020
[ 1.016507] 60420000 e95f0022 e93f0000 79290720 <7f1e502a> 0b090000 0b190000 39200000
[ 1.017124] ---[ end trace 89b319055f8dd53a ]---
[ 1.017482]
[ 3.019479] systemd: 18 output lines suppressed due to ratelimiting
[ 3.019988] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 3.019988]
[ 3.020842] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 3.020842]
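
One observation about the oops, offered as a hint rather than a diagnosis: the faulting data address 0x2d326f6974726976 (the same value sits in GPR30 in both register dumps) consists entirely of printable ASCII bytes. A minimal sketch to decode it follows. The helper name is made up, and the idea that string data ended up where the allocator expected a pointer is an assumption that nothing else in this report confirms.

```python
# Faulting data address copied from the oops above.
FAULT_ADDR = 0x2D326F6974726976


def as_text(value, byteorder):
    """Render a 64-bit value as ASCII, '.' for non-printable bytes."""
    raw = value.to_bytes(8, byteorder)
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# ppc64le stores the least significant byte first, so "little" shows the
# bytes in the order they would sit in memory on this guest.
print(as_text(FAULT_ADDR, "little"))  # virtio2-
print(as_text(FAULT_ADDR, "big"))     # -2oitriv
```

Read in memory order, the value spells "virtio2-". That would fit a name string being interpreted as a pointer inside kmem_cache_alloc_node, but it is only a guess from the register contents.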
Note that Fedora ppc64le Rawhide of 20170314 also fails to boot. It uses kernel-4.11.0-0.rc2.git0.2.

And I confirm comment 15: Fedora ppc64le Rawhide of 20170313 boots OK with kernel-4.11.0-0.rc1.git3.1.
Please try kernel-4.11.0-0.rc2.git1.1 or greater. There was a fix https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ce70df089143c49385b4f32f39d41fb50fbf6a7c that came in right after -rc2 which would affect powerpc
(In reply to Laura Abbott from comment #23)
> Please try kernel-4.11.0-0.rc2.git1.1 or greater. There was a fix
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ce70df089143c49385b4f32f39d41fb50fbf6a7c
> that came in right after -rc2 which would affect powerpc

4.11.0-0.rc2.git1.1.fc27.ppc64le boots for me on the above mentioned machine :)
bcm283x-firmware-20170314-2.509beaa.fc26 kernel-4.11.0-0.rc2.git2.2.fc26 linux-firmware-20170313-72.git695f2d6d.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-51a9cd79a0
bcm283x-firmware-20170314-2.509beaa.fc26, kernel-4.11.0-0.rc2.git2.2.fc26, linux-firmware-20170313-72.git695f2d6d.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-51a9cd79a0
pbrobinson: does rc2.git2.2 boot for you? that's scheduled to be in Alpha currently.
I updated a f25 with f26 packages, installed kernel rc2.git2.2 and reboot was ok on ppc64le.
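
Since the results are scattered over many comments, here is a minimal sketch that orders the kernel builds mentioned in this bug by their rc and git snapshot numbers. The results are copied from the comments above. The test environments differ between reports (qemu TCG, bare metal, LPAR, F25 hosts), so the ordering is only a rough guide, and the parsing helper is hypothetical.

```python
import re

# (kernel build, result) pairs as reported in the comments above.
REPORTS = [
    ("4.11.0-0.rc2.git0.1.fc26", "panics"),
    ("4.11.0-0.rc0.git3.1.fc26", "hangs"),
    ("4.11.0-0.rc1.git3.1.fc27", "boots"),
    ("4.11.0-0.rc0.git2.1.fc26", "boots"),
    ("4.11.0-0.rc2.git2.2.fc26", "boots"),
    ("4.11.0-0.rc1.git0.1.fc26", "hangs"),
    ("4.11.0-0.rc2.git1.1.fc27", "boots"),
]

SNAPSHOT_RE = re.compile(r"rc(\d+)\.git(\d+)\.(\d+)")


def snapshot_key(build):
    """Return (rc, git snapshot, build number) for sorting."""
    match = SNAPSHOT_RE.search(build)
    if match is None:
        raise ValueError(f"unrecognised kernel build string: {build}")
    return tuple(int(part) for part in match.groups())


ordered = sorted(REPORTS, key=lambda report: snapshot_key(report[0]))
for build, result in ordered:
    print(f"{build:28} {result}")
```

Sorted this way, the reports alternate between working and failing snapshots, which may mean more than one regression and fix is involved.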
bcm283x-firmware-20170314-2.509beaa.fc26, kernel-4.11.0-0.rc2.git2.2.fc26, linux-firmware-20170313-72.git695f2d6d.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.
Think we can close this.