Bug 1676475

Summary: Nested kvm/qemu virtualization is not working on Power8 LPARs
Product: [Fedora] Fedora
Reporter: Jakub Čajka <jcajka>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 29
CC: airlied, bskeggs, bugproxy, dan, hannsj_uhl, hdegoede, ichavero, itamar, jarodwilson, jeremy, jglisse, john.j5live, jonathan, josef, kernel-maint, linville, mchehab, mjg59, normand, skumari, steved, surajjs
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-09-17 20:01:46 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1071880    

Description Jakub Čajka 2019-02-12 12:06:54 UTC
1. Please describe the problem:
   Nested VM doesn't boot on POWER8 LPAR systems.

2. What is the Version-Release number of the kernel:
   4.20.6-200.fc29.ppc64le

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
   I haven't been able to bisect when this started to happen. It is also an open question whether it ever worked (or can be supported at all).

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:
   Run a nested virt VM on a Power8 LPAR (the first level of virtualization is working).

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:
   I haven't been able to grab the rawhide images.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:
   No

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.
   Snippet from the log of the booting VM, up to the point where it gets stuck:
...
Booting from memory...
OF stdout device is: /vdevice/vty@71000000
Preparing to boot Linux version 4.20.6-200.fc29.ppc64le (mockbuild.fedoraproject.org) (gcc version 8.2.1 20181215 (Red Hat 8.2.1-6) (GCC)) #1 SMP Thu Jan 31 15:31:01 UTC 2019
Detected machine type: 0000000000000101
command line: panic=1 console=hvc0 console=ttyS0 edd=off udevtimeout=6000 udev.event-timeout=6000 no_timer_check printk.time=1 cgroup_disable=memory usbcore.nousb cryptomgr.notests tsc=reliable 8250.nr_uarts=1 root=/dev/sdb selinux=0 guestfs_verbose=1 TERM=xterm
Max number of cores passed to firmware: 1024 (NR_CPUS = 1024)
Calling ibm,client-architecture-support... done
memory layout at init:
  memory_limit : 0000000000000000 (16 MB aligned)
  alloc_bottom : 0000000001cc0000
  alloc_top    : 0000000030000000
  alloc_top_hi : 0000000040000000
  rmo_top      : 0000000030000000
  ram_top      : 0000000040000000
instantiating rtas at 0x000000002fff0000... done
prom_hold_cpus: skipped
copying OF device tree...
Building dt strings...
Building dt structure...
Device tree strings 0x0000000001ed0000 -> 0x0000000001ed0aa6
Device tree struct  0x0000000001ee0000 -> 0x0000000001ef0000
Quiescing Open Firmware ...
Booting Linux via __start() @ 0x0000000000400000 ...

Investigating the qemu process with gdb, it seems to be busy looping/polling:
#0  0x00007fffb5194364 in __GI_ppoll (fds=0x202, nfds=6, timeout=<optimized out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1  0x00000001235d982c in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at /usr/src/debug/qemu-2.12.0-3.fc29.ppc64le/util/qemu-timer.c:322
#3  0x00000001235daf94 in os_host_main_loop_wait (timeout=<optimized out>) at /usr/src/debug/qemu-2.12.0-3.fc29.ppc64le/util/main-loop.c:258
#4  main_loop_wait (nonblocking=<optimized out>) at /usr/src/debug/qemu-2.12.0-3.fc29.ppc64le/util/main-loop.c:522
#5  0x0000000123019a40 in main_loop () at /usr/src/debug/qemu-2.12.0-3.fc29.ppc64le/vl.c:1943
#6  main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /usr/src/debug/qemu-2.12.0-3.fc29.ppc64le/vl.c:4734

For the record, this was originally mentioned/reported as part of BZ#1668751.

Comment 1 Dan Horák 2019-02-12 12:28:16 UTC
For the record, nested virt doesn't work on bare-metal P8 either. The host is Fedora 29, the first-level guest is Fedora 28 (with qemu 2.11), and the second-level guest can't be started, with the same symptoms Jakub mentions.

Comment 2 Michel Normand 2019-02-12 12:53:46 UTC
FYI: nested virt doesn't work on bare-metal P8 either with an f29 host and an f29 first-level guest (with qemu 3.0); the second-level f29 guest install failed at initial kernel start with an infinite loop of faulting instructions.
(The f29 on the host and in the first-level guest is the same and has the latest updates applied.)
===
[    0.032382] kernel tried to execute exec-protected page (c0000000015d0a44) -exploit attempt? (uid: 0)
[    0.033323] Unable to handle kernel paging request for instruction fetch
[    0.033993] Faulting instruction address: 0xc0000000015d0a44
[    0.034677] Oops: Kernel access of bad area, sig: 11 [#1]
[    0.035213] LE SMP NR_CPUS=1024 NUMA pSeries
[    0.035749] Modules linked in:
[    0.036154] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.16-300.fc29.ppc64le #1
[    0.036957] NIP:  c0000000015d0a44 LR: c00000000000acec CTR: 0000000000000000
[    0.037756] REGS: c00000003fffba90 TRAP: 0400   Not tainted  (4.18.16-300.fc29.ppc64le)
[    0.038558] MSR:  a000000010001033 <SF,ME,IR,DR,RI,LE>  CR: 28000848  XER: 00000000
[    0.039361] CFAR: 0000000000000000 IRQMASK: 1 
[    0.039361] GPR00: 0000000000000000 c00000003fffbd10 c000000001530d00 c00000003fffbd80 
[    0.039361] GPR04: c00000000006ae48 0000000048000000 0000000000000009 000000004aa98684 
[    0.039361] GPR08: 000000007d210164 0000000000000000 0000000000000002 0000000000000900 
[    0.039361] GPR12: a000000002009033 c000000001810000 c00000000006eefc 0000000049567908 
[    0.039361] GPR16: 0000000000000078 c0000000015cf510 c000000000e24df0 000000007c1b03a6 
[    0.039361] GPR20: 000000007c1ffaa6 c0000000015d274c c0000000013e5548 000000007c1303a6 
[    0.039361] GPR24: 000000007c1643a6 000000007c1a03a6 c0000000015cf508 ffffffffebc0f008 
[    0.039361] GPR28: ffffffffebc0f000 c00000000006b604 c00000000006b600 0000000003e00000 
[    0.040132] kernel tried to execute exec-protected page (c0000000015d0a44) -exploit attempt? (uid: 0)
[    0.045886] NIP [c0000000015d0a44] kvm_tmp+0x1534/0x100000
[    0.045888] LR [c00000000000acec] soft_nmi_common+0xcc/0xd0
[    0.045889] Call Trace:
[    0.045889] Instruction dump:
[    0.045891] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX 
[    0.045893] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX 
[    0.045896] ---[ end trace 985ba14ce79fd515 ]---
[    0.045896] 
[    0.046628] Unable to handle kernel paging request for instruction fetch
[    0.046727] Faulting instruction address: 0xc0000000015d0a44
[    0.046839] Oops: Kernel access of bad area, sig: 11 [#2]
===
The signature is similar, but not identical, to the older bug#1652845 (virt-install.log: https://bugzilla.redhat.com/attachment.cgi?id=1508237)
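
To confirm that a guest log like the one above really is the same fault repeating, the "Faulting instruction address" and "Oops ... [#N]" lines can be counted. The following is only a minimal sketch: the helper name `summarize_oops` is hypothetical, and the sample text is limited to lines copied from the log in this comment.

```python
import re
from collections import Counter

# Patterns for the two kernel log lines quoted in this comment.
FAULT_RE = re.compile(r"Faulting instruction address: (0x[0-9a-fA-F]+)")
OOPS_RE = re.compile(r"Oops: .*\[#(\d+)\]")


def summarize_oops(log_text):
    """Count faulting addresses and collect Oops sequence numbers (hypothetical helper)."""
    faults = Counter(FAULT_RE.findall(log_text))
    oops_numbers = [int(n) for n in OOPS_RE.findall(log_text)]
    return faults, oops_numbers


# Lines taken verbatim from the guest log above.
sample = """\
[    0.033993] Faulting instruction address: 0xc0000000015d0a44
[    0.034677] Oops: Kernel access of bad area, sig: 11 [#1]
[    0.046727] Faulting instruction address: 0xc0000000015d0a44
[    0.046839] Oops: Kernel access of bad area, sig: 11 [#2]
"""

faults, oops_numbers = summarize_oops(sample)
for address, count in faults.most_common():
    print(f"{address} faulted {count} time(s)")
print("Oops sequence numbers:", oops_numbers)
```

A single address with a steadily growing count would match the "infinite loop" description; several different addresses would point to a different failure mode.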

Comment 3 Sinny Kumari 2019-02-12 14:39:55 UTC
Fedora Atomic Host with kernel 4.20+ boots fine for me as well on P9 nested virt with host having qemu-3.1, libvirt-5.0.0 and kernel-4.20.6

Comment 4 Suraj 2019-02-21 05:16:52 UTC
If you boot the KVM-PR guest with the following added to the qemu command line:
-machine pseries,cap-hpt-max-page-size=16777216
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Does it make any difference?
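
For reference, the value 16777216 in the option above is 16 MiB expressed in bytes, i.e. a page shift of 24. A minimal sketch of that arithmetic follows; the helper name `page_shift` is made up for illustration and is not part of qemu.

```python
def page_shift(page_size_bytes):
    """Return log2 of a power-of-two page size (hypothetical helper)."""
    if page_size_bytes <= 0 or page_size_bytes & (page_size_bytes - 1):
        raise ValueError("page size must be a positive power of two")
    return page_size_bytes.bit_length() - 1


max_page_size = 16777216  # value passed to cap-hpt-max-page-size above
shift = page_shift(max_page_size)
mib = max_page_size // (1024 * 1024)
print(f"{max_page_size} bytes = {mib} MiB, page shift {shift}")
# prints "16777216 bytes = 16 MiB, page shift 24"
```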

Comment 5 Sinny Kumari 2019-02-21 06:18:07 UTC
(In reply to Suraj from comment #4)
> If you boot the KVM-PR guest with the following added to the qemu command
> line:
> -machine pseries,cap-hpt-max-page-size=16777216
>                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Hi Suraj, I am using the virt-install command instead of using qemu directly. Can you please provide an equivalent option to be used with virt-install?


> Does it make any difference?

Comment 6 Dan Horák 2019-02-21 09:31:59 UTC
Sinny, I think you can follow https://libvirt.org/formatdomain.html#elementsFeatures

Comment 7 Sinny Kumari 2019-02-26 15:06:27 UTC
Edited the nested guest VM with the following:
  <features>
    <hpt resizing='required'>
      <maxpagesize unit='KiB'>16384</maxpagesize>
    </hpt>
  </features>

It gives the following error:
2019-02-26T15:04:52.899630Z qemu-system-ppc64: KVM doesn't support page shift 24/12
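
As a cross-check, the libvirt setting above appears to describe the same limit as the qemu option from comment 4: 16384 KiB is 16777216 bytes. The sketch below parses the XML fragment and derives the byte value. The function name and the small unit table are illustrative assumptions, not libvirt API, and reading the "24" in the error message as log2 of this page size is likewise an assumption.

```python
import xml.etree.ElementTree as ET

# Illustrative subset of size units; not an exhaustive list of what libvirt accepts.
UNIT_BYTES = {"KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3}

# The <features> fragment from this comment.
FEATURES_XML = """
<features>
  <hpt resizing='required'>
    <maxpagesize unit='KiB'>16384</maxpagesize>
  </hpt>
</features>
"""


def hpt_max_page_size_bytes(features_xml):
    """Return the <maxpagesize> value in bytes, or None if absent (hypothetical helper)."""
    root = ET.fromstring(features_xml)
    node = root.find("./hpt/maxpagesize")
    if node is None:
        return None
    return int(node.text.strip()) * UNIT_BYTES[node.get("unit")]


size = hpt_max_page_size_bytes(FEATURES_XML)
print(f"-machine pseries,cap-hpt-max-page-size={size}")
print("page shift:", size.bit_length() - 1)
```

If the byte value matches the one from comment 4, the XML edit and the raw qemu option should be requesting the same maximum HPT page size.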

Comment 8 Sinny Kumari 2019-02-26 15:07:54 UTC
Forgot to mention the guest host details:
F29 with kernel-4.19.4-300.fc29.ppc64le, libvirt-client-4.9.0-1.fc29.ppc64le and qemu-3.0.0-3.fc29.ppc64le

Comment 9 Laura Abbott 2019-04-09 20:43:57 UTC
We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 29 kernel bugs.
 
Fedora XX has now been rebased to 5.0.6. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.
 
If you have moved on to Fedora 30, and are still experiencing this issue, please change the version to Fedora 30.
 
If you experience different issues, please open a new bug report for those.

Comment 10 Justin M. Forbes 2019-09-17 20:01:46 UTC
*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 3 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.