Bug 1733118 - rawhide 20190724 install failed with Exception in kernel mode
Summary: rawhide 20190724 install failed with Exception in kernel mode
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: ppc64le
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: PPCTracker
 
Reported: 2019-07-25 09:07 UTC by Michel Normand
Modified: 2019-07-26 07:51 UTC
CC List: 19 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:


Attachments
rawhide_20190724_install_failure_console.log (1.21 MB, text/plain)
2019-07-25 09:07 UTC, Michel Normand
virsh_dumpxml_rawhide.xml (2.46 KB, text/plain)
2019-07-25 12:19 UTC, Michel Normand

Description Michel Normand 2019-07-25 09:07:57 UTC
Created attachment 1593346
rawhide_20190724_install_failure_console.log

The rawhide 20190724 install failed with an exception in kernel mode. The relevant lines of the console log are extracted below:


```
         Starting Initialize the iW…nd/RDMA stack in the kernel...
[   34.411966] Oops: Exception in kernel mode, sig: 5 [#1] 
...
[   34.413097] CPU: 5 PID: 639 Comm: kworker/5:2 Not tainted 5.3.0-0.rc1.git1.1.fc31.ppc64le #1
[   34.413286] Workqueue: events_freezable update_balloon_size_func [virtio_balloon]
...
[   34.416015] Call Trace:
[   34.416066] [c0000000febdfaa0] [c0000000007e03fc] __list_del_entry_valid+0x8c/0x100 (unreliable)
[   34.416292] [c0000000febdfb00] [c0000000004d5698] balloon_page_enqueue_one+0x78/0x1a0   
[   34.416461] [c0000000febdfb50] [c0000000004d5920] balloon_page_enqueue+0x50/0x80   
[   34.416631] [c0000000febdfb90] [c008000001411da8] update_balloon_size_func+0x1d0/0x400 [virtio_balloon]
[   34.416829] [c0000000febdfc50] [c000000000168b0c] process_one_work+0x30c/0x7f0
[   34.416995] [c0000000febdfd20] [c000000000169078] worker_thread+0x88/0x500
[   34.417124] [c0000000febdfdb0] [c000000000174964] kthread+0x164/0x1b0
[   34.417258] [c0000000febdfe20] [c00000000000c0cc] ret_from_kernel_thread+0x5c/0x70 
[   34.417413] Instruction dump:
[   34.417487] 40820034 38600001 38210060 4e800020 7c0802a6 7c641b78 3c62ff59 7d455378 
[   34.417657] 38637c10 f8010070 4ba14ae5 60000000 <0fe00000> 7c0802a6 3c62ff59 38637cc0   
[   34.417818] ---[ end trace 1527c4a6204440df ]---
...
[   34.425518] BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:38
...
[   93.363101] BUG: workqueue lockup - pool cpus=5 node=0 flags=0x0 nice=0 stuck for 58s!  
...
[ 4804.403111] BUG: workqueue lockup - pool cpus=5 node=0 flags=0x0 nice=0 stuck for 4769s!
...
[ 4841.203116] BUG: workqueue lockup - pool cpus=5 node=0 flags=0x0 nice=0 stuck for 4806s!
```
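
An excerpt like the one above can be pulled out of a large console log with a short script. The following is only a minimal sketch, assuming the log uses the usual `[ seconds.micros] ` kernel timestamp prefix; the function names `extract_kernel_errors` and `print_kernel_errors` are made up for this illustration and are not part of any Fedora tooling.

```python
import re

# Matches kernel console lines such as
#   "[   34.411966] Oops: Exception in kernel mode, sig: 5 [#1]"
#   "[   93.363101] BUG: workqueue lockup - pool cpus=5 ..."
# Assumption: only lines tagged "Oops:" or "BUG:" are of interest.
PATTERN = re.compile(r"\[\s*(\d+\.\d+)\]\s+(Oops|BUG):\s*(.*?)\s*$")


def extract_kernel_errors(lines):
    """Return a (timestamp, kind, message) tuple for each Oops/BUG line."""
    hits = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            hits.append((float(match.group(1)), match.group(2), match.group(3)))
    return hits


def print_kernel_errors(path):
    """Print every Oops/BUG line found in the console log at `path`."""
    # errors="replace" because console logs may contain stray non-UTF-8 bytes.
    with open(path, encoding="utf-8", errors="replace") as handle:
        for timestamp, kind, message in extract_kernel_errors(handle):
            print(f"{timestamp:12.6f} {kind}: {message}")
```

For example, `print_kernel_errors("rawhide_20190724_install_failure_console.log")` would list the Oops and the later workqueue lockup reports without the surrounding boot messages.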

1. Please describe the problem:
* Retrieve the latest rawhide ppc64le DVD ISO, Fedora-Server-dvd-ppc64le-Rawhide-20190724.n.0.iso, and use it to install a qemu guest.
* As shown in the attached console log, the kernel reports an exception followed by workqueue lockups.

2. What is the Version-Release number of the kernel:
* In the same log, the reported kernel version is 5.3.0-0.rc1.git1.1.fc31.ppc64le

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
The last working ISO was 20190717 with kernel 5.3.0-0.rc0.git4.1.fc31.ppc64le

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:
* As described in point 1: a simple install of a ppc64le guest on a ppc64le host, using the ISO named there.


5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:
* Yes

6. Are you running any modules that are not shipped directly with Fedora's kernel?:
* No

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

Comment 1 Michel Normand 2019-07-25 09:41:41 UTC
The information in point 3 above is wrong.

ISO 20190717 with kernel 5.3.0-0.rc0.git4.1.fc31.ppc64le also failed similarly.

ISO 20190707 with kernel 5.2.0-0.rc7.git1.1.fc31.ppc64le does not have this problem.

I will run trials with the other composes between the two.

Comment 2 Michel Normand 2019-07-25 12:19:32 UTC
Created attachment 1593387
virsh_dumpxml_rawhide.xml

To complete point 4 of the initial description: the qemu guest is created via virt-manager.

The attached XML is the virt-manager configuration used to create my qemu guest.
It generates the following qemu command line:
```
/usr/bin/qemu-system-ppc64 -name guest=rawhide,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/home/normand/.config/libvirt/qemu/lib/domain-1-rawhide/master-key.aes -machine pseries-2.2,accel=kvm,usb=off,dump-guest-core=off -m 32768 -realtime mlock=off -smp 8,sockets=1,cores=2,threads=4 -uuid 32591c31-1a49-4d44-a492-7517ec8049d4 -display none -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/home/normand/.config/libvirt/qemu/lib/domain-1-rawhide/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device qemu-xhci,id=usb,bus=pci.0,addr=0x1 -device spapr-vscsi,id=scsi0,reg=0x2000 -drive file=/home/normand/images/rawhide.disk1.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=2 -drive file=http://sf1.test.toulouse-stg.fr.ibm.com:80/pub/linux/fedora/rawhide/ppc64le/iso/latest,format=raw,if=none,id=drive-scsi0-0-0-4,readonly=on -device scsi-cd,bus=scsi0.0,channel=0,scsi-id=0,lun=4,drive=drive-scsi0-0-0-4,id=scsi0-0-0-4,bootindex=1 -netdev tap,fd=29,id=hostnet0,vhost=on,vhostfd=28 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:e0:1c:a4,bus=pci.0,addr=0x2 -chardev pty,id=charserial0 -device spapr-vty,chardev=charserial0,id=serial0,reg=0x30001000 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 -msg timestamp=on
```
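
The call trace in the description goes through `update_balloon_size_func [virtio_balloon]`, and the command line above includes `-device virtio-balloon-pci`. To check quickly whether a given guest has a balloon device configured, the `-device` arguments can be listed with a few lines of Python. This is only a sketch for inspecting a command line, not a diagnosis; the helper names `list_devices` and `has_balloon` are made up for this illustration.

```python
import shlex


def list_devices(cmdline):
    """Return the driver name of every -device argument in a QEMU command line."""
    args = shlex.split(cmdline)
    devices = []
    # Pair each argument with the one that follows it.
    for flag, value in zip(args, args[1:]):
        if flag == "-device":
            # The driver name is the first comma-separated field.
            devices.append(value.split(",", 1)[0])
    return devices


def has_balloon(cmdline):
    """True if any -device driver name starts with 'virtio-balloon'."""
    return any(name.startswith("virtio-balloon") for name in list_devices(cmdline))
```

Applied to the command line above, `list_devices` would include `virtio-balloon-pci` among the drivers, so `has_balloon` returns `True` for this guest.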

Comment 3 Michel Normand 2019-07-25 13:06:03 UTC
(In reply to Michel Normand from comment #1)
> ...
> iso 20190717 with kernel 5.3.0-0.rc0.git4.1.fc31.ppc64le also failed
> similarly.
> 
> iso 20190707 with kernel 5.2.0-0.rc7.git1.1.fc31.ppc64le do not have such
> problem.
> 
> I will do trials with other composes between the two.

After some trials, I identified two successive composes, listed with their kernel versions:

 compose 20190712: passed   Kernel 5.3.0-0.rc0.git3.1.fc31.ppc64le
 compose 20190713: failure  Kernel 5.3.0-0.rc0.git4.1.fc31.ppc64le

Comment 4 Laura Abbott 2019-07-25 14:22:23 UTC
Looks like something that should be fixed by https://lore.kernel.org/kvm/1563442040-13510-1-git-send-email-wei.w.wang@intel.com/, not sure if this is in tree yet.

Comment 5 Michel Normand 2019-07-26 07:51:26 UTC
(In reply to Laura Abbott from comment #4)
> Looks like something that should be fixed by
> https://lore.kernel.org/kvm/1563442040-13510-1-git-send-email-wei.w.wang@intel.com/,
> not sure if this is in tree yet.

Thank you,
I would be interested to know when the fix will be available in Fedora.

