Created attachment 1593346: rawhide_20190724_install_failure_console.log

The rawhide 20190724 install failed with an exception in kernel mode, as extracted below:

```
Starting Initialize the iW…nd/RDMA stack in the kernel...
[ 34.411966] Oops: Exception in kernel mode, sig: 5 [#1]
...
[ 34.413097] CPU: 5 PID: 639 Comm: kworker/5:2 Not tainted 5.3.0-0.rc1.git1.1.fc31.ppc64le #1
[ 34.413286] Workqueue: events_freezable update_balloon_size_func [virtio_balloon]
...
[ 34.416015] Call Trace:
[ 34.416066] [c0000000febdfaa0] [c0000000007e03fc] __list_del_entry_valid+0x8c/0x100 (unreliable)
[ 34.416292] [c0000000febdfb00] [c0000000004d5698] balloon_page_enqueue_one+0x78/0x1a0
[ 34.416461] [c0000000febdfb50] [c0000000004d5920] balloon_page_enqueue+0x50/0x80
[ 34.416631] [c0000000febdfb90] [c008000001411da8] update_balloon_size_func+0x1d0/0x400 [virtio_balloon]
[ 34.416829] [c0000000febdfc50] [c000000000168b0c] process_one_work+0x30c/0x7f0
[ 34.416995] [c0000000febdfd20] [c000000000169078] worker_thread+0x88/0x500
[ 34.417124] [c0000000febdfdb0] [c000000000174964] kthread+0x164/0x1b0
[ 34.417258] [c0000000febdfe20] [c00000000000c0cc] ret_from_kernel_thread+0x5c/0x70
[ 34.417413] Instruction dump:
[ 34.417487] 40820034 38600001 38210060 4e800020 7c0802a6 7c641b78 3c62ff59 7d455378
[ 34.417657] 38637c10 f8010070 4ba14ae5 60000000 <0fe00000> 7c0802a6 3c62ff59 38637cc0
[ 34.417818] ---[ end trace 1527c4a6204440df ]---
...
[ 34.425518] BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:38
...
[ 93.363101] BUG: workqueue lockup - pool cpus=5 node=0 flags=0x0 nice=0 stuck for 58s!
...
[ 4804.403111] BUG: workqueue lockup - pool cpus=5 node=0 flags=0x0 nice=0 stuck for 4769s!
...
[ 4841.203116] BUG: workqueue lockup - pool cpus=5 node=0 flags=0x0 nice=0 stuck for 4806s!
```

1. Please describe the problem:
* Retrieve the latest rawhide ppc64le DVD ISO, Fedora-Server-dvd-ppc64le-Rawhide-20190724.n.0.iso, and use it to install a qemu guest.
* As shown in the attached console log, the kernel reports an exception followed by workqueue lockups.

2. What is the Version-Release number of the kernel:
* In the same log, the reported kernel version is 5.3.0-0.rc1.git1.1.fc31.ppc64le

3. Did it work previously in Fedora? If so, what kernel version did the issue *first* appear? Old kernels are available for download at https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
* The last working ISO was 20190717, with kernel 5.3.0-0.rc0.git4.1.fc31.ppc64le

4. Can you reproduce this issue? If so, please provide the steps to reproduce the issue below:
* As said above, a simple install of a ppc64le guest on a ppc64le host with the above ISO.

5. Does this problem occur with the latest Rawhide kernel? To install the Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by ``sudo dnf update --enablerepo=rawhide kernel``:
* Yes

6. Are you running any modules that are not shipped directly with Fedora's kernel?:
* No

7. Please attach the kernel logs. You can get the complete kernel log for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the issue occurred on a previous boot, use the journalctl ``-b`` flag.
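For triage, the failure markers in a console log like the attached one can be pulled out with a short script. The following is a minimal sketch and not part of the original report: the function name `find_kernel_failures` and the regular expression are assumptions made for illustration, and the sample lines are copied from the log extract in the description.

```python
import re

# Matches a dmesg-style timestamp followed by an "Oops:" or "BUG:" marker.
# The pattern is an assumption based on the lines quoted in this report.
FAILURE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(Oops|BUG): (.*)$")


def find_kernel_failures(lines):
    """Return (seconds, kind, message) tuples for Oops/BUG lines."""
    hits = []
    for line in lines:
        match = FAILURE_RE.match(line.strip())
        if match:
            hits.append((float(match.group(1)), match.group(2), match.group(3)))
    return hits


sample = [
    "[ 34.411966] Oops: Exception in kernel mode, sig: 5 [#1]",
    "[ 34.416015] Call Trace:",
    "[ 34.425518] BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:38",
    "[ 93.363101] BUG: workqueue lockup - pool cpus=5 node=0 flags=0x0 nice=0 stuck for 58s!",
]

hits = find_kernel_failures(sample)
for seconds, kind, message in hits:
    print(f"{seconds:>10.3f}s {kind}: {message}")
```

The same function accepts an open file object, so it could be pointed at the attached console log, although a real serial console capture may contain control characters that need stripping first.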
The information in point 3 above is wrong: ISO 20190717 with kernel 5.3.0-0.rc0.git4.1.fc31.ppc64le also failed in the same way. ISO 20190707 with kernel 5.2.0-0.rc7.git1.1.fc31.ppc64le does not have this problem. I will try the other composes between the two.
Created attachment 1593387: virsh_dumpxml_rawhide.xml

To complete point 4 of the initial description: the qemu guest is created via virt-manager. The attached XML is the virt-manager configuration used to create my qemu guest. It generates this qemu command line:

```
/usr/bin/qemu-system-ppc64 \
  -name guest=rawhide,debug-threads=on \
  -S \
  -object secret,id=masterKey0,format=raw,file=/home/normand/.config/libvirt/qemu/lib/domain-1-rawhide/master-key.aes \
  -machine pseries-2.2,accel=kvm,usb=off,dump-guest-core=off \
  -m 32768 \
  -realtime mlock=off \
  -smp 8,sockets=1,cores=2,threads=4 \
  -uuid 32591c31-1a49-4d44-a492-7517ec8049d4 \
  -display none \
  -no-user-config \
  -nodefaults \
  -chardev socket,id=charmonitor,path=/home/normand/.config/libvirt/qemu/lib/domain-1-rawhide/monitor.sock,server,nowait \
  -mon chardev=charmonitor,id=monitor,mode=control \
  -rtc base=utc \
  -no-shutdown \
  -boot strict=on \
  -device qemu-xhci,id=usb,bus=pci.0,addr=0x1 \
  -device spapr-vscsi,id=scsi0,reg=0x2000 \
  -drive file=/home/normand/images/rawhide.disk1.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none \
  -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=2 \
  -drive file=http://sf1.test.toulouse-stg.fr.ibm.com:80/pub/linux/fedora/rawhide/ppc64le/iso/latest,format=raw,if=none,id=drive-scsi0-0-0-4,readonly=on \
  -device scsi-cd,bus=scsi0.0,channel=0,scsi-id=0,lun=4,drive=drive-scsi0-0-0-4,id=scsi0-0-0-4,bootindex=1 \
  -netdev tap,fd=29,id=hostnet0,vhost=on,vhostfd=28 \
  -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:e0:1c:a4,bus=pci.0,addr=0x2 \
  -chardev pty,id=charserial0 \
  -device spapr-vty,chardev=charserial0,id=serial0,reg=0x30001000 \
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 \
  -msg timestamp=on
```
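Since the trace in the description points at `virtio_balloon`, it can help to confirm which devices the generated command line actually attaches. This is a minimal sketch, assuming the command line is available as a string; `qemu_devices` is a hypothetical helper, and the shortened command below keeps only a few of the options from the full line above.

```python
import shlex


def qemu_devices(cmdline):
    """Return the device model named by each -device option."""
    args = shlex.split(cmdline)
    models = []
    for flag, value in zip(args, args[1:]):
        if flag == "-device":
            # The model name is the first comma-separated field.
            models.append(value.split(",", 1)[0])
    return models


# Shortened copy of the command line from this comment (illustration only).
cmdline = (
    "/usr/bin/qemu-system-ppc64 "
    "-machine pseries-2.2,accel=kvm,usb=off,dump-guest-core=off "
    "-m 32768 "
    "-device spapr-vscsi,id=scsi0,reg=0x2000 "
    "-device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:e0:1c:a4,bus=pci.0,addr=0x2 "
    "-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 "
    "-msg timestamp=on"
)

devices = qemu_devices(cmdline)
print(devices)
print("balloon device present:", "virtio-balloon-pci" in devices)
```

The guest here does carry a `virtio-balloon-pci` device, which is consistent with the `update_balloon_size_func` work item seen in the oops.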
(In reply to Michel Normand from comment #1)
> ...
> iso 20190717 with kernel 5.3.0-0.rc0.git4.1.fc31.ppc64le also failed
> similarly.
>
> iso 20190707 with kernel 5.2.0-0.rc7.git1.1.fc31.ppc64le do not have such
> problem.
>
> I will do trials with other composes between the two.

After some trials I identified two successive composes, with their kernel versions, that bracket the regression:

* compose 20190712: passed, kernel 5.3.0-0.rc0.git3.1.fc31.ppc64le
* compose 20190713: failed, kernel 5.3.0-0.rc0.git4.1.fc31.ppc64le
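The compose results gathered so far can be summarized as a search for the first failing compose. This is a minimal sketch, with `first_failure` as a hypothetical helper; the compose dates are read as 2019 dates to match the ISO names used elsewhere in this report.

```python
# (compose date, kernel, install passed?) as reported in this bug.
results = [
    ("20190707", "5.2.0-0.rc7.git1.1.fc31.ppc64le", True),
    ("20190712", "5.3.0-0.rc0.git3.1.fc31.ppc64le", True),
    ("20190713", "5.3.0-0.rc0.git4.1.fc31.ppc64le", False),
    ("20190717", "5.3.0-0.rc0.git4.1.fc31.ppc64le", False),
    ("20190724", "5.3.0-0.rc1.git1.1.fc31.ppc64le", False),
]


def first_failure(results):
    """Return the (last_good, first_bad) entries, ordered by compose date."""
    ordered = sorted(results)
    for previous, current in zip(ordered, ordered[1:]):
        if previous[2] and not current[2]:
            return previous, current
    return None


last_good, first_bad = first_failure(results)
print("last good:", last_good[0], last_good[1])
print("first bad:", first_bad[0], first_bad[1])
```

This narrows the regression to the kernel changes between 5.3.0-0.rc0.git3 and 5.3.0-0.rc0.git4.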
Looks like something that should be fixed by https://lore.kernel.org/kvm/1563442040-13510-1-git-send-email-wei.w.wang@intel.com/, not sure if this is in tree yet.
(In reply to Laura Abbott from comment #4)
> Looks like something that should be fixed by
> https://lore.kernel.org/kvm/1563442040-13510-1-git-send-email-wei.w.wang@intel.com/,
> not sure if this is in tree yet.

Thank you. I would be interested to know when the fix will be available in Fedora.