Bug 1747110
| Summary: | llvmpipe-4[4678]: unhandled signal 11 at 00007fffffff0000 nip 00007fff942b000c lr 00007fff868aca24 code 1 hit coredump when boot guest | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Zhenyu Zhang <zhenyzha> | ||||
| Component: | mesa | Assignee: | Ben Crocker <bcrocker> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Desktop QE <desktop-qa-list> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | medium | ||||||
| Version: | 8.1 | CC: | bcrocker, coli, csoriano, dgibson, jkoten, juzhang, knoel, lvivier, mdeng, micai, ngu, qzhang, tpelka, virt-maint, xuma, xuwei, yhong | ||||
| Target Milestone: | rc | Keywords: | Reopened | ||||
| Target Release: | 8.2 | Flags: | knoel: mirror+ | ||||
| Hardware: | ppc64le | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2020-04-28 15:41:41 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
Additional info, second scenario:
I/O errors and a core dump are triggered when booting a guest with a qcow2/raw image and running mkfs.xfs on POWER8,
but echo $? shows 0, and there are no I/O error messages.
1. Boot the guest:
/usr/libexec/qemu-kvm \
-name 'zhenyzha' \
-machine pseries \
-device VGA,bus=pci.0,addr=0x2 \
-nodefaults \
-chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1,server,nowait \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-chardev socket,nowait,path=/var/tmp/serial-serial0,id=chardev_serial0,server \
-device spapr-vty,id=serial0,reg=0x30000000,chardev=chardev_serial0 \
-device qemu-xhci,id=usb1,bus=pci.0,addr=0x3 \
-drive id=drive_image1,format=qcow2,if=none,aio=native,media=disk,cache=none,werror=stop,rerror=stop,file=rhel810-ppc64le-virtio.qcow2 \
-device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,write-cache=on,bus=pci.0,addr=0x4 \
-device virtio-net-pci,mac=8a:43:4c:8b:de:21,id=id46J4ui,netdev=idizNzdA,bus=pci.0,addr=0x5 \
-netdev tap,id=idizNzdA,vhost=on \
-m 20G \
-vnc :30 \
-smp 64,maxcpus=64,cores=32,threads=1,sockets=2 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-rtc base=utc,clock=host \
-boot order=cdn,once=c,menu=off,strict=off \
-enable-kvm \
-monitor stdio \
-drive id=my2,format=raw,media=disk,if=none,file=storage0.raw \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=my2,id=virtio0-0-2,bootindex=1
2. Run mkfs.xfs on the data disk:
[root@dhcp70-199 ~]# ll /dev/vd*
brw-rw----. 1 root disk 252, 0 Aug 29 22:57 /dev/vda
brw-rw----. 1 root disk 252, 1 Aug 29 22:57 /dev/vda1
brw-rw----. 1 root disk 252, 2 Aug 29 22:57 /dev/vda2
brw-rw----. 1 root disk 252, 3 Aug 29 22:57 /dev/vda3
brw-rw----. 1 root disk 252, 16 Aug 29 22:57 /dev/vdb
[root@dhcp70-199 ~]# mkfs.xfs /dev/vdb
meta-data=/dev/vdb isize=512 agcount=4, agsize=1310720 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=0
= reflink=1
data = bsize=4096 blocks=5242880, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=2560, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
[root@dhcp70-199 ~]# echo $?
0   (exit status is 0)
[root@dhcp70-199 ~]# dmesg | grep vdb
[ 2.320417] virtio_blk virtio2: [vdb] 41943040 512-byte logical blocks (21.5 GB/20.0 GiB)
[ 77.148481] print_req_error: operation not supported error, dev vdb, sector 41943034 flags 3   (note: there is no "print_req_error: I/O error" message)
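The `flags` field of the print_req_error line can be decoded to see which kind of request the guest kernel reported as unsupported. Below is a minimal POSIX shell sketch. It assumes that the RHEL 8 (4.18) guest kernel prints the request's `cmd_flags` in hex, with the operation in the low 8 bits, numbered as in upstream 4.18 (0 READ, 1 WRITE, 2 FLUSH, 3 DISCARD, 9 WRITE_ZEROES). `decode_req_error` is a hypothetical helper, not an existing tool:

```shell
#!/bin/sh
# Hypothetical helper: decode a 4.18-era "print_req_error" kernel line.
# Assumes the format "... dev <name>, sector <N> flags <hex>" and that the
# low 8 bits of the flags hold the request operation (REQ_OP_*).
decode_req_error() {
    fields=$(printf '%s\n' "$1" |
        sed -n 's/.*dev \([a-z0-9]*\), sector \([0-9]*\) flags \([0-9a-fA-F]*\).*/\1 \2 \3/p')
    if [ -z "$fields" ]; then
        echo "unrecognized line" >&2
        return 1
    fi
    set -- $fields                  # $1=device, $2=sector, $3=hex flags
    op=$(( 0x$3 & 0xff ))           # request operation number
    case $op in
        0) name=READ ;;
        1) name=WRITE ;;
        2) name=FLUSH ;;
        3) name=DISCARD ;;
        9) name=WRITE_ZEROES ;;
        *) name="op $op" ;;
    esac
    echo "$1 sector=$2 op=$name"
}
```

Under those assumptions, `flags 3` here decodes to DISCARD. The `flags 8801` line quoted later from bug 1738839 decodes to a WRITE. That would be consistent with the later observation that adding `discard=off` makes this message go away.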
3. Check /var/log/messages:
Aug 29 23:07:04 dhcp70-199 kernel: print_req_error: operation not supported error, dev vdb, sector 41943034 flags 3
Aug 29 23:10:37 dhcp70-199 systemd[1]: Starting system activity accounting tool...
Aug 29 23:10:37 dhcp70-199 systemd[1]: Started system activity accounting tool.
Aug 29 23:11:02 dhcp70-199 kernel: llvmpipe-8[8793]: unhandled signal 11 at 00007fffef620000 nip 00007fff7c090020 lr 00007fff75ecb774 code 1
Aug 29 23:11:02 dhcp70-199 kernel: llvmpipe-0[8785]: unhandled signal 11 at 00007fffef620000 nip 00007fff7c090020 lr 00007fff75ecb774 code 1
Aug 29 23:11:02 dhcp70-199 kernel: llvmpipe-3[8788]: unhandled signal 11 at 00007fffef620000 nip 00007fff7c090020 lr 00007fff75ecb774 code 1
Aug 29 23:11:02 dhcp70-199 kernel: llvmpipe-9[8794]: unhandled signal 11 at 00007fffef620000 nip 00007fff7c090020 lr 00007fff75ecb774 code 1
Aug 29 23:11:02 dhcp70-199 kernel: llvmpipe-2[8787]: unhandled signal 11 at 00007fffef620000 nip 00007fff7c090020 lr 00007fff75ecca24 code 1
Aug 29 23:11:02 dhcp70-199 kernel: llvmpipe-11[8796]: unhandled signal 11 at 00007fffef620000 nip 00007fff7c090020 lr 00007fff75ecb774 code 1
Aug 29 23:11:02 dhcp70-199 kernel: llvmpipe-5[8790]: unhandled signal 11 at 00007fffef620000 nip 00007fff7c090020 lr 00007fff75ecca24 code 1
Aug 29 23:11:02 dhcp70-199 kernel: llvmpipe-12[8797]: unhandled signal 11 at 00007fffef620000 nip 00007fff7c090020 lr 00007fff75ecb774 code 1
Aug 29 23:11:02 dhcp70-199 systemd[1]: Created slice system-systemd\x2dcoredump.slice.
Aug 29 23:11:02 dhcp70-199 systemd[1]: Started Process Core Dump (PID 9824/UID 0).   (core dump)
Aug 29 23:11:03 dhcp70-199 journal[9146]: Error reading events from display: Broken pipe
Aug 29 23:11:03 dhcp70-199 gnome-session[8691]: gnome-session-binary[8691]: WARNING: Application 'org.gnome.Shell.desktop' killed by signal 11
Aug 29 23:11:03 dhcp70-199 org.gnome.Shell.desktop[8769]: (EE) failed to read Wayland events: Broken pipe
Aug 29 23:11:03 dhcp70-199 gnome-session-binary[8691]: WARNING: Application 'org.gnome.Shell.desktop' killed by signal 11
Aug 29 23:11:03 dhcp70-199 gnome-session-binary[8691]: Unrecoverable failure in required component org.gnome.Shell.desktop
Aug 29 23:11:03 dhcp70-199 journal[9176]: gsd-color: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Aug 29 23:11:03 dhcp70-199 journal[9143]: gsd-xsettings: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Aug 29 23:11:03 dhcp70-199 journal[9175]: gsd-clipboard: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Aug 29 23:11:03 dhcp70-199 journal[9124]: gsd-power: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Aug 29 23:11:03 dhcp70-199 journal[9184]: gsd-keyboard: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Aug 29 23:11:03 dhcp70-199 at-spi-bus-launcher[8843]: XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
Aug 29 23:11:03 dhcp70-199 at-spi-bus-launcher[8843]: after 21 requests (21 known processed) with 0 events remaining.
Aug 29 23:11:03 dhcp70-199 journal[9185]: gsd-media-keys: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Aug 29 23:11:03 dhcp70-199 systemd-logind[8382]: Session 1 logged out. Waiting for processes to exit.
Aug 29 23:11:03 dhcp70-199 systemd[1]: Started /run/user/42 mount wrapper.
Aug 29 23:11:03 dhcp70-199 systemd[1]: Created slice User Slice of UID 42.
Aug 29 23:11:03 dhcp70-199 systemd[1]: Starting User Manager for UID 42...
Aug 29 23:11:03 dhcp70-199 systemd-logind[8382]: New session c1 of user gdm.
Aug 29 23:11:03 dhcp70-199 systemd[1]: Started Session c1 of user gdm.
Aug 29 23:11:03 dhcp70-199 systemd-logind[8382]: Removed session 1.
Aug 29 23:11:03 dhcp70-199 systemd[9833]: Reached target Paths.
Additional info:
Tested the same steps on POWER9: hit the "operation not supported" error, but no core dump.
P9
[root@dhcp19-129-51 ~]# mkfs.xfs /dev/vdb
meta-data=/dev/vdb isize=512 agcount=4, agsize=1310720 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=0
= reflink=1
data = bsize=4096 blocks=5242880, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=2560, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
[root@dhcp19-129-51 ~]# echo $?
0
[root@dhcp19-129-51 ~]# dmesg | grep vdb
[ 1.941869] virtio_blk virtio2: [vdb] 41943040 512-byte logical blocks (21.5 GB/20.0 GiB)
[ 92.624135] print_req_error: operation not supported error, dev vdb, sector 41943034 flags 3
Check /var/log/messages:
Aug 30 00:27:48 dhcp19-129-51 systemd[1]: Started Session 4 of user root.
Aug 30 00:28:12 dhcp19-129-51 kernel: print_req_error: operation not supported error, dev vdb, sector 41943034 flags 3
Aug 30 00:30:21 dhcp19-129-51 systemd[1]: Starting system activity accounting tool...
Aug 30 00:30:21 dhcp19-129-51 systemd[1]: Started system activity accounting tool.
Aug 30 00:37:21 dhcp19-129-51 systemd[1]: Starting dnf makecache...
Aug 30 00:37:22 dhcp19-129-51 dnf[3900]: Updating Subscription Management repositories.
Aug 30 00:37:22 dhcp19-129-51 dnf[3900]: Unable to read consumer identity
Aug 30 00:37:22 dhcp19-129-51 dnf[3900]: This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
Aug 30 00:37:22 dhcp19-129-51 dnf[3900]: Metadata cache refreshed recently.
Aug 30 00:37:22 dhcp19-129-51 systemd[1]: Started dnf makecache.
Tested the same steps on x86_64; did not hit this issue.
host:
# uname -r
4.18.0-135.el8.x86_64
[root@hp-dl380g9-02 ~]# /usr/libexec/qemu-kvm -version
QEMU emulator version 4.1.0 (qemu-kvm-4.1.0-5.module+el8.1.0+4076+b5e41ebc)
guest:
# uname -r
4.18.0-138.el8.x86_64

I think this is very likely to have the same root cause as bug 1730315. They both seem to involve the disk somehow, and both end up with crashes in the GNOME graphics pipeline. Unfortunately, I have no good ideas so far about how these two things could be related. Let me see what I can find.

Is this a regression? Could you try with qemu 4.0 or 3.1?

Ok. As noted, I'm pretty sure this is a dupe of bug 1730315, but I'm mostly working with the steps from comment 1 in this bug, since those are the simplest.

I have reproduced the "operation not supported" error; however, I suspect this is not actually related to the much more serious crashes in llvmpipe. I suspect this error is just because of the scsi=off option, although it is odd that it appears on the 4.1 qemu but not the 2.12 qemu.

I haven't managed to reproduce the GNOME crashes yet, which leaves me limited options for investigating myself.

Here's a bunch of questions which I'm hoping will give me some more clues as to what's going on.

1) This bug only seems to happen on a POWER8 host, not a POWER9 host. Is that also true for bug 1730315?

2) If I understand the description in comment 1 correctly, with that variant you are not hotplugging a drive, but you are starting the machine with two drives: one for the main disk and one scratch/data disk. Can you confirm that is correct?

3) Does either problem (error message or crash) happen with the qemu-2.12 based package? Does it happen with the qemu-3.1 based package from RHEL-AV-8.0?

4) When you trigger the problem, do you have a VNC client connected to the guest? If so, have you logged into GNOME via VNC?

5) Can you reproduce the problem if you use -smp 1 instead of -smp 64?
(In reply to Laurent Vivier from comment #5)
> Is this a regression? Could you try with qemu 4.0 or 3.1?

OK, I will try and then update the results.

(In reply to David Gibson from comment #6)
> Ok. As noted, I'm pretty sure this is a dupe of bug 1730315, but I'm mostly
> working with the steps from comment 1 in this bug, since those are the
> simplest.
>
> I have reproduced the "operation not supported" error, however I suspect
> this is not actually related to the much more serious crashes in llvmpipe.
> I suspect this error is just because of the scsi=off option, although it is
> odd that it appears on the 4.1 qemu but not the 2.12 qemu.
>
> I haven't managed to reproduce the gnome crashes yet, which leaves me
> limited options for investigating myself.
>
> Here's a bunch of questions which I'm hoping will give me some more clues as
> to what's going on.
>
> 1) This bug only seems to happen on a POWER8 host, not a POWER9 host. Is
> that also true for bug 1730315
>
> 2) If I understand the description in comment 1 correctly, with that variant
> you are not hotplugging a drive, but you are starting the machine with two
> drives - one for the main disk and one scratch / data disk. Can you confirm
> that is correct?
>
> 3) Does either problem (error message or crash) happen with the qemu-2.12
> based package? Does it happen with the qemu-3.1 based package from
> RHEL-AV-8.0?
>
> 4) When you trigger the problem, do you have a VNC client connected to the
> guest? If so have you logged into gnome via VNC?
>
> 5) Can you reproduce the problem if you use -smp 1 instead of -smp 64?

OK, I will try and then update the results.

Today I tried to reproduce this issue on two other P8 hosts with the latest versions, but did not hit this issue.
qemu-2.12 (qemu-kvm-2.12.0-85.scrmod+el8.1.0+4090+e8e6ad83): did not hit this issue
qemu-3.1 (qemu-kvm-3.1.0-30.module+el8.0.1+3755+6782b0ed): did not hit this issue
qemu-4.0 (qemu-kvm-4.0.0-6.module+el8.1.0+3736+a2aefea3): did not hit this issue (but "print_req_error: operation not supported error" is displayed in dmesg)
qemu-4.1 (qemu-kvm-4.1.0-5.module+el8.1.0+4076+b5e41ebc): did not hit this issue (but "print_req_error: operation not supported error" is displayed in dmesg)
Like this:
[root@dhcp70-199 ~]# mkfs.xfs /dev/vda
meta-data=/dev/vda isize=512 agcount=4, agsize=1310720 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=0
= reflink=1
data = bsize=4096 blocks=5242880, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=2560, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
[root@dhcp70-199 ~]# echo $?
0
[root@dhcp70-199 ~]# dmesg | grep vda
[ 93.941071] virtio_blk virtio2: [vda] 41943040 512-byte logical blocks (21.5 GB/20.0 GiB)
[ 134.964851] print_req_error: operation not supported error, dev vda, sector 41943034 flags 3
host:
4.18.0-141.el8.ppc64le
# /usr/libexec/qemu-kvm -version
QEMU emulator version 4.1.0 (qemu-kvm-4.1.0-5.module+el8.1.0+4076+b5e41ebc)
guest:
4.18.0-141.el8.ppc64le
Additional info:
host:
4.18.0-138.el8.ppc64le
# /usr/libexec/qemu-kvm -version
QEMU emulator version 4.1.0 (qemu-kvm-4.1.0-5.module+el8.1.0+4076+b5e41ebc)
guest:
4.18.0-135.el8.ppc64le
The same steps were also tested on 4.18.0-138.el8.ppc64le and did not hit this issue either.
The host that reproduced the problem before is "ibm-p8-garrison-03.rhts.eng.bos.redhat.com". Its current status is: Broken.
So I suspect this problem is a hardware failure.
Interesting, that does indeed sound like it's likely to be a hardware problem. Let's keep this around for a little longer, but if it doesn't show up again, we can close as NOTABUG.

I believe the print_req_error is an unrelated issue, and possibly expected behaviour. Can you check if the print_req_error message occurs with the same steps on an x86 host? Also, does it occur if you put "scsi=on" on the second disk options?

(In reply to David Gibson from comment #10)
> Interesting, that does indeed sound like it's likely to be a hardware
> problem.
>
> Let's keep this around for a little longer, but if it doesn't show up again,
> we can close as NOTABUG.
>
>
> I believe the print_req_error is an unrelated issue, and possibly expected
> behaviour. Can you check if the print_req_error message occurs with the
> same steps on an x86 host? Also does it occur if you put "scsi=on" on the
> second disk options?

This print_req_error message doesn't appear on the x86 host:
[root@vm-198-26 ~]# mkfs.xfs /dev/vda
meta-data=/dev/vda isize=512 agcount=4, agsize=1310720 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=0
= reflink=1
data = bsize=4096 blocks=5242880, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=2560, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
[root@vm-198-26 ~]# echo $?
0
[root@vm-198-26 ~]# dmesg | grep vda
[ 127.341564] virtio_blk virtio2: [vda] 41943040 512-byte logical blocks (21.5 GB/20.0 GiB)
[root@vm-198-26 ~]#

And on ppc64le, setting scsi=on displays:
(qemu) drive_add auto id=drive_stg0,if=none,snapshot=off,cache=none,format=raw,file=storage0.raw
(qemu) device_add driver=virtio-blk-pci,scsi=on,id=stg0,drive=drive_stg0,bus=pci.0
Error: Please set scsi=off for virtio-blk devices in order to use virtio 1.0

I found that the trigger condition must be a 4k disk, and there is a bug tracking it, but that bug does not hit the core dump. So I will continue to investigate what triggers the core dump.

Bug 1738839 - I/O error when virtio-blk disk is backed by a raw image on 4k disk
--Comment 27 Thomas Huth 2019-09-03 05:07:22 UTC
The guest wants to use 512 byte sectors, but the raw disk image on the host is located on a DASD disk with 4k sectors. So it's about the way QEMU deals with the disk image on the host - which is 4k, but without the fix for this BZ, QEMU was not able to detect it properly with sparse raw files, so it tried to access the disk image in a wrong way, leading to an error which it then passed to the guest.

Sep 4 15:35:21 dhcp16-201-169 kernel: print_req_error: operation not supported error, dev vda, sector 41943034 flags 3
Sep 4 15:35:21 dhcp16-201-169 kernel: print_req_error: I/O error, dev vda, sector 41942784 flags 8801

Bug 1738839 might be the cause of the I/O error shown in the first instance here. However, I think that is unrelated to both the "print_req_error: operation not supported error" and the crash. I've cloned bug 1738839 for RHEL-AV.

So, I still strongly suspect the crashes are not related to the "operation not supported" error.
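Thomas Huth's explanation comes down to sector arithmetic. With 512-byte guest sectors on a host disk with 4096-byte sectors, a guest request is aligned to the host only if its starting sector is a multiple of 8. Below is a minimal POSIX shell sketch. The `check_4k` helper is hypothetical. Its default disk size of 41943040 sectors is the 20 GiB disk from the dmesg output above:

```shell
#!/bin/sh
# Hypothetical helper: is a 512-byte guest sector aligned to a 4 KiB host
# sector, and how many 512-byte sectors remain before the end of the disk?
check_4k() {
    sector=$1
    disk_sectors=${2:-41943040}     # 20 GiB disk from this report
    per_4k=$((4096 / 512))          # 8 guest sectors per 4 KiB host sector
    rem=$((sector % per_4k))
    to_end=$((disk_sectors - sector))
    if [ "$rem" -eq 0 ]; then
        echo "sector $sector: 4k-aligned, $to_end sectors to end"
    else
        echo "sector $sector: NOT 4k-aligned (offset $rem), $to_end sectors to end"
    fi
}
```

For the two sectors in the log lines above, it reports that the "operation not supported" request at 41943034 is not 4k-aligned and starts 6 sectors (3 KiB) before the end of the disk. The failed write at 41942784 is 4k-aligned and starts 256 sectors (128 KiB) from the end. This is only arithmetic and does not by itself show why QEMU rejected either request.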
However, because it's relatively easy to do so, I've tracked that down with a bisect, just in case it's useful:
$ git bisect log
git bisect start
# good: [32a1a94dd324d33578dca1dc96d7896a0244d768] Update version for v3.1.0 release
git bisect good 32a1a94dd324d33578dca1dc96d7896a0244d768
# bad: [9e06029aea3b2eca1d5261352e695edc1e7d7b8b] Update version for v4.1.0 release
git bisect bad 9e06029aea3b2eca1d5261352e695edc1e7d7b8b
# bad: [9e06029aea3b2eca1d5261352e695edc1e7d7b8b] Update version for v4.1.0 release
git bisect bad 9e06029aea3b2eca1d5261352e695edc1e7d7b8b
# bad: [eda1df0345f5a1e337e30367124dcb0e802bdfde] Merge remote-tracking branch 'remotes/armbru/tags/pull-pflash-2019-03-11' into staging
git bisect bad eda1df0345f5a1e337e30367124dcb0e802bdfde
# good: [467657b3b70fff20704e9aa8d7ab989e768eeb96] ppc: remove the interrupt presenters from under PowerPCCPU
git bisect good 467657b3b70fff20704e9aa8d7ab989e768eeb96
# bad: [1c9af3a9e05c1607a36df4943f8f5393d7621a91] linux-user: Enable HWCAP_ASIMDFHM, HWCAP_JSCVT
git bisect bad 1c9af3a9e05c1607a36df4943f8f5393d7621a91
# good: [7e407466b1efbd65225cc72fe09c0c5ec79df75b] Merge remote-tracking branch 'remotes/thibault/tags/samuel-thibault' into staging
git bisect good 7e407466b1efbd65225cc72fe09c0c5ec79df75b
# bad: [ca1a98042bad3fe749c5d3d882f3bd5e956b7690] tests/virtio-blk: add virtio_blk_fix_dwz_hdr() function
git bisect bad ca1a98042bad3fe749c5d3d882f3bd5e956b7690
# good: [3272752a8b51cd91d8633048bf6f844117a4879c] xics: Drop the KVM ICS class
git bisect good 3272752a8b51cd91d8633048bf6f844117a4879c
# good: [2e68b8620637a4ee8c79b5724144b726af1e261b] Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.0-20190219' into staging
git bisect good 2e68b8620637a4ee8c79b5724144b726af1e261b
# good: [ee7a883ace6f19d3b3d2c2ef7e4ca86232ab523a] block/commit: use QEMU_IOVEC_INIT_BUF
git bisect good ee7a883ace6f19d3b3d2c2ef7e4ca86232ab523a
# good: [e5863d49e41b1b4b695f854c711a55d2584ee367] hw/ide: drop iov field from IDEState
git bisect good e5863d49e41b1b4b695f854c711a55d2584ee367
# bad: [5c81161f804144b146607f890e84613a4cbad95c] virtio-blk: add "discard" and "write-zeroes" properties
git bisect bad 5c81161f804144b146607f890e84613a4cbad95c
# good: [9942586b3f1879244d51de4efb44e18b93514b9a] hw/ide: drop iov field from IDEDMA
git bisect good 9942586b3f1879244d51de4efb44e18b93514b9a
# good: [bbe8bd4d85d80442f87774d7bffaca11f2c02b9b] virtio-blk: add host_features field in VirtIOBlock
git bisect good bbe8bd4d85d80442f87774d7bffaca11f2c02b9b
# first bad commit: [5c81161f804144b146607f890e84613a4cbad95c] virtio-blk: add "discard" and "write-zeroes" properties

Based on comment 14, I suspect the cause of the "operation not supported" error is that the device is advertising support for DISCARD, but for some reason (maybe because it's a 'raw' backing image) it's not actually working. mkfs is attempting this and hitting the error.

I think this is probably unrelated to the gnome crash, but just to be sure:

zhenyzha,

Can you please retry this with 'discard=off' added to the options for the virtio-blk-pci device? That should prevent the "operation not supported" error; I'm interested to see if the crash still occurs.

(In reply to David Gibson from comment #15)
> Based on comment 14, I suspect the cause of the "operation not supported"
> error is that the device is advertising support for DISCARD, but for some
> reason (maybe because it's a 'raw' backing image) it's not actually working.
> mkfs is attempting this and hitting the error.
>
> I think this is probably unrelated to the gnome crash, but just to be sure:
>
> zhenyzha,
>
> Can you please retry this with 'discard=off' added to the options for the
> virtio-blk-pci device. That should prevent the "operation not supported"
> error, I'm interested to see if the crash still occurs.

OK, I will try and then update the results.

> zhenyzha,
>
> Can you please retry this with 'discard=off' added to the options for the
> virtio-blk-pci device. That should prevent the "operation not supported"
> error, I'm interested to see if the crash still occurs.
With 'discard=off' added, the "operation not supported" error does not occur:
(qemu) drive_add auto id=drive_stg0,if=none,snapshot=off,cache=none,format=raw,file=storage0.raw
OK
(qemu) device_add driver=virtio-blk-pci,discard=off,id=stg0,drive=drive_stg0,bus=pci.0
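The hotplug above sets `discard=off` through `device_add`. The same virtio-blk-pci property should presumably also be settable at boot time. Below is a sketch of the two data-disk options from the command line in step 1, with only `discard=off` appended. It assumes the rest of the command line is unchanged and has not been tested in this bug:

```shell
# Data-disk options from step 1; only discard=off is new.
-drive id=my2,format=raw,media=disk,if=none,file=storage0.raw \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=my2,id=virtio0-0-2,bootindex=1,discard=off
```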
[root@localhost ~]# mkfs.xfs /dev/vda
meta-data=/dev/vda isize=512 agcount=4, agsize=1310720 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=0
= reflink=1
data = bsize=4096 blocks=5242880, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=2560, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
[root@localhost ~]# echo $?
0
[root@localhost ~]# dmesg | grep vda
[ 100.561394] virtio_blk virtio2: [vda] 41943040 512-byte logical blocks (21.5 GB/20.0 GiB)
[root@localhost ~]#
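From inside the guest, one way to see whether the virtio-blk device advertises discard at all is the block queue's `discard_max_bytes` attribute in sysfs. The kernel documents this attribute as 0 for devices without discard support. Below is a minimal POSIX shell sketch. `discard_supported` is a hypothetical helper, and so is its optional sysfs-root argument, which exists only so the function can be tested against a fake directory. The output with and without `discard=off` was not checked in this bug:

```shell
#!/bin/sh
# Hypothetical helper: report whether a block device advertises discard,
# based on /sys/block/<dev>/queue/discard_max_bytes (0 = no discard).
# The second argument overrides the sysfs root (default /sys) for testing.
discard_supported() {
    dev=$1
    root=${2:-/sys}
    f="$root/block/$dev/queue/discard_max_bytes"
    if [ ! -r "$f" ]; then
        echo "$dev: cannot read $f" >&2
        return 2
    fi
    max=$(cat "$f")
    if [ "$max" -gt 0 ]; then
        echo "$dev: discard advertised (max $max bytes)"
    else
        echo "$dev: discard not advertised"
    fi
}
```

In the guest this would be run as `discard_supported vda`.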
With 'discard=off' removed, the "operation not supported" error occurs.
host:4.18.0-141.el8.ppc64le
qemu-kvm-4.1.0-7.module+el8.1.0+4177+896cb282
# fdisk -l
Disk /dev/sda: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
guest:4.18.0-141.el8.ppc64le
However, I don't currently have a host with a 4k disk; I will find one today to verify whether the core dump is triggered.
On the P8 host with a 4k disk, with the latest qemu-kvm-core-4.1.0-8 and kernel 4.18.0-141.el8.ppc64le:
with 'discard=off' added, neither the "operation not supported" error nor the core dump is hit;
with 'discard=off' removed, both the "operation not supported" error and the core dump are hit.
host: 4.18.0-141.el8.ppc64le
qemu-kvm-4.1.0-8.module+el8.1.0+4199+446e40fc
# sudo lsmcode
Version of System Firmware is FW860.50 (SV860_146) (t) FW860.50 (SV860_146) (p) FW860.50 (SV860_146) (b)
[root@ibm-p8-kvm-02-qe test]# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 160
On-line CPU(s) list: 0,8,16,24,32,40,48,56,64,72,80,88,96,104,112,120,128,136,144,152
Off-line CPU(s) list: 1-7,9-15,17-23,25-31,33-39,41-47,49-55,57-63,65-71,73-79,81-87,89-95,97-103,105-111,113-119,121-127,129-135,137-143,145-151,153-159
Thread(s) per core: 1
Core(s) per socket: 5
Socket(s): 4
NUMA node(s): 4
Model: 2.1 (pvr 004b 0201)
Model name: POWER8E (raw), altivec supported
CPU max MHz: 3690.0000
CPU min MHz: 2061.0000
L1d cache: 64K
L1i cache: 32K
L2 cache: 512K
L3 cache: 8192K
NUMA node0 CPU(s): 0,8,16,24,32
NUMA node1 CPU(s): 40,48,56,64,72
NUMA node16 CPU(s): 80,88,96,104,112
NUMA node17 CPU(s): 120,128,136,144,152
guest: 4.18.0-141.el8.ppc64le
[root@ibm-p8-kvm-02-qe test]# smartctl --all /dev/sda
smartctl 6.6 2017-11-05 r4594 [ppc64le-linux-4.18.0-141.el8.ppc64le] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor: IBM
Product: IPR-0 5DAA8400
Revision:
User Capacity: 283,794,997,248 bytes [283 GB]
Logical block size: 4096 bytes
Device type: disk
Local Time is: Tue Sep 10 22:27:22 2019 EDT
SMART support is: Unavailable - device lacks SMART capability.
----------------------------------------------------------------
With discard=off removed:
[root@dhcp16-201-169 ~]# dmesg | grep vda
[ 116.559518] virtio_blk virtio2: [vda] 41943040 512-byte logical blocks (21.5 GB/20.0 GiB)
[ 133.965322] print_req_error: operation not supported error, dev vda, sector 41943034 flags 3
Check /var/log/messages:
Sep 11 10:34:11 dhcp16-201-169 kernel: pci 0000:00:00.0: BAR 0: assigned [io 0x10400-0x1047f]
Sep 11 10:34:11 dhcp16-201-169 kernel: virtio-pci 0000:00:00.0: enabling device (0000 -> 0003)
Sep 11 10:34:11 dhcp16-201-169 kernel: virtio-pci 0000:00:00.0: Using 64-bit direct DMA at offset 800000000000000
Sep 11 10:34:11 dhcp16-201-169 kernel: virtio_blk virtio2: [vda] 41943040 512-byte logical blocks (21.5 GB/20.0 GiB)
Sep 11 10:34:29 dhcp16-201-169 kernel: print_req_error: operation not supported error, dev vda, sector 41943034 flags 3
Sep 11 10:37:31 dhcp16-201-169 kernel: llvmpipe-13[8986]: unhandled signal 11 at 00007ffffbfc0000 nip 00007fff88030020 lr 00007fff81e7b774 code 1
Sep 11 10:37:31 dhcp16-201-169 kernel: llvmpipe-5[8978]: unhandled signal 11 at 00007ffffbfc0000 nip 00007fff88030020 lr 00007fff81e7b774 code 1
Sep 11 10:37:31 dhcp16-201-169 kernel: llvmpipe-3[8976]: unhandled signal 11 at 00007ffffbfc0000 nip 00007fff88030020 lr 00007fff81e7b774 code 1
Sep 11 10:37:31 dhcp16-201-169 kernel: llvmpipe-4[8977]: unhandled signal 11 at 00007ffffbfc0000 nip 00007fff88030020 lr 00007fff81e7b774 code 1
Sep 11 10:37:31 dhcp16-201-169 kernel: llvmpipe-9[8982]: unhandled signal 11 at 00007ffffbfc0000 nip 00007fff88030020 lr 00007fff81e7b774 code 1
Sep 11 10:37:31 dhcp16-201-169 kernel: llvmpipe-11[8984]: unhandled signal 11 at 00007ffffbfc0000 nip 00007fff88030020 lr 00007fff81e7b774 code 1
Sep 11 10:37:31 dhcp16-201-169 kernel: llvmpipe-14[8987]: unhandled signal 11 at 00007ffffbfc0000 nip 00007fff88030020 lr 00007fff81e7b774 code 1
Sep 11 10:37:31 dhcp16-201-169 kernel: llvmpipe-2[8975]: unhandled signal 11 at 00007ffffbfc0000 nip 00007fff88030020 lr 00007fff81e7ca24 code 1
Sep 11 10:37:31 dhcp16-201-169 kernel: llvmpipe-6[8979]: unhandled signal 11 at 00007ffffbfc0000 nip 00007fff88030020 lr 00007fff81e7b774 code 1
Sep 11 10:37:31 dhcp16-201-169 kernel: llvmpipe-8[8981]: unhandled signal 11 at 00007ffffbfc0000 nip 00007fff88030020 lr 00007fff81e7b774 code 1
Sep 11 10:37:31 dhcp16-201-169 systemd[1]: Created slice system-systemd\x2dcoredump.slice.
Sep 11 10:37:31 dhcp16-201-169 systemd[1]: Started Process Core Dump (PID 9994/UID 0).   (core dump)
Sep 11 10:37:33 dhcp16-201-169 journal[9746]: Error reading events from display: Broken pipe
Sep 11 10:37:33 dhcp16-201-169 journal[9676]: Error reading events from display: Broken pipe
Sep 11 10:37:33 dhcp16-201-169 journal[9521]: Error reading events from display: Broken pipe
Sep 11 10:37:33 dhcp16-201-169 gnome-session[8775]: gnome-session-binary[8775]: WARNING: App 'org.gnome.SettingsDaemon.Wacom.desktop' exited with code 1
Sep 11 10:37:33 dhcp16-201-169 gnome-session-binary[8775]: WARNING: App 'org.gnome.SettingsDaemon.Wacom.desktop' exited with code 1
Sep 11 10:37:33 dhcp16-201-169 journal[9997]: g_hash_table_destroy: assertion 'hash_table != NULL' failed
Sep 11 10:37:33 dhcp16-201-169 org.gnome.Shell.desktop[8833]: (EE) failed to read Wayland events: Broken pipe
Sep 11 10:37:33 dhcp16-201-169 gnome-session[8775]: gnome-session-binary[8775]: WARNING: Application 'org.gnome.Shell.desktop' killed by signal 11
Sep 11 10:37:33 dhcp16-201-169 org.gnome.SettingsDaemon.Wacom.desktop[9997]: Unable to init server: Could not connect: Connection refused
Sep 11 10:37:33 dhcp16-201-169 org.gnome.SettingsDaemon.Wacom.desktop[9997]: Cannot open display:
Sep 11 10:37:33 dhcp16-201-169 gnome-session-binary[8775]: WARNING: Application 'org.gnome.Shell.desktop' killed by signal 11
Sep 11 10:37:33 dhcp16-201-169 gnome-session-binary[8775]: Unrecoverable failure in required component org.gnome.Shell.desktop
Sep 11 10:37:33 dhcp16-201-169 journal[9550]: gsd-media-keys: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Sep 11 10:37:33 dhcp16-201-169 journal[9541]: gsd-color: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Sep 11 10:37:33 dhcp16-201-169 journal[9546]: gsd-keyboard: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Sep 11 10:37:33 dhcp16-201-169 journal[9538]: gsd-clipboard: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Sep 11 10:37:33 dhcp16-201-169 journal[9496]: gsd-power: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Sep 11 10:37:33 dhcp16-201-169 journal[9513]: gsd-xsettings: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Sep 11 10:37:33 dhcp16-201-169 at-spi-bus-launcher[9205]: XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
Sep 11 10:37:33 dhcp16-201-169 at-spi-bus-launcher[9205]: after 21 requests (21 known processed) with 0 events remaining.
Sep 11 10:37:33 dhcp16-201-169 systemd-logind[8387]: Session 1 logged out. Waiting for processes to exit.
Sep 11 10:37:33 dhcp16-201-169 systemd[1]: Created slice User Slice of UID 42.
Sep 11 10:37:33 dhcp16-201-169 systemd[1]: Started /run/user/42 mount wrapper.

(In reply to zhenyzha from comment #18)
To hit this issue, a QMP connection must be open; then wait about 5 minutes after formatting the disk:
# nc -U /var/tmp/monitor-qmpmonitor1
Without a QMP connection, waiting 10 minutes did not hit the issue.

Update test results:
When I tested on the same host, with the same kernel 4.18.0-141.el8.ppc64le and the latest qemu-kvm-4.1.0-10, this issue was not hit. "Bug 1749134 - I/O error when virtio-blk disk is backed by a raw image on 4k disk" has been VERIFIED, so I am closing this bug.

This bug has worrying implications of state corruption somewhere, which makes me somewhat uncomfortable closing it without really understanding the root cause.
On the other hand, I'm pretty stumped as to what we could do next in terms of debugging. So, given that it no longer reproduces with current packages, I reluctantly agree that CLOSED CURRENTRELEASE is the right thing to do for now.

Update test results:
When booting the guest on a P9 host, the issue was hit again, but this time the host is on a 512b disk.
kernel: 4.18.0-144.el8.ppc64le
qemu-kvm-4.1.0-9.module+el8.1.0+4210+23b2046a
guest kernel: 4.18.0-144.el8.ppc64le
# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 4
Core(s) per socket: 16
Socket(s): 2
NUMA node(s): 2
Model: 2.3 (pvr 004e 1203)
Model name: POWER9, altivec supported
CPU max MHz: 3800.0000
CPU min MHz: 2300.0000
L1d cache: 32K
L1i cache: 32K
L2 cache: 512K
L3 cache: 10240K
NUMA node0 CPU(s): 0-63
NUMA node8 CPU(s): 64-127
So I will continue to investigate the trigger and will change the bug title.

How reproducible: 2/5

1. Boot the guest, command line:
/usr/libexec/qemu-kvm \
-name 'nested-L1' \
-sandbox off \
-machine pseries,cap-nested-hv=on \
-device VGA,bus=pci.0,addr=0x2 \
-nodefaults \
-chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1,server,nowait \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-chardev socket,nowait,path=/var/tmp/serial-serial0,id=chardev_serial0,server \
-device spapr-vty,id=serial0,reg=0x30000000,chardev=chardev_serial0 \
-device qemu-xhci,id=usb1,bus=pci.0,addr=0x3 \
-drive id=drive_image1,format=qcow2,if=none,aio=native,media=disk,cache=none,werror=stop,rerror=stop,file=os.qcow2 \
-device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,write-cache=on,bus=pci.0,addr=0x4 \
-device virtio-net-pci,mac=8a:43:4c:8b:de:21,id=id46J4ui,netdev=idizNzdA,bus=pci.0,addr=0x5 \
-netdev tap,id=idizNzdA,vhost=on \
-m 100G \
-vnc :30 \
-smp 64,maxcpus=64,cores=32,threads=1,sockets=2 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-rtc base=utc,clock=host \
-boot order=cdn,once=c,menu=off,strict=off \
-enable-kvm \
-monitor stdio
2. Check /var/log/messages:
Sep 17 16:20:04 localhost systemd[1]: Starting system activity accounting tool...
Sep 17 16:20:04 localhost systemd[1]: Started system activity accounting tool.
Sep 17 16:20:54 localhost kernel: llvmpipe-12[4686]: unhandled signal 11 at 00007fffffff0000 nip 00007fff942b000c lr 00007fff868aca24 code 1
Sep 17 16:20:54 localhost kernel: llvmpipe-4[4678]: unhandled signal 11 at 00007fffffff0000 nip 00007fff942b000c lr 00007fff868aca24 code 1
Sep 17 16:20:54 localhost systemd[1]: Created slice system-systemd\x2dcoredump.slice.
Sep 17 16:20:54 localhost systemd[1]: Started Process Core Dump (PID 5884/UID 0).
Sep 17 16:20:54 localhost journal[5348]: Error reading events from display: Broken pipe
Sep 17 16:20:54 localhost gnome-session-binary[4580]: WARNING: App 'org.gnome.SettingsDaemon.Wacom.desktop' exited with code 1
Sep 17 16:20:54 localhost gnome-session[4580]: gnome-session-binary[4580]: WARNING: App 'org.gnome.SettingsDaemon.Wacom.desktop' exited with code 1
Sep 17 16:20:55 localhost journal[5886]: g_hash_table_destroy: assertion 'hash_table != NULL' failed
Sep 17 16:20:55 localhost gnome-session-binary[4580]: WARNING: Application 'org.gnome.Shell.desktop' killed by signal 11
Sep 17 16:20:55 localhost org.gnome.Shell.desktop[4665]: (EE) failed to read Wayland events: Broken pipe
Sep 17 16:20:55 localhost gnome-session[4580]: gnome-session-binary[4580]: WARNING: Application 'org.gnome.Shell.desktop' killed by signal 11
Sep 17 16:20:55 localhost gnome-session-binary[4580]: Unrecoverable failure in required component org.gnome.Shell.desktop
Sep 17 16:20:55 localhost org.gnome.SettingsDaemon.Wacom.desktop[5886]: Unable to init server: Could not connect: Connection refused
Sep 17 16:20:55 localhost org.gnome.SettingsDaemon.Wacom.desktop[5886]: Cannot open display:
Sep 17 16:20:55 localhost at-spi-bus-launcher[4816]: XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
Sep 17 16:20:55 localhost at-spi-bus-launcher[4816]: after 21 requests (21 known processed) with 0 events remaining.
Sep 17 16:20:55 localhost journal[5413]: gsd-media-keys: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Sep 17 16:20:55 localhost journal[5390]: gsd-color: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Sep 17 16:20:55 localhost journal[5342]: gsd-xsettings: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Sep 17 16:20:55 localhost journal[5332]: gsd-power: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Sep 17 16:20:55 localhost journal[5386]: gsd-clipboard: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Sep 17 16:20:55 localhost journal[5409]: gsd-keyboard: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Sep 17 16:20:55 localhost systemd-logind[4193]: Session 1 logged out. Waiting for processes to exit.
Sep 17 16:20:55 localhost systemd[1]: Started /run/user/42 mount wrapper.
Sep 17 16:20:55 localhost systemd[1]: Created slice User Slice of UID 42.
Sep 17 16:20:55 localhost systemd[1]: Starting User Manager for UID 42...
Sep 17 16:20:55 localhost systemd-logind[4193]: New session c1 of user gdm.
Sep 17 16:20:55 localhost systemd[1]: Started Session c1 of user gdm.
Sep 17 16:20:55 localhost systemd-logind[4193]: Removed session 1.
Sep 17 16:20:55 localhost systemd[5900]: Listening on Multimedia System.
Sep 17 16:20:55 localhost systemd[5900]: Reached target Paths.
Sep 17 16:20:55 localhost systemd[5900]: Reached target Timers.
Sep 17 16:20:55 localhost systemd[5900]: Listening on Sound System.
Sep 17 16:20:55 localhost systemd[5900]: Starting D-Bus User Message Bus Socket.
Sep 17 16:20:55 localhost systemd[5900]: Listening on D-Bus User Message Bus Socket.
Sep 17 16:20:55 localhost systemd[5900]: Reached target Sockets.
Sep 17 16:20:55 localhost systemd[5900]: Reached target Basic System.
Sep 17 16:20:55 localhost systemd[1]: Started User Manager for UID 42. Sep 17 16:20:55 localhost systemd[5900]: Starting Sound Service... Sep 17 16:20:55 localhost systemd[5900]: Started D-Bus User Message Bus. Sep 17 16:20:55 localhost org.gnome.Shell.desktop[5931]: pci id for fd 11: 1234:1111, driver (null) Sep 17 16:20:55 localhost systemd-coredump[5885]: Process 4665 (gnome-shell) of user 0 dumped core.#012#012Stack trace of thread 4686:#012#0 0x00007fff942b000c n/a (n/a)#012#1 0x00007fff868aca24 lp_rast_shade_quads_mask (kms_swrast_dri.so)#012#2 0x00007fff868ae168 lp_rast_triangle_1 (kms_swrast_dri.so)#012#3 0x00007fff868abcb4 rasterize_scene (kms_swrast_dri.so)#012#4 0x00007fff868ac680 thread_function (kms_swrast_dri.so)#012#5 0x00007fff868ac3d0 impl_thrd_routine (kms_swrast_dri.so)#012#6 0x00007fff9a168ae0 start_thread (libpthread.so.0)#012#7 0x00007fff9b51e8f8 __clone (libc.so.6) Sep 17 16:20:55 localhost journal[5931]: Failed to initialize accelerated iGPU/dGPU framebuffer sharing: Do not want to use software renderer (llvmpipe (LLVM 8.0, 128 bits)), falling back to CPU copy path Sep 17 16:20:55 localhost systemd[5900]: Started Sound Service. Sep 17 16:20:55 localhost systemd[5900]: Reached target Default. Sep 17 16:20:55 localhost systemd[5900]: Startup finished in 622ms. 
Sep 17 16:20:56 localhost org.gnome.Shell.desktop[5931]: glamor: No eglstream capable devices found Sep 17 16:20:56 localhost org.gnome.Shell.desktop[5931]: glamor: 'wl_drm' not supported Sep 17 16:20:56 localhost org.gnome.Shell.desktop[5931]: Missing Wayland requirements for glamor GBM backend Sep 17 16:20:56 localhost org.gnome.Shell.desktop[5931]: Missing Wayland requirements for glamor EGLStream backend Sep 17 16:20:56 localhost org.gnome.Shell.desktop[5931]: Failed to initialize glamor, falling back to sw Sep 17 16:20:56 localhost dbus-daemon[5914]: [session uid=42 pid=5914] Activating via systemd: service name='org.a11y.Bus' unit='at-spi-dbus-bus.service' requested by ':1.15' (uid=42 pid=5931 comm="/usr/bin/gnome-shell " label="system_u:system_r:xdm_t:s0-s0:c0.c1023") Sep 17 16:20:56 localhost systemd[5900]: Starting Accessibility services bus... Sep 17 16:20:56 localhost dbus-daemon[5914]: [session uid=42 pid=5914] Successfully activated service 'org.a11y.Bus' Sep 17 16:20:56 localhost systemd[5900]: Started Accessibility services bus. 
Sep 17 16:20:56 localhost at-spi-bus-launcher[5999]: dbus-daemon[6004]: Activating service name='org.a11y.atspi.Registry' requested by ':1.0' (uid=42 pid=5931 comm="/usr/bin/gnome-shell " label="system_u:system_r:xdm_t:s0-s0:c0.c1023") Sep 17 16:20:56 localhost at-spi-bus-launcher[5999]: dbus-daemon[6004]: Successfully activated service 'org.a11y.atspi.Registry' Sep 17 16:20:56 localhost at-spi-bus-launcher[5999]: SpiRegistry daemon is running with well-known name - org.a11y.atspi.Registry Sep 17 16:20:56 localhost journal[5931]: drmModeSetCursor2 failed with (No such device or address), drawing cursor with OpenGL from now on Sep 17 16:20:56 localhost dbus-daemon[4099]: [system] Activating via systemd: service name='org.freedesktop.locale1' unit='dbus-org.freedesktop.locale1.service' requested by ':1.112' (uid=42 pid=5931 comm="/usr/bin/gnome-shell " label="system_u:system_r:xdm_t:s0-s0:c0.c1023") Sep 17 16:20:56 localhost systemd[1]: Starting Locale Service... Sep 17 16:20:56 localhost dbus-daemon[5914]: [session uid=42 pid=5914] Activating service name='org.freedesktop.portal.IBus' requested by ':1.18' (uid=42 pid=6019 comm="ibus-daemon --xim --panel disable " label="system_u:system_r:xdm_t:s0-s0:c0.c1023") Sep 17 16:20:56 localhost dbus-daemon[5914]: [session uid=42 pid=5914] Activating via systemd: service name='org.freedesktop.impl.portal.PermissionStore' unit='xdg-permission-store.service' requested by ':1.13' (uid=42 pid=5931 comm="/usr/bin/gnome-shell " label="system_u:system_r:xdm_t:s0-s0:c0.c1023") Sep 17 16:20:56 localhost systemd[5900]: Starting sandboxed app permission store... Sep 17 16:20:56 localhost dbus-daemon[5914]: [session uid=42 pid=5914] Successfully activated service 'org.freedesktop.portal.IBus' Sep 17 16:20:56 localhost dbus-daemon[5914]: [session uid=42 pid=5914] Successfully activated service 'org.freedesktop.impl.portal.PermissionStore' Sep 17 16:20:56 localhost systemd[5900]: Started sandboxed app permission store. 
With the upstream qemu:
89ea03a7dc83ca36b670ba7f787802791fcb04b1: hotplug + mkfs did not hit this issue
boot guest, ssh + qmp + serial port + TigerVNC connection: did not hit this issue
install L2 guest: did not hit this issue
I will run the basic case overnight and see what happens.
89ea03a7dc83ca36b670ba7f787802791fcb04b1 did not hit this issue after running the basic case overnight.

qemu-kvm-4.1.0-10.module+el8.1.0+4234+33aa4f57 hit this issue.
host: 4.18.0-145.el8.ppc64le
qemu-kvm-4.1.0-10.module+el8.1.0+4234+33aa4f57
[root@ibm-p9wr-07 ~]# update_flash_nv -d
Firmware version:
Product Version : witherspoon-ibm-OP9-v2.0.8-2.7
Product Extra : bmc-firmware-version-2.01
Product Extra : buildroot-2018.02.1-6-ga8d1126
Product Extra : capp-ucode-p9-dd2-v4
Product Extra : hcode-hw082318a.op920
Product Extra : hostboot-d033213-pf8eafb0
Product Extra : hostboot-binaries-hw080418a.op920
Product Extra : linux-4.16.13-openpower1-pcc4b089
Product Extra : machine-xml-7cd20a6
Product Extra : occ-084756c
Product Extra : petitboot-v1.7.2-p26e7ade
Product Extra : sbe-55d6eb2
Product Extra : skiboot-v6.0.8
# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 176
On-line CPU(s) list: 0-175
Thread(s) per core: 4
Core(s) per socket: 22
Socket(s): 2
NUMA node(s): 6
Model: 2.2 (pvr 004e 1202)
Model name: POWER9, altivec supported
CPU max MHz: 3800.0000
CPU min MHz: 2300.0000
L1d cache: 32K
L1i cache: 32K
L2 cache: 512K
L3 cache: 10240K
NUMA node0 CPU(s): 0-87
NUMA node8 CPU(s): 88-175
NUMA node252 CPU(s):
NUMA node253 CPU(s):
NUMA node254 CPU(s):
NUMA node255 CPU(s):
guest: 4.18.0-145.el8.ppc64le

How reproducible: 3/5

Steps to Reproduce:
1. Boot the guest:
/usr/libexec/qemu-kvm \
-name 'nested-L1' \
-machine pseries,cap-nested-hv=on \
-nodefaults \
-device VGA,bus=pci.0,addr=0x2 \
-chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1,server,nowait \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-chardev socket,nowait,path=/var/tmp/serial-serial0,id=chardev_serial0,server \
-device spapr-vty,id=serial0,reg=0x30000000,chardev=chardev_serial0 \
-device qemu-xhci,id=usb1,bus=pci.0,addr=0x3 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,addr=0x4 \
-drive id=drive_image1,format=qcow2,if=none,aio=native,media=disk,cache=none,werror=stop,rerror=stop,file=os.qcow2 \
-device scsi-hd,drive=drive_image1,bus=virtio_scsi_pci0.0,bootindex=0 \
-device virtio-net-pci,mac=9a:33:4d:7a:de:21,id=id46J4ui,netdev=idizNzdA,bus=pci.0,addr=0x5 \
-netdev tap,id=idizNzdA,vhost=on \
-m 100G \
-smp 60,maxcpus=60,cores=30,threads=1,sockets=2 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-vnc :30 \
-rtc base=utc,clock=host \
-boot order=cdn,once=c,menu=off,strict=off \
-enable-kvm \
-monitor stdio
2. nc -U /var/tmp/serial-serial0 (QMP and TigerVNC not connected) and check the guest IP address.
3. Connect to the guest over ssh and wait about 10 minutes.
[ 312.006163] llvmpipe-2[4415]: unhandled signal 11 at 000080000e540000 nip 00007fff9a41000c lr 00007fff9825ca24 code 1
[ 312.006175] llvmpipe-11[4424]: unhandled signal 11 at 000080000e540000 nip 00007fff9a41000c lr 00007fff9825b774 code 1
[ 312.006203] llvmpipe-7[4420]: unhandled signal 11 at 000080000e540000 nip 00007fff9a41000c lr 00007fff9825b774 code 1
[ 312.007277] llvmpipe-3[4416]: unhandled signal 11 at 000080000e540000 nip 00007fff9a41000c lr 00007fff9825b774 code 1
[ 312.008124] llvmpipe-15[4428]: unhandled signal 11 at 000080000e540000 nip 00007fff9a41000c lr 00007fff9825b774 code 1
Checking /var/log/messages also shows the issue:
Sep 23 14:53:34 dhcp19-129-51 kernel: llvmpipe-3[4416]: unhandled signal 11 at 000080000e540000 nip 00007fff9a41000c lr 00007fff9825b774 code 1
Sep 23 14:53:34 dhcp19-129-51 kernel: llvmpipe-15[4428]: unhandled signal 11 at 000080000e540000 nip 00007fff9a41000c lr 00007fff9825b774 code 1
Sep 23 14:53:34 dhcp19-129-51 systemd[1]: Created slice system-systemd\x2dcoredump.slice.
Sep 23 14:53:34 dhcp19-129-51 systemd[1]: Started Process Core Dump (PID 5476/UID 0).
Sep 23 14:53:35 dhcp19-129-51 journal[5036]: Error reading events from display: Broken pipe Sep 23 14:53:35 dhcp19-129-51 gnome-session[4336]: gnome-session-binary[4336]: WARNING: App 'org.gnome.SettingsDaemon.Wacom.desktop' exited with code 1 Sep 23 14:53:35 dhcp19-129-51 gnome-session-binary[4336]: WARNING: App 'org.gnome.SettingsDaemon.Wacom.desktop' exited with code 1 Sep 23 14:53:35 dhcp19-129-51 org.gnome.Shell.desktop[4404]: (EE) failed to read Wayland events: Broken pipe Sep 23 14:53:35 dhcp19-129-51 gnome-session[4336]: gnome-session-binary[4336]: WARNING: Application 'org.gnome.Shell.desktop' killed by signal 11 Sep 23 14:53:35 dhcp19-129-51 gnome-session-binary[4336]: WARNING: Application 'org.gnome.Shell.desktop' killed by signal 11 Sep 23 14:53:35 dhcp19-129-51 journal[5479]: g_hash_table_destroy: assertion 'hash_table != NULL' failed Sep 23 14:53:35 dhcp19-129-51 gnome-session-binary[4336]: Unrecoverable failure in required component org.gnome.Shell.desktop Sep 23 14:53:35 dhcp19-129-51 journal[5060]: gsd-color: Fatal IO error 11 (Resource temporarily unavailable) on X server :0. Sep 23 14:53:35 dhcp19-129-51 at-spi-bus-launcher[4529]: XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0" Sep 23 14:53:35 dhcp19-129-51 at-spi-bus-launcher[4529]: after 21 requests (21 known processed) with 0 events remaining. Sep 23 14:53:35 dhcp19-129-51 org.gnome.SettingsDaemon.Wacom.desktop[5479]: Unable to init server: Could not connect: Connection refused Sep 23 14:53:35 dhcp19-129-51 org.gnome.SettingsDaemon.Wacom.desktop[5479]: Cannot open display: Sep 23 14:53:35 dhcp19-129-51 journal[5068]: gsd-keyboard: Fatal IO error 11 (Resource temporarily unavailable) on X server :0. Sep 23 14:53:35 dhcp19-129-51 journal[5058]: gsd-clipboard: Fatal IO error 11 (Resource temporarily unavailable) on X server :0. Sep 23 14:53:35 dhcp19-129-51 journal[5071]: gsd-media-keys: Fatal IO error 11 (Resource temporarily unavailable) on X server :0. 
Sep 23 14:53:35 dhcp19-129-51 journal[5034]: gsd-xsettings: Fatal IO error 11 (Resource temporarily unavailable) on X server :0. Sep 23 14:53:35 dhcp19-129-51 journal[5020]: gsd-power: Fatal IO error 11 (Resource temporarily unavailable) on X server :0. Sep 23 14:53:35 dhcp19-129-51 journal[4812]: GChildWatchSource: Exit status of a child process was requested but ECHILD was received by waitpid(). See the documentation of g_child_watch_source_new() for possible causes. Sep 23 14:53:35 dhcp19-129-51 systemd-logind[3956]: Session 1 logged out. Waiting for processes to exit. Sep 23 14:53:35 dhcp19-129-51 systemd[1]: Created slice User Slice of UID 42. Sep 23 14:53:35 dhcp19-129-51 systemd[1]: Started /run/user/42 mount wrapper. Sep 23 14:53:35 dhcp19-129-51 systemd[1]: Starting User Manager for UID 42... Sep 23 14:53:35 dhcp19-129-51 systemd-logind[3956]: New session c1 of user gdm. Sep 23 14:53:35 dhcp19-129-51 systemd[1]: Started Session c1 of user gdm. Sep 23 14:53:35 dhcp19-129-51 systemd-logind[3956]: Removed session 1. Sep 23 14:53:35 dhcp19-129-51 systemd[5491]: Listening on Multimedia System. Sep 23 14:53:35 dhcp19-129-51 systemd[5491]: Reached target Paths. Sep 23 14:53:35 dhcp19-129-51 systemd[5491]: Reached target Timers. Sep 23 14:53:35 dhcp19-129-51 systemd[5491]: Starting D-Bus User Message Bus Socket. Sep 23 14:53:35 dhcp19-129-51 systemd[5491]: Listening on Sound System. Sep 23 14:53:35 dhcp19-129-51 systemd[5491]: Listening on D-Bus User Message Bus Socket. Sep 23 14:53:35 dhcp19-129-51 systemd[5491]: Reached target Sockets. Sep 23 14:53:35 dhcp19-129-51 systemd[5491]: Reached target Basic System. Sep 23 14:53:35 dhcp19-129-51 systemd[1]: Started User Manager for UID 42. Sep 23 14:53:35 dhcp19-129-51 systemd[5491]: Starting Sound Service... Sep 23 14:53:35 dhcp19-129-51 systemd[5491]: Started D-Bus User Message Bus. 
Sep 23 14:53:35 dhcp19-129-51 systemd-coredump[5477]: Process 4404 (gnome-shell) of user 0 dumped core.#012#012Stack trace of thread 4424:#012#0 0x00007fff9a41000c n/a (n/a)#012#1 0x00007fff9825b774 lp_rast_shade_tile (kms_swrast_dri.so)#012#2 0x00007fff9825bcb4 rasterize_scene (kms_swrast_dri.so)#012#3 0x00007fff9825c680 thread_function (kms_swrast_dri.so)#012#4 0x00007fff9825c3d0 impl_thrd_routine (kms_swrast_dri.so)#012#5 0x00007fffa02c8ae0 start_thread (libpthread.so.0)#012#6 0x00007fffa167e8f8 __clone (libc.so.6)
Sep 23 14:53:35 dhcp19-129-51 org.gnome.Shell.desktop[5528]: pci id for fd 12: 1234:1111, driver (null)
Sep 23 14:53:36 dhcp19-129-51 systemd[5491]: Started Sound Service.
Sep 23 14:53:36 dhcp19-129-51 systemd[5491]: Reached target Default.
Sep 23 14:53:36 dhcp19-129-51 systemd[5491]: Startup finished in 499ms.

qemu-kvm-4.1.0-11.module+el8.1.0+4250+4f5fbfdc hit this issue.
host: 4.18.0-145.el8.ppc64le
qemu-kvm-4.1.0-11.module+el8.1.0+4250+4f5fbfdc
[root@ibm-p9wr-07 ~]# update_flash_nv -d
Firmware version:
Product Version : witherspoon-ibm-OP9-v2.0.8-2.7
Product Extra : bmc-firmware-version-2.01
Product Extra : buildroot-2018.02.1-6-ga8d1126
Product Extra : capp-ucode-p9-dd2-v4
Product Extra : hcode-hw082318a.op920
Product Extra : hostboot-d033213-pf8eafb0
Product Extra : hostboot-binaries-hw080418a.op920
Product Extra : linux-4.16.13-openpower1-pcc4b089
Product Extra : machine-xml-7cd20a6
Product Extra : occ-084756c
Product Extra : petitboot-v1.7.2-p26e7ade
Product Extra : sbe-55d6eb2
Product Extra : skiboot-v6.0.8
# lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 176
On-line CPU(s) list: 0-175
Thread(s) per core: 4
Core(s) per socket: 22
Socket(s): 2
NUMA node(s): 6
Model: 2.2 (pvr 004e 1202)
Model name: POWER9, altivec supported
CPU max MHz: 3800.0000
CPU min MHz: 2300.0000
L1d cache: 32K
L1i cache: 32K
L2 cache: 512K
L3 cache: 10240K
NUMA node0 CPU(s): 0-87
NUMA node8 CPU(s): 88-175
NUMA node252 CPU(s):
NUMA node253 CPU(s):
NUMA node254 CPU(s):
NUMA node255 CPU(s):
guest: 4.18.0-145.el8.ppc64le

Tried to bisect upstream, but there is no fix: the reproducer is not reliable, and the bisect process points to an unrelated commit (after several loops I found that even HEAD was broken).

I will try to bisect to find the commit introducing the regression.

I have reproduced the problem with qemu-3.1.0.

I have reproduced the problem with qemu-3.0.0.

Reproduced with qemu-2.12.0. I suspect the bug is not in QEMU.

Reproduced with a host installed with RHEL-7.7-updates-20190924.0 Server ppc64le:
qemu-kvm-rhev-2.12.0-33.el7_7.4.ppc64le
kernel-3.10.0-1062.3.1.el7.ppc64le

I've been able to reproduce it on the host with vncserver (tigervnc-server), so it's definitely not a virtualization bug.
[ 9166.379561] llvmpipe-3[12586]: unhandled signal 11 at 00007ffff8000000 nip 00007fff804a0134 lr 00007fff7209ca24 code 1
[ 9166.379564] llvmpipe-6[12589]: unhandled signal 11 at 00007ffff80000d8 nip 00007fff804a09b8 lr 00007fff7209b774 code 1
[ 9166.379567] llvmpipe-7[12590]: unhandled signal 11 at 00007ffff80000d8 nip 00007fff804a09b8 lr 00007fff7209b774 code 1
[ 9166.379572] llvmpipe-10[12593]: unhandled signal 11 at 00007ffff80000d8 nip 00007fff804a09b8 lr 00007fff7209b774 code 1
Checking journalctl, it seems to have produced a coredump:
$ coredumpctl list
TIME PID UID GID SIG COREFILE EXE
Tue 2019-10-08 13:17:28 EDT 12576 2003 100 11 present /usr/bin/gnome-shell
$ sudo coredumpctl info /usr/bin/gnome-shell
PID: 12576 (gnome-shell)
UID: 2003 (lvivier)
GID: 100 (users)
Signal: 11 (SEGV)
Timestamp: Tue 2019-10-08 13:17:26 EDT (55min ago)
Command Line: /usr/bin/gnome-shell
Executable: /usr/bin/gnome-shell
Control Group: /
Slice: -.slice
Boot ID: 9173da5faab840e2a3765a81f22ed01e
Machine ID: 36aec30ec47d4ea38bb74d0dbc3c0e02
Hostname: ibm-p8-virt-02.pnr.lab.eng.rdu2.redhat.com
Storage: /var/lib/systemd/coredump/core.gnome-shell.2003.9173da5faab840e2a3765a81f22ed01e.12576.1570555046000000.lz4
Message: Process 12576 (gnome-shell) of user 2003 dumped core.
Stack trace of thread 12589:
#0 0x00007fff804a09b8 n/a (n/a)
#1 0x00007fff7209b774 lp_rast_shade_tile (swrast_dri.so)
#2 0x00007fff7209bcb4 rasterize_scene (swrast_dri.so)
#3 0x00007fff7209c680 thread_function (swrast_dri.so)
#4 0x00007fff7209c3d0 impl_thrd_routine (swrast_dri.so)
#5 0x00007fff86368ba0 start_thread (libpthread.so.0)
#6 0x00007fff87721398 __clone (libc.so.6)
(gdb) bt
#0 0x00007fff804a09b8 in ()
#1 0x00007fff7209b774 in lp_rast_shade_tile (task=0x1002d07cf98, arg=...) at ../src/gallium/drivers/llvmpipe/lp_rast.c:353
#2 0x000001002d07d018 in ()
(gdb) up
#1 0x00007fff7209b774 in lp_rast_shade_tile (task=0x1002d07cf98, arg=...) at ../src/gallium/drivers/llvmpipe/lp_rast.c:353
353 variant->jit_function[RAST_WHOLE]( &state->jit_context,
(gdb) list
348 /* Propagate non-interpolated raster state. */
349 task->thread_data.raster_state.viewport_index = inputs->viewport_index;
350
351 /* run shader on 4x4 block */
352 BEGIN_JIT_CALL(state, task);
353 variant->jit_function[RAST_WHOLE]( &state->jit_context,
354 tile_x + x, tile_y + y,
355 inputs->frontfacing,
356 GET_A0(inputs),
357 GET_DADX(inputs),
358 GET_DADY(inputs),
359 color,
360 depth,
361 0xffff,
362 &task->thread_data,
363 stride,
364 depth_stride);
365 END_JIT_CALL();
366 }
367 }
Perhaps running the llvmpipe test suite on the host would help diagnose this, but as I don't know how to do that, I am reassigning the BZ to the mesa product.
(In reply to Laurent Vivier from comment #33)
> I've been able to reproduce it on the host with vncserver (tigervnc-server),

More details:
Fresh install from RHEL-8.2.0-20191006.n.0 BaseOS ppc64le + groupinstall "Server with GUI"
mesa-libgbm-19.1.4-2.el8.ppc64le
mesa-dri-drivers-debuginfo-19.1.4-2.el8.ppc64le
mesa-libEGL-debuginfo-19.1.4-2.el8.ppc64le
mesa-filesystem-19.1.4-2.el8.ppc64le
mesa-libGL-19.1.4-2.el8.ppc64le
mesa-libglapi-19.1.4-2.el8.ppc64le
mesa-libEGL-19.1.4-2.el8.ppc64le
mesa-debuginfo-19.1.4-2.el8.ppc64le
mesa-libglapi-debuginfo-19.1.4-2.el8.ppc64le
mesa-libGL-debuginfo-19.1.4-2.el8.ppc64le
mesa-dri-drivers-19.1.4-2.el8.ppc64le
mesa-debugsource-19.1.4-2.el8.ppc64le
mesa-libgbm-debuginfo-19.1.4-2.el8.ppc64le
tigervnc-license-1.9.0-10.el8.noarch
tigervnc-server-minimal-1.9.0-10.el8.ppc64le
libvncserver-0.9.11-9.el8.ppc64le
kernel-4.18.0-147.3.el8.ppc64le
llvm-9.0.0-3.module+el8.2.0+4364+f996812f.ppc64le
llvm-devel-9.0.0-3.module+el8.2.0+4364+f996812f.ppc64le
llvm-compat-libs-8.0.1-2.module+el8.2.0+4344+da5775f3.ppc64le
llvm-libs-9.0.0-3.module+el8.2.0+4364+f996812f.ppc64le
$ lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 88
On-line CPU(s) list: 0,8,16,24,32,48,56,64,72,80,88
Off-line CPU(s) list: 1-7,9-15,17-23,25-31,33-39,49-55,57-63,65-71,73-79,81-87,89-95
Thread(s) per core: 1
Core(s) per socket: 5
Socket(s): 2
NUMA node(s): 2
Model: 2.1 (pvr 004b 0201)
Model name: POWER8E (raw), altivec supported
CPU max MHz: 3325.0000
CPU min MHz: 2061.0000
L1d cache: 64K
L1i cache: 32K
L2 cache: 512K
L3 cache: 8192K
NUMA node0 CPU(s): 0,8,16,24,32
NUMA node1 CPU(s): 48,56,64,72,80,88
(gdb) bt
#0 0x00007fff804a09b8 in ()
#1 0x00007fff7209b774 in lp_rast_shade_tile (task=0x1002d07cf98, arg=...) at ../src/gallium/drivers/llvmpipe/lp_rast.c:353
#2 0x000001002d07d018 in ()
(gdb) up
#1 0x00007fff7209b774 in lp_rast_shade_tile (task=0x1002d07cf98, arg=...) at ../src/gallium/drivers/llvmpipe/lp_rast.c:353
353 variant->jit_function[RAST_WHOLE]( &state->jit_context,
(gdb) p/x variant->jit_function
$1 = {0x7fff804a0990, 0x7fff804a0000}
(gdb) disass 0x7fff804a0990,0x00007fff804a09c0
Dump of assembler code from 0x7fff804a0990 to 0x7fff804a09c0:
0x00007fff804a0990: addis r2,r12,30646
0x00007fff804a0994: addi r2,r2,30320
0x00007fff804a0998: stdu r1,-896(r1)
0x00007fff804a099c: li r6,496
0x00007fff804a09a0: mtfprwa f1,r4
0x00007fff804a09a4: lvx v1,0,r8
0x00007fff804a09a8: lvx v7,0,r7
0x00007fff804a09ac: std r30,752(r1)
0x00007fff804a09b0: vspltisb v2,-1
0x00007fff804a09b4: li r4,16
0x00007fff804a09b8: ld r11,-32552(r2)
0x00007fff804a09bc: ld r0,-32544(r2)
The code you presented in Comment 36 is from the beginning of an LLVM-generated fragment shader program. Could you also please include the output of the following? Thanks!
(gdb) info reg

(In reply to Ben Crocker from comment #38)
> The code you presented in Comment 36 is from the beginning
> of an LLVM-generated fragment shader program.
>
> Could you also please include the output of
>
> (gdb) info reg

Machine has been reinstalled since, but I've saved the coredump:
(gdb) bt
#0 0x00007fff804a09b8 in ()
#1 0x00007fff7209b774 in lp_rast_shade_tile (task=0x1002d07cf98, arg=...) at ../src/gallium/drivers/llvmpipe/lp_rast.c:353
#2 0x000001002d07d018 in ()
(gdb) up
#1 0x00007fff7209b774 in lp_rast_shade_tile (task=0x1002d07cf98, arg=...) at ../src/gallium/drivers/llvmpipe/lp_rast.c:353
353 variant->jit_function[RAST_WHOLE]( &state->jit_context,
(gdb) p/x variant->jit_function
$1 = {0x7fff804a0990, 0x7fff804a0000}
(gdb) disass 0x7fff804a0990,0x00007fff804a09c0
Dump of assembler code from 0x7fff804a0990 to 0x7fff804a09c0:
0x00007fff804a0990: addis r2,r12,30646
0x00007fff804a0994: addi r2,r2,30320
0x00007fff804a0998: stdu r1,-896(r1)
0x00007fff804a099c: li r6,496
0x00007fff804a09a0: mtfprwa f1,r4
0x00007fff804a09a4: lvx v1,0,r8
0x00007fff804a09a8: lvx v7,0,r7
0x00007fff804a09ac: std r30,752(r1)
0x00007fff804a09b0: vspltisb v2,-1
0x00007fff804a09b4: li r4,16
0x00007fff804a09b8: ld r11,-32552(r2)
0x00007fff804a09bc: ld r0,-32544(r2)
End of assembler dump.
(gdb) info reg
r0 0x0 0
r1 0x7fff5e11db10 140734771616528
r2 0x7ffff8008000 140737354170368
r3 0x1002d4be6d0 1100271576784
r4 0x10 16
r5 0x0 0
r6 0x1f0 496
r7 0x1002d5b00a0 1100272566432
r8 0x1002d5b00e0 1100272566496
r9 0x1002d5b0120 1100272566560
r10 0x7fff5e11df48 140734771617608
r11 0x7fff804a0990 140735345723792
r12 0x7fff804a0990 140735345723792
r13 0x7fff5e126320 140734771651360
r14 0x0 0
r15 0x1002d180af0 1100268178160
r16 0x0 0
r17 0x100300a08e0 1100317591776
r18 0x0 0
r19 0x0 0
r20 0x80 128
r21 0x1002d5b00a0 1100272566432
r22 0x1002d07d018 1100267114520
r23 0xffff 65535
r24 0x1002d4be6d0 1100271576784
r25 0x0 0
r26 0x7fff5e11df48 140734771617608
r27 0x7fff5e11df28 140734771617576
r28 0x1002d1807f0 1100268177392
r29 0x0 0
r30 0x1002d07cf98 1100267114392
r31 0x1002d5b0090 1100272566416
pc 0x7fff7209b774 0x7fff7209b774 <lp_rast_shade_tile+660>
msr 0x900000000280f033 10376293541503627315
cr 0x28444882 675563650
lr 0x7fff7209b774 0x7fff7209b774 <lp_rast_shade_tile+660>
ctr 0x7fff804a0990 140735345723792
xer 0x0 0
fpscr 0xb3004100 3003138304
vscr 0x1 1
vrsave 0xffffffff -1
ppr 0xc000000000000 3377699720527872
dscr 0x0 0
tar 0x0 0
bescr <unavailable>
ebbhr <unavailable>
ebbrr <unavailable>
mmcr0 0x0 0
mmcr2 0x0 0
siar 0x0 0
sdar 0x0 0
sier 0x0 0
tfhar 0x0 0
texasr 0x0 0
tfiar 0x0 0
cr0 <unavailable>
cr1 <unavailable>
cr2 <unavailable>
cr3 <unavailable>
cr4 <unavailable>
cr5 <unavailable>
cr6 <unavailable>
cr7 <unavailable>
cr8 <unavailable>
cr9 <unavailable>
cr10 <unavailable>
cr11 <unavailable>
cr12 <unavailable>
cr13 <unavailable>
cr14 <unavailable>
cr15 <unavailable>
cr16 <unavailable>
cr17 <unavailable>
cr18 <unavailable>
cr19 <unavailable>
cr20 <unavailable>
cr21 <unavailable>
cr22 <unavailable>
cr23 <unavailable>
cr24 <unavailable>
cr25 <unavailable>
cr26 <unavailable>
cr27 <unavailable>
cr28 <unavailable>
cr29 <unavailable>
cr30 <unavailable>
cr31 <unavailable>
ccr <unavailable>
cxer <unavailable>
clr <unavailable>
cctr <unavailable>
cfpscr <unavailable>
cvscr <unavailable>
cvrsave <unavailable>
cppr <unavailable>
cdscr <unavailable>
ctar <unavailable>
orig_r3 0x7fff7209b770 140735106627440
trap 0x300 768

Laurent, thank you! This shows me that the prologue:

0x00007fff804a0990: addis r2,r12,30646 ;; 0x7fff804a0990 + 0x77b6(0000) = 0x7ffff8000990...
0x00007fff804a0994: addi r2,r2,30320 ;; ... + 0x7670 = 0x7ffff8008000

HAS been executed correctly, i.e. r2 contains the expected value based on the value of r12 (i.e. the address of the entry point of the function) and the two values added to it.

Would you please also do this:

maintenance info sections ALLOBJ

(yes, this is case-sensitive) and post the results here; you should get a long list of sections grouped by object file.

Created attachment 1627962 [details]
maintenance info sections ALLOBJ
Fixed in mesa-19.3.0-3.rc4.el8, build ID 1018360.

Cf. bug 1753327.

Same bug, same fix as bug 1753327 and bug 1582226.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1633
Description of problem:
I/O errors and a core dump are triggered when hot-plugging a block disk backed by a qcow2/raw image and running mkfs.xfs on it, on POWER8.

Version-Release number of selected component (if applicable):
host:
# uname -r
4.18.0-134.el8.ppc64le
# /usr/libexec/qemu-kvm -version
QEMU emulator version 4.1.0 (qemu-kvm-4.1.0-5.module+el8.1.0+4076+b5e41ebc)
guest: 4.18.0-139.el8.ppc64le

How reproducible:
3/3

Steps to Reproduce:
1. Boot the guest:
/usr/libexec/qemu-kvm \
-name 'zhenyzha' \
-machine pseries \
-device VGA,bus=pci.0,addr=0x2 \
-nodefaults \
-chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1,server,nowait \
-mon chardev=qmp_id_qmpmonitor1,mode=control \
-chardev socket,nowait,path=/var/tmp/serial-serial0,id=chardev_serial0,server \
-device spapr-vty,id=serial0,reg=0x30000000,chardev=chardev_serial0 \
-device qemu-xhci,id=usb1,bus=pci.0,addr=0x3 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4 \
-drive id=drive_image1,if=none,snapshot=off,cache=none,format=qcow2,file=rhel810-ppc64le-virtio-scsi.qcow2 \
-device scsi-hd,id=image1,drive=drive_image1,bootindex=0 \
-device virtio-net-pci,mac=8a:43:4c:8b:de:21,id=id46J4ui,netdev=idizNzdA,bus=pci.0,addr=0x5 \
-netdev tap,id=idizNzdA,vhost=on \
-m 20G \
-vnc :30 \
-smp 64,maxcpus=64,cores=32,threads=1,sockets=2 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-rtc base=utc,clock=host \
-boot order=cdn,once=c,menu=off,strict=off \
-enable-kvm \
-monitor stdio
2. Hot-plug a block disk backed by a raw image via QMP:
{"execute": "human-monitor-command", "arguments": {"command-line": "drive_add auto id=drive_stg0,if=none,snapshot=off,cache=none,format=raw,file=storage0.raw"} }
{"execute": "device_add", "arguments": {"driver": "virtio-blk-pci", "id": "stg0", "drive": "drive_stg0", "bus": "pci.0"}}
3. Run mkfs.xfs:
[root@dhcp70-199 ~]# mkfs.xfs /dev/vda
meta-data=/dev/vda               isize=512    agcount=4, agsize=1310720 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=5242880, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
mkfs.xfs: pwrite failed: Input/output error
[root@dhcp70-199 ~]# echo $?
1    <-- This is 1 here, but it is shown as 0 on P9.
[root@dhcp70-199 ~]# dmesg | grep vda
[ 197.852032] virtio_blk virtio2: [vda] 41943040 512-byte logical blocks (21.5 GB/20.0 GiB)
[ 233.538361] print_req_error: operation not supported error, dev vda, sector 41943034 flags 3
[ 233.538469] print_req_error: I/O error, dev vda, sector 41942784 flags 8801
4. Check /var/log/messages:
Aug 29 23:26:13 dhcp70-199 kernel: print_req_error: operation not supported error, dev vda, sector 41943034 flags 3
Aug 29 23:26:13 dhcp70-199 kernel: print_req_error: I/O error, dev vda, sector 41942784 flags 8801
Aug 29 23:27:28 dhcp70-199 kernel: llvmpipe-3[9072]: unhandled signal 11 at 000080000ce80000 nip 00007fff958f0020 lr 00007fff870bb774 code 3    <-- need to wait a few minutes
Aug 29 23:27:28 dhcp70-199 kernel: llvmpipe-5[9074]: unhandled signal 11 at 000080000ce80000 nip 00007fff958f0020 lr 00007fff870bb774 code 3
Aug 29 23:27:28 dhcp70-199 kernel: llvmpipe-8[9077]: unhandled signal 11 at 000080000ce80000 nip 00007fff958f0020 lr 00007fff870bb774 code 3
Aug 29 23:27:28 dhcp70-199 kernel: llvmpipe-11[9080]: unhandled signal 11 at 000080000ce80000 nip 00007fff958f0020 lr 00007fff870bb774 code 3
Aug 29 23:27:28 dhcp70-199 kernel: llvmpipe-12[9081]: unhandled signal 11 at 000080000ce80000 nip 00007fff958f0020 lr 00007fff870bb774 code 3
Aug 29 23:27:28 dhcp70-199 kernel: llvmpipe-2[9071]: unhandled signal 11 at 000080000ce80000 nip 00007fff958f0020 lr 00007fff870bca24 code 3
Aug 29 23:27:28 dhcp70-199 kernel: llvmpipe-15[9084]: unhandled signal 11 at 000080000ce80000 nip 00007fff958f0020 lr 00007fff870bb774 code 3
Aug 29 23:27:28 dhcp70-199 kernel: llvmpipe-13[9082]: unhandled signal 11 at 000080000ce80000 nip 00007fff958f0020 lr 00007fff870bb774 code 3
Aug 29 23:27:28 dhcp70-199 kernel: llvmpipe-1[9070]: unhandled signal 11 at 000080000ce80000 nip 00007fff958f0020 lr 00007fff870bca24 code 3
Aug 29 23:27:28 dhcp70-199 systemd[1]: Created slice system-systemd\x2dcoredump.slice.
Aug 29 23:27:28 dhcp70-199 systemd[1]: Started Process Core Dump (PID 10122/UID 0).    <-- core dump
Aug 29 23:27:29 dhcp70-199 journal[9470]: Error reading events from display: Broken pipe
Aug 29 23:27:29 dhcp70-199 gnome-session[8932]: gnome-session-binary[8932]: WARNING: App 'org.gnome.SettingsDaemon.Wacom.desktop' exited with code 1
Aug 29 23:27:29 dhcp70-199 gnome-session-binary[8932]: WARNING: App 'org.gnome.SettingsDaemon.Wacom.desktop' exited with code 1
Aug 29 23:27:29 dhcp70-199 journal[10124]: g_hash_table_destroy: assertion 'hash_table != NULL' failed
Aug 29 23:27:29 dhcp70-199 org.gnome.Shell.desktop[9001]: (EE) failed to read Wayland events: Broken pipe
Aug 29 23:27:29 dhcp70-199 gnome-session-binary[8932]: WARNING: Application 'org.gnome.Shell.desktop' killed by signal 11
Aug 29 23:27:29 dhcp70-199 gnome-session[8932]: gnome-session-binary[8932]: WARNING: Application 'org.gnome.Shell.desktop' killed by signal 11
Aug 29 23:27:29 dhcp70-199 gnome-session-binary[8932]: Unrecoverable failure in required component org.gnome.Shell.desktop
Aug 29 23:27:29 dhcp70-199 org.gnome.SettingsDaemon.Wacom.desktop[10124]: Unable to init server: Could not connect: Connection refused
Aug 29 23:27:29 dhcp70-199 org.gnome.SettingsDaemon.Wacom.desktop[10124]: Cannot open display:
Aug 29 23:27:29 dhcp70-199 journal[9440]: gsd-power: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Aug 29 23:27:29 dhcp70-199 at-spi-bus-launcher[9135]: XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
Aug 29 23:27:29 dhcp70-199 at-spi-bus-launcher[9135]: after 21 requests (21 known processed) with 0 events remaining.
Aug 29 23:27:29 dhcp70-199 journal[9464]: gsd-xsettings: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Aug 29 23:27:29 dhcp70-199 journal[9502]: gsd-media-keys: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Aug 29 23:27:29 dhcp70-199 journal[9492]: gsd-clipboard: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Aug 29 23:27:29 dhcp70-199 journal[9500]: gsd-keyboard: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Aug 29 23:27:29 dhcp70-199 journal[9494]: gsd-color: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Aug 29 23:27:29 dhcp70-199 journal[9200]: GChildWatchSource: Exit status of a child process was requested but ECHILD was received by waitpid(). See the documentation of g_child_watch_source_new() for possible causes.
Aug 29 23:27:29 dhcp70-199 systemd-logind[8608]: Session 1 logged out. Waiting for processes to exit.
Aug 29 23:27:29 dhcp70-199 systemd[1]: Created slice User Slice of UID 42.
Aug 29 23:27:29 dhcp70-199 systemd[1]: Started /run/user/42 mount wrapper.
Aug 29 23:27:29 dhcp70-199 systemd[1]: Starting User Manager for UID 42...
Aug 29 23:27:29 dhcp70-199 systemd-logind[8608]: New session c1 of user gdm.
Aug 29 23:27:29 dhcp70-199 systemd[1]: Started Session c1 of user gdm.
Aug 29 23:27:29 dhcp70-199 systemd-logind[8608]: Removed session 1.
Aug 29 23:27:30 dhcp70-199 systemd[10137]: Starting D-Bus User Message Bus Socket.
Aug 29 23:27:30 dhcp70-199 systemd[10137]: Listening on Multimedia System.
Aug 29 23:27:30 dhcp70-199 systemd[10137]: Listening on Sound System.
Aug 29 23:27:30 dhcp70-199 systemd[10137]: Reached target Timers.
Aug 29 23:27:30 dhcp70-199 systemd[10137]: Reached target Paths.
Aug 29 23:27:30 dhcp70-199 systemd[10137]: Listening on D-Bus User Message Bus Socket.
Aug 29 23:27:30 dhcp70-199 systemd[10137]: Reached target Sockets.
Aug 29 23:27:30 dhcp70-199 systemd[10137]: Reached target Basic System.
Aug 29 23:27:30 dhcp70-199 systemd[1]: Started User Manager for UID 42.
Aug 29 23:27:30 dhcp70-199 systemd[10137]: Starting Sound Service...
Aug 29 23:27:30 dhcp70-199 systemd[10137]: Started D-Bus User Message Bus.
Aug 29 23:27:30 dhcp70-199 org.gnome.Shell.desktop[10169]: pci id for fd 11: 1234:1111, driver (null)
Aug 29 23:27:30 dhcp70-199 systemd-coredump[10123]: Process 9001 (gnome-shell) of user 0 dumped core.

Stack trace of thread 9073:
#0  0x00007fff958f0020 n/a (n/a)
#1  0x00007fff870bb774 lp_rast_shade_tile (kms_swrast_dri.so)
#2  0x00007fff870bbcb4 rasterize_scene (kms_swrast_dri.so)
#3  0x00007fff870bc680 thread_function (kms_swrast_dri.so)
#4  0x00007fff870bc3d0 impl_thrd_routine (kms_swrast_dri.so)
#5  0x00007fff9b7b8ba0 start_thread (libpthread.so.0)
#6  0x00007fff9cb71398 __clone (libc.so.6)
Aug 29 23:27:30 dhcp70-199 journal[10169]: Failed to initialize accelerated iGPU/dGPU framebuffer sharing: Do not want to use software renderer (llvmpipe (LLVM 8.0, 128 bits)), falling back to CPU copy path
Aug 29 23:27:30 dhcp70-199 org.gnome.Shell.desktop[10169]: glamor: No eglstream capable devices found
Aug 29 23:27:30 dhcp70-199 org.gnome.Shell.desktop[10169]: glamor: 'wl_drm' not supported
Aug 29 23:27:30 dhcp70-199 org.gnome.Shell.desktop[10169]: Missing Wayland requirements for glamor GBM backend
Aug 29 23:27:30 dhcp70-199 org.gnome.Shell.desktop[10169]: Missing Wayland requirements for glamor EGLStream backend
Aug 29 23:27:30 dhcp70-199 org.gnome.Shell.desktop[10169]: Failed to initialize glamor, falling back to sw

Actual results:
I/O errors are reported and gnome-shell dumps core.

Expected results:

Additional info:
1. Booting the guest with virtio-blk-pci also hits this issue.
2. Tested the same steps on POWER9: the I/O errors occur, but there is no core dump.

POWER9:
1. Hotplug the disk as above, then run mkfs.xfs:
[root@dhcp19-129-51 ~]# mkfs.xfs /dev/vdb
meta-data=/dev/vdb               isize=512    agcount=4, agsize=1310720 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=5242880, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
[root@dhcp19-129-51 ~]# echo $?
0   <---- exit status is 0 on POWER9
[root@dhcp19-129-51 ~]# dmesg | grep vdb
[  146.627552] virtio_blk virtio2: [vdb] 41943040 512-byte logical blocks (21.5 GB/20.0 GiB)
[  174.795014] print_req_error: operation not supported error, dev vdb, sector 41943034 flags 3

2. Check /var/log/messages:
Aug 30 00:09:17 dhcp19-129-51 kernel: virtio_blk virtio2: [vdb] 41943040 512-byte logical blocks (21.5 GB/20.0 GiB)
Aug 30 00:09:46 dhcp19-129-51 kernel: print_req_error: operation not supported error, dev vdb, sector 41943034 flags 3
Aug 30 00:10:06 dhcp19-129-51 systemd[1]: Starting system activity accounting tool...
Aug 30 00:10:06 dhcp19-129-51 systemd[1]: Started system activity accounting tool.
Aug 30 00:17:26 dhcp19-129-51 systemd[1]: Starting dnf makecache...
Aug 30 00:17:28 dhcp19-129-51 dnf[3826]: Updating Subscription Management repositories.
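The sector numbers in the POWER8 and POWER9 output above can be checked against the device size with plain shell arithmetic. This is a small sketch: every input is copied from the dmesg and mkfs.xfs output in this report, and the variable names are illustrative only. It shows that mkfs.xfs sized its data section to the whole 20 GiB disk. It also shows that both rejected requests (sectors 41943034 and 41942784) fall within the last 128 KiB of the device.

```shell
#!/bin/sh
# Worked arithmetic for values reported above. All inputs are copied from
# the dmesg and mkfs.xfs output; the variable names are illustrative only.
sectors=41943040                      # "[vda] 41943040 512-byte logical blocks"
bytes=$((sectors * 512))              # device size in bytes
gib=$((bytes / 1073741824))           # 1 GiB = 2^30 bytes
mkfs_bytes=$((5242880 * 4096))        # mkfs.xfs data section: blocks=5242880, bsize=4096

# Distance of each failing request from the end of the device, in sectors.
unsupported_gap=$((sectors - 41943034))   # "operation not supported error ... sector 41943034"
ioerr_gap=$((sectors - 41942784))         # "I/O error ... sector 41942784"

echo "device: $bytes bytes ($gib GiB)"
echo "mkfs.xfs data section: $mkfs_bytes bytes"
echo "unsupported request starts $unsupported_gap sectors before the end"
echo "failed write starts $ioerr_gap sectors ($((ioerr_gap * 512 / 1024)) KiB) before the end"
```

When run with `sh`, the script prints 21474836480 bytes (20 GiB) for both the device and the mkfs.xfs data section. The unsupported request starts 6 sectors before the end, and the failed write starts 256 sectors (128 KiB) before the end.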
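Step 2 can also be replayed non-interactively from the host. This is a hedged sketch, not part of the original test procedure. It reuses the QMP socket path from step 1 and the two commands from step 2, and prepends the `qmp_capabilities` handshake, which QMP requires before it accepts other commands. It assumes `socat` is installed on the host. The function name `hotplug_cmds`, the `QMP_SOCK` variable and the `--send` switch are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: replays step 2 over the QMP socket from step 1
# (-chardev socket,...,path=/var/tmp/monitor-qmpmonitor1).
QMP_SOCK=${QMP_SOCK:-/var/tmp/monitor-qmpmonitor1}

# Print one QMP command per line. QMP refuses other commands until the
# client has sent qmp_capabilities, so the handshake comes first.
hotplug_cmds() {
  printf '%s\n' \
    '{"execute": "qmp_capabilities"}' \
    '{"execute": "human-monitor-command", "arguments": {"command-line": "drive_add auto id=drive_stg0,if=none,snapshot=off,cache=none,format=raw,file=storage0.raw"}}' \
    '{"execute": "device_add", "arguments": {"driver": "virtio-blk-pci", "id": "stg0", "drive": "drive_stg0", "bus": "pci.0"}}'
}

# Only talk to QEMU when explicitly asked, so the commands can be inspected
# without a running guest. Assumes socat is available; the sleep keeps the
# connection open briefly so QEMU's replies can be read before socat exits.
if [ "${1:-}" = "--send" ]; then
  { hotplug_cmds; sleep 1; } | socat - "UNIX-CONNECT:$QMP_SOCK"
fi
```

Running it without arguments only defines the helper. `sh hotplug.sh --send` (file name arbitrary) sends the handshake and both hotplug commands. QEMU's JSON replies are printed to stdout.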