Bug 1618950 - kernel BUG at mm/slub.c:296!
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: xorg-x11-drv-ati
Version: 28
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: X/OpenGL Maintenance List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 1619701
Depends On:
Blocks:
Reported: 2018-08-18 18:28 UTC by Scott Cohen
Modified: 2019-05-28 23:25 UTC (History)
29 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-05-28 23:25:31 UTC
Type: Bug
Embargoed:


Attachments
The kernel bug from dmesg. (3.58 KB, text/plain)
2018-08-18 18:28 UTC, Scott Cohen
19th up to the 21st journalctl messages (692.10 KB, text/x-vhdl)
2018-08-20 16:32 UTC, Scott Cohen

Description Scott Cohen 2018-08-18 18:28:22 UTC
Created attachment 1476793
The kernel bug from dmesg.

Description of problem:
Attached is a copy of the output from dmesg. I don't know exactly when this happened, but I left my computer on overnight with the following running, all in xfce4: a Windows 10 VM in VirtualBox, firefox-esr in two windows with about 60 tabs open in total, and some terminal applications in terminal windows. I can still ssh -X into my host and my virtual machine (which is how I got the dmesg output), but my desktop environment seems to be toast: a remote desktop connection to the host gets no reply. The GPU is connected to two monitors, one over DVI and the other over HDMI. The DVI monitor shows a screen filled with GPU artifacts, and the HDMI monitor doesn't detect the GPU. I'm not sure whether glxgears is supposed to be slow over ssh over wifi, but I get an average of 0.6614 FPS over 5 intervals.

This is the current GPU in use:
Advanced Micro Devices, Inc. [AMD/ATI] Lexa PRO [Radeon RX 550/550X] (rev c7)  

Version-Release number of selected component (if applicable):
4.17.11-200.fc28.x86_64

How reproducible:
This is the first time this has happened with this GPU. Before, I had an AMD Radeon HD 5500 or 5550 (I forget which), and this occurred every time I played a game for more than 10 minutes. No games or 3D graphics were running during this time. I don't know how to reproduce this.

Comment 1 Laura Abbott 2018-08-20 15:03:48 UTC
Moving to the graphics team to take a look

Comment 2 Jérôme Glisse 2018-08-20 16:07:05 UTC
I could not find anything in newer kernels that would address this, and in the code I do not see how a double kfree() of stream could happen inside amdgpu_dm_connector_mode_valid().

It might be that an earlier double free corrupted the slab in which stream lives, and thus the issue is somewhere else (possibly not in the amdgpu driver).
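The hypothesis above can be illustrated with a toy model. This is a minimal Python sketch of a LIFO freelist, not the kernel's SLUB code. It assumes the check that fires is a naive "the object being freed is already the freelist head" test, and the names ToyFreelist and DoubleFreeDetected are hypothetical. Under that assumption, a back-to-back double free is caught at once, while an interleaved one slips through and hands the same object to two owners. One of those owners then trips the check with a perfectly ordinary free.

```python
class DoubleFreeDetected(Exception):
    """Raised where the toy allocator would hit its BUG-style check."""


class ToyFreelist:
    """Toy LIFO freelist: next_free maps an object to the next free object."""

    def __init__(self):
        self.head = None
        self.next_free = {}

    def free(self, obj):
        # Naive check: only catches freeing the object that is already the head.
        if obj == self.head:
            raise DoubleFreeDetected(obj)
        self.next_free[obj] = self.head
        self.head = obj

    def alloc(self):
        obj = self.head
        if obj is None:
            raise MemoryError("toy freelist is empty")
        self.head = self.next_free[obj]
        return obj


def innocent_free_trips_check():
    """Replay an undetected double free followed by ordinary alloc/free calls."""
    fl = ToyFreelist()
    fl.free("A")
    fl.free("B")
    fl.free("A")              # double free of A, missed because the head is B
    first = fl.alloc()        # returns A
    fl.alloc()                # returns B
    second = fl.alloc()       # returns A again: two owners of one object
    fl.free(first)            # looks fine
    try:
        fl.free(second)       # an ordinary free now hits the check
    except DoubleFreeDetected:
        return first == second
    return False
```

In this model the caller that crashes did nothing wrong, which matches the suggestion that the code in the backtrace may only be where an earlier corruption was noticed.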

Did you save the full kernel log? It would be useful to take a look and see if there is any warning before the BUG.
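One way to look for earlier warnings in a saved log is sketched below. It is a rough Python filter that assumes the kernel log was saved as plain text, one message per line. The marker strings in EARLIER_TROUBLE are a hand-picked assumption, not an exhaustive list, and the function name warnings_before_bug is hypothetical.

```python
import re

# Matches both the Fedora and the Debian form of the BUG line in this report.
BUG_LINE = re.compile(r"kernel BUG at \S*mm/slub\.c:\d+!")

# Assumed markers of earlier trouble; extend as needed.
EARLIER_TROUBLE = re.compile(r"WARNING:|BUG:|Oops|general protection fault")


def warnings_before_bug(lines):
    """Return suspicious lines that appear before the first slub BUG line."""
    found = []
    for line in lines:
        if BUG_LINE.search(line):
            return found
        if EARLIER_TROUBLE.search(line):
            found.append(line.rstrip("\n"))
    # The BUG line was never seen; return whatever looked suspicious.
    return found
```

The input could be, for example, the lines of a file written by journalctl -k, read with open(path).readlines().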

Comment 3 Scott Cohen 2018-08-20 16:32:08 UTC
Created attachment 1477310
19th up to the 21st journalctl messages

I'm not sure if Fedora still uses some form of syslog, at least as an output file, but I may have inadvertently deleted it when I deleted the entry in abrt (I'm not sure that deletes logs, though). I deleted the entry because it was unusable, so reporting was disabled. Here is the journalctl output from the 19th up to the 21st, if it is worth anything.

Comment 4 Laura Abbott 2018-08-21 18:08:09 UTC
*** Bug 1619701 has been marked as a duplicate of this bug. ***

Comment 5 Matt Corallo 2018-12-05 16:10:18 UTC
Also likely a duplicate: 1602958. I saw the same issue on an "Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon Pro WX 4100]" on ppc64el with 4.17.2 on Debian, so whatever it is, it's neither an x86-specific bug nor a Fedora-specific bug.

[1519662.914938] kernel BUG at /build-4.17-debian-git/linux/mm/slub.c:296!
[1519662.915005] Oops: Exception in kernel mode, sig: 5 [#1]
[1519662.915059] LE SMP NR_CPUS=2048 NUMA PowerNV
[1519662.915146] Modules linked in: nls_ascii nls_cp437 vfat fat cfg80211 nfnetlink_queue nfnetlink_log bluetooth ecdh_generic rfkill binfmt_misc netlink_diag sctp xt_nat veth nft_chain_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nft_chain_route_ipv4 ipip tunnel4 ip_tunnel ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_nat nf_conntrack xt_DSCP xt_dscp tun nft_counter nft_meta xt_tcpudp nft_compat nf_tables nfnetlink fuse ofpart amdgpu powernv_flash mtd ipmi_powernv ipmi_devintf ipmi_msghandler opal_prd at24 snd_hda_codec_hdmi ast chash evdev gpu_sched ttm snd_hda_intel drm_kms_helper sg snd_hda_codec drm snd_hda_core snd_hwdep snd_pcm drm_panel_orientation_quirks syscopyarea sysfillrect sysimgblt fb_sys_fops i2c_algo_bit snd_timer snd soundcore ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto
[1519662.916068]  btrfs zstd_decompress zstd_compress xxhash uas raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic usbhid hid sd_mod raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod usb_storage dm_crypt dm_mod algif_skcipher af_alg ecb xts xhci_pci xhci_hcd vmx_crypto usbcore mpt3sas nvme tg3 nvme_core raid_class scsi_transport_sas libphy usb_common
[1519662.916579] CPU: 17 PID: 1940 Comm: Xorg Not tainted 4.17.0-trunk-powerpc64le #1 Debian 4.17.2-1~exp1
[1519662.916717] NIP:  c000000000382304 LR: c00800000b0808c0 CTR: c000000000382ef0
[1519662.916826] REGS: c000000facb6b4b0 TRAP: 0700   Not tainted  (4.17.0-trunk-powerpc64le Debian 4.17.2-1~exp1)
[1519662.916968] MSR:  9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 48222222  XER: 20040000
[1519662.917094] CFAR: c00000000038227c SOFTE: 0 
                 GPR00: c00800000b0808c0 c000000facb6b730 c0000000010bff00 c000000fbe006a00 
                 GPR04: c00a00801541d400 c000200550753400 c000200550753400 c000200550753400 
                 GPR08: 0000000000000000 0000000000000000 0000000000000001 c00800000b0928c0 
                 GPR12: c000000000382ef0 c000000fffff4600 00003ffff7820f80 4480000000000000 
                 GPR16: 0000000000000000 00003ffff781f9d4 0000000000000000 0000000000004000 
                 GPR20: 0000000000004000 c000000fa546333c c000200550753400 c000000fbe006a00 
                 GPR24: c000200550753400 0000000000000031 0000000000210d00 0000000000000001 
                 GPR28: 0000000000000001 000000000020000a c000200550753400 c00a00801541d400 
[1519662.918046] NIP [c000000000382304] __slab_free+0x114/0x4c0
[1519662.918169] LR [c00800000b0808c0] dc_sink_release+0x88/0xb0 [amdgpu]
[1519662.918236] Call Trace:
[1519662.918304] [c000000facb6b830] [c00800000b0808c0] dc_sink_release+0x88/0xb0 [amdgpu]
[1519662.918433] [c000000facb6b860] [c00800000b086ea8] dc_stream_release+0x80/0xd0 [amdgpu]
[1519662.918575] [c000000facb6b890] [c00800000b010118] amdgpu_dm_connector_mode_valid+0xb0/0x230 [amdgpu]
[1519662.918694] [c000000facb6b930] [c00800000a47524c] drm_helper_probe_single_connector_modes+0x634/0x8c0 [drm_kms_helper]
[1519662.918847] [c000000facb6ba60] [c00800000a1e414c] drm_mode_getconnector+0x144/0x440 [drm]
[1519662.918975] [c000000facb6bb20] [c00800000a1c1cc8] drm_ioctl_kernel+0xa0/0x140 [drm]
[1519662.919085] [c000000facb6bb70] [c00800000a1c2194] drm_ioctl+0x1ac/0x4d0 [drm]
[1519662.919217] [c000000facb6bcb0] [c00800000ae95078] amdgpu_drm_ioctl+0x70/0xd0 [amdgpu]
[1519662.919324] [c000000facb6bd00] [c0000000003df7dc] do_vfs_ioctl+0xdc/0x8a0
[1519662.919420] [c000000facb6bda0] [c0000000003e00a4] ksys_ioctl+0x104/0x120
[1519662.919515] [c000000facb6bdf0] [c0000000003e0100] sys_ioctl+0x40/0xa0
[1519662.919603] [c000000facb6be30] [c00000000000b9e0] system_call+0x58/0x6c
[1519662.919682] Instruction dump:
[1519662.919736] 60000000 7c210b78 7c421378 2fb50000 409e017c ebdf0010 81170020 83bf0018 
[1519662.919840] 7ecaf278 7cf64214 7d4a0074 794ad182 <0b0a0000> 93a10078 e9370138 7d5be850
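
To compare this trace with others, the call chain can be pulled out of the oops mechanically. The following is a small Python sketch that assumes the ppc64 frame layout shown above (timestamp, stack pointer, return address, symbol+offset/size, optional module); x86 traces are laid out differently and would need another pattern. The function name parse_frames and the "vmlinux" label for frames without a module are hypothetical choices.

```python
import re

FRAME = re.compile(
    r"\[\s*\d+\.\d+\]\s+"                           # dmesg timestamp
    r"\[[0-9a-f]+\]\s+"                             # stack pointer
    r"\[[0-9a-f]+\]\s+"                             # return address
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"  # symbol+offset/size
    r"(?:\s+\[(?P<module>\w+)\])?"                  # optional [module]
)

# A few frames copied from the trace above.
SAMPLE = """\
[1519662.918304] [c000000facb6b830] [c00800000b0808c0] dc_sink_release+0x88/0xb0 [amdgpu]
[1519662.918433] [c000000facb6b860] [c00800000b086ea8] dc_stream_release+0x80/0xd0 [amdgpu]
[1519662.918575] [c000000facb6b890] [c00800000b010118] amdgpu_dm_connector_mode_valid+0xb0/0x230 [amdgpu]
[1519662.919324] [c000000facb6bd00] [c0000000003df7dc] do_vfs_ioctl+0xdc/0x8a0
""".splitlines()


def parse_frames(lines):
    """Return (symbol, module) pairs, innermost frame first."""
    frames = []
    for line in lines:
        match = FRAME.search(line)
        if match:
            # Frames without a [module] suffix are labelled "vmlinux" here.
            frames.append((match.group("symbol"), match.group("module") or "vmlinux"))
    return frames


if __name__ == "__main__":
    for symbol, module in parse_frames(SAMPLE):
        print(f"{module}: {symbol}")
```

On the sample lines this yields dc_sink_release, dc_stream_release and amdgpu_dm_connector_mode_valid in amdgpu, the same release path discussed in comment 2.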

Comment 6 Matt Corallo 2018-12-05 16:26:37 UTC
As to the previous question about other warnings at the time: mine triggered during relatively heavy BTRFS usage (removing a device from an array), on an otherwise pretty normal system. That said, as far as I can tell it triggered at almost exactly the time I woke up my monitors after getting into the office in the morning, so I would be somewhat surprised if the BTRFS work that had been running all night were the cause.

Comment 7 Ben Cotton 2019-05-02 20:10:11 UTC
This message is a reminder that Fedora 28 is nearing its end of life.
On 2019-May-28 Fedora will stop maintaining and issuing updates for
Fedora 28. It is Fedora's policy to close all bug reports from releases
that are no longer maintained. At that time this bug will be closed as
EOL if it remains open with a Fedora 'version' of '28'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 28 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 8 Ben Cotton 2019-05-28 23:25:31 UTC
Fedora 28 changed to end-of-life (EOL) status on 2019-05-28. Fedora 28 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

