Bug 1851855 - Kernel NULL pointer dereference in amdgpu on Radeon VII with kernel 5.7.*
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 32
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2020-06-29 09:15 UTC by Ivan Mironov
Modified: 2020-07-09 19:18 UTC

Type: Bug



Description Ivan Mironov 2020-06-29 09:15:12 UTC
1. Please describe the problem:

[79260.489120] BUG: kernel NULL pointer dereference, address: 0000000000000128
[79260.489123] #PF: supervisor write access in kernel mode
[79260.489123] #PF: error_code(0x0002) - not-present page
[79260.489124] PGD 0 P4D 0 
[79260.489125] Oops: 0002 [#1] SMP NOPTI
[79260.489127] CPU: 0 PID: 17315 Comm: modprobe Tainted: G            E     5.7.5-200.fc32.x86_64 #1
[79260.489127] Hardware name: System manufacturer System Product Name/PRIME X570-P, BIOS 2204 06/17/2020
[79260.489173] RIP: 0010:lock_bus+0x42/0x60 [amdgpu]
[79260.489174] Code: 53 be 01 00 00 00 48 8b 9f 70 bb 00 00 48 8b bf f8 f3 ff ff e8 df 8f 26 c8 85 c0 74 0d 48 c7 c7 48 bf eb c0 5b e9 3e 9e da ff <c6> 83 28 01 00 00 01 5b c3 48 c7 c7 48 bf eb c0 e9 29 9e da ff 66
[79260.489175] RSP: 0018:ffffb34445117c48 EFLAGS: 00010246
[79260.489176] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[79260.489177] RDX: ffff9d57092b2680 RSI: 000000000001629a RDI: ffff9d57981cb818
[79260.489177] RBP: ffff9d5789627058 R08: 00000000ffffffff R09: 0000000000000000
[79260.489177] R10: 0000000000000002 R11: 00000000000000f0 R12: 0000000000000000
[79260.489178] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[79260.489179] FS:  00007fe04f0b5740(0000) GS:ffff9d57aea00000(0000) knlGS:0000000000000000
[79260.489179] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[79260.489180] CR2: 0000000000000128 CR3: 000000123e312000 CR4: 0000000000340ef0
[79260.489180] Call Trace:
[79260.489186]  i2c_smbus_xfer+0x3d/0xf0
[79260.489187]  i2c_default_probe+0xf3/0x130
[79260.489189]  i2c_detect.isra.0+0xfe/0x2b0
[79260.489191]  ? kfree+0xa3/0x200
[79260.489193]  ? kobject_uevent_env+0x11f/0x6a0
[79260.489193]  ? i2c_detect.isra.0+0x2b0/0x2b0
[79260.489194]  __process_new_driver+0x1b/0x20
[79260.489196]  bus_for_each_dev+0x64/0x90
[79260.489197]  ? 0xffffffffc13ff000
[79260.489198]  i2c_register_driver+0x73/0xc0
[79260.489200]  do_one_initcall+0x46/0x200
[79260.489202]  ? _cond_resched+0x16/0x40
[79260.489203]  ? kmem_cache_alloc_trace+0x167/0x220
[79260.489205]  ? do_init_module+0x23/0x260
[79260.489206]  do_init_module+0x5c/0x260
[79260.489207]  __do_sys_init_module+0x14f/0x170
[79260.489208]  do_syscall_64+0x5b/0xf0
[79260.489209]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[79260.489210] RIP: 0033:0x7fe04f1e540e
[79260.489211] Code: 48 8b 0d 8d 0a 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 5a 0a 0c 00 f7 d8 64 89 01 48
[79260.489212] RSP: 002b:00007ffdb0489df8 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[79260.489212] RAX: ffffffffffffffda RBX: 000055b7bf883ae0 RCX: 00007fe04f1e540e
[79260.489213] RDX: 000055b7bd962288 RSI: 000000000000385e RDI: 000055b7bf892810
[79260.489213] RBP: 000055b7bf892810 R08: 0000000000000000 R09: 000055b7bf892860
[79260.489214] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
[79260.489214] R13: 000055b7bd962288 R14: 000055b7bf883c80 R15: 000055b7bf883c60
[79260.489216] Modules linked in: jc42(+) uinput rfcomm xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_nat_tftp nf_conntrack_tftp tun bridge nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter cmac bnep rpcrdma ib_isert iscsi_target_mod ib_iser ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_umad iw_cxgb4 ib_uverbs rdma_cm iw_cm ib_cm ib_core snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi uvcvideo snd_hda_intel snd_intel_dspcfg videobuf2_vmalloc snd_hda_codec videobuf2_memops videobuf2_v4l2 btusb snd_usb_audio raid1 btrtl videobuf2_common snd_hda_core btbcm amd64_edac_mod
[79260.489231]  snd_usbmidi_lib btintel edac_mce_amd videodev snd_hwdep snd_seq bluetooth kvm_amd snd_rawmidi eeepc_wmi mc asus_wmi xpad kvm snd_seq_device joydev sparse_keymap snd_pcm ecdh_generic ff_memless rfkill irqbypass ecc video snd_timer wmi_bmof snd pcspkr sp5100_tco soundcore i2c_piix4 k10temp acpi_cpufreq ip_tables isofs squashfs dm_multipath amdgpu amd_iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper 8021q garp mrp stp llc crct10dif_pclmul crc32_pclmul drm ghash_clmulni_intel ccp hpsa r8169 scsi_transport_sas wmi pinctrl_amd uas usb_storage btrfs blake2b_generic libcrc32c crc32c_intel xor raid6_pq sunrpc be2iscsi bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi loop fuse scsi_transport_iscsi [last unloaded: minix]
[79260.489248] CR2: 0000000000000128
[79260.489249] ---[ end trace ebc75789e03eebf1 ]---
[79260.489282] RIP: 0010:lock_bus+0x42/0x60 [amdgpu]
[79260.489283] Code: 53 be 01 00 00 00 48 8b 9f 70 bb 00 00 48 8b bf f8 f3 ff ff e8 df 8f 26 c8 85 c0 74 0d 48 c7 c7 48 bf eb c0 5b e9 3e 9e da ff <c6> 83 28 01 00 00 01 5b c3 48 c7 c7 48 bf eb c0 e9 29 9e da ff 66
[79260.489283] RSP: 0018:ffffb34445117c48 EFLAGS: 00010246
[79260.489284] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[79260.489284] RDX: ffff9d57092b2680 RSI: 000000000001629a RDI: ffff9d57981cb818
[79260.489285] RBP: ffff9d5789627058 R08: 00000000ffffffff R09: 0000000000000000
[79260.489285] R10: 0000000000000002 R11: 00000000000000f0 R12: 0000000000000000
[79260.489286] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[79260.489286] FS:  00007fe04f0b5740(0000) GS:ffff9d57aea00000(0000) knlGS:0000000000000000
[79260.489287] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[79260.489287] CR2: 0000000000000128 CR3: 000000123e312000 CR4: 0000000000340ef0

This happens when an I2C device driver scans for devices on the I2C bus registered by amdgpu. In my case it is triggered by `modprobe jc42`.
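
As a cross-check of the trace above, the faulting bytes in the Code: line can be decoded by hand. The following is a minimal Python sketch, not a general disassembler: it handles only the `C6 /0 ib` (MOV r/m8, imm8) encoding with a 32-bit displacement, and the name `decode_faulting_store` is made up for illustration. The byte string and register values are copied from the oops.

```python
import re

# Excerpts copied from the oops above; <c6> marks the faulting instruction.
CODE = "e9 3e 9e da ff <c6> 83 28 01 00 00 01 5b c3"
RBX = 0x0000000000000000
CR2 = 0x0000000000000128

# Register numbering used by the ModRM r/m field (no REX prefix present).
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_faulting_store(code):
    """Decode `C6 /0 ib` with a disp32 ModRM form; reject anything else."""
    m = re.search(r"<([0-9a-f]{2})>((?: [0-9a-f]{2})+)", code)
    if m is None:
        raise ValueError("no <xx> marker in Code: line")
    opcode = int(m.group(1), 16)
    tail = [int(b, 16) for b in m.group(2).split()]
    modrm = tail[0]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    # mod == 2 means base register + disp32; rm == 4 would require a SIB byte.
    if opcode != 0xC6 or reg != 0 or mod != 2 or rm == 4:
        raise ValueError("not the simple form handled by this sketch")
    disp = int.from_bytes(bytes(tail[1:5]), "little", signed=True)
    imm = tail[5]
    return REG64[rm], disp, imm


base, disp, imm = decode_faulting_store(CODE)
fault_addr = RBX + disp
print(f"movb ${imm:#x}, {disp:#x}(%{base}) -> address {fault_addr:#x}")
```

The decoded instruction is a one-byte store at offset 0x128 from RBX. With RBX = 0 that address equals CR2, which is consistent with lock_bus writing a structure field through a NULL pointer.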

Here is the fix: https://lkml.org/lkml/2020/6/25/624


2. What is the Version-Release number of the kernel:

5.7.5-200.fc32.x86_64 (from Test Day:2020-06-22 Kernel 5.7 Test Week)


3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

This first appeared with the 5.7 mainline kernel.


4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

1. Boot kernel 5.7.* on a system with a Radeon VII.
2. Run `modprobe jc42`.
3. Check `dmesg` for the oops.
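
Step 3 can be scripted when checking several kernels. The following is a rough Python sketch that searches saved log text for the NULL-dereference signature and the first kernel-mode RIP line after it. The function name `find_null_derefs` and the regular expressions are my own assumptions based on the log format shown in this report, not an existing tool.

```python
import re

BUG_RE = re.compile(r"BUG: kernel NULL pointer dereference, address: ([0-9a-f]+)")
# Kernel-mode RIP lines use code segment 0010 and a symbol+offset/size form.
RIP_RE = re.compile(r"RIP: 0010:(\S+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?")


def find_null_derefs(log_text):
    """Return (address, function, module) for each NULL-deref oops found."""
    hits = []
    pending = None  # fault address waiting for its first kernel-mode RIP line
    for line in log_text.splitlines():
        bug = BUG_RE.search(line)
        if bug:
            pending = int(bug.group(1), 16)
            continue
        rip = RIP_RE.search(line)
        if rip and pending is not None:
            hits.append((pending, rip.group(1), rip.group(2)))
            pending = None  # ignore the repeated RIP line after "end trace"
    return hits


# Lines copied from the oops in this report.
SAMPLE = """\
[79260.489120] BUG: kernel NULL pointer dereference, address: 0000000000000128
[79260.489173] RIP: 0010:lock_bus+0x42/0x60 [amdgpu]
[79260.489210] RIP: 0033:0x7fe04f1e540e
[79260.489282] RIP: 0010:lock_bus+0x42/0x60 [amdgpu]
"""

for addr, func, module in find_null_derefs(SAMPLE):
    print(f"{func} [{module}] faulted at {addr:#x}")
```

The same function should also accept the output of `journalctl --no-hostname -k`, since it matches on the message text and ignores the timestamp prefix.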


5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

Not tried with the Rawhide kernel, but I built and ran torvalds/linux master, and the problem still occurs there.


6. Are you running any modules that are not shipped directly with Fedora's kernel?:

No.


7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

Comment 1 Fredrik Chabot 2020-07-09 19:18:02 UTC
The GUI just hangs the moment the AMDGPU module is loaded on 5.7.7-200.fc32.x86_64.

jul 09 20:48:34 localhost.localdomain boltd[840]: [0008ea78-ae9b-Core X                     ] authorize: finished: ok (status: authorized, flags: 2)
jul 09 20:48:34 localhost.localdomain boltd[840]: [0008ea78-ae9b-Core X                     ] auto-auth: authorization successful
jul 09 20:48:35 localhost.localdomain boltd[840]: [0008ea78-ae9b-Core X                     ] udev: device changed: authorized -> authorized
jul 09 20:48:35 localhost.localdomain kernel: [drm] amdgpu kernel modesetting enabled.
jul 09 20:48:35 localhost.localdomain kernel: CRAT table not found
jul 09 20:48:35 localhost.localdomain kernel: Virtual CRAT table created for CPU
jul 09 20:48:35 localhost.localdomain kernel: Parsing CRAT table with 1 nodes
jul 09 20:48:35 localhost.localdomain kernel: Creating topology SYSFS entries
jul 09 20:48:35 localhost.localdomain kernel: Topology: Add CPU node
jul 09 20:48:35 localhost.localdomain kernel: Finished initializing topology
jul 09 20:48:35 localhost.localdomain kernel: amdgpu 0000:08:00.0: enabling device (0000 -> 0003)
jul 09 20:48:35 localhost.localdomain kernel: [drm] initializing kernel modesetting (NAVI14 0x1002:0x7340 0x1462:0x3822 0xC5).
jul 09 20:48:35 localhost.localdomain kernel: [drm] register mmio base: 0x80000000
jul 09 20:48:35 localhost.localdomain kernel: [drm] register mmio size: 524288
jul 09 20:48:35 localhost.localdomain kernel: [drm] PCIE atomic ops is not supported
jul 09 20:48:36 localhost.localdomain kernel: hrtimer: interrupt took 250582589 ns
jul 09 20:48:36 localhost.localdomain kernel: [drm:amdgpu_discovery_init [amdgpu]] *ERROR* invalid ip discovery binary signature
jul 09 20:48:36 localhost.localdomain kernel: amdgpu 0000:08:00.0: amdgpu_discovery_init failed
jul 09 20:48:36 localhost.localdomain kernel: amdgpu 0000:08:00.0: Fatal error during GPU init
jul 09 20:48:36 localhost.localdomain kernel: [drm] amdgpu: finishing device.
jul 09 20:48:36 localhost.localdomain kernel: BUG: kernel NULL pointer dereference, address: 00000000000000b0
jul 09 20:48:36 localhost.localdomain kernel: #PF: supervisor read access in kernel mode
jul 09 20:48:36 localhost.localdomain kernel: #PF: error_code(0x0000) - not-present page
jul 09 20:48:36 localhost.localdomain kernel: PGD 0 P4D 0 
jul 09 20:48:36 localhost.localdomain kernel: Oops: 0000 [#1] SMP NOPTI
jul 09 20:48:36 localhost.localdomain kernel: CPU: 7 PID: 3472 Comm: systemd-udevd Not tainted 5.7.7-200.fc32.x86_64 #1
jul 09 20:48:36 localhost.localdomain kernel: Hardware name: Notebook                         N150CU                          /N150CU                          , BIOS 1.>
jul 09 20:48:36 localhost.localdomain kernel: RIP: 0010:drm_plane_register_all+0x2d/0x60 [drm]
jul 09 20:48:36 localhost.localdomain kernel: Code: 00 00 55 48 8d af d0 02 00 00 53 48 8b 87 d0 02 00 00 48 39 c5 74 32 48 8d 58 f8 eb 0d 48 8b 43 08 48 8d 58 f8 48 39>
jul 09 20:48:36 localhost.localdomain kernel: RSP: 0018:ffffabe58338bbc8 EFLAGS: 00010282
jul 09 20:48:36 localhost.localdomain kernel: RAX: 0000000000000000 RBX: fffffffffffffff8 RCX: 0000000000001a73
jul 09 20:48:36 localhost.localdomain kernel: RDX: ffffffffc19b0120 RSI: fbf9c06674c9dbb4 RDI: ffff9cb17ee99800
jul 09 20:48:36 localhost.localdomain kernel: RBP: ffff9cb17ee99ad0 R08: 0000000000000000 R09: ffff9cb19d066c10
jul 09 20:48:36 localhost.localdomain kernel: R10: ffff9cb160f75f70 R11: 0000000000000000 R12: 0000000000000000
jul 09 20:48:36 localhost.localdomain kernel: R13: 000000000000001a R14: ffff9cb160f75f70 R15: 0000000000000000
jul 09 20:48:36 localhost.localdomain kernel: FS:  00007f3c03d0cb80(0000) GS:ffff9cb1a07c0000(0000) knlGS:0000000000000000
jul 09 20:48:36 localhost.localdomain kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
jul 09 20:48:36 localhost.localdomain kernel: CR2: 00000000000000b0 CR3: 000000081dab0002 CR4: 00000000003606e0
jul 09 20:48:36 localhost.localdomain kernel: Call Trace:
jul 09 20:48:36 localhost.localdomain kernel:  drm_modeset_register_all+0x10/0x70 [drm]
jul 09 20:48:36 localhost.localdomain kernel:  drm_dev_register+0x15d/0x180 [drm]
jul 09 20:48:36 localhost.localdomain kernel:  amdgpu_pci_probe+0x100/0x180 [amdgpu]
jul 09 20:48:36 localhost.localdomain kernel:  local_pci_probe+0x42/0x80
jul 09 20:48:36 localhost.localdomain kernel:  ? _cond_resched+0x16/0x40
jul 09 20:48:36 localhost.localdomain kernel:  pci_device_probe+0xd9/0x190
jul 09 20:48:36 localhost.localdomain kernel:  really_probe+0x167/0x410
jul 09 20:48:36 localhost.localdomain kernel:  driver_probe_device+0xb6/0x100
jul 09 20:48:36 localhost.localdomain kernel:  device_driver_attach+0xa1/0xb0
jul 09 20:48:36 localhost.localdomain kernel:  __driver_attach+0x8a/0x150
jul 09 20:48:36 localhost.localdomain kernel:  ? device_driver_attach+0xb0/0xb0
jul 09 20:48:36 localhost.localdomain kernel:  ? device_driver_attach+0xb0/0xb0
jul 09 20:48:36 localhost.localdomain kernel:  bus_for_each_dev+0x64/0x90
jul 09 20:48:36 localhost.localdomain kernel:  bus_add_driver+0x12b/0x1e0
jul 09 20:48:36 localhost.localdomain kernel:  driver_register+0x8b/0xe0
jul 09 20:48:36 localhost.localdomain kernel:  ? 0xffffffffc1ab2000
jul 09 20:48:36 localhost.localdomain kernel:  do_one_initcall+0x46/0x200
jul 09 20:48:36 localhost.localdomain kernel:  ? _cond_resched+0x16/0x40
jul 09 20:48:36 localhost.localdomain kernel:  ? kmem_cache_alloc_trace+0x167/0x220
jul 09 20:48:36 localhost.localdomain kernel:  ? do_init_module+0x23/0x260
jul 09 20:48:36 localhost.localdomain kernel:  do_init_module+0x5c/0x260
jul 09 20:48:36 localhost.localdomain kernel:  __do_sys_init_module+0x14f/0x170
jul 09 20:48:36 localhost.localdomain kernel:  do_syscall_64+0x5b/0xf0
jul 09 20:48:36 localhost.localdomain kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
jul 09 20:48:36 localhost.localdomain kernel: RIP: 0033:0x7f3c04e5e40e
jul 09 20:48:36 localhost.localdomain kernel: Code: 48 8b 0d 8d 0a 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00>
jul 09 20:48:36 localhost.localdomain kernel: RSP: 002b:00007ffd39eb6948 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
jul 09 20:48:36 localhost.localdomain kernel: RAX: ffffffffffffffda RBX: 0000564ee6754490 RCX: 00007f3c04e5e40e
jul 09 20:48:36 localhost.localdomain kernel: RDX: 00007f3c04ab895d RSI: 00000000009e4006 RDI: 00007f3c01105010
jul 09 20:48:36 localhost.localdomain kernel: RBP: 00007f3c01105010 R08: 0000564ee66c40c0 R09: 00000000009e4010
jul 09 20:48:36 localhost.localdomain kernel: R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000000000
jul 09 20:48:36 localhost.localdomain kernel: R13: 00007f3c04ab895d R14: 0000564ee6750590 R15: 0000564ee66280a0
jul 09 20:48:36 localhost.localdomain kernel: Modules linked in: amdgpu(+) amd_iommu_v2 gpu_sched ttm uinput rfcomm xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf>
jul 09 20:48:36 localhost.localdomain kernel:  videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev mc ecdh_generic rfkill ecc snd_hda_codec_hdmi>
jul 09 20:48:36 localhost.localdomain kernel: CR2: 00000000000000b0
jul 09 20:48:36 localhost.localdomain kernel: ---[ end trace ed01d9c9e912db76 ]---
jul 09 20:48:36 localhost.localdomain kernel: RIP: 0010:drm_plane_register_all+0x2d/0x60 [drm]
jul 09 20:48:36 localhost.localdomain kernel: Code: 00 00 55 48 8d af d0 02 00 00 53 48 8b 87 d0 02 00 00 48 39 c5 74 32 48 8d 58 f8 eb 0d 48 8b 43 08 48 8d 58 f8 48 39>
jul 09 20:48:36 localhost.localdomain kernel: RSP: 0018:ffffabe58338bbc8 EFLAGS: 00010282
jul 09 20:48:36 localhost.localdomain kernel: RAX: 0000000000000000 RBX: fffffffffffffff8 RCX: 0000000000001a73
jul 09 20:48:36 localhost.localdomain kernel: RDX: ffffffffc19b0120 RSI: fbf9c06674c9dbb4 RDI: ffff9cb17ee99800
jul 09 20:48:36 localhost.localdomain kernel: RBP: ffff9cb17ee99ad0 R08: 0000000000000000 R09: ffff9cb19d066c10
jul 09 20:48:36 localhost.localdomain kernel: R10: ffff9cb160f75f70 R11: 0000000000000000 R12: 0000000000000000
jul 09 20:48:36 localhost.localdomain kernel: R13: 000000000000001a R14: ffff9cb160f75f70 R15: 0000000000000000
jul 09 20:48:36 localhost.localdomain kernel: FS:  00007f3c03d0cb80(0000) GS:ffff9cb1a07c0000(0000) knlGS:0000000000000000
jul 09 20:48:36 localhost.localdomain kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
jul 09 20:48:36 localhost.localdomain kernel: CR2: 00000000000000b0 CR3: 000000081dab0002 CR4: 00000000003606e0
jul 09 20:48:36 localhost.localdomain systemd-udevd[613]: Worker [3472] terminated by signal 9 (KILL)
jul 09 20:48:36 localhost.localdomain systemd-udevd[613]: 0000:08:00.0: Worker [3472] failed
jul 09 20:48:36 localhost.localdomain gnome-shell[1894]: Failed to hotplug secondary gpu '/dev/dri/renderD129': GDBus.Error:System.Error.ENODEV: No


It used to more or less work on 5.6.18-300.fc32.
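
The register dump in this comment can be sanity-checked with a little arithmetic. The visible part of the Code: line contains `48 8d 58 f8` (lea -0x8(%rax),%rbx), but the faulting instruction itself is cut off by journalctl, so the displacement computed below is inferred rather than read from the log. A minimal Python sketch under that assumption:

```python
MASK64 = (1 << 64) - 1

# Values copied from the drm_plane_register_all oops above.
RAX = 0x0000000000000000
RBX = 0xFFFFFFFFFFFFFFF8
CR2 = 0x00000000000000B0

# lea -0x8(%rax),%rbx: RBX should be RAX - 8, wrapped to 64 bits.
rbx_from_lea = (RAX - 8) & MASK64

# Displacement that a load based on RBX would need in order to fault at CR2.
implied_disp = (CR2 - RBX) & MASK64

print(f"RAX - 8      = {rbx_from_lea:#x}")
print(f"implied disp = {implied_disp:#x}")
```

RBX matches RAX - 8 with RAX = 0, which is what a container_of-style adjustment of a NULL list pointer would produce, and the faulting read would then sit at offset 0xb8 from RBX. That is consistent with, but does not prove, a list walk over a structure that was never set up after the "Fatal error during GPU init" message.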

