Bug 2103663 - [ark] mgag200_pci_probe+0x26/0x5b0 [mgag200] RIP: 0010:kernfs_find_and_get_ns+0x11/0x70
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Jocelyn Falempe
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-07-04 12:02 UTC by Bruno Goncalves
Modified: 2022-07-20 08:52 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2022-07-20 08:52:03 UTC
Type: Bug
Embargoed:



Description Bruno Goncalves 2022-07-04 12:02:28 UTC
1. Please describe the problem:

The following issue happened when booting up a machine.

dracut-initqueue.…ice - dracut initqueue hook...
[   16.855926] ccp 0000:a1:00.1: no command queues available
[   16.862367] ccp 0000:a1:00.1: psp enabled
[   16.867973] Microchip SmartPQI Driver (v2.1.14-035)
[   16.874127] smartpqi 0000:43:00.0: Microchip Smart Family Controller found
[   16.883871] BUG: kernel NULL pointer dereference, address: 0000000000000008 
[   16.890887] #PF: supervisor read access in kernel mode 
[   16.896067] #PF: error_code(0x0000) - not-present page 
[   16.896070] PGD 0 P4D 0  
[   16.896075] Oops: 0000 [#1] PREEMPT SMP NOPTI 
[   16.896079] CPU: 0 PID: 1616 Comm: kworker/0:3 Not tainted 023656 #1 
[   16.896084] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/08/2021 
[   16.896087] Workqueue: events work_for_cpu_fn 
[   16.896096] RIP: 0010:kernfs_find_and_get_ns+0x11/0x70 
[   16.896104] Code: 08 48 83 40 40 01 49 8b 46 08 48 83 40 58 01 31 c0 eb d1 66 0f 1f 44 00 00 0f 1f 44 00 00 41 55 49 89 d5 41 54 49 89 f4 55 53 <48> 8b 47 08 48 89 fb 48 85 c0 48 0f 44 c7 48 8b 68 50 48 83 c5 60 
[   16.896106] RSP: 0018:ffffabc291e0fcc0 EFLAGS: 00010246 
[   16.896110] RAX: 0000000000000000 RBX: ffffffff9b323680 RCX: ffffabc291e0fca0 
[   16.896113] RDX: 0000000000000000 RSI: ffffffff9b3237c8 RDI: 0000000000000000 
[   16.896115] RBP: 0000000000000000 R08: 0000000000000040 R09: 00000000e5000000 
[   16.896117] R10: 0000000000000000 R11: ffff9966bb61729c R12: ffffffff9b3237c8 
[   16.896119] R13: 0000000000000000 R14: ffff996686f83bc0 R15: 0000000000000000 
[   16.896121] FS:  0000000000000000(0000) GS:ffff99857d000000(0000) knlGS:0000000000000000 
[   16.896123] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
[   16.896125] CR2: 0000000000000008 CR3: 00000001428a2000 CR4: 0000000000350ef0 
[   16.896128] Call Trace: 
[   16.896131]  <TASK> 
[   16.896135]  sysfs_unmerge_group+0x18/0x60 
[   16.896141]  dpm_sysfs_remove+0x20/0x60 
[   16.896148]  device_del+0xb2/0x3f0 
[   16.896156]  platform_device_del.part.0+0x13/0x70 
[   16.896162]  platform_device_unregister+0x1c/0x30 
[   16.896165]  sysfb_disable+0x2b/0x60 
[   16.896171]  remove_conflicting_framebuffers+0x1b/0xc0 
[   16.896178]  remove_conflicting_pci_framebuffers+0xce/0x120 
[   16.896183]  drm_aperture_remove_conflicting_pci_framebuffers+0x57/0x80 
[   16.896190]  mgag200_pci_probe+0x26/0x5b0 [mgag200] 
[   16.896203]  local_pci_probe+0x41/0x80 
[   16.896210]  work_for_cpu_fn+0x16/0x20 
[   16.896214]  process_one_work+0x1c7/0x380 
[   16.896219]  worker_thread+0x1ab/0x380 
[   16.896224]  ? _raw_spin_lock_irqsave+0x23/0x50 
[   16.896232]  ? process_one_work+0x380/0x380 
[   16.896235]  kthread+0xe9/0x110 
[   16.896241]  ? kthread_complete_and_exit+0x20/0x20 
[   16.896244]  ret_from_fork+0x22/0x30 
[   16.896254]  </TASK> 
[   16.896255] Modules linked in: smartpqi(+) ghash_clmulni_intel ccp usb_storage mgag200(+) hpwdt scsi_transport_sas sp5100_tco wmi ipmi_devintf ipmi_msghandler 
[   16.896272] CR2: 0000000000000008 
[   16.896276] ---[ end trace 0000000000000000 ]--- 
[   17.003538] RIP: 0010:kernfs_find_and_get_ns+0x11/0x70 
[   17.233698] Code: 08 48 83 40 40 01 49 8b 46 08 48 83 40 58 01 31 c0 eb d1 66 0f 1f 44 00 00 0f 1f 44 00 00 41 55 49 89 d5 41 54 49 89 f4 55 53 <48> 8b 47 08 48 89 fb 48 85 c0 48 0f 44 c7 48 8b 68 50 48 83 c5 60 
[   17.233702] RSP: 0018:ffffabc291e0fcc0 EFLAGS: 00010246 
[   17.233707] RAX: 0000000000000000 RBX: ffffffff9b323680 RCX: ffffabc291e0fca0 
[   17.233709] RDX: 0000000000000000 RSI: ffffffff9b3237c8 RDI: 0000000000000000 
[   17.233711] RBP: 0000000000000000 R08: 0000000000000040 R09: 00000000e5000000 
[   17.233715] R10: 0000000000000000 R11: ffff9966bb61729c R12: ffffffff9b3237c8 
[   17.286590] R13: 0000000000000000 R14: ffff996686f83bc0 R15: 0000000000000000 
[   17.286593] FS:  0000000000000000(0000) GS:ffff99857d000000(0000) knlGS:0000000000000000 
[   17.286595] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
[   17.286597] CR2: 0000000000000008 CR3: 00000001428a2000 CR4: 0000000000350ef0 
   
   
         Starting plymouth-start.se… - Show Plymouth Boot Screen...

2. What is the Version-Release number of the kernel:
kernel-5.19.0-0.rc4.a175eca0f3d7.36.test.fc37.x86_64

3. more logs: https://datawarehouse.cki-project.org/kcidb/tests/4180584
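
The oops in the description can be cross-checked mechanically. The sketch below is a rough aid, assuming Python 3.9+; the helper names (`parse_registers`, `faulting_bytes`, `decode_mov_load`, `explains_fault`) are made up for illustration and the decoder handles only the one instruction form found at the fault marker (`REX.W 8B /r` with an 8-bit displacement). It shows that the faulting instruction is a load from `[RDI + 8]` with `RDI == 0`, which matches the fault address in CR2. Since RDI carries the first argument on x86-64, this suggests that kernfs_find_and_get_ns() was called with a NULL parent node.

```python
import re

# Excerpt copied from the oops in this report.
OOPS = """\
[   16.883871] BUG: kernel NULL pointer dereference, address: 0000000000000008
[   16.896104] Code: 08 48 83 40 40 01 49 8b 46 08 48 83 40 58 01 31 c0 eb d1 66 0f 1f 44 00 00 0f 1f 44 00 00 41 55 49 89 d5 41 54 49 89 f4 55 53 <48> 8b 47 08 48 89 fb 48 85 c0 48 0f 44 c7 48 8b 68 50 48 83 c5 60
[   16.896113] RDX: 0000000000000000 RSI: ffffffff9b3237c8 RDI: 0000000000000000
[   16.896125] CR2: 0000000000000008 CR3: 00000001428a2000 CR4: 0000000000350ef0
"""

# Register order used by the ModRM encoding when no REX.R/REX.B bit is set.
REG64 = ["RAX", "RCX", "RDX", "RBX", "RSP", "RBP", "RSI", "RDI"]


def parse_registers(oops: str) -> dict[str, int]:
    """Collect 'NAME: <16 hex digits>' pairs (general and control registers)."""
    pattern = r"\b(R[A-Z0-9]{2}|CR\d):\s*([0-9a-f]{16})\b"
    return {name: int(value, 16) for name, value in re.findall(pattern, oops)}


def faulting_bytes(oops: str) -> bytes:
    """Return the bytes of the Code: line starting at the <xx> fault marker."""
    tokens = re.search(r"Code:\s*(.*)", oops).group(1).split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return bytes(int(tok.strip("<>"), 16) for tok in tokens[start:])


def decode_mov_load(code: bytes) -> tuple[str, str, int]:
    """Decode only 'mov r64, [r64 + disp8]' (48 8B, ModRM mod=01, no SIB)."""
    if len(code) < 4 or code[0] != 0x48 or code[1] != 0x8B:
        raise ValueError("not a REX.W MOV r64, r/m64 instruction")
    modrm = code[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only the disp8 form without a SIB byte is handled")
    disp = int.from_bytes(code[3:4], "little", signed=True)
    return REG64[reg], REG64[rm], disp


def explains_fault(oops: str) -> bool:
    """True if base register + displacement equals the fault address in CR2."""
    regs = parse_registers(oops)
    _dest, base, disp = decode_mov_load(faulting_bytes(oops))
    return (regs[base] + disp) & (2**64 - 1) == regs["CR2"]


if __name__ == "__main__":
    print(decode_mov_load(faulting_bytes(OOPS)), explains_fault(OOPS))
```

The decoder is deliberately narrow: for anything beyond this single instruction form, a real disassembler such as `objdump` or the kernel's `scripts/decodecode` is the better tool.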

Comment 2 Jocelyn Falempe 2022-07-06 12:28:49 UTC
According to Javier Martinez Canillas, v5.19-rc4 is missing one patch:
commit fb84efa28a48 ("drm/aperture: Run fbdev removal before internal helpers")

This commit is in drm-fixes and should land in rc5.

There is also a temporary MR to test this on rawhide:
https://gitlab.com/cki-project/kernel-ark/-/merge_requests/1904

@bgoncalv, is it possible to test this MR on the "problematic" hardware to confirm that the fix works?

Thanks,
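
To triage other test machines, a quick way to tell whether a given kernel-ark build is older than the release expected to carry the fix is to compare its rc number against v5.19-rc5. This is only a rough sketch: the helper names (`parse_ark_version`, `predates_fix`) are hypothetical, the regular expression assumes the `X.Y.Z-0.rcN.` naming seen in this report, and a version comparison cannot tell whether a build carries the patch as a backport.

```python
import re

# First upstream release candidate expected to contain commit fb84efa28a48,
# according to this report: v5.19-rc5.
FIXED_IN = (5, 19, 5)


def parse_ark_version(nvr: str) -> tuple[int, int, int]:
    """Extract (major, minor, rc) from a kernel-ark pre-release version string."""
    match = re.search(r"(\d+)\.(\d+)\.\d+-0\.rc(\d+)\.", nvr)
    if match is None:
        raise ValueError(f"not an rc-style kernel-ark version: {nvr!r}")
    major, minor, rc = (int(group) for group in match.groups())
    return major, minor, rc


def predates_fix(nvr: str) -> bool:
    """True if the build is older than the first rc expected to carry the fix."""
    return parse_ark_version(nvr) < FIXED_IN


if __name__ == "__main__":
    # The kernel from the original report.
    print(predates_fix("kernel-5.19.0-0.rc4.a175eca0f3d7.36.test.fc37.x86_64"))
```

Final (non-rc) releases do not match the pattern and raise `ValueError`, so they need to be handled separately.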

Comment 3 Bruno Goncalves 2022-07-13 12:12:45 UTC
Jocelyn,

I was not able to build a kernel from that MR, but I tested kernel-ark 5.19.0-0.rc5.e8a4e1c1bb69.44.test.fc37 on the same machine and did not hit the problem.

[   16.741249] Microchip SmartPQI Driver (v2.1.14-035)
[   16.746405] smartpqi 0000:43:00.0: Microchip Smart Family Controller found
         Starting plymouth-start.se… - Show Plymouth Boot Screen...
[   16.755822] mgag200 0000:61:00.1: vgaarb: deactivate vga console
[   16.768060] usbcore: registered new interface driver uas
[   16.768247] [drm] Initialized mgag200 1.0.0 20110418 for 0000:61:00.1 on minor 0
[  OK  ] Started plymouth-start.ser…e - Show Plymouth Boot Screen.
[   16.785333] fbcon: mgag200drmfb (fb0) is primary device
[   16.785337] fbcon: Deferring console take-over
[   16.797196] mgag200 0000:61:00.1: [drm] fb0: mgag200drmfb frame buffer device
[  OK  ] Started systemd-ask-passwo…uests to Plymouth Directory Watch.
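
A check like the one I did by eye on this console log could be scripted roughly as follows. This is a minimal sketch, not an existing CKI tool: the function name `boot_looks_healthy` and the list of oops markers are assumptions, and it only looks for the `[drm] Initialized <driver>` line and the absence of oops text in the captured log.

```python
import re

# Strings that indicate a kernel oops in a console log (illustrative list).
OOPS_MARKERS = ("BUG:", "Oops:", "Call Trace:")


def boot_looks_healthy(log: str, driver: str = "mgag200") -> bool:
    """True if the DRM driver initialized and no oops marker appears in the log."""
    has_oops = any(marker in log for marker in OOPS_MARKERS)
    initialized = re.search(
        rf"\[drm\] Initialized {re.escape(driver)} ", log
    ) is not None
    return initialized and not has_oops


# Lines copied from the rc5 console output in this comment.
GOOD_LOG = """\
[   16.755822] mgag200 0000:61:00.1: vgaarb: deactivate vga console
[   16.768247] [drm] Initialized mgag200 1.0.0 20110418 for 0000:61:00.1 on minor 0
[   16.797196] mgag200 0000:61:00.1: [drm] fb0: mgag200drmfb frame buffer device
"""

# Lines copied from the rc4 console output in the description.
BAD_LOG = """\
[   16.883871] BUG: kernel NULL pointer dereference, address: 0000000000000008
[   16.896052] Oops: 0000 [#1] PREEMPT SMP NOPTI
[   16.896190]  mgag200_pci_probe+0x26/0x5b0 [mgag200]
"""

if __name__ == "__main__":
    print(boot_looks_healthy(GOOD_LOG), boot_looks_healthy(BAD_LOG))
```

A passing result only means the log excerpt is clean; it does not prove the fix is present, so it is best combined with a run on the hardware that reproduced the problem.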

Comment 4 Jocelyn Falempe 2022-07-13 12:35:56 UTC
Hi Bruno,

Thanks a lot for testing and confirming that the fix works and is included in 5.19-rc5.

