Bug 2113008

Summary: kernel dereferences NULL pointer when booted in nosmp mode
Product: [Fedora] Fedora
Reporter: Richard W.M. Jones <rjones>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 36
CC: acaringi, adscvr, airlied, alciregi, bskeggs, hdegoede, hpa, jarodwilson, jglisse, jonathan, josef, kernel-maint, lgoncalv, linville, masami256, mchehab, ptalbert, steved
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2022-08-01 19:11:40 UTC
Type: Bug
Bug Blocks: 910269    

Description Richard W.M. Jones 2022-08-01 17:09:44 UTC
1. Please describe the problem:

When the kernel is booted with 1 CPU (nosmp), it crashes with:

[    0.003000] APIC: Switch to symmetric I/O mode setup
[    0.011000] BUG: kernel NULL pointer dereference, address: 0000000000000000
[    0.011000] #PF: supervisor read access in kernel mode
[    0.011000] #PF: error_code(0x0000) - not-present page
[    0.011000] PGD 0 P4D 0 
[    0.011000] Oops: 0000 [#1] PREEMPT SMP NOPTI
[    0.011000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.18.13-200.fc36.x86_64 #1
[    0.011000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014
[    0.011000] RIP: 0010:mask_ioapic_irq+0x16/0xb0
[    0.011000] Code: 24 04 00 00 00 00 eb d5 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 53 48 8b 5f 30 48 c7 c7 88 1c a1 9b e8 1a 11 d1 00 <48> 8b 0b 80 4b 12 01 48 89 c6 48 39 cb 74 74 8b 41 14 8b 7b 10 44
[    0.011000] RSP: 0000:ffffb8eb00003fa8 EFLAGS: 00000046
[    0.011000] RAX: 0000000000000086 RBX: 0000000000000000 RCX: ffffffff99e01177
[    0.011000] RDX: 0000000000000001 RSI: ffffffff9a7057bc RDI: 0000000000000001
[    0.011000] RBP: ffffa0568107eea4 R08: 0000000000000000 R09: 0000000000000000
[    0.011000] R10: 0000000000000000 R11: ffffb8eb00003ff8 R12: 0000000000000030
[    0.011000] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    0.011000] FS:  0000000000000000(0000) GS:ffffa056ce600000(0000) knlGS:0000000000000000
[    0.011000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.011000] CR2: 0000000000000000 CR3: 0000000027e10000 CR4: 00000000000406b0
[    0.011000] Call Trace:
[    0.011000]  <IRQ>
[    0.011000]  handle_level_irq+0x117/0x180
[    0.011000]  __common_interrupt+0x66/0x100
[    0.011000]  common_interrupt+0xb4/0xd0
[    0.011000]  </IRQ>
[    0.011000]  <TASK>
[    0.011000]  asm_common_interrupt+0x21/0x40
[    0.011000] RIP: 0010:mp_irqdomain_alloc+0xd7/0x270
[    0.011000] Code: 00 00 48 63 43 30 4d 89 3f 4d 89 7f 08 49 89 46 08 48 8b 05 cb 6c 9a 02 49 39 45 50 0f 84 5d 01 00 00 49 c7 46 18 c0 3e 19 9b <4c> 63 6c 24 08 8b 3c 24 4d 89 7e 30 4b 8d 44 ed 00 03 3c c5 98 f8
[    0.011000] RSP: 0000:ffffffff9ae03cf0 EFLAGS: 00000246
[    0.011000] RAX: ffffa0568107dd80 RBX: ffffffff9ae03de0 RCX: 0000000000000000
[    0.011000] RDX: 0000000000000002 RSI: 0000000000000202 RDI: 00000000ffffffff
[    0.011000] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001
[    0.011000] R10: ffffa0568107b1c0 R11: 0000000000000000 R12: 0000000000000000
[    0.011000] R13: ffffa056811d8600 R14: ffffa0568107ee28 R15: ffffa05681052500
[    0.011000]  __irq_domain_alloc_irqs+0xda/0x410
[    0.011000]  alloc_isa_irq_from_domain.constprop.0+0x9e/0xe0
[    0.011000]  mp_map_pin_to_irq+0x1a3/0x310
[    0.011000]  setup_IO_APIC+0x129/0x809
[    0.011000]  ? _raw_spin_unlock_irqrestore+0x23/0x40
[    0.011000]  ? clear_IO_APIC_pin+0x169/0x240
[    0.011000]  apic_intr_mode_init+0x10f/0x114
[    0.011000]  x86_late_time_init+0x20/0x34
[    0.011000]  start_kernel+0x8a8/0x958
[    0.011000]  ? load_ucode_bsp+0x6d/0x103
[    0.011000]  secondary_startup_64_no_verify+0xd5/0xdb
[    0.011000]  </TASK>
[    0.011000] Modules linked in:
[    0.011000] CR2: 0000000000000000
[    0.011000] ---[ end trace 0000000000000000 ]---
[    0.011000] RIP: 0010:mask_ioapic_irq+0x16/0xb0
[    0.011000] Code: 24 04 00 00 00 00 eb d5 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 53 48 8b 5f 30 48 c7 c7 88 1c a1 9b e8 1a 11 d1 00 <48> 8b 0b 80 4b 12 01 48 89 c6 48 39 cb 74 74 8b 41 14 8b 7b 10 44
[    0.011000] RSP: 0000:ffffb8eb00003fa8 EFLAGS: 00000046
[    0.011000] RAX: 0000000000000086 RBX: 0000000000000000 RCX: ffffffff99e01177
[    0.011000] RDX: 0000000000000001 RSI: ffffffff9a7057bc RDI: 0000000000000001
[    0.011000] RBP: ffffa0568107eea4 R08: 0000000000000000 R09: 0000000000000000
[    0.011000] R10: 0000000000000000 R11: ffffb8eb00003ff8 R12: 0000000000000030
[    0.011000] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    0.011000] FS:  0000000000000000(0000) GS:ffffa056ce600000(0000) knlGS:0000000000000000
[    0.011000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.011000] CR2: 0000000000000000 CR3: 0000000027e10000 CR4: 00000000000406b0
[    0.011000] Kernel panic - not syncing: Fatal exception in interrupt

2. What is the Version-Release number of the kernel:

5.18.13-200.fc36.x86_64

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

Yes, it worked until recently.  Last known good kernel was:

5.18.11-200.fc36

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

a. Install the kernel.
b. Run: libguestfs-test-tool

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

No.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:

No.

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

See build.log here:
https://koji.fedoraproject.org/koji/taskinfo?taskID=90351515

Comment 1 Richard W.M. Jones 2022-08-01 17:10:04 UTC
Suspecting it could be related to this:
https://www.spinics.net/lists/kernel/msg4308594.html

Comment 2 Richard W.M. Jones 2022-08-01 17:10:53 UTC
(In reply to Richard W.M. Jones from comment #1)
> Suspecting it could be related to this:
> https://www.spinics.net/lists/kernel/msg4308594.html

Sorry, ignore this; it's probably not this one.

Comment 3 Richard W.M. Jones 2022-08-01 19:11:40 UTC
So this error is intermittent, and I'm no longer able to reproduce it
in Koji or on bare metal.