Bug 2355276

Summary: Boot failure on Dell XPS 9640
Product: [Fedora] Fedora
Reporter: a-team
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: NEW
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 41
CC: 106238l, aamadeo, acaringi, adscvr, airlied, bskeggs, cpatrick08, d3d9, hdegoede, hpa, jforbes, josef, kernel-maint, linville, masami256, mchehab, peter, ptalbert, spotrh, steved, suraj.ghimire7, tts26
Target Milestone: ---
Keywords: Desktop
Target Release: ---
Hardware: x86_64
OS: Linux
Attachments:
Dell Support help (flags: none)
Screenshot of boot attempt (flags: none)

Description a-team 2025-03-27 09:36:24 UTC
1. Please describe the problem:

After the latest BIOS update to version 1.12.0, it's impossible to boot the machine. GRUB is displayed and I can choose between 3 kernel versions or the rescue option, but none of the 4 will boot. The screen remains black with a white cursor or the message "Booting Fedora ...". The only way to shut down the laptop is to hold the power button.

Note that I also tried to boot from a USB stick with F40 and F42, without success.

2. What is the Version-Release number of the kernel:

6.13.8-200.fc41.x86_64
6.13.7-200.fc41.x86_64
6.13.6-200.fc41.x86_64

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

6.13.8-200.fc41.x86_64


7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

Unfortunately, I can't access the command line to run journalctl.


Reproducible: Always

Comment 1 d3d9 2025-04-07 21:06:01 UTC
Same here on a Dell Inspiron 16 Plus 7640. In this case the failing BIOS version is 1.13.0; downgrading to 1.12.0 fixed the issue.
It occurred on various kernels from 6.12.6 to 6.13.7, and booting from USB didn't help either.
journalctl shows nothing at all between the last successful boot (and its shutdown) a week ago and my first successful boot after the downgrade.

According to a post on the Dell forums, it also occurred with a Linux Mint USB stick as well as on Fedora, and on yet another model / BIOS variant whose update dates from around the same time and cites the same reason for the critical update (CVE-2024-38796). https://www.dell.com/community/en/conversations/inspiron/dell-bios-update-breaks-linux-installs/67d09c66c5ead74c2bf65cd5

Comment 2 Christopher Patrick 2025-05-01 04:21:31 UTC
Same here on an Alienware m16 R2. I upgraded to version 1.1.10+. I cannot downgrade back to 1.9.0, which is the last version that worked. I cannot boot openSUSE TW, Linux Mint or any Fedora ISO, whether via Fedora USB Installer or Ventoy.

Comment 3 Christopher Patrick 2025-05-01 04:32:00 UTC
Created attachment 2087940 [details]
Dell Support help

I contacted Dell via Facebook Messenger, and the following is what they had me try, with no success.

Comment 4 Peter Williams 2025-05-01 20:51:15 UTC
I don't know anything about kernel debugging, so I suspect that this won't be helpful at all. But, I booted the Fedora 42 installer with parameters "acpi=off earlyprintk=efi earlycon=efifb nosmp nowatchdog console=" and was able to get some diagnostic output on a new XPS 9640 with the 1.12.0 BIOS. The issue happens around here:

```
Booting paravirtualized kernel on bare hardware
BUG: unable to handle page fault for address: ffffffffff5fc330
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
```

The key call trace lines look like they are:

```
? asm_exc_page_fault+0x26/0x30
? native_apic_mem_read+0x6/0x20
? intel_thermal_supported+0x5/0x30
? therm_lvt_init+0x23/0x30
```

But here's a report from someone else that looks pretty different: https://lore.kernel.org/lkml/Z-aD1ughy6fd8Ask@archimedes.dunstkreis.ch/T/#u . I think they didn't use "acpi=off", which might be the cause of the difference?
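
For anyone else reading traces like the one above: each entry has the form `symbol+offset/size` (both values in hex), and a leading `?` marks an entry the kernel's unwinder found on the stack but could not confirm as part of the call chain. Below is a minimal sketch, assuming that format, of how such lines can be split into fields when comparing reports from different machines. The function name `parse_trace_line` is hypothetical and only for illustration; it is not part of any kernel tooling.

```python
import re

# Matches entries such as "? therm_lvt_init+0x23/0x30".
TRACE_RE = re.compile(
    r"^\s*(\?\s+)?([\w.]+)\+(0x[0-9a-fA-F]+)/(0x[0-9a-fA-F]+)\s*$"
)


def parse_trace_line(line):
    """Return (symbol, offset, size, reliable), or None if the line does not match."""
    m = TRACE_RE.match(line)
    if m is None:
        return None
    unreliable, symbol, offset, size = m.groups()
    # offset: how far into the function the address lies; size: the function's length.
    return symbol, int(offset, 16), int(size, 16), unreliable is None


# Sample entries copied from the trace in this comment.
trace = [
    "? native_apic_mem_read+0x6/0x20",
    "? intel_thermal_supported+0x5/0x30",
    "? therm_lvt_init+0x23/0x30",
]

for entry in trace:
    print(parse_trace_line(entry))
```

Lines that do not follow the `symbol+offset/size` shape return `None`, which makes it easy to spot entries that were mistyped when copied by hand from a screen.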

Comment 5 Orion Leidl Wilson 2025-05-08 12:57:24 UTC
I am having this issue too, with the Alienware m16 R2 on BIOS 1.1.10.

Comment 6 Tom "spot" Callaway 2025-05-08 16:44:48 UTC
(In reply to Peter Williams from comment #4)
> The issue happens around here:
> 
> ```
> Booting paravirtualized kernel on bare hardware
> BUG: unable to handle page fault for address: ffffffffff5fc330
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> ```

I see the exact same error on my Alienware M16 R2 (1.11.0 firmware) running kernel 6.14.4-300.fc42, booting with those options appended.

My first call trace:

? therm_lvt_init+0x23/0x30
? setup_arch+0x87c/0x8c0
? start_kernel+0x64/0x490
? x86_64_start_reservations+0x24/0x30
? x86_64_start_kernel+0xed/0xf0
? common_startup_64+0x13e/0x141

I'll take a photo and attach it too.

Comment 7 Tom "spot" Callaway 2025-05-08 16:49:46 UTC
Created attachment 2089090 [details]
Screenshot of boot attempt

Comment 9 Peter Williams 2025-05-09 13:32:43 UTC
Also: the people experiencing this issue are reporting that Ubuntu and OpenSUSE kernels can boot successfully, FWIW. So it's not an issue that's universal to *all* Linux kernel builds.

Comment 10 Albert Amadeo 2025-05-11 14:23:15 UTC
I have a Dell Inspiron Plus 7640 with the same issue when trying to install first Rocky Linux and then Fedora 42.
In the GRUB screen I added a line with "set debug=all" after the initrd line and got this output:

script/lexer.c:336:lexer token 259 text []
script/lexer.c:336:lexer token 0 text []
(same 2 lines once more)
loader/efi/linux.c:236:linux kernel_address: 0x10000000 handover_offset: 0x1015e70 params: 0x5251e000
loader/efi/linux.c:252:nx: Setting attributes for 0x10000000-0x5adefff to r-x
loader/efi/linux.c:252:nx: permissions for 0x10000000 are ---
loader/efi/linux.c:252:nx: Setting attributes for stack at 0xNumberA-0xNumberB to rw-
loader/efi/linux.c:252:nx: permissions for 0xNumberA are ---

*NumberA and NumberB are placeholders; I simplified the numbers because I was copying them by hand.

I hope this helps.

Comment 11 Justin M. Forbes 2025-05-13 17:20:37 UTC
I suppose the real task is to figure out why OpenSUSE kernel works and ours does not.

Comment 12 Christopher Patrick 2025-05-13 20:28:13 UTC
(In reply to Justin M. Forbes from comment #11)
> I suppose the real task is to figure out why OpenSUSE kernel works and ours
> does not.

I couldn't get openSUSE TW to boot a few weeks ago. Might need to figure out why Ubuntu will boot.

Comment 13 tts26 2025-05-13 21:06:33 UTC
Dell's release notes for BIOS update 1.10.0 (available at https://www.dell.com/support/kbdoc/en-us/000270384/dsa-2025-044) indicate that this update addresses security vulnerability DSA-2025-044. This vulnerability is further detailed in the Tianocore EDK2 security advisory: https://github.com/tianocore/edk2/security/advisories/GHSA-xpcr-7hjq-m6qm.

It appears that the fix implemented in BIOS version 1.10.0 (for Alienware M16 R2), while addressing the security vulnerability, has introduced a regression that prevents the operating system kernel from booting.

Comment 14 tts26 2025-05-13 21:08:34 UTC
The above fix for https://www.dell.com/support/kbdoc/en-us/000270384/dsa-2025-044 was also pushed to the XPS 16 9640 in a BIOS update.