Bug 2211784 - fedora 38 kernel 6.3.4 boot fail upon upgrade
Summary: fedora 38 kernel 6.3.4 boot fail upon upgrade
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 38
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL: https://discussion.fedoraproject.org/...
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-06-01 22:58 UTC by Robert Koppelhuber
Modified: 2023-06-12 03:12 UTC
CC List: 23 users

Fixed In Version: kernel-6.3.6-100.fc37 kernel-6.3.6-200.fc38
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-06-09 01:38:06 UTC
Type: ---
Embargoed:


Attachments
infostate & journalctl for pre&post upgrade to F38 but without 6.3.4 as unable to boot. So used 6.2.14 (2.36 MB, text/plain)
2023-06-01 23:02 UTC, Robert Koppelhuber
Another sudo journalctl -r --boot=-1 after 6.3.4-201 fail then boot 6.2.15-300 (563.91 KB, text/plain)
2023-06-05 22:47 UTC, Robert Koppelhuber
6.3.5-Successfull test (1.00 MB, text/plain)
2023-06-06 01:06 UTC, Robert Koppelhuber
Journalctl for 6.3.5-test-201 (430.31 KB, text/plain)
2023-06-06 21:57 UTC, Robert Koppelhuber

Description Robert Koppelhuber 2023-06-01 22:58:19 UTC
Initial report per URL: https://discussion.fedoraproject.org/t/fedora-hangs-on-boot-after-upgrading-to-kernel-6-3-4/83605/9


GRUB boots the default (first) entry with rhgb but without quiet, because the disk is encrypted.
The boot hangs after printing the first line.

I will raise a separate bug report for another problem: since 6.2.x, users cannot log in with the Plasma Xorg desktop. Only Plasma Wayland and GNOME work. This is probably unrelated to the boot issue, but may be an NVIDIA issue.

Due to the character limit in the thread linked above, I was not able to upload my full diagnostic information there.

Reproducible: Always

Steps to Reproduce:
1. Upgraded to F38.
2. GRUB boots the default (first) entry with rhgb but without quiet, because the disk is encrypted.
3. Boot hangs after printing the first line, with no further non-quiet output.
4. Disk activity light flashes at regular 1-second intervals (stuck in a loop).
5. Kernel 6.2.15 (last update) boots fine, except for the continuing Plasma Xorg login issue.
6. Repeat the boot with rhgb edited out of the GRUB command line; this prints:
Booting a command list
then nothing.
7. Power off and reboot with the GRUB entry for kernel 6.2.15.
8. Boots fine.
9. infostate for this boot and the journalctl record for 1 June 2023 are attached.
10. Unable to collect any diagnostics for the 6.3.4 boot.
Actual Results:  
'Booting a command list' and then no output, with rhgb and quiet removed. It just hangs. The disk activity light flashes regularly. No journalctl could be collected for 6.3.4; journalctl output is available only for older kernels.

Expected Results:  
Currently 6.3.4 does not boot: it hangs after 'Booting a command list' with no output when rhgb and quiet are removed, the disk activity light flashes regularly at 1-second intervals, and the only option is to power off.

Expected: boot with diagnostic information scrolling on screen (without rhgb or quiet on the command line) until the prompt for the LUKS1 password for the encrypted root disk.

Please also note a very similar issue when I installed F37 on a new NVMe disk (Samsung EVO 970) in an Intel W2600cr board with two E5-2650 v2 CPUs. That issue was resolved by booting from /boot and /boot/efi located on two partitions of a SATA SSD, with /etc/fstab listing / on the NVMe drive. This has worked since F37 until now.

My gut feeling: is the initramfs missing NVMe drivers in this kernel build?

My motherboard dates from when EFI first came out. While I have the latest EFI/BIOS firmware up to 2018, after which Intel stopped support, none of it ever had NVMe support. From the UEFI command line I can force a boot by loading the NVMe DXE driver, which I did when I first had problems, after first loading a thumb drive holding the scripts and the NVMe driver.

No journalctl output for the 6.3.4 boot, as it cannot boot.

$ cat /proc/sys/kernel/tainted
516
$ for i in $(seq 18); do echo $(($i-1)) $(($(cat /proc/sys/kernel/tainted)>>($i-1)&1));done
0 0
1 0
2 1
3 0
4 0
5 0
6 0
7 0
8 0
9 1
10 0
11 0
12 0
13 0
14 0
15 0
16 0
17 0

Does this suggest my Intel W2600cr motherboard is showing its age? Maybe 6.2.5 is my limit? Please advise.

My main taint should come from the NVIDIA Titan Xp drivers from RPM Fusion, plus VLC and audio drivers from the same source. After all, it is a workstation.

S if the kernel is running on a processor or system that is out of specification: hardware has been put into an unsupported configuration, therefore proper execution cannot be guaranteed. Kernel will be tainted if, for example:

    on x86: PAE is forced through forcepae on intel CPUs (such as Pentium M) which do not report PAE but may have a functional implementation, an SMP kernel is running on non officially capable SMP Athlon CPUs, MSRs are being poked at from userspace.

    on arm: kernel running on certain CPUs (such as Keystone 2) without having certain kernel features enabled.

    on arm64: there are mismatched hardware features between CPUs, the bootloader has booted CPUs in different modes.

    certain drivers are being used on non supported architectures (such as scsi/snic on something else than x86_64, scsi/ips on non x86/x86_64/itanium, have broken firmware settings for the irqchip/irq-gic on arm64 ...).

    x86/x86_64: Microcode late loading is dangerous and will result in tainting the kernel. It requires that all CPUs rendezvous to make sure the update happens when the system is as quiescent as possible. However, a higher priority MCE/SMI/NMI can move control flow away from that rendezvous and interrupt the update, which can be detrimental to the machine.

W if a warning has previously been issued by the kernel. (Though some warnings may set more specific taint flags.)
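
For readers who want to decode a taint value without the shell loop, here is a minimal Python sketch. It assumes the 18-flag bit layout described in the kernel's tainted-kernels documentation; the names `TAINT_FLAGS` and `decode_taint` are illustrative only and not part of any kernel tooling. Applied to the value 516 reported above, it picks out bits 2 and 9, which correspond to the S and W flags quoted here.

```python
# Taint flags in bit order (bit 0 first), following the kernel's
# tainted-kernels documentation. Descriptions are shortened paraphrases.
TAINT_FLAGS = [
    ("P", "proprietary module was loaded"),
    ("F", "module was force loaded"),
    ("S", "kernel running on an out-of-specification system"),
    ("R", "module was force unloaded"),
    ("M", "processor reported a Machine Check Exception"),
    ("B", "bad page referenced or unexpected page flags"),
    ("U", "taint requested by userspace application"),
    ("D", "kernel died recently (OOPS or BUG)"),
    ("A", "ACPI table overridden by user"),
    ("W", "kernel issued a warning"),
    ("C", "staging driver was loaded"),
    ("I", "workaround for a platform firmware bug applied"),
    ("O", "externally built (out-of-tree) module was loaded"),
    ("E", "unsigned module was loaded"),
    ("L", "soft lockup occurred"),
    ("K", "kernel has been live patched"),
    ("X", "auxiliary taint, defined and used by distros"),
    ("T", "kernel built with the struct randomization plugin"),
]


def decode_taint(value):
    """Return (bit, letter, description) for every bit set in value."""
    return [
        (bit, letter, description)
        for bit, (letter, description) in enumerate(TAINT_FLAGS)
        if (value >> bit) & 1
    ]


if __name__ == "__main__":
    # 516 is the value from /proc/sys/kernel/tainted in this report.
    for bit, letter, description in decode_taint(516):
        print(bit, letter, description)
```

On a running system the input would instead come from reading /proc/sys/kernel/tainted and converting the text to an integer.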

Pre- and post-F38-upgrade journalctl from 08:00 am, 1 June 2023, Australian Eastern Standard Time

# cat journalctl | grep 'Jun 01' => Can provide on request; unable to upload in this bug report due to size.

Comment 1 Robert Koppelhuber 2023-06-01 23:02:51 UTC
Created attachment 1968441
infostate & journalctl for pre&post upgrade to F38 but without 6.3.4 as unable to boot. So used 6.2.14

Infostate and journalctl for 1 June 2023, about 8 am Australian Eastern Standard Time, when the upgrade process was started. I can only provide journalctl output from kernel 6.2.14, as the upgraded 6.3.4 kernel will not boot or provide diagnostics.

Comment 2 Robert Koppelhuber 2023-06-01 23:11:38 UTC
Please note that I now have a different email address. I attempted to change it but was unable to.

Refer to the email received when reporting, below:

Hello!

RITM1501812 (Unable to change email re bugzilla) has been created.

Requested for: Guest

Thank you for contacting the RH Bugzilla team. Your request is in our system.

The item contains my new email address. I have a new account, which initially raised the issue (see the linked thread), with my contact details as well.

Comment 3 Robert Koppelhuber 2023-06-02 02:47:12 UTC
Ignore my last comment about email. All sorted; the Bugzilla email servers were down.

Comment 4 Justin M. Forbes 2023-06-02 13:59:30 UTC
See if https://koji.fedoraproject.org/koji/taskinfo?taskID=101713227 works for you.  This might be the nvidia issue with simpledrm that was supposed to be fixed in the nvidia driver a couple of months ago, but seems it is not.

Comment 6 Justin M. Forbes 2023-06-04 00:23:09 UTC
(In reply to Christopher Klooz from comment #5)
> This is linked to
> https://discussion.fedoraproject.org/t/fedora-hangs-on-boot-after-upgrading-
> to-kernel-6-3-4/83605/20 and
> https://bugzilla.redhat.com/show_bug.cgi?id=2212012

It may or it may not be, but none of this is helpful, there are exactly 2 bugs where I put up the link to the scratch build to test a fix. This is one of them. Linking people away from this bug does nothing to get it resolved, I just need someone to actually test the fix and let me know if it solves it or not.   Rather than linking away from one of the bugs where I posted a possible solution, it would be much more helpful to close dups as dups of a bug with the solution. If anyone would bother testing at all, we might even have a fix pushed with 6.3.6 when I build it in the next day or so.

Comment 7 huupoke12 2023-06-04 09:50:30 UTC
(In reply to Justin M. Forbes from comment #4)
> See if https://koji.fedoraproject.org/koji/taskinfo?taskID=101713227 works
> for you.  This might be the nvidia issue with simpledrm that was supposed to
> be fixed in the nvidia driver a couple of months ago, but seems it is not.

This works for me.

Comment 8 Christopher Klooz 2023-06-04 13:12:02 UTC
(In reply to Justin M. Forbes from comment #6)
I do not link them away; I lead them here. The ask.fedora topic is actually the reason this bug report exists. Both users started there and then filed separate bug reports. Average users tend to end up at ask.fedora and not on Bugzilla, which is why I link it. Some may be experienced and carefully check all links and elaborations in both ask.fedora and the bug reports, but many of our less experienced users just click through "link by link", hoping it leads at some point to an easy-to-implement solution. This is why I linked the reports.

I cannot test your build since I do not experience this issue, but I can lead people here and ensure that those who accidentally end up on the wrong one of the two bug reports that arose from the ask.fedora topic are led to the other one. It also avoids one issue being treated twice or more. If users miss the report that is relevant to them, they open another topic at ask.fedora, which might end up with another supporter who doesn't know of the other one (including the related bug reports), and so on. This should be in your interest: it avoids you having to paste your solution into 10 reports in the end, and it gets all users at ask.fedora to your solution as soon as possible (so that someone may give feedback).

Comment 9 vincent 2023-06-04 13:52:02 UTC
(In reply to Justin M. Forbes from comment #4)
> See if https://koji.fedoraproject.org/koji/taskinfo?taskID=101713227 works
> for you.  This might be the nvidia issue with simpledrm that was supposed to
> be fixed in the nvidia driver a couple of months ago, but seems it is not.

Thank you, this works for me.

Comment 10 Robert Koppelhuber 2023-06-05 22:47:43 UTC
Created attachment 1969151
Another sudo journalctl -r --boot=-1 after 6.3.4-201 fail then boot 6.2.15-300

Another sudo journalctl -r --boot=-1 after 6.3.4-201 fail then boot 6.2.15-300

Comment 11 Robert Koppelhuber 2023-06-05 22:48:49 UTC
Proceeding now to try https://koji.fedoraproject.org/koji/taskinfo?taskID=101713227

Comment 13 Christopher Klooz 2023-06-05 22:56:39 UTC
(In reply to Robert Koppelhuber from comment #12)

It's explained in the ask.fedora topic:  
https://discussion.fedoraproject.org/t/fedora-hangs-on-boot-after-upgrading-to-kernel-6-3-4/83605/24

Just use it to test if the bug is solved. Then everything should be fine.

Comment 14 Robert Koppelhuber 2023-06-06 01:06:57 UTC
Created attachment 1969155
6.3.5-Successfull test

Comment 15 Robert Koppelhuber 2023-06-06 01:15:07 UTC
Thank you, the test was successful. However, per attachment https://bugzilla.redhat.com/attachment.cgi?id=1969155,
I removed 6.3.4 beforehand, which included removal of the dependency akmod-nvidia.

$ uname -a
Linux earth 6.3.5-201.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Jun  1 15:13:15 UTC 2023 x86_64 GNU/Linux


Given the remark that "This might be the nvidia issue with simpledrm that was supposed to be fixed in the nvidia driver a couple of months ago, but seems it is not", is there any relationship with that?

Further, since the F38 upgrade a few weeks ago, I note that choosing Plasma X11 (Xorg) on the login screen causes it to come back to the login screen. I am unable to log in. I can log in with the Plasma Wayland desktop, and also with GNOME.

Since F36 I have been forced to use the NVIDIA drivers, because the Plasma Wayland desktop was unstable, with irregular, sudden, complete freezes of all control, including Alt key sequences. Powering down was the only solution. With the NVIDIA drivers it is rock solid.

Under F38, for the short while I have been on Wayland, I have been getting the same freeze, less often but still requiring a power-off. I would prefer to use Plasma Xorg. Wayland with NVIDIA is just not quite right. Where are we up to with the NVIDIA story: open source, Wayland drivers?

Comment 16 Fedora Update System 2023-06-06 14:04:12 UTC
FEDORA-2023-ed3bcae7e8 has been submitted as an update to Fedora 38. https://bodhi.fedoraproject.org/updates/FEDORA-2023-ed3bcae7e8

Comment 17 Fedora Update System 2023-06-06 14:04:13 UTC
FEDORA-2023-70b0935c41 has been submitted as an update to Fedora 37. https://bodhi.fedoraproject.org/updates/FEDORA-2023-70b0935c41

Comment 18 Robert Koppelhuber 2023-06-06 21:57:00 UTC
Created attachment 1969412
Journalctl for 6.3.5-test-201

The previous attachment, F38-kernel-6.3.5-test-201-Success.txt, contained the log for journalctl -r --boot=-1, which was effectively 6.2.5.

This upload is for the boot of:
$ uname -a
Linux earth 6.3.5-201.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Jun  1 15:13:15 UTC 2023 x86_64 GNU/Linux

Sorry!

Comment 19 Fedora Update System 2023-06-07 01:31:55 UTC
FEDORA-2023-70b0935c41 has been pushed to the Fedora 37 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-70b0935c41`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-70b0935c41

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 20 Fedora Update System 2023-06-07 01:38:59 UTC
FEDORA-2023-ed3bcae7e8 has been pushed to the Fedora 38 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-ed3bcae7e8`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-ed3bcae7e8

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 21 Kjell Randa 2023-06-07 18:28:41 UTC
The new kernel fixes the missing console, and /proc/fb now contains "0 VESA VGA" instead of being empty.
Tested on two machines using the proprietary Nvidia driver.
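
For anyone wanting to repeat this check, a minimal Python sketch follows. It assumes /proc/fb holds one "index name" line per registered framebuffer, as in the "0 VESA VGA" line above; the function name `parse_proc_fb` is illustrative only. An empty result corresponds to the failing state described in this bug.

```python
def parse_proc_fb(text):
    """Parse /proc/fb content into a list of (index, name) tuples.

    Each non-empty line is expected to look like "0 VESA VGA":
    a framebuffer index, a space, then the driver's identifier.
    """
    devices = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        index, _, name = line.partition(" ")
        devices.append((int(index), name))
    return devices


if __name__ == "__main__":
    try:
        with open("/proc/fb") as handle:
            found = parse_proc_fb(handle.read())
    except OSError:
        # /proc/fb is absent on systems without framebuffer support.
        found = []
    if found:
        for index, name in found:
            print("fb%d: %s" % (index, name))
    else:
        print("no framebuffer registered")
```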

Comment 22 Fedora Update System 2023-06-09 01:38:06 UTC
FEDORA-2023-70b0935c41 has been pushed to the Fedora 37 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 23 Fedora Update System 2023-06-09 02:00:11 UTC
FEDORA-2023-ed3bcae7e8 has been pushed to the Fedora 38 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 24 Robert Koppelhuber 2023-06-12 03:12:54 UTC
Just confirming: installed kernel 6.3.6 with a complete update from the standard repo for F38.

Boot and functions OK.

