Bug 2211591

Summary: F37+38: kernel 6.3.4 no support for nvidia while booting
Product: [Fedora] Fedora Reporter: customercare
Component: grub2 Assignee: Kernel Maintainer List <kernel-maint>
Status: NEW --- QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: unspecified    
Version: 37 CC: acaringi, adscvr, airlied, alciregi, bskeggs, fmartine, hdegoede, hpa, jarodwilson, josef, kernel-maint, lgoncalv, linville, lkundrak, masami256, mchehab, mlewando, nfrayer, pgnet.dev, pjones, ptalbert, rharwood, steved
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
kernel log 6.3.4 none

Description customercare 2023-06-01 07:33:28 UTC
Created attachment 1968243 [details]
kernel log 6.3.4

1. Please describe the problem:

No display support while booting on nvidia cards.

LUKS password needs to be entered blindly.

The issue looks exactly like the one from January, where the needed drivers were missing in the kernel build. 

BUT: in contrast to January's 6.1.5 issue, the simpledrm driver is not blocked in the kernel options:

" rhgb quiet splash nouveau.modeset=0 rd.driver.blacklist=nouveau modprobe.blacklist=nouveau "

Where enabling simpledrm helped with 6.1.5, nvidia does not work properly on boot with 6.3.4.


2. What is the Version-Release number of the kernel:

6.3.4 

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

6.2.x works fine


6. Are you running any modules that are not shipped directly with Fedora's kernel?:

RPM Fusion's nvidia drivers

Comment 1 customercare 2023-06-01 07:35:56 UTC
Can someone add "leigh123linux" to the CC list? It does not work for me.

Comment 2 customercare 2023-06-01 07:43:51 UTC
HINT FOUND:


this is from the BLS config file:

options root=UUID=9d2595b2-a35c-48c1-a839-bb54c1a96597 ro vconsole.font=latarcyrheb-sun16 rd.luks.uuid=luks-ed009ed3-118c-465d-9b89-9b2a4f5cc3f3 rd.luks.uuid=luks-9d2595b2-a35c-48c1-a839-bb54c1a96597 rhgb quiet splash audit=0 nouveau.modeset=0 rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1 initcall_blacklist=simpledrm_platform_driver_init

simpledrm is blocked. BUT that argument is not passed by the grub defaults:

GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
#GRUB_TERMINAL_OUTPUT="console"
#GRUB_CMDLINE_LINUX="vconsole.font=latarcyrheb-sun16 rd.luks.uuid=luks-ed009ed3-118c-465d-9b89-9b2a4f5cc3f3 rd.luks.uuid=luks-9d2595b2-a35c-48c1-a839-bb54c1a96597 rhgb quiet splash audit=0 nouveau.modeset=0 rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1 initcall_blacklist=simpledrm_platform_driver_init"
GRUB_CMDLINE_LINUX="vconsole.font=latarcyrheb-sun16 rd.luks.uuid=luks-ed009ed3-118c-465d-9b89-9b2a4f5cc3f3 rd.luks.uuid=luks-9d2595b2-a35c-48c1-a839-bb54c1a96597 rhgb quiet splash audit=0 nouveau.modeset=0 rd.driver.blacklist=nouveau modprobe.blacklist=nouveau"
GRUB_DISABLE_RECOVERY="true"
GRUB_VIDEO_BACKEND=vbe
GRUB_FONT_PATH=/boot/grub2/fonts/unicode.pf2
GRUB_GFXMODE=0x11b
GRUB_GFXPAYLOAD_LINUX="keep"
GRUB_TERMINAL_OUTPUT="gfxterm"
#GRUB_GFXMODE="1440x900x32"
GRUB_ENABLE_BLSCFG=true

There is no file in /etc/ that adds this to the linux kernel line.
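
To see which BLS entries carry the argument, the entry files can be searched directly. This is only a minimal sketch: it assumes the entries live in /boot/loader/entries (the usual Fedora location, not confirmed for this machine), and the function name is made up for illustration.

```shell
#!/usr/bin/env bash
# List BLS entry files whose "options" line contains a given kernel argument.
# $1: directory holding the *.conf entries (assumed: /boot/loader/entries)
# $2: argument to look for, matched as a fixed string
entries_with_arg() {
    local dir=$1 arg=$2 f
    for f in "$dir"/*.conf; do
        [ -e "$f" ] || continue          # the glob matched nothing
        if grep -E '^options[[:space:]]' "$f" | grep -qF -- "$arg"; then
            printf '%s\n' "$f"
        fi
    done
}
```

Called as "entries_with_arg /boot/loader/entries initcall_blacklist=simpledrm_platform_driver_init", it prints one path per affected entry and nothing if no entry contains the argument.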

Comment 3 customercare 2023-06-01 07:45:52 UTC
That grub config has been untouched since the 6.1.5 kernel nvidia issue, and it worked. AND other PCs do not have this commented-out line at all and can't see the boot screen either.

Comment 4 customercare 2023-06-01 07:49:29 UTC
Rebooting confirmed: the BLS config caused the issue.

After manually removing the wrong options, the system boots again as it should.

Comment 5 customercare 2023-06-01 07:53:21 UTC
I checked whether grub had used the commented-out CMDLINE with the old arguments, but grub.cfg shows the correct, normal line:

# grep kernelopts= /boot/grub2/grub.cfg
set default_kernelopts="root=UUID=9d2595b2-a35c-48c1-a839-bb54c1a96597 ro rd.driver.blacklist=nouveau modprobe.blacklist=nouveau vconsole.font=latarcyrheb-sun16 rd.luks.uuid=luks-ed009ed3-118c-465d-9b89-9b2a4f5cc3f3 rd.luks.uuid=luks-9d2595b2-a35c-48c1-a839-bb54c1a96597 rhgb quiet splash audit=0 rd.driver.blacklist=nouveau nouveau.modeset=0 modprobe.blacklist=nouveau "


Question: Who created the BLS config files from the commented-out CMDLINE instead of the real one?

Comment 6 customercare 2023-06-01 07:59:11 UTC
Switched the component to grub2, as grub2-mkconfig produced clean BLS config files after removing the #GRUB_CMDLINE_LINUX= line from the default file.
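
Since the stale arguments only existed in a commented-out line, it may help to list every GRUB_CMDLINE_LINUX assignment in the defaults file, active or not. The following is a sketch with a hypothetical function name; it only reads the file and makes no claim about which tool picked up the commented-out line.

```shell
#!/usr/bin/env bash
# Print every GRUB_CMDLINE_LINUX line in a grub defaults file, marking
# commented-out ones so that stale copies are easy to spot.
# $1: path to the defaults file (default: /etc/default/grub)
list_cmdline_lines() {
    local file=${1:-/etc/default/grub} line
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            GRUB_CMDLINE_LINUX=*)    printf 'active:    %s\n' "$line" ;;
            \#*GRUB_CMDLINE_LINUX=*) printf 'commented: %s\n' "$line" ;;
        esac
    done < "$file"
}
```

Run without arguments it inspects /etc/default/grub; any "commented:" line that still contains initcall_blacklist=simpledrm_platform_driver_init is a candidate for removal.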

Comment 7 customercare 2023-06-01 08:00:01 UTC
grub2-common-2.06-94.fc37.noarch
grub2-efi-x64-2.06-94.fc37.x86_64
grub2-pc-2.06-94.fc37.x86_64
grub2-pc-modules-2.06-94.fc37.noarch
grub2-starfield-theme-2.02-0.43.fc26.x86_64
grub2-tools-2.06-94.fc37.x86_64
grub2-tools-efi-2.06-94.fc37.x86_64
grub2-tools-extra-2.06-94.fc37.x86_64
grub2-tools-minimal-2.06-94.fc37.x86_64

Comment 8 Marta Lewandowska 2023-06-02 10:18:07 UTC
Hi, 

If you
# grubby --update-kernel /path/to/kernel --remove-args "the args you want to remove"
does the machine consistently boot as it should?

if not, could you please show the output of the following:
# grubby --info ALL

and mention what steps you took when installing the new kernel? (just dnf update kernel ?)

Also, I understand from your comments that your /etc/default/grub has not been changed in a long time? Is that correct? Specifically, which GRUB_CMDLINE_LINUX entry is commented out has not changed?

Comment 9 customercare 2023-06-02 10:54:32 UTC
(In reply to Marta Lewandowska from comment #8)
> Hi, 
> 
> If you
> # grubby --update-kernel /path/to/kernel --remove-args "the args you want to
> remove"
> does the machine consistently boot as it should?

I can't answer this, as neither a BLS config file nor grubenv was changed when I executed the command with one of the older kernels.

It's a problem some of the Fedora systems I maintain have: you can't select a default kernel anymore via grubby. No clue why this isn't working anymore. Most systems just boot the newest kernel, aka the first entry in the list.


> and mention what steps you took when installing the new kernel? (just dnf
> update kernel ?)

"dnf update -y" is run every 2 hours via cron.
 
> Also, I understand from your comments that your /etc/default/grub has not
> been changed in a long time? Is that correct? Specifically which
> GRUB_CMDLINE_LINUX entry is commented has not changed?

In January there was the kernel 6.1.5 issue, where Justin forgot to add some old drivers to the kernel.
In that process, the default/grub file was changed a few days later: I commented the old kernel line out and added the new value, as you can see above.

After that, the default file was untouched. I made the change to test whether removing the simpledrm argument lets nvidia work with kernel 6.1.5, which was confirmed by a normal boot with that old kernel. Since then, default/grub has been unchanged.

Yesterday, I removed the old commented-out line in default/grub, recreated the config via grub2-mkconfig and, et voilà, it booted normally again.

I checked the kernel logs and found that by 26 May at the latest, the kernel command line had been tampered with and carried the old simpledrm block argument again.
Unfortunately for us, older boot logs are not available anymore.
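
The command line a given boot actually used can be pulled out of the kernel log, where the kernel prints it early as "Kernel command line: ...". A small sketch, assuming the log is fed in on stdin; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Print the kernel command line recorded in a kernel log read from stdin.
# Only the first matching line is reported.
cmdline_from_log() {
    sed -n 's/^.*Kernel command line: //p' | head -n 1
}
```

For example, "journalctl -k -b -1 | cmdline_from_log" would show the command line of the previous boot, provided the journal still holds that boot.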

Comment 10 Marta Lewandowska 2023-06-02 11:52:49 UTC
Ok, so you're saying you can't use grubby for anything anymore, but it used to work? This seems like a problem.

Is this / these system(s) UEFI or BIOS? If UEFI, could you please 
# cat /boot/efi/EFI/fedora/grub.cfg

Also, when you run grub2-mkconfig, how do you run it exactly? What's the output target?
# grub2-mkconfig -o [what's here?]

Comment 11 customercare 2023-06-02 13:08:28 UTC
Sorry, my mistake: I misread "update-kernel" as "default-kernel".

(In reply to Marta Lewandowska from comment #10)
> Ok, so you're saying you can't use grubby for anything anymore, but it used to work? This seems like a problem.

On my laptop I can't switch the boot kernel anymore with grubby; it sticks to index #2. But that's a different story, and I'll open a new bug for that. Grubby CAN switch the default kernel in general. I tested it a few minutes ago with F37 on my Surface tablet, and it worked as expected. The laptop issue must be caused by a very special constellation.

> Is this / these system(s) UEFI or BIOS? If UEFI, could you please 
> #cat /boot/efi/EFI/fedora/grub.cfg

It's a legacy BIOS boot.

/boot/efi/EFI/fedora/grub.cfg does not exist.

# ll /boot/grub2/
total 68
-rw-r--r--. 1 root root    84  7. Dec 2015  device.map
drwx------. 2 root root  4096 29. Apr 10:17 fonts
-rw-r--r--. 1 root root  4707 24. Jun 2021  grub.cfg
-rw-r--r--. 1 root root  6515  2. Apr 2017  grub.cfg_new
-rw-r--r--. 1 root root  6515  2. Apr 2017  grub.cfg_old
-rw-r--r--. 1 root root  5862 28. Nov 2019  grub.cfg.rpmsave
-rw-------. 1 root root  1024  2. Jun 08:59 grubenv
-rw-r--r--. 1 root root  1024 24. Jan 2018  grubenv.rpmsave
drwxr-xr-x. 2 root root 12288 28. Nov 2019  i386-pc
drwxr-xr-x. 2 root root  4096  2. Apr 2017  locale
drwxr-xr-x. 4 root root  4096  9. May 2012  themes

# ls -la /boot/efi/EFI/fedora/
total 6308
drwx------. 3 root root    4096 29. Apr 10:17 .
drwxr-xr-x. 4 root root    4096 21. Jul 2022  ..
-rwx------. 1 root root     110  7. Jul 2022  BOOTX64.CSV
drwx------. 2 root root    4096  8. Aug 2018  fw
-rwx------. 1 root root   65824  8. Aug 2018  fwupia32.efi
-rwx------. 1 root root   77496  8. Aug 2018  fwupx64.efi
-rw-------. 1 root root    1024 30. Nov 2021  grubenv
-rwx------. 1 root root 3530048 10. Apr 19:08 grubx64.efi
-rwx------. 1 root root  857248  7. Jul 2022  mmx64.efi
-rwx------. 1 root root  946712  7. Jul 2022  shim.efi
-rwx------. 1 root root  946712  7. Jul 2022  shimx64.efi

> 
> Also, when you run grub2-mkconfig, how do you run it exactly? What's the
> output target?
> # grub2-mkconfig -o [what's here?]

For the fix of the BLS config files, I only executed "grub2-mkconfig" without any arguments, so it printed the grub.cfg to stdout.

What the kernel install scripts do, I have no clue. The system in question was installed with Fedora 18 and has been upgraded via dnf ever since. As you can see above, there are some pretty old files in those directories, so it's possible that the upgrades did not end up with the same config files you would get from a fresh install.

Same for the mentioned laptop, which even started with Fedora 15 and now runs Fedora 38.

Comment 12 Marta Lewandowska 2023-06-02 18:45:25 UTC
Thanks for sharing the directory structure. That's actually really helpful.

If these are really Legacy BIOS, then your directories should look something like this:

[root@hp-dlg5-01 ~]# ls -l /boot/grub2
total 36
-rw-r--r--. 1 root root    64 Jun  2 13:26 device.map
drwxr-xr-x. 2 root root  4096 Jun  2 13:27 fonts
-rw-------. 1 root root  6441 Jun  2 13:28 grub.cfg
-rw-r--r--. 1 root root  1024 Jun  2 13:27 grubenv
drwxr-xr-x. 2 root root 12288 Jun  2 13:27 i386-pc
drwxr-xr-x. 2 root root  4096 Jun  2 13:26 locale

[root@hp-dlg5-01 ~]# ls -l /boot/efi/EFI/fedora/
total 0

You've had them for a while so you have some old stuff kicking around or maybe installed efi binaries that you don't need, but even if /boot/efi/EFI/fedora has some stuff in it for whatever reason, it shouldn't have a grubenv file. I think that might be the reason grubby is confused and you aren't able to use it properly.

The reason I'm focusing on grubby is because it's the tool you should be using to manipulate BLS entries. And when you install a new kernel, grub and grubby pass the arguments from your default kernel to the new one, so what happened to you (or what I understood anyway) shouldn't happen. If you have the kernel command line set correctly for your present kernel, you should end up with the same command line for the newly installed kernel. 

If you need to fix stuff for all the kernels, you can use commands like
# grubby --update-kernel ALL --args "args to add" --remove-args "args to remove"
or you can update kernels one at a time, as you see fit.

I hope this helps..?

Comment 13 customercare 2023-06-09 07:50:21 UTC
(In reply to Marta Lewandowska from comment #12)

> You've had them for a while so you have some old stuff kicking around or
> maybe installed efi binaries that you don't need, but even if
> /boot/efi/EFI/fedora has some stuff in it for whatever reason, it shouldn't
> have a grubenv file. I think that might be the reason grubby is confused and
> you aren't able to use it properly.


That file is not causing it: I just removed it, changed the kernel, and the chosen kernel got ignored on boot.

After the boot, the chosen kernel is reported as the default kernel, which it clearly isn't:

Last login: Fri Jun  9 09:39:34 2023 from 127.0.0.1
[root@eve ~]# grubby --default-kernel
/boot/vmlinuz-6.2.15-200.fc37.x86_64
[root@eve ~]# uname -a
Linux eve.xxxxxxxxxxxxxxxxxxx 6.3.5-100.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Tue May 30 15:43:51 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

[root@eve ~]# cat  /boot/grub2/grubenv
# GRUB Environment Block
# WARNING: Do not edit this file by tools other than grub-editenv!!!
saved_entry=7e390913b33b4e5ba8f960a9ba97aeee-6.2.15-200.fc37.x86_64
boot_success=1
[root@eve ~]#
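
The mismatch between the configured default and the running kernel can be checked in one step. A sketch with a hypothetical helper; it only compares strings and assumes the default path has the usual /boot/vmlinuz-<release> form:

```shell
#!/usr/bin/env bash
# Compare a default-kernel path (as printed by "grubby --default-kernel")
# with a running kernel release (as printed by "uname -r").
# Returns 0 when both refer to the same kernel, 1 otherwise.
default_matches_running() {
    local default_path=$1 running=$2
    # Strip everything up to and including ".../vmlinuz-" from the path.
    [ "${default_path##*/vmlinuz-}" = "$running" ]
}
```

Usage: default_matches_running "$(grubby --default-kernel)" "$(uname -r)" || echo "booted kernel differs from the configured default"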


Can this be fixed by wiping /boot/efi and /boot/grub2 and reinstalling grub2-pc & grub2-efi-x64?

Comment 14 Marta Lewandowska 2023-06-12 14:43:02 UTC
grub is a protected package (of course you can work around this but...) so removing it or just wiping files is not a great idea. But you can certainly remove (using yum/dnf) grub2-efi* since you don't need those packages on BIOS, and then you can reinstall grub2* and it should install only the packages you need.
Your directory structure should look like the one in comment #12, so you shouldn't have grubenv or grub.cfg in /boot/efi/EFI/fedora; there shouldn't really be anything in there. Maybe also check for a soft link from /etc/grub*cfg to /boot/grub2/grub.cfg.
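
The layout checks described here could be scripted roughly as follows. This is a sketch only: the function name and the optional root prefix are made up for illustration, and it reports findings without changing anything.

```shell
#!/usr/bin/env bash
# Report files that, per the advice above, should not exist on a legacy BIOS
# install, and show where any /etc/grub*cfg symlinks point.
# $1: optional root prefix (empty for the live system), so the check can be
#     run against a mounted image or a test directory.
check_bios_layout() {
    local root=${1:-} f status=0
    for f in grubenv grub.cfg; do
        if [ -e "$root/boot/efi/EFI/fedora/$f" ]; then
            printf 'unexpected: %s\n' "$root/boot/efi/EFI/fedora/$f"
            status=1
        fi
    done
    for f in "$root"/etc/grub*cfg; do
        if [ -L "$f" ]; then
            printf 'link: %s -> %s\n' "$f" "$(readlink "$f")"
        fi
    done
    return "$status"
}
```

A return status of 1 means a stray grubenv or grub.cfg was found under /boot/efi/EFI/fedora; the "link:" lines are informational.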