Bug 2071209

Summary: Laptop display is not turning on with simpledrm driver in kernel and Nvidia driver
Product: [Fedora] Fedora
Reporter: vtq <vtq-gnome>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED CANTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Docs Contact:
Priority: unspecified
Version: 36
CC: aannoaanno, acaringi, adscvr, airlied, alciregi, bskeggs, develop, fmartine, garrett.mitchener, hdegoede, hpa, jarodwilson, jglisse, jonathan, josef, kernel-maint, kparal, lgoncalv, linville, masami256, mchehab, negativo17, ptalbert, rgnoble, sstorey, steved
Target Milestone: ---
Keywords: CommonBugs
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard: https://ask.fedoraproject.org/t/common-issues/22440
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-05-16 14:56:33 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description | Flags
dmesg from kernel-5.17.1-300.fc36.x86_64 with rpmfusion driver package | none
dmesg from kernel-5.17.1-300.fc36.x86_64 but with efifb instead of simpledrm, rpmfusion driver package | none
dmesg from kernel-5.17.1-300.fc36.x86_64 but with efifb instead of simpledrm, nvidia .run driver | none
dmesg from kernel-5.17.1-300.fc36.x86_64 with nvidia .run driver | none
dmesg from kernel-5.17.1-300.fc36.x86_64 but with efifb instead of simpledrm, nvidia .run driver | none
dmesg efifb blacklisted | none

Description vtq 2022-04-02 08:00:59 UTC
Created attachment 1870090 [details]
dmesg from kernel-5.17.1-300.fc36.x86_64 with rpmfusion driver package

1. Please describe the problem:

On my laptop with a display driven by an Nvidia 3060 GPU with the latest Nvidia driver version 510.60.02, the display goes dark during the boot process and does not turn on again. There is no image and no backlight, although /sys/class/drm/card0-eDP-1/status is 'connected' and /sys/class/drm/card0-eDP-1/dpms is 'on'. Switching to a VT and back with Ctrl+Alt+Fx also does not turn on the display.

If an external display is connected via USB-C to the Nvidia GPU, the external display works normally. I can type in the password blindly in GDM and get to GNOME desktop. In GNOME Settings the laptop display is recognized and settings like refresh rate and arrangement can be changed but it remains off. 

This seems to be related to the recent kernel change that switched the framebuffer driver from efifb to simpledrm. However, this issue happens with the Nvidia driver installed via either the official .run installer or the RPM Fusion packages, which supposedly already include patches related to simpledrm like https://github.com/rpmfusion/nvidia-kmod/blob/master/nvidia-kmod-simpledrm.patch and https://github.com/rpmfusion/nvidia-kmod/blob/master/nvidia-kmod-pci-request-regions.patch. 

Simply appending initcall_blacklist=simpledrm_platform_driver_init to the kernel command line does not fix this issue. But I can revert to efifb by compiling the kernel with the following options:
CONFIG_FB_EFI=y
# CONFIG_SYSFB_SIMPLEFB is not set
and then the laptop display starts to work as expected. VT is working only if the Nvidia driver is from the official .run installer, but not with the RPM Fusion packages. I understand that it may well be Nvidia's responsibility to fix this on the driver side, but hope that the change to simpledrm could perhaps be held back until the driver is ready.
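
For anyone comparing their own machine against the symptoms above, the connector state can be collected with a small script. This is only a sketch: it assumes the usual sysfs layout (/sys/class/drm/cardN-CONNECTOR/status and .../dpms, as in the paths quoted above), and the function name connector_states is made up for illustration.

```python
from pathlib import Path


def connector_states(drm_root="/sys/class/drm"):
    """Return {connector_name: (status, dpms)} for each DRM connector found.

    Assumes connectors appear as directories named like 'card0-eDP-1'
    that contain 'status' and (usually) 'dpms' files.
    """
    states = {}
    for entry in sorted(Path(drm_root).glob("card*-*")):
        status_file = entry / "status"
        dpms_file = entry / "dpms"
        if not status_file.is_file():
            continue  # not a connector directory
        status = status_file.read_text().strip()
        dpms = dpms_file.read_text().strip() if dpms_file.is_file() else "unknown"
        states[entry.name] = (status, dpms)
    return states
```

Calling connector_states() and printing the result gives one line of evidence per connector; a panel that reports 'connected' and dpms on while staying dark matches the behaviour described in this report.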


2. What is the Version-Release number of the kernel:

kernel-5.17.1-300.fc36.x86_64


3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

It was working in Fedora 35 with latest kernel-5.16.18-200 and also works if I boot into this kernel in Fedora 36. The issue appeared after upgrading to Fedora 36 beta with kernel-5.17.0-0.rc7.116.


4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

This issue is reproducible, with both upgraded install and new install of Fedora 36 beta. Steps for a new install from 36 beta live image:
- Install Fedora from the live image to local disk, reboot and run dnf update. 
- Either enable rpmfusion repos and install akmod-nvidia from there, or download Nvidia official .run installer from their website and install it (this also requires dkms and libglvnd-devel) and run depmod.
- Append 'rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1' to GRUB_CMDLINE_LINUX in /etc/default/grub if it's not already there, run 'grub2-mkconfig -o /etc/grub2-efi.cfg' and then reboot.


5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

The Nvidia driver does not compile successfully with kernel-5.18.0-0.rc0.20220331git787af64d05cd.13.fc37.x86_64.


6. Are you running any modules that are not shipped directly with Fedora's kernel?:

Nvidia official display driver: nvidia nvidia-drm nvidia-modeset nvidia-uvm.


7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

See attached.

Comment 1 vtq 2022-04-02 08:06:02 UTC
Created attachment 1870091 [details]
dmesg from kernel-5.17.1-300.fc36.x86_64 but with efifb instead of simpledrm, rpmfusion driver package

Comment 2 vtq 2022-04-02 08:07:30 UTC
Created attachment 1870092 [details]
dmesg from kernel-5.17.1-300.fc36.x86_64 but with efifb instead of simpledrm, nvidia .run driver

Comment 3 vtq 2022-04-02 08:09:17 UTC
Created attachment 1870093 [details]
dmesg from kernel-5.17.1-300.fc36.x86_64 with nvidia .run driver

Comment 4 vtq 2022-04-02 08:17:10 UTC
Created attachment 1870095 [details]
dmesg from kernel-5.17.1-300.fc36.x86_64 but with efifb instead of simpledrm, nvidia .run driver

Comment 5 Javier Martinez Canillas 2022-04-04 11:21:01 UTC
(In reply to vtq from comment #0)
> Created attachment 1870090 [details]
> dmesg from kernel-5.17.1-300.fc36.x86_64 with rpmfusion driver package
> 

Yes, there are known issues with the Nvidia driver, which relies on the
efifb driver for at least VT support.

> 
> Simply appending initcall_blacklist=simpledrm_platform_driver_init to the

If not allowing the simpledrm driver to be initialized doesn't make it work
then the problem is not with this driver.

> kernel command line does not fix this issue. But I can revert to efifb by
> compiling the kernel with the following options:
> CONFIG_FB_EFI=y
> # CONFIG_SYSFB_SIMPLEFB is not set

What happens if you do this but boot with initcall_blacklist=efifb_driver_init ?

> and then the laptop display starts to work as expected. VT is working only
> if the Nvidia driver is from the official .run installer, but not with the
> RPM Fusion packages. I understand that it may well be Nvidia's

I believe this is because the RPM Fusion package has the patch that removes the conflicting
framebuffers and that causes the efifb fbdev to be unregistered when the Nvidia DRM
driver probes.

Comment 6 vtq 2022-04-05 08:18:52 UTC
Ok, it does look like this is not due to the use of simpledrm but rather missing efifb. If I use initcall_blacklist=efifb_driver_init for the kernel with efifb built-in, the laptop display is also not lighting up, same as the simpledrm case.

Comment 7 vtq 2022-04-05 08:20:28 UTC
Created attachment 1870816 [details]
dmesg efifb blacklisted

Comment 8 vtq 2022-05-12 00:48:50 UTC
The issue is still present on Fedora 36 final release, kernel 5.17.6, and latest beta driver 515.43.04 with the proprietary kernel modules. With the new open source kernel modules, GDM shows up on the laptop display (but not the external display) and everything will freeze soon after. So it is still necessary to either re-build the kernel with efifb or switch to iGPU (hybrid graphics) as a workaround for now.

Also saw another report of this issue: https://www.reddit.com/r/Fedora/comments/ueq4jm/internal_laptop_screen_black_in_discrete_graphics/

Although I don't know how many hardware setups are affected, given the severity of this issue, should it be documented in F36 Common issues at https://ask.fedoraproject.org/tags/c/common-issues/141/f36?

Comment 9 Steve Storey 2022-05-14 12:49:41 UTC
(In reply to Javier Martinez Canillas from comment #5)

> I believe this is because the RPM Fusion package has the patch that removes the

Thank you for the patch in the first place :)

> conflicting framebuffers and that causes the efifb fbdev to be unregistered
> when the Nvidia DRM driver probes.

I thought that the Fedora config changes meant that there *is* no efifb driver [1]
loaded in the kernel at all any more (so more that it's not present in the first
place as opposed to being removed by the patch ?).

If that is the case, what are the chances of getting FB_EFI re-enabled in the
stock kernel for such users ?

[1] - https://src.fedoraproject.org/rpms/kernel/blob/a6037821c00f3ad66c6f75f6de4d58b8f04f04d3/f/kernel-x86_64-fedora.config#_1843

Comment 10 Javier Martinez Canillas 2022-05-14 13:44:34 UTC
(In reply to Steve Storey from comment #9)
> (In reply to Javier Martinez Canillas from comment #5)
> 
> > I believe this is because the RPM Fusion package has the patch that removes the
> 
> Thank you for the patch in the first place :)
>

You are welcome.
 
> > conflicting framebuffers and that causes the efifb fbdev to be unregistered
> > when the Nvidia DRM driver probes.
> 
> I thought that the Fedora config changes meant that there *is* no efifb
> driver [1]
> loaded in the kernel at all any more (so more that it's not present in the
> first
> place as opposed to being removed by the patch ?).
>

Yes, but the user mentioned in the description that they built a custom kernel with CONFIG_FB_EFI=y
and that it only worked with the official driver, not the one from rpmfusion. I explained that
CONFIG_FB_EFI with the rpmfusion build won't work because that patch causes the efifb driver
to be unregistered when the Nvidia DRM driver probes.

In other words, the Nvidia DRM driver can no longer rely on using the efifb framebuffer to
bind to the framebuffer console and have VT support.
 
> If that is the case, what are the chances of getting FB_EFI re-enabled in the
> stock kernel for such users ?
>

Not likely. I understand that this is not ideal for Nvidia users, but we can't hold back changes
that improve the support for all other graphics devices just because of bugs in the Nvidia driver
that they are not fixing.

Comment 11 Javier Martinez Canillas 2022-05-16 14:56:33 UTC
This is a known issue with the Nvidia driver and there's nothing we can do in the Fedora kernel:

https://ask.fedoraproject.org/t/no-virtual-terminal-vt-with-the-nvidia-driver/22440

Comment 12 Christian 2022-05-16 22:06:13 UTC
Why is this closed as CANTFIX?

It could easily be fixed by re-enabling efifb in the kernel for people who want or need to use that module.
As far as I know, having simplefb compiled in doesn't mean that efifb couldn't be built in as well; other distributions manage to do exactly that.

That sounds to me rather like a "we could but don't want to fix", and there I'd like to hear the reasoning, since this obviously affects all fedora users with a rather common hardware setup.

Comment 13 Javier Martinez Canillas 2022-05-17 10:16:17 UTC
(In reply to Christian from comment #12)
> Why is this closed as CANTFIX?
> 
> It could easily be fixed by re-enabling efifb in the kernel for people who
> want or need to use that module.
> As far as I know, having simplefb compiled in doesn't mean that efifb couldn't
> be built in as well; other distributions manage to do exactly that.
> 

It will soon break for them, since there are patches posted to remove all the
platform devices that bind to {simple,of,efi}fb drivers once a real driver is
probed. That is, the "efi-framebuffer" platform device that matches the efifb
fbdev driver will be unregistered once a DRM driver probes, making loading the
efifb module a no-op.

> That sounds to me rather like a "we could but don't want to fix", and there
> I'd like to hear the reasoning, since this obviously affects all fedora
> users with a rather common hardware setup.

We could build efifb as a module and paper over the issue, and things may
work until they break again and users complain. Sorry, but the truth is that
the Nvidia proprietary driver was relying on the efifb driver, and things used
to work only due to sheer luck.

If the Nvidia driver wants to have VT and the framebuffer console bind to
it, it needs to register an emulated fbdev device. Anything else just works by
coincidence.

Comment 14 Javier Martinez Canillas 2022-05-17 10:39:46 UTC
But you are correct that this is an inconvenience for Nvidia users, we will investigate if we can implement a workaround for this.

Comment 15 Christian 2022-05-17 11:41:50 UTC
I assume that at least for users of newer GPUs, the new FOSS driver might resolve this as I expect it to integrate more with kernel interfaces such as the framebuffer. 
However, this is long term and only for newer products, so short term, at least for the Fedora 36 and if possible 37 lifecycle, it would be great to have a workaround. 

I assume that leaving efifb built in (if possible; a module always requires shoving it into the initrd) and setting a kernel parameter (which nvidia packages, e.g. from rpmfusion, can and I think already do anyway) to use it should work.

Thanks for looking into it.

Comment 16 Steve Storey 2022-05-17 11:54:32 UTC
(In reply to Javier Martinez Canillas from comment #13)
> 
> If the Nvidia driver wants to have VT and the framebuffer console bind to
> it, it needs to register an emulated fbdev device. Anything else just works by
> coincidence.

Is the patch you made in Feb enough to register the emulated fb dev device? [1]

> It will soon break for them, since there are patches posted to remove all the
> platform devices that bind to {simple,of,efi}fb drivers once a real driver is
> probed. That is, the "efi-framebuffer" platform device that matches the efifb
> fbdev driver will be unregistered once a DRM driver probes, making loading
> the efifb module a no-op.

If the answer to the previous question is 'yes' - then is that still enough once
these patches to remove the platform devices get released ?

[1] - https://github.com/negativo17/nvidia-driver/issues/129#issuecomment-1126971188 - sorry, don't know where the canonical version of the patch is

Incidentally, the ticket above ^^ suggests that not _everyone_ apparently has
problems, even with VTs, but so far, it's not clear what determines whether
you do or don't have problems. Any idea why it might work for some and not
others?

Comment 17 Javier Martinez Canillas 2022-05-17 12:12:24 UTC
(In reply to Christian from comment #15)
> I assume that at least for users of newer GPUs, the new FOSS driver might
> resolve this as I expect it to integrate more with kernel interfaces such as
> the framebuffer. 
> However, this is long term and only for newer products, so short term, at
> least for the Fedora 36 and if possible 37 lifecycle, it would be great to
> have a workaround. 
> 
> I assume that leaving efifb built in (if possible; a module always
> requires shoving it into the initrd) and setting a kernel parameter (which
> nvidia packages, e.g. from rpmfusion, can and I think already do anyway) to
> use it should work.
>

Yes, that's what I was thinking too. Having both simpledrm and efifb built-in
and a kernel cmdline param to decide which one to use. It's an ugly workaround
but something like that may be needed in the meantime...
 
> Thanks for looking into it.

You are welcome. It's not that we want to make the life of users harder; it's just
that using simpledrm solves other issues (i.e. having a DRI interface and GNOME
Wayland sessions with nomodeset, etc.). So reverting the change would break that
for users whose DRM drivers are doing the correct thing, which is not fair either.

Comment 18 Javier Martinez Canillas 2022-05-17 12:18:54 UTC
(In reply to Steve Storey from comment #16)
> (In reply to Javier Martinez Canillas from comment #13)
> > 
> > If the Nvidia driver wants to have VT and the framebuffer console bind to
> > it, it needs to register an emulated fbdev device. Anything else just works by
> > coincidence.
> 
> Is the patch you made in Feb enough to register the emulated fb dev device?
> [1]
>

It's part of the solution but unfortunately not enough. There are more changes
needed in the proprietary driver so only Nvidia is able to fix that...
 
> > It will soon break for them, since there are patches posted to remove all the
> > platform devices that bind to {simple,of,efi}fb drivers once a real driver is
> > probed. That is, the "efi-framebuffer" platform device that matches the efifb
> > fbdev driver will be unregistered once a DRM driver probes, making loading
> > the efifb module a no-op.
> 
> If the answer to the previous question is 'yes' - then is that still enough
> once
> these patches to remove the platform devices get released ?
> 
> [1] -
> https://github.com/negativo17/nvidia-driver/issues/129#issuecomment-
> 1126971188 - sorry, don't know where the canonical version of the patch is
> 
> Incidentally, the ticket above ^^ suggests that not _everyone_ apparently has
> problems, even with VTs, but so far, it's not clear what determines whether
> you do or don't have problems. Any idea why it might work for some and not
> others?

I'm not that familiar with Nvidia cards to understand why not everyone is having
the issue. But yes, it seems that there are some Nvidia cards that work correctly.

Comment 19 Christian 2022-05-17 13:17:44 UTC
I can gladly do some tests. What I can already say is that for me VTs don't work with a F36 kernel config on an RTX3070 (thus a newer model, Ampere generation), but I do set a custom resolution by setting it in grub and using the keepvideo parameter. What I could try, once I am home, is whether e.g. not setting a custom resolution changes anything.

My best guess, however, would be that it depends on the GPU generation. 

What also could be interesting is comparing a vanilla installed nvidia module and one installed via rpmfusion rpms, since the latter do apply a patch that modifies framebuffer behaviour.

Short term, however, I doubt there is a quick and guaranteed-to-work solution other than keeping the removed fb modules compiled in kernel and switching to them via a kernel command line.

Comment 20 Hans de Goede 2022-05-17 13:20:27 UTC
> I'm not that familiar with Nvidia cards to understand why not everyone is having
> the issue. But yes, it seems that there are some Nvidia cards that work correctly.

Could it be that this is not card-model related. But rather related to people booting in BIOS mode (and thus getting a vgacon which likely sticks around) vs booting in EFI mode and thus having the efifb issue ?
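
To help reporters say which of the two cases they are in, the boot mode can be checked from userspace. A minimal sketch, relying on the fact that the kernel exposes /sys/firmware/efi only when it was booted via UEFI; the function name and the sysfs_root parameter are just illustrative.

```python
import os


def booted_via_uefi(sysfs_root="/sys"):
    """Return True if <sysfs_root>/firmware/efi exists, i.e. a UEFI boot.

    On a legacy BIOS (or CSM) boot the directory is absent.
    """
    return os.path.isdir(os.path.join(sysfs_root, "firmware", "efi"))
```

Reporting the result of booted_via_uefi() next to the GPU model would make it easier to tell whether the BIOS/EFI split explains who is affected.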

Comment 21 Javier Martinez Canillas 2022-05-17 13:41:32 UTC
(In reply to Hans de Goede from comment #20)
> > I'm not that familiar with Nvidia cards to understand why not everyone is having
> the issue. But yes, it seems that there are some Nvidia cards that work
> correctly.
> 
> Could it be that this is not card-model related. But rather related to
> people booting in BIOS mode (and thus getting a vgacon which likely sticks
> around) vs booting in EFI mode and thus having the efifb issue ?

That indeed sounds quite plausible. Thanks for pointing this out, Hans.

Comment 22 Steve Storey 2022-05-17 13:56:56 UTC
(In reply to Hans de Goede from comment #20)
> Could it be that this is not card-model related. But rather related to
> people booting in BIOS mode (and thus getting a vgacon which likely sticks
> around) vs booting in EFI mode and thus having the efifb issue ?

On F35, I was using BIOS boot, and with updates-testing repo / stabilization
repo, experienced the issue with my RTX 2060 around Feb when the 5.17 kernel
was being prepared for stable, where the simpledrm change had been made there
as a way to test it in preparation for F36 [1], as did others (tho I cannot
now find the discussion thread for this :( ).

With F36 I have also moved to UEFI (in place manual change to add ESP and change
boot settings), and I certainly get the same issue booting to sddm.

Now I'm thinking about it however, I think I might have had access to the
VTs on f35, but definitely don't on F36.

[1] https://src.fedoraproject.org/rpms/kernel/c/8db3d1f47fd8f9dfa6c83e5e6c20dde1109899cf?branch=stabilization is where
    the change was initially applied for testing, it was later rolled back somewhere when moving to the stable f35 branch

Comment 23 Christian 2022-05-17 14:41:05 UTC
with regards to UEFI versus BIOS: 
the machine I can reproduce it with was always booted in UEFI mode, and if I boot with a stock Fedora 35 kernel under Fedora 36 VTs work, 
if I boot with a stock Fedora 36 kernel  (same boot parameters, same UEFI config, same resolution) they do not. 

Kind regards, 

Christian

Comment 24 Simone Caronni 2022-05-17 14:54:28 UTC
(In reply to Javier Martinez Canillas from comment #21)
> (In reply to Hans de Goede from comment #20)
> > > I'm not that familiar with Nvidia cards to understand why not everyone is having
> > the issue. But yes, it seems that there are some Nvidia cards that work
> > correctly.
> > 
> > Could it be that this is not card-model related. But rather related to
> > people booting in BIOS mode (and thus getting a vgacon which likely sticks
> > around) vs booting in EFI mode and thus having the efifb issue ?
> 
> That indeed sounds quite plausible. Thanks for pointing this out, Hans.

We have at work some non-Ampere GTX cards and some Optimus laptops with various generations of Quadro cards, the latter without connected outputs on the Nvidia GPU. All of them boot with UEFI, no CSM compatibility.

The computers with the discrete card work "fine", I just see a non-usable second screen of 1024x768 connected to nowhere due to the simpledrm driver, but apart from that everything works fine: tty switching, console, wayland/X, etc. With the various patches or the kernel boot parameter added, I see only one screen but I have all the other issues mentioned (lockups on tty switches or blank screen, etc).

On the Optimus laptops, the Intel card + simpledrm are working perfectly fine and the nvidia card is only seldom used; there is absolutely no issue, as the nvidia card is not used to drive the main display and does not need to restore the screen on VT switches. Adding the patches or the kernel boot parameter of course now screws up everything.

Nothing particularly blocking on the desk side workstations and no issue at all on the laptops.

Comment 25 Javier Martinez Canillas 2022-05-19 18:37:15 UTC
I've proposed the following workaround https://gitlab.com/cki-project/kernel-ark/-/merge_requests/1788.

TL; DR the kernel will choose to either register a platform device that binds to simpledrm or {efi,vesa}fb depending on whether nvidia-drm.modeset=1 is set or not.
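
The selection described above can be modelled in a few lines for reasoning about a given boot configuration. This is an illustrative sketch only, not the code from the merge request: the function names are invented, only the EFI case is modelled (vesa is ignored), quoted values containing spaces are not handled, and treating '-' and '_' as interchangeable in the parameter name follows the general kernel convention and is assumed, not confirmed, for this workaround.

```python
def param_from_cmdline(cmdline, name):
    """Return the value of `name` on a kernel command line, or None if absent.

    '-' and '_' in parameter names are normalised before comparing.
    If the parameter appears more than once, the last value is returned
    (a simplification chosen for this sketch).
    """
    wanted = name.replace("-", "_")
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if sep and key.replace("-", "_") == wanted:
            value = val
    return value


def expected_platform_device(cmdline):
    """Model of the workaround: which platform device would be registered."""
    if param_from_cmdline(cmdline, "nvidia-drm.modeset") == "1":
        return "efi-framebuffer"    # binds to efifb
    return "simple-framebuffer"     # binds to simpledrm
```

Passing the contents of /proc/cmdline to expected_platform_device() gives the outcome this model predicts for the running system.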

Comment 26 Steve Storey 2022-05-20 08:05:35 UTC
(In reply to Javier Martinez Canillas from comment #25)
> I've proposed the following workaround
> https://gitlab.com/cki-project/kernel-ark/-/merge_requests/1788.
> 
> TL; DR the kernel will choose to either register a platform device that
> binds to simpledrm or {efi,vesa}fb depending on whether nvidia-drm.modeset=1
> is set or not.

That's great, thank you! I see that the MR got merged - is there a convenient way
for me to test it locally? Will it be automatically included into the next kernel
update on koji ?

Comment 27 Javier Martinez Canillas 2022-05-20 08:21:17 UTC
(In reply to Steve Storey from comment #26)
> (In reply to Javier Martinez Canillas from comment #25)
> > I've proposed the following workaround
> > https://gitlab.com/cki-project/kernel-ark/-/merge_requests/1788.
> > 
> > TL; DR the kernel will choose to either register a platform device that
> > binds to simpledrm or {efi,vesa}fb depending on whether nvidia-drm.modeset=1
> > is set or not.
> 
> That's great, thank you! I see that the MR got merged - is there a
> convenient way
> for me to test it locally? Will it be automatically included into the next
> kernel
> update on koji ?

It will be included in the next Fedora 5.17.x build, yes.

Comment 28 Christian 2022-05-22 17:27:41 UTC
Hey, 

thanks a lot for the fix. Could you give a quick ping here when a kernel with the patch in is in bodhi? 

fuchs@deskfox ~ % grep FB_EFI /boot/config-5.17.9-300.fc36.x86_64 
# CONFIG_FB_EFI is not set

it seems it didn't make it into that one in time, and I'll gladly test and give feedback once there is one. 
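
For checking several kernels at once, the same test as the grep above can be scripted. A rough sketch: the helper names are made up, and the list of options is my own choice of what seems relevant to this bug (CONFIG_FB_EFI and CONFIG_SYSFB_SIMPLEFB are mentioned in this report; CONFIG_DRM_SIMPLEDRM is the option for the simpledrm driver itself).

```python
import re


def kconfig_value(config_text, option):
    """Return the value of a kernel config option ('y', 'm', ...), or None.

    Lines such as '# CONFIG_FB_EFI is not set' do not match and yield None.
    """
    pattern = re.compile(rf"^{re.escape(option)}=(.*)$", re.MULTILINE)
    match = pattern.search(config_text)
    return match.group(1) if match else None


def framebuffer_summary(config_text):
    """Collect the framebuffer-related options discussed in this bug."""
    options = ("CONFIG_FB_EFI", "CONFIG_SYSFB_SIMPLEFB", "CONFIG_DRM_SIMPLEDRM")
    return {opt: kconfig_value(config_text, opt) for opt in options}
```

Feeding it the text of a file such as /boot/config-5.17.9-300.fc36.x86_64 shows at a glance whether efifb is available in that build.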

Thanks in advance and kind regards!

Comment 29 Simone Caronni 2022-05-23 08:33:27 UTC
(In reply to Javier Martinez Canillas from comment #25)
> I've proposed the following workaround
> https://gitlab.com/cki-project/kernel-ark/-/merge_requests/1788.
> 
> TL; DR the kernel will choose to either register a platform device that
> binds to simpledrm or {efi,vesa}fb depending on whether nvidia-drm.modeset=1
> is set or not.

Should it have any impact on Optimus laptops? We're rolling out Fedora 36 everywhere at work and it works just fine with intel/simpledrm and nvidia does not take care of driving fbcon consoles.

Comment 30 Kamil Páral 2022-05-23 11:21:44 UTC
(In reply to Javier Martinez Canillas from comment #25)
> TL; DR the kernel will choose to either register a platform device that
> binds to simpledrm or {efi,vesa}fb depending on whether nvidia-drm.modeset=1
> is set or not.

Javier, do I understand correctly that nvidia-drm.modeset=1 has to be specified by the user manually? Or is it set internally by the nvidia binary driver?

Comment 31 Javier Martinez Canillas 2022-05-23 11:43:24 UTC
(In reply to Kamil Páral from comment #30)
> (In reply to Javier Martinez Canillas from comment #25)
> > TL; DR the kernel will choose to either register a platform device that
> > binds to simpledrm or {efi,vesa}fb depending on whether nvidia-drm.modeset=1
> > is set or not.
> 
> > Javier, do I understand correctly that nvidia-drm.modeset=1 has to be
> > specified by the user manually? Or is it set internally by the nvidia binary
> > driver?

It has to be manually set. But I thought that's already the case to use the
proprietary driver instead of Nouveau ? That's why I chose that one.

Although I don't have experience with Nvidia nor machines with an Nvidia GPU
to do any testing...

Comment 32 Christian 2022-05-23 11:50:05 UTC
(In reply to Javier Martinez Canillas from comment #31)
> (In reply to Kamil Páral from comment #30)
> > (In reply to Javier Martinez Canillas from comment #25)
> > > TL; DR the kernel will choose to either register a platform device that
> > > binds to simpledrm or {efi,vesa}fb depending on whether nvidia-drm.modeset=1
> > > is set or not.
> > 
> > > Javier, do I understand correctly that nvidia-drm.modeset=1 has to be
> > > specified by the user manually? Or is it set internally by the nvidia binary
> > > driver?
> 
> It has to be manually set. But I thought that's already the case to use the
> proprietary driver instead of Nouveau ? That's why I chose that one.
> 
> Although I don't have experience with Nvidia nor machines with an Nvidia GPU
> to do any testing...

For me it is set, but I don't remember if I have set it manually or if it was set by the driver (from rpmfusion). 

It's a sane option to take, since you set it when you want the nvidia driver's DRM part to take care of framebuffer modesetting.
If you have an optimus setup where the intel card does the actual connection to the monitors, you usually neither want nor set that.

I have an optimus laptop at hand, however, it's a special case (a thinkpad, where you can switch between optimus, discrete only or discrete off) and I can gladly test with that one, once the kernel is in bodhi.
I'm afraid I don't have any optimus devices at hand where actual stuff is done by the intel card and the heavy lifting offloaded to nvidia.

Comment 33 Simone Caronni 2022-05-23 13:00:53 UTC
I can test on the laptops here at work, can we get a heads up here once it's in Bodhi? Theoretically for Optimus laptops not much should change as well, as the fbcon on simpledrm is replaced by the fbcon on efifb?...

(In reply to Christian from comment #32)
> I have an optimus laptop at hand, however, it's a special case (a thinkpad,
> where you can switch between optimus, discrete only or discrete off)

May I ask you which model is that? Never seen one that allows you to do three, normally just optimus or discrete only.

Comment 34 Christian 2022-05-23 13:10:10 UTC
(In reply to Simone Caronni from comment #33)

> May I ask you which model is that? Never seen one that allows you to do
> three, normally just optimus or discrete only.

I might misremember the third option then, it's an older model, T430. I can check later on when not working.

Comment 35 Simone Caronni 2022-05-24 06:54:33 UTC
(In reply to Javier Martinez Canillas from comment #25)
> I've proposed the following workaround
> https://gitlab.com/cki-project/kernel-ark/-/merge_requests/1788.
> 
> TL; DR the kernel will choose to either register a platform device that
> binds to simpledrm or {efi,vesa}fb depending on whether nvidia-drm.modeset=1
> is set or not.

Does https://gitlab.com/cki-project/kernel-ark/-/merge_requests/1788/diffs#39748a97d7b231ca66cdf8a0b6292952ce8c2350_37_51 work only when the modeset parameter is passed on the kernel command line?

The module is not in the initrd, so the normal way to assign it to the module is at module load time, i.e. /etc/modprobe.d/something.conf:

options nvidia-drm modeset=1

Thanks.

Comment 36 Javier Martinez Canillas 2022-05-24 07:16:48 UTC
(In reply to Simone Caronni from comment #35)
> (In reply to Javier Martinez Canillas from comment #25)
> > I've proposed the following workaround
> > https://gitlab.com/cki-project/kernel-ark/-/merge_requests/1788.
> > 
> > TL; DR the kernel will choose to either register a platform device that
> > binds to simpledrm or {efi,vesa}fb depending on whether nvidia-drm.modeset=1
> > is set or not.
> 
> Does
> https://gitlab.com/cki-project/kernel-ark/-/merge_requests/1788/
> diffs#39748a97d7b231ca66cdf8a0b6292952ce8c2350_37_51 work only when the
> modeset parameter is passed on the kernel command line?
> 

Yes, it only works when it is on the kernel command line. Otherwise it is not
an early parameter and can't be used by drivers/firmware/sysfb.c (which is
built-in) to figure out whether it has to register an "efi-framebuffer" or
"simple-framebuffer" platform device.

Comment 37 Simone Caronni 2022-05-25 06:56:05 UTC
I don't think there is an easy solution until Nvidia provides a proper fix, as one change breaks something else.

The parameter should not be enforced on the kernel command line by the packages as it must not be present (i.e. set at 0) also for Mosaic and SLI setups, where modeset is not supported. So the user needs an option to disable it and it must not reappear on the system if the user does not want to.

Nvidia is shipping the driver with modeset disabled (https://github.com/NVIDIA/yum-packaging-nvidia-kmod-common/blob/main/nvidia.conf#L23); in our setup it is set in the configuration file and not on the kernel command line.

I would be in favour of shipping no workaround at all.

Let's see how it goes :/

Comment 38 Steve Storey 2022-05-25 07:27:48 UTC
(In reply to Simone Caronni from comment #37)
> I don't think there is an easy solution until Nvidia provides a proper fix,
> as one change breaks something else.

I don't think that this change on its own would break something else? Given that
your kmod etc. won't be applying the setting on the kernel command line, then
shipping this change in the kernel won't change anything for you?

> The parameter should not be enforced on the kernel command line by the
> packages as it must not be present (i.e. set at 0) also for Mosaic and SLI
> setups, where modeset is not supported. So the user needs an option to
> disable it and it must not reappear on the system if the user does not want
> to.

Agree (at least until we know which group is in the majority). This can be an
opt-IN workaround for people like me who have issues.

> I would be in favour of shipping no workaround at all.

That would leave me, and all the others like me with a completely broken system -
no virtual terminals, no graphical UI - unless either we rebuild the kmod to
include Javier's patches on top (which still won't give VTs), or rebuild the
kernel either disabling simplefb outright, or applying this patch.

At least with this patch - nothing changes by default for anyone, but people
have the option to be able to apply a oneline config change to fix both graphical
setup and VTs, all with the stock kernel.

I am therefore very much in favour of shipping it :)

I do wonder whether just having CONFIG_FB_EFI = y/m enabled and then being able
to use the CLI argument to just disable the simplefb init (I can't now find the
foo for that) would be enough actually? That then might also allow people with
Mosaic / SLI setups to work as well (that doesn't stop this workaround from
shipping as-is - it would just be a different config parameter for them to apply
to the kernel command line)

Comment 39 Christian 2022-05-25 14:03:00 UTC
(In reply to Steve Storey from comment #38)
> (In reply to Simone Caronni from comment #37)

> > I would be in favour of shipping no workaround at all.
> 
> That would leave me, and all the others like me with a completely broken
> system -
> no virtual terminals, no graphical UI - unless either we rebuild the kmod to
> include Javier's patches on top (which still won't give VTs), or rebuild the
> kernel either disabling simplefb outright, or applying this patch.
> 
> At least with this patch - nothing changes by default for anyone, but people
> have the option to be able to apply a oneline config change to fix both
> graphical
> setup and VTs, all with the stock kernel.
> 
> I am therefore very much in favour of shipping it :)

Fully agree here. 

If that kernel parameter is problematic because it might be set by people who don't want it, changing it to something that needs to be set explicitly and manually is perfectly fine with me. That is still better than the alternatives: booting F36 with an old F35 kernel, recompiling the F36 kernel with a changed configuration, or having no VTs available. (The manuals for rebuilding the kernel seem to be a bit outdated: I am used to compiling kernels on my Gentoo boxes, but with Fedora's official, non-vanilla kernels I haven't managed it yet, at least not with the commands found in the docs.)

Kind regards, 

Christian

Comment 40 Steve Storey 2022-05-26 07:59:48 UTC
https://bodhi.fedoraproject.org/updates/FEDORA-2022-8095b23575 looks to be working great for me (and I've added +ive karma there) - thanks!

Comment 41 Christian 2022-05-26 12:50:53 UTC
Heya, 

thank you very much, https://bodhi.fedoraproject.org/updates/FEDORA-2022-8095b23575 works if I use the vanilla nvidia driver. I finally have a VT again, and even at full resolution.
It does _not_ work with the rpmfusion driver, but that is most likely due to the custom patch they apply, so I guess we should now report a bug there asking them to remove it for kernels >= the one with the fix.

Thank you very much again, I hope that nvidia will soon fix that on their end, so that no workarounds or patches are needed on either side. 

Kind regards, 

Christian

Comment 42 Christian 2022-05-26 13:06:09 UTC
PS: https://bugzilla.rpmfusion.org/show_bug.cgi?id=6313  I created a bug report at rpmfusion asking them to remove that workaround.

PSPS: removing the rpmfusion driver did actually remove rd.driver.blacklist=nouveau and nvidia-drm.modeset=1 from my kernel command line, so my guess is that their package sets them. I had to re-add them by hand after installing the driver manually.
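
For anyone who wants to verify whether those two parameters are still on the running kernel's command line after swapping drivers, a small Python sketch follows. The helper name missing_cmdline_params is hypothetical; the only system interface used is /proc/cmdline, and the check is a plain exact-token match, which is an assumption and may not cover every spelling the kernel accepts.

```python
from pathlib import Path

# The two parameters mentioned above.
WANTED = ["rd.driver.blacklist=nouveau", "nvidia-drm.modeset=1"]


def missing_cmdline_params(cmdline: str, wanted: list[str]) -> list[str]:
    """Return the entries of `wanted` that are not tokens of `cmdline`."""
    present = set(cmdline.split())
    return [param for param in wanted if param not in present]


proc_cmdline = Path("/proc/cmdline")
if proc_cmdline.exists():
    # Read the command line the running kernel was booted with.
    missing = missing_cmdline_params(proc_cmdline.read_text(), WANTED)
    if missing:
        print("missing from kernel command line:", " ".join(missing))
    else:
        print("all expected parameters are present")
```

If something is reported as missing, it has to be added back through the bootloader configuration; this sketch only reads the current state and changes nothing.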