Bug 1643302 - locked out: switching ttys and login broken with NVIDIA driver
Summary: locked out: switching ttys and login broken with NVIDIA driver
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: gdm
Version: 29
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Ray Strode [halfline]
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Keywords:
Depends On:
Blocks:
Reported: 2018-10-25 21:08 UTC by Fabio Valentini
Modified: 2018-11-08 07:48 UTC
CC List: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-11-08 03:16:13 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Description Fabio Valentini 2018-10-25 21:08:54 UTC
After upgrading a working Fedora Workstation 28 system to Fedora 29, I can no longer log into a GNOME session from gdm. Switching ttys doesn't work either. (Maybe those two things are related?)

- Pressing Ctrl-Alt-Fx key combinations has no effect.
- Logging in as my user results in a black screen; it looks like gdm is killed and nothing else happens.

The problem is that I can't really get debug logs from the system: I'm completely locked out as soon as I try to log in and trigger the error. After hard-resetting the system, there isn't necessarily anything useful in the journal, and I don't want to torture my PC needlessly by repeatedly force-resetting it.


The only useful log messages I could salvage from the lockup are these, logged while the X server for the user session was being started:

(EE) NVIDIA(GPU-0): Failed to acquire modesetting permission.
(EE) NVIDIA(0): Failing initialization of X screen 0
(II) UnloadModule: "nvidia"
(II) UnloadSubModule: "glxserver_nvidia"
(II) Unloading glxserver_nvidia
(II) UnloadSubModule: "wfb"
(II) UnloadSubModule: "fb"
(EE) Screen(s) found, but none have a usable configuration.
(EE)
Fatal server error:
(EE) no screens found(EE)

(Note the contradictory error messages: "Screen(s) found" vs. "no screens found".)
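Since the lockup rules out interactive debugging, errors like the ones above usually have to be dug out of a log after a reset. The following is a minimal sketch, not an official tool: `xorg_errors` is a hypothetical helper that reads a log on standard input and keeps only the lines tagged `(EE)` plus the "Fatal server error" line. It assumes a persistent journal (so that `journalctl -b -1` can show the previous boot); the rootless Xorg log path in the comment is also an assumption and varies by setup.

```shell
#!/bin/sh
# Hypothetical helper: print the Xorg error lines from a log read on stdin.
# Keeps lines tagged "(EE)" and the "Fatal server error" line, but drops
# lines consisting of nothing but the "(EE)" tag (as in the plain Xorg log).
xorg_errors() {
    grep -E '\(EE\)|Fatal server error' |
        grep -Ev '^[[:space:]]*\(EE\)[[:space:]]*$'
}

# Possible inputs (paths and journal persistence are assumptions):
#   journalctl -b -1 | xorg_errors                # journal of the previous boot
#   xorg_errors < ~/.local/share/xorg/Xorg.0.log  # rootless Xorg log
```

Applied to the excerpt above, it would keep the two NVIDIA errors, the "Screen(s) found" line, "Fatal server error:" and "no screens found", and drop the `(II)` module-unloading noise.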


Version-Release number of selected component (if applicable):
gdm-3.30.1-2.fc29
gnome-shell-3.30.1-1.fc29
mutter-3.30.1-1.fc29
nvidia-driver-410.66-2.fc29.x86_64
xorg-x11-xserver-Xorg-1.20.1-4.fc29

These package versions are from my first attempt to upgrade to Fedora 29. I attempted the upgrade again with the packages currently available from f29/updates-testing, and the same error still occurs.


How reproducible:
Always.


Steps to Reproduce:
1. Install (or upgrade to) Fedora 29.
2. Install the NVIDIA drivers (from the negativo17 repo).
3. Reboot.
4. Find yourself locked out of your system by the broken gdm / GNOME session, unable to switch to a tty for recovery and debugging.


Actual results:
I can't use Fedora 29.


Expected results:
After upgrading from Fedora 28 to 29, the system should continue to work.


Additional info:
The NVIDIA drivers aren't to blame here: I have the exact same driver version installed on Fedora 28, and it works flawlessly there.

- Enabling or disabling modesetting for the nvidia kernel module on the kernel command line changed nothing.

- Explicitly disabling Wayland for gdm in /etc/gdm/custom.conf changed nothing (gdm started in X mode anyway; I could see that in the recovered system logs).
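Disabling Wayland for gdm is normally done with `WaylandEnable=false` in the `[daemon]` section of /etc/gdm/custom.conf. A typo or a line left commented out there silently keeps Wayland enabled, so a quick check helps rule that out. A minimal sketch, assuming a simple INI layout: `wayland_disabled` is a hypothetical helper (not part of gdm) that succeeds only when the setting is present and uncommented under `[daemon]`.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given gdm config file sets
# WaylandEnable=false inside the [daemon] section (commented lines ignored).
wayland_disabled() {
    awk '
        /^[ \t]*\[/ {                        # section header such as [daemon]
            section = $0
            sub(/^[ \t]*\[/, "", section)    # drop leading "["
            n = index(section, "]")          # cut at the closing "]"
            if (n > 0) section = substr(section, 1, n - 1)
            next
        }
        section == "daemon" && /^[ \t]*WaylandEnable[ \t]*=[ \t]*false[ \t]*$/ {
            found = 1
        }
        END { exit found ? 0 : 1 }
    ' "$1"
}

# Usage:
#   wayland_disabled /etc/gdm/custom.conf && echo "gdm should use Xorg"
```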

Comment 1 Mateusz Mikuła 2018-10-28 12:12:43 UTC
I cannot reproduce it on Rawhide with the RPM Fusion Nvidia drivers, but on Fedora 29 (with both RPM Fusion and Negativo17) I can consistently reproduce it with `rhgb` present on the kernel command line and kernels 4.18.11 or newer.

I can successfully log in with kernel 4.18.10 and `rhgb` enabled, or with kernel >= 4.18.11 and `rhgb` disabled.

Comment 2 Mateusz Mikuła 2018-10-28 12:21:25 UTC
(In reply to Mateusz Mikuła from comment #1)

My bad: Rawhide doesn't use `rhgb` by default, which is why I couldn't reproduce it there; otherwise it behaves the same as Fedora 29.

Comment 3 Fabio Valentini 2018-10-28 16:45:14 UTC
I can confirm that dropping rhgb from the kernel command line fixes the tty switching and login issue.
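To preview what the command line would look like without `rhgb` before touching the bootloader, here is a minimal sketch. `drop_arg` is a hypothetical helper that reads a kernel command line on standard input and prints it with every exact occurrence of one argument removed; it only previews the result and changes nothing.

```shell
#!/bin/sh
# Hypothetical helper: read a kernel command line on stdin and print it
# with every exact occurrence of argument $1 removed (preview only).
drop_arg() {
    awk -v arg="$1" '{
        out = ""
        for (i = 1; i <= NF; i++) {
            if ($i == arg) continue              # exact match, so "rhgb=1" is kept
            out = (out == "" ? $i : out " " $i)
        }
        print out
    }'
}

# Usage:
#   drop_arg rhgb < /proc/cmdline
```

To apply the change persistently on Fedora, `sudo grubby --update-kernel=ALL --remove-args="rhgb"` should remove the argument from all installed kernel entries.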

Comment 4 Michael Cronenworth 2018-10-31 14:47:41 UTC
FYI: I'm not seeing this problem on my setup. The 'rhgb' option is present.

Card: 750 Ti
Kernel: 4.18.16-300.fc29.x86_64
Driver: 410.73

Comment 5 Fabio Valentini 2018-10-31 15:23:46 UTC
I checked with the latest packages from today (Oct 31).

With this cmdline, the system works fine:

BOOT_IMAGE=/vmlinuz-4.18.16-300.fc29.x86_64 root=UUID=(...) ro rootflags=subvol=fedora-29 rd.driver.blacklist=nouveau nvidia-drm.modeset=0 resume=UUID=(...) rcu_nocbs=1-11 quiet

(/ is a btrfs subvolume; I disabled modesetting because Wayland doesn't quite work right yet; rcu_nocbs=1-11 works around a first-generation Ryzen hardware bug)

Simply adding "rhgb" at the end still breaks tty switching and logging in.

Card:   GTX 1070
Kernel: 4.18.16-300.fc29.x86_64
Driver: 410.73 (nvidia-driver-410.73-4.fc29.x86_64)
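Comparisons like this one hinge on individual `key=value` parameters such as `nvidia-drm.modeset=0`. As a minimal sketch, the hypothetical helper `cmdline_value` extracts the value of one parameter from a command line read on standard input. It matches the name literally, whereas the kernel itself treats `-` and `_` in module parameter names as equivalent.

```shell
#!/bin/sh
# Hypothetical helper: print the value of parameter $1 from the kernel
# command line read on stdin. The name is matched literally ("-" and "_"
# are NOT treated as equivalent here). Fails if the parameter is absent;
# if it appears more than once, every value is printed.
cmdline_value() {
    awk -v key="$1" '{
        for (i = 1; i <= NF; i++) {
            n = index($i, "=")                   # split at the first "="
            if (n > 0 && substr($i, 1, n - 1) == key) {
                print substr($i, n + 1)
                found = 1
            }
        }
    }
    END { exit found ? 0 : 1 }'
}

# Usage:
#   cmdline_value nvidia-drm.modeset < /proc/cmdline
```

For the command line quoted above, this would print `0`.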

Comment 6 Marijn Oosterveld 2018-11-01 12:27:15 UTC
(In reply to Fabio Valentini from comment #5)
> Simply adding "rhgb" at the end still breaks tty switching and logging in.
> 
> Card:   GTX 1070
> Kernel: 4.18.16-300.fc29.x86_64
> Driver: 410.73 (nvidia-driver-410.73-4.fc29.x86_64)

I have the same hardware and kernel as you, but removing rhgb did not work for me.

Comment 7 Mateusz Mikuła 2018-11-03 16:29:31 UTC
(In reply to Marijn Oosterveld from comment #6)
> I have the same hardware and kernel as you, but removing rhgb did not work
> for me.

Multiple NVIDIA users on Reddit reported this workaround as working. Could you make sure `rhgb` was removed by running `cat /proc/cmdline`?
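The check suggested here can be scripted. A minimal sketch: the hypothetical helper `has_arg` reads a kernel command line on standard input and succeeds only if the argument appears as a whole, space-separated word, so a substring inside another argument does not count as a match (unlike a plain `grep rhgb`).

```shell
#!/bin/sh
# Hypothetical helper: succeed if argument $1 appears as a whole,
# space-separated word in the kernel command line read on stdin.
has_arg() {
    awk -v arg="$1" '
        { for (i = 1; i <= NF; i++) if ($i == arg) found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Usage:
#   if has_arg rhgb < /proc/cmdline; then
#       echo "rhgb is still on the running kernel's command line"
#   fi
```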

Comment 8 simon.galton 2018-11-04 00:51:33 UTC
(In reply to Mateusz Mikuła from comment #7)
> Multiple users with Nvidia reported this workaround as working on Reddit,
> could you make sure `rhgb` was removed by running `cat /proc/cmdline`?

I also had this problem, even after removing 'rhgb'.  I was able to get it to work by setting the nvidia-drm.modeset value to 1:

$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-4.18.16-300.fc29.x86_64 root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root rd.lvm.lv=fedora/swap quiet rd.driver.blacklist=nouveau nvidia-drm.modeset=1
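The command line only shows what was requested, so it can help to confirm which modesetting mode the loaded nvidia-drm module actually ended up with. On a running system this is typically exposed as /sys/module/nvidia_drm/parameters/modeset, which reads `Y` or `N` and usually needs root to read. A minimal sketch with a hypothetical helper `nvidia_modeset_state`; the sysfs path is only the default and can be overridden, for example to test against a file.

```shell
#!/bin/sh
# Hypothetical helper: report whether nvidia-drm kernel modesetting is in effect.
# The parameter file defaults to the sysfs path exposed by a loaded nvidia_drm
# module; it can be overridden (e.g. to point at a test file).
nvidia_modeset_state() {
    param_file=${1:-/sys/module/nvidia_drm/parameters/modeset}
    if [ ! -r "$param_file" ]; then
        echo "unknown: cannot read $param_file (module not loaded, or not root?)"
        return 1
    fi
    case $(cat "$param_file") in
        Y) echo enabled ;;
        N) echo disabled ;;
        *) echo unknown; return 1 ;;
    esac
}

# Usage on a running system (as root):
#   nvidia_modeset_state
```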

Comment 9 Alain V. 2018-11-04 08:12:32 UTC
I have the same issue.

Card: GeForce GTX 960
Kernel: 4.18.16-300.fc29.x86_64
Driver: 410.73

Removing 'rhgb' is the workaround for me; nvidia-drm.modeset=1 is not needed.

Comment 10 Ray Strode [halfline] 2018-11-04 12:26:45 UTC
My guess is we need this fix in plymouth: https://gitlab.freedesktop.org/plymouth/plymouth/commit/89283f38b04a6543484b35576af296651bc3c0ba

Will look tomorrow.

Comment 11 Armin Beširović 2018-11-05 10:49:22 UTC
+1 on removing rhgb. System details:

# 03:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 980] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: eVga.com. Corp. Device 2983
	Flags: bus master, fast devsel, latency 0, IRQ 133
	Memory at de000000 (32-bit, non-prefetchable) [size=16M]
	Memory at c0000000 (64-bit, prefetchable) [size=256M]
	Memory at d0000000 (64-bit, prefetchable) [size=32M]
	I/O ports at e000 [size=128]
	[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Legacy Endpoint, MSI 00
	Capabilities: [100] Virtual Channel
	Capabilities: [250] Latency Tolerance Reporting
	Capabilities: [258] L1 PM Substates
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Capabilities: [900] Secondary PCI Express <?>
	Kernel driver in use: nvidia
	Kernel modules: nouveau, nvidia_drm, nvidia

# uname -a
Linux bluelion 4.18.16-300.fc29.x86_64 #1 SMP Sat Oct 20 23:24:08 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
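When comparing reports, the most telling line in this listing is "Kernel driver in use", which shows whether the proprietary `nvidia` driver, rather than nouveau, is bound to the card. A minimal sketch, assuming `lspci -v` output on standard input: the hypothetical helper `driver_in_use` prints just that field.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the "Kernel driver in use:" field
# from lspci -v style output read on stdin.
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use:[[:space:]]*//p'
}

# Usage (the slot address 03:00.0 is specific to the machine above):
#   lspci -v -s 03:00.0 | driver_in_use
```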

Comment 12 Ray Strode [halfline] 2018-11-05 21:00:29 UTC
Can you guys try this update:

https://bodhi.fedoraproject.org/updates/FEDORA-2018-89d998abe5

After installing it, make sure to rebuild the initramfs by running

# dracut -f
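plymouth also runs early in boot from the initramfs, so updating the package alone does not replace the copy there; hence the rebuild. Before running dracut, it may be worth confirming that the installed plymouth is at least the version from this update (0.9.4-1, per the update announcement in this bug). A minimal sketch: `version_ge` is a hypothetical helper that compares versions with GNU `sort -V`, which approximates, but is not identical to, RPM's own version ordering.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 >= version $2, using
# GNU sort -V ordering (an approximation of RPM version comparison).
version_ge() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage on a Fedora system:
#   installed=$(rpm -q --qf '%{VERSION}-%{RELEASE}\n' plymouth)
#   if version_ge "$installed" 0.9.4-1; then
#       dracut -f    # as root: rebuild the initramfs with the new plymouth
#   fi
```

If the update is not installed yet, `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2018-89d998abe5` is the usual way to pull in one specific update from updates-testing.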

Comment 13 Mateusz Mikuła 2018-11-06 10:00:29 UTC
FEDORA-2018-89d998abe5 works fine with Negativo17 Nvidia drivers (modeset enabled).

Comment 14 Fedora Update System 2018-11-06 22:01:39 UTC
plymouth-0.9.4-1.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-89d998abe5

Comment 15 Fabio Valentini 2018-11-07 10:08:43 UTC
Yep, this update seems to fix the issue.

Comment 16 Fedora Update System 2018-11-08 03:16:13 UTC
plymouth-0.9.4-1.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.

Comment 17 Alain V. 2018-11-08 07:48:02 UTC
I can confirm my issue is solved by this change, along with the other updates pulled in by "dnf update" today.

Thank you.
Alain

