Bug 1825046 - NVIDIA Turing GPU "secure boot" is broken and can lead to full system lockups
Keywords:
Status: ON_QA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 32
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: RejectedBlocker AcceptedFreezeException
Depends On:
Blocks: F32FinalFreezeException
Reported: 2020-04-16 23:01 UTC by Ben Skeggs
Modified: 2020-04-24 06:08 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug



Description Ben Skeggs 2020-04-16 23:01:51 UTC
A few MODULE_FIRMWARE() lines are missing from the Nouveau DRM driver, so the SEC2 RTOS firmware is left out of the initramfs. Without it, the high secure firmware binaries fail to properly initialise the GPU, leaving it in an odd state which can lead to full system hangs.
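
One way to see whether a given kernel build is affected is to compare the firmware paths the nouveau module declares (as printed by `modinfo -F firmware nouveau`) against the SEC2 files the GPU needs. The following is a minimal sketch, not the actual fix: the helper names are hypothetical, and the `nvidia/<chipset>/sec2/{desc,image,sig}.bin` layout and the `tu102` chipset name are assumptions about the firmware tree rather than details stated in this report.

```python
import subprocess

# ASSUMPTION: file names of the SEC2 firmware images for one chipset.
# These are illustrative and are not taken from this bug report.
SEC2_FILES = ("desc.bin", "image.bin", "sig.bin")


def missing_sec2_firmware(modinfo_output, chipset):
    """Return the expected SEC2 firmware paths the module does not declare.

    modinfo_output: text printed by `modinfo -F firmware nouveau`,
                    one firmware path per line.
    chipset:        chipset directory name, e.g. "tu102" (assumed naming).
    """
    declared = {line.strip() for line in modinfo_output.splitlines() if line.strip()}
    expected = [f"nvidia/{chipset}/sec2/{name}" for name in SEC2_FILES]
    return [path for path in expected if path not in declared]


def declared_firmware_text(module="nouveau"):
    """Run modinfo and return its raw output (hypothetical convenience wrapper)."""
    result = subprocess.run(
        ["modinfo", "-F", "firmware", module],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `missing_sec2_firmware(declared_firmware_text(), "tu102")` on a running system lists any of the assumed paths that the module does not advertise. Because initramfs generators rely on those declarations to decide which firmware to include, a non-empty result would be consistent with the failure described here.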

I've received two systems from Lenovo (ThinkPad P1 and P53) that are considered very important to have working correctly with Fedora 32. The issue manifests most easily as suspend/resume failing in Discrete GPU mode.

Other, more severe, failure modes are possible due to the undefined state of the GPU after the ASB firmware has failed to load.

The patch has already been pulled into the Fedora kernel for the next build, and this bug is to propose the issue as a blocker so the installation image can contain the fix.

Comment 1 Fedora Blocker Bugs Application 2020-04-16 23:05:47 UTC
Proposed as a Blocker for 32-final by Fedora user bskeggs using the blocker tracking app because:

 We have requests from Lenovo to ensure that the Thinkpad P1/P53 are well supported in the F32 release, and this bug severely impacts system stability.

Comment 2 Adam Williamson 2020-04-16 23:15:21 UTC
it is really difficult for us to accept a bug that is not publicly visible as a blocker or FE. Fedora is a public project. Does the description really need to be private? if so, is there at least some sanitized version we can make visible?

the *change to the kernel package itself* is necessarily publicly visible, so I don't see how this is secret...

Comment 3 Fedora Update System 2020-04-17 19:56:51 UTC
FEDORA-2020-bebcd88161 has been submitted as an update to Fedora 32. https://bodhi.fedoraproject.org/updates/FEDORA-2020-bebcd88161

Comment 4 Fedora Update System 2020-04-17 22:07:20 UTC
FEDORA-2020-bebcd88161 has been pushed to the Fedora 32 testing repository.
In a short time you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-bebcd88161`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-bebcd88161

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 5 Ben Skeggs 2020-04-20 01:39:27 UTC
(In reply to Adam Williamson from comment #2)
> it is really difficult for us to accept a bug that is not publicly visible
> as a blocker or FE. Fedora is a public project. Does the description really
> need to be private? if so, is there at least some sanitized version we can
> make visible?
> 
> the *change to the kernel package itself* is necessarily publicly visible,
> so I don't see how this is secret...

It doesn't need to be private, I've fixed that now.

Comment 7 Stephen Gallagher 2020-04-20 12:29:19 UTC
While this is a serious bug, Turing-based processors are still relatively new (and expensive!). I don't think we'd be likely to block on such a small subset of hardware at the Go/No-Go meeting, particularly with this bug coming in after we've already slipped once.

I'm +1 FE for this. Given that a fix is ready and we are going to be respinning for another blocker bug anyway, we should get this in.

Comment 8 Ben Cotton 2020-04-20 13:17:14 UTC
+1 FE for sure. I'd entertain an argument as to why it should be a blocker.

Comment 9 Adam Williamson 2020-04-20 15:17:17 UTC
Yeah, same place as Stephen and Ben for me, +1 FE for sure, sceptical on blocker (but as we need to respin it's somewhat academic).

Comment 10 Stephen Gallagher 2020-04-20 15:56:02 UTC
(In reply to Adam Williamson from comment #9)
> Yeah, same place as Stephen and Ben for me, +1 FE for sure, sceptical on
> blocker (but as we need to respin it's somewhat academic).

Well, it's academic as long as the provided fix actually works.

FWIW, I asked FESCo to rule on whether they thought it needed to be a special blocker, and the result was "no".

Comment 11 Geoffrey Marr 2020-04-20 17:44:50 UTC
Discussed during the 2020-04-20 blocker review meeting: [0]

The decision to classify this bug as a "RejectedBlocker" and an "AcceptedFreezeException" was made as we think the impact here is too narrow to qualify as a blocker (it only affects a very new NVIDIA hardware generation), but certainly significant enough to accept as an FE.

[0] https://meetbot.fedoraproject.org/fedora-blocker-review/2020-04-20/f32-blocker-review.2020-04-20-16.01.txt

