Bug 1900233 - Grub fails to boot with tpm.c errors
Summary: Grub fails to boot with tpm.c errors
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: grub2
Version: rawhide
Hardware: x86_64
OS: Linux
unspecified
medium
Target Milestone: ---
Assignee: Javier Martinez Canillas
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-11-21 16:30 UTC by thepiguy0
Modified: 2022-06-25 22:09 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-05 11:59:53 UTC
Type: Bug


Attachments
Dmesg after boot (81.68 KB, text/plain)
2020-12-16 13:30 UTC, thepiguy0

Description thepiguy0 2020-11-21 16:30:32 UTC
Description of problem:

Occasionally on boot, grub fails to load any OS and displays "Command failed" messages instead.
Fixing this requires one or more power cycles, after which it just works.

Version-Release number of selected component (if applicable):

All components are of the grub2-2.04-31 release

How reproducible:

Seems to be about half the time I attempt to boot. The other half, it works as expected

Steps to Reproduce:
1. Attempt to boot the computer (either from cold or from a reboot)
2. Select any OS from the menu
3. About 50% (in my experience) of the time it will result in the above error

Actual results:

error: ../../grub-core/commands/efi/tpm.c:306:Command failed.
error: ../../grub-core/commands/efi/tpm.c:306:Command failed.
error: ../../grub-core/commands/efi/tpm.c:306:Command failed.
error: ../../grub-core/commands/efi/tpm.c:306:Command failed.
error: ../../grub-core/commands/efi/tpm.c:306:Command failed.
error: ../../grub-core/commands/efi/tpm.c:306:Command failed.
error: ../../grub-core/commands/efi/tpm.c:306:Command failed.
error: ../../grub-core/loader/i386/efi/linux.c:208:you need to load a kernel first.

Press any key to continue...

The above also occurs when trying to boot Windows, but without the linux.c line.

Expected results:

Boot the selected OS without any issues

Additional info:

Comment 1 Chris Murphy 2020-12-16 04:34:58 UTC
Was this ever working or has it always been an issue? 

The most recent tpm-related patches I see are in 2.04-19 and -20:
https://src.fedoraproject.org/rpms/grub2/blob/master/f/grub2.spec
https://src.fedoraproject.org/rpms/grub2/blob/master/f/0216-tpm-Don-t-propagate-TPM-measurement-errors-to-the-ve.patch
https://src.fedoraproject.org/rpms/grub2/blob/master/f/0217-tpm-Enable-module-for-all-EFI-platforms.patch

The older grub2 packages are here, you'd need 2.04-18.fc33 to go back before the most recent tpm changes. This is optional, I'm not making a recommendation because I'm not really sure what the problem is, but it's one possible idea for trying to narrow down when it started.
https://koji.fedoraproject.org/koji/packageinfo?packageID=6684

It may also help to attach dmesg. Ordinarily there isn't anything sensitive in dmesg, but to be safe you can reboot and then capture it immediately.
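
A minimal sketch of that capture step, for anyone following along. The file name dmesg-after-boot.txt and the helper tpm_lines are hypothetical names chosen for illustration. Note that the kernel log only covers what happens after the kernel starts, so GRUB's own tpm.c errors will not appear in it; the point is to see how the kernel itself reports the TPM.

```shell
# Hypothetical helper: print TPM-related lines from a saved dmesg capture.
# On a real system (reading the kernel log may require root):
#   dmesg > dmesg-after-boot.txt
#   tpm_lines dmesg-after-boot.txt
tpm_lines() {
    # -i makes the match case-insensitive, so "TPM" and "tpm_tis" both match.
    grep -i 'tpm' "$1"
}
```

The full capture is still the more useful attachment; the filter is only a quick way to spot TPM driver messages.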

A more invasive possible fix is to check if there are newer firmware (BIOS or UEFI) updates for the computer.

Comment 2 thepiguy0 2020-12-16 13:30:08 UTC
Created attachment 1739639
Dmesg after boot

Dmesg after successful boot

Comment 3 thepiguy0 2020-12-16 13:31:36 UTC
My Fedora Linux install on this laptop is relatively new as any kernel older than 5.9 had some dealbreaking bugs forcing me away, so unfortunately I have no real experience other than this. There is a possibility it did occur on F32 when I first tried it just under a year ago but I could be wrong about that. However I do know on Arch Linux before I moved over to Fedora, there didn't appear to be any issues so I believe it's something to do with the Fedora version.

I have attached the dmesg (it was after a successful boot - if there is a way to get debugging information out of grub when it's failing I'm happy to do so).

I'll give the older grub a go and see if it makes a difference, and as for UEFI updates I am currently running the latest for this computer.

Comment 4 thepiguy0 2020-12-16 13:46:14 UTC
As an update to this, 2.04-18.fc33 does in fact seem to solve the issue. If you would like any more logs regarding this version, I am happy to take them as well

Comment 5 thepiguy0 2021-01-20 23:17:45 UTC
I was wondering if there has been any development on this. Having run the grub2-2.04-18.fc33 packages for a month now I can definitely confirm they are not affected by the same bug. If there is a way for me to test specific patches I am happy to do so

Comment 6 antofthy 2021-04-04 03:30:05 UTC
I just had this exact set of error messages happen to me... After upgrading to grub2-2.04-33.fc33, on a Dell Inspiron 3593.
As described above I powered off, and left it off for a full minute, after which the grub menu came up as normal.
Before this the grub menu seemed truncated with some actions not available.

Any suggestions as to fixing this permanently?

Comment 7 thepiguy0 2021-04-04 20:01:14 UTC
Unfortunately it's still not fixed for me either, I also sometimes get the truncated grub menu.

I have tested around and the regression is introduced (as we thought above) in either 2.04-19 or 2.04-20 as 2.04-18 works perfectly and 2.04-20 has the bug (2.04-19 failed in Koji so I can't test that).

As a temporary solution, I've downgraded to 2.04-18 and excluded GRUB in /etc/dnf/dnf.conf to ensure it doesn't upgrade to a broken version again.
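
A minimal sketch of the exclusion step described above, assuming the dnf "exclude" option and the package glob grub2* (the comment does not say which pattern was actually used). The helper name pin_grub is hypothetical. It appends to the end of the file, which assumes the file contains only a [main] section, as the stock dnf.conf does.

```shell
# Hypothetical helper: add an exclude line for GRUB packages to a dnf.conf-style file.
# The glob "grub2*" is an assumption, not taken from the report.
pin_grub() {
    conf="$1"
    # Do nothing if an exclude line mentioning grub2 is already present.
    if grep -q '^exclude=.*grub2' "$conf" 2>/dev/null; then
        return 0
    fi
    printf 'exclude=grub2*\n' >> "$conf"
}

# Usage (as root, on the real file):
#   pin_grub /etc/dnf/dnf.conf
```

The exclusion has to be removed again before a fixed GRUB package can be installed, so it is best treated as a temporary pin.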

I'd really like to see a permanent fix to this as well though, seeing as we now know which two commits caused it.

Comment 8 Chris Murphy 2021-04-04 20:37:56 UTC
It's possible it's been fixed upstream in 2.06, recently released on Fedora 34 post beta. The nightlies have it. It should be sufficient to test by creating USB stick media and booting it to see if the problem still happens.

https://kojipkgs.fedoraproject.org/compose/branched/Fedora-34-20210404.n.1/compose/Everything/x86_64/iso/
 grub2-common-1:2.06~rc1-3.fc34.noarch

Comment 9 thepiguy0 2021-04-05 02:22:29 UTC
I've just tested that now and unfortunately the same issue does appear to be present (albeit possibly less frequent, ~3 times in 12 boots).

I also experienced a weird issue on the occasions when it did work where grub was unresponsive and there was a small white line that travelled from left to right before dropping down slightly and repeating. Once this reached the bottom of the screen it became responsive and booted my selection.

For reference, I also completed the same test with the Arch Linux live USB and had no issues (no unresponsive GRUB, no visual artifacts and it booted perfectly every time) so this suggests to me that there is still something causing issues in the Fedora-specific grub.

Comment 10 Chris Murphy 2021-04-05 02:48:45 UTC
ok thanks for testing; I suspect you're correct; if you could file a separate bug for the unresponsiveness issue noting it's grub2 2.06~rc1-3 that'll help keep everything organized.

Comment 11 thepiguy0 2021-04-28 18:31:19 UTC
Given that Fedora F34 has now been officially released, I was wondering if there has been any progress on this bug? Unfortunately 2.04-18 (the last working version of Grub) isn't available for Fedora 34 and therefore I am unable to upgrade without forcing myself to use the buggy Grub.

If there is anything that I can do to help (e.g. testing any packages or providing more logs) then I am more than happy to do so.

Comment 12 Fedora Admin user for bugzilla script actions 2021-05-07 00:36:06 UTC
This package has changed maintainer in Fedora. Reassigning to the new maintainer of this component.

Comment 13 thepiguy0 2021-05-29 15:50:35 UTC
As a final check, is anything likely to happen with this bug any time soon?

Currently I am still using F33 with Grub 2.04-18 but as I'm sure you'll understand, this is not a permanent solution (especially as F34 has no working grub version at all).

If no work is planned, then I will likely switch away from Fedora to another distribution (as this bug is Fedora-specific). I still prefer Fedora's philosophy so if logs/testing etc are required then I am happy to do so in order to help fix the issue.

Comment 14 Chris Murphy 2021-05-29 21:34:28 UTC
It's quasi-planned, Javier said he'd try to look at it soon but there's still some fallout from the new shim. Maybe there's a way to inhibit/denylist a built-in GRUB module as an interim solution? Might be worth asking on the upstream grub-users@ list.

Comment 15 ndk 2021-06-07 04:08:38 UTC
A similar thing happens to me, but the errors point to a different line in tpm.c:

error: ../../grub-core/commands/efi/tpm.c:140:command failed.

error: ../../grub-core/commands/efi/tpm.c:140:command failed.

error: ../../grub-core/commands/efi/tpm.c:140:command failed.

error: ../../grub-core/commands/efi/tpm.c:140:command failed.

error: ../../grub-core/commands/efi/tpm.c:140:command failed.

error: ../../grub-core/loader/i386/efi/linux.c:208:you need to load kernel first.

press any key to continue...

Comment 16 ndk 2021-06-07 04:10:41 UTC
(In reply to antofthy from comment #6)
> I just had this exact set of error meesages happen to me...  After upgrading
> to  grub2-2.04-33.fc33, on a DELL Inspiron 3593
> As described above I powered off, and left it off for a full minute, after
> which the grub menu came up as normal.
> Before this the grub menu seemed truncated with some actions not available.
> 
> Any suggestions as to fixing this permanently?

I also use the same laptop model as antofthy

Comment 17 Javier Martinez Canillas 2021-07-05 11:59:53 UTC
The reason why grub2-2.04-18 doesn't have this issue and grub2-2.04-19 onwards do
is that the tpm module was enabled starting with the latter package NVR. This seems to be a problem with
the EFI firmware on these machines and I'm not sure what we can do about it.

Comment 18 Chris Murphy 2021-07-05 21:38:58 UTC
A couple of options for those having this problem:
(a) check if the laptop manufacturer support site has TPM specific firmware updates, that's sometimes a thing
(b) check to make sure UEFI/BIOS firmware is up to date

Those two are the easiest to try out and the most complete fix. If there are none, then it's a case of working with upstream GRUB:
(c) does the problem happen with locally compiled upstream git GRUB 2.06? If yes, then see if they can help provide a work around since the manufacturer won't. If it's not a problem with upstream GRUB but is with Fedora's GRUB, then reopen this bug report.
(d) upstream may be able to help articulate the nature of the problem so that you can push back on the manufacturer to release a firmware fix, either logic board firmware or TPM.

The reality is that this problem is going to trickle down into other distros if it's not nipped in the bud soon, because eventually they will all rebase on GRUB 2.06.

Comment 19 thepiguy0 2021-07-05 21:55:11 UTC
I ended up swapping off Fedora because of this issue, but I can answer most of the questions you have there:

a) As far as I can tell, my laptop (Lenovo Yoga S740-14IIL) does not have separate TPM updates
b) The UEFI is fully updated
c) Unfortunately when I was still on Fedora, I never got around to manually compiling and installing Grub. However, I am now running Arch Linux which is currently distributing Grub 2.06 as well. Their GRUB version (which as far as I know is close to upstream) has no issues with my laptop, so I do believe this is a Fedora-specific issue.

Comment 20 Chris Murphy 2021-07-05 21:56:11 UTC
Cool, thanks for the update. In that case I think it's something with this Fedora specific patch and I'm not sure how to work around it.

Comment 21 Chris Murphy 2021-07-06 01:56:39 UTC
If Arch grub-install doesn't build in the tpm module, then it's still uncertain this is a Fedora specific problem. Fedora includes the tpm module in grub{$arch}.efi for measured boot support, etc. You'd need to do a grub-install on Arch that includes adding both sets of grub modules:

https://src.fedoraproject.org/rpms/grub2/blob/rawhide/f/grub.macros#_119
https://src.fedoraproject.org/rpms/grub2/blob/rawhide/f/grub.macros#_406

An alternative I just thought of is most UEFI firmware setups have a way to disable a TPM.

Comment 22 thepiguy0 2021-07-10 13:24:13 UTC
Sorry for the late reply.

Unfortunately I've got very little experience with manually installing grub - the extent is I know the arch install commands "grub-install --target=x86_64-efi --efi-directory=/efi".

I had a look around and there appears to be a default directory under which modules are loaded: "/usr/lib/grub/x86_64-efi", and within there, there is a tpm.mod file. I don't know if that means it's loaded though?

My laptop currently has no data of any importance on it so if you want me to test some stuff out (e.g. if you have a different grub install command), I am happy to do so.

One thing: I'd rather not turn off TPM. I do dual boot the laptop with Windows and use Windows Hello (which I believe uses TPM), and the TPM module is definitely active in Arch once booted, so it seems like a backwards step to turn it completely off so I can use Fedora over another Linux distro without this issue.
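
For readers who want to confirm the same thing on their own machine, a minimal sketch of checking whether the running kernel has registered a TPM device. It assumes the first TPM appears as /sys/class/tpm/tpm0; the helper name tpm_visible and its optional sysfs-root argument are hypothetical and exist only to make the check easy to try against a test directory.

```shell
# Hypothetical helper: report whether the kernel has registered a TPM device.
# The optional argument overrides the sysfs root, which is only useful for testing.
tpm_visible() {
    sysroot="${1:-/sys}"
    if [ -e "$sysroot/class/tpm/tpm0" ]; then
        echo "TPM device present"
    else
        echo "no TPM device found"
    fi
}

# Usage on a booted system:
#   tpm_visible
```

This only shows what the kernel sees after boot. It says nothing about whether GRUB's measurements succeeded earlier.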

Comment 23 Chris Murphy 2021-07-10 21:31:47 UTC
I think the --modules= option is needed to bake it into the grubx64.efi file; it's not enough for the module to be available, it needs to be loaded. It is either loaded by the 'insmod' command in grub.cfg or it needs to be included in the grubx64.efi created by the grub-install command.
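
A minimal sketch of checking the first of those two paths, that is, whether a grub.cfg loads the module with insmod. The helper name cfg_loads_tpm is hypothetical, and the config path in the usage comment is an assumption that differs between distributions. The check cannot tell whether tpm was built into grubx64.efi through --modules=, so a negative result does not prove the module is inactive.

```shell
# Hypothetical helper: succeed if a grub.cfg-style file explicitly loads the tpm module.
# It cannot detect a tpm module that was built into grubx64.efi via --modules=.
cfg_loads_tpm() {
    # Match "insmod tpm" as a whole word, allowing leading indentation.
    grep -Eq '^[[:space:]]*insmod[[:space:]]+tpm([[:space:]]|$)' "$1"
}

# Usage (the path is an assumption, adjust for your distribution):
#   cfg_loads_tpm /boot/grub/grub.cfg && echo "tpm loaded via insmod"
```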

So the issue is that almost all distributions will be moving to signed images, and while I don't know if Arch will do that, if they do, the problem will end up there as well. So while there's no obligation to figure this out, it's true that the problem is almost certainly upstream and will trickle down everywhere eventually.

Comment 24 thepiguy0 2021-07-11 10:33:28 UTC
Thanks for the reply, I have just tried reinstalling grub with "grub-install --target=x86_64-efi --efi-directory=/efi --modules=tpm" and definitely now experience a similar error on Arch (it just says "error: Command failed." but displays it a similar number of times and fails to boot).

Therefore, it does appear that this may be an upstream issue and not Fedora's issue. Have you got any recommendations for where I should take the discussion?

Comment 25 Chris Murphy 2021-07-12 07:23:56 UTC
I suggest emailing the grub maintainers and ask about it.
https://lists.gnu.org/mailman/listinfo/grub-devel/

Comment 26 Bruno Parisi 2021-07-21 11:39:58 UTC
(In reply to thepiguy0 from comment #24)
> Thanks for the reply, I have just tried reinstalling grub with "grub-install
> --target=x86_64-efi --efi-directory=/efi --modules=tpm" and definitely now
> experience a similar error on Arch (it just says "error: Command failed."
> but displays it a similar number of times and fails to boot).
> 
> Therefore, it does appear that this may be an upstream issue and not
> Fedora's issue, have you got any recommendations for where I should take the
> discussion?

Thank you so much for following through on this issue. I'm running a Lenovo C940 and have had this issue for about 8 months now, and a clean install of F34 still has the same issue. Have you contacted the grub maintainers? If not, I will. I understand you're on Arch now, so maybe because the issue no longer affects you, you didn't email them (totally fine, I am happy to take the mantle from here).

Comment 27 thepiguy0 2021-07-21 22:45:52 UTC
(In reply to Bruno Parisi from comment #26)
> Thank you so much for following through on this issue, I'm running a Lenovo
> C940 and have had this issue for about 8 months now and clean installed F34
> with still the same issue. Have you contacted grub maintainers?? If not I
> will, I understand your on Arch now so maybe because the issue no longer
> affects you you didn't email them (Totally fine, I am happy to take the
> mantle from here).

Unfortunately I haven't got around to it yet, so if you are willing to do so that would be great.

It is actually possible to replicate the issue under Arch by enabling the TPM module so it's definitely something that needs sorting (and besides, I prefer Fedora and would like to move back if we can sort it out).

Please keep me posted on any updates and let me know if there's anything else I can do to help the cause (and thank you for following this up with the grub devs)

Comment 28 Bruno Parisi 2021-07-21 23:47:23 UTC
UPDATE: I contacted the devs and they responded quite quickly with the below.

May I ask you to try this patch [1] out.

Daniel

[1] https://lists.gnu.org/archive/html/grub-devel/2021-02/msg00107.html

I'm not sure how to apply this, so I've asked for instructions on how to do so. If anyone else wants to give it a crack, be my guest, and please update us with your results. I will do the same once I understand what to do.

Comment 29 thepiguy0 2021-07-22 00:00:39 UTC
(In reply to Bruno Parisi from comment #28)
> UPDATE: I contacted the devs and they responded quite quickly with the below.
> 
> May I ask you to try this patch [1] out.
> 
> Daniel
> 
> [1] https://lists.gnu.org/archive/html/grub-devel/2021-02/msg00107.html
> 
> I'm not sure how to implement this so I've asked for instruction's on how to
> do so. If anyone else wants to give it a crack be my quest and please update
> us with your results. I will do the same once I understand what to do.

Wow that was a quick response.

I believe we already have that patch: https://src.fedoraproject.org/rpms/grub2/blob/rawhide/f/0152-tpm-Don-t-propagate-TPM-measurement-errors-to-the-ve.patch

It's very slightly different, but from my skim over, the only difference is that the GRUB patch has some logging.

Interestingly, I believe this change was what went in between 2.04-18 and 2.04-19 (2.04-18 was the last working version for Fedora 33).

Comment 30 David Taylor 2021-08-02 17:14:54 UTC
I run Fedora 31 on a Lenovo Yoga C940 laptop, dual-booted with vendor provided Windows 11. Today I needed to reboot into Windows 11 to run a program. While I was there I "paused" Windows Update so it wouldn't reboot while I was away.

Then I rebooted back into Fedora and the efi/tpm.c error occurred after selecting the kernel. It continued working, but ... that was weird. Rebooted again, same thing.

I rebooted into Windows and re-enabled auto-updating, then rebooted back into Fedora. The problem disappeared.

It appears Windows modifies the EFI firmware in some way that grub doesn't like, and turning on auto-updates in Windows fixed it for me on Fedora.

I hope this helps someone.

Comment 31 Bruno Parisi 2021-08-03 21:01:02 UTC
Hi dataylor, thanks for that update. I also upgraded to Windows 11 (I'm an insider) and after reading your post I did a double-take, thinking "Hey, maybe I haven't had the issue since upgrading too? But I thought it happened after the 1st boot, though that could have been a different issue, my memory is foggy". But lo and behold, it happened again on a restart from Fedora. I do think it's happening less frequently though (unless that's a placebo effect lol).

In your case, I hope it stays fixed

The GRUB team did share a patch which I will attempt in the next few days... fingers crossed

Comment 32 thepiguy0 2021-08-05 20:31:34 UTC
I've just had a UEFI update from Lenovo and one of the update notes stated "Fix BitLocker randomly asks for recovery key", which could very well be related to TPM. As such, I retried Arch grub with TPM enabled and couldn't get any errors. I have since taken the plunge and re-installed Fedora and as far as I can tell the issue is now fixed for my machine.

For anybody else with a Lenovo product, keep an eye on the UEFI updates!

On another note though, this bug (https://bugzilla.redhat.com/show_bug.cgi?id=1946969) which is another Fedora specific issue still occurs on every boot, so Grub isn't completely fixed for my machine yet

Comment 33 Bruno Parisi 2021-08-05 23:59:45 UTC
(In reply to thepiguy0 from comment #32)
> I've just had a UEFI update from Lenovo and one of the update notes stated
> "Fix BitLocker randomly asks for recovery key", which could very well be
> related to TPM. As such, I retried Arch grub with TPM enabled and couldn't
> get any errors. I have since taken the plunge and re-installed Fedora and as
> far as I can tell the issue is now fixed for my machine.
> 
> For anybody else with a Lenovo product, keep an eye on the UEFI updates!
> 
> On another note though, this bug
> (https://bugzilla.redhat.com/show_bug.cgi?id=1946969) which is another
> Fedora specific issue still occurs on every boot, so Grub isn't completely
> fixed for my machine yet

Awesome update!! This could be a game changer. Unfortunately you're running an S740, right? (I'm on a C940 14IIL.) I might have to raise this with Lenovo to see if there's anything coming down the pipe. Sorry to hear about your other issue, I'm not familiar with that at all so can't offer any assistance.

Keep us updated on whether the update holds true and you no longer have the TPM issue. Fingers crossed for you.

Cheers

Comment 34 thepiguy0 2021-08-07 17:37:11 UTC
(In reply to Bruno Parisi from comment #33)
> Awesome update!! This could be a game changer, unfortunately you're running
> an S740 right?? (I'm on a C940 14IIL) I might have to raise this with Lenovo
> to see if there's anything coming down the pipe. Sorry to hearabout your
> other issue, I'm not familiar with that at all so can't offer any assistance.
> 
> Keeps us updated if the update holds true and you no longer have the TPM
> issue. Fingers crossed for you.
> 
> Cheers

Yes that's correct, I'm on an S740-14IIL (and two days later I haven't once encountered the issue so I do believe it's fixed). Unfortunately this is one of many issues (this one, https://bugzilla.kernel.org/show_bug.cgi?id=207749, is my main gripe currently).

It's definitely worth writing in, I filed an issue under Operating System related issues and stated that the issues were related to the firmware, not Linux etc. After initially being told to reinstall Windows, the support staff agreed to forward an email containing all the details of my many issues further in with the hopes it would make it to the firmware team. I have no idea whether it made it (and prompted this UEFI update), probably not but you never know.

Comment 35 Reg 2022-06-25 22:07:45 UTC
On a Dell Inspiron 3593 the error is gone after disabling PTT (Intel's version of TPM) in the BIOS.

Comment 36 Reg 2022-06-25 22:09:04 UTC
(In reply to Reg from comment #35)
> On Dell Inspiron 3593 error is gone after disabling PTT(intel's version of
> TPM) in bios.

*I'm using Fedora 36

