Bug 1790115

Summary: [CRITICAL REGRESSION] Fedora's configuration of kernel >= 5.4 is not bootable
Product: [Fedora] Fedora
Reporter: Artem S. Tashkinov <aros>
Component: kernel
Assignee: Hans de Goede <hdegoede>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent
Priority: unspecified
Version: 31
CC: airlied, bskeggs, fmartine, hdegoede, ichavero, itamar, jarodwilson, jeremy, jforbes, jglisse, john.j5live, jonathan, josef, kernel-maint, linville, lkundrak, lobsielvith, masami256, mchehab, mihai, mjg59, pjones, steved, y9t7sypezp
Keywords: Reopened
Hardware: x86_64
OS: Linux
Fixed In Version: kernel-5.5.10-100.fc30 kernel-5.5.10-200.fc31
Last Closed: 2020-03-21 03:15:52 UTC
Type: Bug
Attachments:
- set debug=all grub2 errors
- The first grub debug screen
- grub `set debug=all` + kernel debug options
- Working 5.4 config
- diff between Fedora git2 and git3 config files

Description Artem S. Tashkinov 2020-01-11 22:40:26 UTC
1. Please describe the problem:

Kernels kernel-5.4.8-200.fc31.x86_64 and kernel-5.4.10-200.fc31.x86_64 do NOT boot at all. Right after GRUB shows the kernel name and tries to boot it, all I can see is "EFI stub: UEFI secure boot is enabled".

At this point the laptop is dead. It doesn't respond to Ctrl + Alt + Del, it doesn't respond to Caps Lock, nothing. Only holding the power button for 4 seconds shuts it down.


2. What is the Version-Release number of the kernel:

kernel-5.4.8-200.fc31.x86_64 and kernel-5.4.10-200.fc31.x86_64 (testing)

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

  No idea. Right now I'm running 5.3.11 and it boots/works just fine.


4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below: no steps, no boot.


5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:


6. Are you running any modules that are not shipped directly with Fedora's kernel?: The kernel itself does not boot. No, I'm not using any third-party modules. Besides, secure boot is enabled.

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

I cannot attach anything - the kernel does NOT boot.

lspci:

00:1f.3 Audio device: Intel Corporation Sunrise Point-LP HD Audio (rev 21)
00:16.0 Communication controller: Intel Corporation Sunrise Point-LP CSME HECI #1 (rev 21)
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL810xE PCI Express Fast Ethernet controller (rev 0a)
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers (rev 08)
00:1f.0 ISA bridge: Intel Corporation Sunrise Point-LP LPC Controller (rev 21)
00:1f.2 Memory controller: Intel Corporation Sunrise Point-LP PMC (rev 21)
02:00.0 Network controller: Intel Corporation Wireless 3165 (rev 81)
00:1f.7 Non-Essential Instrumentation [1300]: Intel Corporation Device 9d26 (rev 21)
00:1c.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #1 (rev f1)
00:1c.4 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #5 (rev f1)
00:1c.5 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #6 (rev f1)
00:1d.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #9 (rev f1)
00:17.0 SATA controller: Intel Corporation Sunrise Point-LP SATA Controller [AHCI mode] (rev 21)
00:15.0 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #0 (rev 21)
00:15.1 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #1 (rev 21)
00:14.2 Signal processing controller: Intel Corporation Sunrise Point-LP Thermal subsystem (rev 21)
00:04.0 Signal processing controller: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem (rev 08)
00:1f.4 SMBus: Intel Corporation Sunrise Point-LP SMBus (rev 21)
04:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS522A PCI Express Card Reader (rev 01)
00:14.0 USB controller: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller (rev 21)
00:02.0 VGA compatible controller: Intel Corporation Skylake GT2 [HD Graphics 520] (rev 07)

CPU: Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz

product: HP Pavilion x360 Convertible (P1F10UA#ABA)
BIOS: F.28 (the latest released one)

Comment 1 Artem S. Tashkinov 2020-01-11 22:44:28 UTC
I have tried removing all the boot options aside from 

vmlinuz-5.4.8-300.fc31.x86_64 root=UUID=UUID ro
initrd-XXX

That didn't help. Nothing appears on the screen, as if GRUB doesn't even try to load the kernel image.

Comment 2 Artem S. Tashkinov 2020-01-11 22:48:10 UTC
This might be a dupe of bug 1779611

I will now test with the offered debug flags.

Comment 3 Artem S. Tashkinov 2020-01-11 23:14:06 UTC
Created attachment 1651553
set debug=all grub2 errors

Comment 4 Artem S. Tashkinov 2020-01-11 23:15:25 UTC
I've tried downgrading grub2 to the first release in F31, but that didn't help.

I cannot downgrade to the release in F30 due to a conflict in dependencies.

Comment 5 Artem S. Tashkinov 2020-01-11 23:21:03 UTC
Created attachment 1651554
The first grub debug screen

Comment 6 Artem S. Tashkinov 2020-01-11 23:23:21 UTC
I've run badblocks, e2fsck and fsck.vfat on all the relevant partitions; no errors were found.

Comment 7 Artem S. Tashkinov 2020-01-11 23:38:26 UTC
The plot thickens.

Looks like grub is able to load the kernel image and initrd.

The last three messages on the screen are:

script/lexer.c:321: token 0 text []
loader/efi/linux.c:82: kernel_addr: 0x1000000 handover_offset: 0x190 params: 0x7af07000
loader/efi/linux.c:85: hadover_func() = 0x1000390

At this point the system is dead.

No idea what to do next.
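The last two GRUB lines can be cross-checked with a little shell arithmetic. This is a minimal sketch, assuming that GRUB's x86_64 EFI loader jumps to the 64-bit EFI handover entry, which the Linux x86 boot protocol places 0x200 bytes past `handover_offset`; `handover_entry` is a hypothetical helper name. Under that assumption the printed handover_func() address is exactly the expected entry point, which fits the reading that GRUB did hand control to the kernel's EFI stub and the hang happens after that point.

```shell
#!/bin/sh
# Recompute the jump target GRUB printed as handover_func().
# Assumption: the 64-bit EFI handover entry is handover_offset + 0x200
# bytes from the start of the loaded kernel image (x86 boot protocol).
handover_entry() {
    # $1 = kernel_addr, $2 = handover_offset (both in hex, as GRUB prints them).
    # POSIX shell arithmetic accepts C-style 0x... constants.
    printf '0x%x\n' $(( $1 + $2 + 0x200 ))
}

handover_entry 0x1000000 0x190   # values from the debug output above
```

This prints 0x1000390, the same value GRUB logged.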

Comment 8 Artem S. Tashkinov 2020-01-12 00:16:36 UTC
I've just tried booting with rEFInd; the issue persists, which means it's not related to GRUB.

Rawhide kernel-5.5.0-0.rc5.git2.1.fc32.x86_64 also doesn't boot.

Please advise.

Comment 9 Artem S. Tashkinov 2020-01-12 00:17:18 UTC
Oh, I've also tried disabling secure boot - that didn't help either.

Comment 10 Artem S. Tashkinov 2020-01-12 00:53:19 UTC
Then I wondered: what if my Fedora installation or my HW is so broken that I cannot install any new kernels?

Nope, not true, I've just installed kernel-5.3.7-301.fc31.x86_64 and it works.

I've also tried "nomodeset i915.modeset=0" as my boot flags - no dice.

So,

1) grub2 is unlikely to be at fault, as rEFInd exhibits the same issue
2) rawhide 5.5 kernel does not boot
3) kernels 5.4.8 and 5.4.10 do not boot
4) 5.3 kernels (and earlier) boot
5) secure boot doesn't change anything
6) nomodeset i915.modeset=0 options don't help
7) there are no kernel messages on boot - I wonder if I can force some sort of text mode. It's unlikely there's a serial port on this laptop.

Comment 11 Hans de Goede 2020-01-12 13:07:54 UTC
Hmm, this sounds somewhat similar to bug 1779611. That one is about Dell Inspiron models, not HP Pavilion models, and it started happening in 5.3, not 5.4; still, there are some similarities.

Can you try adding "efi=debug earlyprintk=efi,keep" and see if that gives any (extra) output?

Also can you try disabling the TPM (called PTT for some reason in the Dell case) in the BIOS settings and see if that helps?

Comment 12 Artem S. Tashkinov 2020-01-12 22:51:21 UTC
(In reply to Hans de Goede from comment #11)

> Also can you try disabling the TPM (called PTT for some reason in the Dell case) in the BIOS settings and see if that helps?

I've tried all the TPM Device settings; none of them helped. I left it as it was before: TPM Device: Hidden.

> Can you try adding "efi=debug earlyprintk=efi,keep" and see if that gives any (extra) output ?

Nothing. There's no output - just a black blank screen with an underline which doesn't even blink.

Comment 13 Artem S. Tashkinov 2020-01-12 23:05:51 UTC
I've tried removing absolutely everything from boot except the kernel:

grub> linux (hd0,gpt3)/vmlinuz-5.4.8-200.fc31.x86_64 efi=debug earlyprintk=efi,keep
grub> boot
_
(at this point the laptop is frozen).

Also tried adding "nomodeset i915.modeset=0" - the result is the same.

Comment 14 Artem S. Tashkinov 2020-01-12 23:29:55 UTC
Created attachment 1651710
grub `set debug=all` + kernel debug options

Comment 15 Artem S. Tashkinov 2020-01-12 23:33:14 UTC
I've found a single person on the entire internet whose symptoms look the same: https://ask.fedoraproject.org/t/install-fails-to-start-on-uefi-winnovo-vokbook/2460/13

It's not clear whether he solved his issue or not. Also he has a different laptop.

Comment 16 Artem S. Tashkinov 2020-01-12 23:39:29 UTC
This looks similar as well: https://askubuntu.com/questions/1196377/computer-hanging-when-booting-linux-5-4-2 - again no resolution.

I'm utterly confused. In my 20 years of using Linux it's the first time I cannot get anything from the kernel on boot.

Comment 17 Artem S. Tashkinov 2020-01-13 00:25:07 UTC
OK, I'll download a dozen 5.4 kernels from koji and test whether they boot.

Comment 18 Artem S. Tashkinov 2020-01-23 13:12:28 UTC
So,

kernel-5.4.0-0.rc0.git1.1.fc32 boots
kernel-5.4.0-0.rc0.git7.1.fc32 fails to boot (hard freeze)

And all the kernels in between return:

Forbidden

You don't have permission to access this resource.

E.g. https://kojipkgs.fedoraproject.org//vol/fedora_koji_archive05/packages/kernel/5.4.0/0.rc0.git2.1.fc32/x86_64/kernel-5.4.0-0.rc0.git2.1.fc32.x86_64.rpm

(linked from https://koji.fedoraproject.org/koji/buildinfo?buildID=1380315, which in turn is linked from https://koji.fedoraproject.org/koji/packageinfo?buildStart=50&packageID=8&buildOrder=-nvr&tagOrder=name&tagStart=50)

Which means some changes between git1 and git7 rendered my system unbootable.
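Once the in-between snapshots can be downloaded, finding the first bad one is an ordinary binary search over the build list, needing about log2(N) reboots for N snapshots. This is a minimal sketch, assuming a single regression (every snapshot after the first bad one is also bad), that the first listed snapshot is known good and the last is known bad; `first_bad_snapshot` and the check command are hypothetical names, and in practice the "check" is a manual reboot into that kernel. As Comment 19 points out, this only narrows the problem to two adjacent snapshots, not to a single commit.

```shell
#!/bin/sh
# Binary search for the first failing snapshot in an ordered build list.
# Usage: first_bad_snapshot CHECK_CMD SNAPSHOT...
# CHECK_CMD SNAPSHOT must exit 0 if that snapshot boots, non-zero otherwise.
# Assumes the first snapshot is good, the last is bad, and there is one regression.
first_bad_snapshot() {
    check=$1
    shift
    lo=1        # position of the last snapshot known to boot
    hi=$#       # position of the first snapshot known to hang
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        eval "candidate=\${$mid}"      # positional parameter number $mid
        if "$check" "$candidate"; then
            lo=$mid                    # boots: the regression comes later
        else
            hi=$mid                    # hangs: the regression is here or earlier
        fi
    done
    eval "result=\${$hi}"
    printf '%s\n' "$result"
}
```

With seven snapshots (git1 good, git7 bad) this needs at most three checks, starting with git4.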

I don't know what to do. 

Could Fedora maintainers make the remaining kernels available please?

Comment 19 Artem S. Tashkinov 2020-01-23 13:16:33 UTC
And even if I find the exact kernel at which booting on my system stopped, that won't tell us what the bad commit is.

Looks like my only option is bisecting, which doesn't sound good on a 4-year-old laptop with just two cores and a pathetic cooling system that is barely capable of cooling it during normal work; the CPU temperatures go through the roof when I start compiling.

Someone did break UEFI boot on my laptop. Looks like the number of Linux users in the world is still extremely low since there's no way I can be the only one having this issue.

Comment 20 Hans de Goede 2020-01-23 13:27:17 UTC
I'm sorry that you're having this issue. As you have figured out yourself these early boot crashes are unfortunately quite hard to debug due to the lack of logs.

Two things you can try; both will likely not help, but they are worth trying:

1. Downgrading grub, as that seems to help in bug 1779611
2. Add dis_ucode_ldr to the kernel commandline

If those do not help, then I am afraid bisecting is the only answer. I know this is no fun, but this is how I've root-caused and eventually fixed these kinds of bugs multiple times, see e.g.:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=818c7ce724770fbcdcd43725c81f2b3535f82b76
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b61fbc887af7a13a1c90c84c1feaeb4c9780e1e2

Both of those also fix these kinds of early boot issues. In those cases I had access to hardware on which I could reproduce the issue, which I'm afraid is not the case here.

Comment 21 Artem S. Tashkinov 2020-01-23 13:31:57 UTC
(In reply to Hans de Goede from comment #20)

> 1. Downgrading grub, as that seems to help in bug 1779611

Already tried that; however, I can only use the initial F31 grub2 release, since F30 releases can't be installed due to dependencies.

> Add dis_ucode_ldr to the kernel commandline

Will try it, thanks!

The above commits apply to a 32-bit UEFI environment, which is not the case for me.

Comment 22 Artem S. Tashkinov 2020-01-23 14:51:25 UTC
I've compiled kernel 5.4 from the official sources using _my_ own configuration, and it boots and works just fine.

1) Does Fedora have any unusual patches on top of kernel 5.4?
2) Does Fedora enable any unusual architecture/EFI options? I'm not a fan of building a kernel using Fedora's .config, since it contains support for every device under the sun.

Comment 23 Artem S. Tashkinov 2020-01-23 14:57:00 UTC
This is a complete bummer/disaster, because now I'm looking at enabling and disabling Fedora's kernel configuration options one by one, which might take a millennium. This is actually worse than bisecting, and there's no way I'm going through it. Tell me what to do, because I'm on the verge of giving up on Fedora and Linux in general after having used it for over 20 years and having helped resolve numerous bugs in the kernel (not to mention hundreds of bugs in other open source projects).

Comment 24 Artem S. Tashkinov 2020-01-23 15:11:18 UTC
Created attachment 1654871
Working 5.4 config

Comment 25 Hans de Goede 2020-01-23 18:38:42 UTC
Ok, so we are getting somewhere.

The first thing to try is to build a 5.4 kernel with the Fedora patches and your config:

fedpkg clone -a -b f31 kernel
cd kernel
fedpkg prep
cd kernel-5.4.fc31/linux-5.4.14-200.fc31.x86_64
cp <your-.config> .config
make oldconfig
# answer questions about any config differences; see e.g. /boot/config-5.4.xxx for the Fedora choices
make -j2 bzImage && make -j2 modules && sudo make modules_install && sudo make install
# reboot into new kernel

If that results in a working kernel, then it is a config difference; if that kernel is still broken, then it is likely caused by one of the patches Fedora adds.

If it is a config difference, another idea is to try the 5.4 kernel but with the 5.3 config:

fedpkg clone -a -b f31 kernel
cd kernel
fedpkg prep
cd kernel-5.4.fc31/linux-5.4.14-200.fc31.x86_64
# you can skip the above steps and re-use the dir from last time if you want
cp /boot/config-5.3.xxxx .config
make oldconfig
# answer questions about new config options
make -j2 bzImage && make -j2 modules && sudo make modules_install && sudo make install
# reboot into new kernel

If that works, then the problem is caused by a change in the Fedora kernel config between 5.3 and 5.4. Note that the diff between these two configs should be reasonably small and thus easy to examine in detail.

Comment 26 Artem S. Tashkinov 2020-02-24 10:13:51 UTC
kernel-5.6.0-0.rc2.git3.1.fc33 is also not bootable, sigh.

I haven't yet convinced myself to start debugging this issue. I will keep on running 5.3.xx till eternity, or until someone with a lot more eagerness joins the club. Compiling the kernel on this absolutely weak laptop with horrible thermals doesn't excite me at all.

It's really really strange I'm the only one having this issue. Shows how many people really use Linux.

Comment 27 Justin M. Forbes 2020-02-24 16:17:27 UTC
I might also recommend trying earlier 5.4 kernels from koji https://koji.fedoraproject.org/koji/packageinfo?buildStart=50&packageID=8&buildOrder=-completion_time&tagOrder=name&tagStart=0#buildlist as it is possible that 5.4.0 boots fine and the issue is actually a patch that made it into a stable update upstream. Or try the current upstream 5.4.22 with your config.

Comment 28 Steve 2020-02-24 18:53:05 UTC
(In reply to Artem S. Tashkinov from comment #22)
...
> 1) Does Fedora have any unusual patches on top of kernel 5.4?

The patch files are in the kernel source RPM:

$ rpm -ql -p kernel-5.5.5-200.fc31.src.rpm

> 2) Does Fedora enable any unusual architecture/EFI options?

The config files are also in the kernel source RPM.

The rpm2archive command can be used to convert an RPM file into a tar file:

$ wget https://kojipkgs.fedoraproject.org//packages/kernel/5.5.5/200.fc31/src/kernel-5.5.5-200.fc31.src.rpm
$ rpm2archive kernel-5.5.5-200.fc31.src.rpm
$ tar -xf kernel-5.5.5-200.fc31.src.rpm.tgz

NB: All files are "hidden" dot files:

$ ls -a .*.patch
$ ls -a .*.config

A generic strategy would be to do diffs on suitably chosen text files, such as the list of patch files or a pair of config files.
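Applied to the patch lists, that strategy can be a short comparison. This is a sketch, assuming both source RPMs were unpacked into separate directories with rpm2archive and tar as shown above; `patch_list` and `compare_patch_lists` are hypothetical helper names. It matches names ending in `.patch` whether or not they were extracted as hidden dot files, and prints names found only in the first directory flush left and names found only in the second indented by a tab (the usual comm(1) column layout).

```shell
#!/bin/sh
# Compare the patch file names shipped in two unpacked kernel source RPMs.
patch_list() {
    # One *.patch name per line (hidden or not), sorted bytewise for comm(1).
    (cd "$1" && ls -a) | grep '\.patch$' | LC_ALL=C sort
}

compare_patch_lists() {
    old_list=$(mktemp)
    new_list=$(mktemp)
    patch_list "$1" > "$old_list"
    patch_list "$2" > "$new_list"
    # comm -3 hides common names: column 1 = only in $1, column 2 = only in $2.
    LC_ALL=C comm -3 "$old_list" "$new_list"
    rm -f "$old_list" "$new_list"
}
```

For example, `compare_patch_lists kernel-git2 kernel-git3` (hypothetical directory names) prints nothing when both builds carry the same patch set.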

Comment 29 Artem S. Tashkinov 2020-02-25 00:00:55 UTC
(In reply to Justin M. Forbes from comment #27)

> I might also recommend trying earlier 5.4 kernels from koji https://koji.fedoraproject.org/koji/packageinfo?buildStart=50&packageID=8&buildOrder=-completion_time&tagOrder=name&tagStart=0#buildlist as it is possible that 5.4.0 boots fine, and the issue is actually a patch that made it into a stable update upstream. Or try The current upstream 5.4.22 with your config.

As I mentioned earlier:

> kernel-5.4.0-0.rc0.git1.1.fc32 boots
> kernel-5.4.0-0.rc0.git7.1.fc32 fails to boot (hard freeze)

So, no: Fedora's kernel 5.4.0 (release) is already unbootable, and the kernels in between cannot be downloaded (I get an access denied error).

Comment 30 Steve 2020-02-25 00:41:46 UTC
(In reply to Artem S. Tashkinov from comment #29)
...
> > kernel-5.4.0-0.rc0.git1.1.fc32 boots
> > kernel-5.4.0-0.rc0.git7.1.fc32 fails to boot (hard freeze)
> 
> So, no, Fedora's kernel 5.4.0 (release) is already unbootable and kernels in
> between cannot be downloaded (I get an access dedied error).

Starting here:

Information for build kernel-5.4.0-0.rc0.git2.1.fc32
https://koji.fedoraproject.org/koji/buildinfo?buildID=1380315

This download completed without any errors:

$ wget https://kojipkgs.fedoraproject.org//vol/fedora_koji_archive05/packages/kernel/5.4.0/0.rc0.git2.1.fc32/x86_64/kernel-core-5.4.0-0.rc0.git2.1.fc32.x86_64.rpm

Were you behind a firewall when you got the access denied error?

Comment 31 Justin M. Forbes 2020-02-26 13:28:38 UTC
(In reply to Artem S. Tashkinov from comment #29)

> > kernel-5.4.0-0.rc0.git1.1.fc32 boots
> > kernel-5.4.0-0.rc0.git7.1.fc32 fails to boot (hard freeze)
> 
> So, no, Fedora's kernel 5.4.0 (release) is already unbootable and kernels in
> between cannot be downloaded (I get an access dedied error).

That is unfortunate, as there are thousands of patches between git1 and git7; that is the majority of the merge window.

Comment 32 snip4 2020-02-28 22:56:11 UTC
FYI, I had the exact same issue with the whole Fedora 5.4 kernel series and beyond. A real headache. I tried everything. Recently, my motherboard vendor (Asus) released a new BIOS version, updating Intel ME and mitigation-related stuff. Voilà, problem solved. I can see you are using the latest BIOS available for your laptop; unfortunately, it depends on HP support to release an updated version. https://bugzilla.redhat.com/show_bug.cgi?id=1739836

Comment 33 Artem S. Tashkinov 2020-02-29 09:00:09 UTC
(In reply to snip4 from comment #32)

HP won't release a BIOS update for a 4.5-year-old laptop. Also, Windows 10 works on it perfectly. This is all messed up. :( The TPM Device option in the BIOS settings is set to Hidden; there's no option to disable it. Kernel 5.5.6 of course freezes on boot - no changes here.

(In reply to Steve from comment #30)
> Were you behind a firewall when you got the access denied error?

Access denied error was given by https://koji.fedoraproject.org/ - on my end everything works perfectly. Anyways, all those kernels have long been gone. There's no way to download them again. Besides, changes in kernel subsystems are aggregated in pulls, so, the most I could have learned was which subsystem was responsible for this SNAFU. And it's pretty obvious it's something related to UEFI.

Comment 34 Artem S. Tashkinov 2020-02-29 09:10:14 UTC
There's an update for CSME from HP but my laptop is not listed among the models that will receive it:

https://support.hp.com/us-en/document/c06560700

Comment 35 Steve 2020-02-29 15:32:00 UTC
(In reply to Artem S. Tashkinov from comment #33)
... 
> Access denied error was given by https://koji.fedoraproject.org/ - on my end
> everything works perfectly. 

Thanks. There could have been a temporary problem on the server end.

> Anyways, all those kernels have long been gone.
> There's no way to download them again.

I just downloaded all of these with "wget". For your purposes, you probably only need kernel-core:

$ ls -s1 kernel-core*.rpm
32264 kernel-core-5.4.0-0.rc0.git2.1.fc32.x86_64.rpm
32328 kernel-core-5.4.0-0.rc0.git3.1.fc32.x86_64.rpm
32320 kernel-core-5.4.0-0.rc0.git4.1.fc32.x86_64.rpm
32380 kernel-core-5.4.0-0.rc0.git5.1.fc32.x86_64.rpm
32368 kernel-core-5.4.0-0.rc0.git6.1.fc32.x86_64.rpm

For efficiency, I repeatedly edited the "wget" command-line to increment the gitN number.
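The same downloads can be scripted by generating the URLs in a loop instead of editing the command line by hand. This is a sketch, assuming every rc0.gitN build sits under the same archive volume (`fedora_koji_archive05`) as the git2 URL quoted in Comment 30; that is not verified for the other builds, so compare a URL against its Koji build page if a download fails. `kernel_core_urls` is a hypothetical helper name.

```shell
#!/bin/sh
# Print kernel-core download URLs for Fedora 5.4.0 rc0.gitFIRST .. rc0.gitLAST.
# Assumption: all snapshots live under the same archive volume as git2.
kernel_core_urls() {
    first=$1
    last=$2
    base=https://kojipkgs.fedoraproject.org//vol/fedora_koji_archive05/packages/kernel/5.4.0
    n=$first
    while [ "$n" -le "$last" ]; do
        release=0.rc0.git$n.1.fc32
        printf '%s/%s/x86_64/kernel-core-5.4.0-%s.x86_64.rpm\n' "$base" "$release" "$release"
        n=$((n + 1))
    done
}
```

For example, `kernel_core_urls 2 6 | xargs -n 1 wget --limit-rate=1m` fetches git2 through git6 one at a time, using the `--limit-rate` option from Comment 36 to keep the download speed down.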

If you have enough disk space, they can probably all be installed simultaneously:

# dnf install kernel-core*.rpm

Comment 36 Steve 2020-02-29 16:13:48 UTC
(In reply to Artem S. Tashkinov from comment #33)
> Access denied error was given by https://koji.fedoraproject.org/ - on my end everything works perfectly.

If you have a very high speed internet connection, it's possible the server temporarily blocked your IP address for consuming too much bandwidth.

The "wget" option, "--limit-rate", can be used to reduce the download speed.

Comment 37 Justin M. Forbes 2020-03-03 16:16:48 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 31 kernel bugs.

Fedora 31 has now been rebased to 5.5.7-200.fc31. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 32, and are still experiencing this issue, please change the version to Fedora 32.

If you experience different issues, please open a new bug report for those.

Comment 38 Steve 2020-03-04 14:31:31 UTC
These EFI patches will be in kernel v5.6-rc5:

Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e130a920f69399777062f9fe7763abe895d386b0

The git branch can be followed by clicking the second "parent" link. (Note the "Fixes:" line for three of the four patches.)

Comment 39 Steve 2020-03-06 12:59:10 UTC
(In reply to Artem S. Tashkinov from comment #18)
...
> kernel-5.4.0-0.rc0.git1.1.fc32 boots
> kernel-5.4.0-0.rc0.git7.1.fc32 fails to boot (hard freeze)
...

AFAICT, there is only one EFI merge in v5.4-rc1. From my clone of the kernel git repo:

$ git log --oneline --merges v5.4-rc1 | grep -i efi
cc9b499a1f Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cc9b499a1f71696054a2771aae504c53eecff31d

Comment 40 Steve 2020-03-06 18:23:39 UTC
(In reply to Steve from comment #38)
> These EFI patches will be in kernel v5.6-rc5:
> 
> Merge branch 'efi-urgent-for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e130a920f69399777062f9fe7763abe895d386b0
...

That should be in this build:

kernel-5.6.0-0.rc4.git1.1.fc33
https://koji.fedoraproject.org/koji/buildinfo?buildID=1474663

* Fri Mar 06 2020 Jeremy Cline <jcline> - 5.6.0-0.rc4.git1.1
- Linux v5.6-rc4-135-gaeb542a1b5c5

That's the current HEAD. From my kernel mainline git repo:

$ git describe --abbrev=12 HEAD
v5.6-rc4-135-gaeb542a1b5c5

Comment 41 Artem S. Tashkinov 2020-03-06 21:33:38 UTC
OK, then:

> kernel-5.6.0-0.rc4.git1.1.fc33 https://koji.fedoraproject.org/koji/buildinfo?buildID=1474663

Does NOT boot.

Speaking of 5.4.

5.4-rc0.git2 boots fine.
5.4-rc0.git3 instant freeze with no output.

I guess there are thousands of commits between them but I'm curious if there's anything which pertains to booting, UEFI or x86-64.

Comment 42 Artem S. Tashkinov 2020-03-06 22:01:27 UTC
Looks like Fedora's configs are almost the same between these two snapshots, except for CONFIG_CRYPTO_SHA512=y in rc0.git3, but this option should be harmless, which means the bad commit is in the kernel itself.

The patch between these two kernels weighs almost 13MB which is too much to process.

I cannot find anything which pertains to uefi/efi in this patch.

No, it's far above my paygrade.

Comment 43 Artem S. Tashkinov 2020-03-06 22:10:12 UTC
Just to be extra sure:

5.4.0-0.rc0.git2.1.fc32.x86_64 boots fine.
5.4.0-0.rc0.git3.1.fc32.x86_64 does NOT boot.

Comment 44 Steve 2020-03-06 22:40:55 UTC
(In reply to Artem S. Tashkinov from comment #43)
> Just to be extra sure:
> 
> 5.4.0-0.rc0.git2.1.fc32.x86_64 boots fine.
> 5.4.0-0.rc0.git3.1.fc32.x86_64 does NOT boot.

Actually, it may not be too bad. The changelog shows those snapshots are one day apart:

kernel-5.4.0-0.rc0.git3.1.fc32
https://koji.fedoraproject.org/koji/buildinfo?buildID=1382506

* Thu Sep 19 2019 Jeremy Cline <jcline> - 5.4.0-0.rc0.git3.1
- Linux v5.3-7639-gb41dae061bbd

* Wed Sep 18 2019 Jeremy Cline <jcline> - 5.4.0-0.rc0.git2.1
- Linux v5.3-3839-g35f7a9526615

And there are only 21 merges (although merges can contain merges, so that could be misleading):

$ git log --oneline --merges --abbrev=12 b41dae061bbd | less -N

      1 b41dae061bbd Merge tag 'xfs-5.4-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
      2 e6bc9de71497 Merge tag 'vfs-5.4-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
      3 b6c0d3577246 Merge tag 'ovl-fixes-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
      4 7d14df2d280f Merge tag 'for-5.4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
      5 0bb73e42f027 Merge tag 'afs-next-20190915' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
      6 f60c55a94e1d Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt
      7 734d1ed83e1f Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt
      8 d013cc800a2a Merge tag 'filelock-v5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
      9 e170eb27715f Merge branch 'work.mount-base' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
     10 b30d87cf969e Merge branch 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
     11 53e5e7a7a71c Merge branch 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
     12 81160dda9a7a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
     13 8b53c76533aa Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
     14 6cfae0c26b21 Merge tag 'char-misc-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
     15 e6874fc29410 Merge tag 'staging-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
     16 e444d51b14c4 Merge tag 'tty-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
     17 c6b48dad92ae Merge tag 'usb-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
     18 1f7d290a7275 Merge tag 'driver-core-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
     19 fe38bd686207 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
     20 404e634fdb96 Merge tag 'for-linus-urgent' of git://git.kernel.org/pub/scm/virt/kvm/kvm
     21 35f7a9526615 Merge tag 'devprop-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
...
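The two `git describe` strings from the changelog also give the size of the range. Both count commits on top of the same v5.3 tag, and 35f7a9526615 should be an ancestor of b41dae061bbd, so the difference of the two counts (7639 - 3839 = 3800) should approximate what `git rev-list --count 35f7a9526615..b41dae061bbd` would report. This is a sketch under that assumption; `describe_count` and `bisect_steps` are hypothetical helper names. A binary search over 3800 commits needs at most 12 test boots, since 2^11 = 2048 < 3800 <= 4096 = 2^12; `git bisect` may need a few more if some steps have to be skipped.

```shell
#!/bin/sh
# Estimate how many test boots a bisection between two snapshots needs.
describe_count() {
    # "v5.3-3839-g35f7a9526615" -> "3839": drop "-g<hash>", then "<tag>-".
    rest=${1%-g*}
    printf '%s\n' "${rest##*-}"
}

bisect_steps() {
    # Smallest s with 2^s >= n: worst-case number of tests a binary
    # search needs to single out one commit among n candidates.
    n=$1 s=0 span=1
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2))
        s=$((s + 1))
    done
    printf '%s\n' "$s"
}

good=$(describe_count v5.3-3839-g35f7a9526615)   # Fedora git2 snapshot
bad=$(describe_count v5.3-7639-gb41dae061bbd)    # Fedora git3 snapshot
bisect_steps $((bad - good))                      # prints 12
```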

Comment 45 Steve 2020-03-06 22:52:21 UTC
(In reply to Artem S. Tashkinov from comment #41)
> OK, then:
> 
> > kernel-5.6.0-0.rc4.git1.1.fc33 https://koji.fedoraproject.org/koji/buildinfo?buildID=1474663
> 
> Does NOT boot.

Thanks for testing.

> Speaking of 5.4.
> 
> 5.4-rc0.git2 boots fine.
> 5.4-rc0.git3 instant freeze with no output.
> 
> I guess there are thousands of commits between them but I'm curious if
> there's anything which pertains to booting, UEFI or x86-64.

Can you suggest a search term for "booting" that might be in a commit summary? (a kernel component, for example)

Here is a first try using your other two suggestions. There are a lot of KVM commits, but those don't sound relevant, so we have:

$ git log --oneline --abbrev=12 35f7a9526615^..b41dae061bbd | egrep -iw 'efi|x86' | fgrep -v KVM
f6680cbdb258 crypto: x86/aes-ni - use AES library instead of single-use AES cipher
eb7d6ba882f1 crypto: x86 - Rename functions to avoid conflict with crypto/sha256.h
78cd4bf53635 platform/x86: intel_cht_int33fe: Use new API to gain access to the role switch
f8ea7c6049d5 x86: kvm: svm: propagate errors from skip_emulated_instruction()
8ce5fac2dc1b crypto: x86/xts - implement support for ciphertext stealing
cc1d24b980de crypto: x86/des - switch to library interface
5bd08a4ae3d0 platform: x86: hp-wmi: convert platform driver to use dev_groups <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<?
520c1993bbe6 crypto: aegis128l/aegis256 - remove x86 and generic implementations
5cb97700beaa crypto: morus - remove generic and x86 implementations
1d2c3279311e crypto: x86/aes - drop scalar assembler implementations
2c53fd11f762 crypto: x86/aes-ni - switch to generic for fallback and key routines

Comment 46 Steve 2020-03-06 23:28:37 UTC
(In reply to Steve from comment #45)
...
> Can you suggest a search term for "booting" that might be in a commit summary? (a kernel component, for example)
...

"core" or "driver.core":

$ git log --oneline --abbrev=12 35f7a9526615^..b41dae061bbd | grep -iw 'driver.core' | wc -l
21

The commit summaries are too technical for me to interpret, but this one is big, and it has a lot of reverts:

Merge tag 'driver-core-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1f7d290a7275edb270dbee13212c37cb59940221

Comment 47 Artem S. Tashkinov 2020-03-06 23:34:24 UTC
> platform: x86: hp-wmi: convert platform driver to use dev_groups

It's a good catch but almost definitely not the culprit as it's a loadable module.

Comment 48 Artem S. Tashkinov 2020-03-06 23:42:43 UTC
Whatever it is, it's enabled by Fedora's .config as I cannot reproduce the bug with mine. So, it's an option with =Y ;-) The plot thickens. Anyways my config is stripped down hard, so it'll be quite difficult to make any conclusions.

Comment 49 Steve 2020-03-07 00:15:33 UTC
Created attachment 1668245
diff between Fedora git2 and git3 config files

$ diff -u ../kernel-5.4.0-0.rc0.git2.1.fc32/.kernel-x86_64.config ../kernel-5.4.0-0.rc0.git3.1.fc32/.kernel-x86_64.config > diff-config-git2-git3.txt

Extracted from the Fedora kernel source files like this (and likewise for git2):

$ rpm2archive kernel-5.4.0-0.rc0.git3.1.fc32.src.rpm
$ tar -xf kernel-5.4.0-0.rc0.git3.1.fc32.src.rpm.tgz

Also, the list of Fedora patches is the same for the git2 and git3 builds.

Comment 50 Steve 2020-03-07 00:30:15 UTC
(In reply to Artem S. Tashkinov from comment #42)
> Looks like Fedora's configs are almost the same between these two snapshots
> except for CONFIG_CRYPTO_SHA512=y in rc0.git3 but this option should be
> harmless which means the bad commit is in the kernel itself.
...

I guess you already looked into this, but here are some details:

$ grep '^\+.*[y]$' diff-config-git2-git3.txt
+CONFIG_CRYPTO_SHA512=y
+CONFIG_EXFAT_DISCARD=y
+CONFIG_EXFAT_DONT_MOUNT_VFAT=y
+CONFIG_MLX5_SW_STEERING=y
+CONFIG_NET_TC_SKB_EXT=y
+CONFIG_NET_VENDOR_PENSANDO=y
+CONFIG_SND_HDA_INTEL_DETECT_DMIC=y

$ grep '^\+.*[m]$' diff-config-git2-git3.txt
+CONFIG_ADIN_PHY=m
+CONFIG_ATH9K_PCI_NO_EEPROM=m
+CONFIG_EXFAT_FS=m

$ grep '^-.*$' diff-config-git2-git3.txt
--- ../kernel-5.4.0-0.rc0.git2.1.fc32/.kernel-x86_64.config	2019-09-18 11:29:29.000000000 -0700
-CONFIG_CRYPTO_SHA512=m
-# CONFIG_SND_HDA_INTEL_DETECT_DMIC is not set
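The grep filters above only look at individual added or removed lines of the diff. Below is a minimal sketch of a more direct comparison. It assumes the two .config files have already been extracted as described in comment 49. The helper config_changes is hypothetical and is not part of any Fedora or kernel tooling. It reads both "CONFIG_FOO=val" and "# CONFIG_FOO is not set" lines, then prints every option whose value differs, treating unset and absent options alike as "n":

```shell
# config_changes OLD NEW (hypothetical helper, not a kernel or Fedora tool):
# print "NAME: old -> new" for every CONFIG_ option whose value differs
# between two kernel .config files. Options that are "is not set" or
# missing entirely are both reported as "n". Assumes OLD is non-empty.
config_changes() {
  awk '
    # Split one .config line into the globals name/val; return 0 for other lines.
    function parse(line) {
      if (match(line, /^CONFIG_[A-Za-z0-9_]+=/)) {
        name = substr(line, 1, RLENGTH - 1)
        val = substr(line, RLENGTH + 1)
        return 1
      }
      if (line ~ /^# CONFIG_[A-Za-z0-9_]+ is not set$/) {
        split(line, parts, " ")
        name = parts[2]
        val = "n"
        return 1
      }
      return 0
    }
    FNR == NR { if (parse($0)) before[name] = val; next }   # first file: OLD
              { if (parse($0)) after[name] = val }          # second file: NEW
    END {
      for (k in after) {
        o = (k in before) ? before[k] : "n"
        if (o != after[k]) print k ": " o " -> " after[k]
      }
      for (k in before)
        if (!(k in after) && before[k] != "n") print k ": " before[k] " -> n"
    }
  ' "$1" "$2" | LC_ALL=C sort
}
```

To list only the options that became built-in, which is what the "option with =Y" theory in comment 48 needs, the output can be filtered:

$ config_changes ../kernel-5.4.0-0.rc0.git2.1.fc32/.kernel-x86_64.config ../kernel-5.4.0-0.rc0.git3.1.fc32/.kernel-x86_64.config | grep -- '-> y$'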

Comment 51 Steve 2020-03-07 00:47:14 UTC
(In reply to Artem S. Tashkinov from comment #48)
> ... So, it's an option with =Y ;-) ...

I'm not qualified to comment, but a build with "=y" would correspond to the Fedora git3 version:

$ fgrep CRYPTO_SHA512 artem-5.4.config 
CONFIG_CRYPTO_SHA512_SSSE3=m
CONFIG_CRYPTO_SHA512=m

Have you tried manually loading the module to see if anything unexpected happens? Or loading it from the kernel command line?

Comment 52 Steve 2020-03-07 01:59:51 UTC
Here's another one to look into:

$ less diff-config-git2-git3.txt
...
-# CONFIG_SND_HDA_INTEL_DETECT_DMIC is not set
+CONFIG_SND_HDA_INTEL_DETECT_DMIC=y
...

$ grep DMIC artem-5.4.config
# CONFIG_SND_HDA_INTEL_DETECT_DMIC is not set

$ git grep CONFIG_SND_HDA_INTEL_DETECT_DMIC
sound/pci/hda/hda_intel.c:static bool dmic_detect = IS_ENABLED(CONFIG_SND_HDA_INTEL_DETECT_DMIC);

Comment 53 Artem S. Tashkinov 2020-03-07 02:13:54 UTC
I've bisected it:

https://bugzilla.kernel.org/show_bug.cgi?id=206175#c19

It's extremely weird, as I have no idea how Fedora's config might be involved. It seemingly isn't.

Comment 54 Steve 2020-03-07 03:22:23 UTC
(In reply to Artem S. Tashkinov from comment #53)
> I've bisected it:

Congratulations, and thanks for the update.

> https://bugzilla.kernel.org/show_bug.cgi?id=206175#c19

>> 14 reboots.

For the record, about how long did that take?

Comment 55 Artem S. Tashkinov 2020-03-07 03:50:15 UTC
(In reply to Steve from comment #54)

Around an hour and a half - the most painful part was copying the compiled kernel from my desktop PC to my laptop, so it went this way:

  * go to the PC and compile the kernel,
  * go to the laptop,
  * mount the PC via samba, copy bzImage, unmount manually (systemd bug? cannot unmount a CIFS share automatically),
  * reboot and test a new kernel,
  * reboot into the working one,
  * enter the LUKS password, wait for boot,
  * log in as root,
  * rinse and repeat.

Before all of this, I had to disable Secure Boot on my laptop. You can technically sign your own custom kernels, but it's just too much hassle.

If you have a powerful, well-cooled device, you can do everything on that one machine without going anywhere; however, compiling the kernel on my laptop wasn't an option, as it would have taken ages. Also, Fedora's config has this "TEST    posttest" step, which made each build at least a minute or two longer. I'm pretty sure it wasn't necessary, but I was too lazy to find and disable the corresponding config option.

Also, I didn't use ccache, which can greatly reduce compilation times, but it isn't always beneficial and requires a lot of disk space.

In short, I'm happy I have a fast desktop PC.
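Most of the time in the workflow above goes into the per-kernel build/copy/reboot loop. The number of those loops can be estimated in advance. Each test roughly halves the set of candidate commits, so a range of N commits needs about ceil(log2 N) test kernels when every commit can be marked good or bad. The helper bisect_steps below is a hypothetical arithmetic sketch of that rule of thumb. It is not part of git, which prints its own "(roughly M steps)" estimate:

```shell
# bisect_steps N (hypothetical helper): print ceil(log2(N)), the approximate
# number of build/boot/test rounds a git bisect over N candidate commits
# needs in the worst case, assuming no commit has to be skipped.
bisect_steps() {
  local n=$1 steps=0
  while [ "$n" -gt 1 ]; do
    n=$(( (n + 1) / 2 ))    # worst case: the larger half remains
    steps=$((steps + 1))
  done
  echo "$steps"
}
```

For the range searched in comment 46, the candidate count would come from "$ git rev-list --count 35f7a9526615..b41dae061bbd", which can then be passed to bisect_steps. Assuming b41dae061bbd is the first bad snapshot, the bisection itself would start with "$ git bisect start b41dae061bbd 35f7a9526615" (bad first, then good).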

Comment 56 Steve 2020-03-07 04:04:01 UTC
(In reply to Artem S. Tashkinov from comment #55)
> (In reply to Steve from comment #54)
> 
> Around an hour and a half - the most painful part was copying the compiled
> kernel from my desktop PC to my laptop, so it went this way:
...

Thanks for your detailed report. That process sounds tedious. Sometimes I find it easier to transfer files (not kernels, fortunately) with a high-quality USB flash drive, such as this one:

Kingston 64GB DataTraveler Elite G2 Black Metal Casing Fast 180MB/s R, 70MB/W USB 3.1 Flash Drive with LED light indicator (DTEG2/64GB)
https://www.amazon.com/Kingston-64GB-DataTraveler-indicator-DTEG2/dp/B075KQDWGK/

Comment 57 Steve 2020-03-07 17:23:36 UTC
(In reply to Steve from comment #46)
> (In reply to Steve from comment #45)
> ...
> > Can you suggest a search term for "booting" that might be in a commit summary? (a kernel component, for example)
> ...
> 
> "core" or "driver.core":
> 
> $ git log --oneline --abbrev=12 35f7a9526615^..b41dae061bbd | grep -iw 'driver.core' | wc -l
> 21
...

This is moot now, but that actually "found" the commit that bisecting found:

$ git log --oneline --abbrev=12 35f7a9526615^..b41dae061bbd | grep -iw 'driver.core' | cat -n
...
     9	cdfee5623290 driver core: initialize a default DMA mask for platform device
...

Comment 58 Steve 2020-03-07 18:36:04 UTC
(In reply to Steve from comment #44)
...
> And there are only 21 merges (although merges can contain merges, so that could be misleading):
> 
> $ git log --oneline --merges --abbrev=12 b41dae061bbd | less -N
...
>      17 c6b48dad92ae Merge tag 'usb-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
>      18 1f7d290a7275 Merge tag 'driver-core-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
...

One more follow-up comment. That "usb" merge appears to have pulled in a few commits completely unrelated to USB, including the bad commit.

If so, maintainers might need to review their process so that doesn't happen.

Here is a simplistic summary that doesn't attempt to exclude all USB-related commits. Note the DMA commits, in particular:

$ git log --oneline --abbrev=12 --decorate 1f7d290a7275..c6b48dad92ae | cat -n | grep -iv usb
    14	44493062abc3 device connection: Add fwnode_connection_find_match()
    18	6b68240d7c54 dt-bindings: connector: add optional properties for Type-B
    22	6ed151f26484 xhci-ext-caps.c: Add property to disable Intel SW switch
    29	9334367cda85 xhci: fix possible memleak on setup address fails.
    30	8a62dff2c073 xhci: add TSP bitflag to TRB tracing
    39	8ceb1417f3ca mfd: don't select DMA_DECLARE_COHERENT for the sm501 and tc6393xb drivers
    75	cdfee5623290 (tag: bad-1) driver core: initialize a default DMA mask for platform device
    76	bd5defaee872 dma-mapping: remove is_device_dma_capable
   154	314de2f6b577 ARM: dts: exynos: Use standard arrays of generic PHYs for EHCI/OHCI devices
   156	c27989cc536b dt-bindings: switch Exynos EHCI/OHCI bindings to use array of generic PHYs

NB: "bad-1" is my tag.
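The same question can be asked without naming the previous merge. This is offered as an assumption, not as a record of what was actually run. For a merge commit M, the range M^1..M selects exactly the commits that the merge brought in through its second parent, plus M itself, which --no-merges drops. The helper merge_commits below is hypothetical:

```shell
# merge_commits MERGE (hypothetical helper): list the non-merge commits that
# MERGE introduced via its second parent, i.e. everything reachable from
# MERGE but not from its first (mainline) parent.
merge_commits() {
  git log --oneline --no-merges --abbrev=12 "$1^1..$1"
}
```

Usage against the usb merge from comment 44 would then be "$ merge_commits c6b48dad92ae | grep -iv usb".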

Comment 59 Artem S. Tashkinov 2020-03-07 18:53:09 UTC
(In reply to Steve from comment #58)

This comment was probably meant for LKML, not for this bugzilla.

Comment 60 Steve 2020-03-10 17:07:57 UTC
Artem: I am not going to wade into a kernel BZ debate, but I would strongly encourage you to attach a COMPLETE dmesg without being asked:

Bug_206175 - Fedora >= 5.4 kernels instantly freeze on boot without producing any display output 
https://bugzilla.kernel.org/show_bug.cgi?id=206175

As for the patch itself, it is worth noting ALL the files that were changed:

$ git show --oneline --numstat bad-1
cdfee56232 (tag: bad-1) driver core: initialize a default DMA mask for platform device
0       9       arch/m68k/kernel/dma.c
0       6       arch/powerpc/kernel/setup-common.c
0       1       arch/sh/boards/mach-ap325rxa/setup.c
0       2       arch/sh/boards/mach-ecovec24/setup.c
0       1       arch/sh/boards/mach-kfr2r09/setup.c
0       1       arch/sh/boards/mach-migor/setup.c
0       2       arch/sh/boards/mach-se/7724/setup.c
16      21      drivers/base/platform.c
1       1       include/linux/platform_device.h

The recompiles that you mentioned in Comment_20 were probably triggered by the change to platform_device.h:
https://bugzilla.kernel.org/show_bug.cgi?id=206175#c20
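Only two of those nine files are outside arch/, and none of the arch/ files are under arch/x86/. A hypothetical filter over git's --numstat output makes that check explicit. The output is tab-separated: lines added, lines deleted, path. The filter treats every non-arch/ path as potentially relevant on all architectures, which is a simplifying assumption:

```shell
# x86_relevant (hypothetical filter): read "git show --numstat" output on stdin
# and keep only file rows that can affect an x86 build: anything outside
# arch/, plus anything under arch/x86/. Header and blank lines are dropped.
x86_relevant() {
  awk -F '\t' 'NF == 3 && ($3 !~ /^arch\// || $3 ~ /^arch\/x86\//)'
}
```

Applied to the commit above, "$ git show --numstat --format= bad-1 | x86_relevant" should leave only the drivers/base/platform.c and include/linux/platform_device.h rows.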

Comment 61 Artem S. Tashkinov 2020-03-10 17:15:44 UTC
(In reply to Steve from comment #60)
> I would strongly encourage you to attach a COMPLETE dmesg.

I was just asked what the platform was:

> What platform doesn't boot here?  arm, x86, something else?  What board?

No word of dmesg in any shape or form ;-) Also, just also, kernel 5.4 doesn't boot, remember? ;-) How can I post dmesg from it? ;-)

> The recompiles that you mentioned in Comment_20 were probably triggered by the change to platform_device.h:

As far as I understood Christoph Hellwig, the changes in drivers/base/platform.c literally change how many drivers function, so it's not just the header file. None of the other files are used on x86-64.

Comment 62 Steve 2020-03-10 17:22:43 UTC
(In reply to Artem S. Tashkinov from comment #61)
> Also, just also, kernel 5.4 doesn't boot remember? ;-) How can I post dmesg from it? ;-)

Sarcasm is not a debugging strategy. Attach dmesg for the last bootable kernel:

> 5.4.0-0.rc0.git2.1.fc32.x86_64 boots fine. <<<<<<<<<<<<<<<<<<
> 5.4.0-0.rc0.git3.1.fc32.x86_64 does NOT boot.

Sanitize it if you are concerned about leaking information.

Comment 63 Artem S. Tashkinov 2020-03-10 17:28:48 UTC
(In reply to Steve from comment #62)
> Sarcasm is not a debugging strategy.

That wasn't sarcasm per se. I was just trying to lighten up the conversation.

> Attach dmesg for the last bootable kernel:

I will post dmesg as soon as Christoph Hellwig asks for it. I _really_ don't want to leave any irrelevant messages in the bug report - Linus has taken it seriously.

I don't really understand the issue, but it looks like the committed patch was incomplete and bound to break something. That's kind of weird and alarming, but it's not for me to judge.

Comment 64 Steve 2020-03-10 17:40:08 UTC
(In reply to Artem S. Tashkinov from comment #63)
...
> I will post dmesg as soon as Christoph Hellwig asks for it. I _really_ don't
> want to leave any irrelevant messages in the bug report - Linus has taken it
> seriously.
...

OK, it's not my problem, but kernel debugging 101 calls for attaching a dmesg.

> 5.4.0-0.rc0.git2.1.fc32.x86_64 boots fine. <<<<<<<<<<<<<<<<<<

The point is to provide more information about your system, not to look for error messages.

Comment 65 Steve 2020-03-10 18:12:37 UTC
(In reply to Steve from comment #64)
...
> > 5.4.0-0.rc0.git2.1.fc32.x86_64 boots fine. <<<<<<<<<<<<<<<<<<
> 
> The point is to provide more information about your system, not to look for error messages.

BTW, that's a debug build according to the changelog:

kernel-5.4.0-0.rc0.git2.1.fc32
https://koji.fedoraproject.org/koji/buildinfo?buildID=1380315

* Wed Sep 18 2019 Jeremy Cline <jcline> - 5.4.0-0.rc0.git2.1
- Linux v5.3-3839-g35f7a9526615

* Tue Sep 17 2019 Jeremy Cline <jcline> - 5.4.0-0.rc0.git1.1
- Linux v5.3-2061-gad062195731b

* Tue Sep 17 2019 Jeremy Cline <jcline>
- Reenable debugging options.

Comment 66 Steve 2020-03-10 18:39:29 UTC
(In reply to Steve from comment #65)
... 
> BTW, that's a debug build according to the changelog:
...

If you are in a Fedora kernel source directory, this should show the differences:

$ diff -u .kernel-x86_64.config .kernel-x86_64-debug.config

Comment 67 Artem S. Tashkinov 2020-03-11 14:22:24 UTC
(In reply to Hans de Goede from comment #11)
> Can you try adding "efi=debug earlyprintk=efi,keep" and see if that gives any (extra) output ?

For future reference the correct flags are

efi=debug earlycon=efifb keep_bootcon
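These parameters only help if they actually reach the kernel. On a kernel that boots far enough, /proc/cmdline shows what it received. The check below is a hypothetical helper, missing_params, not an existing tool. It reports which expected parameters are absent from a command line:

```shell
# missing_params CMDLINE PARAM... (hypothetical helper): print each PARAM
# that does not appear as a whole space-separated word in CMDLINE.
missing_params() {
  local cmdline=" $1 " p
  shift
  for p in "$@"; do
    case "$cmdline" in
      *" $p "*) ;;                # present
      *) printf '%s\n' "$p" ;;    # missing
    esac
  done
}
```

For example, "$ missing_params "$(cat /proc/cmdline)" efi=debug earlycon=efifb keep_bootcon" prints nothing when all three are present. For a kernel that freezes immediately, editing the entry in the GRUB menu for a single boot is enough. On Fedora, "grubby --update-kernel=ALL --args="efi=debug earlycon=efifb keep_bootcon"" makes the change persistent, and --remove-args undoes it.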

Comment 68 Steve 2020-03-11 14:58:22 UTC
(In reply to Artem S. Tashkinov from comment #67)
> (In reply to Hans de Goede from comment #11)
> > Can you try adding "efi=debug earlyprintk=efi,keep" and see if that gives any (extra) output ?
> 
> For future reference the correct flags are
> 
> efi=debug earlycon=efifb keep_bootcon

Let's make sure to thank Arvind Sankar for noting the correct command-line parameters:
https://bugzilla.kernel.org/show_bug.cgi?id=206175#c36

They are documented here:

The kernel’s command-line parameters [for v5.5]
https://www.kernel.org/doc/html/v5.5/admin-guide/kernel-parameters.html

Comment 69 Steve 2020-03-12 06:36:40 UTC
This will be in v5.6-rc6:

driver code: clarify and fix platform device DMA mask allocation
author    Christoph Hellwig <hch>                 2020-03-11 17:07:10 +0100
committer Linus Torvalds <torvalds> 2020-03-11 09:30:27 -0700
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e3a36eb6dfaeea8175c05d5915dcf0b939be6dab

Comment 70 Artem S. Tashkinov 2020-03-12 11:49:11 UTC
Yeah, I know. I've helped fix close to a dozen issues in the Linux kernel now - I perfectly know how it all works.

What made me unpleasantly surprised is that Linus CC'ed Greg Kroah-Hartman and asked him to include the patch in stable and Greg has apparently forgotten about that which means I have to wait for at least a month longer before Fedora includes 5.6.

Or maybe Fedora kernel maintainers could include the patch in the 5.5 series? I've been running 5.3.x for far too long now.

Yeah, I know I can compile my own kernel, but it takes a lot more effort to get it running with UEFI Secure Boot enabled.

Comment 71 Steve 2020-03-12 18:54:23 UTC
(In reply to Artem S. Tashkinov from comment #70)
> Yeah, I know. I've helped fix close to a dozen issues in the Linux kernel
> now - I perfectly know how it all works.

If it makes you feel better, consider Comment 69 to be *for the record*.

> What made me unpleasantly surprised is that Linus CC'ed Greg Kroah-Hartman
> and asked him to include the patch in stable and Greg has apparently
> forgotten about that which means I have to wait for at least a month longer
> before Fedora includes 5.6.
...

If you can find another bug like yours, you might be able to make a stronger case for including the patch in "stable".

That code has been in the kernel for a long time, so someone else must have encountered it.

A web search for "RIP kmem_cache_alloc_trace" found this:

Kernel panic due to "kmem_cache_alloc+117 from mempool_alloc_slab" on RHEL 7
https://access.redhat.com/solutions/2149041

Comment 72 Artem S. Tashkinov 2020-03-12 20:46:20 UTC
(In reply to Steve from comment #71)
> If you can find another bug like yours, you might be able to make a stronger case for including the patch in "stable".

The author of the patch and Linus have both admitted that the patch was bound to lead to bugs, and Christoph Hellwig was surprised I was the only person who had been hit by the change. If that's not justification enough to fix this regression, I don't know what is. Anyway, your stance is clear and I will patiently wait for the kernel 5.6 release in the stable/testing Fedora repos. As you can see, I'm not trying to reopen this bug report even though it is _not_ fixed in Fedora (save for rawhide kernels, which I prefer not to use). It sounds a little bit like "f off with your puny issues", but that's expected in the world of Linux. I've been using Linux since the late 90s, and there's one particular thing I learned a very long time ago that is abundantly clear: there's almost no accountability in Open Source except for certain products like RHEL (which costs money and comes with certain warranties and reputation); developers introduce changes without properly testing them all the time. Even when you find such _mission critical_ regressions, you're told:

> If you can find another bug like yours, you might be able to make a stronger case for including the patch in "stable".

Enough, Steve. I'm sorry for wasting everyone's time here in RedHat's bugzilla. I should have left you alone. In the future I will try not to file any bug reports against the kernel here, thank you very much.

Again, my apologies for wasting your disk storage, network capacity and CPU time. This is my last comment in this bug report.

Comment 73 Steve 2020-03-12 21:00:20 UTC
(In reply to Artem S. Tashkinov from comment #72)
...
> ... even though it is _not_ fixed in Fedora ...

Well, it's not fixed in stable, and there is zero chance that Fedora is going to apply a patch to anything as critical as DMA code, so your original position that it needs to be applied upstream is exactly right. So I was *agreeing* with you.

> In the future I will try not to file any bug reports against the kernel here, thank you very much.

You obviously know enough to file bug reports at bugzilla.kernel.org, so that would actually be more efficient in your case.

If you look at other kernel bug reports on the Fedora BZ, you will see that not everyone is at your level of expertise.

See, for example:

Bug 1812703 - Package kernel-core is changed in repository

Comment 74 Hans de Goede 2020-03-12 22:07:19 UTC
(In reply to Artem S. Tashkinov from comment #72)
> Enough, Steve. I'm sorry for wasting everyone's time here in RedHat's bugzilla.

Please note that AFAIK Steve does not work for Red Hat and AFAICT is also not really active as a Fedora kernel contributor (sorry in advance if I'm mistaken here, Steve). With that said, I believe 100% that Steve was trying to help.

I do work for Red Hat and do occasionally contribute / add downstream patches to the Fedora kernel package. As such I have been following this bug with great interest because bugs which cause systems to not boot are *BAD*.

To quote from my last weekly status report which I send out internally every Wednesday:

-Regressions:
 -Kernel 5.4 and later not booting on some HP 360 models
   1790115 - [CRITICAL REGRESSION] kernel >= 5.4 is not bootable
  -Thanks to some heroic efforts from one of the reporters, doing both a git
   bisect and taking the discussion upstream to get this resolved, we now have
   a patch fixing this, this should show up in the next upstream 5.5.z update

So Artem, your efforts on this are very much appreciated, and keeping us informed about this bug through this bugzilla is also appreciated. Now as for getting the patch upstream; or added as a downstream patch to the Fedora kernels:

You mentioned in comment 70 that you were disappointed that Greg (gkh) did not add the patch to 5.5.9, but the rule is that patches must show up in Linus' tree before they are backported. Looking at the timestamp that 5.5.9 was tagged:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/?h=linux-5.5.y

And the timestamp that the fix was added to Linus' tree:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e3a36eb6dfaeea8175c05d5915dcf0b939be6dab

There was an approx. 10-hour window where Greg could have picked up the patch, but I guess that he had already prepared most of 5.5.9 before the commit got added to Linus' tree and was probably waiting for some tests to complete running before pushing it out. So I would expect it to show up in 5.5.10.

In the mean time I will add this as a downstream patch, since as said having some systems not boot at all is BAD.

Comment 75 Steve 2020-03-13 01:49:29 UTC
(In reply to Hans de Goede from comment #74)
...

I sometimes work on Fedora bugs *voluntarily*, but I am just a Fedora user and occasional tester. Some reporters really just need tech support. See, for example:

Bug 1783358 - My Fedora 31 doesn't start network after system-upgrade from 30 to 31

And, yes, I consider that *work*.

> ... the rule is that patches must show up in Linus' tree before they are backported.

Thanks for pointing that out. I went off the rails when I mistakenly assumed that politics were involved, when there is really a simple process:

mainline -> stable

> In the mean time I will add this as a downstream patch, since as said having some systems not boot at all is BAD.

Thanks. You proved me wrong about what can be accepted as a Fedora patch, ... :-)

but I will test when a build is available, although I am not seeing this boot failure.

Can you suggest any special tests that would execute the code in the patch or is booting and using the system sufficient?

Comment 76 Hans de Goede 2020-03-13 10:20:13 UTC
(In reply to Steve from comment #75)
> (In reply to Hans de Goede from comment #74)
> ...
> 
> I sometimes work on Fedora bugs *voluntarily*, but I am just a Fedora user
> and occasional tester. Some reporters really just need tech support.

Right, and your contributions are very much appreciated, thank you for making Fedora better.

> > In the mean time I will add this as a downstream patch, since as said having some systems not boot at all is BAD.
> 
> Thanks. You proved me wrong about what can be accepted as a Fedora patch,
> ... :-)

The rules for what will be included in a Fedora kernel are somewhat fluid. It helps a lot if a patch is in mainline; it also helps if a patch is heading towards the next stable release.

But sometimes we add patches which are not in mainline yet, but which are hopefully on their way there (e.g. they are already in linux-next).

In case of doubt, I just send a mail to the Fedora kernel mailing list to discuss things there.

> Can you suggest any special tests that would execute the code in the patch
> or is booting and using the system sufficient?

Just booting the system is the best test I can come up with for this patch.

Comment 77 Hans de Goede 2020-03-13 14:46:06 UTC
Ok I've added the patch for this to Fedora's distgit for the f30 5.5.x and f31 5.5.x kernels (5.6 in f32+ will get it from the mainline tree).

This will be in the next official Fedora kernel build for f30 + f31, either 5.5.9-201.fc31 or 5.5.10.

Comment 78 Steve 2020-03-13 17:32:41 UTC
(In reply to Hans de Goede from comment #77)
> Ok I've added the patch for this to Fedora's distgit for the f30 5.5.x and
> f31 5.5.x kernels (5.6 in f32+ will get it from the mainline tree).
> 
> This will be in the next official Fedora kernel build for f30 + f31, either
> 5.5.9-201.fc31 or 5.5.10 .

Thanks for your explanation of the Fedora patch process and for adding the patch.

Comment 79 Fedora Update System 2020-03-18 18:49:02 UTC
FEDORA-2020-fee107f027 has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2020-fee107f027

Comment 80 Fedora Update System 2020-03-18 18:49:03 UTC
FEDORA-2020-aabfec096f has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2020-aabfec096f

Comment 81 Fedora Update System 2020-03-19 03:02:25 UTC
kernel-5.5.10-100.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-fee107f027

Comment 82 Fedora Update System 2020-03-19 03:14:23 UTC
kernel-5.5.10-200.fc31 has been pushed to the Fedora 31 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-aabfec096f

Comment 83 Fedora Update System 2020-03-21 03:15:52 UTC
kernel-5.5.10-100.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.

Comment 84 Fedora Update System 2020-03-21 03:48:48 UTC
kernel-5.5.10-200.fc31 has been pushed to the Fedora 31 stable repository. If problems still persist, please make note of it in this bug report.