Bug 1753033 - Unable to boot flash drive to install Fedora 31 Beta
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 31
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-09-17 21:21 UTC by Henrique C. S. Junior
Modified: 2020-11-24 16:57 UTC

Last Closed: 2020-11-24 16:57:43 UTC
Type: Bug
Embargoed:


Attachments
dmesg (89.08 KB, text/plain)
2019-09-23 20:20 UTC, Henrique C. S. Junior
lspci (34.10 KB, text/plain)
2019-09-23 20:21 UTC, Henrique C. S. Junior
screenshot_error (983.46 KB, image/png)
2019-09-23 20:57 UTC, Henrique C. S. Junior
ABRT_log (22.46 KB, application/x-xz)
2019-09-23 20:58 UTC, Henrique C. S. Junior
dmesg from kernel 5.4 (137.40 KB, text/plain)
2019-09-24 13:17 UTC, Henrique C. S. Junior

Description Henrique C. S. Junior 2019-09-17 21:21:11 UTC
Sorry if Anaconda is not the right component here.
I'm excited about Fedora 31, but unfortunately I'm unable to test the beta: after creating a bootable USB stick and booting from it, I get an unresponsive black screen. I can't even provide any debug info; the only thing that works is powering off with the power button.
I have tried both the live media and the netinst image. The basic video settings and troubleshooting options aren't working either.
Here is my hardware:

Dell Inspiron 15 5567
CPU: Intel Core i7-7500U
GPU: AMD Radeon R7 M445 (2GB GDDR5)
Display: 15.6”, Full HD (1920 x 1080), IPS
Storage: 1000GB HDD
RAM: 8GB DDR4, 2400 MHz

Comment 1 Chris Murphy 2019-09-23 18:40:49 UTC
Two options:

A. Netinstall: edit the menu entry using the 'e' key and add the option 'inst.sshd' (without quotes) at the end of the 'linux' line. You should then be able to log in remotely as root via ssh without a password, provided the boot gets as far as launching Anaconda.

B. Boot with parameter 'systemd.debug-shell' and try to change to tty9 using ctrl-alt-f9 - that will present a privileged shell. The tricky part will be getting something out of the system. If it's wired networking and networking is up, you can either use netcat or fpaste; if not then mount a 2nd USB stick and write out the logs we need:

# journalctl -o short-monotonic > /mnt/bug1753033-journal.txt

It's plausible you can write the journal to the installer USB stick (I haven't recently tried this) by mounting the FAT EFI system partition, that's the only partition I'd expect to be writable. Of course once you do this, that stick will no longer pass media verification.
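
For the second-USB-stick route in option B, a minimal sketch of the steps might look like the following. The helper name `save_output` and the device name /dev/sdb1 are assumptions for illustration; only the journalctl invocation comes from the text above.

```shell
# save_output DEST_DIR NAME COMMAND [ARGS...]
# Runs COMMAND and writes its stdout to DEST_DIR/bug1753033-NAME.txt.
# (Hypothetical helper, not part of any Fedora tool.)
save_output() {
    dest_dir=$1
    name=$2
    shift 2
    "$@" > "${dest_dir}/bug1753033-${name}.txt"
}

# From the debug shell, assuming the second stick's partition is /dev/sdb1
# (an assumption; the real device name will vary):
#   mount /dev/sdb1 /mnt
#   save_output /mnt journal journalctl -o short-monotonic
#   umount /mnt
```

The same helper could be reused for other captures, since the command to run is passed as arguments.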

Comment 2 Henrique C. S. Junior 2019-09-23 19:44:05 UTC
Hi, Chris,
I have something new to add.
Today I tried upgrading F30 to F31 via DNF, and it turns out that if I select the F31 kernel (5.3.0-1) I get the same dead black screen. But the older F30 kernels that are still installed (5.2.15-200 and 5.2.13-200) work fine.

So it looks like a kernel issue, right?

Comment 3 Chris Murphy 2019-09-23 20:01:30 UTC
That does suggest it's kernel related.

I suggest booting with both kernels and capturing dmesg and attaching both to this bug report, as well as attach the output of 'lspci -vvnn' as a file.

For the black screen case with kernel 5.3.0, can you remotely login via ssh, or does the system seem dead? What happens if you edit the menu entry and remove 'rhgb quiet' parameters? There should be some hint displayed on screen but the ideal case is to get booted, and then login remotely, or switch to tty9 by using boot param 'systemd.debug-shell=1'
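
Once both dmesg captures exist, they are easier to compare with the leading timestamps stripped. A minimal sketch, assuming the default dmesg timestamp prefix such as `[    0.000000]`; the function name and the file names in the comments are hypothetical placeholders:

```shell
# strip_dmesg_timestamps
# Reads dmesg output on stdin and removes a leading "[  12.345678] " prefix
# from each line, so two captures can be diffed without timestamp noise.
strip_dmesg_timestamps() {
    sed 's/^\[ *[0-9][0-9]*\.[0-9][0-9]*] //'
}

# Example use (placeholder file names):
#   strip_dmesg_timestamps < dmesg-good.txt > good.txt
#   strip_dmesg_timestamps < dmesg-bad.txt  > bad.txt
#   diff -u good.txt bad.txt | less
```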

Comment 4 Henrique C. S. Junior 2019-09-23 20:19:39 UTC
(In reply to Chris Murphy from comment #3)
> That does suggest it's kernel related.
> 
> I suggest booting with both kernels and capturing dmesg and attaching both
> to this bug report, as well as attach the output of 'lspci -vvnn' as a file.
> 
> For the black screen case with kernel 5.3.0, can you remotely login via ssh,
> or does the system seem dead? What happens if you edit the menu entry and
> remove 'rhgb quiet' parameters? There should be some hint displayed on
> screen but the ideal case is to get booted, and then login remotely, or
> switch to tty9 by using boot param 'systemd.debug-shell=1'

I tried removing 'rhgb quiet' on 5.3.0, but the computer is completely dead. All I can do is push the power button.
The dmesg and lspci output are attached.

Comment 5 Henrique C. S. Junior 2019-09-23 20:20:25 UTC
Created attachment 1618349 [details]
dmesg

Comment 6 Henrique C. S. Junior 2019-09-23 20:21:03 UTC
Created attachment 1618350 [details]
lspci

Comment 7 Henrique C. S. Junior 2019-09-23 20:53:48 UTC
I don't know if this is related to the problem or whether it helps to find a solution, but for several releases now (I have reported the error before), every time I boot into Fedora I get an error message like the one in the attached screenshot (screenshot.png). I usually test other distros on my notebook; recently I have used Manjaro and just removed openSUSE Tumbleweed. Fedora is the only one that shows this message every single time I log in. It is annoying, but it has never really affected the system's performance or given any sign that something bad was actually happening. I've been using Dell notebooks for more than a decade, and the same message appeared on my last two notebooks running Fedora.
Attached is the full ABRT log.

Comment 8 Henrique C. S. Junior 2019-09-23 20:57:33 UTC
Created attachment 1618355 [details]
screenshot_error

Comment 9 Henrique C. S. Junior 2019-09-23 20:58:06 UTC
Created attachment 1618356 [details]
ABRT_log

Comment 10 Chris Murphy 2019-09-23 21:32:57 UTC
Those are probably the mce hardware errors. They're not related to this bug, but it's worth filing a bug with Dell.

From the lspci, this is dual headed hardware. It has both i915 and amdgpu graphics drivers loaded. If you see literally nothing at all after booting without 'quiet rhgb' then it's a deep dive. You'll need to search bugzilla.redhat.com, lkml, and maybe even bugzilla.kernel.org, looking for your make/model hardware, or at least the AMD GPU and see if you can find recent regressions related to them.

The next level of testing is to download older kernels from koji, and try to find out when this problem happened, between 5.2.x and 5.3.0.
https://koji.fedoraproject.org/koji/packageinfo?packageID=8

I would go to the first rc0 kernel
https://koji.fedoraproject.org/koji/buildinfo?buildID=1310532

You need kernel, kernel-core, and kernel-modules for your arch (x86_64). If the problem does NOT happen there, then you can try some newer kernel like rc6. Basically you want to keep splitting the remaining options in half rather than testing them in order; it'll save time. You'll want a notepad to keep track of the kernel versions, because you need to find the two kernels closest to each other where the problem does not happen in one and does happen in the other.

Sounds tedious, and it is, but you're in luck that you've got such a reproducible regression. Usually those must be fixed. But the only way it'll get fixed is to find out exactly what commit broke it, and the kernel bisect is the indisputable way to find blame and get it fixed. And that's why I recommend looking for this bug elsewhere before you do all this bisect work.
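
The halving strategy described above can be sketched as a small helper. This is only an illustration: the function name `next_to_test` and the build labels in the example are made up, and the list is assumed to be ordered oldest to newest.

```shell
# next_to_test GOOD_IDX BAD_IDX BUILD...
# BUILD... is ordered oldest to newest; indices are 1-based.
# GOOD_IDX is the newest build known to work, BAD_IDX the oldest known to fail.
# Prints the build halfway between them, or returns 1 when they are adjacent.
next_to_test() {
    good=$1
    bad=$2
    shift 2
    if [ $((bad - good)) -le 1 ]; then
        echo "regression is between entries $good and $bad" >&2
        return 1
    fi
    mid=$(( (good + bad) / 2 ))
    shift $((mid - 1))
    printf '%s\n' "$1"
}

# Illustrative labels only, not real koji build names:
next_to_test 1 8 b1 b2 b3 b4 b5 b6 b7 b8
```

With b1 known good and b8 known bad, the call picks b4. If b4 turns out bad, the next call would use indices 1 and 4; if it works, indices 4 and 8.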

Comment 11 Henrique C. S. Junior 2019-09-23 23:28:19 UTC
So, I have tested some more kernels:
5.4.0-0.rc0.git3 x
5.3.1-100 x
5.3.0-0.rc0.git1.1.fc31 x
5.2.16-200.fc30 ok
5.2.15-200 ok
5.2.13-200 ok

It looks like the issue starts on the first 5.3 and is still present at 5.4. And I found this issue here that may be related: https://bugzilla.kernel.org/show_bug.cgi?id=204725 but it is far beyond what I can understand.

Comment 12 Henrique C. S. Junior 2019-09-23 23:31:14 UTC
It is so weird... isn't 5.3 the kernel that improved AMD GPU support? https://www.theinquirer.net/inquirer/news/3081637/linux-kernel-53-released

Comment 13 Chris Murphy 2019-09-24 00:14:59 UTC
Regressions happen, they're definitely unintended.

From the upstream bug
"machine IS responsive after start, just not the screen."

If you have the same problem as this bug, you'd be able to ssh in, capture dmesg, and compare to see if you've got the same problem. But I think that is not your bug because they have RX5700 GPUs, and I don't see your GPU listed.

At this point I'd suggest a kernel bisect. If you haven't done one before I'll give you some tips. Basically I follow this:
https://kernelnewbies.org/KernelBuild

Follow the part "Downloading the latest -rc tree"; you don't want stable.

Follow the part "Duplicating your current config". I just copy the /boot/config.* file from Fedora for the closest kernel as a template (I'd use 5.2.17 actually, because most of the bisection is going to be happening on top of 5.2 stuff).

Skip the defconfig and menuconfig stuff. Instead I do 'make localmodconfig', which builds only the modules you currently have loaded. It'll speed up compile time tremendously, but you won't have a complete kernel, so just use it for testing this problem: if you connect some hardware or try to use a file system you weren't using at the time you ran localmodconfig, it won't work.

$ git bisect start
$ git bisect good 0ecfebd2b524
$ git bisect bad 5ad18b2e60b7

I get those good and bad commits as follows: '$ git rev-list -n 1 v5.2' shows the commit for 5.2, which we know works; and from koji, if you scroll down to the changelog for the rc0 kernel, you'll see "- Linux v5.2-915-g5ad18b2e60b7", and the part after the 'g' is the git commit for that kernel.
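
Pulling the hash out of such a changelog string can be sketched in shell. This assumes the string follows the usual "git describe" layout of tag, commit count, and 'g' plus abbreviated hash; the function name is hypothetical.

```shell
# commit_from_describe STRING
# Prints the abbreviated commit hash from a "git describe"-style string
# such as v5.2-915-g5ad18b2e60b7, or returns 1 if there is no -g<hash> part.
commit_from_describe() {
    case "$1" in
        *-g?*) printf '%s\n' "${1##*-g}" ;;
        *)     echo "no -g<hash> suffix in: $1" >&2; return 1 ;;
    esac
}

# Build the bisect command from the koji changelog line quoted above.
bad=$(commit_from_describe "v5.2-915-g5ad18b2e60b7")
echo "git bisect bad $bad"
```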

Now git will switch to some commit. You build the kernel and follow the newbies guide above to install it and its modules. That step will fail with an error, so you'll need to use dracut manually to build an initramfs, then duplicate /boot/loader/entries/pickany, rename it to something semi-sane, and edit it to point to your new vmlinuz and initramfs. Reboot, pick that kernel...

If it works tell git:
$ git bisect good
If it didn't work tell git:
$ git bisect bad

Git keeps track of that and checks out the next commit, which changes the code in the files, and you just rebuild:

$ make -j4
$ sudo make modules_install install
$ sudo dracut -f /boot/initramfs 5.2.0+ ## whatever the dir name is in /lib/modules that doesn't have fc31 in the name

I just use generic 'vmlinuz' and 'initramfs' names for this, so I don't have to change the /boot/loader/entries/*conf file each time; it'll just keep picking the newly built kernel and initramfs.

You do this maybe 8-11 times, and at the end git will tell you the first bad commit. That is what you put in this bug report; to really get attention, you'll also file an upstream bug and flag it as a regression. It's seriously worth checking whether anyone else is having this problem before you do this much work! I've been there. Chances are you aren't the first person with this problem, but you might be the first reporting it; most people give up and just keep using an older kernel for a while, hoping it eventually gets found and fixed. Chances are that will happen but...
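
The step count follows from bisection halving the candidate range each round. As a rough check, assuming the 915 in "v5.2-915-g5ad18b2e60b7" is the number of commits between the good and bad points, a small sketch (the function name is hypothetical, and git's own step estimate may differ slightly):

```shell
# bisect_steps N
# Prints how many halvings are needed to narrow N candidates down to one,
# i.e. the ceiling of log2(N).
bisect_steps() {
    n=$1
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))
        steps=$((steps + 1))
    done
    echo "$steps"
}

bisect_steps 915
```

For 915 candidates this prints 10, which sits inside the 8-11 range mentioned above.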

Comment 14 Henrique C. S. Junior 2019-09-24 01:18:32 UTC
Thanks for putting all that info together, Chris. I'll do the best I can to sort this out (I've been using every Fedora release since FC3; it would be sad to be forced to leave my distro at this point).
I'll do more research first, as you suggested, and then I'll see about the kernel bisect. But I don't know if I have the skills to do it, or even the time to go through it all, since I'm finishing my Ph.D. and I can't really afford to stop work on my thesis for more than a week.

Comment 15 Henrique C. S. Junior 2019-09-24 02:28:54 UTC
I noticed that when booting F31 the hard disk was not spinning. Instead, it stops just after grub finishes, with a small `hiccup`, while in F30, after grub, the HD produces that characteristic mechanical `purr`. So I decided to disable UEFI, and I was able to boot my F31 USB stick normally.
So, it looks like video is not the problem, but UEFI!

Comment 16 Chris Murphy 2019-09-24 04:12:10 UTC
No, that's misleading. You can't really disable UEFI. What you've done is enable a compatibility support module, which presents a faux BIOS to the kernel, and it's probably masking the kernel bug related to your hardware. My suggestion is to stick with UEFI enabled, and use the older kernel until you have time to do a bisect, or wait for someone else to discover the regression.

Comment 17 Henrique C. S. Junior 2019-09-24 13:16:40 UTC
Hi, Chris!
I have no reasonable explanation to give... I have now disabled the legacy boot options in my BIOS, so only UEFI is active as it was before, and suddenly F31 started booting normally! Even the USB stick is working as it is supposed to. (I can't complain haha).
I'm attaching the dmesg from kernel 5.4 here, but I'll be reinstalling very soon to get a clean install of F31.

Comment 18 Henrique C. S. Junior 2019-09-24 13:17:37 UTC
Created attachment 1618605 [details]
dmesg from kernel 5.4

Comment 19 Michal Terepeta 2019-11-03 16:42:44 UTC
Hi,

I think I've encountered the same issue. My symptoms:

- Booting kernel 5.3 on F30 or the F31 USB installer results in blank screen and
  no activity and no reaction to anything I do.

- The 5.2 kernels on F30 work just fine.

I've done the bisection (thanks for the detailed instructions!). And this is
what I got:

```
c522ad0637cacca1775a3849c2b554f46577b98d is the first bad commit
commit c522ad0637cacca1775a3849c2b554f46577b98d
Author: Erik Schmauss <erik.schmauss>
Date:   Wed Jul 3 13:15:39 2019 -0700

    ACPICA: Update table load object initialization

    ACPICA commit c7ef9f3526765bed8930825dda1eed1a274b9668

    Use the common internal "initialize objects" interface
    Affects:
     Load()
     load_table()
     acpi_load_table

    Link: https://github.com/acpica/acpica/commit/c7ef9f35
    Signed-off-by: Bob Moore <robert.moore>
    Signed-off-by: Erik Schmauss <erik.schmauss>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki>

:040000 040000 0068e4aad7eb3ec7f429a5c1f2bc939e5e9c39d5 6d6f31045cac48ce43485e76d325ab4641053eca M      drivers
```

I've also tried the current 5.4 tree and it still doesn't work for me.

Comment 20 Michal Terepeta 2019-11-14 19:39:50 UTC
This is interesting -- I "fixed" the problem by updating the BIOS (X299 motherboard). It seems that ASUS released a new one about a month ago, and now everything works without issues.
Credit for the idea of updating the  BIOS goes to: https://askubuntu.com/a/1182990

Comment 21 Ben Cotton 2020-11-03 16:47:16 UTC
This message is a reminder that Fedora 31 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '31'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 31 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 22 Ben Cotton 2020-11-24 16:57:43 UTC
Fedora 31 changed to end-of-life (EOL) status on 2020-11-24. Fedora 31 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

