Bug 1817055 - Does not boot on libvirt/kvm
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 30
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2020-03-25 13:42 UTC by buzire.rhn
Modified: 2022-03-06 06:36 UTC

Last Closed: 2020-05-26 18:41:30 UTC
Type: Bug

Attachments
Used GRUB command line (6.70 KB, image/png), 2020-03-28 13:14 UTC, buzire.rhn
Boot log from working kernel (65.90 KB, text/plain), 2020-03-28 16:56 UTC, buzire.rhn

Description buzire.rhn 2020-03-25 13:42:35 UTC
1. Please describe the problem:

After applying the last batch of upgrades from dnf-automatic including the kernel upgrade from 5.4.12-100.fc30 to 5.5.10-100.fc30, the VM guest hangs after GRUB.

The screen displays:

```
_
```

A CPU core is busy.

2. What is the Version-Release number of the kernel:

5.5.10-100.fc30

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

Works with 5.4.12-100.fc30 and 5.4.10.
First noticed: 5.5.10-100.fc30

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

Steps may depend on the environment, but:

 1. Reboot the VM in any way
 2. Wait for GRUB
 3. Select the 5.5 kernel

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

I'd love to test, but:

```
Importing GPG key 0xCFC659B9:
 Userid     : "Fedora (30) <fedora-30-primary>"
 Fingerprint: F1D8 EC98 F241 AAF2 0DF6 9420 EF3C 111F CFC6 59B9
 From       : /etc/pki/rpm-gpg/RPM-GPG-KEY-fedora-30-x86_64
Is this ok [y/N]: y
Key imported successfully
Import of key(s) didn't help, wrong key(s)?
Public key for kernel-5.6.0-0.rc7.git0.1.fc33.x86_64.rpm is not installed. Failing package is: kernel-5.6.0-0.rc7.git0.1.fc33.x86_64
```

6. Are you running any modules that are not shipped directly with Fedora's kernel?:

No.

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

I didn't manage to get any logs, even with `debug` on the kernel command line.

Extra info:

All tested kernels have a short pause at the same point, when only `_` is present on the display (assuming decompression), but the 5.5 one doesn't get out of it after a few minutes. It keeps using a CPU core during that time.

The hypervisor is CentOS 6, with libvirt/KVM, and using UEFI via OVMF. The underlying host is AMD Sempron(tm) 3850.

Comment 1 Steve 2020-03-25 21:52:44 UTC
(In reply to buzire.rhn from comment #0)
...
> Public key for kernel-5.6.0-0.rc7.git0.1.fc33.x86_64.rpm is not installed.
> Failing package is: kernel-5.6.0-0.rc7.git0.1.fc33.x86_64
...

You are running F30, which probably doesn't have the F33 keys installed, but it should have the F32 keys:

$ rpm -q fedora-gpg-keys
fedora-gpg-keys-30-2.noarch

$ rpm -ql fedora-gpg-keys | grep 32

If you are just trying to test the latest kernel, try the F32 version from here, instead:

https://bodhi.fedoraproject.org/updates/?packages=kernel

Comment 2 Steve 2020-03-25 22:18:01 UTC
(In reply to Steve from comment #1)
...
> If you are just trying to test the latest kernel, try the F32 version from here, instead:
> 
> https://bodhi.fedoraproject.org/updates/?packages=kernel

The key import succeeded when I tried this with F30:

# dnf update kernel --releasever=32

And this is what was installed:

# dnf -q repoquery kernel --releasever=32
kernel-0:5.6.0-0.rc7.git0.2.fc32.x86_64

Comment 3 Steve 2020-03-25 22:33:37 UTC
Yet another kernel to try:

# dnf -q repoquery kernel --repo=updates-testing
kernel-0:5.5.11-100.fc30.x86_64

# dnf update kernel --enablerepo=updates-testing

Comment 4 buzire.rhn 2020-03-28 12:04:03 UTC
I tried both 5.6.0-0.rc7.git0.2.fc32 and kernel-5.5.13-100.fc30.x86_64, and they behave the same as the original: they hang.

Comment 5 Steve 2020-03-28 12:40:38 UTC
(In reply to buzire.rhn from comment #4)
> I tried both 5.6.0-0.rc7.git0.2.fc32 and kernel-5.5.13-100.fc30.x86_64, and they behave the same as the original: they hang.

Thanks for testing. If the "rhgb quiet" options are on the kernel command line, could you try removing them in grub2 before booting? That might show more about what happens before the hang.

Pressing the "Esc" key while booting will also show boot messages.

Please attach a screenshot if any boot messages are displayed in the VM.
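
For reference, removing those options amounts to deleting two words from the kernel command line. A minimal sketch of that transformation follows; the function name `strip_quiet_args` is made up for illustration, and at the GRUB menu the same edit is done by hand after pressing `e` on the boot entry.

```shell
# strip_quiet_args "CMDLINE": print CMDLINE without the "rhgb" and "quiet" options.
strip_quiet_args() (
    set -f            # disable globbing so words are split on whitespace only
    out=
    for arg in $1; do
        case $arg in
            rhgb|quiet) ;;                   # drop the quieting options
            *) out="${out:+$out }$arg" ;;    # keep everything else, in order
        esac
    done
    printf '%s\n' "$out"
)

strip_quiet_args "ro rootflags=subvol=root rhgb quiet"   # prints: ro rootflags=subvol=root
```

On an installed system, `grubby --update-kernel=ALL --remove-args="rhgb quiet"` should make the same change persistently.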

Comment 6 buzire.rhn 2020-03-28 13:14:09 UTC
Created attachment 1674282 [details]
Used GRUB command line

The command line I've been using. No quieting option as far as I can see.

Comment 7 Steve 2020-03-28 16:08:05 UTC
(In reply to buzire.rhn from comment #6)
> Created attachment 1674282 [details]
> Used GRUB command line
> 
> The command line I've been using. No quieting option as far as I can see.

OK, it sounds like the hang occurs before any boot messages are displayed. Is that correct?

In your original report you describe this message: "A CPU core is busy."

Where did you see that? I cannot find that string in the kernel code, so it is not clear where it is coming from. If it is on the VM display, could you please attach a screenshot showing it?

Comment 8 buzire.rhn 2020-03-28 16:13:58 UTC
Yes, that was observed on the VM graph. I'm not sure if it's exactly one core, because that graph is super imprecise. I'll get some shots later.

Comment 9 Steve 2020-03-28 16:35:10 UTC
(In reply to buzire.rhn from comment #8)
> Yes, that was observed on the VM graph. I'm not sure if it's exactly one core, because that graph is super imprecise. I'll get some shots later.

Thanks for your clarification. If the message is not from the kernel, there is no need to attach a screenshot.

Instead, could you boot from the last working kernel and attach a log for it:

$ journalctl --no-hostname -k > dmesg.txt

Comment 10 buzire.rhn 2020-03-28 16:56:12 UTC
Created attachment 1674335 [details]
Boot log from working kernel

Comment 11 buzire.rhn 2020-03-28 16:57:34 UTC
For clarification, the only thing that is displayed is the underscore (I put it in the quotes in the original report).

Comment 12 Steve 2020-03-28 17:37:33 UTC
(In reply to buzire.rhn from comment #11)
> For clarification, the only thing that is displayed is the underscore (I put it in the quotes in the original report).

Thanks for attaching the log and for the clarification.

Do you know about this:

$ grep 'corrupt' dmesg.txt 
Mar 28 14:07:39 kernel: FAT-fs (vda1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.

For the record:

Mar 28 14:07:11 kernel: Command line: BOOT_IMAGE=(hd2,gpt2)/vmlinuz-5.4.12-100.fc30.x86_64 root=UUID=5897537b-c1c8-40d4-bfb6-25e160c0e877 ro rootflags=subvol=root vconsole.font=latarcyrheb-sun16

Comment 13 buzire.rhn 2020-03-28 17:53:03 UTC
Yes, that's the EFI boot volume. It doesn't really have an opportunity to get corrupted, so I ignored the warning.

For the sake of demonstration, I ran fsck, agreed to clear the dirty bit, rebooted and observed the hang.

Comment 14 Steve 2020-03-28 18:14:48 UTC
(In reply to buzire.rhn from comment #13)
> Yes, that's the EFI boot volume. It doesn't really have an opportunity to
> get corrupted, so I ignored the warning.
> 
> For the sake of demonstration, I ran fsck, agreed to clear the dirty bit, rebooted and observed the hang.

Thanks. The last early boot hang that I looked at required kernel bisection by the reporter.*

Since this is in a VM, there may be a way to get more info from the kernel, but I will have to defer to the kernel experts on how to do that.
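
One approach that is often used with libvirt guests, offered here as a sketch rather than a confirmed fix for this bug, is to send kernel output to a serial console and read it from the host with `virsh console`. The helper name `serial_debug_args` and the defaults `ttyS0` and `115200` are assumptions, and the guest needs a serial device defined in its libvirt configuration. Whether anything is printed before this particular hang is not known.

```shell
# serial_debug_args [PORT] [BAUD]: print kernel command-line options that mirror
# console output to a serial port and enable early serial messages.
# PORT and BAUD default to ttyS0 and 115200 (assumed values, adjust as needed).
serial_debug_args() (
    port=${1:-ttyS0}
    baud=${2:-115200}
    printf 'console=tty0 console=%s,%s earlyprintk=serial,%s,%s ignore_loglevel\n' \
        "$port" "$baud" "$port" "$baud"
)

serial_debug_args
```

The printed options would be appended to the `linux` line in GRUB, and `virsh console <domain>` run on the hypervisor before booting the guest.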

You could try the suggestions in Bug 1790115, Comment 20.

If you want to try kernel bisection, there is info here:

Bisecting a bug
https://www.kernel.org/doc/html/latest/admin-guide/bug-bisect.html
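
As a rough sketch of what a bisection session would look like here, assume the regression entered between the upstream tags `v5.4` and `v5.5`. The report only establishes that 5.4.12 works and 5.5.10 does not, so that range is an assumption. The helper `bisect_steps` is hypothetical and only estimates how many kernels would have to be built and booted.

```shell
# bisect_steps N: approximate number of test builds needed to bisect N commits,
# computed as ceil(log2(N)).
bisect_steps() (
    n=$1
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))     # each test roughly halves the remaining range
        steps=$(( steps + 1 ))
    done
    echo "$steps"
)

# start_bisect GOOD BAD: begin a bisection in the current kernel git checkout.
start_bisect() {
    git bisect start &&
    git bisect bad "$2" &&
    git bisect good "$1"
}

# 15000 is an assumed round figure; `git rev-list --count v5.4..v5.5`
# in a kernel checkout gives the real commit count.
bisect_steps 15000    # prints 14
```

After `start_bisect v5.4 v5.5`, each kernel that git checks out is built and booted, then marked with `git bisect good` or `git bisect bad` until git names the first bad commit.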

* Bug 1790115 - [CRITICAL REGRESSION] Fedora's configuration of kernel >= 5.4 is not bootable

Comment 15 Steve 2020-03-28 18:32:28 UTC
(In reply to buzire.rhn from comment #0)
...
> The hypervisor is CentOS 6, with libvirt/KVM, and using UEFI via OVMF. The underlying host is AMD Sempron(tm) 3850.

For completeness, could you post the version info for your VM components:

$ rpm -q libvirt qemu-kvm # If those are the packages used in your VM configuration.

Comment 16 buzire.rhn 2020-03-28 18:35:34 UTC
I'm not going to have the time to bisect for the next week or so, but if nothing changes, I'll spin up a clone of the VM and then do it.

What is the config for Fedora kernels that I can use while bisecting? Is make rpm still the recommended way to build them?

$ rpm -q libvirt qemu-kvm
libvirt-4.5.0-23.el7_7.6.x86_64
qemu-kvm-1.5.3-167.el7_7.4.x86_64

Comment 17 Steve 2020-03-28 20:37:00 UTC
(In reply to buzire.rhn from comment #16)
> I'm not going to have the time to bisect for the next week or so, but if nothing changes, I'll spin up a clone of the VM and then do it.

Before bisecting, it would be a good idea to bracket the problem more narrowly by testing with Fedora kernels from koji:
https://koji.fedoraproject.org/koji/packageinfo?packageID=8

Downloading "kernel-core" and "kernel-modules" should be sufficient.

They have to be installed manually:

# dnf install kernel*.rpm
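
Bracketing works like a manual bisection over released builds: test the build halfway between the last known good and the first known bad version, then repeat on the half that still contains the change. A small sketch, assuming GNU `sort -V` is available; `pick_midpoint` is a hypothetical helper, and the candidate versions in the example are placeholders, not a list of builds that actually exist in koji.

```shell
# pick_midpoint GOOD BAD CANDIDATE...
# Print the candidate version in the middle of those strictly between GOOD and BAD.
pick_midpoint() (
    good=$1 bad=$2
    shift 2
    # Sort everything in version order, then keep only the lines strictly
    # between GOOD and BAD (the "" concatenation forces string comparison).
    between=$(printf '%s\n' "$good" "$bad" "$@" | sort -V -u |
        awk -v g="$good" -v b="$bad" '
            ($0 "") == (b "") { exit }
            seen              { print }
            ($0 "") == (g "") { seen = 1 }')
    count=$(printf '%s\n' "$between" | awk 'NF { n++ } END { print n + 0 }')
    [ "$count" -gt 0 ] || exit 1     # nothing left to test between GOOD and BAD
    printf '%s\n' "$between" | sed -n "$(( count / 2 + 1 ))p"
)

# Placeholder candidates; 5.4.12 (good) and 5.5.10 (bad) come from the report.
pick_midpoint 5.4.12 5.5.10 5.4.20 5.5.0 5.5.5    # prints 5.5.0
```

If the printed version boots, it becomes the new good bound; if it hangs, it becomes the new bad bound.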

NB: For the Fedora gitN builds, the changelog has a commit id in the upstream kernel repo:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/

> What is the config for Fedora kernels that I can use while bisecting? Is make rpm still the recommended way to build them?

I'm not sure if bisecting the Fedora kernel or the upstream kernel would be better.

Anyway, here are some instructions for building the Fedora kernel, although the first package to install should be "fedora-packager", which pulls in the "fedpkg" package. "fedpkg" is partly a front-end for "git" and is documented with "--help" and with "man fedpkg".

Building the Fedora Kernel
Posted by Laura Abbott on September 6, 2016
https://fedoramagazine.org/building-fedora-kernel/

> $ rpm -q libvirt qemu-kvm
> libvirt-4.5.0-23.el7_7.6.x86_64
> qemu-kvm-1.5.3-167.el7_7.4.x86_64

Thanks.

Comment 18 Steve 2020-03-28 22:09:54 UTC
(In reply to Steve from comment #17)
...
> > What is the config for Fedora kernels that I can use while bisecting? Is make rpm still the recommended way to build them?
> 
> I'm not sure if bisecting the Fedora kernel or the upstream kernel would be better.
...

I should have checked first, because the Fedora kernel git repo is only for Fedora kernel packages, so it can't be used for bisecting.

As for the Fedora kernel config files, they can be found in the Fedora kernel git repo:
https://src.fedoraproject.org/rpms/kernel.git

There are branches for each Fedora release:
https://src.fedoraproject.org/rpms/kernel/branches/

The Fedora kernel config files are at the root of the tree:
https://src.fedoraproject.org/rpms/kernel/blob/f30/f/kernel-x86_64-fedora.config

I don't really know how to pick a specific version, so I would suggest using the latest version for a release:

$ git branch
* (HEAD detached at origin/f30)
  master

$ git checkout origin/f32
Previous HEAD position was 53101c44d Linux v5.5.13
HEAD is now at 867ad4322 Linux v5.6-rc7

$ git branch
* (HEAD detached at origin/f32)
  master
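
The checkout output above suggests that each kernel rebase in the packaging repo is committed with a subject of the form `Linux vX.Y.Z`. Assuming that convention holds throughout the branch (not verified here), the commit for a specific version could be located by filtering `git log --oneline`. `find_version_commit` is a hypothetical helper name.

```shell
# find_version_commit VERSION: read "hash subject" lines (as printed by
# `git log --oneline`) on stdin and print the hash of the first commit whose
# subject is exactly "Linux VERSION".
find_version_commit() {
    awk -v want="Linux $1" '
        {
            hash = $1
            sub(/^[^ ]+ /, "")          # strip the hash, leaving only the subject
            if ($0 == want) { print hash; exit }
        }'
}

# The two commits shown in the checkout output above:
printf '%s\n' '867ad4322 Linux v5.6-rc7' '53101c44d Linux v5.5.13' |
    find_version_commit v5.5.13     # prints 53101c44d
```

In the repo itself this would be run as `git log --oneline origin/f30 | find_version_commit v5.4.12`, followed by `git checkout` of the printed hash to get the config files that match that kernel.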

Comment 19 buzire.rhn 2020-03-30 13:12:56 UTC
I decided to give it a try after updating/rebooting the hypervisor, and now I'm successfully running 5.6.0. The downside is that I don't know what the previous hypervisor setup was. I could reconstruct it using yum history if there's interest.
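
On the hypervisor, `yum history list` and `yum history info <ID>` show what each transaction changed. For future updates, another option is to snapshot the package list before and after and compare the two. The sketch below assumes nothing beyond `rpm`, `sort`, `comm`, and `mktemp`; the function names are made up for illustration.

```shell
# snapshot_packages FILE: write the sorted list of installed packages to FILE.
snapshot_packages() {
    rpm -qa | sort > "$1"
}

# changed_packages OLD NEW: print packages found in only one snapshot,
# prefixed with "- " (only in OLD) or "+ " (only in NEW).
changed_packages() (
    old=$(mktemp) && new=$(mktemp) || exit 1
    sort "$1" > "$old"               # comm requires sorted input
    sort "$2" > "$new"
    comm -23 "$old" "$new" | sed 's/^/- /'
    comm -13 "$old" "$new" | sed 's/^/+ /'
    rm -f "$old" "$new"
)
```

Typical use would be `snapshot_packages before.txt`, then the update, then `snapshot_packages after.txt` and `changed_packages before.txt after.txt`.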

Comment 20 Steve 2020-03-30 15:24:55 UTC
(In reply to buzire.rhn from comment #19)
> I decided to give it a try after updating/rebooting the hypervisor, and now
> I'm successfully running 5.6.0. The downside is that I don't know what the
> previous hypervisor setup was. I could reconstruct it using yum history if
> there's interest.

Thanks for your followup report.

If, after further testing, everything seems to be working as expected, could you close this bug?

If the problem reoccurs, please update this bug report or open a new bug report.

I'm not sure what resolution to use, but CURRENTRELEASE seems to be about right:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_status

Comment 21 buzire.rhn 2020-03-31 13:56:44 UTC
I'm going to close this; the server that was affected is not one that should be rebooted outside of updates.

If the problem reappears, I'll reopen.

Comment 22 buzire.rhn 2020-04-17 11:16:52 UTC
I had the same issue again. It turns out that it's not the hypervisor upgrade which "helped", but the reboot.

In both cases where the newer kernels don't boot, the hypervisor booted from a hard reset.

The only shared part between the hypervisor and guest is that the hypervisor passes a LUKS device to the guest, but that is a data volume, and shouldn't be in action before well into the boot process.

I'll try to reproduce / bisect this, but seeing that a hard boot is required, I may have trouble finding hardware I can use for this.

Comment 23 Ben Cotton 2020-04-30 20:12:47 UTC
This message is a reminder that Fedora 30 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 30 on 2020-05-26.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '30'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 30 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 24 Ben Cotton 2020-05-26 18:41:30 UTC
Fedora 30 changed to end-of-life (EOL) status on 2020-05-26. Fedora 30 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 25 buzire.rhn 2022-03-06 06:36:18 UTC
This is still a problem on F34.

