Bug 1815102 - Kernel panic during boot on kernel-5.6.0-0.rc5.git0.2.fc32.x86_64
Summary: Kernel panic during boot on kernel-5.6.0-0.rc5.git0.2.fc32.x86_64
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 32
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-03-19 13:37 UTC by tomasy
Modified: 2020-05-04 20:26 UTC (History)
19 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-04 20:26:28 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
Kernel Panic screen message (3.69 MB, image/jpeg)
2020-03-19 13:37 UTC, tomasy
no flags Details
Boot waits in rescue mode (4.04 MB, image/jpeg)
2020-03-20 09:58 UTC, tomasy
no flags Details
cat dmesg-1.txt | egrep -i 'error|fail' > dmesg-1-error_fail.txt (27.21 KB, text/plain)
2020-03-20 22:30 UTC, tomasy
no flags Details
journalctl -b | grep AVC > avc-1.txt (56.18 KB, text/plain)
2020-03-22 15:50 UTC, tomasy
no flags Details
rpm -Va (8.16 KB, text/plain)
2020-03-22 21:50 UTC, tomasy
no flags Details
grubenv-bootable (1.00 KB, text/plain)
2020-03-24 07:29 UTC, tomasy
no flags Details
grubenv-non-bootable (1.00 KB, text/plain)
2020-03-24 07:31 UTC, tomasy
no flags Details

Description tomasy 2020-03-19 13:37:17 UTC
Created attachment 1671455 [details]
Kernel Panic screen message

1. Please describe the problem:
Upgraded from F31 by using dnf system-upgrade. After the last step I got a Kernel Panic message, without any chance to roll back.


2. What is the Version-Release number of the kernel:
kernel-5.6.0-0.rc5.git0.2.fc32.x86_64

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
I did the upgrade from 5.5.8-200.fc31.x86_64


4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:
Kernel Panic every time after boot.

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:
Not done

6. Are you running any modules that are not shipped directly with Fedora's kernel?:
Not that I remember.

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.
Screenshot attached.

Comment 1 Steve 2020-03-19 17:33:51 UTC
(In reply to tomasy from comment #0)
...
> I did the upgrade from 5.5.8-200.fc31.x86_64
...

That kernel should still be on your system, but you have to get to the grub2 menu to boot it.

While booting, try repeatedly pressing the "Esc" key.

For more options, see:

4. How to access the GRUB menu when hidden
https://hansdegoede.livejournal.com/19081.html

Comment 2 Steve 2020-03-19 17:49:12 UTC
(In reply to tomasy from comment #0)
...
> Kernel Panic every time after boot.
...

Hans: Shouldn't the grub2 menu be displayed automatically in this scenario?

Comment 3 Hans de Goede 2020-03-19 17:54:56 UTC
(In reply to Steve from comment #2)
> Hans: Shouldn't the grub2 menu be displayed automatically in this scenario?

Yes, it should. Note, though, that there is no info in this bug saying that it is not showing up at boot.

Maybe tomasy just didn't realize that he could work around this by selecting the old kernel?

Comment 4 Steve 2020-03-19 18:02:59 UTC
(In reply to Hans de Goede from comment #3)
> (In reply to Steve from comment #2)
> > Hans: Shouldn't the grub2 menu be displayed automatically in this scenario?
> 
> Yes it should, note though that there is no info in this bug that it is not
> showing up after boot.
> 
> Maybe tomasy just didn't realize that he could workaround this by selecting
> the old kernel ?

tomasy: Do you see the grub2 menu when you boot?

Comment 5 Steve 2020-03-19 22:28:10 UTC
Transcription from attached screenshot:

RIP: 0010:acpi_ps_peek_opcode+0x9/0x1a

Comment 6 tomasy 2020-03-20 07:44:45 UTC
I have tried to boot with the old kernel but it did not work. I get a Kernel Panic error in that case too.

Comment 7 Steve 2020-03-20 08:04:56 UTC
(In reply to tomasy from comment #6)
> I have tried to boot with the old kernel but it did not work. I get a Kernel Panic error in that case too.

Can you boot the rescue kernel? (It's usually at the bottom of the grub2 menu and has a name like "Fedora (0-rescue-...) ...")

Also, if you have an F31 live image on a USB flash drive, or a DVD, can you try booting that?

Comment 8 tomasy 2020-03-20 08:12:03 UTC
I have tried the rescue kernel too, but it hangs after a while. No kernel panic. I have tried all the kernels; I get a kernel panic from all except rescue.
I can boot from a Fedora Live image.

Comment 9 Steve 2020-03-20 08:20:31 UTC
(In reply to tomasy from comment #8)
> I have tried the rescue kernel too, but it hangs after a while. No kernel panic.
> I have tried all the kernels; I get a kernel panic from all except rescue.

Where does the rescue kernel hang? What does the last line say? (Or better, attach another screenshot, if possible.)

> I can boot from a Fedora Live

That's good. Which release?

Comment 10 Steve 2020-03-20 08:28:54 UTC
(In reply to tomasy from comment #8)
...
> I can boot from a Fedora Live

After booting from the live image, open a terminal window and run "lsblk -mf". That should show all your partitions and file systems. Make sure none are showing as mounted (IIRC, the live images don't mount other file systems).

After that, try to run fsck on the file systems to make sure they are not corrupt:

# e2fsck /dev/...

Comment 11 Steve 2020-03-20 08:38:40 UTC
Could you explain the status of the system -- does it have critical data on it or is it just for testing?

And do you have enough room to install another copy of F31 on it? (in a separate 16GB partition, say)

Comment 12 tomasy 2020-03-20 09:55:22 UTC
I am booting now from the F31 Live Beta (via iPXE).
The system contains data I do not want to lose, but it is possible for me to do a backup. If you want to check something, it is possible to remove some disks, sdd and sde, and insert a new one to install F31.
I normally use the system daily, so I do not want to keep it in this state for weeks.

I forgot to mention that I use btrfs. There are some errors when checking the btrfs partition. I am not using quotas, but it is possible to mount the partition in the F31 Live Beta.

$ uname -a # F31 Live Beta
Linux localhost-live 5.3.0-0.rc6.git0.1.fc31.x86_64 #1 SMP Mon Aug 26 13:01:25 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

$ sudo lsblk -mf
NAME          SIZE OWNER GROUP MODE       FSTYPE   LABEL          UUID                                 FSAVAIL FSUSE% MOUNTPOINT
loop0         1.8G root  disk  brw-rw---- squashfs                                                           0   100% /run/media/liveuser/disk
loop1         6.5G root  disk  brw-rw---- ext4     Anaconda       ebd0f2ca-a749-4e55-bd1b-5700eccc061a
├─live-rw     6.5G root  disk  brw-rw---- ext4     Anaconda       ebd0f2ca-a749-4e55-bd1b-5700eccc061a    823M    87% /
└─live-base   6.5G root  disk  brw-rw---- ext4     Anaconda       ebd0f2ca-a749-4e55-bd1b-5700eccc061a
loop2          32G root  disk  brw-rw----
└─live-rw     6.5G root  disk  brw-rw---- ext4     Anaconda       ebd0f2ca-a749-4e55-bd1b-5700eccc061a    823M    87% /
sda         111.8G root  disk  brw-rw----
├─sda1        500M root  disk  brw-rw---- ext4                    ac517bba-03bb-44c8-bac4-bb92ab386826
├─sda2        5.9G root  disk  brw-rw---- swap                    35d54b3c-2b7a-48ab-93c2-f08d0c0de171                [SWAP]
└─sda3      105.4G root  disk  brw-rw---- btrfs    fedora_mario00 cdb00230-d06c-4cf8-b11a-e8b1abf13500
sdb         124.4M root  disk  brw-rw----
├─sdb3       1008K root  disk  brw-rw----
└─sdb4          2M root  disk  brw-rw----
sdc         465.8G root  disk  brw-rw----
└─sdc1      465.8G root  disk  brw-rw---- btrfs    3TB some       e1e3ef48-7d30-4e84-be3c-46999e59ffd5
sdd           3.7T root  disk  brw-rw---- btrfs    RAID1_mario    b711b27c-367f-4c65-80aa-359e73dda139
sde           3.7T root  disk  brw-rw---- btrfs    RAID1_mario    b711b27c-367f-4c65-80aa-359e73dda139
sr0          1024M root  cdrom brw-rw----


$ sudo e2fsck /dev/sda1
e2fsck 1.45.3 (14-Jul-2019)
/dev/sda1: recovering journal
/dev/sda1: clean, 436/128016 files, 264630/512000 blocks


$ sudo btrfsck /dev/sda3
Opening filesystem to check...
Checking filesystem on /dev/sda3
UUID: cdb00230-d06c-4cf8-b11a-e8b1abf13500
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups
ERROR: out of memory
ERROR: Loading qgroups from disk: -2
ERROR: failed to check quota groups
found 107167342592 bytes used, error(s) found
total csum bytes: 102891816
total tree bytes: 1511374848
total fs tree bytes: 1285160960
total extent tree bytes: 85508096
btree space waste bytes: 257524826
file data blocks allocated: 166745583616
 referenced 107213344768
extent buffer leak: start 162267136 len 16384
extent buffer leak: start 37339136 len 16384

Comment 13 tomasy 2020-03-20 09:58:05 UTC
Created attachment 1671823 [details]
Boot waits in rescue mode

Comment 14 Steve 2020-03-20 14:59:15 UTC
(In reply to tomasy from comment #12)
> I boot now from F31 Live Beta (via ipxe)

Please use F31 Live FINAL for any further recovery work:

Fedora-Workstation-Live-x86_64-31-1.9.iso
https://getfedora.org/en/workstation/download/

> The system contains data I do not want to lose, but it is possible for me to do a backup.

Please back up your important data, because you may need to reinstall F31.

> If you want to check something it is possible to remove some disk, sdd and sde, and insert a new one to install F31.

That probably won't be necessary, if you can boot from F31 Live, back up your data, reinstall F31, and restore your data.

> I normally use the system daily so I do not want to keep it in this state for weeks.

For testing purposes, all that is needed is a log file, probably a journal from /var/log/journal/ on sda3. After mounting sda3 read-only, what do you see for this:

$ ls -l /MOUNTPOINT/var/log/journal/*/system.journal

See the "journalctl" option "--root=ROOT" in "man journalctl" for how to read a journal in a non-standard location.

> I forgot to mention that I use btrfs. There are some errors when checking the btrfs partition.
> I am not using quotas, but it is possible to mount the partition in the F31 Live Beta
...

If there was file system corruption during the upgrade, the best thing to do is backup your data, reinstall F31, and restore your data.

I would suggest backing up /var/log/ and /etc/ in addition to any other data you want to back up.

Comment 15 Steve 2020-03-20 15:23:11 UTC
sda         111.8G root  disk  brw-rw----
├─sda1        500M root  disk  brw-rw---- ext4                    ac517bba-03bb-44c8-bac4-bb92ab386826
├─sda2        5.9G root  disk  brw-rw---- swap                    35d54b3c-2b7a-48ab-93c2-f08d0c0de171                [SWAP]
└─sda3      105.4G root  disk  brw-rw---- btrfs    fedora_mario00 cdb00230-d06c-4cf8-b11a-e8b1abf13500

If sda3 is your "/" partition, I suggest that you repartition so that "/home" is in a separate partition. That way you can do clean installs without losing your important data. And you could probably use a smaller swap partition -- the installer default is merely advisory.

Comment 16 Steve 2020-03-20 16:30:41 UTC
(In reply to Steve from comment #14)
...
> See the "journalctl" option "--root=ROOT" in "man journalctl" for how to read a journal in a non-standard location.
...

I couldn't get that to work in a VM, but this seems to work:

Boot F31 Live and open a terminal window.
# cd /mnt
# mkdir root
# mount -r /dev/sda3 root
# journalctl -b    --no-hostname --file=/mnt/root/var/log/journal/*/* > /tmp/dmesg-1.txt
# journalctl -b -1 --no-hostname --file=/mnt/root/var/log/journal/*/* > /tmp/dmesg-2.txt

Attach dmesg-1.txt and dmesg-2.txt.

If there are any error messages, please post them.

Comment 17 Steve 2020-03-20 16:43:19 UTC
Could you also check sda for any SMART errors:

# smartctl -H /dev/sda

Comment 18 Steve 2020-03-20 17:12:51 UTC
This works:

$ journalctl -b    --no-hostname --root=/mnt/root > /tmp/dmesg-1.txt
$ journalctl -b -1 --no-hostname --root=/mnt/root > /tmp/dmesg-2.txt

(I was incorrectly using the path to the journal directory.)

Tested in a VM with F31 Live.

Comment 19 Steve 2020-03-20 18:11:35 UTC
(In reply to Steve from comment #17)
> Could you also check sda for any SMART errors:
> 
> # smartctl -H /dev/sda

That's not in the F31 live image:

# smartctl -H /dev/vda3
bash: smartctl: command not found...

# dnf install smartmontools -y

Comment 20 tomasy 2020-03-20 19:36:05 UTC
Hi Steve, I have the output from the boot log. It is about 45 MB, so it is not possible for me to check whether there is any private data. Can I send it to you instead of making it public?

Booted from F31 Live:
>  journalctl -b -1 --no-hostname --file=/mnt/root/var/log/journal/*/* > /tmp/dmesg-2.txt
Output: "Specifying boot ID or boot offset has no effect, no persistent journal was found."

My /var is a mount from /dev/sdc1

$ sudo smartctl -H /dev/sda
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.3.7-301.fc31.x86_64] (local build)

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

$ sudo smartctl -H /dev/sdc
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.3.7-301.fc31.x86_64] (local build)

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

Comment 21 Steve 2020-03-20 20:40:10 UTC
(In reply to tomasy from comment #20)
> Hi Steve, I have the output from the boot log. It is about 45 MB, so it is
> not possible for me to check whether there is any private data. Can I send
> it to you instead of making it public?

That seems very large, but maybe it is filled with error messages. There is no single pattern to look for, but here is a command to start investigating:

$ cat dmesg-1.txt | egrep -i 'error|fail|dnf' | less

Do not post any of the output. Just see if you can figure out what is in there. I frequently do this when looking at log files:

$ less -N dmesg-1.txt

Try adding the "-k" option to make the file smaller:

$ journalctl -k -b    --no-hostname --root=/mnt/root > /tmp/dmesg-3.txt
$ journalctl -k -b -1 --no-hostname --root=/mnt/root > /tmp/dmesg-4.txt
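To sketch what that error/fail filter does, here it is run on a small made-up log (the file name and contents below are hypothetical, not taken from the real dmesg-1.txt):

```shell
# Create a small hypothetical log to demonstrate the filter
cat > /tmp/sample-log.txt <<'EOF'
Mar 18 09:13:26 dnf[665]: error: install failed
Mar 18 09:13:27 kernel: usb 1-1: new high-speed USB device
Mar 18 11:35:20 dnf[665]: Error: Transaction failed
EOF

# -E enables alternation, -i makes the match case-insensitive;
# only the two dnf lines match here
grep -Ei 'error|fail|dnf' /tmp/sample-log.txt
```

(grep -E is the modern spelling of egrep, and "cat file | egrep ..." can be shortened to "grep -E ... file".)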

> Boot from F31 Live 
> >  journalctl -b -1 --no-hostname --file=/mnt/root/var/log/journal/*/* > /tmp/dmesg-2.txt
> Output: "Specifying boot ID or boot offset has no effect, no persistent journal was found."

OK. I saw that while testing. Try the alternative "--root" option, as above.

> My /var is a mount from /dev/sdc1

Thanks for pointing that out. Could you post the output from this:

$ mount -t btrfs

> $ sudo smartctl -H /dev/sda
...
> SMART overall-health self-assessment test result: PASSED
> 
> $ sudo smartctl -H /dev/sdc
...
> SMART overall-health self-assessment test result: PASSED

That's good.

Comment 22 Steve 2020-03-20 21:07:03 UTC
(In reply to tomasy from comment #20)
> ... I have the output from the boot log; it is about 45 MB ...

On second thought, that sounds like you got the full log instead of the log for one boot. The "-b" option extracts the log for the current boot. What does this show:

$ fgrep 'Command line:' dmesg-1.txt | wc -l
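The idea behind that check: the kernel logs its command line exactly once per boot, so the count of 'Command line:' lines equals the number of boots in the file. A minimal sketch with a hypothetical two-boot journal export:

```shell
# Hypothetical journal export covering two boots
cat > /tmp/dmesg-sample.txt <<'EOF'
Mar 17 08:00:01 kernel: Command line: BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.5.8
Mar 17 08:00:02 kernel: Memory: 16280K available
Mar 18 07:59:00 kernel: Command line: BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.6.0
EOF

# One 'Command line:' line per boot, so this prints 2
grep -F 'Command line:' /tmp/dmesg-sample.txt | wc -l
```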

Comment 23 Steve 2020-03-20 21:41:53 UTC
(In reply to tomasy from comment #20)
...
> My /var is a mount from /dev/sdc1
...

I haven't completely tested this, but this might work:

# cd /mnt
# mkdir var
# mount -r /dev/sdc1 var

And then use the "-D" option:

$ journalctl -k -b    --no-hostname -D /mnt/var/log/journal/* > /tmp/dmesg-3.txt
$ journalctl -k -b -1 --no-hostname -D /mnt/var/log/journal/* > /tmp/dmesg-4.txt

Comment 24 tomasy 2020-03-20 22:30:28 UTC
Created attachment 1672108 [details]
cat dmesg-1.txt | egrep -i 'error|fail' > dmesg-1-error_fail.txt

Comment 25 tomasy 2020-03-20 22:32:32 UTC
(In reply to Steve from comment #22)
> (In reply to tomasy from comment #20)
> > ... I have the output from the bootlog it is about 45 MB ...
> 
> On second thought, that sounds like you got the full log instead of the log
> for one boot. The "-b" option extracts the log for the current boot. What
> does this show:
> 
> $ fgrep 'Command line:' dmesg-1.txt | wc -l
It is just one boot: an upgrade of 5700 packages.
# fgrep 'Command line:' dmesg-1.txt | wc -l
1

Comment 26 tomasy 2020-03-20 22:35:30 UTC
FYI, I did a boot with the F32 Live Beta; it booted OK and there were no problems mounting the drives.

Comment 27 tomasy 2020-03-20 22:39:15 UTC
About the boot with the Kernel Panic: the Kernel Panic happens immediately, before any log is written to disk.

Comment 28 Steve 2020-03-20 22:52:14 UTC
(In reply to tomasy from comment #26)
> FYI. I did a boot with F32 Live Beta and it was OK to boot and no problems to mount the drives

Thanks for checking that. This is the problem (from the attached dmesg-1-error_fail.txt):

Mar 18 11:35:20 dnf[665]: Error: Transaction failed

Can you post some more context for that:

$ fgrep -B 30 -n 'Transaction failed' dmesg-1.txt

"30" is an arbitrary number of lines before the matching text. Feel free to experiment. :-)
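As a sketch of what -B and -n do, here they are on a tiny made-up file (name and contents are illustrative only):

```shell
# Tiny made-up file; the match is on the last line
printf 'one\ntwo\nthree\nTransaction failed\n' > /tmp/context-demo.txt

# -B 2 prints the 2 lines before each match; -n prefixes line numbers
# (context lines use "-", the matching line uses ":")
grep -B 2 -n 'Transaction failed' /tmp/context-demo.txt
```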

Running out of disk space is a possible reason for the upgrade failure. The dmesg-1.txt log might have more details about the reason for the transaction failure.

(In reply to tomasy from comment #27)
> About boot with Kernel Panic. The Kernel Panic happens immediately before any log is written to disk

If this were a kernel bug, it would still be useful to have information about the hardware (CPU, memory, peripherals, etc.), which is why I was trying to recover some logs.

Comment 29 Steve 2020-03-20 22:54:44 UTC
(In reply to tomasy from comment #25)
...
> It is just one boot. Upgrade of 5700 packages
> # fgrep 'Command line:' dmesg-1.txt | wc -l
> 1

Thanks for checking that.

Comment 30 Steve 2020-03-20 23:10:18 UTC
(In reply to Steve from comment #28)
...
> Running out of disk space is a possible reason from the upgrade failure.
...

Or running out of memory:

$ cat dmesg-1.txt | egrep -i 'memory|oom|killed process' | less

"oom" means "out of memory", and the Linux kernel implements an "oom killer", which means that the kernel starts killing processes when the system runs out of memory.

The last text pattern is from here:

Finding which process was killed by Linux OOM killer
https://stackoverflow.com/questions/624857/finding-which-process-was-killed-by-linux-oom-killer
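A sketch of that memory/oom filter on a hypothetical log (the messages below are made up for illustration):

```shell
# Hypothetical log lines; only the first two should match
cat > /tmp/oom-sample.txt <<'EOF'
Mar 18 10:00:00 kernel: Out of memory: Killed process 1234 (dnf)
Mar 18 10:00:01 kernel: oom_reaper: reaped process 1234 (dnf)
Mar 18 10:00:02 dnf[665]: Upgrading: bash-5.0.11-1.fc32.x86_64
EOF

# Matches "memory", "oom", or "killed process", case-insensitively
grep -Ei 'memory|oom|killed process' /tmp/oom-sample.txt
```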

Comment 31 Steve 2020-03-21 02:52:02 UTC
There are dnf logs in /var/log/. This finds the dnf logs with "system-upgrade" commands:

# grep 'Extra commands.*system-upgrade' /var/log/dnf.log*
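To illustrate how that glob-plus-grep works, here it is against two hypothetical dnf log files (the paths and contents are made up; the real logs live in /var/log/):

```shell
# Two hypothetical dnf log files; only one records a system-upgrade run
mkdir -p /tmp/fake-log
printf '2020-03-18 Extra commands: system-upgrade download\n' > /tmp/fake-log/dnf.log
printf '2020-03-10 Extra commands: install vim\n' > /tmp/fake-log/dnf.log.1

# With multiple files, grep prefixes each match with its file name
grep 'Extra commands.*system-upgrade' /tmp/fake-log/dnf.log*
```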

Comment 32 Steve 2020-03-21 06:37:07 UTC
(In reply to tomasy from comment #13)
> Created attachment 1671823 [details]
> Boot waits in rescue mode

The error messages in that screenshot are from systemd. That means that the rescue kernel booted successfully. The problem appears to be that most of the systemd units cannot be started. Indeed, systemd tries to start D-Bus five times.

Comment 33 tomasy 2020-03-21 07:01:15 UTC
(In reply to Steve from comment #28)
>...
> Mar 18 11:35:20 dnf[665]: Error: Transaction failed
> 
> Can you post some more context for that:
> 
> $ fgrep -B 30 -n 'Transaction failed' dmesg-1.txt
> 
# fgrep -B 10 -n 'Transaction failed' dmesg-1.txt
292294-Mar 18 11:35:20 dnf[665]:   python2-zope-event-4.2.0-13.fc31.noarch
292295-Mar 18 11:35:20 dnf[665]:   python2-zope-interface-4.6.0-2.fc31.x86_64
292296-Mar 18 11:35:20 dnf[665]: Failed:
292297-Mar 18 11:35:20 dnf[665]:   crun-0.12.2.1-1.fc32.x86_64
292298-Mar 18 11:35:20 dnf[665]:   crun-0.13-1.fc31.x86_64
292299-Mar 18 11:35:20 dnf[665]:   lxc-libs-3.0.4-2.fc31.x86_64
292300-Mar 18 11:35:20 dnf[665]:   lxc-libs-3.2.1-2.fc32.x86_64
292301-Mar 18 11:35:20 dnf[665]:   rtkit-0.11-19.fc29.x86_64
292302-Mar 18 11:35:20 dnf[665]:   runc-2:1.0.0-102.dev.gitdc9208a.fc31.x86_64
292303-Mar 18 11:35:20 dnf[665]:   runc-2:1.0.0-144.dev.gite6555cc.fc32.x86_64
292304:Mar 18 11:35:20 dnf[665]: Error: Transaction failed

And searching for 'crun':

# fgrep -B 5 -n 'crun' dmesg-1.txt
4121-Mar 18 07:59:27 dnf[665]: Downgrading:
4122-Mar 18 07:59:27 dnf[665]:  ansible                                       noarch  2.9.5-1.fc32                        fedora                  15 M
4123-Mar 18 07:59:27 dnf[665]:  bsdtar                                        x86_64  3.4.0-2.fc32                        fedora                  65 k
4124-Mar 18 07:59:27 dnf[665]:  buildah                                       x86_64  1.14.0-0.31.dev.git82ff48a.fc32     fedora                 8.6 M
4125-Mar 18 07:59:27 dnf[665]:  conmon                                        x86_64  2:2.0.11-0.6.dev.git86aa80b.fc32    fedora                  40 k
4126:Mar 18 07:59:27 dnf[665]:  crun                                          x86_64  0.12.2.1-1.fc32                     fedora                 166 k
--
268243-Mar 18 09:13:26 kernel: audit: type=1400 audit(1584537206.736:131139): avc:  denied  { mac_admin } for  pid=665 comm="dnf" capability=33  scontext=system_u:system_r:rpm_t:s0 tcontext=system_u:system_r:rpm_t:s0 tclass=capability2 permissive=0
268244-Mar 18 09:13:26 kernel: audit: type=1401 audit(1584537206.736:131139): op=setxattr invalid_context="system_u:object_r:container_runtime_exec_t:s0"
268245-Mar 18 09:13:26 audit[665]: AVC avc:  denied  { mac_admin } for  pid=665 comm="dnf" capability=33  scontext=system_u:system_r:rpm_t:s0 tcontext=system_u:system_r:rpm_t:s0 tclass=capability2 permissive=0
268246-Mar 18 09:13:26 audit: SELINUX_ERR op=setxattr invalid_context="system_u:object_r:container_runtime_exec_t:s0"
268247-Mar 18 09:13:26 dnf[665]:   Upgrading        : librtmp-2.4-17.20190330.gitc5f04a5.fc32.x86_64   1283/5740
268248:Mar 18 09:13:26 dnf[665]:   Downgrading      : crun-0.12.2.1-1.fc32.x86_64                      1284/5740
268249:Mar 18 09:13:26 dnf[665]: error: lsetfilecon: (/usr/bin/crun;5e721555, system_u:object_r:container_runtime_exec_t:s0) Invalid argument
268250-Mar 18 09:13:26 dnf[665]: error: Plugin selinux: hook fsm_file_prepare failed
268251:Mar 18 09:13:26 dnf[665]: Error unpacking rpm package crun-0.12.2.1-1.fc32.x86_64
268252-Mar 18 09:13:26 dnf[665]:   Upgrading        : libwinpr-2:2.0.0-56.20200207git245fc60.fc32.x8   1285/5740
268253:Mar 18 09:13:26 dnf[665]: error: unpacking of archive failed on file /usr/bin/crun;5e721555: cpio: (error 0x2)
268254:Mar 18 09:13:26 dnf[665]: error: crun-0.12.2.1-1.fc32.x86_64: install failed
--
276982-Mar 18 10:32:27 dnf[665]:   Cleanup          : kf5-kiconthemes-5.67.0-1.fc31.x86_64             4118/5740
276983-Mar 18 10:32:29 dnf[665]:   Cleanup          : kf5-kglobalaccel-libs-5.67.0-1.fc31.x86_64       4119/5740
276984-Mar 18 10:32:30 dnf[665]:   Cleanup          : kf5-ktextwidgets-5.67.0-1.fc31.x86_64            4120/5740
276985-Mar 18 10:32:32 dnf[665]:   Cleanup          : kf5-kconfigwidgets-5.67.0-1.fc31.x86_64          4121/5740
276986-Mar 18 10:32:32 dnf[665]:   Cleanup          : pulseaudio-libs-glib2-13.0-2.fc31.x86_64         4122/5740
276987:Mar 18 10:32:32 dnf[665]: error: crun-0.13-1.fc31.x86_64: erase skipped
--
283619-Mar 18 11:14:30 dnf[665]:   Verifying        : bsdtar-3.4.2-1.fc31.x86_64                          6/5740
283620-Mar 18 11:14:30 dnf[665]:   Verifying        : buildah-1.14.0-0.31.dev.git82ff48a.fc32.x86_64      7/5740
283621-Mar 18 11:14:30 dnf[665]:   Verifying        : buildah-1.14.2-1.fc31.x86_64                        8/5740
283622-Mar 18 11:14:31 dnf[665]:   Verifying        : conmon-2:2.0.11-0.6.dev.git86aa80b.fc32.x86_64      9/5740
283623-Mar 18 11:14:31 dnf[665]:   Verifying        : conmon-2:2.0.11-1.fc31.x86_64                      10/5740
283624:Mar 18 11:14:31 dnf[665]:   Verifying        : crun-0.12.2.1-1.fc32.x86_64                        11/5740
283625:Mar 18 11:14:31 dnf[665]:   Verifying        : crun-0.13-1.fc31.x86_64                            12/5740
--
292292-Mar 18 11:35:20 dnf[665]:   python2-urllib3-1.25.7-1.fc31.noarch
292293-Mar 18 11:35:20 dnf[665]:   python2-webencodings-0.5.1-8.fc31.noarch
292294-Mar 18 11:35:20 dnf[665]:   python2-zope-event-4.2.0-13.fc31.noarch
292295-Mar 18 11:35:20 dnf[665]:   python2-zope-interface-4.6.0-2.fc31.x86_64
292296-Mar 18 11:35:20 dnf[665]: Failed:
292297:Mar 18 11:35:20 dnf[665]:   crun-0.12.2.1-1.fc32.x86_64
292298:Mar 18 11:35:20 dnf[665]:   crun-0.13-1.fc31.x86_64

and 'runc':

# fgrep -B 5 -n 'runc' dmesg-1.txt
3728-Mar 18 07:59:26 dnf[665]:  rubygem-openssl                               x86_64  2.1.2-127.fc32                      fedora                 159 k
3729-Mar 18 07:59:26 dnf[665]:  rubygem-psych                                 x86_64  3.1.0-127.fc32                      fedora                  51 k
3730-Mar 18 07:59:26 dnf[665]:  rubygem-rdoc                                  noarch  6.2.1-127.fc32                      fedora                 405 k
3731-Mar 18 07:59:26 dnf[665]:  rubygems                                      noarch  3.1.2-127.fc32                      fedora                 249 k
3732-Mar 18 07:59:26 dnf[665]:  rubypick                                      noarch  1.1.1-12.fc32                       fedora                 9.8 k
3733:Mar 18 07:59:26 dnf[665]:  runc                                          x86_64  2:1.0.0-144.dev.gite6555cc.fc32     fedora                 2.7 M
--
268346-Mar 18 09:15:29 audit[665]: AVC avc:  denied  { mac_admin } for  pid=665 comm="dnf" capability=33  scontext=system_u:system_r:rpm_t:s0 tcontext=system_u:system_r:rpm_t:s0 tclass=capability2 permissive=0
268347-Mar 18 09:15:29 audit: SELINUX_ERR op=setxattr invalid_context="system_u:object_r:container_runtime_exec_t:s0"
268348-Mar 18 09:15:29 kernel: audit: type=1400 audit(1584537329.078:131140): avc:  denied  { mac_admin } for  pid=665 comm="dnf" capability=33  scontext=system_u:system_r:rpm_t:s0 tcontext=system_u:system_r:rpm_t:s0 tclass=capability2 permissive=0
268349-Mar 18 09:15:29 kernel: audit: type=1401 audit(1584537329.078:131140): op=setxattr invalid_context="system_u:object_r:container_runtime_exec_t:s0"
268350-Mar 18 09:15:29 dnf[665]:   Upgrading        : NetworkManager-team-1:1.22.8-1.fc32.x86_64       1368/5740
268351:Mar 18 09:15:29 dnf[665]:   Upgrading        : runc-2:1.0.0-144.dev.gite6555cc.fc32.x86_64      1369/5740
268352:Mar 18 09:15:29 dnf[665]: error: lsetfilecon: (/usr/bin/runc;5e721555, system_u:object_r:container_runtime_exec_t:s0) Invalid argument
268353-Mar 18 09:15:29 dnf[665]: error: Plugin selinux: hook fsm_file_prepare failed
268354:Mar 18 09:15:29 dnf[665]: Error unpacking rpm package runc-2:1.0.0-144.dev.gite6555cc.fc32.x86_64
268355-Mar 18 09:15:29 dnf[665]:   Upgrading        : device-mapper-event-1.02.167-2.fc32.x86_64       1370/5740
268356:Mar 18 09:15:29 dnf[665]: error: unpacking of archive failed on file /usr/bin/runc;5e721555: cpio: (error 0x2)
268357:Mar 18 09:15:29 dnf[665]: error: runc-2:1.0.0-144.dev.gite6555cc.fc32.x86_64: install failed
--
277261-Mar 18 10:38:17 dnf[665]:   Cleanup          : xdg-dbus-proxy-0.1.2-1.fc31.x86_64               4359/5740
277262-Mar 18 10:38:19 dnf[665]:   Cleanup          : woff2-1.0.2-6.fc31.x86_64                        4360/5740
277263-Mar 18 10:38:20 dnf[665]:   Cleanup          : libwpe-1.4.0-1.fc31.x86_64                       4361/5740
277264-Mar 18 10:38:21 dnf[665]:   Cleanup          : libsigc++20-2.10.2-2.fc31.x86_64                 4362/5740
277265-Mar 18 10:38:21 dnf[665]:   Cleanup          : criu-3.13-5.fc31.x86_64                          4363/5740
277266:Mar 18 10:38:21 dnf[665]: error: runc-2:1.0.0-102.dev.gitdc9208a.fc31.x86_64: erase skipped
--
288592-Mar 18 11:19:19 dnf[665]:   Verifying        : rubygem-rdoc-6.1.2-124.fc31.noarch               4965/5740
288593-Mar 18 11:19:19 dnf[665]:   Verifying        : rubygems-3.1.2-127.fc32.noarch                   4966/5740
288594-Mar 18 11:19:19 dnf[665]:   Verifying        : rubygems-3.0.3-124.fc31.noarch                   4967/5740
288595-Mar 18 11:19:19 dnf[665]:   Verifying        : rubypick-1.1.1-12.fc32.noarch                    4968/5740
288596-Mar 18 11:19:19 dnf[665]:   Verifying        : rubypick-1.1.1-11.fc31.noarch                    4969/5740
288597:Mar 18 11:19:19 dnf[665]:   Verifying        : runc-2:1.0.0-144.dev.gite6555cc.fc32.x86_64      4970/5740
288598:Mar 18 11:19:19 dnf[665]:   Verifying        : runc-2:1.0.0-102.dev.gitdc9208a.fc31.x86_64      4971/5740
--
292297-Mar 18 11:35:20 dnf[665]:   crun-0.12.2.1-1.fc32.x86_64
292298-Mar 18 11:35:20 dnf[665]:   crun-0.13-1.fc31.x86_64
292299-Mar 18 11:35:20 dnf[665]:   lxc-libs-3.0.4-2.fc31.x86_64
292300-Mar 18 11:35:20 dnf[665]:   lxc-libs-3.2.1-2.fc32.x86_64
292301-Mar 18 11:35:20 dnf[665]:   rtkit-0.11-19.fc29.x86_64
292302:Mar 18 11:35:20 dnf[665]:   runc-2:1.0.0-102.dev.gitdc9208a.fc31.x86_64
292303:Mar 18 11:35:20 dnf[665]:   runc-2:1.0.0-144.dev.gite6555cc.fc32.x86_64

> Running out of disk space is a possible reason from the upgrade failure. The
> dmesg-1.txt log might have more details about the reason for the transaction
> failure.
free disk space on /    is 5.1 GB
free disk space on /var is 414 GB

> 
> (In reply to tomasy from comment #27)
> > About boot with Kernel Panic. The Kernel Panic happens immediately before any log is written to disk
> 
> If this were a kernel bug, it would still be useful to have information
> about the hardware (CPU, memory, peripherals, etc.), which is why I was
> trying to recover some logs.
I can send you the complete log file by using, e.g., Firefox Send.

Comment 34 tomasy 2020-03-21 07:25:09 UTC
Checking for reported errors in journalctl

$ sudo journalctl -b -p err --no-hostname --file=/run/media/liveuser/3TB\ some/@var/log/journal/*/*
-- Logs begin at Sat 2019-10-26 06:35:06 EDT, end at Wed 2020-03-18 11:35:25 EDT. --
Mar 18 09:46:03 root[9309]: pmlogger_daily failed - see /var/log/pcp/pmlogger/pmlogger_daily-K.log
Mar 18 09:46:03 root[9314]: pmlogger_check failed - see /var/log/pcp/pmlogger/pmlogger_check.log
Mar 18 11:35:24 systemd[1]: Failed to start System Upgrade using DNF.

## dnf upgrade
Mar 18 11:35:20 dnf[665]:   python2-zope-event-4.2.0-13.fc31.noarch
Mar 18 11:35:20 dnf[665]:   python2-zope-interface-4.6.0-2.fc31.x86_64
Mar 18 11:35:20 dnf[665]: Failed:
Mar 18 11:35:20 dnf[665]:   crun-0.12.2.1-1.fc32.x86_64
Mar 18 11:35:20 dnf[665]:   crun-0.13-1.fc31.x86_64
Mar 18 11:35:20 dnf[665]:   lxc-libs-3.0.4-2.fc31.x86_64
Mar 18 11:35:20 dnf[665]:   lxc-libs-3.2.1-2.fc32.x86_64
Mar 18 11:35:20 dnf[665]:   rtkit-0.11-19.fc29.x86_64
Mar 18 11:35:20 dnf[665]:   runc-2:1.0.0-102.dev.gitdc9208a.fc31.x86_64
Mar 18 11:35:20 dnf[665]:   runc-2:1.0.0-144.dev.gite6555cc.fc32.x86_64
Mar 18 11:35:20 dnf[665]: Error: Transaction failed
Mar 18 11:35:24 systemd[1]: dnf-system-upgrade.service: Main process exited, code=exited, status=1/FAILURE
Mar 18 11:35:24 systemd[1]: dnf-system-upgrade.service: Failed with result 'exit-code'.
Mar 18 11:35:24 systemd[1]: Failed to start System Upgrade using DNF.
Mar 18 11:35:24 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dnf-system-upgrade comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? termin
al=? res=failed'
Mar 18 11:35:24 systemd[1]: dnf-system-upgrade.service: Triggering OnFailure= dependencies.
Mar 18 11:35:24 systemd[1]: dnf-system-upgrade.service: Consumed 49min 9.626s CPU time.
Mar 18 11:35:24 kernel: audit: type=1130 audit(1584545724.970:131193): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dnf-system-upgrade comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Mar 18 11:35:24 systemd[1]: Reached target Offline System Update.
Mar 18 11:35:24 systemd[1]: Starting System Upgrade using DNF failed...
Mar 18 11:35:24 systemd[1]: Condition check resulted in Remove the Offline System Updates Symlink being skipped.
Mar 18 11:35:24 systemd-logind[737]: System is rebooting.
Mar 18 11:35:25 systemd[1]: systemd-ask-password-plymouth.path: Succeeded.
Mar 18 11:35:25 systemd[1]: Stopped Forward Password Requests to Plymouth Directory Watch.
Mar 18 11:35:25 systemd[1]: mlocate-updatedb.timer: Succeeded.
Mar 18 11:35:25 systemd[1]: Stopped Updates mlocate database every day.

## pmlogger
Mar 18 09:46:03 dnf[665]:   Downgrading      : nodejs-full-i18n-1:12.16.0-1.fc32.x86_64         2355/5740
Mar 18 09:46:03 audit[9177]: AVC avc:  denied  { write } for  pid=9177 comm="restorecon" path="/var/log/pcp/pmlogger/pmlogger_daily-K.log" dev="sdb1" ino=3382994 scontext=system_u:system_r:setfiles_t:s0 tcontext=system_u:object_r:pcp_log_t:s0 tclass=file permissive=0
Mar 18 09:46:03 kernel: audit: type=1400 audit(1584539163.632:131157): avc:  denied  { write } for  pid=9177 comm="restorecon" path="/var/log/pcp/pmlogger/pmlogger_daily-K.log" dev="sdb1" ino=3382994 scontext=system_u:system_r:setfiles_t:s0 tcontext=system_u:object_r:pcp_log_t:s0 tclass=file permissive=0
Mar 18 09:46:03 kernel: audit: type=1400 audit(1584539163.632:131158): avc:  denied  { write } for  pid=9177 comm="restorecon" path="/var/log/pcp/pmlogger/pmlogger_daily-K.log" dev="sdb1" ino=3382994 scontext=system_u:system_r:setfiles_t:s0 tcontext=system_u:object_r:pcp_log_t:s0 tclass=file permissive=0
Mar 18 09:46:03 audit[9177]: AVC avc:  denied  { write } for  pid=9177 comm="restorecon" path="/var/log/pcp/pmlogger/pmlogger_daily-K.log" dev="sdb1" ino=3382994 scontext=system_u:system_r:setfiles_t:s0 tcontext=system_u:object_r:pcp_log_t:s0 tclass=file permissive=0
Mar 18 09:46:03 root[9309]: pmlogger_daily failed - see /var/log/pcp/pmlogger/pmlogger_daily-K.log
Mar 18 09:46:03 root[9314]: pmlogger_check failed - see /var/log/pcp/pmlogger/pmlogger_check.log
Mar 18 09:46:05 dnf[665]:   Downgrading      : npm-1:6.13.4-1.12.16.0.1.fc32.x86_64             2356/5740
Mar 18 09:46:06 dnf[665]:   Downgrading      : nodejs-1:12.16.0-1.fc32.x86_64                   2357/5740

Comment 35 tomasy 2020-03-21 11:22:23 UTC
(In reply to Steve from comment #30)
> > Running out of disk space is a possible reason from the upgrade failure.
> ...
> 
> Or running out of memory:
> 
> $ cat dmesg-1.txt | egrep -i 'memory|oom|killed process' | less
> 

I cannot see any problem:
]# cat dmesg-1.txt | egrep -i 'memory|oom|killed process'
Mar 18 07:57:49 kernel: Early memory node ranges
Mar 18 07:57:49 kernel: PM: Registered nosave memory: [mem 0x00000000-0x00000fff]
Mar 18 07:57:49 kernel: PM: Registered nosave memory: [mem 0x0009f000-0x0009ffff]
Mar 18 07:57:49 kernel: PM: Registered nosave memory: [mem 0x000a0000-0x000dffff]
Mar 18 07:57:49 kernel: PM: Registered nosave memory: [mem 0x000e0000-0x000fffff]
Mar 18 07:57:49 kernel: PM: Registered nosave memory: [mem 0xbffa0000-0xbffadfff]
Mar 18 07:57:49 kernel: PM: Registered nosave memory: [mem 0xbffae000-0xbffeffff]
Mar 18 07:57:49 kernel: PM: Registered nosave memory: [mem 0xbfff0000-0xbfffffff]
Mar 18 07:57:49 kernel: PM: Registered nosave memory: [mem 0xc0000000-0xfebfffff]
Mar 18 07:57:49 kernel: PM: Registered nosave memory: [mem 0xfec00000-0xfec00fff]
Mar 18 07:57:49 kernel: PM: Registered nosave memory: [mem 0xfec01000-0xfedfffff]
Mar 18 07:57:49 kernel: PM: Registered nosave memory: [mem 0xfee00000-0xfeefffff]
Mar 18 07:57:49 kernel: PM: Registered nosave memory: [mem 0xfef00000-0xffefffff]
Mar 18 07:57:49 kernel: PM: Registered nosave memory: [mem 0xfff00000-0xffffffff]
Mar 18 07:57:49 kernel: Memory: 6036280K/6290680K available (14339K kernel code, 2297K rwdata, 4976K rodata, 2556K init, 4156K bss, 254400K reserved, 0K cma-reserved)
Mar 18 07:57:49 kernel: Freeing SMP alternatives memory: 36K
Mar 18 07:57:49 kernel: x86/mm: Memory block size: 128MB
Mar 18 07:57:49 kernel: Freeing initrd memory: 37288K
Mar 18 07:57:49 kernel: Non-volatile memory driver v1.3
Mar 18 07:57:49 kernel: Freeing unused decrypted memory: 2040K
Mar 18 07:57:49 kernel: Freeing unused kernel image (initmem) memory: 2556K
Mar 18 07:57:49 kernel: Freeing unused kernel image (text/rodata gap) memory: 2044K
Mar 18 07:57:49 kernel: Freeing unused kernel image (rodata/data gap) memory: 1168K
Mar 18 07:57:49 kernel: [TTM] Zone  kernel: Available graphics memory: 3040706 KiB
Mar 18 07:57:49 kernel: [TTM] Zone   dma32: Available graphics memory: 2097152 KiB
Mar 18 07:57:49 kernel: [drm] amdgpu: 2048M of VRAM memory ready
Mar 18 07:57:49 kernel: [drm] amdgpu: 3072M of GTT memory ready.
Mar 18 07:57:49 kernel: [drm] DM_PPLIB:    memory_max_clock: 175000
Mar 18 07:57:49 kernel: [drm] DM_PPLIB: values for Memory clock
Mar 18 07:57:49 kernel: [drm] DM_PPLIB:    memory_max_clock: 175000
Mar 18 07:59:23 dnf[665]:  low-memory-monitor                            x86_64  2.0-4.fc32                          fedora                  34 k
Mar 18 07:59:23 dnf[665]:  foomatic                                      x86_64  4.0.13-10.fc32                      fedora                 237 k
Mar 18 07:59:23 dnf[665]:  foomatic-db                                   noarch  4.0-65.20190128.fc32                fedora                 1.2 M
Mar 18 07:59:23 dnf[665]:  foomatic-db-filesystem                        noarch  4.0-65.20190128.fc32                fedora                 8.3 k
Mar 18 07:59:23 dnf[665]:  foomatic-db-ppds                              noarch  4.0-65.20190128.fc32                fedora                  53 M
Mar 18 08:36:22 dnf[665]:   Upgrading        : foomatic-db-filesystem-4.0-65.20190128.fc32.no     45/5740
Mar 18 09:25:18 dnf[665]:   Installing       : low-memory-monitor-2.0-4.fc32.x86_64             1644/5740
Mar 18 09:25:18 dnf[665]:   Running scriptlet: low-memory-monitor-2.0-4.fc32.x86_64             1644/5740
Mar 18 09:44:04 dnf[665]:   Upgrading        : foomatic-db-ppds-4.0-65.20190128.fc32.noarch     2306/5740
Mar 18 09:44:16 dnf[665]:   Upgrading        : foomatic-db-4.0-65.20190128.fc32.noarch          2307/5740
Mar 18 09:46:35 dnf[665]:   Upgrading        : foomatic-4.0.13-10.fc32.x86_64                   2374/5740
Mar 18 09:46:36 dnf[665]:   Running scriptlet: foomatic-4.0.13-10.fc32.x86_64                   2374/5740
Mar 18 10:08:59 dnf[665]:   Cleanup          : foomatic-4.0.13-9.fc31.x86_64                    3065/5740
Mar 18 10:25:52 dnf[665]:   Cleanup          : foomatic-db-4.0-64.20190128.fc31.noarch          3819/5740
Mar 18 10:25:53 dnf[665]:   Cleanup          : foomatic-db-ppds-4.0-64.20190128.fc31.noarch     3820/5740
Mar 18 10:46:30 dnf[665]:   Cleanup          : foomatic-db-filesystem-4.0-64.20190128.fc31.no   4647/5740
Mar 18 11:14:41 dnf[665]:   Verifying        : low-memory-monitor-2.0-4.fc32.x86_64              201/5740
Mar 18 11:15:30 dnf[665]:   Verifying        : foomatic-4.0.13-10.fc32.x86_64                   1087/5740
Mar 18 11:15:30 dnf[665]:   Verifying        : foomatic-4.0.13-9.fc31.x86_64                    1088/5740
Mar 18 11:15:30 dnf[665]:   Verifying        : foomatic-db-4.0-65.20190128.fc32.noarch          1089/5740
Mar 18 11:15:30 dnf[665]:   Verifying        : foomatic-db-4.0-64.20190128.fc31.noarch          1090/5740
Mar 18 11:15:30 dnf[665]:   Verifying        : foomatic-db-filesystem-4.0-65.20190128.fc32.no   1091/5740
Mar 18 11:15:30 dnf[665]:   Verifying        : foomatic-db-filesystem-4.0-64.20190128.fc31.no   1092/5740
Mar 18 11:15:30 dnf[665]:   Verifying        : foomatic-db-ppds-4.0-65.20190128.fc32.noarch     1093/5740
Mar 18 11:15:30 dnf[665]:   Verifying        : foomatic-db-ppds-4.0-64.20190128.fc31.noarch     1094/5740
Mar 18 11:35:17 dnf[665]:   foomatic-4.0.13-10.fc32.x86_64
Mar 18 11:35:17 dnf[665]:   foomatic-db-4.0-65.20190128.fc32.noarch
Mar 18 11:35:17 dnf[665]:   foomatic-db-filesystem-4.0-65.20190128.fc32.noarch
Mar 18 11:35:17 dnf[665]:   foomatic-db-ppds-4.0-65.20190128.fc32.noarch
Mar 18 11:35:20 dnf[665]:   low-memory-monitor-2.0-4.fc32.x86_64

Comment 36 tomasy 2020-03-21 11:28:46 UTC
(In reply to Steve from comment #31)
> There are dnf logs in /var/log/. This finds the dnf logs with
> "system-upgrade" commands:
> 
> # grep 'Extra commands.*system-upgrade' /var/log/dnf.log*

$ sudo grep 'Extra commands.*system-upgrade' /run/media/liveuser/3TB\ some/@var/log/dnf.log*
/run/media/liveuser/3TB some/@var/log/dnf.log.2:2020-03-18T11:57:10Z DDEBUG Extra commands: ['system-upgrade', 'reboot']
/run/media/liveuser/3TB some/@var/log/dnf.log.2:2020-03-18T11:58:03Z DDEBUG Extra commands: ['system-upgrade', 'upgrade']
/run/media/liveuser/3TB some/@var/log/dnf.log.3:2020-03-18T07:27:07Z DDEBUG Extra commands: ['system-upgrade', 'download', '--releasever=32']
/run/media/liveuser/3TB some/@var/log/dnf.log.3:2020-03-18T08:32:09Z DDEBUG Extra commands: ['system-upgrade', 'download', '--releasever=32']
/run/media/liveuser/3TB some/@var/log/dnf.log.3:2020-03-18T08:34:40Z DDEBUG Extra commands: ['system-upgrade', 'download', '--releasever=32']
/run/media/liveuser/3TB some/@var/log/dnf.log.3:2020-03-18T08:36:04Z DDEBUG Extra commands: ['system-upgrade', 'download', '--releasever=32', '--skip-broken']
/run/media/liveuser/3TB some/@var/log/dnf.log.3:2020-03-18T08:37:58Z DDEBUG Extra commands: ['system-upgrade', 'download', '--releasever=32']
/run/media/liveuser/3TB some/@var/log/dnf.log.3:2020-03-18T09:25:59Z DDEBUG Extra commands: ['system-upgrade', 'download', '--releasever=32']

Comment 37 Steve 2020-03-21 12:27:43 UTC
See if this gives you a smaller file that you can review for sensitive information:

$ sudo journalctl -k --no-hostname --file=/run/media/liveuser/3TB\ some/@var/log/journal/*/* > /tmp/dmesg-2.txt

Replace any strings that you don't want in an attachment with "XXX".
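
For example, a minimal redaction sketch in the same spirit (the string "myhost" and the sample file are placeholders; substitute whatever actually needs hiding):

```shell
# Throwaway sample standing in for /tmp/dmesg-2.txt.
printf 'Mar 18 kernel: booting on myhost\n' > /tmp/redact-sample.txt
# Replace the sensitive string everywhere; -i.bak keeps a backup copy.
sed -i.bak 's/myhost/XXX/g' /tmp/redact-sample.txt
cat /tmp/redact-sample.txt
```

The same sed invocation, pointed at the real /tmp/dmesg-2.txt, redacts it in place.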

Comment 38 Steve 2020-03-21 13:44:34 UTC
Mar 18 09:46:03 audit[9177]: AVC avc:  denied  { write } for  pid=9177 comm="restorecon" path="/var/log/pcp/pmlogger/pmlogger_daily-K.log" dev="sdb1" ino=3382994 scontext=system_u:system_r:setfiles_t:s0 tcontext=system_u:object_r:pcp_log_t:s0 tclass=file permissive=0

"sdb1" is not in your lsblk output (Comment 12). What does this show:

$ cat /etc/fstab  # (Adjust the path according to how you mount the file system that has "fstab".)

AVCs are error messages from selinux. There are some selinux-related messages in the attached file (dmesg-1-error_fail.txt). For example:

Mar 18 09:07:05 dnf[665]: /usr/sbin/semodule:  Failed!

Let's try to get some context for that message:

$ fgrep -C 20 -m 1 '/usr/sbin/semodule:  Failed!' dmesg-1.txt

"20" is arbitrary.

Comment 39 Steve 2020-03-21 14:34:49 UTC
Comment 36:

$ sudo grep 'Extra commands.*system-upgrade' /run/media/liveuser/3TB\ some/@var/log/dnf.log*
/run/media/liveuser/3TB some/@var/log/dnf.log.2:2020-03-18T11:57:10Z DDEBUG Extra commands: ['system-upgrade', 'reboot']
/run/media/liveuser/3TB some/@var/log/dnf.log.2:2020-03-18T11:58:03Z DDEBUG Extra commands: ['system-upgrade', 'upgrade']
...

Can you review, compress, and attach dnf.log.2.

Compress with "xz":

$ cp -ip /run/media/liveuser/3TB\ some/@var/log/dnf.log.2 /tmp/dnf.log.2

$ xz /tmp/dnf.log.2

Attach /tmp/dnf.log.2.xz.
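
If you want to see what xz does before running it on the real log (a throwaway illustration; the sample file is made up): by default xz replaces the input with a .xz file, so add -k if you want to keep the original.

```shell
# Build a small stand-in log and compress it.
seq 1 1000 | sed 's/^/dnf log line /' > /tmp/dnf-sample.log
xz -f /tmp/dnf-sample.log    # creates /tmp/dnf-sample.log.xz and removes the input
ls -l /tmp/dnf-sample.log.xz
```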

CAUTION: The "Files" app mounts file systems read-write.

Comment 40 Steve 2020-03-21 15:06:47 UTC
(In reply to tomasy from comment #33)
...
> ]# fgrep -B 10 -n 'Transaction failed' dmesg-1.txt
...

Nice work investigating that. Those are all packaging problems. So the kernel and your system are off the hook. :-)

It's hard to believe problems with upgrading a few non-essential packages would leave a system unbootable.

Comment 41 Steve 2020-03-21 15:11:24 UTC
(In reply to Steve from comment #40)
...
> It's hard to believe problems with upgrading a few non-essential packages would leave a system unbootable.

Could you confirm that your F31 system was fully updated before you started the system upgrade?

Comment 42 Steve 2020-03-21 15:42:04 UTC
Comment 34:

## pmlogger
Mar 18 09:46:03 dnf[665]:   Downgrading      : nodejs-full-i18n-1:12.16.0-1.fc32.x86_64         2355/5740
... AVCs snipped ...
Mar 18 09:46:03 root[9309]: pmlogger_daily failed - see /var/log/pcp/pmlogger/pmlogger_daily-K.log
Mar 18 09:46:03 root[9314]: pmlogger_check failed - see /var/log/pcp/pmlogger/pmlogger_check.log
Mar 18 09:46:05 dnf[665]:   Downgrading      : npm-1:6.13.4-1.12.16.0.1.fc32.x86_64             2356/5740
Mar 18 09:46:06 dnf[665]:   Downgrading      : nodejs-1:12.16.0-1.fc32.x86_64                   2357/5740

It looks like pmlogger was running during the upgrade. That scenario probably doesn't get a lot of testing.

The AVCs seem to indicate that the selinux context is wrong or that selinux-policy needs to be changed. There appears to be a bug against selinux-policy for that:

Bug 1778449 - SELinux is preventing restorecon from 'write' accesses on the file /var/log/pcp/pmlogger/pmlogger_daily-K.log.

Comment 43 Steve 2020-03-21 15:47:21 UTC
(In reply to tomasy from comment #34)
> Checking for reported errors in journalctl
> 
> $ sudo journalctl -b -p err --no-hostname --file=/run/media/liveuser/3TB\ some/@var/log/journal/*/*
                       ^^^^^^

Good idea. I didn't know about the "-p" option, even though it is in the journalctl man page. :-)

Comment 44 tomasy 2020-03-21 17:11:10 UTC
From my point of view there are 3 issues. (I think this has happened to me before, but I am not sure.)
1. "Failed to start System Upgrade using DNF": what should happen in this case?
   Will the system be left in an unbootable state if this happens?
2. Should it be possible to boot with an old kernel if the upgrade failed at
   the end of a system upgrade using dnf?
3. Why do we get a kernel panic?
I sent a download link to my logfile to your mail address.

Comment 45 Steve 2020-03-21 17:44:14 UTC
(In reply to tomasy from comment #44)

I suggest changing the bug component to "dnf-plugins-extras", which is for "dnf system-upgrade" bugs.

I also suggest updating the bug summary to:

"boot failure after failed dnf system-upgrade from F31 to F32 Beta (was: Kernel panic during boot on kernel-5.6.0-0.rc5.git0.2.fc32.x86_64)"

> From my point of view there are 3 issues. (I think this has happened to me before, but I am not sure.)
> 1. "Failed to start System Upgrade using DNF": what should happen in this case?
>    Will the system be left in an unbootable state if this happens?

This shouldn't happen, but the upgrade was to BETA software, so it hasn't been fully tested. Unfortunately, you "tested" dnf system-upgrade with a prime system, rather than with a test system. The Fedora Project needs to do a better job of warning people of the risks of installing pre-release software on prime systems.

> 2. Should it be possible to boot with an old kernel if the upgrade failed at the end of a system upgrade using dnf?

It is possible to boot the rescue kernel, but systemd needs to have all its files and executables installed and readable too. See Comment 32.

> 3. Why do we get a Kernel Panic?

Probably because there are missing or unreadable kernel modules. The rescue kernel boots because it has a huge initramfs to start from:

$ ll /boot/initramfs-*

See "man lsinitrd" for more.

> I sent a download link to my logfile to your mail address

It would be better to attach "dnf.log.2" and to save everything in /var/log/ for future reference.

Comment 46 tomasy 2020-03-21 17:53:30 UTC
One more thing. My / is a subvolume on a btrfs disk, so my /etc/fstab looks something like
UUID=xxxxx / btrfs subvol=root_mario 0 0

and boot

linux ..vmlinuz.... root=UUID=xxxx ro rootflags=subvol=root_mario

Perhaps not the most common configuration; the same goes for /var, which is a subvolume on another btrfs disk.

Comment 47 tomasy 2020-03-21 17:58:33 UTC
(In reply to Steve from comment #45)
> (In reply to tomasy from comment #44)
... 
> This shouldn't happen, but the upgrade was to BETA software, so it hasn't
> been fully tested. Unfortunately, you "tested" dnf system-upgrade with a
> prime system, rather than with a test system. The Fedora Project needs to do
> a better job of warning people of the risks of installing pre-release
> software on prime systems.

I know I did an update to a Beta, so I know the consequences. Upgrading just one test
machine to another test release will not find all problems. I just want to test,
so I do not think they need to be clearer.

Comment 48 Steve 2020-03-21 18:09:59 UTC
(In reply to Steve from comment #45)
...
> Probably because there are missing or unreadable kernel modules.
...

In the first screenshot (with the panic) there are no modules listed after "Modules linked in:". If you want to investigate, look here:

$ ls /lib/modules/

In an installed system, a diagnostic test would be:

$ rpm -Va kernel\*

Comment 49 Steve 2020-03-21 18:26:02 UTC
(In reply to tomasy from comment #46)
> One more thing. My / is a subvolume in a btrfs disk so my /etc/fstab looks something like
> UUID=xxxxx / btrfs subvol=root_mario 0 0
> 
> and boot
> 
> linux ..vmlinuz.... root=UUID=xxxx ro rootflags=subvol=root_mario
> 
> Perhaps not the most common configuration the same for /var is a subvolume
> on another btrfs disk

Thanks for pointing that out. The log should show any problems with mounting those:

$ fgrep -C 4 -n 'mounted' dmesg-1.txt | less

Systemd creates some units with encoded names:

$ systemctl list-units

lsblk shows a label with a space in it (Comment 12):

└─sdc1      465.8G root  disk  brw-rw---- btrfs    3TB some       e1e3ef48-7d30-4e84-be3c-46999e59ffd5

Perhaps that causes a problem:

$ fgrep -C 4 -n '3TB' dmesg-1.txt | less

Comment 50 Steve 2020-03-21 20:00:39 UTC
(In reply to Steve from comment #48)
... 
> In an installed system, a diagnostic test would be:
> 
> $ rpm -Va kernel\*

You might be able to do that from the dracut shell.

From the grub2 menu select the rescue kernel and append this to the kernel command-line:

rd.break

Boot.

If the dracut command prompt is displayed, you can run a limited set of Linux commands.

This should show that /sysroot has a file system mounted on it:

# mount

# chroot /sysroot
# rpm -Va kernel\*

Exit the chroot shell with:

# exit
# reboot

Documentation: "man dracut.cmdline"

Tested in an F31 VM.

Comment 51 Steve 2020-03-21 20:09:28 UTC
(In reply to Steve from comment #50)
...
> You might be able to do that from the dracut shell.
> 
> From the grub2 menu select the rescue kernel and append this to the kernel command-line:
> 
> rd.break
...

Change "ro" to "rw" and append "rd.break".

Then you can run dnf out of the cache:

# chroot /sysroot
# dnf -C list kernel

NB: There is no networking.

Comment 52 Steve 2020-03-22 02:09:18 UTC
I could not reproduce the dnf system-upgrade failure in a VM: F31 fully updated -> F32.

One difference is that crun was not downgraded as it was in Comment 33. Both were crun-0.13-1.

$ journalctl -b -2 --no-hostname
...
Mar 21 16:47:37 dnf[605]: Transaction Summary
Mar 21 16:47:37 dnf[605]: ========================
Mar 21 16:47:37 dnf[605]: Install      30 Packages
Mar 21 16:47:37 dnf[605]: Upgrade    1596 Packages
Mar 21 16:47:37 dnf[605]: Downgrade    33 Packages
Mar 21 16:47:58 dnf[605]: Total size: 1.5 G
...

Comment 53 Steve 2020-03-22 06:04:58 UTC
It's possible there is a problem with the selinux contexts. It's easy to test. Select the rescue kernel in grub2 and add this to the kernel command-line:

enforcing=0

Boot.

Tested in a VM with:

$ getenforce
Permissive

Documentation:

The kernel’s command-line parameters
https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html

Comment 54 tomasy 2020-03-22 12:07:26 UTC
(In reply to Steve from comment #49)
> (In reply to tomasy from comment #46)
> > One more thing. My / is a subvolume in a btrfs disk so my /etc/fstab looks something like
> > UUID=xxxxx / btrfs subvol=root_mario 0 0
> > 
> > and boot
> > 
> > linux ..vmliuz.... root=UUID=xxxx ro rootflags=subvol=root_mario
> > 
> > Perhaps not the most common configuration the same for /var is a subvolume
> > on another btrfs disk
> 
> Thanks for pointing that out. The log should show any problems with mounting
> those:
> 
But this is in grub, and if the kernel cannot load "root=UUID=xxx", where should it log? You said no modules are loaded. Where are they loaded from? From "root"? In that case it cannot be read, because it is a btrfs subvolume.

Comment 55 tomasy 2020-03-22 13:30:33 UTC
(In reply to Steve from comment #53)
> It's possible there is a problem with the selinux contexts. It's easy to
> test. Select the rescue kernel in grub2 and add this to the kernel
> command-line:
> 
> enforcing=0

Rescue mode will boot now when adding enforcing=0. I then relabeled the filesystem, but it will not boot in rescue mode without enforcing=0. The rescue kernel is rather old: it is 4.5.2-301.fc24.x86_64. How do I update the kernel for rescue?

Comment 56 Steve 2020-03-22 14:22:08 UTC
(In reply to tomasy from comment #55)
> (In reply to Steve from comment #53)
> > It's possible there is a problem with the selinux contexts. It's easy to
> > test. Select the rescue kernel in grub2 and add this to the kernel
> > command-line:
> > 
> > enforcing=0
> 
> Rescue mode will boot now when adding enforcing=0.

That's very good to hear.

> I then relabeled the filesystem but it will not boot in rescue mode without enforcing=0.

How did you relabel?

# fixfiles onboot # per "man fixfiles"

In "Permissive" mode, "AVC"s are still logged, so after booting:

$ journalctl -b | grep AVC > /tmp/avc-1.txt

Please review and attach. (NB: I don't know how big that file will be.)
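
If the file is large, the interesting fields can be summarized before attaching. A sketch using one AVC line (trimmed) already quoted in this bug:

```shell
# One AVC line (shortened) from the journal excerpt earlier in this report.
line='audit[9177]: AVC avc:  denied  { write } for  pid=9177 comm="restorecon" path="/var/log/pcp/pmlogger/pmlogger_daily-K.log" scontext=system_u:system_r:setfiles_t:s0 tcontext=system_u:object_r:pcp_log_t:s0 tclass=file permissive=0'
# Pull out the denied permission and the target context.
echo "$line" | grep -o 'denied  { [a-z_]* }'   # -> denied  { write }
echo "$line" | grep -o 'tcontext=[^ ]*'        # -> tcontext=system_u:object_r:pcp_log_t:s0
```

Piping the whole avc-1.txt through the same two greps (plus "sort | uniq -c") gives a compact survey of what was denied.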

> Kernel is rather old for rescue. it is 4.5.2-301.fc24.x86_64.

There is a bug report about that problem:

Bug 1768132 - if new kernel version needs new rescue kernel, say so or make it 

> How do I update kernel for rescue?

I've been wondering about that too. "man mkinitrd" is good place to start.

Comment 57 Steve 2020-03-22 14:47:17 UTC
Could you also verify all installed packages:

# rpm -Va

That will take a while. In my F32 VM, that generates about 50 lines of output.

Comment 58 tomasy 2020-03-22 15:49:02 UTC
(In reply to Steve from comment #56)
> (In reply to tomasy from comment #55)
> 
> How did you relabel?
fixfiles relabel

> In "Permissive" mode, "AVC"s are still logged, so after booting:
> 
> $ journalctl -b | grep AVC > /tmp/avc-1.txt
> 
I will attach the logs, but why rescue mode is not working is unrelated to why this bug was created. The bug is about why I get a kernel panic after upgrading from F31 to F32 Beta.

Comment 59 tomasy 2020-03-22 15:50:37 UTC
Created attachment 1672313 [details]
journalctl -b | grep AVC > avc-1.txt

Comment 60 Steve 2020-03-22 16:07:58 UTC
(In reply to tomasy from comment #58)
...
> I will attach the logs, but why the rescue mode is not working is not
> related to why this bug was created. The bug is why I get a kernel panic
> after upgrade from F31 to F32 Beta.

Thanks for attaching avc-1.txt. I am not an selinux expert, so let's move on to that problem. What do these show:

$ ls -lF /boot
$ du -s /lib/modules/*

# rpm -Va

Comment 61 Steve 2020-03-22 16:14:40 UTC
(In reply to Steve from comment #60)
...
> $ ls -lF /boot
> $ du -s /lib/modules/*
> 
> # rpm -Va

One more:

$ rpm -qa kernel\* | sort

Comment 62 Steve 2020-03-22 18:09:27 UTC
(In reply to tomasy from comment #54)
...
> But this is in grub and if he kernel can not load "root=UUID=xxx" where
> shall it log? You said no modules are loaded. Where are they loaded from?
> From "root" and in that case it can not read because it is a btrfs subvolume.

Good point. I'm not an expert on how the kernel boots, but the kernel initially loads modules from the initramfs, so if there is no initramfs, or it wasn't built correctly, the kernel couldn't load any modules.

What do these show:

$ ls -lF /boot

# lsinitrd /boot/initramfs-5.6.0-0.rc5.git0.2.fc32.x86_64.img | fgrep 'usr/lib/modules/' | wc -l

To read btrfs file systems, grub2 probably needs to load the "btrfs" module:

# grep insmod /boot/grub2/grub.cfg
# ls /boot/grub2/i386-pc/btrfs.mod

Comment 63 Steve 2020-03-22 18:37:48 UTC
From Comment 12:

├─sda1        500M root  disk  brw-rw---- ext4                    ac517bba-03bb-44c8-bac4-bb92ab386826

If that is your "/boot" partition, grub2 doesn't need to load the "btrfs" module, just the "ext4" module.

However, the "btrfs" kernel module is not linked with the kernel:

# egrep 'EXT4_FS=|BTRFS_FS=' /boot/config-5.6.0-0.rc5.git0.2.fc32.x86_64 
CONFIG_EXT4_FS=y
CONFIG_BTRFS_FS=m

And it is not in the initramfs in my F32 VM:

# lsinitrd /boot/initramfs-5.6.0-0.rc5.git0.2.fc32.x86_64.img | grep btrfs

So it has to be loaded from the file system:

# ls /lib/modules/5.6.0-0.rc5.git0.2.fc32.x86_64/kernel/fs/btrfs/
btrfs.ko.xz
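
The "=y" vs "=m" distinction above is the crux: "=y" drivers are compiled into the kernel image itself, while "=m" drivers must be loaded later, from the initramfs or the root filesystem. A tiny illustration on a stand-in config fragment (file name and contents are a made-up sample):

```shell
# Stand-in for the /boot/config-* lines quoted above.
printf 'CONFIG_EXT4_FS=y\nCONFIG_BTRFS_FS=m\n' > /tmp/config-sample
# List only the options built as loadable modules.
awk -F= '$2 == "m" { print $1 }' /tmp/config-sample   # -> CONFIG_BTRFS_FS
```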

Comment 64 Steve 2020-03-22 18:48:41 UTC
(In reply to Steve from comment #63)
...
> And it is not in the initramfs in my F32 VM:
> 
> # lsinitrd /boot/initramfs-5.6.0-0.rc5.git0.2.fc32.x86_64.img | grep btrfs
...

If that is true for your system, it might be sufficient to reinstall the kernel while booted from the rescue kernel:

# dnf reinstall kernel\*-5.6.0-0.rc5.git0.2.fc32

NB: The "dracut" command creates the initramfs: "man dracut".

The "btrfs" module should be added automatically to the initramfs, but, if not, see "Adding Kernel Modules" in the "dracut" man page.

Comment 65 Steve 2020-03-22 19:10:40 UTC
(In reply to Steve from comment #63)
...
> And it is not in the initramfs in my F32 VM:
> 
> # lsinitrd /boot/initramfs-5.6.0-0.rc5.git0.2.fc32.x86_64.img | grep btrfs
...

However, the rescue initramfs in my F32 VM, DOES have btrfs:

# lsinitrd /boot/initramfs-0-rescue-4661409848ef4e829dad2d41f100ac09.img | grep btrfs
btrfs
drwxr-xr-x   2 root     root            0 Jul 24  2019 usr/lib/modules/5.3.7-301.fc31.x86_64/kernel/fs/btrfs
-rw-r--r--   1 root     root       528952 Jul 24  2019 usr/lib/modules/5.3.7-301.fc31.x86_64/kernel/fs/btrfs/btrfs.ko.xz
-rw-r--r--   1 root     root          616 Jul 24  2019 usr/lib/udev/rules.d/64-btrfs.rules
-rwxr-xr-x   1 root     root       943456 Jul 24  2019 usr/sbin/btrfs
lrwxrwxrwx   1 root     root            5 Jul 24  2019 usr/sbin/btrfsck -> btrfs
-rwxr-xr-x   1 root     root         1189 Jul 24  2019 usr/sbin/fsck.btrfs

Note that the kernel version is for the F31 install that I started with: 5.3.7-301.fc31.x86_64.

Comment 66 Steve 2020-03-22 20:08:31 UTC
(In reply to Steve from comment #64)
...
> If that is true for your system, it might be sufficient to reinstall the kernel while booted from the rescue kernel:
> 
> # dnf reinstall kernel\*-5.6.0-0.rc5.git0.2.fc32
> 
> NB: The "dracut" command creates the initramfs: "man dracut".
...

Tested in an F32 VM:

Boot from the rescue kernel.

# cd /boot

# mv -i initramfs-5.6.0-0.rc5.git0.2.fc32.x86_64.img initramfs-5.6.0-0.rc5.git0.2.fc32.x86_64.img.BAK1
# dracut initramfs-5.6.0-0.rc5.git0.2.fc32.x86_64.img 5.6.0-0.rc5.git0.2.fc32.x86_64

There is a small size difference:

# ls -l initramfs-5.6.0-0.rc5.git0.2.fc32.x86_64.img*
-rw-------. 1 root root 31335646 Mar 22 12:59 initramfs-5.6.0-0.rc5.git0.2.fc32.x86_64.img
-rw-------. 1 root root 31335644 Mar 22 08:31 initramfs-5.6.0-0.rc5.git0.2.fc32.x86_64.img.BAK1

# reboot

Comment 67 tomasy 2020-03-22 21:48:09 UTC
(In reply to Steve from comment #61)
> (In reply to Steve from comment #60)
> ...
> > $ ls -lF /boot
# ls -lF /boot/
total 216254
-rw-r--r--. 1 root root   216925 20 feb 00.41 config-5.5.5-200.fc31.x86_64
-rw-r--r--. 1 root root   217172  5 mar 22.42 config-5.5.8-200.fc31.x86_64
-rw-r--r--. 1 root root   219328 10 mar 20.22 config-5.6.0-0.rc5.git0.2.fc32.x86_64
drwx------. 3 root root     1024 28 jan 18.03 efi/
drwxr-xr-x. 2 root root     3072 18 mar 13.49 extlinux/
drwx------. 6 root root     1024 22 mar 22.35 grub2/
-rw-------. 1 root root 50077408  1 maj  2016 initramfs-0-rescue-caf34df54d3d4538bdd85b6bc8bed915.img
-rw-------. 1 root root 38724089 18 mar 14.52 initramfs-5.5.5-200.fc31.x86_64.img
-rw-------. 1 root root 38723461 18 mar 14.53 initramfs-5.5.8-200.fc31.x86_64.img
-rw-------. 1 root root 38770567 18 mar 16.13 initramfs-5.6.0-0.rc5.git0.2.fc32.x86_64.img
-rw-r--r--. 1 root root   560055  1 maj  2016 initrd-plymouth.img
drwxr-xr-x. 3 root root     1024 10 apr  2018 loader/
drwx------. 2 root root    12288  1 maj  2016 lost+found/
-rw-------. 1 root root  5038766 20 feb 00.41 System.map-5.5.5-200.fc31.x86_64
-rw-------. 1 root root  5040726  5 mar 22.42 System.map-5.5.8-200.fc31.x86_64
-rw-------. 1 root root  5053138 10 mar 20.22 System.map-5.6.0-0.rc5.git0.2.fc32.x86_64
-rwxr-xr-x. 1 root root  6268344  1 maj  2016 vmlinuz-0-rescue-caf34df54d3d4538bdd85b6bc8bed915*
-rwxr-xr-x. 1 root root 10834632 20 feb 00.41 vmlinuz-5.5.5-200.fc31.x86_64*
-rwxr-xr-x. 1 root root 10838728  5 mar 22.42 vmlinuz-5.5.8-200.fc31.x86_64*
-rwxr-xr-x. 1 root root 10815592 10 mar 20.22 vmlinuz-5.6.0-0.rc5.git0.2.fc32.x86_64*

> > $ du -s /lib/modules/*
# du -s /lib/modules/*
80	/lib/modules/5.1.11-300.fc30.x86_64
80	/lib/modules/5.1.15-300.fc30.x86_64
80	/lib/modules/5.1.16-300.fc30.x86_64
80	/lib/modules/5.1.17-300.fc30.x86_64
80	/lib/modules/5.1.19-300.fc30.x86_64
80	/lib/modules/5.1.20-300.fc30.x86_64
80	/lib/modules/5.2.14-200.fc30.x86_64
24	/lib/modules/5.2.7-200.fc30.x86_64
80	/lib/modules/5.3.5-300.fc31.x86_64
24	/lib/modules/5.4.7-200.fc31.x86_64
76732	/lib/modules/5.5.5-200.fc31.x86_64
76848	/lib/modules/5.5.8-200.fc31.x86_64
77096	/lib/modules/5.6.0-0.rc5.git0.2.fc32.x86_64

> > # rpm -Va
See attachment
> One more:
> 
> $ rpm -qa kernel\* | sort
# rpm -qa kernel\* | sort

kernel-5.5.5-200.fc31.x86_64
kernel-5.5.8-200.fc31.x86_64
kernel-5.6.0-0.rc5.git0.2.fc32.x86_64
kernel-core-5.5.5-200.fc31.x86_64
kernel-core-5.5.8-200.fc31.x86_64
kernel-core-5.6.0-0.rc5.git0.2.fc32.x86_64
kernel-devel-5.5.5-200.fc31.x86_64
kernel-devel-5.5.8-200.fc31.x86_64
kernel-devel-5.6.0-0.rc5.git0.2.fc32.x86_64
kernel-headers-5.6.0-0.rc5.git0.1.fc32.x86_64
kernel-modules-5.5.5-200.fc31.x86_64
kernel-modules-5.5.8-200.fc31.x86_64
kernel-modules-5.6.0-0.rc5.git0.2.fc32.x86_64
kernel-tools-5.5.0-0.rc6.git0.1.fc32.x86_64
kernel-tools-libs-5.5.0-0.rc6.git0.1.fc32.x86_64

Comment 68 tomasy 2020-03-22 21:50:07 UTC
Created attachment 1672387 [details]
rpm -Va

Comment 69 tomasy 2020-03-22 22:03:15 UTC
(In reply to Steve from comment #62)
> What do these show:
> $ ls -lF /boot
See comment above
> # lsinitrd /boot/initramfs-5.6.0-0.rc5.git0.2.fc32.x86_64.img | fgrep
> 'usr/lib/modules/' | wc -l
# lsinitrd /boot/initramfs-5.6.0-0.rc5.git0.2.fc32.x86_64.img | fgrep 'usr/lib/modules/' | wc -l
252

# lsinitrd /boot/initramfs-5.6.0-0.rc5.git0.2.fc32.x86_64.img | fgrep 'usr/lib/modules/' | grep btrfs
drwxr-xr-x   1 root     root            0 Jan 28 17:17 usr/lib/modules/5.6.0-0.rc5.git0.2.fc32.x86_64/kernel/fs/btrfs
-rw-r--r--   1 root     root       531072 Jan 28 17:17 usr/lib/modules/5.6.0-0.rc5.git0.2.fc32.x86_64/kernel/fs/btrfs/btrfs.ko.xz
> To read btrfs file systems, grub2 probably needs to load the "btrfs" module:
> 
> # grep insmod /boot/grub2/grub.cfg
# grep insmod /boot/grub2/grub.cfg
    insmod all_video
    insmod efi_gop
    insmod efi_uga
    insmod ieee1275_fb
    insmod vbe
    insmod vga
    insmod video_bochs
    insmod video_cirrus
insmod increment
insmod part_msdos
insmod ext2
insmod part_msdos
insmod ext2
insmod blscfg

> # ls /boot/grub2/i386-pc/btrfs.mod
# ls /boot/grub2/i386-pc/btrfs.mod
/boot/grub2/i386-pc/btrfs.mod

Comment 70 tomasy 2020-03-22 22:08:38 UTC
(In reply to Steve from comment #63)
> From Comment 12:
> 
> ├─sda1        500M root  disk  brw-rw---- ext4                   
> ac517bba-03bb-44c8-bac4-bb92ab386826
> 
> If that is your "/boot" partition, grub2 doesn't need to load the "btrfs"
> module, just the "ext4" module.
I do not think btrfs is supported as a filesystem for /boot.
> However, the "btrfs" kernel module is not linked with the kernel:
> 
> # egrep 'EXT4_FS=|BTRFS_FS=' /boot/config-5.6.0-0.rc5.git0.2.fc32.x86_64 
> CONFIG_EXT4_FS=y
> CONFIG_BTRFS_FS=m
Same for me 
> And it is not in the initramfs in my F32 VM:
> 
> # lsinitrd /boot/initramfs-5.6.0-0.rc5.git0.2.fc32.x86_64.img | grep btrfs
But I have
# lsinitrd /boot/initramfs-5.6.0-0.rc5.git0.2.fc32.x86_64.img | grep btrfs
btrfs
drwxr-xr-x   1 root     root            0 Jan 28 17:17 usr/lib/modules/5.6.0-0.rc5.git0.2.fc32.x86_64/kernel/fs/btrfs
-rw-r--r--   1 root     root       531072 Jan 28 17:17 usr/lib/modules/5.6.0-0.rc5.git0.2.fc32.x86_64/kernel/fs/btrfs/btrfs.ko.xz
-rw-r--r--   1 root     root          616 Jan 28 17:17 usr/lib/udev/rules.d/64-btrfs.rules
-rwxr-xr-x   1 root     root       948896 Jan 28 14:26 usr/sbin/btrfs
lrwxrwxrwx   1 root     root            5 Jan 28 17:17 usr/sbin/btrfsck -> btrfs
-rwxr-xr-x   1 root     root         1189 Jan 28 14:26 usr/sbin/fsck.btrfs

> So it has to be loaded from the file system:
> 
> # ls /lib/modules/5.6.0-0.rc5.git0.2.fc32.x86_64/kernel/fs/btrfs/
> btrfs.ko.xz
It cannot be loaded from /lib/modules/... because / is btrfs, so the module must already be loaded earlier in boot.

Comment 71 Steve 2020-03-22 22:53:30 UTC
(In reply to tomasy from comment #68)
> Created attachment 1672387 [details]
> rpm -Va

Thanks for the additional info and that attachment. All of that looks reasonable. And the initramfs has the btrfs module.

However, I compared the selinux lines from the above attachment with what my F32 VM shows:

# rpm -Va selinux\*
.......T.  c /etc/selinux/targeted/contexts/customizable_types
..5....T.    /var/lib/selinux/targeted/active/commit_num
S.5....T.    /var/lib/selinux/targeted/active/file_contexts
.......T.    /var/lib/selinux/targeted/active/homedir_template
S.5....T.    /var/lib/selinux/targeted/active/policy.kern
.......T.    /var/lib/selinux/targeted/active/seusers
.......T.    /var/lib/selinux/targeted/active/users_extra

The attachment shows two additional entries. They might be local customizations that the setroubleshooter sometimes recommends as a workaround for selinux labeling problems. Do you recall ever creating custom selinux policies? (There might be a way to check in the logs.)

From my F32 VM:

# ls -l /etc/selinux/targeted/contexts/files/file_contexts.local 
-rw-r--r--. 1 root root 0 Mar 19 03:44 /etc/selinux/targeted/contexts/files/file_contexts.local
                        ^
Note size is zero ------^

# ls -l /var/lib/selinux/targeted/active/policy.linked
-rw-------. 1 root root 8268746 Mar 21 18:58 /var/lib/selinux/targeted/active/policy.linked

# rpm -qa selinux\* | sort
selinux-policy-3.14.5-31.fc32.noarch
selinux-policy-targeted-3.14.5-31.fc32.noarch

Comment 72 Steve 2020-03-22 23:36:27 UTC
(In reply to tomasy from comment #70)
...
> > So it has to be loaded from the file system:
> > 
> > # ls /lib/modules/5.6.0-0.rc5.git0.2.fc32.x86_64/kernel/fs/btrfs/
> > btrfs.ko.xz
> It can not load from /lib/modules/... because / is btrfs so it must be loaded before.

The kernel should be able to read the btrfs root file system, because the initramfs contains the module to do so.

See what you get for this after booting from the rescue kernel:

$ journalctl -b --no-hostname | fgrep -C 4 -n '/sysroot' | fgrep -v audit

Comment 73 tomasy 2020-03-22 23:47:30 UTC
(In reply to Steve from comment #71)
> (In reply to tomasy from comment #68)
> > Created attachment 1672387 [details]
> The attachment shows two additional entries. They might be local
> customizations that the setroubleshooter sometimes recommends as a
> workaround for selinux labeling problems. Do you recall ever creating custom
> selinux policies? (There might be a way to check in the logs.)
Yes I have created custom selinux policies

Comment 74 tomasy 2020-03-22 23:48:39 UTC
(In reply to Steve from comment #72)
> (In reply to tomasy from comment #70)
> 
> See what you get for this after booting from the rescue kernel:
> 
> $ journalctl -b --no-hostname | fgrep -C 4 -n '/sysroot' | fgrep -v audit

# journalctl -b --no-hostname | fgrep -C 4 -n '/sysroot' | fgrep -v audit
818-mar 23 00:25:37 systemd[1]: Reached target Basic System.
819-mar 23 00:25:37 systemd[1]: Starting File System Check on /dev/disk/by-uuid/cdb00230-d06c-4cf8-b11a-e8b1abf13500...
820-mar 23 00:25:37 systemd[1]: Started File System Check on /dev/disk/by-uuid/cdb00230-d06c-4cf8-b11a-e8b1abf13500.
822:mar 23 00:25:37 systemd[1]: Mounting /sysroot...
823-mar 23 00:25:37 kernel: BTRFS info (device sda3): disk space caching is enabled
824-mar 23 00:25:37 kernel: BTRFS: has skinny extents
825-mar 23 00:25:37 kernel: BTRFS: detected SSD devices, enabling SSD mode
826:mar 23 00:25:37 systemd[1]: Mounted /sysroot.
827-mar 23 00:25:37 systemd[1]: Reached target Initrd Root File System.
828-mar 23 00:25:37 systemd[1]: Starting Reload Configuration from the Real Root...
829-mar 23 00:25:37 systemd[1]: Reloading.
830-mar 23 00:25:37 kernel: scsi 6:0:0:0: Direct-Access     Generic  USB SD Reader    1.00 PQ: 0 ANSI: 0

Comment 75 tomasy 2020-03-22 23:52:20 UTC
(In reply to Steve from comment #64)
> (In reply to Steve from comment #63)
> ...
> If that is true for your system, it might be sufficient to reinstall the
> kernel while booted from the rescue kernel:
> 
> # dnf reinstall kernel\*-5.6.0-0.rc5.git0.2.fc32
I tried to reinstall the kernel but I receive the same kernel panic error.
When I reinstall the kernel I get

depmod: ERROR: fstatat(4, wireguard.ko.xz): No such file or directory

This happens even though I removed wireguard. I shall check what causes this.

Comment 76 Steve 2020-03-23 00:18:10 UTC
(In reply to tomasy from comment #75)
...
> I tried to reinstall the kernel but I receive the same kernel panic error.
> When I reinstall the kernel I get
> 
> depmod: ERROR: fstatat(4, wireguard.ko.xz): No such file or directory
> 
> This happens even though I removed wireguard. I shall check what causes this.

That sounds like a kernel packaging bug, although I didn't see it when I reinstalled the same kernel in my F32 VM.

Can you reinstall an earlier release kernel? This worked in my F32 VM:

# dnf install kernel-5.5.10-200.fc31 --releasever=31

(Actually, I had to first boot from the rescue kernel and remove 5.5.10-200, because it was already installed, but I believe the test is valid.)

Comment 77 Steve 2020-03-23 02:47:44 UTC
(In reply to tomasy from comment #73)
...
> Yes I have created custom selinux policies

Thanks, that would explain the "rpm -Va" report.

(In reply to tomasy from comment #74)
...
> # journalctl -b --no-hostname | fgrep -C 4 -n '/sysroot' | fgrep -v audit
...
> 822:mar 23 00:25:37 systemd[1]: Mounting /sysroot...
> 823-mar 23 00:25:37 kernel: BTRFS info (device sda3): disk space caching is enabled
> 824-mar 23 00:25:37 kernel: BTRFS: has skinny extents
> 825-mar 23 00:25:37 kernel: BTRFS: detected SSD devices, enabling SSD mode
> 826:mar 23 00:25:37 systemd[1]: Mounted /sysroot.
...

Excellent. The /sysroot mount occurs when the kernel is running out of the initramfs. The log shows that systemd starts twice, and the /sysroot mount is before systemd restarts:

$ journalctl -b --no-hostname | egrep -n 'systemd v|/sysroot'
505:Mar 22 17:35:29 systemd[1]: systemd v245.2-1.fc32 running in system mode. [removed]
631:Mar 22 17:35:31 systemd[1]: Mounting /sysroot...
633:Mar 22 17:35:31 systemd[1]: Mounted /sysroot.
784:Mar 22 17:35:34 systemd[1]: systemd v245.2-1.fc32 running in system mode. [removed]

More details:

$ grep ExecStart /usr/lib/systemd/system/initrd-switch-root.service
ExecStart=systemctl --no-block switch-root /sysroot

The "switch-root" section in "man systemctl" has the details.

Comment 78 Steve 2020-03-23 03:45:23 UTC
(In reply to tomasy from comment #59)
> Created attachment 1672313 [details]
> journalctl -b | grep AVC > avc-1.txt

I'm no expert, but those look incorrect. Before asking an selinux maintainer to get involved, I suggest relabeling with a newer kernel, if possible. The 4.5.2-301.fc24.x86_64 kernel (Comment 55) may be doing something different with labels than newer kernels.

(In reply to tomasy from comment #55)
...
> Kernel is rather old for rescue. It is 4.5.2-301.fc24.x86_64. How do I update the kernel for rescue?

That turns out to be fairly simple:

# cd  /boot
# mkdir backup1
# mv -i *rescue* backup1/

If you want to save the grub2 ".conf" file, rename it:

# ls /boot/loader/entries/*rescue*.conf

# Now, install or reinstall any kernel you want -- and Presto! It is cloned as a rescue kernel too. :-)

NB: If you want to have more than one rescue kernel, rename the two files (vmlinuz, initramfs) and edit the ".conf" file to match.

Tested in an F32 VM.
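The backup steps above can be dry-run in a throwaway directory first (purely a sketch; the machine-id hash and kernel version below are placeholders, not values from this system):

```shell
# Simulate /boot in a temp dir so nothing real is touched.
boot=$(mktemp -d)
id="0123456789abcdef0123456789abcdef"    # placeholder machine-id
ver="5.6.0-0.rc5.git0.2.fc32.x86_64"     # placeholder kernel version

# Fake rescue files laid out as they would appear under /boot.
touch "$boot/vmlinuz-0-rescue-$id" "$boot/initramfs-0-rescue-$id.img"
mkdir -p "$boot/loader/entries"
touch "$boot/loader/entries/$id-0-rescue.conf"

# Move the old rescue images aside; reinstalling a kernel recreates them.
# (The .conf under loader/entries is renamed separately, as noted above.)
mkdir "$boot/backup1"
mv "$boot"/*rescue* "$boot/backup1/"

ls "$boot/backup1"
```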

Technical details:

$ rpm -ql dracut-config-rescue.x86_64
/etc/kernel/postinst.d/51-dracut-rescue-postinst.sh
/usr/lib/dracut/dracut.conf.d/02-rescue.conf
/usr/lib/kernel/install.d/51-dracut-rescue.install

$ less -N /usr/lib/kernel/install.d/51-dracut-rescue.install
...
     76 case "$COMMAND" in
     77     add)
     78         [[ -f "$LOADER_ENTRY" ]] && [[ -f "$BOOT_DIR_ABS/$KERNEL" ]] \
     79             && [[ -f "$BOOT_DIR_ABS/$INITRD" ]] && exit 0
...

BTW, the long number in the name is:
$ cat /etc/machine-id
$ man machine-id

Comment 79 Steve 2020-03-23 18:13:28 UTC
(In reply to tomasy from comment #75)
...
> > # dnf reinstall kernel\*-5.6.0-0.rc5.git0.2.fc32
> I tried to reinstall the kernel but I receive the same kernel panic error.
> When I reinstall the kernel I get
> 
> depmod: ERROR: fstatat(4, wireguard.ko.xz): No such file or directory
> 
> This happens even though I removed wireguard. I shall check what causes this.

Actually, it might be sufficient to rebuild the initramfs:

Boot from the rescue kernel.

# cd /boot
# mv -i initramfs-5.6.0-0.rc5.git0.2.fc32.x86_64.img initramfs-5.6.0-0.rc5.git0.2.fc32.x86_64.img.BAK1
# dracut initramfs-5.6.0-0.rc5.git0.2.fc32.x86_64.img 5.6.0-0.rc5.git0.2.fc32.x86_64
# reboot

If you want to build a "rescue" initramfs, add these options:

--no-hostonly -a "rescue"

Those options are from the dracut shell script that builds the rescue kernel:
https://git.kernel.org/pub/scm/boot/dracut/dracut.git/tree/51-dracut-rescue-postinst.sh#n55

Documentation:

$ man dracut

Comment 80 tomasy 2020-03-23 19:13:23 UTC
(In reply to Steve from comment #76)
> (In reply to tomasy from comment #75)
...
> Can you reinstall an earlier release kernel? This worked in my F32 VM:
> 
> # dnf install kernel-5.5.10-200.fc31 --releasever=31
> 
This worked. I can boot with this new kernel. But I started to remove
kernel\*-5.6.0-0.rc5.git0.1.fc32.x86_64
and now I discovered that
kernel-headers-5.6.0-0.rc5.git0.2.fc32.x86_64 was NOT removed.
They are from *git0.1* and *git0.2*. Is this correct?

Comment 81 Steve 2020-03-23 20:29:06 UTC
(In reply to tomasy from comment #80)
> (In reply to Steve from comment #76)
> > (In reply to tomasy from comment #75)
> ...
> > Can you reinstall an earlier release kernel? This worked in my F32 VM:
> > 
> > # dnf install kernel-5.5.10-200.fc31 --releasever=31
> > 
> This worked. I can boot with this new kernel.

Hooray! :-)

> But I started to remove kernel\*-5.6.0-0.rc5.git0.1.fc32.x86_64
> and now I discovered that kernel-headers-5.6.0-0.rc5.git0.2.fc32.x86_64 was NOT removed
> They are from *git0.1* and *git0.2*. Is this correct?

I've seen that too and had the same reaction. :-)

I believe kernel-headers is pulled in when you add various packages related to development. Comment 67 shows that you have kernel-devel packages installed.

This will show you what is currently in the repos:

$ dnf --refresh repoquery kernel-headers
...
kernel-headers-0:5.6.0-0.rc4.git0.1.fc32.x86_64
kernel-headers-0:5.6.0-0.rc5.git0.1.fc32.x86_64

This will show you packages that pull in kernel-headers:

$ dnf repoquery --whatrequires kernel-headers

And if you want to see what is really happening with updates, check bodhi:
https://bodhi.fedoraproject.org/updates/?packages=kernel-headers

The pre-release repos can sometimes be inconsistent to the point of causing update failures. Further, different mirrors can have slightly different package sets, because they don't all update at the same time. And sometimes packages are actually pushed to the repos and then unpushed because they have a serious problem.

So use the "--refresh" dnf option freely with pre-release repos. 

And the short answer is: no, it is not a problem.

Comment 82 Steve 2020-03-23 20:59:32 UTC
(In reply to tomasy from comment #80)
...
> kernel-headers-5.6.0-0.rc5.git0.2.fc32.x86_64 was NOT removed
> They are from *git0.1* and *git0.2*. Is this correct?

This update was edited twice:

kernel-5.6.0-0.rc5.git0.2.fc32 and kernel-headers-5.6.0-0.rc5.git0.1.fc32
                        ^                                          ^
https://bodhi.fedoraproject.org/updates/FEDORA-2020-55b2b79091

The problem appears to have been that there was no "git0.2" build of kernel-headers:
https://koji.fedoraproject.org/koji/packageinfo?packageID=27325

Comment 83 Steve 2020-03-23 21:07:01 UTC
(In reply to tomasy from comment #80)
...
> But I started to remove kernel\*-5.6.0-0.rc5.git0.1.fc32.x86_64
...

Could you preserve those until we have confirmed that the initramfs was the problem, as proposed in Comment 79?

Comment 84 tomasy 2020-03-23 21:16:02 UTC
It seems like installing and building the kernel kernel-5.6.0-0.rc5.git0.2.fc32 also causes problems for other installed kernels.
1. Installed kernel-5.5.10-200.fc31.
2. OK to boot kernel-5.5.10-200.fc31
3. Removed kernel-5.6.0-0.rc5.git0.2.fc32 and installed it again
4. Kernel Panic when booting kernel-5.6.0-0.rc5.git0.2.fc32
5. Kernel Panic when booting kernel-5.5.10-200.fc31. !! But it worked before
6. Removed and installed kernel-5.5.10-200.fc31.
7. OK to boot kernel-5.5.10-200.fc31
8. OK to boot kernel-5.6.0-0.rc5.git0.2.fc32

Seems like installation of kernel-5.6.0-0.rc5.git0.2.fc32 does something strange that also impacts the other kernels' ability to boot (except rescue).

Comment 85 Steve 2020-03-23 21:26:37 UTC
(In reply to tomasy from comment #84)
> It seems like installing and building the kernel
> kernel-5.6.0-0.rc5.git0.2.fc32 also causes problems for other installed
> kernels.
> 1. Installed kernel-5.5.10-200.fc31.
> 2. OK to boot kernel-5.5.10-200.fc31
> 3. Removed kernel-5.6.0-0.rc5.git0.2.fc32 and installed it again
> 4. Kernel Panic when booting kernel-5.6.0-0.rc5.git0.2.fc32
> 5. Kernel Panic when booting kernel-5.5.10-200.fc31. !! But it worked before
> 6. Removed and installed kernel-5.5.10-200.fc31.
> 7. OK to boot kernel-5.5.10-200.fc31
> 8. OK to boot kernel-5.6.0-0.rc5.git0.2.fc32
> 
> Seems like installation of kernel-5.6.0-0.rc5.git0.2.fc32 does something
> strange that also impacts the other kernels' ability to boot (except
> rescue).

That's very strange. Could you run fsck on all your hard drives again?

The only other thing I can think of is that the panic leaves the hardware in an inconsistent state.

Did you try power-cycling instead of just rebooting?

Comment 86 tomasy 2020-03-23 21:42:39 UTC
(In reply to Steve from comment #85)
> (In reply to tomasy from comment #84)
> > It seems like installing and building the kernel
> > kernel-5.6.0-0.rc5.git0.2.fc32 also causes problems for other installed
> > kernels.
> > 1. Installed kernel-5.5.10-200.fc31.
> > 2. OK to boot kernel-5.5.10-200.fc31
> > 3. Removed kernel-5.6.0-0.rc5.git0.2.fc32 and installed it again
> > 4. Kernel Panic when booting kernel-5.6.0-0.rc5.git0.2.fc32
> > 5. Kernel Panic when booting kernel-5.5.10-200.fc31. !! But it worked before
> > 6. Removed and installed kernel-5.5.10-200.fc31.
> > 7. OK to boot kernel-5.5.10-200.fc31
> > 8. OK to boot kernel-5.6.0-0.rc5.git0.2.fc32
> > 
> > Seems like installation of kernel-5.6.0-0.rc5.git0.2.fc32 does something
> > strange that also impacts the other kernels' ability to boot (except
> > rescue).
> 
> That's very strange. Could you run fsck on all your hard drives again?
No problem 
> The only other thing I can think of is that the panic leaves the hardware in
> an inconsistent state.
> 
> Did you try power-cycling instead of just rebooting?
Yes

My theory is that when grub is updated something goes wrong

Comment 87 Steve 2020-03-23 21:58:50 UTC
(In reply to tomasy from comment #86)
...
> > That's very strange. Could you run fsck on all your hard drives again?
> No problem 

OK. That would seem to eliminate the btrfs driver.

> > The only other thing I can think of is that the panic leaves the hardware in
> > an inconsistent state.
> > 
> > Did you try power-cycling instead of just rebooting?
> Yes

OK.

> My theory is that when grub is updated something goes wrong

You could try reinstalling grub2 to your boot drive. This is what I have in my F32 VM:

$ rpm -qa grub2\* | sort
grub2-common-2.04-10.fc32.noarch
grub2-efi-ia32-2.04-10.fc32.x86_64
grub2-efi-ia32-cdboot-2.04-10.fc32.x86_64
grub2-efi-x64-2.04-10.fc32.x86_64
grub2-efi-x64-cdboot-2.04-10.fc32.x86_64
grub2-pc-2.04-10.fc32.x86_64
grub2-pc-modules-2.04-10.fc32.noarch
grub2-tools-2.04-10.fc32.x86_64
grub2-tools-efi-2.04-10.fc32.x86_64
grub2-tools-extra-2.04-10.fc32.x86_64
grub2-tools-minimal-2.04-10.fc32.x86_64

Comment 88 Steve 2020-03-23 22:06:04 UTC
In my F32 VM:

$ mount -t ext4
/dev/vda2 on / type ext4 (rw,relatime,seclabel)
/dev/vda1 on /boot type ext4 (rw,relatime,seclabel)

# grub2-install /dev/vda
Installing for i386-pc platform.
Installation finished. No error reported.

Rebooted fine.

Comment 89 Steve 2020-03-23 22:15:51 UTC
Grub2 doesn't ordinarily write to the boot sector, except as suggested by "man grub2-install":

"... on some platforms [running grub2-install] may also include installing GRUB onto a boot sector."

# dd if=/dev/vda count=1 status=none | hexdump -C | grep -A 2 GRUB
00000170  be 94 7d e8 2e 00 cd 18  eb fe 47 52 55 42 20 00  |..}.......GRUB .|
00000180  47 65 6f 6d 00 48 61 72  64 20 44 69 73 6b 00 52  |Geom.Hard Disk.R|
00000190  65 61 64 00 20 45 72 72  6f 72 0d 0a 00 bb 01 00  |ead. Error......|

Comment 90 Steve 2020-03-23 22:31:57 UTC
Let's go back to looking at log files. Could you do a boot-reboot cycle and then review and attach a log:

$ journalctl -k -b -1 --no-hostname > /tmp/dmesg-1.txt

That's a reasonable size:

$ wc -l /tmp/dmesg-1.txt 
766 /tmp/dmesg-1.txt

The "-k" option should eliminate most messages that have sensitive information.

The log does, however, have the hostname:

$ fgrep 'f31-f32-test' -m 1 /tmp/dmesg-1.txt 
Mar 23 14:52:01 systemd[1]: Set hostname to <f31-f32-test>.

Comment 91 tomasy 2020-03-24 07:29:55 UTC
Created attachment 1672929 [details]
grubenv-bootable

The only difference in /boot is /boot/grub2/grubenv
This file is modified when installing a new kernel.
I will upload 2 grubenv
1. "grubenv-bootable" This grubenv is created when I installed kernel-5.5.10-200.fc31.
   Using this grubenv I can boot all kernels in grub menu including kernel-5.6.0-0.rc5.git0.2.fc32
2. "grubenv-non-bootable". Reinstalled kernel-5.6.0-0.rc5.git0.2.fc32 and 
    now I get Kernel Panic for all installed kernels except rescue
Using the same version of grub 2.04-10.fc32

Comment 92 tomasy 2020-03-24 07:31:53 UTC
Created attachment 1672930 [details]
grubenv-non-bootable

This grubenv will not boot. Instead generate kernel Panic

Comment 93 Steve 2020-03-24 09:02:46 UTC
(In reply to tomasy from comment #91)
> Created attachment 1672929 [details]
> grubenv-bootable
> 
> The only difference in /boot is /boot/grub2/grubenv

Good catch!!!

> This file is modified when installing a new kernel.

It is also touched, and possibly updated, after every boot. In my F32 VM:

# ls -l /boot/grub2/grubenv
lrwxrwxrwx. 1 root root 25 Mar 17 10:10 /boot/grub2/grubenv -> ../efi/EFI/fedora/grubenv

# ls -lL /boot/grub2/grubenv
-rw-------. 1 root root 1024 Mar 23 21:08 /boot/grub2/grubenv
                                 ^^ ^^^^^
# reboot

# ls -lL /boot/grub2/grubenv
-rw-------. 1 root root 1024 Mar 24 01:45 /boot/grub2/grubenv
                                 ^^ ^^^^^

> I will upload 2 grubenv
> 1. "grubenv-bootable" This grubenv is created when I installed
> kernel-5.5.10-200.fc31.
>    Using this grubenv I can boot all kernels in grub menu including
> kernel-5.6.0-0.rc5.git0.2.fc32
> 2. "grubenv-non-bootable". Reinstalled kernel-5.6.0-0.rc5.git0.2.fc32 and 
>     now I get Kernel Panic for all installed kernels except rescue
> Using the same version of grub 2.04-10.fc32

The files have the same length:

$ hexdump -Cv grubenv-bootable | tail -2
000003f0  23 23 23 23 23 23 23 23  23 23 23 23 23 23 23 23  |################|
00000400

$ hexdump -Cv grubenv-non-bootable | tail -2
000003f0  23 23 23 23 23 23 23 23  23 23 23 23 23 23 23 23  |################|
00000400

However, the "saved_entry" string is *longer* on the non-bootable file. That sounds like a buffer-overrun problem:

$ strings grubenv-bootable | grep saved_entry
saved_entry=caf34df54d3d4538bdd85b6bc8bed915-5.5.10-200.fc31.x86_64

$ strings grubenv-bootable | grep saved_entry | wc -c
68

$ strings grubenv-non-bootable | grep saved_entry
saved_entry=caf34df54d3d4538bdd85b6bc8bed915-5.6.0-0.rc5.git0.2.fc32.x86_64

$ strings grubenv-non-bootable | grep saved_entry | wc -c
76
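The length comparison above can be reproduced with mock files containing just the saved_entry lines quoted from the two attachments (a sketch; note that wc -c counts the trailing newline):

```shell
# Work in a temp dir; recreate only the saved_entry lines.
cd "$(mktemp -d)"
printf 'saved_entry=caf34df54d3d4538bdd85b6bc8bed915-5.5.10-200.fc31.x86_64\n' \
    > grubenv-bootable
printf 'saved_entry=caf34df54d3d4538bdd85b6bc8bed915-5.6.0-0.rc5.git0.2.fc32.x86_64\n' \
    > grubenv-non-bootable

# Byte counts, including the trailing newline, as "wc -c" reports them.
grep saved_entry grubenv-bootable | wc -c       # 68
grep saved_entry grubenv-non-bootable | wc -c   # 76
```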

Comment 94 Hans de Goede 2020-03-24 09:07:30 UTC
Note you can check and modify the grubenv with the grub2-editenv command. Note the first argument is the env file itself, you can specify a '-' there to use the default grubenv file-path (yeah confusing since - normally is stdin in most programs), e.g. :

sudo grub2-editenv - list

You can use this to shorten or even unset the saved_entry to test your theory. Note: do NOT edit the grubenv manually; it is somewhat special (e.g. always padded with '#' to 1024 bytes) and not suitable for manual editing.

Comment 95 Steve 2020-03-24 09:18:01 UTC
(In reply to Hans de Goede from comment #94)
> Note you can check and modify the grubenv with the grub2-editenv command.
> Note the first argument is the env file itself, you can specify a '-' there
> to use the default grubenv file-path (yeah confusing since - normally is
> stdin in most programs), e.g. :
> 
> sudo grub2-editenv - list
> 
> You can use this to shorten or even unset the saved_entry to test your
> theory. Note: do NOT edit the grubenv manually; it is somewhat special (e.g.
> always padded with '#' to 1024 bytes) and not suitable for manual editing.

Thanks, Hans:

# grub2-editenv - list
saved_entry=4661409848ef4e829dad2d41f100ac09-5.6.0-0.rc5.git0.2.fc32.x86_64
boot_success=1
kernelopts=root=UUID=25e06e62-cb00-4d1f-9e0f-538e5541ca4b ro  
boot_indeterminate=0

# grub2-editenv - set saved_entry=foo

# grub2-editenv - list
saved_entry=foo
boot_success=1
kernelopts=root=UUID=25e06e62-cb00-4d1f-9e0f-538e5541ca4b ro  
boot_indeterminate=0

Comment 96 Steve 2020-03-24 09:36:24 UTC
(In reply to tomasy from comment #92)
> Created attachment 1672930 [details]
> grubenv-non-bootable
> 
> This grubenv will not boot. Instead generate kernel Panic

There doesn't seem to be a way to get a version string from core.img, so what do you get for this:

# ls -l /boot/grub2/i386-pc/core.img

I'm wondering if you need to reinstall grub2, as in Comment 88:

# grub2-install /dev/sda # IIUC, that is your /boot device (per Comment 12)

Comment 97 Steve 2020-03-24 09:53:45 UTC
(In reply to Steve from comment #96)
...
> There doesn't seem to be a way to get a version string from core.img
...

You can get it from the grub2 command-line:

Press "c" at the grub2 menu.
Enter "version".

In my F32 VM, that shows:

GNU GRUB  version 2.04
Platform i386-pc
Compiler version 10.0.1 20200311 (Red Hat 10.0.1-0.9)

$ rpm -q grub2-pc
grub2-pc-2.04-10.fc32.x86_64

Comment 98 Steve 2020-03-24 10:32:59 UTC
I can't reproduce the panic in an F32 VM using:

# grub2-editenv - list | grep saved_entry
saved_entry=123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0

That string is 16*8=128 characters.
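For reference, a test string of that exact length can be generated and double-checked with plain shell arithmetic (a quick sketch using the same 16-character chunk):

```shell
# Build the 128-character test string as 8 copies of a 16-character chunk.
chunk="123456789abcdef0"
s=""
for i in 1 2 3 4 5 6 7 8; do s="$s$chunk"; done
echo "${#s}"   # 128
```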

Comment 99 Steve 2020-03-24 11:11:20 UTC
(In reply to Steve from comment #97)
...
> Press "c" at the grub2 menu.
> Enter "version".
...

Try also:

# load_env
# list_env

Documentation:

$ info grub2 # 16.3 The list of command-line and menu entry commands
https://www.gnu.org/software/grub/manual/grub/html_node/Command_002dline-and-menu-entry-commands.html

Comment 100 Steve 2020-03-24 11:28:55 UTC
(In reply to Steve from comment #99)
> (In reply to Steve from comment #97)
> ...
> > Press "c" at the grub2 menu.
> > Enter "version".
> ...
> 
> Try also:
> 
> # load_env
> # list_env
...

There appears to be a grub2 bug, because "list_env" lists some environment variables *twice*.

Tested in an F32 VM with grub2-pc-2.04-10.fc32.x86_64 and grub2 installed on /dev/vda.

Comment 101 tomasy 2020-03-24 11:41:42 UTC
I use grub 2.04-10.fc32.
After the kernel is created grub2 is updated by grub2-mkconfig. Is grub2-mkconfig invoked in the same way as before?

Comment 102 Steve 2020-03-24 12:02:48 UTC
(In reply to tomasy from comment #101)
> I use grub 2.04-10.fc32.
> After the kernel is created grub2 is updated by grub2-mkconfig. Is grub2-mkconfig invoked in the same way as before?

Yes, but for this problem you might need to run grub2-install, which writes to the boot sector and creates a new /boot/grub2/i386-pc/core.img.

The grub2 code is in core.img and various modules. I found the version info in version.mod. What do you show for this:

# strings /boot/grub2/i386-pc/version.mod | egrep -A 1 'GNU|Compiler'
GNU GRUB  version %s
2.04
--
Compiler version %s
10.0.1 20200311 (Red Hat 10.0.1-0.9)

The various grub2 image files are documented here:

11 GRUB image files
https://www.gnu.org/software/grub/manual/grub/html_node/Images.html

Comment 103 Steve 2020-03-24 12:35:08 UTC
(In reply to Steve from comment #102)
...
> # strings /boot/grub2/i386-pc/version.mod | egrep -A 1 'GNU|Compiler'
...

I'm sorry, I asked the wrong question. version.mod is included in the grub2-pc-modules package.

core.img is created by grub2-install.

version.mod has the same size, but different dates. That is because it is copied from /usr/lib/grub/i386-pc/version.mod by grub2-install:

# rpm -qlv grub2-pc-modules | grep version
-rw-r--r--    1 root     root                     1884 Mar 17 10:09 /usr/lib/grub/i386-pc/version.mod

# ls -l /boot/grub2/i386-pc/version.mod
-rw-r--r--. 1 root root 1884 Mar 23 15:02 /boot/grub2/i386-pc/version.mod

But core.img is not included in any grub2 package. core.img is created by grub2-install:

# rpm -qalv grub2\* | grep core

# ls -l /boot/grub2/i386-pc/*.img
-rw-r--r--. 1 root root   512 Mar 23 15:02 /boot/grub2/i386-pc/boot.img
-rw-r--r--. 1 root root 30260 Mar 23 15:02 /boot/grub2/i386-pc/core.img

So grub2 files are not handled in the same way as files installed from other RPM packages. They are *copied*.

Comment 104 Steve 2020-03-24 12:38:40 UTC
(In reply to Steve from comment #103)
...
> So grub2 files are not handled in the same way as files installed from other RPM packages. They are *copied*.

For example:

$ rpm -qf /usr/lib/grub/i386-pc/version.mod
grub2-pc-modules-2.04-10.fc32.noarch

# rpm -qf /boot/grub2/i386-pc/version.mod
file /boot/grub2/i386-pc/version.mod is not owned by any package

Comment 105 Steve 2020-03-24 12:46:53 UTC
(In reply to Steve from comment #104)
> (In reply to Steve from comment #103)
> ...
> > So grub2 files are not handled in the same way as files installed from other RPM packages. They are *copied*.
> 
> For example:
> 
> $ rpm -qf /usr/lib/grub/i386-pc/version.mod
> grub2-pc-modules-2.04-10.fc32.noarch
> 
> # rpm -qf /boot/grub2/i386-pc/version.mod
> file /boot/grub2/i386-pc/version.mod is not owned by any package

In other words, a user could run 'rpm -Va grub2\*' and have everything verify perfectly, yet have no idea what grub2 code is actually being executed at boot time!
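That gap between what RPM verifies and what actually boots can be illustrated with two ordinary files standing in for the packaged module and its /boot copy (purely a sketch; the file names and contents are made up):

```shell
work=$(mktemp -d)
# The "packaged" file, i.e. the copy that rpm -V would verify...
printf 'GRUB module, version A\n' > "$work/version.mod.packaged"
# ...and the copy that grub2-install would have placed under /boot.
cp "$work/version.mod.packaged" "$work/version.mod.boot"

# The /boot copy can go stale (e.g. an old grub2-install) without rpm
# noticing, since rpm only checks the packaged file; cmp exposes the drift.
printf 'GRUB module, version B\n' > "$work/version.mod.boot"
cmp -s "$work/version.mod.packaged" "$work/version.mod.boot" || echo "boot copy differs"
```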

Comment 106 Hans de Goede 2020-03-24 13:42:13 UTC
(In reply to Steve from comment #105)
> In other words, a user could run 'rpm -Va grub2\*' and have everything
> verify perfectly, yet have no idea what grub2 code is actually being
> executed at boot time!

That is correct, but note that this is only true when using legacy BIOS booting. The grub2-efi-x64 package installs grubx64.efi directly under /boot/efi/EFI/fedora/grubx64.efi, which is the location the UEFI firmware reads it from. So when using UEFI booting the installed version is the same as the booted version, and there is no need to run grub2-install.

Comment 107 Steve 2020-03-24 16:54:57 UTC
(In reply to Hans de Goede from comment #106)
> (In reply to Steve from comment #105)
> > In other words, a user could run 'rpm -Va grub2\*' and have everything
> > verify perfectly, yet have no idea what grub2 code is actually being
> > executed at boot time!
> 
> That is correct, but note that this is only true when using legacy BIOS
> booting. The grub2-efi-x64 package installs grubx64.efi directly under
> /boot/efi/EFI/fedora/grubx64.efi, which is the location the UEFI
> firmware reads it from. So when using UEFI booting the installed
> version is the same as the booted version, and there is no need to run
> grub2-install.

Thanks, Hans. You got me to do some research. :-) In all my VMs running under virt-manager/qemu, the Firmware is "BIOS", and there is no way to change that. However, when creating a new VM, there is a Firmware option to use "UEFI x86_64: /usr/share/edk2/ovmf/OVMF_CODE.fd".

So, for an EFI VM, this package is needed on the host (it was already on mine, but I didn't know about it):

$ rpm -q edk2-ovmf 
edk2-ovmf-20190501stable-4.fc30.noarch

Documentation:

Using UEFI with QEMU
https://docs.fedoraproject.org/en-US/quick-docs/uefi-with-qemu/

Comment 108 Steve 2020-03-24 19:30:17 UTC
I completed a clean install from F32-Beta-Live using default partitioning in an F32-EFI VM.

Note the GPT partition type and the /boot/efi mountpoint:

$ lsblk -o NAME,PTTYPE,TYPE,FSTYPE,FSSIZE,MOUNTPOINT /dev/vda
NAME                              PTTYPE TYPE FSTYPE      FSSIZE MOUNTPOINT
vda                               gpt    disk                    
├─vda1                            gpt    part vfat        598.8M /boot/efi
├─vda2                            gpt    part ext4        975.9M /boot
└─vda3                            gpt    part LVM2_member        
  ├─fedora_localhost--live-root00        lvm  ext4         12.6G /
  └─fedora_localhost--live-swap00        lvm  swap               [SWAP]

$ cat /etc/fstab

#
# /etc/fstab
# Created by anaconda on Tue Mar 24 14:59:51 2020
...
#
/dev/mapper/fedora_localhost--live-root00 /                       ext4    defaults        1 1
UUID=be3b153f-87a2-431c-a181-6fb97c7311b5 /boot                   ext4    defaults        1 2
UUID=6F23-C2E8          /boot/efi               vfat    umask=0077,shortname=winnt 0 2
/dev/mapper/fedora_localhost--live-swap00 none                    swap    defaults        0 0

Comment 109 tomasy 2020-03-25 06:46:23 UTC
Hi Steve,
Have you been able to reproduce the problem?
You are testing with gpt but I am not using gpt. 
Is it something I can do?

Comment 110 Steve 2020-03-25 11:22:45 UTC
(In reply to tomasy from comment #109)
> Hi Steve,
> Have you been able to reproduce the problem?

No.

> You are testing with gpt but I am not using gpt.

I had been testing in a VM with "dos" partitioning, but I created a *second* VM to test EFI booting in a VM. The Fedora installer used "gpt" partitioning, because I booted that second VM with EFI.

This is what my VM with "dos" partitioning shows. What do you show for sda?

# fdisk -l /dev/vda | grep Disklabel
Disklabel type: dos

> Is it something I can do?

Probably not without reinstalling.

Were you able to find the grub2 version by either of these methods:

1. Press "c" when the grub2 menu is displayed. Type "version".

2. When booted from a working kernel, enter:

# strings /boot/grub2/i386-pc/version.mod | egrep -A 1 'GNU|Compiler'

Comment 111 tomasy 2020-03-25 12:57:50 UTC
(In reply to Steve from comment #110)
> (In reply to tomasy from comment #109)
...
> This is what my VM with "dos" partitioning shows. What do you show for sda?
> 
> # fdisk -l /dev/vda | grep Disklabel
> Disklabel type: dos

 # fdisk -l /dev/sda | grep Disklabel
Disklabel type: dos

> > Is it something I can do?
...
> # strings /boot/grub2/i386-pc/version.mod | egrep -A 1 'GNU|Compiler'
I do not have /boot/grub2/i386-pc/version.mod

Why is my /boot/grub2/grubenv different? Before grub reads grubenv there is no obvious problem. grubenv includes information about the kernels in binary format, except for the rescue kernel, and the rescue kernel is not affected, so my guess is that the problem is there. But this is a guess.
The gpt discussion is, I think, not related to this bug.
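A note on the grubenv format: it is a fixed-size 1024-byte text block, padded to length with '#' characters, rather than true binary. A minimal sketch that builds one the way grub2-editenv lays it out (the key names and values here are illustrative, not taken from this system):

```shell
# Build an illustrative grubenv: a header line, key=value pairs,
# then '#' padding so the file is exactly 1024 bytes.
tmp=$(mktemp)
{
  printf '# GRUB Environment Block\n'
  printf 'saved_entry=Fedora (5.6.0-0.rc5.git0.2.fc32.x86_64) 32\n'
  printf 'boot_success=0\n'
} > "$tmp"
# Pad with '#' up to 1024 bytes, as grub2-editenv does.
size=$(wc -c < "$tmp")
head -c $((1024 - size)) /dev/zero | tr '\0' '#' >> "$tmp"

wc -c < "$tmp"        # must report 1024
grep -c '=' "$tmp"    # the key=value lines are readable text
rm -f "$tmp"
```

Because the file size is fixed, a partial or interrupted write leaves a corrupt 1024-byte block, which is one way grubenv can differ between a bootable and a non-bootable setup.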

Comment 112 Steve 2020-03-25 13:13:04 UTC
(In reply to tomasy from comment #111)
...
>  # fdisk -l /dev/sda | grep Disklabel
> Disklabel type: dos

OK.

> > > Is it something I can do?
> ...
> > # strings /boot/grub2/i386-pc/version.mod | egrep -A 1 'GNU|Compiler'
> I do not have /boot/grub2/i386-pc/version.mod

That could be the problem. Please boot to the grub2 menu, press "c", enter "version", and note the version info:

GNU GRUB  version ...
Compiler version ...

> Why are my /boot/grub2/grubenv different? Before grub is using  grubenv
> there are no obviously problem. grubenv includes information about the
> kernels in binary format except for resuce kernel and the rescue kernel is
> not affected so I will guess the problem is there. But this is a guess.

If you have an old version of grub2, that could explain the problem. Newer kernels could read different sections of memory. (I don't know anything about the grub2-to-kernel handoff.)

> This with gpt I do not related to this bug

It is not related. I apologize for bringing that up, but without a complete log, it is difficult to answer such simple questions as these:

1. What is on your kernel command-line?
2. Are you booting with EFI? (probably not, since you have "dos" partitions)

Comment 113 Steve 2020-03-25 13:53:56 UTC
(In reply to Steve from comment #112)
...
> 1. What is on your kernel command-line?

$ cat /proc/cmdline

> 2. Are you booting with EFI? (probably not, since you have "dos" partitions)

$ journalctl -b --no-hostname | fgrep -A 4 'efi:'

("4" is approximate.)
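A quicker check that does not depend on the journal: the kernel creates /sys/firmware/efi only when it was booted through EFI firmware. A minimal sketch (the echoed strings are mine, not standard output of any tool):

```shell
# Booted via EFI? The kernel exposes /sys/firmware/efi only when
# it was started through EFI boot services.
if [ -d /sys/firmware/efi ]; then
    echo "EFI boot"
else
    echo "legacy BIOS boot"
fi
```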

Comment 114 Steve 2020-03-25 17:56:00 UTC
(In reply to Steve from comment #112)
...
> If you have an old version of grub2, that could explain the problem. Newer kernels could read different sections of memory. (I don't know anything about the grub2-to-kernel handoff.)
...

Koji has grub packages that pre-date grub2. These are the oldest:

https://koji.fedoraproject.org/koji/packageinfo?buildStart=350&packageID=6684&buildOrder=-completion_time&tagOrder=name&tagStart=100#buildlist

Comment 115 Steve 2020-03-25 19:14:30 UTC
version.mod appears to have been added in F29, because it is not in F28:

$ rpm -ql -p grub2-pc-modules-2.02-38.fc28.noarch.rpm | grep version
$

$ rpm -ql -p grub2-pc-modules-2.02-62.fc29.noarch.rpm | grep version
/usr/lib/grub/i386-pc/version.mod

And, yeah, I downloaded a lot of packages from koji to find that out. :-)

Comment 116 Steve 2020-03-25 19:22:04 UTC
Could you post the output from:

$ rpm -qa grub2\* | sort

Comment 117 tomasy 2020-03-25 22:46:12 UTC
(In reply to Steve from comment #112)
 ...
> That could be the problem. Please boot to the grub2 menu, press "c", enter
> "version", and note the version info:
> 
> GNU GRUB  version ...
> Compiler version ...
"version" is not a valid command
> 
> 1. What is on your kernel command-line?
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.6.0-0.rc5.git0.2.fc32.x86_64 root=UUID=cdb00230-d06c-4cf8-b11a-e8b1abf13500 ro rootflags=subvol=root_mario rhgb quiet
> 2. Are you booting with EFI? (probably not, since you have "dos" partitions)
No

# rpm -qa grub2\* | sort
grub2-common-2.04-10.fc32.noarch
grub2-pc-2.04-10.fc32.x86_64
grub2-pc-modules-2.04-10.fc32.noarch
grub2-tools-2.04-10.fc32.x86_64
grub2-tools-efi-2.04-10.fc32.x86_64
grub2-tools-extra-2.04-10.fc32.x86_64
grub2-tools-minimal-2.04-10.fc32.x86_64

And I have /usr/lib/grub/i386-pc/version.mod but not /boot/grub2/i386-pc/version.mod.

So the question is when grub2 in /boot/grub2 was last updated, because the files that exist in both
/usr/lib/grub/i386-pc/ and /boot/grub2/i386-pc/
do not have the same file sizes. My original installation was F24; after that I have only upgraded.

There is a file /boot/grub2/i386-pc/modinfo.sh; it contains
- grub_target_cc_version='gcc (GCC) 6.0.0 20160331 (Red Hat 6.0.0-0.19)
- grub_version="2.02~beta3"
And F24 was released in June 2016.

Interesting: grub itself is never updated on upgrade.
So I need to figure out how to upgrade grub in /boot and the boot sector.

Comment 118 tomasy 2020-03-25 22:51:14 UTC
So this is not a kernel problem, but rather a grub2 problem?

Comment 119 Steve 2020-03-26 02:17:49 UTC
(In reply to tomasy from comment #118)
> So this is no a kernel problem, instead it is a grub2 problem?

That's hard to say, but the scenario of having an older version of grub2 is probably not uncommon.

Thanks for the info in Comment 117. That's enough info for me to try downgrading grub2 in a VM.

> Interesting, grub is never updated.
> So I need to figure out how to upgrade grub in /boot and boot-sector

That's easy, but try to save a log too:

# grub2-install -v /dev/sda 2>&1 | tee grub2-install-log-1.txt

But before doing that, I suggest downloading Super Grub2 Disk and installing it on a USB flash drive, so you can boot if something goes wrong:

https://www.supergrubdisk.org/super-grub2-disk/

While trying to get EFI configured on my test disk, I repeatedly made my system unbootable. Once, I even made my primary system unbootable. With Super Grub2 Disk on a USB flash drive I could almost always boot. And I haven't had to reinstall Fedora yet. :-)

Comment 120 Steve 2020-03-26 03:10:19 UTC
(In reply to Steve from comment #119)
...
> Thanks for the info in Comment 117. That's enough info for me to try downgrading grub2 in a VM.
...

That isn't going to be as simple as I was hoping. When I tried to install these in an F32 VM:

# ls grub2*
grub2-2.02-0.38.fc24.x86_64.rpm  grub2-tools-2.02-0.38.fc24.x86_64.rpm

I ran into a dependency problem:

# dnf install grub2*.rpm
...
  - nothing provides librpm.so.7()(64bit) needed by grub2-tools-1:2.02-0.38.fc24.x86_64
...

Comment 121 Steve 2020-03-26 04:55:27 UTC
(In reply to tomasy from comment #117)
...
> There is a file /boot/grub2/i386-pc/modinfo.sh it contains
> - grub_target_cc_version='gcc (GCC) 6.0.0 20160331 (Red Hat 6.0.0-0.19)
> - grub_version="2.02~beta3"
> And F24 was released in june 2016
...

Do you know if that was a pre-release version of F24 that you installed?

I installed from Fedora-Workstation-Live-x86_64-24-1.2.iso in a VM, and see a slightly newer compiler version:

# egrep 'grub_target_cc_version|grub_version' /boot/grub2/i386-pc/modinfo.sh
grub_target_cc_version='gcc (GCC) 6.1.1 20160510 (Red Hat 6.1.1-2)'
grub_version="2.02~beta3"

$ rpm -qa grub2\* | sort
grub2-2.02-0.34.fc24.x86_64
grub2-efi-2.02-0.34.fc24.x86_64
grub2-tools-2.02-0.34.fc24.x86_64

NB: I didn't do any updates after installing.

Comment 122 Steve 2020-03-26 05:56:59 UTC
I haven't reproduced the panic in a VM using an installed version of F24 with these kernels:

$ ls -1 kernel-core*
kernel-core-5.5.10-200.fc31.x86_64.rpm
kernel-core-5.6.0-0.rc5.git0.2.fc32.x86_64.rpm
kernel-core-5.6.0-0.rc7.git0.2.fc32.x86_64.rpm

And what I believe is an exact copy of the attached grubenv-non-bootable:

# sha256sum grubenv grubenv-non-bootable
65e5b54e165b68bc5c1a89547e508649c1c9c68c69427597b28f6301e8a61d2d  grubenv
65e5b54e165b68bc5c1a89547e508649c1c9c68c69427597b28f6301e8a61d2d  grubenv-non-bootable

# ls -l /boot/grub2/grubenv
lrwxrwxrwx. 1 root root 28 Jun 10  2016 /boot/grub2/grubenv -> /boot/efi/EFI/fedora/grubenv

# ls -lL /boot/grub2/grubenv
-rw-r--r--. 1 root root 1024 Mar 25 22:07 /boot/grub2/grubenv

# ls -LZ /boot/grub2/grubenv
unconfined_u:object_r:boot_t:s0 /boot/grub2/grubenv

Comment 123 Steve 2020-03-26 06:08:26 UTC
(In reply to Steve from comment #122)
...
> # ls -lL /boot/grub2/grubenv
> -rw-r--r--. 1 root root 1024 Mar 25 22:07 /boot/grub2/grubenv
...

One possible difference is that grubenv never gets updated. What do you have for:

$ cat /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="rhgb quiet"
GRUB_DISABLE_RECOVERY="true"

For the record:

$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-5.6.0-0.rc5.git0.2.fc32.x86_64 root=UUID=f5052539-e146-4e23-a43a-6529335c408d ro rhgb quiet LANG=en_US.UTF-8

Comment 124 Steve 2020-03-26 06:20:27 UTC
(In reply to tomasy from comment #117)
...
> 1. What is on your kernel command-line?
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.6.0-0.rc5.git0.2.fc32.x86_64 ...
           ^^^^^^^^^^^^

That looks like a grub2 device name. Can you remove it and still boot?

... root=UUID=cdb00230-d06c-4cf8-b11a-e8b1abf13500 ro rootflags=subvol=root_mario rhgb quiet

Documentation:

GNU GRUB Manual 2.04
13.1 How to specify devices
https://www.gnu.org/software/grub/manual/grub/html_node/Device-syntax.html

Comment 125 tomasy 2020-03-26 06:30:05 UTC
(In reply to Steve from comment #119)
> (In reply to tomasy from comment #118)
> > So this is no a kernel problem, instead it is a grub2 problem?
> 
> That's hard to say, but the scenario of having an older version of grub2 is
> probably not uncommon.
(In reply to Steve from comment #121)
> (In reply to tomasy from comment #117)
...
> Do you know if that was a pre-release version of F24 that you installed?
Hardly know what I did yesterday so no.

> I installed from Fedora-Workstation-Live-x86_64-24-1.2.iso in a VM, and see
This is a 1.2, so perhaps some version existed before that. Normally I do not use a LiveCD when I install.

Comment 126 Steve 2020-03-26 06:47:12 UTC
(In reply to tomasy from comment #125)
...
> > Do you know if that was a pre-release version of F24 that you installed?
> Hardly know what I did yesterday so no.

OK.

> > I installed from Fedora-Workstation-Live-x86_64-24-1.2.iso in a VM, and see
> This is a 1.2 so perhaps it exist some before that. Normally I do not use LiveCD when I install

I already had F24 Live on my hard drive, so I used it. The only alternative is the netinst version:
https://archives.fedoraproject.org/pub/archive/fedora/linux/releases/24/Workstation/x86_64/iso/

(In reply to Steve from comment #124)
...
> > 1. What is on your kernel command-line?
> BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.6.0-0.rc5.git0.2.fc32.x86_64 ...
>            ^^^^^^^^^^^^
...

The root device is normally set in grub.cfg, so why is it needed on the kernel command-line?

# grep 'set.root' -m 2 /boot/grub2/grub.cfg
	set root='hd0,msdos1'
	  search --no-floppy --fs-uuid --set=root --hint='hd0,msdos1'  8c5d9853-b693-474a-aab5-dd9614e47d63

Comment 127 Steve 2020-03-26 13:17:31 UTC
(In reply to Steve from comment #124)
> (In reply to tomasy from comment #117)
> ...
> > 1. What is on your kernel command-line?
> BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.6.0-0.rc5.git0.2.fc32.x86_64 ...
>            ^^^^^^^^^^^^
...

That must be something grub2 is adding, because I don't see it in my F24 VM with new kernels (Comment 123 refers to F24), yet I see it on other systems that have newer versions of grub2. Some programs parse /proc/cmdline, so changing the format could cause problems.
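To make the parsing hazard concrete, here is a sketch (the sample values are invented) of what a /proc/cmdline consumer has to do to cope with the optional grub2 device prefix in BOOT_IMAGE:

```shell
# Extract the kernel image path from a BOOT_IMAGE value that may
# carry a grub2 device prefix such as "(hd0,msdos1)".
cmdline='BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.6.0-0.rc5.git0.2.fc32.x86_64 root=UUID=... ro rhgb quiet'
boot_image=${cmdline#BOOT_IMAGE=}   # drop the key
boot_image=${boot_image%% *}        # drop the remaining arguments
path=${boot_image#(*)}              # strip an optional (device) prefix
echo "$path"                        # prints /vmlinuz-5.6.0-0.rc5.git0.2.fc32.x86_64
```

A consumer that skips the last step and treats BOOT_IMAGE as a plain path would break on the newer format while working fine on the older one.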

And to add to the confusion, the grub2 package version doesn't always match what is in version.mod:

On my F30 bare metal system, the package is "2.02" while version.mod says "2.03":

$ rpm -q grub2-pc-modules
grub2-pc-modules-2.02-88.fc30.noarch

$ strings /usr/lib/grub/i386-pc/version.mod | egrep -A 1 'GNU|Compiler'
GNU GRUB  version %s
2.03
--
Compiler version %s
9.2.1 20190827 (Red Hat 9.2.1-1)

And grub2-pc-modules verifies as correctly installed:

$ rpm -qf /usr/lib/grub/i386-pc/version.mod
grub2-pc-modules-2.02-88.fc30.noarch

$ rpm -V grub2-pc-modules
$

Comment 128 Steve 2020-03-26 14:13:43 UTC
(In reply to Steve from comment #127)
...
> > BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.6.0-0.rc5.git0.2.fc32.x86_64 ...
> >            ^^^^^^^^^^^^
...
> That must be something grub2 is adding, 
...

I cloned the grub2 git repo, grepped for "BOOT_IMAGE" and, from there, found this commit from 2018:

verifiers: Add possibility to verify kernel and modules command lines
https://git.savannah.gnu.org/cgit/grub.git/commit/?id=4d4a8c96e3593d76fe7b025665ccdecc70a53c1f

Specifically, this patch:
https://git.savannah.gnu.org/cgit/grub.git/diff/grub-core/loader/i386/pc/linux.c?id=4d4a8c96e3593d76fe7b025665ccdecc70a53c1f

NB: I am NOT saying there is anything wrong here, but 2018 is after the release of F24, which was in 2016 when tomasy first installed his system (Comment 117).

However, I am asking whether grub2 device names should ever appear in user space via /proc/cmdline.

Comment 129 Steve 2020-03-26 18:14:42 UTC
(In reply to Steve from comment #119)
...
> But before doing that, I suggest downloading Super Grub2 Disk and installing
> it on a USB flash drive, so you can boot if something goes wrong:
...

You can also make your own grub2 boot images:

$ man grub2-mkrescue
https://www.gnu.org/software/grub/manual/grub/html_node/Invoking-grub_002dmkrescue.html#Invoking-grub_002dmkrescue

Either of those alternatives might be a good way to check that the issue is with grub2 without reinstalling grub2.

Comment 130 tomasy 2020-03-26 22:37:22 UTC
I have now installed the new kernel 5.6.0-0.rc7.git0.2.fc32.x86_64 and got the same problem.

This does not seem to be a kernel bug. Is it best to move this bug to grub, or is it better to create a new bug for grub?

Comment 131 Steve 2020-03-26 22:58:22 UTC
(In reply to tomasy from comment #130)
> I now installed the new kernel  5.6.0-0.rc7.git0.2.fc32.x86_64 and got the same problem.

Thanks for your report.

> This do not seems to be kernel bug. What is best to move this bug to grub or is it better to create a new bug for grub?

Open a new bug and include a link to this one. (This one is way too long for maintainers to wade through.)

If the new panic is different in any way please attach a new screenshot.

And please explain that you installed F24 and repeatedly upgraded after that.

Comment 132 Steve 2020-03-27 11:36:34 UTC
(In reply to tomasy from comment #117)
...
> My original installation is from F24 after that I have updated
> 
> There is a file /boot/grub2/i386-pc/modinfo.sh it contains
> - grub_target_cc_version='gcc (GCC) 6.0.0 20160331 (Red Hat 6.0.0-0.19)
> - grub_version="2.02~beta3"
> And F24 was released in june 2016
...

For the record, I found that version by downgrading from the version installed from the F24 Live image. In my F24 VM:

$ rpm -qa grub2\* | sort
grub2-2.02-0.30.fc24.x86_64
grub2-efi-2.02-0.30.fc24.x86_64
grub2-tools-2.02-0.30.fc24.x86_64

$ egrep 'grub_target_cc_version|grub_version' /usr/lib/grub/i386-pc/modinfo.sh
grub_target_cc_version='gcc (GCC) 6.0.0 20160331 (Red Hat 6.0.0-0.19)'
grub_version="2.02~beta3"

After running grub2-install:

$ egrep 'grub_target_cc_version|grub_version' /boot/grub2/i386-pc/modinfo.sh
grub_target_cc_version='gcc (GCC) 6.0.0 20160331 (Red Hat 6.0.0-0.19)'
grub_version="2.02~beta3"

No panics with any kernels, though.

Comment 133 Steve 2020-03-28 00:42:46 UTC
(In reply to tomasy from comment #117)
...
> > 1. What is on your kernel command-line?
> BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.6.0-0.rc5.git0.2.fc32.x86_64
> root=UUID=cdb00230-d06c-4cf8-b11a-e8b1abf13500 ro
> rootflags=subvol=root_mario rhgb quiet
...

You might be able to see more output before the panic by removing "rhgb quiet" from the kernel command-line.

The panic screenshot refers to "init", which is "systemd", but it is not clear whether it is the first or the second run of systemd.

If you add this to the kernel command-line, the system will drop into the dracut shell after the first run of systemd and before the second:

rd.break=cmdline

From the dracut shell, there is a limited set of Linux commands. The "journalctl --no-hostname" command works at that point, but there are no devices that can be mounted, so I'm not sure how to save any output.

You can page up and down with shift-page-up and shift-page-down.*

Enter "reboot" to reboot.

Documentation:

$ man dracut.cmdline

* How do you scroll up/down on the Linux console?
https://stackoverflow.com/questions/15255070/how-do-you-scroll-up-down-on-the-linux-console

Comment 134 Steve 2020-03-28 14:48:13 UTC
(In reply to tomasy from comment #91)
> Created attachment 1672929 [details]
> grubenv-bootable
> 
> The only difference in /boot is /boot/grub2/grubenv
> This file is modified when installing a new kernel.
...

It gets worse. There is a systemd unit that updates grubenv after booting. In my F32 VM:

$ journalctl -b | grep grub
Mar 28 07:23:03 f32-test systemd[1337]: grub-boot-success.service: Succeeded.

The grub-boot-success.service unit invokes grub2-set-bootflag:

$ grep ExecStart /usr/lib/systemd/user/grub-boot-success.service
ExecStart=/usr/sbin/grub2-set-bootflag boot_success

And there is a timer unit that has a 2 minute delay:

$ cat /usr/lib/systemd/user/grub-boot-success.timer
[Unit]
Description=Mark boot as successful after the user session has run 2 minutes
ConditionUser=!@system

[Timer]
OnActiveSec=2min

Indeed, the grub2-tools package installs several systemd units:

$ rpm -ql grub2-tools | grep systemd
/usr/lib/systemd/system/grub-boot-indeterminate.service
/usr/lib/systemd/system/system-update.target.wants
/usr/lib/systemd/system/system-update.target.wants/grub-boot-indeterminate.service
/usr/lib/systemd/user/grub-boot-success.service
/usr/lib/systemd/user/grub-boot-success.timer
/usr/lib/systemd/user/timers.target.wants
/usr/lib/systemd/user/timers.target.wants/grub-boot-success.timer

Comment 135 Steve 2020-03-28 15:19:33 UTC
(In reply to Steve from comment #134)
...
> The grub-boot-success.service unit invokes grub2-set-bootflag:
> 
> $ grep ExecStart /usr/lib/systemd/user/grub-boot-success.service
> ExecStart=/usr/sbin/grub2-set-bootflag boot_success
...

This bug report suggests removing grub2-set-bootflag as a workaround for that bug (renaming would probably be sufficient):

Bug 1764925 (CVE-2019-14865) - CVE-2019-14865 grub2: grub2-set-bootflag utility causes grubenv corruption rendering the system non-bootable

Comment 136 Steve 2020-03-28 15:33:30 UTC
Hans: How do we avoid a race with grub-boot-success.service, with installers, and with users running grub2-editenv?

$ rpm -q --changelog grub2-tools | grep -A3 'Tue Nov 26 2019'
* Tue Nov 26 2019 Javier Martinez Canillas <javierm> - 2.04-4
- grub-set-bootflag: Write new env to tmpfile and then rename (hdegoede)
  Resolves: CVE-2019-14865
  Resolves: rhbz#1776580
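The write-to-tmpfile-then-rename pattern from that changelog entry can be sketched as follows (the file name and key are illustrative, not the actual grub2-set-bootflag code):

```shell
# Atomically update a grubenv-style file: write the complete new
# contents to a temp file in the same directory, then rename it over
# the original so readers never see a torn or partial write.
env_file=./grubenv.sample
printf '# GRUB Environment Block\nboot_success=0\n' > "$env_file"

new=$(mktemp "$(dirname "$env_file")/grubenv.XXXXXX")
sed 's/^boot_success=0$/boot_success=1/' "$env_file" > "$new"
mv -f "$new" "$env_file"        # rename(2) is atomic within a filesystem

grep boot_success "$env_file"   # prints boot_success=1
rm -f "$env_file"
```

The temp file must live on the same filesystem as the target; otherwise mv falls back to copy-and-delete, which is not atomic.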

Comment 137 tomasy 2020-05-04 20:26:28 UTC
After upgrading grub2, everything works.
I am closing this bug.

