Bug 1876904 - Fedora kernel 5.8.4 fails to boot from DASD in a KVM guest.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 32
Hardware: s390x
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: ZedoraTracker
 
Reported: 2020-09-08 13:20 UTC by IBM Bug Proxy
Modified: 2020-10-19 12:08 UTC (History)

Fixed In Version: kernel-5.8.15-201.fc32 kernel-5.8.15-301.fc33
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-19 12:08:59 UTC
Type: ---
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
IBM Linux Technology Center 187976 0 None None None 2020-09-08 13:20:47 UTC

Description IBM Bug Proxy 2020-09-08 13:20:32 UTC

Comment 1 IBM Bug Proxy 2020-09-08 13:20:40 UTC
== Comment: #0 - Viktor Mihajlovski <MIHAJLOV.com> - 2020-09-01 06:51:44 ==
---Problem Description---
Fedora kernel 5.8.4 fails to boot from DASD in a KVM guest.
 
Contact Information = Viktor Mihajlovski <mihajlov.com> 
 
---uname output---
Linux localhost.localdomain 5.8.4-200.fc32.s390x #1 SMP Wed Aug 26 22:12:29 UTC 2020 s390x s390x s390x GNU/Linux
 
Machine Type = 3096-703 
 
---System Hang---
 After an update to kernel 5.8.4 the system fails to detect the filesystems and eventually ends up in the emergency shell. Looking at /dev I don't see any partitions but only the full disk /dev/vda. This also matches the dmesg output, so maybe the partition detection is broken for virtio-attached DASD.
 
---Debugger---
A debugger is not configured
 
---Steps to Reproduce---
 1. Install a Fedora 32 KVM guest on a DASD, e.g. using virt-install
$ virt-install --name s22 --memory 2048 --disk path=/dev/disk/by-path/ccw-0.0.a03f --location https://ftp-stud.hs-esslingen.de/pub/fedora-secondary/releases/32/Everything/s390x/os/

2. Accept all defaults in the text installer, let the installation finish, and reboot.

3. Log in to the system and run dnf update.

4. Reboot; this leads to the hang.

It is possible to recover by selecting the originally installed kernel.
 
Stack trace output:
 no
 
Oops output:
 no
 
System Dump Info:
  The system is not configured to capture a system dump.
 
I haven't tried with the latest upstream kernel, but as Fedora is pretty close to upstream I could imagine that this issue exists there as well.
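
The symptom described above (only the whole disk /dev/vda is visible, with no partitions) can be checked from the emergency shell. The following is a minimal sketch, not part of the original report: the function name count_partitions is made up for illustration, and it assumes the usual /proc/partitions layout where the device name is the fourth column.

```shell
# Hypothetical helper: count the partitions the kernel has registered
# for a given disk. Reads /proc/partitions by default; a different file
# can be passed as the second argument for testing.
count_partitions() (
    disk=$1
    file=${2:-/proc/partitions}
    # Column 4 holds the device name. A partition of "vda" shows up as
    # "vda" followed by a number, for example "vda1".
    awk -v d="$disk" '$4 ~ ("^" d "[0-9]+$") { n++ } END { print n + 0 }' "$file"
)
```

On an affected guest, `count_partitions vda` would be expected to print 0 even though parted can still read the partition table from the disk.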

Comment 2 Dan Horák 2020-09-08 15:00:26 UTC
This sounds familiar to me, perhaps there is a similar report on the enterprise side ...

Comment 3 Dan Horák 2020-09-08 15:24:51 UTC
Viktor, so you have updated from a 5.7 kernel to 5.8.4 in the guest, right? And can it be reproduced with the network install, because it gets the 5.8 kernel from updates during the installation?

Comment 4 Jan Stodola 2020-09-08 22:20:46 UTC
I'm not aware of a similar problem on the enterprise side. Update to a newer RHEL-8.3 kernel (tested with kernel-4.18.0-234.el8) works fine in KVM (no boot problems).
But I'm able to reproduce the problem after updating kernel to 5.8.6-201.fc32 in KVM, and also after re-creating the initrd in no-hostonly mode:

[root@localhost ~]# dracut --no-hostonly /boot/initramfs-5.8.6-201.fc32.s390x.img  5.8.6-201.fc32.s390x -f
dracut: Disabling early microcode, because kernel does not support it. CONFIG_MICROCODE_[AMD|INTEL]!=y
[root@localhost ~]# zipl
..
[root@localhost ~]# reboot
...
[  OK  ] Reached target Basic System.
[    6.411734] virtio_blk virtio1: [vda] 1803060 4096-byte logical blocks (7.39 GB/6.88 GiB)
[    6.411918] vda: detected capacity change from 0 to 7385333760
[    7.204466] alg: No test for crc32be (crc32be-vx)
[    7.653375] virtio_net virtio2 enc1: renamed from eth0
[  200.989524] dracut-initqueue[403]: Warning: dracut-initqueue timeout - starting timeout scripts

But all this testing was done on a RHEL-8 system. We can also try Fedora-Rawhide, which currently uses kernel-5.8.0-1.fc33 even for the installation - I will try tomorrow.

Comment 5 IBM Bug Proxy 2020-09-09 09:01:20 UTC
------- Comment From MIHAJLOV.com 2020-09-09 04:54 EDT-------
(In reply to comment #11)
> Viktor, so you have updated from 5.7 kernel to 5.8.4 in the guest, right?
> And can be reproduced with the network install, because it gets 5.8 kernel
> from updates during the installation?
I did a network install initially from the Fedora mirror https://ftp-stud.hs-esslingen.de/pub/fedora-secondary/releases/32/Everything/s390x/os. After the installation I had 5.6.6 running, so the kernel doesn't seem to be updated during the installation. The trouble started when I did a dnf update.

Comment 6 Dan Horák 2020-09-09 09:59:39 UTC
In theory it could be related to https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=662155e2898dd1c3915e420378bb6c0826548e70 (which appears in 5.7 first). I think we will need IBM's s390 kernel people to take a look.

Comment 7 IBM Bug Proxy 2020-09-09 11:06:20 UTC
------- Comment From cborntra.com 2020-09-09 06:57 EDT-------
An alternative might be Christoph Hellwig's rework that removed ioctl_by_bdev, which triggered a change in the dasd driver. Stefan Haberland did test that, though. Another thing: it seems that the vanilla upstream 5.8 kernel does not have an issue with partition detection on DASD via virtio-blk.

Comment 8 Dan Horák 2020-09-09 11:14:34 UTC
Now I wonder if it could be caused by a different kernel config and/or by a missing module in initrd ...

Comment 9 Jan Stodola 2020-09-09 13:29:53 UTC
Tried installation of compose https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20200902.n.1/compose/Server/s390x/os/ with kernel-5.8.0-1.fc33.s390x.
The installation was successful, vda1 and vda2 were created, but the installed system didn't boot, failing with the error reported in this bug.
When I restarted the installation using the same disk, only the /dev/vda device was created. Parted can see both partitions:

[anaconda root@fedora ~]# ls /dev/vda*
/dev/vda
[anaconda root@fedora ~]#
[anaconda root@fedora ~]# parted /dev/vda
GNU Parted 3.3
Using /dev/vda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print                                                            
Model: Virtio Block Device (virtblk)
Disk /dev/vda: 7385MB
Sector size (logical/physical): 4096B/4096B
Partition Table: dasd
Disk Flags: 

Number  Start   End     Size    File system  Flags
 1      98.3kB  1074MB  1074MB  xfs
 2      1074MB  7385MB  6311MB               lvm

(parted)

Comment 10 Jan Stodola 2020-09-09 20:54:22 UTC
Also reproduced with 5.9.0-0.rc3.20200902git9c7d619be5a0.1.fc34.s390x

Comment 11 IBM Bug Proxy 2020-10-07 08:41:18 UTC
------- Comment From cborntra.com 2020-10-07 04:33 EDT-------
So it seems to be dependent on the kernel config.  With the fedora32 config I could reproduce this with the upstream kernels. 5.7 is fine, 5.8 is broken.
So I could bisect this to

26d7e28e38206b1b3207af1409eee2269ab36f82 is the first bad commit
commit 26d7e28e38206b1b3207af1409eee2269ab36f82
Author: Stefan Haberland <sth.com>
Date:   Tue May 19 16:22:59 2020 +0200

s390/dasd: remove ioctl_by_bdev calls
The IBM partition parser requires device type specific information only
available to the DASD driver to correctly register partitions. The
current approach of using ioctl_by_bdev with a fake user space pointer
is discouraged.
Fix this by replacing IOCTL calls with direct in-kernel function calls.
Suggested-by: Christoph Hellwig <hch>
Signed-off-by: Stefan Haberland <sth.com>
Reviewed-by: Jan Hoeppner <hoeppner.com>
Reviewed-by: Peter Oberparleiter <oberpar.com>
Reviewed-by: Christoph Hellwig <hch>
Signed-off-by: Jens Axboe <axboe>

MAINTAINERS                     |  1 +
block/partitions/ibm.c          | 24 ++++++++++++++++++------
drivers/s390/block/dasd_ioctl.c | 34 ++++++++++++++++++++++++++++++++++
include/linux/dasd_mod.h        |  9 +++++++++
4 files changed, 62 insertions(+), 6 deletions(-)
create mode 100644 include/linux/dasd_mod.h

With the defconfig the problem is not present. Will try to identify which config option is problematic together with this patch.

Comment 12 IBM Bug Proxy 2020-10-07 09:32:27 UTC
------- Comment From cborntra.com 2020-10-07 05:22 EDT-------
So the problem happens when
CONFIG_DASD=m
and it does not happen with
CONFIG_DASD=y

this is sad, since we only need virtio-blk and the ibm partition code.
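
To see which case a given kernel build falls into, the build configuration can be inspected. This is a rough sketch under assumptions: the function name dasd_config_state is hypothetical, and the default path /boot/config-$(uname -r) assumes the distribution installs the kernel config there, as Fedora kernel packages normally do.

```shell
# Hypothetical helper: report how CONFIG_DASD is set in a kernel config
# file. Prints "y" (built in), "m" (module), or "n" (not set or absent).
dasd_config_state() (
    config_file=${1:-/boot/config-$(uname -r)}
    # Match only the exact option, not CONFIG_DASD_ECKD and similar.
    value=$(sed -n 's/^CONFIG_DASD=\([ym]\)$/\1/p' "$config_file")
    echo "${value:-n}"
)
```

According to the finding above, a 5.8 kernel for which this prints "m" is the affected configuration, while "y" is not.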

Comment 13 IBM Bug Proxy 2020-10-07 15:41:09 UTC
------- Comment From cborntra.com 2020-10-07 11:39 EDT-------
Fix is queued in the linux-block tree for 5.9 and 5.8 stable.

https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git/commit/?h=block-5.9&id=7370997d48520ad923e8eb4deb59ebf290396202

Comment 14 IBM Bug Proxy 2020-10-09 15:32:00 UTC
------- Comment From cborntra.com 2020-10-09 03:38 EDT-------
Patch merged upstream
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7370997d48520ad923e8eb4deb59ebf290396202

and queued for 5.8 stable.

Comment 15 Dan Horák 2020-10-09 15:41:10 UTC
Christian, thanks for your work on this issue.

Comment 16 IBM Bug Proxy 2020-10-19 10:51:01 UTC
------- Comment From cborntra.com 2020-10-19 06:41 EDT-------
Dan,

Any chance of getting this into F33 before the release, as well as into F32 updates soon?

Comment 17 Dan Horák 2020-10-19 12:04:13 UTC
If my git queries are correct, then the commit in question is first included (for stable) in 5.8.15, and kernel-5.8.15-301.fc33 is currently the latest in the F-33 nightly composes and thus should be in the GA too. For F-32, the 5.8.15 update has already been pushed out as stable. I think we are looking good.
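
For anyone checking an installed guest against the 5.8.15 threshold mentioned above, a version comparison along these lines may help. It is only a sketch: the function name kernel_58_has_fix is hypothetical, it relies on the -V (version sort) option of GNU sort, and it deliberately covers only the 5.8.y stable series, since other series got the fix at different points.

```shell
# Hypothetical helper: succeed if a 5.8.y kernel release string is at or
# above 5.8.15. Returns 2 for any other series, where this check does
# not apply.
kernel_58_has_fix() (
    base=${1%%-*}    # "5.8.4-200.fc32.s390x" -> "5.8.4"
    case $base in
        5.8.*) ;;
        *) exit 2 ;;
    esac
    # With version sort, the lower of the two versions comes first.
    lowest=$(printf '%s\n%s\n' "$base" 5.8.15 | sort -V | head -n 1)
    [ "$lowest" = 5.8.15 ]
)
```

For example, `kernel_58_has_fix "$(uname -r)" && echo fixed` could be run inside the guest after updating.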

