Description of problem:
Using an ECKD DASD as a virtio block device fails when installing RHCOS as part of an OCP installation on KVM. This path is broken because RHCOS tries to partition the disk but doesn't recognize the DASD partition table.
Version-Release number of selected component (if applicable):
4.7.z
4.8.0
How reproducible:
Consistently
Steps to Reproduce:
1. The following ECKD DASD is formatted, partitioned, and given an XFS file system:
    [root@m93ocp1 ~]# lsblk -f /dev/dasdm
    NAME     FSTYPE LABEL UUID                                 MOUNTPOINT
    dasdm
    └─dasdm1 xfs          ce8ce09c-81a4-4054-8f85-f4df0e93b365
2. The domain configuration for the guest has the following to perform an install:
    <os>
      <type arch='s390x' machine='s390-ccw-virtio-rhel8.2.0'>hvm</type>
      <kernel>/bootkvm/rhcos-48.84.202105021919-0-live-kernel-s390x</kernel>
      <initrd>/bootkvm/rhcos-48.84.202105021919-0-live-initramfs.s390x.img</initrd>
      <cmdline>rd.neednet=1 console=ttysclp0 coreos.inst=yes dfltcc=off coreos.inst.install_dev=vda coreos.live.rootfs_url=http://9.12.23.79:8080/CI/rhcos-48.84.202105021919-0//rhcos-48.84.202105021919-0-live-rootfs.s390x.img coreos.inst.ignition_url=http://192.168.79.1:8080/ignition/bootstrap.ign ip=dhcp nameserver=192.168.79.1</cmdline>
      <boot dev='hd'/>
    </os>
3. The disk configuration has the following:
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/dasdm'/>
      <target dev='vda' bus='virtio'/>
      <address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0000'/>
    </disk>
4. Start the guest.
5. Installation fails:
    [ 56.790258] coreos-installer-service[1120]: Read disk 3.4 GiB/3.4 GiB (100%)
    [ 58.563171] coreos-installer-service[1120]: Error: couldn't find any partitions on /dev/vda
    [ 58.563417] coreos-installer-service[1120]: Resetting partition table
    [ 58.810296] coreos-installer-service[1120]: Error: install failed
    [FAILED] Failed to start CoreOS Installer.
    See 'systemctl status coreos-installer.service' for details.
    [DEPEND] Dependency failed for CoreOS Installer Target.
    [DEPEND] Dependency failed for Finalize CoreOS Installer Target.
    [DEPEND] Dependency failed for Give Login Shell After CoreOS Installer.
    [DEPEND] Dependency failed for Reboot after CoreOS Installer.
6. During the installation, the guest sees the block device but runs some non-DASD partitioning, which seems to break the DASD formatting:
    bash-4.4# cat /proc/partitions
    major minor  #blocks  name
       7     0    8244788 loop0
     252     0   47174400 vda
       7     1     786328 loop1
    bash-4.4# ls /sys/block
    loop0 loop1 vda
    bash-4.4# ls /sys/class/block
    loop0 loop1 vda
    bash-4.4# fdasd -p /dev/vda
    reading volume label ..: Cannot show requested information because the disk label block is invalid
    exiting...
Actual results:
Fails and enters Emergency Mode.
Expected results:
Ideally, the guest is presented with the full DASD block device, as we do with a qcow2 image path.
Additional info:
This happens using RHCOS builds 47.83 and 48.83.
Can you elaborate on the use case for this? If you're running inside a KVM VM, why is it important to use DASD-style formatting during install?
I agree with @bgilbert's point. Also it will fail because you have specified the disk as vda in the kernel cmdline and not the dasd disk.
This is being driven by a customer. More details can be provided by @Holger.Wolf and @alklein. Instead of using qcow images, which are supported today for OCP clusters on KVM, they would like to use virtual block devices. KVM supports virtual block devices and should be able to provide any block device, such as /dev/dasda, to the guest as /dev/vda. They have been trying to use virtual block devices and have been struggling with the setup. We've also tried variations of the example above, such as specifying vda1 directly (coreos.inst.install_dev=vda1), and that failed. We even tried presenting the DASD partition as a base block device (<source dev='/dev/dasdb1'/>), and that also failed.
Hi Prashanth, For KVM, we've primarily used vda in the kernel cmdline, even for the qemu image. Can we use dasd or some variation of that here for the block device? Can you provide an example? Thanks!
If the DASD is being presented inside the guest as a non-DASD disk at /dev/vda, then coreos-installer will install with GPT partitioning. Thus, it's expected that fdasd will not find a volume label afterward. Installing directly to a partition is not expected to work; coreos-installer needs a whole-disk device. The unexpected part is that the GPT is apparently not being recognized by the kernel after the image is copied. Could you post complete logs from the install run?
Created attachment 1783429: This is the console output from the RHCOS installation using bootstrap-0 as a test
Okay, thanks. It looks like coreos-installer is indeed installing the GPT image with the correct sector size, but lsblk isn't recognizing its partitions afterward.
Hints to detect underlying dasd-eckd from virtio-block: https://bugs.launchpad.net/ubuntu-z-systems/+bug/1893775 (possible similar issue with ubuntu installer).
Hi all, to me, using a DASD with KVM looks strange. As far as I know, a DASD is always configured during kernel boot:
    [ 3.267444] dasd-eckd 0.0.6609: A channel path to the device has become operational
    [ 3.276352] dasd-eckd 0.0.6609: New DASD 3390/0C (CU 3990/01) with 30051 cylinders, 15 heads, 224 sectors
    [ 3.277415] dasd-eckd 0.0.6609: DASD with 4 KB/block, 21636720 KB total size, 48 KB/track, compatible disk layout
    [ 3.278619] dasda:VOL1/ 0X6609: dasda1 dasda2
Here this step is missing, so it is no wonder the vda device wasn't detected or used as a DASD device. I'll ask around.
The kernel, fdasd and parted DO detect a DASD passed thru via virtio-blk and this is also an important use case, e.g. for migration with shared access to the storage server. Anaconda (the RHEL installer) takes care of DASDs as well. What tool is used to partition the disk in coreos? Can this be fixed as well?
One can, for instance, use parted print to detect a DASD-backed virtio disk. E.g., on a RHEL 8.4 KVM guest:
    [root@localhost ~]# parted /dev/vda print
    Model: Virtio Block Device (virtblk)
    Disk /dev/vda: 22.2GB
    Sector size (logical/physical): 4096B/4096B
    Partition Table: dasd
    Disk Flags:
    Number  Start   End     Size    File system  Flags
     1      98.3kB  1074MB  1074MB  xfs
     2      1074MB  22.2GB  21.1GB               lvm
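A minimal sketch of how a script could use this check, assuming the human-readable "Partition Table:" line shown in the parted output above. The function name is_dasd_label is hypothetical; it reads parted output on stdin, so it can be tried without a DASD attached.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the parted output on stdin reports a
# DASD disk label. Assumes the "Partition Table: dasd" line format shown
# in the parted output quoted in this comment.
is_dasd_label() {
    grep -q '^Partition Table: dasd[[:space:]]*$'
}

# Example use inside a guest (needs root and parted installed):
#   parted -s /dev/vda print | is_dasd_label && echo "DASD-backed virtio disk"
```

Reading from stdin keeps the check separate from the parted call, so the same function works on saved output from a support case.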
coreos-installer supports low-level formatting with dasdfmt, but it only formats the disk if install_dev is a physical disk (dasda, sda). If install_dev is vda, coreos-installer jumps directly to reading the partitions (/boot). I think dasdm is not formatted here, so it couldn't find any partitions and failed. It should work if we use the KVM host to create partitions on dasdm similar to those in the qcow2 image.
In this case the DASD must have been formatted before, otherwise the guest couldn't use it at all. If I read the description correctly, the guest was able to write a GPT header to the disk (and destroy the DASD-specific label). So it seems that the coreos-installer is not recognizing the disk format correctly. E.g., using parted print can identify a DASD-format virtio block device. Not sure how Anaconda does it, but it detects the DASD properly.
@madeel - This is the problem: the CoreOS installer cannot create the partitions for us as it does for qcow2 images. In our testing, before any CoreOS install attempt, the following is run on the KVM host system:
1) dasdfmt against the entire physical disk, /dev/dasdm.
2) fdasd to create one new single partition.
3) Create a new file system, either XFS or EXT4, which should not matter as it will be overwritten.
@Viktor, Mark and I looked into pre-creating/pre-partitioning to match what we see when an actual DASD is used. The first partition is typically /boot and the second partition is /sysroot. We tried to mimic this using fdasd:
    Disk /dev/dasdm:
      cylinders ............: 65520
      tracks per cylinder ..: 15
      blocks per track .....: 12
      bytes per block ......: 4096
      volume label .........: VOL1
      volume serial ........: 0XB406
      max partitions .......: 3
     ------------------------------- tracks -------------------------------
                   Device      start      end   length   Id  System
              /dev/dasdm1          2     8193     8192    1  Linux native
              /dev/dasdm2       8194   982799   974606    2  Linux native
The only issue with the above is that the two partitions usually created are vda3 and vda4, so we are not sure whether the installer works with partitions 3 and 4 only. There is no way to resequence the partitions either, since fdasd has a three-partition limit. DASD volumes cannot get a vda3 and vda4; if that is a hard requirement, this could be a problem.
We then ensured the host writes out the I/O so that the partitions are flushed before creating the guest, in this order: flush all partitions, then the full disk. We used the following commands:
    # blockdev --flushbufs /dev/dasdm1
    # blockdev --flushbufs /dev/dasdm2
    # blockdev --flushbufs /dev/dasdm
We turned console logging on (I'll attach the log here) and ran the RHCOS install.
The same failure occurs, but we can see from the console log that the installer sees the two pre-created partitions /dev/dasdm1 and /dev/dasdm2:
    Starting udev Wait for Complete Device Initialization...
    [ 4.369213] virtio_blk virtio0: [vda] 11793600 4096-byte logical blocks (48.3 GB/45.0 GiB)
    [ 4.369215] vda: detected capacity change from 0 to 48306585600
    [ 4.381569] systemd-udevd[612]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
    [ 4.394518] systemd-udevd[616]: Using default interface naming scheme 'rhel-8.0'.
    [ 4.394534] systemd-udevd[616]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
    [ 4.394573] virtio_net virtio3 enc1: renamed from eth0
    [ 4.416485] vda:VOL1/ 0XB406: vda1 vda2
    [ 4.417139] virtio_blk virtio4: [vdb] 575 512-byte logical blocks (294 kB/288 KiB)
    [ 4.417141] vdb: detected capacity change from 0 to 294400
    [ 4.540451] systemd-udevd[615]: Using default interface naming scheme 'rhel-8.0'.
Even though it initially sees the partitions correctly, it then tries to do something additional and fails:
    [ 11.454312] coreos-installer-service[1082]: Installing Red Hat Enterprise Linux CoreOS 48.84.202105021919-0 (Ootpa) s390x (4096-byte sectors)
    [ 11.465335] vda:VOL1/ 0XB406: vda1 vda2
    [ 12.465786] coreos-installer-service[1082]: Read disk 275.1 MiB/3.4 GiB (7%)
    [ 13.465545] coreos-installer-service[1082]: Read disk 550.0 MiB/3.4 GiB (15%)
    ...
    [ 41.314036] coreos-installer-service[1082]: Read disk 3.4 GiB/3.4 GiB (100%)
    [ 42.466161] coreos-installer-service[1082]: Error: couldn't find any partitions on /dev/vda
    [ 42.466292] coreos-installer-service[1082]: Resetting partition table
    [ 42.694053] coreos-installer-service[1082]: Error: install failed
    [FAILED] Failed to start CoreOS Installer.
    See 'systemctl status coreos-installer.service' for details.
Finally, after the failed install attempt, we can no longer run fdasd against /dev/dasdm to view the partitions, since the installer must have wiped out the partition table:
    # fdasd /dev/dasdm
    WARNING: Disk /dev/dasdm is online on operating system instances in 9 different LPARs.
    Ensure that the disk is not being used by a system outside your LPAR.
    Note: Your installation might include z/VM systems that are configured to
    automatically vary on disks, regardless of whether they are subsequently used.
    reading volume label ..: no known label
    Should I create a new one? (y/n): y
    Please specify volume serial (6 characters)[0Xb406]:
This forces us to re-format the disk.
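As a cross-check on the numbers in this comment, the geometry that fdasd reports can be multiplied out and compared with the capacity the guest kernel logs for vda. A small sketch using only values quoted above; the variable names are arbitrary.

```shell
#!/bin/sh
# Geometry values copied from the fdasd output for /dev/dasdm.
cylinders=65520
tracks_per_cylinder=15
blocks_per_track=12
bytes_per_block=4096

total_tracks=$(( cylinders * tracks_per_cylinder ))
total_blocks=$(( total_tracks * blocks_per_track ))
total_bytes=$(( total_blocks * bytes_per_block ))

echo "tracks: $total_tracks"   # 982800, so the last track number is 982799
echo "blocks: $total_blocks"   # 11793600
echo "bytes:  $total_bytes"    # 48306585600
```

Both results match the virtio_blk lines in the console log (11793600 4096-byte logical blocks, capacity 48306585600), which suggests the guest sees the whole formatted DASD, including the tracks in front of the first partition.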
Created attachment 1787408: rhcos-console.log
Ideally, coreos-installer should not touch the partitioning for such a prepared disk. The console output doesn't reveal whether coreos-installer is overwriting the disk, but as the kernel messages show, Linux does see the partitions initially. It could be that the installer doesn't rely on the kernel to detect partitions but uses its own heuristics (which don't account for the DASD format). Maybe there's a way to instruct the installer to keep the partitions as they are?
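To compare what the kernel has detected with what the installer reports, one option is to list the partition entries under sysfs for the disk. The sketch below is a hypothetical helper (kernel_partitions is not an existing tool); it assumes the usual sysfs layout in which each partition directory under /sys/block/<disk>/ contains a "partition" attribute file. The sysfs root is a parameter so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper: print the partitions the kernel has registered for a disk.
# $1 = disk name without /dev/ (for example vda), $2 = sysfs root (default /sys).
kernel_partitions() {
    disk=$1
    sysfs=${2:-/sys}
    for entry in "$sysfs/block/$disk/$disk"*; do
        # Only partition directories carry a "partition" attribute file.
        [ -f "$entry/partition" ] && basename "$entry"
    done
    return 0
}

# Example use inside the guest:
#   kernel_partitions vda
```

Run before and after the image copy, this would show whether the kernel's view of the partitions changes once the installer has written to the disk.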
coreos-installer is an image-based tool, basically a glorified `dd`. On non-s390x systems, it just copies the GPT and partitions from the install image onto the target disk. On s390x there's some special logic for /dev/dasd* which creates a VTOC matching the GPT partitions in the install image, and then copies each partition to the target disk (with adjustments for partition alignment). But overwriting the partition table is pretty fundamental to how coreos-installer works. Since we're not installing to /dev/dasd*, coreos-installer is writing out a GPT. After it finishes the copy, it wants to mount the new /boot partition to update some files on it, but apparently the guest kernel doesn't recognize the GPT partitions on that disk, which is why we die on "Error: couldn't find any partitions on /dev/vda". This seems strange to me: if the DASD is being presented as a generic virtio-blk device, shouldn't the kernel be able to parse a GPT on it?
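The name-based choice described above can be illustrated with a few lines of shell. This is only an illustration of the behavior as described in this comment, not coreos-installer's actual code; the function name partition_scheme_for and the strings it prints are made up for the example.

```shell
#!/bin/sh
# Hypothetical illustration of the device-name-based choice described above.
partition_scheme_for() {
    case "$1" in
        /dev/dasd*) echo "vtoc" ;;  # DASD path: build a VTOC matching the image's GPT partitions
        *)          echo "gpt"  ;;  # everything else: copy the image's GPT as-is
    esac
}

partition_scheme_for /dev/dasda   # prints "vtoc"
partition_scheme_for /dev/vda     # prints "gpt"
```

Because a DASD passed through as virtio-blk appears as /dev/vda, a check on the device name alone sends it down the GPT path, which is consistent with the failure seen in this report.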
(In reply to Benjamin Gilbert from comment #18)
...
> couldn't find any partitions on /dev/vda". This seems strange to me: if the
> DASD is being presented as a generic virtio-blk device, shouldn't the kernel
> be able to parse a GPT on it?
Unfortunately not, as the DASD is an ECKD device (enhanced count key data) and not a fixed block device. It is formatted in a way that presents up to three partitions of same-sized blocks to Linux, which show up as datasets in z/OS. Overwriting the first track will destroy essential volume information and make the DASD unusable until it's formatted again.
(In reply to Viktor Mihajlovski from comment #19)
> Overwriting the first track will destroy essential volume information
> and make the DASD unusable until it's formatted again.
Ah, I see. It's unfortunate that the VM is able to affect the host in that way. Would it make sense to partition the DASD with a single Linux partition in the host, and then pass through only that partition to the VM as a virtio-blk device? That'd produce a somewhat strange partition layout (multiple GPT partitions inside a DASD partition) but it might make both host and VM happy.
(In reply to Benjamin Gilbert from comment #20)
> Ah, I see. It's unfortunate that the VM is able to affect the host in that
> way. Would it make sense to partition the DASD with a single Linux
> partition in the host, and then pass through only that partition to the VM
> as a virtio-blk device? That'd produce a somewhat strange partition layout
> (multiple GPT partitions inside a DASD partition) but it might make both
> host and VM happy.
From a technical perspective this is surely possible (it might be necessary to tweak the geometry of the guest definition, though). But, as you say, it's unusual to have such recursive partitioning schemes, and they complicate disk management. It is relatively easy to fix the DASD recognition so that DASDs can be handled like all other block devices from both the host and guest point of view.
(In reply to Viktor Mihajlovski from comment #21)
> It is relatively easy to fix the DASD recognition, so that
> they can be handled like all other block devices from both host and guest
> point of view.
My current understanding is that the device doesn't really act like a DASD inside the guest (we can't low-level format it, or perform DASD-specific ioctls if we wanted to), but we should put a VTOC on it because someone else previously put a VTOC on it. Is that correct? That's doable, but architecturally it seems suboptimal. It means the device has to be preconfigured by the host (in a guest-visible way) before the guest can use it, and also means that the guest can brick the device in a way that affects the host.
Yes, if the host pre-configures the device by performing low-level formatting, the guest should be able to install CoreOS with this pull request: https://github.com/coreos/coreos-installer/pull/551
fix - https://github.com/coreos/coreos-installer/pull/552
Clearing target release since 4.9 isn't available yet.
Fixed in coreos-installer-0.9.1-3.rhaos4.9.el8, which is built and in plashet. Awaiting bootimage bump.
Is the fix officially in the latest RHCOS builds? I tried builds rhcos-48.84.202107040919-0 and rhcos-49.84.202107170847-0; both fail as originally reported:
    [ 51.719211] coreos-installer-service[1144]: Error: couldn't find any partitions on /dev/vda
    [ 51.719405] coreos-installer-service[1144]: Resetting partition table
    [ 51.976257] coreos-installer-service[1144]: Error: install failed
    [FAILED] Failed to start CoreOS Installer.
    See 'systemctl status coreos-installer.service' for details.
    [DEPEND] Dependency failed for CoreOS Installer Target.
    [DEPEND] Dependency failed for Finalize CoreOS Installer Target.
    [DEPEND] Dependency failed for Reboot after CoreOS Installer.
Both of those builds should have the fix. Could you confirm the version of the coreos-installer RPM you're seeing in the live system?
For build rhcos-48.84.202107040919-0:
    bash-4.4# rpm -qa |grep coreos-installer
    coreos-installer-0.9.0-6.rhaos4.8.el8.s390x
    coreos-installer-bootinfra-0.9.0-6.rhaos4.8.el8.s390x
For build rhcos-49.84.202107170847-0:
    bash-4.4# rpm -qa |grep coreos-installer
    coreos-installer-0.9.1-4.rhaos4.9.el8.s390x
    coreos-installer-bootinfra-0.9.1-4.rhaos4.9.el8.s390x
However, I then ran dasdfmt and fdasd against the DASD device before re-attempting the virt-install, and RHCOS installed successfully for both 48.84 and 49.84. The fix looks good. Thank you.
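The host-side preparation that preceded the successful install can be sketched as follows. This is an outline under assumptions, not the exact commands used in this report: the dasdfmt options (4096-byte blocks, compatible disk layout, no confirmation prompt) and fdasd's auto mode are guesses at a typical invocation, and dasd_prep_commands is a hypothetical helper that only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: print the host-side preparation commands for a DASD.
# Review the output, then run it as root on the KVM host.
# WARNING: dasdfmt destroys all data on the device.
dasd_prep_commands() {
    dev=$1
    echo "dasdfmt -b 4096 -d cdl -y $dev"   # low-level format (assumed options)
    echo "fdasd -a $dev"                    # auto-create one partition spanning the disk
    echo "blockdev --flushbufs ${dev}1"     # flush the partition first...
    echo "blockdev --flushbufs $dev"        # ...then the whole disk
}

dasd_prep_commands /dev/dasdm
```

Printing rather than executing is a deliberate choice here, since the format step is destructive and the device name differs per host.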
Bootimage bump landed.
Verified based on https://bugzilla.redhat.com/show_bug.cgi?id=1960485#c29 OCP image registry.ci.openshift.org/ocp-s390x/release-s390x:4.9.0-0.nightly-s390x-2021-08-18-165339 is running with RHCOS 49.84.202108181206-0 and has a boot image of 49.84.202106302347-0, which has coreos-installer 0.9.1-4.rhaos4.9.el8 that contains the fix.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759