Thanks for reporting your issue! In order for the CoreOS team to be able to quickly and successfully triage your issue, please fill out the following template as completely as possible. Be ready for follow-up questions and please respond in a timely manner. If we can't reproduce a bug, we might close your issue.

---

OCP Version at Install Time: registry.ci.openshift.org/ocp/release:4.11.0-0.nightly-2022-05-25-193227
RHCOS Version at Install Time: rhcos-411.85.202205101201-0-azure.x86_64.vhd
OCP Version after Upgrade (if applicable): No
RHCOS Version after Upgrade (if applicable):
Platform: Azure
Architecture: x86_64

What are you trying to do? What is your use case?

Ref https://docs.openshift.com/container-platform/4.10/installing/installing_azure/installing-azure-user-infra.html#installation-disk-partitioning-upi-templates_installing-azure-user-infra

1. Create the manifest files:

$ openshift-install create manifests --dir $HOME/clusterconfig

2. Create a Butane config that configures the additional partition:

variant: openshift
version: 4.10.0
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 98-var-lib-container-partition-master
storage:
  disks:
    - device: /dev/disk/azure/scsi1/lun0
      partitions:
        - label: data01
          start_mib: 0
          size_mib: 0
  filesystems:
    - device: /dev/disk/by-partlabel/data01
      path: /var/lib/containers
      format: xfs
      mount_options: [defaults, prjquota]
      with_mount_unit: true

3. Generate a manifest from the Butane config and save it into the openshift/ directory:

$ butane 98-var-lib-container-partition-master.bu -o openshift/98-var-lib-container-partition-master.yaml

4. Create the Ignition configs:

$ openshift-install create ignition-configs --dir $HOME/clusterconfig

5. Start the UPI install based on https://github.com/openshift/installer/blob/master/docs/user/azure/install_upi.md

What happened? What went wrong or what did you expect?

The install failed with:

level=error msg="Bootstrap failed to complete: timed out waiting for the condition"
level=error msg="Failed to wait for bootstrapping to complete. This error usually happens when there is a problem with control plane hosts that prevents the control plane operators from creating the control plane."

Checking the serial console of the master VM in the Azure portal shows the node entering emergency mode:

Failed to start Ignition (disks).
Ignition failed: create partitions failed: failed to wait on disks devs: device unit dev-disk-azure-scsi1-lun0.device timeout

To narrow down the issue, I simplified the .bu file in step 2 to:

variant: openshift
version: 4.10.0
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 98-var-lib-container-partition-master
storage:
  disks:
    - device: /dev/disk/azure/scsi1/lun0
      partitions:
        - label: data01
          start_mib: 0
          size_mib: 0

It still fails the same way.

Expected result: the cluster installs successfully and a separate partition for /var/lib/containers is created on the new disk.
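As a debugging aid (illustrative only, not captured during this run), one way to see whether the device alias Ignition is waiting on ever appears is to check it from the initramfs emergency shell on the failing master; /dev/sdb below is an assumed kernel name for the attached data disk:

# Look for the Azure-specific by-lun symlinks that 66-azure-storage.rules would normally create
ls -l /dev/disk/azure/ /dev/disk/azure/scsi1/

# Inspect what udev recorded for the data disk (/dev/sdb is an assumption for the LUN 0 disk)
udevadm info --name=/dev/sdb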
Additional info:
1. The same test passes on 4.10 Azure.
2. The same test passes on 4.11 GCP.

If you're having problems booting/installing RHCOS, please provide:
- the full contents of the serial console showing disk initialization, network configuration, and Ignition stage (see https://access.redhat.com/articles/7212 for information about configuring your serial console)
- Ignition JSON
- output of `journalctl -b` (attached)

Please add anything else that might be useful, for example:

cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-f9e266ad83004cbc6b43098c17d5d1a2b55c4eb586e5df92e489bfc012d6937b/vmlinuz-4.18.0-348.23.1.el8_5.x86_64 random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ignition.firstboot ostree=/ostree/boot.1/rhcos/f9e266ad83004cbc6b43098c17d5d1a2b55c4eb586e5df92e489bfc012d6937b/0 ignition.platform.id=azure
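To narrow the attached journal down to the failing stage, something like the following can be used (a sketch, assuming shell access to the failing node; not output from this report):

# Show only the Ignition disks stage from the current boot
journalctl -b --no-pager -u ignition-disks.service

# Broader sweep for udev/Ignition/disk related messages
journalctl -b --no-pager | grep -iE 'ignition|udev|scsi1'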
Thanks for the report. This was broken in 4.11 by https://github.com/coreos/fedora-coreos-config/pull/1686 landing in RHCOS in https://github.com/openshift/os/pull/792. https://github.com/openshift/os/pull/785 has a fix, but it looks like the fix might land via https://github.com/openshift/os/pull/803 instead.
This bug has been reported fixed in a new RHCOS build and is ready for QE verification. To mark the bug verified, set the Verified field to Tested. This bug will automatically move to MODIFIED once the fix has landed in a new bootimage.
Tested with rhcos-411.86.202206101554-0-qemu.x86_64.qcow2; the Azure udev rule files are included in the initramfs:

$ lsinitrd /boot/ostree/rhcos-ea11226db1a77b02f841513020dfe6e4c76e72d5c2bad8bb72f6eee376f413be/initramfs-4.18.0-372.9.1.el8.x86_64.img | grep azure
rhcos-azure-udev
-rw-r--r--   1 root     root         1829 Jan  1  1970 usr/lib/udev/rules.d/66-azure-storage.rules
-rw-r--r--   1 root     root          343 Jan  1  1970 usr/lib/udev/rules.d/99-azure-product-uuid.rules

$ rpm -qa | grep WALinux
WALinuxAgent-udev-2.3.0.2-2.el8.noarch
The fix for this bug has landed in a bootimage bump, as tracked in bug 2093126 (now in status MODIFIED). Moving this bug to MODIFIED.
Verified on 4.11.0-0.nightly-2022-07-05-083948 on Azure.

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-07-05-083948   True        False         24m     Cluster version is 4.11.0-0.nightly-2022-07-05-083948

$ oc get nodes
NAME                                                STATUS   ROLES    AGE   VERSION
ci-ln-4kx6pgb-1d09d-wng42-master-0                  Ready    master   50m   v1.24.0+2dd8bb1
ci-ln-4kx6pgb-1d09d-wng42-master-1                  Ready    master   53m   v1.24.0+2dd8bb1
ci-ln-4kx6pgb-1d09d-wng42-master-2                  Ready    master   50m   v1.24.0+2dd8bb1
ci-ln-4kx6pgb-1d09d-wng42-worker-centralus1-cvscz   Ready    worker   40m   v1.24.0+2dd8bb1
ci-ln-4kx6pgb-1d09d-wng42-worker-centralus2-dsnc8   Ready    worker   40m   v1.24.0+2dd8bb1
ci-ln-4kx6pgb-1d09d-wng42-worker-centralus3-g5js6   Ready    worker   40m   v1.24.0+2dd8bb1

$ oc debug node/ci-ln-4kx6pgb-1d09d-wng42-master-0
Starting pod/ci-ln-4kx6pgb-1d09d-wng42-master-0-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.0.8
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# ls /boot/ostree/
rhcos-2150d7de5c468ff33074df29dddc08b2faffa6ebc05eb84962431eb79f161c91  rhcos-b7024efc9c7149ccaa86c540ce45b355ea325b9c25b605291b8f11bfd39cf81d
sh-4.4# ls -l /boot/ostree/
total 2
drwxr-xr-x. 2 root root 1024 Jul  5 18:46 rhcos-2150d7de5c468ff33074df29dddc08b2faffa6ebc05eb84962431eb79f161c91
drwxr-xr-x. 2 root root 1024 Jun 30 15:10 rhcos-b7024efc9c7149ccaa86c540ce45b355ea325b9c25b605291b8f11bfd39cf81d
sh-4.4# lsinitrd /boot/ostree/rhcos-2150d7de5c468ff33074df29dddc08b2faffa6ebc05eb84962431eb79f161c91/initramfs-4.18.0-372.13.1.el8_6.x86_64.img | grep azure
rhcos-azure-udev
-rw-r--r--   1 root     root         1829 Jan  1  1970 usr/lib/udev/rules.d/66-azure-storage.rules
-rw-r--r--   1 root     root          343 Jan  1  1970 usr/lib/udev/rules.d/99-azure-product-uuid.rules
sh-4.4# lsinitrd /boot/ostree/rhcos-b7024efc9c7149ccaa86c540ce45b355ea325b9c25b605291b8f11bfd39cf81d/initramfs-4.18.0-372.13.1.el8_6.x86_64.img | grep azure
rhcos-azure-udev
-rw-r--r--   1 root     root         1829 Jan  1  1970 usr/lib/udev/rules.d/66-azure-storage.rules
-rw-r--r--   1 root     root          343 Jan  1  1970 usr/lib/udev/rules.d/99-azure-product-uuid.rules
sh-4.4# rpm-ostree status
State: idle
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:04b54950ce296d73746f22b66ff6c5484c37be78cb34aaf352338359112fa241
              CustomOrigin: Managed by machine-config-operator
                   Version: 411.86.202207011902-0 (2022-07-01T19:05:18Z)

  0505ffc1c711903785f27570819e973f086f594a8daa3ec9dfe2a059586ac42f
                   Version: 411.86.202206301504-0 (2022-06-30T15:08:01Z)
sh-4.4# rpm -qa | grep WALinux
WALinuxAgent-udev-2.3.0.2-2.el8.noarch
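In addition to checking the initramfs contents, the original use case can be spot-checked on a master that has the extra data disk attached and the Butane config from this report applied. The commands below are a hedged sketch of such a check (run via oc debug and chroot /host as above), not output captured during this verification:

# The lun0 alias should now resolve, and the data01 partition should be mounted at /var/lib/containers
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT /dev/disk/azure/scsi1/lun0
findmnt /var/lib/containers

# with_mount_unit: true generates a systemd mount unit for the path
systemctl status var-lib-containers.mount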
Checked the cases with registry.ci.openshift.org/ocp/release:4.11.0-0.nightly-2022-07-05-083948: PASS
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069