Description of problem:

This is a continuation of https://bugzilla.redhat.com/show_bug.cgi?id=1747575. In order for Ignition configs to be able to reliably manipulate Azure data disks, the /dev/disk/azure paths are necessary. This requires the inclusion of /etc/udev/rules.d/66-azure-storage.rules in the initramfs.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Boot an RHCOS instance with an Ignition config which references /dev/disk/azure/scsi1/lun0

Actual results:
RHCOS gets stuck while waiting for /dev/disk/azure/scsi1/lun0.

Expected results:
Boots without issue.

Additional info:
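A minimal Ignition config exercising the reproduce step might look like this (a sketch only; the device path is the one from this report, while the partition label and everything else are illustrative):

```
{
  "ignition": { "version": "3.1.0" },
  "storage": {
    "disks": [
      {
        "device": "/dev/disk/azure/scsi1/lun0",
        "partitions": [
          { "label": "data", "sizeMiB": 0, "startMiB": 0 }
        ]
      }
    ]
  }
}
```

Without the 66-azure-storage.rules udev rule in the initramfs, Ignition's device wait on /dev/disk/azure/scsi1/lun0 never completes.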
We've received no additional requests for this functionality, and there are other, higher-priority efforts we'd like to focus on as we close out 4.3. Moving to 4.4.
In https://bugzilla.redhat.com/show_bug.cgi?id=1747575 we landed that udev rule, and from just launching a 4.3 cluster I see:

~~~
# rpm-ostree status -b
State: idle
AutomaticUpdates: disabled
BootedDeployment:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8c4059f184596157f64d69c4edbea9c9ef600560b7804a482779f513c3e0f40e
              CustomOrigin: Managed by machine-config-operator
                   Version: 43.81.202001142154.0 (2020-01-14T21:59:51Z)
# ls -al /dev/disk/azure/
total 0
drwxr-xr-x. 2 root root 180 Jan 31 21:16 .
drwxr-xr-x. 9 root root 180 Jan 31 21:16 ..
lrwxrwxrwx. 1 root root   9 Jan 31 21:16 resource -> ../../sdb
lrwxrwxrwx. 1 root root  10 Jan 31 21:16 resource-part1 -> ../../sdb1
lrwxrwxrwx. 1 root root   9 Jan 31 21:16 root -> ../../sda
lrwxrwxrwx. 1 root root  10 Jan 31 21:16 root-part1 -> ../../sda1
lrwxrwxrwx. 1 root root  10 Jan 31 21:16 root-part2 -> ../../sda2
lrwxrwxrwx. 1 root root  10 Jan 31 21:16 root-part3 -> ../../sda3
lrwxrwxrwx. 1 root root  10 Jan 31 21:16 root-part4 -> ../../sda4
#
~~~

So it looks to me like we have what's requested.
Hi,

Reopening this BZ as we are trying to do a UPI installation on Azure, but it seems that these rules are not present at install time.

We are installing an OpenShift 4.6.6 cluster with three master nodes. The master nodes have an OS disk and a data disk attached, and we want to mount the data disk on /var/lib/etcd. We are using the following MachineConfig for this purpose:

~~~
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 98-var-lib-etcd-partition
spec:
  config:
    ignition:
      version: 3.1.0
    storage:
      disks:
        - device: /dev/disk/azure/scsi1/lun0
          partitions:
            - sizeMiB: 0
              startMiB: 0
              label: varlibetcd
      filesystems:
        - path: /var/lib/etcd
          device: /dev/disk/by-partlabel/varlibetcd
          format: xfs
    systemd:
      units:
        - name: var-lib-etcd.mount
          enabled: true
          contents: |
            [Unit]
            Before=local-fs.target
            [Mount]
            Where=/var/lib/etcd
            What=/dev/disk/by-partlabel/varlibetcd
            [Install]
            WantedBy=local-fs.target
~~~

We can see the following error in the master:

~~~
Dec 16 14:45:24 ignition[969]: disks: createPartitions: op(1): [started]  waiting for devices [/dev/disk/azure/scsi1/lun0]
Dec 16 14:46:54 systemd[1]: ignition-disks.service: Main process exited, code=exited, status=1/FAILURE
Dec 16 14:46:54 systemd[1]: ignition-disks.service: Failed with result 'exit-code'.
Dec 16 14:46:54 systemd[1]: Failed to start Ignition (disks).
Dec 16 14:46:54 systemd[1]: ignition-disks.service: Triggering OnFailure= dependencies.
Press Enter for emergency shell or wait 4 minutes 45 seconds for reboot.
~~~

~~~
[***] (3 of 3) A start job is running for Ignition (disks) (43s / no limit)
[   48.690324] systemd[1]: Started Afterburn (Check In - from the initramfs).
[  OK  ] Started Afterburn (Check In - from the initramfs).
[  108.006075] systemd[1]: dev-disk-azure-scsi1-lun0.device: Job dev-disk-azure-scsi1-lun0.device/start timed out.
[TIME ] Timed out waiting for device dev-disk-azure-scsi1-lun0.device.
[  108.014081] systemd[1]: Timed out waiting for device dev-disk-azure-scsi1-lun0.device.
[  108.020537] systemd[1]: dev-disk-azure-scsi1-lun0.device: Job dev-disk-azure-scsi1-lun0.device/start failed with result 'timeout'.
[  108.028158] ignition[969]: disks: createPartitions: op(1): [failed]   waiting for devices [/dev/disk/azure/scsi1/lun0]: device unit dev-disk-azure-scsi1-lun0.device timeout
[FAILED] Failed to start Ignition (disks).
[  108.037499] ignition[969]: disks failed
[  108.041284] ignition[969]: Full config:
{
~~~

We can see the following devices through the emergency shell:

~~~
# ls /dev/disk
by-id  by-label  by-partlabel  by-partuuid  by-path  by-uuid
# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0  128G  0 disk
`-sda1   8:1    0  128G  0 part
sdb      8:16   0    2T  0 disk
|-sdb1   8:17   0  384M  0 part
|-sdb2   8:18   0  127M  0 part
|-sdb3   8:19   0    1M  0 part
`-sdb4   8:20   0  2.8G  0 part
sdc      8:32   0    2T  0 disk
sr0     11:0    1  634K  0 rom
~~~

Regards,
María
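From the emergency shell, a small polling helper can show whether the /dev/disk/azure symlinks ever appear once udev settles. This is a hypothetical diagnostic sketch, not part of RHCOS or Ignition; it just mirrors the device wait that Ignition performs internally:

```shell
# Hypothetical helper: poll for a device path with a timeout (seconds),
# printing a diagnostic on failure. Not part of RHCOS or Ignition.
wait_for_device() {
    local path=$1 timeout=${2:-90} waited=0
    while [ ! -e "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out waiting for $path" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "found $path"
}

# Example usage from the emergency shell; if it times out, dump the
# udev database to see which rules actually matched:
# wait_for_device /dev/disk/azure/scsi1/lun0 30 || udevadm info --export-db | grep -i azure
```

In this case the symlinks never appear, because the 66-azure-storage.rules file is not in the initramfs at all.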
Discussing possible solution upstream - https://github.com/coreos/fedora-coreos-config/pull/786
We've fixed the use of the Azure udev rules in the initramfs via https://github.com/coreos/fedora-coreos-config/pull/786 for Fedora CoreOS. But because the `WALinuxAgent-udev` package is not available for RHEL8 yet (see https://bugzilla.redhat.com/show_bug.cgi?id=1913074), we need to carry the rules ourselves and install them in the initramfs. This is captured in https://github.com/openshift/os/pull/480. Once that merges, we can update our RHCOS build configuration to pull it in.
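For illustration, the general shape of a dracut module that carries udev rules into the initramfs looks like the sketch below. This is hypothetical (the module name and path are assumptions); the actual change is in openshift/os#480:

```
# Hypothetical /usr/lib/dracut/modules.d/25azure-udev/module-setup.sh
# (illustrative only; see openshift/os#480 for the real implementation).

# Called by dracut to decide whether to include this module in the image.
check() {
    return 0
}

# Called by dracut to copy files into the initramfs image.
install() {
    # inst_rules is a dracut helper that installs udev rules files
    # from the host's udev rules directories into the initramfs.
    inst_rules 66-azure-storage.rules 99-azure-product-uuid.rules
}
```

With the rules present in the initramfs, udev creates the /dev/disk/azure symlinks early enough for ignition-disks to find them.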
This will be included with the boot image bump to openshift-install that is tracked in BZ#1915617
This was first included in RHCOS 47.83.202101100439-0
(In reply to Micah Abbott from comment #10)
> This was first included in RHCOS 47.83.202101100439-0

Was able to successfully boot the image version described above in Azure, using the following config:

Boot the image with:

```
az vm create -n "${az_vm_name}" -g "${az_resource_group}" --image "${az_image_name}" --custom-data "$(cat ${ignition_path})" --attach-data-disks coreos0
```

Using the Ignition file:

```
{
  "ignition": { "version": "3.2.0" },
  "passwd": {
    "users": [
      {
        "name": "core",
        "passwordHash": "$6$jamyHU6tcWovxP.e$rasKzY7tDn.LlazCF6Z4osY86aaXGEFOnkDSClPCw1B/DzPn2knv/kHCwncynti2r3k8MSLwcEsyEwqkDwZd8/",
        "sshAuthorizedKeys": []
      }
    ]
  },
  "storage": {
    "disks": [
      {
        "device": "/dev/disk/azure/scsi1/lun0",
        "partitions": [
          { "label": "varlibetcd", "sizeMiB": 0, "startMiB": 0 }
        ]
      }
    ],
    "filesystems": [
      {
        "device": "/dev/disk/by-partlabel/varlibetcd",
        "format": "xfs",
        "path": "/var/lib/etcd"
      }
    ]
  },
  "systemd": {
    "units": [
      {
        "contents": "# Generated by FCCT\n[Unit]\nBefore=local-fs.target\nRequires=systemd-fsck@dev-disk-by\\x2dpartlabel-varlibetcd.service\nAfter=systemd-fsck@dev-disk-by\\x2dpartlabel-varlibetcd.service\n\n[Mount]\nWhere=/var/lib/etcd\nWhat=/dev/disk/by-partlabel/varlibetcd\nType=xfs\n\n[Install]\nRequiredBy=local-fs.target",
        "enabled": true,
        "name": "var-lib-etcd.mount"
      }
    ]
  }
}
```

On the booted node:

```
[core@my-coreos-vm ~]$ ls /etc/udev/rules.d/
70-persistent-ipoib.rules
[core@my-coreos-vm ~]$ ls /usr/lib/dracut/modules.d/25*
module-setup.sh
[core@my-coreos-vm ~]$ ls /usr/lib/udev/rules.d/66-azure-storage.rules
/usr/lib/udev/rules.d/66-azure-storage.rules
[core@my-coreos-vm ~]$ ls /usr/lib/udev/rules.d/99-azure-product-uuid.rules
/usr/lib/udev/rules.d/99-azure-product-uuid.rules
```
Marking VERIFIED based on comment #12. Thanks Renata!
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633