Bug 1756173 - /etc/udev/rules.d/66-azure-storage.rules missing from initramfs
Summary: /etc/udev/rules.d/66-azure-storage.rules missing from initramfs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.6.z
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 4.7.0
Assignee: Micah Abbott
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks: 1915617
Reported: 2019-09-26 23:44 UTC by Alex Crawford
Modified: 2024-12-20 18:54 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The udev rules for Azure disks were missing from the RHCOS initramfs. Consequence: Azure users who tried to configure additional disks at RHCOS install time saw the installation fail. Fix: The necessary udev rules for Azure disks are now included in the RHCOS initramfs. Result: Azure users can configure additional disks at RHCOS install time.
Clone Of:
Environment:
Last Closed: 2021-02-24 15:10:48 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github openshift os pull 480 0 None closed Bump fedora-coreos-config for various fixes 2021-02-15 09:12:24 UTC
Red Hat Knowledge Base (Solution) 4952011 0 None None None 2021-05-18 16:35:36 UTC
Red Hat Product Errata RHSA-2020:5633 0 None None None 2021-02-24 15:11:52 UTC

Description Alex Crawford 2019-09-26 23:44:38 UTC
Description of problem:

This is a continuation of https://bugzilla.redhat.com/show_bug.cgi?id=1747575. In order for Ignition configs to be able to reliably manipulate Azure data disks, the /dev/disk/azure paths are necessary. This requires the inclusion of /etc/udev/rules.d/66-azure-storage.rules in the initramfs.


Version-Release number of selected component (if applicable):


How reproducible:

Always


Steps to Reproduce:

1. Boot an RHCOS instance with an Ignition config which references /dev/disk/azure/scsi1/lun0


Actual results:

RHCOS gets stuck while waiting for /dev/disk/azure/scsi1/lun0.


Expected results:

Boots without issue.


Additional info:
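
One way to confirm whether the rule is present in an initramfs is to scan a saved file listing of the image for the rule's file name. The sketch below assumes the listing was captured as text beforehand (for example with a tool such as `lsinitrd`) and that the path is the last whitespace-separated field on each line. The function name `find_rule` and the sample lines are hypothetical and are not taken from a real RHCOS image.

```python
from typing import List


def find_rule(listing: str, rule_name: str = "66-azure-storage.rules") -> List[str]:
    """Return the paths in a file listing whose base name equals rule_name."""
    matches = []
    for line in listing.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Assumption: the path is the last field (symlink lines of the form
        # "a -> b" would report the target, which this sketch does not handle).
        path = fields[-1]
        if path.rsplit("/", 1)[-1] == rule_name:
            matches.append(path)
    return matches


# Hypothetical listing lines, for illustration only.
sample = """\
-rw-r--r--   1 root     root          512 Jan  1  1970 usr/lib/udev/rules.d/60-block.rules
-rw-r--r--   1 root     root         1024 Jan  1  1970 etc/udev/rules.d/66-azure-storage.rules
"""

print(find_rule(sample))
```

An empty result would be consistent with the symptom described here: without the rule in the initramfs, nothing creates the /dev/disk/azure symlinks before Ignition runs.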

Comment 2 Micah Abbott 2019-11-08 19:09:36 UTC
We've received no additional requests for this functionality, and there are other higher-priority efforts we'd like to focus on as we close out 4.3. Moving to 4.4.

Comment 3 Colin Walters 2020-01-31 21:46:30 UTC
In https://bugzilla.redhat.com/show_bug.cgi?id=1747575 we landed that udev rule, and from just launching a 4.3 cluster I see:

# rpm-ostree status -b
State: idle
AutomaticUpdates: disabled
BootedDeployment:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8c4059f184596157f64d69c4edbea9c9ef600560b7804a482779f513c3e0f40e
              CustomOrigin: Managed by machine-config-operator
                   Version: 43.81.202001142154.0 (2020-01-14T21:59:51Z)
# ls -al /dev/disk/azure/
total 0
drwxr-xr-x. 2 root root 180 Jan 31 21:16 .
drwxr-xr-x. 9 root root 180 Jan 31 21:16 ..
lrwxrwxrwx. 1 root root   9 Jan 31 21:16 resource -> ../../sdb
lrwxrwxrwx. 1 root root  10 Jan 31 21:16 resource-part1 -> ../../sdb1
lrwxrwxrwx. 1 root root   9 Jan 31 21:16 root -> ../../sda
lrwxrwxrwx. 1 root root  10 Jan 31 21:16 root-part1 -> ../../sda1
lrwxrwxrwx. 1 root root  10 Jan 31 21:16 root-part2 -> ../../sda2
lrwxrwxrwx. 1 root root  10 Jan 31 21:16 root-part3 -> ../../sda3
lrwxrwxrwx. 1 root root  10 Jan 31 21:16 root-part4 -> ../../sda4
# 

So it looks to me like we have what's requested.

Comment 4 Maria Alonso 2020-12-16 15:36:27 UTC
Hi,

Reopening this BZ: we are trying to do a UPI installation on Azure, but it seems that these rules are not present at install time.

We are installing an OpenShift 4.6.6 cluster with three master nodes. Each master node has an OS_Disk and a data disk attached,
and we want to mount the data disk on /var/lib/etcd. We are using the following MachineConfig for this purpose:

~~~
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 98-var-lib-etcd-partition
spec:
  config:
    ignition:
      version: 3.1.0
    storage:
      disks:
      - device: /dev/disk/azure/scsi1/lun0
        partitions:
        - sizeMiB: 0
          startMiB: 0
          label: varlibetcd
      filesystems:
        - path: /var/lib/etcd
          device: /dev/disk/by-partlabel/varlibetcd
          format: xfs
    systemd:
      units:
        - name: var-lib-etcd.mount
          enabled: true
          contents: |
            [Unit]
            Before=local-fs.target
            [Mount]
            Where=/var/lib/etcd
            What=/dev/disk/by-partlabel/varlibetcd
            [Install]
            WantedBy=local-fs.target
~~~

We can see the following error in the master:

~~~
Dec 16 14:45:24 ignition[969]: disks: createPartitions: op(1): [started]  waiting for devices [/dev/disk/azure/scsi1/lun0]
Dec 16 14:46:54 systemd[1]: ignition-disks.service: Main process exited, code=exited, status=1/FAILURE
Dec 16 14:46:54 systemd[1]: ignition-disks.service: Failed with result 'exit-code'.
Dec 16 14:46:54 systemd[1]: Failed to start Ignition (disks).
Dec 16 14:46:54 systemd[1]: ignition-disks.service: Triggering OnFailure= dependencies.
Press Enter for emergency shell or wait 4 minutes 45 seconds for reboot.
~~~

~~~
**] (3 of 3) A start job is running for Ignition (disks) (43s / no limit)
[   48.690324] systemd[1]: Started Afterburn (Check In - from the initramfs).
[  OK  ] Started Afterburn (Check In - from the initramfs).
[  108.006075] systemd[1]: dev-disk-azure-scsi1-lun0.device: Job dev-disk-azure-scsi1-lun0.device/start timed out.
[ TIME ] Timed out waiting for device dev-disk-azure-scsi1-lun0.device.
[  108.014081] systemd[1]: Timed out waiting for device dev-disk-azure-scsi1-lun0.device.
[  108.020537] systemd[1]: dev-disk-azure-scsi1-lun0.device: Job dev-disk-azure-scsi1-lun0.device/start failed with result 'timeout'.
[  108.028158] ignition[969]: disks: createPartitions: op(1): [failed]   waiting for devices [/dev/disk/azure/scsi1/lun0]: device unit dev-disk-azure-scsi1-lun0.device timeout
[FAILED] Failed to start Ignition (disks).
[  108.037499] ignition[969]: disks failedFull config:
[  108.041284] ignition[969]: {
~~~

We can see the following devices through the emergency shell:

~~~
# ls /dev/disk
by-id  by-label  by-partlabel  by-partuuid  by-path  by-uuid

# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0  128G  0 disk
`-sda1   8:1    0  128G  0 part
sdb      8:16   0    2T  0 disk
|-sdb1   8:17   0  384M  0 part
|-sdb2   8:18   0  127M  0 part
|-sdb3   8:19   0    1M  0 part
`-sdb4   8:20   0  2.8G  0 part
sdc      8:32   0    2T  0 disk
sr0     11:0    1  634K  0 rom
~~~

Regards

  María
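
The unit name in the log above, `dev-disk-azure-scsi1-lun0.device`, is the systemd-escaped form of the device path Ignition is waiting for. As a rough illustration of that mapping, here is a minimal sketch of systemd's path escaping rules as I understand them (slashes become dashes; anything other than ASCII alphanumerics, `:`, `_` and a non-leading `.` becomes `\xXX`). The function name `systemd_escape_path` is hypothetical, and the sketch only covers common cases; on a real system `systemd-escape --path` is the authoritative tool.

```python
def systemd_escape_path(path: str) -> str:
    """Approximate systemd path escaping for building unit names."""
    # Drop leading/trailing slashes and collapse repeated ones.
    parts = [p for p in path.split("/") if p]
    if not parts:
        return "-"  # the root directory escapes to a single dash
    trimmed = "/".join(parts)

    out = []
    for i, ch in enumerate(trimmed):
        if ch == "/":
            out.append("-")
        elif (ch.isascii() and ch.isalnum()) or ch in ":_" or (ch == "." and i > 0):
            out.append(ch)
        else:
            # Everything else is written as \xXX, one escape per UTF-8 byte.
            out.extend(f"\\x{b:02x}" for b in ch.encode("utf-8"))
    return "".join(out)


print(systemd_escape_path("/dev/disk/azure/scsi1/lun0") + ".device")
# dev-disk-azure-scsi1-lun0.device
print(systemd_escape_path("/dev/disk/by-partlabel/varlibetcd"))
# dev-disk-by\x2dpartlabel-varlibetcd
```

The first result matches the unit that timed out in the console output. The second shows why a literal `-` in a path appears as `\x2d` in unit names.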

Comment 7 Micah Abbott 2020-12-18 19:50:57 UTC
Discussing possible solution upstream - https://github.com/coreos/fedora-coreos-config/pull/786

Comment 8 Micah Abbott 2021-01-07 15:56:23 UTC
We've fixed the use of the Azure udev rules in the initramfs via https://github.com/coreos/fedora-coreos-config/pull/786 for Fedora CoreOS.

But because the `WALinuxAgent-udev` package is not available for RHEL8 yet (see https://bugzilla.redhat.com/show_bug.cgi?id=1913074), we need to carry the rules ourselves and install them in the initramfs.

This is captured in https://github.com/openshift/os/pull/480.  Once that merges, we can update our RHCOS build configuration to pull it in.

Comment 9 Micah Abbott 2021-01-15 20:43:54 UTC
This will be included with the boot image bump to openshift-install that is tracked in BZ#1915617

Comment 10 Micah Abbott 2021-01-15 20:49:09 UTC
This was first included in RHCOS 47.83.202101100439-0

Comment 12 Renata Ravanelli 2021-01-20 12:57:59 UTC
(In reply to Micah Abbott from comment #10)
> This was first included in RHCOS 47.83.202101100439-0


Was able to successfully boot the image version described above in Azure, using the following config:

Boot the image with:
az vm create -n "${az_vm_name}" -g "${az_resource_group}" --image "${az_image_name}" --custom-data "$(cat ${ignition_path})" --attach-data-disks coreos0


Using the ignition file:

```
{
  "ignition": {
    "version": "3.2.0"
  },
  "passwd": {
    "users": [
      {
        "name": "core",
        "passwordHash": "$6$jamyHU6tcWovxP.e$rasKzY7tDn.LlazCF6Z4osY86aaXGEFOnkDSClPCw1B/DzPn2knv/kHCwncynti2r3k8MSLwcEsyEwqkDwZd8/",
        "sshAuthorizedKeys": [
        ]
      }
    ]
  },
  "storage": {
    "disks": [
      {
        "device": "/dev/disk/azure/scsi1/lun0",
        "partitions": [
          {
            "label": "varlibetcd",
            "sizeMiB": 0,
            "startMiB": 0
          }
        ]
      }
    ],
    "filesystems": [
      {
        "device": "/dev/disk/by-partlabel/varlibetcd",
        "format": "xfs",
        "path": "/var/lib/etcd"
      }
    ]
  },
  "systemd": {
    "units": [
      {
        "contents": "# Generated by FCCT\n[Unit]\nBefore=local-fs.target\nRequires=systemd-fsck@dev-disk-by\\x2dpartlabel-varlibetcd.service\nAfter=systemd-fsck@dev-disk-by\\x2dpartlabel-varlibetcd.service\n\n[Mount]\nWhere=/var/lib/etcd\nWhat=/dev/disk/by-partlabel/varlibetcd\nType=xfs\n\n[Install]\nRequiredBy=local-fs.target",
        "enabled": true,
        "name": "var-lib-etcd.mount"
      }
    ]
  }
}
```

```
[core@my-coreos-vm ~]$ ls /etc/udev/rules.d/
70-persistent-ipoib.rules
[core@my-coreos-vm ~]$ ls /usr/lib/dracut/modules.d/25*
module-setup.sh
[core@my-coreos-vm ~]$ ls /usr/lib/udev/rules.d/66-azure-storage.rules
/usr/lib/udev/rules.d/66-azure-storage.rules
[core@my-coreos-vm ~]$  ls /usr/lib/udev/rules.d/99-azure-product-uuid.rules
/usr/lib/udev/rules.d/99-azure-product-uuid.rules

```
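
Since Ignition blocks until every device named in the config appears, it can be handy to list those paths before booting and check each one against the udev rules available in the initramfs. The following is a minimal sketch that only looks at `storage.disks` and `storage.filesystems`, the two sections used in this report; other sections are ignored. The function name `referenced_devices` is hypothetical, and `example` is a trimmed copy of the config above.

```python
import json
from typing import List


def referenced_devices(ignition_json: str) -> List[str]:
    """List the device paths named under storage.disks and storage.filesystems."""
    config = json.loads(ignition_json)
    storage = config.get("storage", {})
    devices = []
    for section in ("disks", "filesystems"):
        for entry in storage.get(section, []):
            dev = entry.get("device")
            if dev and dev not in devices:
                devices.append(dev)
    return devices


# Trimmed copy of the storage section from the config in this comment.
example = """
{
  "ignition": {"version": "3.2.0"},
  "storage": {
    "disks": [
      {
        "device": "/dev/disk/azure/scsi1/lun0",
        "partitions": [{"label": "varlibetcd", "sizeMiB": 0, "startMiB": 0}]
      }
    ],
    "filesystems": [
      {
        "device": "/dev/disk/by-partlabel/varlibetcd",
        "format": "xfs",
        "path": "/var/lib/etcd"
      }
    ]
  }
}
"""

for dev in referenced_devices(example):
    print(dev)
```

Any path under /dev/disk/azure in that list depends on 66-azure-storage.rules being active at the time Ignition runs.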

Comment 13 Micah Abbott 2021-01-20 14:21:40 UTC
Marking VERIFIED based on comment #12.  Thanks Renata!

Comment 18 errata-xmlrpc 2021-02-24 15:10:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

