Bug 2092966 - [OCP 4.11] [azure] /etc/udev/rules.d/66-azure-storage.rules missing from initramfs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.11
Hardware: Unspecified
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.11.0
Assignee: Timothée Ravier
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On: 2093126
Blocks:
 
Reported: 2022-06-02 16:06 UTC by MayXu
Modified: 2022-08-10 11:16 UTC
CC List: 7 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-10 11:15:49 UTC
Target Upstream Version:
Embargoed:




Links
Github openshift/os pull 815 (Merged): Rebase to RHEL 8.6, last updated 2022-06-07 09:30:39 UTC
Red Hat Product Errata RHSA-2022:5069, last updated 2022-08-10 11:16:05 UTC

Description MayXu 2022-06-02 16:06:48 UTC
Thanks for reporting your issue!

In order for the CoreOS team to be able to quickly and successfully triage your issue, please fill out the following template as completely as possible.

Be ready for follow-up questions and please respond in a timely manner.

If we can't reproduce a bug, we might close your issue.

---

OCP Version at Install Time: registry.ci.openshift.org/ocp/release:4.11.0-0.nightly-2022-05-25-193227

RHCOS Version at Install Time: rhcos-411.85.202205101201-0-azure.x86_64.vhd
OCP Version after Upgrade (if applicable): No
RHCOS Version after Upgrade (if applicable):
Platform: Azure
Architecture: x86_64


What are you trying to do? What is your use case?

Ref https://docs.openshift.com/container-platform/4.10/installing/installing_azure/installing-azure-user-infra.html#installation-disk-partitioning-upi-templates_installing-azure-user-infra 
1. Create manifest files 
$openshift-install create manifests --dir $HOME/clusterconfig
2. Create a butane config that configures the additional partition.
variant: openshift
version: 4.10.0
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 98-var-lib-container-partition-master
storage:
  disks:
  - device: /dev/disk/azure/scsi1/lun0
    partitions:
    - label: data01
      start_mib: 0
      size_mib: 0
  filesystems:
    - device: /dev/disk/by-partlabel/data01
      path: /var/lib/containers
      format: xfs
      mount_options: [defaults, prjquota] 
      with_mount_unit: true

3. Create a manifest from the butane config and save it. 
$butane 98-var-lib-container-partition-master.bu -o openshift/98-var-lib-container-partition-master.yaml
4. $openshift-install create ignition-configs --dir $HOME/clusterconfig
5. Start the upi install based on https://github.com/openshift/installer/blob/master/docs/user/azure/install_upi.md
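As an aside, in the Butane config above `start_mib: 0` and `size_mib: 0` are sentinel values: per the Butane spec, a zero start places the partition at the start of the largest available block, and a zero size grows it to fill the available space. A minimal sketch of that interpretation (the helper name and the simplified "first free MiB" model are illustrative assumptions, not how Ignition actually lays out disks):

```python
def resolve_partition(start_mib: int, size_mib: int,
                      disk_mib: int, first_free_mib: int = 1) -> tuple[int, int]:
    """Interpret Butane's 0-as-default sentinels for one partition.

    start_mib == 0 -> place at the first free MiB on the disk
                      (simplified model of "largest available block");
    size_mib == 0  -> grow to fill the space remaining after the start.
    Hypothetical helper; the real layout is computed by Ignition.
    """
    start = first_free_mib if start_mib == 0 else start_mib
    size = disk_mib - start if size_mib == 0 else size_mib
    return start, size

# The bug report's data01 spec (0/0) on a hypothetical 10 GiB data disk:
start, size = resolve_partition(start_mib=0, size_mib=0, disk_mib=10240)
```

With both fields zero, the single `data01` partition simply consumes the whole disk, which is why the same failure reproduces even with the filesystem section removed.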


What happened? What went wrong or what did you expect?
Install failed with:
level=error msg="Bootstrap failed to complete: timed out waiting for the condition"
level=error msg="Failed to wait for bootstrapping to complete. This error usually happens when there is a problem with control plane hosts that prevents the control plane operators from creating the control plane."

Checking the "Serial console" of the master VM on the Azure portal shows it entered emergency mode:
"Failed to start Ignition (disks)."
Ignition failed: create partitions failed: failed to wait on disks devs: device unit dev-disk-azure-scsi1-lun0.device timeout
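The timeout follows directly from the missing udev rule: the systemd unit dev-disk-azure-scsi1-lun0.device only becomes active once udev creates the /dev/disk/azure/scsi1/lun0 symlink, which is exactly what 66-azure-storage.rules does; with the rule absent from the initramfs, the symlink never appears and Ignition's wait on the device unit times out. As an illustration of how the path in the Butane config maps to the unit name in the error, here is an approximation of systemd's path escaping (a simplified sketch, not a replacement for systemd-escape):

```python
def path_to_device_unit(path: str) -> str:
    """Approximate `systemd-escape --path --suffix=device`:
    strip the leading '/', turn each remaining '/' into '-',
    escape other bytes outside [a-zA-Z0-9:_.] as \\xNN,
    and append '.device'."""
    body = path.lstrip("/")
    out = []
    for ch in body:
        if ch == "/":
            out.append("-")
        elif ch.isalnum() or ch in ":_.":
            out.append(ch)
        else:
            out.append("".join(f"\\x{b:02x}" for b in ch.encode()))
    return "".join(out) + ".device"

# The device Ignition waited on in the error message above:
unit = path_to_device_unit("/dev/disk/azure/scsi1/lun0")
# -> dev-disk-azure-scsi1-lun0.device
```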

To narrow down the issue, simplify the .bu file from step 2:
variant: openshift
version: 4.10.0
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 98-var-lib-container-partition-master
storage:
  disks:
  - device: /dev/disk/azure/scsi1/lun0
    partitions:
    - label: data01
      start_mib: 0
      size_mib: 0
The simplified config still hits the same issue.

Expected result:
Install cluster successfully, and created the separate partitions for /var/lib/containers on a new disk. 

Additional info:
1. The same test passes on 4.10 Azure.
2. The same test passes on 4.11 GCP.


If you're having problems booting/installing RHCOS, please provide:
- the full contents of the serial console showing disk initialization, network configuration, and Ignition stage (see https://access.redhat.com/articles/7212 for information about configuring your serial console)

- Ignition JSON
- output of `journalctl -b`
attached 


Please add anything else that might be useful, for example:
cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-f9e266ad83004cbc6b43098c17d5d1a2b55c4eb586e5df92e489bfc012d6937b/vmlinuz-4.18.0-348.23.1.el8_5.x86_64 random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ignition.firstboot ostree=/ostree/boot.1/rhcos/f9e266ad83004cbc6b43098c17d5d1a2b55c4eb586e5df92e489bfc012d6937b/0 ignition.platform.id=azure
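The `ignition.platform.id=azure` argument in the command line above is what tells Ignition (and the platform-specific initramfs setup) that it is running on Azure. A minimal sketch of reading such arguments into a dict (simplified for illustration; not how Ignition actually parses its command line, and the truncated BOOT_IMAGE path below is hypothetical):

```python
def parse_cmdline(cmdline: str) -> dict[str, str]:
    """Split a kernel command line into {key: value}; bare flags
    (like ignition.firstboot) map to ''. Repeated keys (console=)
    keep only the last value in this simplified sketch."""
    args: dict[str, str] = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        args[key] = value
    return args

args = parse_cmdline(
    "BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-example/vmlinuz random.trust_cpu=on "
    "console=tty0 console=ttyS0,115200n8 ignition.firstboot "
    "ignition.platform.id=azure"
)
# args["ignition.platform.id"] == "azure"
```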

Comment 4 Benjamin Gilbert 2022-06-03 02:33:40 UTC
Thanks for the report.  This was broken in 4.11 by https://github.com/coreos/fedora-coreos-config/pull/1686 landing in RHCOS in https://github.com/openshift/os/pull/792.  https://github.com/openshift/os/pull/785 has a fix, but it looks like the fix might land via https://github.com/openshift/os/pull/803 instead.

Comment 5 RHCOS Bug Bot 2022-06-07 09:33:53 UTC
This bug has been reported fixed in a new RHCOS build and is ready for QE verification.  To mark the bug verified, set the Verified field to Tested.  This bug will automatically move to MODIFIED once the fix has landed in a new bootimage.

Comment 6 HuijingHei 2022-06-13 01:35:27 UTC
Tested with rhcos-411.86.202206101554-0-qemu.x86_64.qcow2; the Azure udev rule files are included in the initramfs:

$ lsinitrd /boot/ostree/rhcos-ea11226db1a77b02f841513020dfe6e4c76e72d5c2bad8bb72f6eee376f413be/initramfs-4.18.0-372.9.1.el8.x86_64.img | grep azure
rhcos-azure-udev
-rw-r--r--   1 root     root         1829 Jan  1  1970 usr/lib/udev/rules.d/66-azure-storage.rules
-rw-r--r--   1 root     root          343 Jan  1  1970 usr/lib/udev/rules.d/99-azure-product-uuid.rules
$ rpm -qa | grep WALinux
WALinuxAgent-udev-2.3.0.2-2.el8.noarch

Comment 7 RHCOS Bug Bot 2022-06-30 22:16:24 UTC
The fix for this bug has landed in a bootimage bump, as tracked in bug 2093126 (now in status MODIFIED).  Moving this bug to MODIFIED.

Comment 10 Michael Nguyen 2022-07-05 19:44:23 UTC
Verified on 4.11.0-0.nightly-2022-07-05-083948 on Azure.

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-07-05-083948   True        False         24m     Cluster version is 4.11.0-0.nightly-2022-07-05-083948
$ oc get nodes
NAME                                                STATUS   ROLES    AGE   VERSION
ci-ln-4kx6pgb-1d09d-wng42-master-0                  Ready    master   50m   v1.24.0+2dd8bb1
ci-ln-4kx6pgb-1d09d-wng42-master-1                  Ready    master   53m   v1.24.0+2dd8bb1
ci-ln-4kx6pgb-1d09d-wng42-master-2                  Ready    master   50m   v1.24.0+2dd8bb1
ci-ln-4kx6pgb-1d09d-wng42-worker-centralus1-cvscz   Ready    worker   40m   v1.24.0+2dd8bb1
ci-ln-4kx6pgb-1d09d-wng42-worker-centralus2-dsnc8   Ready    worker   40m   v1.24.0+2dd8bb1
ci-ln-4kx6pgb-1d09d-wng42-worker-centralus3-g5js6   Ready    worker   40m   v1.24.0+2dd8bb1
$ oc debug node/ci-ln-4kx6pgb-1d09d-wng42-master-0

Starting pod/ci-ln-4kx6pgb-1d09d-wng42-master-0-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.0.8
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# ls /boot/ostree/
rhcos-2150d7de5c468ff33074df29dddc08b2faffa6ebc05eb84962431eb79f161c91	rhcos-b7024efc9c7149ccaa86c540ce45b355ea325b9c25b605291b8f11bfd39cf81d
sh-4.4# ls -l /boot/ostree/
total 2
drwxr-xr-x. 2 root root 1024 Jul  5 18:46 rhcos-2150d7de5c468ff33074df29dddc08b2faffa6ebc05eb84962431eb79f161c91
drwxr-xr-x. 2 root root 1024 Jun 30 15:10 rhcos-b7024efc9c7149ccaa86c540ce45b355ea325b9c25b605291b8f11bfd39cf81d
sh-4.4# lsinitrd /boot/ostree/rhcos-2150d7de5c468ff33074df29dddc08b2faffa6ebc05eb84962431eb79f161c91/initramfs-4.18.0-372.13.1.el8_6.x86_64.img  | grep azure
rhcos-azure-udev
-rw-r--r--   1 root     root         1829 Jan  1  1970 usr/lib/udev/rules.d/66-azure-storage.rules
-rw-r--r--   1 root     root          343 Jan  1  1970 usr/lib/udev/rules.d/99-azure-product-uuid.rules
sh-4.4# lsinitrd /boot/ostree/rhcos-b7024efc9c7149ccaa86c540ce45b355ea325b9c25b605291b8f11bfd39cf81d/initramfs-4.18.0-372.13.1.el8_6.x86_64.img  | grep azure
rhcos-azure-udev
-rw-r--r--   1 root     root         1829 Jan  1  1970 usr/lib/udev/rules.d/66-azure-storage.rules
-rw-r--r--   1 root     root          343 Jan  1  1970 usr/lib/udev/rules.d/99-azure-product-uuid.rules
sh-4.4# rpm-ostree status
State: idle
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:04b54950ce296d73746f22b66ff6c5484c37be78cb34aaf352338359112fa241
              CustomOrigin: Managed by machine-config-operator
                   Version: 411.86.202207011902-0 (2022-07-01T19:05:18Z)

  0505ffc1c711903785f27570819e973f086f594a8daa3ec9dfe2a059586ac42f
                   Version: 411.86.202206301504-0 (2022-06-30T15:08:01Z)
sh-4.4# rpm -qa | grep WALinux
WALinuxAgent-udev-2.3.0.2-2.el8.noarch

Comment 11 MayXu 2022-07-07 04:36:09 UTC
Checked the cases with registry.ci.openshift.org/ocp/release:4.11.0-0.nightly-2022-07-05-083948: PASS

Comment 12 errata-xmlrpc 2022-08-10 11:15:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

