Bug 1747575 - openshift-install 4.2 Azure does not install /etc/udev/rules.d/66-azure-storage.rules
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.2.z
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 4.2.0
Assignee: Micah Abbott
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-08-30 20:58 UTC by Craig Rodrigues
Modified: 2019-10-16 06:39 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-16 06:39:32 UTC
Target Upstream Version:
Embargoed:




Links
Red Hat Bugzilla 1748638 (Priority: unspecified, Status: CLOSED, last updated 2021-02-22 00:41:40 UTC): openshift-install 4.2 GCloud does not install /lib/udev/rules.d/65-gce-disk-naming.rules
Red Hat Product Errata RHBA-2019:2922 (last updated 2019-10-16 06:39:41 UTC)

Internal Links: 1748432 1748638

Description Craig Rodrigues 2019-08-30 20:58:29 UTC
Description of problem:

Version-Release number of the following components:

openshift-install version
openshift-install unreleased-master-1655-g4f3e73a0143ba36229f42e8b65b6e65342bb826b
built from commit 4f3e73a0143ba36229f42e8b65b6e65342bb826b
release image registry.svc.ci.openshift.org/origin/release:4.2

How reproducible:

Steps to Reproduce:
1.  Get the source code for openshift-install from https://github.com/openshift/installer and compile it

2.  Create an install-config.yaml file like the following:

apiVersion: v1
baseDomain: azure.openshift.portworx.com
compute:
- hyperthreading: Enabled
  name: worker
  platform: {}
  replicas: 3
controlPlane:
  hyperthreading: Enabled
  name: master
  platform: {}
  replicas: 3
metadata:
  creationTimestamp: null
  name: craig-azure-cool5
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineCIDR: 10.0.0.0/16
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  azure:
    baseDomainResourceGroupName: openshift
    region: westus
pullSecret: [redacted]
sshKey: [redacted]


3.  Install OpenShift in Azure with:

openshift-install create cluster



4.  OpenShift provisions the cluster. Each node in the cluster runs this OS (from /etc/os-release):

NAME="Red Hat Enterprise Linux CoreOS"
VERSION="42.80.20190829.1"
VERSION_ID="4.2"
PRETTY_NAME="Red Hat Enterprise Linux CoreOS 42.80.20190829.1 (Ootpa)"
ID="rhcos"
ID_LIKE="rhel fedora"
ANSI_COLOR="0;31"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="OpenShift Container Platform"
REDHAT_BUGZILLA_PRODUCT_VERSION="4.2"
REDHAT_SUPPORT_PRODUCT="OpenShift Container Platform"
REDHAT_SUPPORT_PRODUCT_VERSION="4.2"
OSTREE_VERSION=42.80.20190829.1


5.  After logging into one of the nodes, I see the following under /etc/udev:

/etc/udev/
/etc/udev/udev.conf
/etc/udev/hwdb.d
/etc/udev/rules.d
/etc/udev/rules.d/70-persistent-ipoib.rules
/etc/udev/hwdb.bin


However, according to the documentation at:

https://docs.microsoft.com/en-us/azure/virtual-machines/troubleshooting/troubleshoot-device-names-problems

Any VM in Azure should have Azure-specific udev rules for handling dynamically provisioned storage devices.

Specifically, each node should have:

/etc/udev/rules.d/66-azure-storage.rules
/etc/udev/rules.d/99-azure-product-uuid.rules

which are provided by Microsoft's WALinuxAgent package.
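To repeat this check on any node, a small shell function can look for both rule files. This is a minimal sketch, not a tool from this bug: check_azure_udev_rules is a hypothetical name, and the default search path is an assumption based on the directories systemd-udev commonly reads rules from, because a rule shipped in the OS image does not have to live under /etc/udev.

```sh
# Hypothetical helper: report which of the two Azure udev rule files
# named in this bug are present, searching one or more rule directories.
check_azure_udev_rules() {
    # With no arguments, search the directories udev commonly reads rules
    # from (assumed default; adjust for the image being inspected).
    if [ "$#" -eq 0 ]; then
        set -- /etc/udev/rules.d /run/udev/rules.d /usr/lib/udev/rules.d /lib/udev/rules.d
    fi
    status=0
    for rule in 66-azure-storage.rules 99-azure-product-uuid.rules; do
        found=""
        # Stop at the first directory that contains this rule file.
        for dir in "$@"; do
            if [ -f "$dir/$rule" ]; then
                found="$dir/$rule"
                break
            fi
        done
        if [ -n "$found" ]; then
            echo "present: $found"
        else
            echo "MISSING: $rule"
            status=1
        fi
    done
    # Non-zero exit status when at least one rule file is missing.
    return "$status"
}
```

Passing /etc/udev/rules.d as the only argument restricts the check to the directory listed in step 5, where both files would be reported as MISSING and the function would return a non-zero status.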



I work for Portworx (https://www.portworx.com) and found this problem
while installing OpenShift 4.2 in Azure and then dynamically provisioning Portworx storage devices through the StorageCluster interface in OpenShift.

The missing udev rules on the OpenShift nodes break Portworx storage.

Comment 1 Craig Rodrigues 2019-08-30 21:06:03 UTC
Portworx storage works fine on Azure nodes configured via AKS.

For comparison, I created an AKS cluster with Microsoft's Azure CLI using this command:


az aks create \
    --resource-group craig-awesome1-group \
    --name craig-aks-awesome2 \
    --node-count 1 \
    --enable-addons monitoring \
    --ssh-key-value ~/.ssh/id_rsa.pub \
    --debug


When I logged into the node created by this command, I found:

/etc/udev/rules.d
/etc/udev/rules.d/66-azure-storage.rules
/etc/udev/rules.d/99-azure-product-uuid.rules
/etc/udev/rules.d/70-persistent-net.rules
/etc/udev/rules.d/10-net-device-added.rules
/etc/udev/hwdb.d
/etc/udev/udev.conf


So AKS nodes do install these rule files, and third-party storage providers such as Portworx therefore work fine in Azure.

Comment 2 Craig Rodrigues 2019-08-30 21:06:27 UTC
I also put a reference to this bug here:

https://github.com/openshift/installer/issues/2298

Comment 3 Craig Rodrigues 2019-08-30 21:13:35 UTC
66-azure-storage.rules comes from here:

https://github.com/Azure/WALinuxAgent/

so it looks like openshift-install needs to make sure that walinuxagent is installed.

Maybe the Azure RHCOS image used by openshift-install should have this installed by default?

Comment 4 Abhinav Dahiya 2019-08-30 22:22:04 UTC
(In reply to Craig Rodrigues from comment #3)
> 66-azure-storage.rules comes from here:
> 
> https://github.com/Azure/WALinuxAgent/
> 
> so it looks like openshift-install needs to make sure that walinuxagent is
> installed.
> 
> Maybe the Azure RHCOS image used by openshift-install should have this
> installed by default?

RHCOS does not, and probably won't, ship WALinuxAgent.

Comment 5 Craig Rodrigues 2019-08-30 22:35:44 UTC
Can you change the openshift-install logic to install the WALinuxAgent RPM as part of provisioning?

Comment 6 Craig Rodrigues 2019-08-30 22:37:58 UTC
I ran an additional experiment in which I provisioned a bare CentOS 7.5 VM in Azure (no AKS, nothing fancy):

The version is:

NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"



The files under /etc/udev are:

/etc/udev/
/etc/udev/rules.d
/etc/udev/rules.d/66-azure-storage.rules
/etc/udev/rules.d/99-azure-product-uuid.rules
/etc/udev/rules.d/75-persistent-net-generator.rules
/etc/udev/rules.d/68-azure-sriov-nm-unmanaged.rules
/etc/udev/udev.conf
/etc/udev/hwdb.bin


rpm -qf /etc/udev/rules.d/66-azure-storage.rules

WALinuxAgent-2.2.18-1.el7.centos.noarch

Comment 7 Jan Safranek 2019-09-03 07:56:30 UTC
Storage is not the right component here: we maintain PVs, PVCs, and storage plugins in kube-apiserver, kube-controller-manager, and kubelet. In particular, the udev rules must already be installed when kubelet starts; we should not install them from inside kubelet (and reboot the machine).

The rules must be created either by the installer or by RHCOS. Reassigning to RHCOS.

Comment 9 Craig Rodrigues 2019-09-03 15:36:24 UTC
I agree with Jan Safranek.

The WALinuxAgent package, which contains the necessary udev rules for hosts running on Azure,
should not be dealt with at the Kubernetes layer (kube-apiserver, kube-controller-manager, or kubelet).

My recommendations are to either:

1.  Make the WALinuxAgent package part of the base RHCOS image which is installed on Azure, 

OR

2.  Change the terraform logic inside openshift-install to install the WALinuxAgent package when provisioning hosts on Azure.

Comment 10 Alex Crawford 2019-09-03 15:54:35 UTC
We are actually going to go with a different option altogether. WALinuxAgent invites too many anti-patterns in the OpenShift 4 model, where everything is declarative, and we've decided not to ship the agent at all. We instead have our own, minimal agent: https://github.com/coreos/afterburn. As the GitHub org implies, this utility and model originated in Container Linux (by CoreOS).

As for the udev rules, we will just include those in RHCOS.

Comment 11 Craig Rodrigues 2019-09-03 17:34:33 UTC
Ah that's interesting, I did not know about afterburn.

The udev rules look like they come from this GitHub repository maintained by Microsoft:

https://github.com/Azure/WALinuxAgent/tree/master/config/

66-azure-storage.rules does not look like it has changed much in the past 3 years, so just including it without
installing WALinuxAgent should hopefully do the trick.

Comment 12 Craig Rodrigues 2019-09-11 23:23:30 UTC
At Portworx, we have extensive tests for cloud storage.

As you fix this bug on Azure, could you run the tests we have for mounting storage?
Our tests are Open Source.

You can do the following.

1.  Provision an OpenShift cluster on Azure.

2.  Get direct access to one of the nodes and log into it.

3.  Read https://github.com/libopenstorage/cloudops/blob/master/azure/README.md

4.  Use the following container to check out and run the tests on Azure, replacing the environment variable
    values with those from your Azure setup:

docker run \
       --rm \
       -t \
       -i \
       -e AZURE_INSTANCE_ID=<instance-id> \
       -e AZURE_INSTANCE_REGION=<instance-region> \
       -e AZURE_SCALE_SET_NAME=<scale-set-name> \
       -e AZURE_SUBSCRIPTION_ID=<subscription-id> \
       -e AZURE_RESOURCE_GROUP_NAME=<resource-group-name-of-instance> \
       -e AZURE_ENVIRONMENT=<azure-cloud-environment> \
       -e AZURE_TENANT_ID=<tenant-id> \
       -e AZURE_CLIENT_ID=<client-id> \
       -e AZURE_CLIENT_SECRET=<client-secret> \
       -v "$PWD":/go/src/github.com/libopenstorage \
       -w /go/src/github.com/libopenstorage \
       hatsunemiku/golang-dev-docker \
       bash -c 'git clone https://github.com/libopenstorage/cloudops && cd cloudops  && make && make test'
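Since the container needs every one of these variables, a fail-fast check before docker run can save a confusing test failure. A minimal POSIX shell sketch: require_azure_test_env is a hypothetical helper, and its variable list is copied from the command above rather than taken from the cloudops README.

```sh
# Hypothetical helper: fail fast if any variable used by the
# libopenstorage/cloudops test container above is unset or empty.
require_azure_test_env() {
    status=0
    for var in AZURE_INSTANCE_ID AZURE_INSTANCE_REGION AZURE_SCALE_SET_NAME \
        AZURE_SUBSCRIPTION_ID AZURE_RESOURCE_GROUP_NAME AZURE_ENVIRONMENT \
        AZURE_TENANT_ID AZURE_CLIENT_ID AZURE_CLIENT_SECRET; do
        # Indirect lookup via eval, since ${!var} is a bash-only feature.
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "unset: $var" >&2
            status=1
        fi
    done
    return "$status"
}
```

If the variables are exported in the host shell, the command can then pass them through as, for example, -e AZURE_CLIENT_SECRET with no value, because docker run copies the host's value for a bare -e NAME. That also keeps the secret out of the docker command line itself.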

Comment 13 Craig Rodrigues 2019-09-12 00:36:43 UTC
You can run additional tests on Azure by doing:


docker run \
       --rm \
       -t \
       -i \
       -e AZURE_INSTANCE_ID=<instance-id> \
       -e AZURE_INSTANCE_REGION=<instance-region> \
       -e AZURE_SCALE_SET_NAME=<scale-set-name> \
       -e AZURE_SUBSCRIPTION_ID=<subscription-id> \
       -e AZURE_RESOURCE_GROUP_NAME=<resource-group-name-of-instance> \
       -e AZURE_ENVIRONMENT=<azure-cloud-environment> \
       -e AZURE_TENANT_ID=<tenant-id> \
       -e AZURE_CLIENT_ID=<client-id> \
       -e AZURE_CLIENT_SECRET=<client-secret> \
       -v "$PWD":/go/src/github.com/libopenstorage \
       -w /go/src/github.com/libopenstorage \
       hatsunemiku/golang-dev-docker \
       bash -c 'git clone https://github.com/libopenstorage/cloudops && cd cloudops/azure && go test -v'

Comment 15 Micah Abbott 2019-09-12 18:36:53 UTC
Rules have been added to the RHCOS config and will be present in RHCOS 42.80.20190911.0 and later.
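To check whether a particular node already has this fix, its OSTREE_VERSION (shown in /etc/os-release in the description) can be compared with 42.80.20190911.0. A sketch under two assumptions: GNU coreutils sort -V provides the version ordering, and the function names are hypothetical.

```sh
# First RHCOS build that, per comment 15, ships the Azure udev rules.
FIXED_RHCOS_VERSION=42.80.20190911.0

# True when version $1 >= version $2: after a version sort (GNU sort -V),
# the smaller value comes first, so $2 first means $1 is not older.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print OSTREE_VERSION from an os-release file (default /etc/os-release).
# The file is sourced in a subshell so its variables do not leak.
rhcos_version() {
    (
        unset OSTREE_VERSION
        . "${1:-/etc/os-release}" && printf '%s\n' "${OSTREE_VERSION:-}"
    )
}

# Hypothetical check: succeed only if the node's build is at or past the fix.
rhcos_has_azure_rules_fix() {
    v=$(rhcos_version "$@") || v=""
    [ -n "$v" ] && version_ge "$v" "$FIXED_RHCOS_VERSION"
}
```

On the node from the description (OSTREE_VERSION=42.80.20190829.1), rhcos_has_azure_rules_fix returns a non-zero status, which matches the missing rules reported there.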

Comment 18 Craig Rodrigues 2019-09-23 22:41:05 UTC
Micah,

I verified this fix on the latest RHCOS image with openshift-install on Azure.

Specifically, I used the latest openshift-install to provision an OpenShift 4 cluster in Azure,
then provisioned Portworx and created a StorageCluster, and observed that the disks
were created and mounted properly.

Thanks a lot for working on this fix and for running the libopenstorage/cloudops tests.
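A quick way to spot-check a node after the fix is to list the symlinks the storage rule creates. The /dev/disk/azure location and names such as scsi1/lun0 follow the Microsoft troubleshooting page linked in the description and should be treated as an assumption for any particular image; list_azure_disk_links is a hypothetical helper.

```sh
# Hypothetical helper: list the symlinks under an Azure disk-link tree
# (assumed default /dev/disk/azure), printing
# "<relative link> -> <resolved device>" for each one.
list_azure_disk_links() {
    base="${1:-/dev/disk/azure}"
    if [ ! -d "$base" ]; then
        echo "no $base directory found" >&2
        return 1
    fi
    # -type l lists the links themselves; sort gives stable output.
    find "$base" -type l | sort | while IFS= read -r link; do
        printf '%s -> %s\n' "${link#"$base"/}" "$(readlink -f "$link")"
    done
}
```

The exact link names depend on which disks are attached to the VM.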

Comment 19 errata-xmlrpc 2019-10-16 06:39:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

