Bug 2015635 - Storage operator fails causing installation to fail on ASH
Summary: Storage operator fails causing installation to fail on ASH
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.10.0
Assignee: Jan Safranek
QA Contact: Wei Duan
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-10-19 18:00 UTC by To Hung Sze
Modified: 2022-03-10 16:20 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-10 16:20:39 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
must-gather (6.55 MB, application/x-xz)
2021-10-19 18:00 UTC, To Hung Sze


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-storage-operator pull 228 0 None open Bug 2015635: Remove Azure Stack Hub detection. 2021-10-20 10:19:40 UTC
Red Hat Product Errata RHSA-2022:0056 0 None None None 2022-03-10 16:20:52 UTC

Description To Hung Sze 2021-10-19 18:00:49 UTC
Created attachment 1834737 [details]
must-gather

Description of problem:
Installation of UPI ASH fails because of the storage operator.

Version-Release number of selected component (if applicable):
openshift-install-linux-4.10.16-173656

How reproducible: Always

Steps to Reproduce:
1. Install 4.10 UPI ASH following:
https://deploy-preview-36950--osdocs.netlify.app/openshift-enterprise/latest/installing/installing_azure_stack_hub/installing-azure-stack-hub-user-infra.html#installation-creating-azure-dns_installing-azure-stack-hub-user-infra


Actual result:

Installation fails with
ERROR Cluster operator storage Degraded is True with AzureFileCSIDriverOperatorCR_AzureFileDriverControllerServiceController_SyncError::AzureFileDriverNodeServiceController_SyncError::AzureFileDriverStaticResourcesController_SyncError: AzureFileCSIDriverOperatorCRDegraded: AzureFileDriverControllerServiceControllerDegraded: Deployment.apps "azure-file-csi-driver-controller" is invalid: spec.template.spec.initContainers[0].image: Required value 
ERROR AzureFileCSIDriverOperatorCRDegraded: AzureFileDriverNodeServiceControllerDegraded: DaemonSet.apps "azure-file-csi-driver-node" is invalid: spec.template.spec.initContainers[0].image: Required value 
ERROR AzureFileCSIDriverOperatorCRDegraded: AzureFileDriverStaticResourcesControllerDegraded: "rbac/csi_driver_role.yaml" (string): clusterroles.rbac.authorization.k8s.io "azure-file-csi-driver-role" is forbidden: user "system:serviceaccount:openshift-cluster-csi-drivers:azure-file-csi-driver-operator" (groups=["system:serviceaccounts" "system:serviceaccounts:openshift-cluster-csi-drivers" "system:authenticated"]) is attempting to grant RBAC permissions not currently held: 
ERROR AzureFileCSIDriverOperatorCRDegraded: AzureFileDriverStaticResourcesControllerDegraded: {APIGroups:[""], Resources:["secrets"], Verbs:["create" "update" "delete" "patch"]} 
ERROR AzureFileCSIDriverOperatorCRDegraded: AzureFileDriverStaticResourcesControllerDegraded: "rbac/csi_driver_binding.yaml" (string): clusterroles.rbac.authorization.k8s.io "azure-file-csi-driver-role" not found 
ERROR AzureFileCSIDriverOperatorCRDegraded: AzureFileDriverStaticResourcesControllerDegraded:  
INFO Cluster operator storage Progressing is True with AzureFileCSIDriverOperatorCR_WaitForOperator: AzureFileCSIDriverOperatorCRProgressing: Waiting for AzureFile operator to report status 
INFO Cluster operator storage Available is False with AzureFileCSIDriverOperatorCR_WaitForOperator: AzureFileCSIDriverOperatorCRAvailable: Waiting for AzureFile operator to report status 
ERROR Cluster initialization failed because one or more operators are not functioning properly. 
ERROR The cluster should be accessible for troubleshooting as detailed in the documentation linked below, 
ERROR https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html 
ERROR The 'wait-for install-complete' subcommand can then be used to continue the installation 
FATAL failed to initialize the cluster: Cluster operator storage is not available 


All operators are available except:
   
storage                                    4.10.0-0.nightly-2021-10-16-173656   False       True          True       100m    AzureFileCSIDriverOperatorCRAvailable: Waiting for AzureFile operator to report status


Must gather reports
When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information.
ClusterID: 1b19050b-36d9-48e4-b668-c03c29d693b3
ClusterVersion: Installing "4.10.0-0.nightly-2021-10-16-173656" for 2 hours: Unable to apply 4.10.0-0.nightly-2021-10-16-173656: the cluster operator storage has not yet successfully rolled out
ClusterOperators:
	clusteroperator/storage is not available (AzureFileCSIDriverOperatorCRAvailable: Waiting for AzureFile operator to report status) because AzureFileCSIDriverOperatorCRDegraded: AzureFileDriverControllerServiceControllerDegraded: Deployment.apps "azure-file-csi-driver-controller" is invalid: spec.template.spec.initContainers[0].image: Required value
AzureFileCSIDriverOperatorCRDegraded: AzureFileDriverNodeServiceControllerDegraded: DaemonSet.apps "azure-file-csi-driver-node" is invalid: spec.template.spec.initContainers[0].image: Required value
AzureFileCSIDriverOperatorCRDegraded: AzureFileDriverStaticResourcesControllerDegraded: "rbac/csi_driver_role.yaml" (string): clusterroles.rbac.authorization.k8s.io "azure-file-csi-driver-role" is forbidden: user "system:serviceaccount:openshift-cluster-csi-drivers:azure-file-csi-driver-operator" (groups=["system:serviceaccounts" "system:serviceaccounts:openshift-cluster-csi-drivers" "system:authenticated"]) is attempting to grant RBAC permissions not currently held:
AzureFileCSIDriverOperatorCRDegraded: AzureFileDriverStaticResourcesControllerDegraded: {APIGroups:[""], Resources:["secrets"], Verbs:["create" "update" "delete" "patch"]}
AzureFileCSIDriverOperatorCRDegraded: AzureFileDriverStaticResourcesControllerDegraded: "rbac/csi_driver_binding.yaml" (string): clusterroles.rbac.authorization.k8s.io "azure-file-csi-driver-role" not found
AzureFileCSIDriverOperatorCRDegraded: AzureFileDriverStaticResourcesControllerDegraded: 



Expected results:
Installation completes (same process completes with 4.9.0)


Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:
Happens on dev CI as well.

Comment 1 Jan Safranek 2021-10-20 10:05:53 UTC
AzureFile CSI driver operator gets degraded. ClusterCSIDriver file.csi.azure.com.yaml from the must-gather:

  - lastTransitionTime: "2021-10-19T16:02:36Z"
    message: 'DaemonSet.apps "azure-file-csi-driver-node" is invalid: spec.template.spec.initContainers[0].image:
      Required value'
    reason: SyncError
    status: "True"
    type: AzureFileDriverNodeServiceControllerDegraded
  - lastTransitionTime: "2021-10-19T16:02:36Z"
    message: 'Deployment.apps "azure-file-csi-driver-controller" is invalid: spec.template.spec.initContainers[0].image:
      Required value'
    reason: SyncError
    status: "True"
    type: AzureFileDriverControllerServiceControllerDegraded
  - lastTransitionTime: "2021-10-19T16:02:36Z"
    status: "False"
    type: ConfigObservationDegraded
  - lastTransitionTime: "2021-10-19T16:02:41Z"
    message: |
      "rbac/csi_driver_role.yaml" (string): clusterroles.rbac.authorization.k8s.io "azure-file-csi-driver-role" is forbidden: user "system:serviceaccount:openshift-cluster-csi-drivers:azure-file-csi-driver-operator" (groups=["system:serviceaccounts" "system:serviceaccounts:openshift-cluster-csi-drivers" "system:authenticated"]) is attempting to grant RBAC permissions not currently held:
      {APIGroups:[""], Resources:["secrets"], Verbs:["create" "update" "delete" "patch"]}
      "rbac/csi_driver_binding.yaml" (string): clusterroles.rbac.authorization.k8s.io "azure-file-csi-driver-role" not found
    reason: SyncError
    status: "True"
    type: AzureFileDriverStaticResourcesControllerDegraded
  readyReplicas: 0
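
For triage, the failing controllers can be picked out of a condition list like the one above. The following is a minimal Python sketch, assuming the conditions have already been loaded into plain dictionaries (for example from the must-gather YAML). The function name `degraded_conditions` is hypothetical and is not part of any OpenShift tooling.

```python
from typing import Dict, List


def degraded_conditions(conditions: List[Dict[str, str]]) -> List[str]:
    """Return the types of all '*Degraded' conditions whose status is "True"."""
    return [
        c["type"]
        for c in conditions
        if c.get("type", "").endswith("Degraded") and c.get("status") == "True"
    ]


# Condition types and statuses copied from the ClusterCSIDriver snippet above;
# messages are omitted for brevity.
conditions = [
    {"type": "AzureFileDriverNodeServiceControllerDegraded", "status": "True", "reason": "SyncError"},
    {"type": "AzureFileDriverControllerServiceControllerDegraded", "status": "True", "reason": "SyncError"},
    {"type": "ConfigObservationDegraded", "status": "False"},
    {"type": "AzureFileDriverStaticResourcesControllerDegraded", "status": "True", "reason": "SyncError"},
]

for name in degraded_conditions(conditions):
    print(name)
```

With the data above this lists the three controllers reporting SyncError and skips ConfigObservationDegraded, whose status is "False".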

Comment 2 Jan Safranek 2021-10-20 10:11:02 UTC
There is definitely a bug in cluster-storage-operator. It starts the AzureFile CSI driver operator even without the TechPreviewNoUpgrade FeatureSet. AzureFile is in development right now and not ready for CI.
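
The intended gating described in this comment can be illustrated with a small Python sketch. This is not the cluster-storage-operator code (which is written in Go), and it is not the change made in the linked pull request. The function `should_start_operator` and its parameters are hypothetical; only the feature set name TechPreviewNoUpgrade comes from the comment.

```python
TECH_PREVIEW = "TechPreviewNoUpgrade"


def should_start_operator(requires_tech_preview: bool, feature_set: str) -> bool:
    """Decide whether a CSI driver operator should be started.

    requires_tech_preview: True for drivers still in development
        (hypothetical flag, used here to model AzureFile).
    feature_set: the cluster's configured feature set; an empty string
        stands for the default feature set in this sketch.
    """
    if requires_tech_preview:
        # In-development drivers run only when the cluster opted in.
        return feature_set == TECH_PREVIEW
    return True


# An in-development driver on a cluster with the default feature set.
print(should_start_operator(True, ""))
# The same driver on a cluster that enabled TechPreviewNoUpgrade.
print(should_start_operator(True, TECH_PREVIEW))
```

Under this model, the behaviour reported in the bug corresponds to the operator being started in the first case, where the check should have returned False.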

Comment 6 To Hung Sze 2021-10-23 14:29:35 UTC
I was able to install an ASH cluster successfully with 4.10.0-0.nightly-2021-10-22-061826.

Comment 9 errata-xmlrpc 2022-03-10 16:20:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

