Bug 2015635

Summary: Storage operator fails causing installation to fail on ASH
Product: OpenShift Container Platform Reporter: To Hung Sze <tsze>
Component: Storage Assignee: Jan Safranek <jsafrane>
Storage sub component: Storage QA Contact: Wei Duan <wduan>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: aos-bugs, jsafrane
Version: 4.10   
Target Milestone: ---   
Target Release: 4.10.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-03-10 16:20:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
must-gather none

Description To Hung Sze 2021-10-19 18:00:49 UTC
Created attachment 1834737 [details]
must-gather


Description of problem:
Installation of UPI ASH fails because of the storage operator.

Version-Release number of selected component (if applicable):
openshift-install-linux-4.10.16-173656

How reproducible: Always

Steps to Reproduce:
1. Install 4.10 UPI ASH following:
https://deploy-preview-36950--osdocs.netlify.app/openshift-enterprise/latest/installing/installing_azure_stack_hub/installing-azure-stack-hub-user-infra.html#installation-creating-azure-dns_installing-azure-stack-hub-user-infra


Actual result:

Installation fails with
ERROR Cluster operator storage Degraded is True with AzureFileCSIDriverOperatorCR_AzureFileDriverControllerServiceController_SyncError::AzureFileDriverNodeServiceController_SyncError::AzureFileDriverStaticResourcesController_SyncError: AzureFileCSIDriverOperatorCRDegraded: AzureFileDriverControllerServiceControllerDegraded: Deployment.apps "azure-file-csi-driver-controller" is invalid: spec.template.spec.initContainers[0].image: Required value 
ERROR AzureFileCSIDriverOperatorCRDegraded: AzureFileDriverNodeServiceControllerDegraded: DaemonSet.apps "azure-file-csi-driver-node" is invalid: spec.template.spec.initContainers[0].image: Required value 
ERROR AzureFileCSIDriverOperatorCRDegraded: AzureFileDriverStaticResourcesControllerDegraded: "rbac/csi_driver_role.yaml" (string): clusterroles.rbac.authorization.k8s.io "azure-file-csi-driver-role" is forbidden: user "system:serviceaccount:openshift-cluster-csi-drivers:azure-file-csi-driver-operator" (groups=["system:serviceaccounts" "system:serviceaccounts:openshift-cluster-csi-drivers" "system:authenticated"]) is attempting to grant RBAC permissions not currently held: 
ERROR AzureFileCSIDriverOperatorCRDegraded: AzureFileDriverStaticResourcesControllerDegraded: {APIGroups:[""], Resources:["secrets"], Verbs:["create" "update" "delete" "patch"]} 
ERROR AzureFileCSIDriverOperatorCRDegraded: AzureFileDriverStaticResourcesControllerDegraded: "rbac/csi_driver_binding.yaml" (string): clusterroles.rbac.authorization.k8s.io "azure-file-csi-driver-role" not found 
ERROR AzureFileCSIDriverOperatorCRDegraded: AzureFileDriverStaticResourcesControllerDegraded:  
INFO Cluster operator storage Progressing is True with AzureFileCSIDriverOperatorCR_WaitForOperator: AzureFileCSIDriverOperatorCRProgressing: Waiting for AzureFile operator to report status 
INFO Cluster operator storage Available is False with AzureFileCSIDriverOperatorCR_WaitForOperator: AzureFileCSIDriverOperatorCRAvailable: Waiting for AzureFile operator to report status 
ERROR Cluster initialization failed because one or more operators are not functioning properly. 
ERROR The cluster should be accessible for troubleshooting as detailed in the documentation linked below, 
ERROR https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html 
ERROR The 'wait-for install-complete' subcommand can then be used to continue the installation 
FATAL failed to initialize the cluster: Cluster operator storage is not available 


All operators are available except:
   
storage                                    4.10.0-0.nightly-2021-10-16-173656   False       True          True       100m    AzureFileCSIDriverOperatorCRAvailable: Waiting for AzureFile operator to report status


Must gather reports
When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information.
ClusterID: 1b19050b-36d9-48e4-b668-c03c29d693b3
ClusterVersion: Installing "4.10.0-0.nightly-2021-10-16-173656" for 2 hours: Unable to apply 4.10.0-0.nightly-2021-10-16-173656: the cluster operator storage has not yet successfully rolled out
ClusterOperators:
	clusteroperator/storage is not available (AzureFileCSIDriverOperatorCRAvailable: Waiting for AzureFile operator to report status) because AzureFileCSIDriverOperatorCRDegraded: AzureFileDriverControllerServiceControllerDegraded: Deployment.apps "azure-file-csi-driver-controller" is invalid: spec.template.spec.initContainers[0].image: Required value
AzureFileCSIDriverOperatorCRDegraded: AzureFileDriverNodeServiceControllerDegraded: DaemonSet.apps "azure-file-csi-driver-node" is invalid: spec.template.spec.initContainers[0].image: Required value
AzureFileCSIDriverOperatorCRDegraded: AzureFileDriverStaticResourcesControllerDegraded: "rbac/csi_driver_role.yaml" (string): clusterroles.rbac.authorization.k8s.io "azure-file-csi-driver-role" is forbidden: user "system:serviceaccount:openshift-cluster-csi-drivers:azure-file-csi-driver-operator" (groups=["system:serviceaccounts" "system:serviceaccounts:openshift-cluster-csi-drivers" "system:authenticated"]) is attempting to grant RBAC permissions not currently held:
AzureFileCSIDriverOperatorCRDegraded: AzureFileDriverStaticResourcesControllerDegraded: {APIGroups:[""], Resources:["secrets"], Verbs:["create" "update" "delete" "patch"]}
AzureFileCSIDriverOperatorCRDegraded: AzureFileDriverStaticResourcesControllerDegraded: "rbac/csi_driver_binding.yaml" (string): clusterroles.rbac.authorization.k8s.io "azure-file-csi-driver-role" not found
AzureFileCSIDriverOperatorCRDegraded: AzureFileDriverStaticResourcesControllerDegraded: 
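
The "is attempting to grant RBAC permissions not currently held" error is Kubernetes RBAC escalation prevention: a subject can normally create a role only if it already holds every permission in that role. The following is a simplified Python sketch of that comparison, not the API server's implementation. The `held` set and the `missing_permissions` helper are hypothetical; only the requested rule (secrets with create/update/delete/patch) comes from the error message, and the real check also accounts for wildcards and the `escalate` verb.

```python
def missing_permissions(held, requested):
    """Return the requested (resource, verb) pairs the subject does not already hold."""
    return sorted(requested - held)

# Hypothetical permissions currently held by the operator's service account.
held = {("secrets", "get"), ("secrets", "list")}

# Rule taken from the error message: secrets with create/update/delete/patch.
requested = {("secrets", verb) for verb in ("create", "update", "delete", "patch")}

# Any non-empty result means creating the ClusterRole would be rejected as forbidden.
print(missing_permissions(held, requested))
```

Because the ClusterRole is never created, the follow-up "rbac/csi_driver_binding.yaml" error ("azure-file-csi-driver-role" not found) is a consequence of the first failure rather than a separate problem.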



Expected results:
Installation completes (same process completes with 4.9.0)


Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:
Happens on dev CI as well.

Comment 1 Jan Safranek 2021-10-20 10:05:53 UTC
AzureFile CSI driver operator gets degraded. ClusterCSIDriver file.csi.azure.com.yaml from the must-gather:

  - lastTransitionTime: "2021-10-19T16:02:36Z"
    message: 'DaemonSet.apps "azure-file-csi-driver-node" is invalid: spec.template.spec.initContainers[0].image:
      Required value'
    reason: SyncError
    status: "True"
    type: AzureFileDriverNodeServiceControllerDegraded
  - lastTransitionTime: "2021-10-19T16:02:36Z"
    message: 'Deployment.apps "azure-file-csi-driver-controller" is invalid: spec.template.spec.initContainers[0].image:
      Required value'
    reason: SyncError
    status: "True"
    type: AzureFileDriverControllerServiceControllerDegraded
  - lastTransitionTime: "2021-10-19T16:02:36Z"
    status: "False"
    type: ConfigObservationDegraded
  - lastTransitionTime: "2021-10-19T16:02:41Z"
    message: |
      "rbac/csi_driver_role.yaml" (string): clusterroles.rbac.authorization.k8s.io "azure-file-csi-driver-role" is forbidden: user "system:serviceaccount:openshift-cluster-csi-drivers:azure-file-csi-driver-operator" (groups=["system:serviceaccounts" "system:serviceaccounts:openshift-cluster-csi-drivers" "system:authenticated"]) is attempting to grant RBAC permissions not currently held:
      {APIGroups:[""], Resources:["secrets"], Verbs:["create" "update" "delete" "patch"]}
      "rbac/csi_driver_binding.yaml" (string): clusterroles.rbac.authorization.k8s.io "azure-file-csi-driver-role" not found
    reason: SyncError
    status: "True"
    type: AzureFileDriverStaticResourcesControllerDegraded
  readyReplicas: 0
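
For triage, the conditions above can be reduced to the ones that actually report a failure. A minimal Python sketch, assuming the conditions list has already been loaded into plain dictionaries: the `conditions` literal below is a hand-copied, abbreviated subset of the must-gather output, and `degraded_conditions` is a hypothetical helper, not part of any OpenShift tooling.

```python
# Abbreviated copy of the conditions from the ClusterCSIDriver shown above
# (messages and timestamps omitted).
conditions = [
    {"type": "AzureFileDriverNodeServiceControllerDegraded", "status": "True", "reason": "SyncError"},
    {"type": "AzureFileDriverControllerServiceControllerDegraded", "status": "True", "reason": "SyncError"},
    {"type": "ConfigObservationDegraded", "status": "False"},
    {"type": "AzureFileDriverStaticResourcesControllerDegraded", "status": "True", "reason": "SyncError"},
]

def degraded_conditions(conds):
    """Return the types of all *Degraded conditions whose status is "True"."""
    return [
        c["type"]
        for c in conds
        if c["type"].endswith("Degraded") and c["status"] == "True"
    ]

for name in degraded_conditions(conditions):
    print(name)
```

Three controllers are degraded (node service, controller service, static resources), while ConfigObservationDegraded is "False" and is filtered out.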

Comment 2 Jan Safranek 2021-10-20 10:11:02 UTC
There is definitely a bug in cluster-storage-operator. It starts the AzureFile CSI driver operator even without the TechPreviewNoUpgrade FeatureSet. AzureFile is in development right now and is not ready for CI.
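
The intended gating could look roughly like the sketch below. It illustrates the logic in Python and is not the operator's actual Go code: `should_start_azure_file_operator` and its `feature_set` argument are hypothetical names, and the empty string for a default cluster is an assumption. The only fact taken from this bug is that the AzureFile operator must not start unless TechPreviewNoUpgrade is enabled.

```python
TECH_PREVIEW = "TechPreviewNoUpgrade"

def should_start_azure_file_operator(feature_set: str) -> bool:
    """Start the AzureFile CSI driver operator only on tech-preview clusters."""
    return feature_set == TECH_PREVIEW

# Assumption: a default cluster reports an empty feature set, so the operator stays off.
print(should_start_azure_file_operator(""))
# A cluster with the tech-preview feature set enabled would start the operator.
print(should_start_azure_file_operator(TECH_PREVIEW))
```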

Comment 6 To Hung Sze 2021-10-23 14:29:35 UTC
I am able to get an ASH cluster installed successfully with 4.10.0-0.nightly-2021-10-22-061826.

Comment 9 errata-xmlrpc 2022-03-10 16:20:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056