Bug 2069075 - [Alibaba 4.11.0-0.nightly] cluster storage component in Progressing state
Summary: [Alibaba 4.11.0-0.nightly] cluster storage component in Progressing state
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.11
Hardware: All
OS: All
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.11.0
Assignee: Jan Safranek
QA Contact: Rohit Patil
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-03-28 08:18 UTC by Rohit Patil
Modified: 2022-08-10 11:02 UTC
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-10 11:02:19 UTC
Target Upstream Version:
Embargoed:


Links
System                   ID                                           Private   Priority   Status   Summary                                      Last Updated
Github                   openshift/alibaba-cloud-csi-driver pull 11   0         None       open     Bug 2069075: Add explicit pciutils package   2022-03-28 09:18:17 UTC
Red Hat Product Errata   RHSA-2022:5069                               0         None       None     None                                         2022-08-10 11:02:47 UTC

Description Rohit Patil 2022-03-28 08:18:44 UTC
Description of problem:
On an Alibaba Cloud cluster, the storage cluster operator stays in the Progressing state with the following message:
"AlibabaDiskCSIDriverOperatorCRAvailable: AlibabaCloudDriverNodeServiceControllerAvailable: Waiting for the DaemonSet to deploy the CSI Node Service"

Version-Release number of selected component (if applicable):
4.11.0-0.nightly-2022-03-27-140854

How reproducible:
Always 

Steps to Reproduce:
1. Install a cluster via the Flexy job:
   https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/89082/console
2. Check the cluster version and cluster operator status.

Actual results:
rohitpatil@ropatil-mac Downloads % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-03-27-140854   True        False         3h17m   Error while reconciling 4.11.0-0.nightly-2022-03-27-140854: the cluster operator storage has not yet successfully rolled out

rohitpatil@ropatil-mac Downloads % oc get co storage
NAME      VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
storage   4.11.0-0.nightly-2022-03-27-140854   False       True          False      8m17s   AlibabaDiskCSIDriverOperatorCRAvailable: AlibabaCloudDriverNodeServiceControllerAvailable: Waiting for the DaemonSet to deploy the CSI Node Service

# Pods in openshift-cluster-csi-drivers: all node pods are in CrashLoopBackOff
NAME                                                 READY   STATUS             RESTARTS         AGE
alibaba-disk-csi-driver-controller-8db6d86ff-bk8bv   10/10   Running            0                3h42m
alibaba-disk-csi-driver-controller-8db6d86ff-zfmhr   10/10   Running            0                3h41m
alibaba-disk-csi-driver-node-glmb4                   2/3     CrashLoopBackOff   46 (5m4s ago)    3h36m
alibaba-disk-csi-driver-node-hq57z                   2/3     CrashLoopBackOff   48 (69s ago)     3h41m
alibaba-disk-csi-driver-node-ncgrf                   2/3     CrashLoopBackOff   48 (84s ago)     3h41m
alibaba-disk-csi-driver-node-rg7q8                   2/3     CrashLoopBackOff   47 (52s ago)     3h37m
alibaba-disk-csi-driver-node-t47c8                   2/3     CrashLoopBackOff   48 (36s ago)     3h41m
alibaba-disk-csi-driver-node-xbcs2                   2/3     CrashLoopBackOff   46 (2m17s ago)   3h33m
alibaba-disk-csi-driver-operator-67d49bd48c-tlr94    1/1     Running            0                3h42m
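In a longer listing, the affected pods can be picked out by filtering the oc get pods table for pods whose READY count is below their container total or whose STATUS is not Running. A minimal sketch, assuming a POSIX shell with awk and the default oc get pods column order (NAME, READY, STATUS, ...); the function name not_ready_pods is hypothetical:

```shell
# Hypothetical helper: reads `oc get pods` table output on stdin and prints
# NAME, READY and STATUS for pods that are not fully ready or not Running.
not_ready_pods() {
    awk 'NR > 1 {                       # skip the header row
        split($2, r, "/")               # READY column, e.g. "2/3"
        if (r[1] != r[2] || $3 != "Running")
            print $1, $2, $3
    }'
}
```

Usage: oc get pods -n openshift-cluster-csi-drivers | not_ready_pods. Pods in Completed status (for example from Jobs) are also listed, since the filter only accepts Running.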

# CSI driver container logs from a crashing node pod:
W0328 08:12:03.133236  301365 client_config.go:552] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
time="2022-03-28T08:12:03Z" level=info msg="Not found configmap named as csi-plugin under kube-system, with: configmaps \"csi-plugin\" is forbidden: User \"system:serviceaccount:openshift-cluster-csi-drivers:alibaba-disk-csi-driver-node-sa\" cannot get resource \"configmaps\" in API group \"\" in the namespace \"kube-system\""
time="2022-03-28T08:12:03Z" level=info msg="AD-Controller is enabled by Env(true), CSI Disk Plugin running in AD Controller mode."
time="2022-03-28T08:12:03Z" level=info msg="AD-Controller is enabled, CSI Disk Plugin running in AD Controller mode."
time="2022-03-28T08:12:03Z" level=error msg="Describe node wduan-0328a-al-pd7bp-master-1 with error: nodes \"wduan-0328a-al-pd7bp-master-1\" is forbidden: User \"system:serviceaccount:openshift-cluster-csi-drivers:alibaba-disk-csi-driver-node-sa\" cannot get resource \"nodes\" in API group \"\" at the cluster scope"
time="2022-03-28T08:12:03Z" level=info msg="Starting with GlobalConfigVar: region(us-east-1), NodeID(i-0xihk9cwhslvbcw1qi2o), ADControllerEnable(true), DiskTagEnable(false), DiskBdfEnable(false), MetricEnable(true), RunTimeClass(runc), DetachDisabled(false), DetachBeforeDelete(true), ClusterID()"
time="2022-03-28T08:12:03Z" level=info msg="NewNodeServer: MAX_VOLUMES_PERNODE is set to(not default): 15"
W0328 08:12:03.151204  301365 client_config.go:552] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
time="2022-03-28T08:12:03Z" level=fatal msg="[IsVFNode] lspci -D: cmd: lspci, stdout: , stderr: , err: exec: \"lspci\": executable file not found in $PATH"

Expected results:
The storage cluster operator should not remain in the Progressing state.

Comment 1 Jan Safranek 2022-03-28 09:10:33 UTC
Someone has removed pciutils from the 4.11 base image, so lspci is no longer available in the node plugin container.
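The fatal "[IsVFNode] lspci -D" log line in the description shows the node plugin running lspci and exiting when the binary cannot be found on $PATH. A minimal sketch of an equivalent pre-flight check, for example to run in a container started from the node plugin image, assuming the image provides a POSIX shell; check_lspci is a hypothetical helper, not part of alibaba-cloud-csi-driver:

```shell
#!/bin/sh
# Hypothetical pre-flight check; not part of alibaba-cloud-csi-driver.
# Mirrors the condition behind the fatal "[IsVFNode] lspci -D" log line:
# the driver executes lspci and fails if the binary is missing from $PATH.
check_lspci() {
    if command -v lspci >/dev/null 2>&1; then
        echo "lspci found at $(command -v lspci)"
        return 0
    fi
    echo "lspci not found in \$PATH; is the pciutils package installed?" >&2
    return 1
}
```

With pciutils installed, the function prints the resolved path and returns 0; otherwise it prints the hint to stderr and returns 1.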

Comment 3 Rohit Patil 2022-03-29 06:08:50 UTC
Waiting for an Accepted/Successful nightly build.

Comment 5 Rohit Patil 2022-03-31 05:50:04 UTC
Verified.
 Status: PASS
 Payload: 4.11.0-0.nightly-2022-03-29-152521

 Executed the dynamic provisioning test case with a Block volume mode PVC.

Output:
rohitpatil@ropatil-mac Downloads % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-03-29-152521   True        False         16m     Cluster version is 4.11.0-0.nightly-2022-03-29-152521

rohitpatil@ropatil-mac Downloads % oc get co storage
NAME      VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
storage   4.11.0-0.nightly-2022-03-29-152521   True        False         False      33m  

# All CSI driver pods running
NAME                                                  READY   STATUS    RESTARTS   AGE
alibaba-disk-csi-driver-controller-645fd59565-2vr69   10/10   Running   0          54m
alibaba-disk-csi-driver-controller-645fd59565-pph8f   10/10   Running   0          54m
alibaba-disk-csi-driver-node-2hv95                    3/3     Running   0          46m
alibaba-disk-csi-driver-node-7gqf5                    3/3     Running   0          45m
alibaba-disk-csi-driver-node-b5qn7                    3/3     Running   0          54m
alibaba-disk-csi-driver-node-mwwnm                    3/3     Running   0          54m
alibaba-disk-csi-driver-node-r5x9z                    3/3     Running   0          49m
alibaba-disk-csi-driver-node-vl8g7                    3/3     Running   0          54m
alibaba-disk-csi-driver-operator-7b8dd46454-jbghx     1/1     Running   0          54m

rohitpatil@ropatil-mac shell_block % oc get pvc,pod -n testdisk -o wide
NAME                              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE     VOLUMEMODE
persistentvolumeclaim/block-pvc   Bound    pvc-a5a3d9c8-437d-45cf-8c3f-53d25d6ae614   20Gi       RWO            csi-disk       3m51s   Block

NAME                             READY   STATUS    RESTARTS   AGE     IP            NODE                                              NOMINATED NODE   READINESS GATES
pod/mydep-csi-659ff8cf7c-69s8t   1/1     Running   0          3m50s   10.129.2.14   ropatil-313alinew-m6947-worker-us-east-1b-88nr9   <none>           <none>
rohitpatil@ropatil-mac shell_block % oc get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                STORAGECLASS   REASON   AGE
pvc-a5a3d9c8-437d-45cf-8c3f-53d25d6ae614   20Gi       RWO            Delete           Bound    testdisk/block-pvc   csi-disk                3m52s

Comment 8 errata-xmlrpc 2022-08-10 11:02:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069
