Bug 1912894 - OCS storagecluster in Progressing state and some NooBaa pods missing with latest 4.7 build 4.7.0-223.ci; storagecluster version reported as 4.8.0 instead of 4.7.0
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: Multi-Cloud Object Gateway
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: OCS 4.7.0
Assignee: Nimrod Becker
QA Contact: Neha Berry
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-01-05 14:30 UTC by Neha Berry
Modified: 2021-09-29 07:29 UTC
CC: 6 users

Fixed In Version: 4.7.0-228.ci
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-19 09:17:18 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2021:2041 0 None None None 2021-05-19 09:17:45 UTC

Description Neha Berry 2021-01-05 14:30:14 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
------------------------------------------------------------------------
OCS storagecluster is in Progressing state and a few of the NooBaa pods are missing in the latest OCS 4.7 build. The same issue was observed in both Internal (build email, ocs-ci) and Internal-attached (manually deployed on AWS I3 via UI) deployments.



$ oc get csv -A
NAMESPACE                              NAME                                           DISPLAY                       VERSION                 REPLACES   PHASE
openshift-storage                      ocs-operator.v4.7.0-223.ci                     OpenShift Container Storage   4.7.0-223.ci                       Succeeded

$ oc get storagecluster -A
NAMESPACE           NAME                 AGE     PHASE         EXTERNAL   CREATED AT             VERSION
openshift-storage   ocs-storagecluster   6h28m   Progressing              2021-01-05T07:49:08Z   4.8.0

The noobaa-db and noobaa-endpoint pods are missing:

$ oc get pods -o wide -A|grep noobaa
openshift-storage                                  noobaa-core-0                                                         1/1     Running     0          6h29m   10.129.2.36    ip-10-0-222-187.us-east-2.compute.internal   <none>           <none>
openshift-storage                                  noobaa-operator-9f697d45c-9l8cq                                       1/1     Running     0          6h44m   10.129.2.24    ip-10-0-222-187.us-east-2.compute.internal   <none>           <none>
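For comparison against a healthy deployment, the presence check can be done mechanically; a minimal sketch, where the expected pod-name prefixes are taken from the healthy 4.7.0-222.ci listing in the additional info below, and the helper function itself is purely illustrative (not ocs-ci code):

```shell
# check_noobaa_pods: read a pod listing on stdin and report which of the
# expected NooBaa pod-name prefixes are absent. Prefixes match a healthy
# OCS 4.7 run; this helper is an illustrative sketch only.
check_noobaa_pods() {
  pods=$(cat)
  missing=""
  for p in noobaa-core noobaa-db noobaa-endpoint noobaa-operator; do
    printf '%s\n' "$pods" | grep -q "^$p" || missing="$missing $p"
  done
  echo "missing:$missing"
}
```

Usage against a live cluster would look like `oc get pods -n openshift-storage -o name | sed 's|^pod/||' | check_noobaa_pods`; on the broken cluster above it would flag noobaa-db and noobaa-endpoint.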




Version of all relevant components (if applicable):
===================================================

OCP = 4.7.0-0.nightly-2021-01-04-215816
OCS = ocs-operator.v4.7.0-223.ci

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
------------------------------------------------------------------
Yes, the deployment of OCS is failing.

Is there any workaround available to the best of your knowledge?
----------------------------------------------
Not sure

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
------------------------------------------
3


Is this issue reproducible?
------------------------------
Yes

Can this issue be reproduced from the UI?
-----------------------------------
Yes. We tested both ocs-ci based deployments [1] and a manual deployment on AWS I3 via the UI.


If this is a regression, please provide more details to justify this:
====================================================================
Yes

Steps to Reproduce:
======================
1. Install OCP 4.7
2. Deploy OCS build 4.7.0-223 either via ocs-ci or manually from the UI


Actual results:
---------------------
Storagecluster is stuck in Progressing state and some NooBaa pods (noobaa-db and noobaa-endpoint) are not created.

Expected results:
=====================
Storagecluster creation should succeed.
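One way automation can detect this failure mode is to poll the StorageCluster phase with a timeout instead of assuming success; a minimal sketch, where the phase-reporting command (for example an `oc get ... -o jsonpath='{.status.phase}'` call) is passed in as an argument and all names are illustrative, not part of any deployment tooling:

```shell
# wait_for_phase WANT TRIES CMD...: run CMD (which prints the current
# phase) up to TRIES times, one second apart, until it prints WANT.
# Returns 0 on success, 1 on timeout. Illustrative sketch only.
wait_for_phase() {
  want=$1; shift
  tries=$1; shift
  i=0
  while [ "$i" -lt "$tries" ]; do
    phase=$("$@")
    [ "$phase" = "$want" ] && return 0
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

Against a live cluster this might be invoked as `wait_for_phase Ready 600 oc get storagecluster ocs-storagecluster -n openshift-storage -o 'jsonpath={.status.phase}'`; on the build above it would time out with the cluster still in Progressing.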


build emails and run details
================================

Email - A new OCS 4.7 build is available: ocs-registry:4.7.0-223.ci
[1] - OCS CI run: https://storage-jenkins-csb-ceph.cloud.paas.psi.redhat.com/job/ocs-ci/188/

Manual install on AWS I3
[2] - LSO from UI: https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/16013/



Additional info:
======================
Outputs from LSO cluster

 $ oc describe storagecluster ocs-storagecluster -n openshift-storage
Status:
  Conditions:
    Last Heartbeat Time:   2021-01-05T14:18:34Z
    Last Transition Time:  2021-01-05T07:49:10Z
    Message:               Reconcile completed successfully
    Reason:                ReconcileCompleted
    Status:                True
    Type:                  ReconcileComplete
    Last Heartbeat Time:   2021-01-05T07:49:16Z
    Last Transition Time:  2021-01-05T07:49:11Z
    Message:               CephCluster resource is not reporting status
    Reason:                CephClusterStatus
    Status:                False
    Type:                  Available
    Last Heartbeat Time:   2021-01-05T14:18:34Z
    Last Transition Time:  2021-01-05T07:49:11Z
    Message:               Waiting on Nooba instance to finish initialization
    Reason:                NoobaaInitializing
    Status:                True
    Type:                  Progressing
    Last Heartbeat Time:   2021-01-05T07:49:10Z
    Last Transition Time:  2021-01-05T07:49:08Z
    Message:               Reconcile completed successfully
    Reason:                ReconcileCompleted
    Status:                False
    Type:                  Degraded
    Last Heartbeat Time:   2021-01-05T07:50:40Z
    Last Transition Time:  2021-01-05T07:49:11Z
    Message:               CephCluster is creating: Cluster is creating
    Reason:                ClusterStateCreating
    Status:                False
    Type:                  Upgradeable
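The condition dump above can be summarized mechanically, for example by pulling out the Reason attached to the Progressing condition; a minimal awk sketch over the `oc describe` output, which relies on Reason appearing before Type within each condition block (as in the dump above) and is illustrative only:

```shell
# progressing_reason: read the Conditions section of
# `oc describe storagecluster` on stdin and print the Reason of the
# Progressing condition. Assumes Reason precedes Type in each block,
# as in the dump above; illustrative sketch only.
progressing_reason() {
  awk '/Reason:/ {r = $2} /Type: *Progressing/ {print r}'
}
```

Piping the dump above through this would print `NoobaaInitializing`, matching the "Waiting on Nooba instance to finish initialization" message.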


Pod list from a previous build (4.7.0-222.ci) where the db and endpoint pods were created
===========================================================================

Run ID = https://storage-jenkins-csb-ceph.cloud.paas.psi.redhat.com/job/ocs-ci/187/


NAME                         DISPLAY                       VERSION        REPLACES   PHASE
ocs-operator.v4.7.0-222.ci   OpenShift Container Storage   4.7.0-222.ci              Succeeded


NAME                                                              READY   STATUS      RESTARTS   AGE   IP             NODE                                         NOMINATED NODE   READINESS GATES
csi-cephfsplugin-kkrmk                                            3/3     Running     0          37m   10.0.142.40    ip-10-0-142-40.us-west-1.compute.internal    <none>           <none>
csi-cephfsplugin-lnh56                                            3/3     Running     0          37m   10.0.166.255   ip-10-0-166-255.us-west-1.compute.internal   <none>           <none>
csi-cephfsplugin-provisioner-5c5b96fb84-ttbbp                     6/6     Running     0          37m   10.128.2.17    ip-10-0-234-224.us-west-1.compute.internal   <none>           <none>
csi-cephfsplugin-provisioner-5c5b96fb84-xgblr                     6/6     Running     0          37m   10.131.0.33    ip-10-0-166-255.us-west-1.compute.internal   <none>           <none>
csi-cephfsplugin-wmvz4                                            3/3     Running     0          37m   10.0.234.224   ip-10-0-234-224.us-west-1.compute.internal   <none>           <none>
csi-rbdplugin-2fj66                                               3/3     Running     0          37m   10.0.142.40    ip-10-0-142-40.us-west-1.compute.internal    <none>           <none>
csi-rbdplugin-p48bq                                               3/3     Running     0          37m   10.0.166.255   ip-10-0-166-255.us-west-1.compute.internal   <none>           <none>
csi-rbdplugin-pkp2l                                               3/3     Running     0          37m   10.0.234.224   ip-10-0-234-224.us-west-1.compute.internal   <none>           <none>
csi-rbdplugin-provisioner-55c8b8c747-5kq9d                        6/6     Running     0          37m   10.128.2.16    ip-10-0-234-224.us-west-1.compute.internal   <none>           <none>
csi-rbdplugin-provisioner-55c8b8c747-m5klz                        6/6     Running     0          37m   10.129.2.14    ip-10-0-142-40.us-west-1.compute.internal    <none>           <none>
noobaa-core-0                                                     1/1     Running     0          34m   10.131.0.36    ip-10-0-166-255.us-west-1.compute.internal   <none>           <none>
noobaa-db-pg-0                                                    1/1     Running     0          34m   10.129.2.20    ip-10-0-142-40.us-west-1.compute.internal    <none>           <none>
noobaa-endpoint-5f4665cb7c-nldm2                                  1/1     Running     0          32m   10.129.2.23    ip-10-0-142-40.us-west-1.compute.internal    <none>           <none>
noobaa-endpoint-5f4665cb7c-tktnw                                  1/1     Running     0          16m   10.128.2.26    ip-10-0-234-224.us-west-1.compute.internal   <none>           <none>
noobaa-operator-f7cf9598d-j74p8                                   1/1     Running     0          38m   10.128.2.14    ip-10-0-234-224.us-west-1.compute.internal   <none>           <none>
ocs-metrics-exporter-8cd4d6857-m29kc                              1/1     Running     0          38m   10.131.0.31    ip-10-0-166-255.us-west-1.compute.internal   <none>           <none>
ocs-operator-869fc6777c-4b4mj                                     1/1     Running     0          37m   10.131.0.32    ip-10-0-166-255.us-west-1.compute.internal   <none>           <none>
rook-ceph-crashcollector-ip-10-0-142-40-65c46c9f49-tjlhj          1/1     Running     0          35m   10.129.2.17    ip-10-0-142-40.us-west-1.compute.internal    <none>           <none>
rook-ceph-crashcollector-ip-10-0-166-255-75c48f9cfc-br2hr         1/1     Running     0          36m   10.131.0.37    ip-10-0-166-255.us-west-1.compute.internal   <none>           <none>
rook-ceph-crashcollector-ip-10-0-234-224-694b5b7dc5-nhtjw         1/1     Running     0          35m   10.128.2.19    ip-10-0-234-224.us-west-1.compute.internal   <none>           <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-5b944987wtkq2   1/1     Running     0          34m   10.128.2.22    ip-10-0-234-224.us-west-1.compute.internal   <none>           <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-5b6c5955zr8rr   1/1     Running     0          34m   10.131.0.38    ip-10-0-166-255.us-west-1.compute.internal   <none>           <none>
rook-ceph-mgr-a-5668bd7756-g6dpc                                  1/1     Running     0          35m   10.129.2.16    ip-10-0-142-40.us-west-1.compute.internal    <none>           <none>
rook-ceph-mon-a-6cfdf59957-9l7dv                                  1/1     Running     0          36m   10.131.0.34    ip-10-0-166-255.us-west-1.compute.internal   <none>           <none>
rook-ceph-mon-b-5bbf66dc8-gsxqx                                   1/1     Running     0          35m   10.129.2.15    ip-10-0-142-40.us-west-1.compute.internal    <none>           <none>
rook-ceph-mon-c-85c47768ff-65lpd                                  1/1     Running     0          35m   10.128.2.18    ip-10-0-234-224.us-west-1.compute.internal   <none>           <none>
rook-ceph-operator-5bd49bb764-2xvbp                               1/1     Running     0          38m   10.129.2.13    ip-10-0-142-40.us-west-1.compute.internal    <none>           <none>
rook-ceph-osd-0-58d796f464-458f7                                  1/1     Running     0          34m   10.131.0.39    ip-10-0-166-255.us-west-1.compute.internal   <none>           <none>
rook-ceph-osd-1-5df77df6cf-5wt4f                                  1/1     Running     0          34m   10.128.2.21    ip-10-0-234-224.us-west-1.compute.internal   <none>           <none>
rook-ceph-osd-2-848b5d8858-ck7k9                                  1/1     Running     0          34m   10.129.2.19    ip-10-0-142-40.us-west-1.compute.internal    <none>           <none>
rook-ceph-osd-prepare-ocs-deviceset-0-data-0-ll2l8-7md26          0/1     Completed   0          35m   10.129.2.18    ip-10-0-142-40.us-west-1.compute.internal    <none>           <none>
rook-ceph-osd-prepare-ocs-deviceset-1-data-0-62mjc-4x9ws          0/1     Completed   0          35m   10.131.0.35    ip-10-0-166-255.us-west-1.compute.internal   <none>           <none>
rook-ceph-osd-prepare-ocs-deviceset-2-data-0-pzwqz-k6zsr          0/1     Completed   0          34m   10.128.2.20    ip-10-0-234-224.us-west-1.compute.internal   <none>           <none>
rook-ceph-tools-8575486ffd-p77ks                                  1/1     Running     0          33m   10.0.142.40    ip-10-0-142-40.us-west-1.compute.internal    <none>           <none>

Comment 4 Neha Berry 2021-01-05 14:43:28 UTC
@

Comment 7 Mudit Agarwal 2021-01-05 15:43:27 UTC
Umanga, can you please check https://bugzilla.redhat.com/show_bug.cgi?id=1912894#c5

Comment 14 errata-xmlrpc 2021-05-19 09:17:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2041

