Bug 2179618

Summary: Noobaa deployment stuck during ODF upgrade
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Soumi Mitra <smitra>
Component: odf-operator
Assignee: Nitin Goyal <nigoyal>
Status: CLOSED NOTABUG
QA Contact: Elad <ebenahar>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 4.9
CC: hnallurv, muagarwa, nbecker, ocs-bugs, odf-bz-bot, rafrojas
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-03-30 05:19:10 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Soumi Mitra 2023-03-19 08:16:32 UTC
Description of problem (please be as detailed as possible and provide log snippets):

1] The issue occurred during an ODF upgrade.
2] NooBaa core and db are reporting incorrect images, leaving the StorageCluster stuck in the Progressing state.
3] The steps to rebuild NooBaa from the article https://access.redhat.com/solutions/5948631 were run with no success.
4] The actual and desired images of noobaaCore and noobaaDB differ (see the status snippet and the retrieval sketch below):

   images:
      ceph:
        actualImage: registry.redhat.io/rhceph/rhceph-5-rhel8@sha256:2296c19fbd3a0be84d6030dff789ce3e79b38cc30c39f45913aec97967b65cce
        desiredImage: registry.redhat.io/rhceph/rhceph-5-rhel8@sha256:2296c19fbd3a0be84d6030dff789ce3e79b38cc30c39f45913aec97967b65cce
      noobaaCore:
        actualImage: registry.redhat.io/odf4/mcg-core-rhel8@sha256:fbb89796c5adfee97bf64d29f72ee40857d91f8c065c5b9b96bff40dbb4931aa
        desiredImage: registry.redhat.io/odf4/mcg-core-rhel8@sha256:04209248c71e5176e631644b47d37a8e28724961ae948af5ea239db66c356ebb
      noobaaDB:
        actualImage: registry.redhat.io/rhel8/postgresql-12@sha256:81d9bf20387ecfa85bf24dd53167242393075544a4368636a4bbda79d8769f49
        desiredImage: registry.redhat.io/rhel8/postgresql-12@sha256:78ed1e1f454c49664ae653b3d52af3d77ef1e9cad37a7b0fff09feeaa8294e01
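For reference, the image status above can be read straight out of the StorageCluster CR. A minimal sketch, assuming the CR name and namespace shown in the outputs below (the jsonpath queries are an assumption, not commands taken from this report):

$ oc -n openshift-storage get storagecluster ocs-storagecluster \
    -o jsonpath='{.status.images.noobaaCore.actualImage}{"\n"}{.status.images.noobaaCore.desiredImage}{"\n"}'
$ oc -n openshift-storage get storagecluster ocs-storagecluster \
    -o jsonpath='{.status.images.noobaaDB.actualImage}{"\n"}{.status.images.noobaaDB.desiredImage}{"\n"}'

When the two lines of a pair differ, the operator still considers that component's rollout incomplete, which matches the Progressing phase shown below.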


Logs:


[sarreddy@scdvlmcd13 ~]$ oc get deployments -n openshift-storage
NAME                                                            READY   UP-TO-DATE   AVAILABLE   AGE
csi-cephfsplugin-provisioner                                    2/2     2            2           2y15d
csi-rbdplugin-provisioner                                       2/2     2            2           2y15d
noobaa-operator                                                 1/1     1            1           230d
ocs-metrics-exporter                                            1/1     1            1           2y16d
ocs-operator                                                    1/1     1            1           2y16d
odf-console                                                     1/1     1            1           230d
odf-operator-controller-manager                                 1/1     1            1           230d
rook-ceph-crashcollector-infra-0.jawwy-ocs-sit-dev.stc.com.sa   1/1     1            1           33h
rook-ceph-crashcollector-infra-1.jawwy-ocs-sit-dev.stc.com.sa   1/1     1            1           33h
rook-ceph-crashcollector-infra-2.jawwy-ocs-sit-dev.stc.com.sa   1/1     1            1           34h
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a               1/1     1            1           2y15d
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b               1/1     1            1           2y15d
rook-ceph-mgr-a                                                 1/1     1            1           2y15d
rook-ceph-mon-e                                                 1/1     1            1           697d
rook-ceph-mon-f                                                 1/1     1            1           324d
rook-ceph-mon-g                                                 1/1     1            1           232d
rook-ceph-operator                                              1/1     1            1           2y16d
rook-ceph-osd-0                                                 1/1     1            1           2y15d
rook-ceph-osd-1                                                 1/1     1            1           2y15d
rook-ceph-osd-2                                                 1/1     1            1           2y15d
rook-ceph-osd-3                                                 1/1     1            1           2y15d
rook-ceph-osd-4                                                 1/1     1            1           2y15d
rook-ceph-osd-5                                                 1/1     1            1           2y15d
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a              1/1     1            1           2y15d
rook-ceph-tools                                                 1/1     1            1           2y15d
[sarreddy@scdvlmcd13 ~]$

[sarreddy@scdvlmcd13 ~]$ oc get storagecluster
NAME                 AGE     PHASE         EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   2y15d   Progressing              2021-03-02T16:54:04Z   4.6.0
[sarreddy@scdvlmcd13 ~]$
[sarreddy@scdvlmcd13 ~]$
[sarreddy@scdvlmcd13 ~]$
[sarreddy@scdvlmcd13 ~]$ oc get pods
NAME                                                              READY   STATUS    RESTARTS   AGE
csi-cephfsplugin-89jf9                                            3/3     Running   0          3h53m
csi-cephfsplugin-9dc8x                                            3/3     Running   0          3h55m
csi-cephfsplugin-dhm8q                                            3/3     Running   0          3h52m
csi-cephfsplugin-ffsbf                                            3/3     Running   0          3h56m
csi-cephfsplugin-m2jbx                                            3/3     Running   0          3h55m
csi-cephfsplugin-n2l7b                                            3/3     Running   0          3h55m
csi-cephfsplugin-provisioner-d8c5c5bc4-l6bpc                      6/6     Running   0          3h58m
csi-cephfsplugin-provisioner-d8c5c5bc4-llrtk                      6/6     Running   0          3h58m
csi-cephfsplugin-wmpt9                                            3/3     Running   0          3h58m
csi-cephfsplugin-xzsfc                                            3/3     Running   0          3h55m
csi-rbdplugin-bqs8j                                               3/3     Running   0          3h53m
csi-rbdplugin-hv99f                                               3/3     Running   0          3h54m
csi-rbdplugin-lh886                                               3/3     Running   0          3h53m
csi-rbdplugin-mmf56                                               3/3     Running   0          3h56m
csi-rbdplugin-p5lrf                                               3/3     Running   0          3h55m
csi-rbdplugin-provisioner-676c856bd4-c646z                        6/6     Running   0          3h58m
csi-rbdplugin-provisioner-676c856bd4-ftsff                        6/6     Running   0          3h58m
csi-rbdplugin-r2crc                                               3/3     Running   0          3h53m
csi-rbdplugin-v5x5s                                               3/3     Running   0          3h58m
csi-rbdplugin-vkbtk                                               3/3     Running   0          3h55m
noobaa-endpoint-fb4b66487-f8kzj                                   1/1     Running   0          33h
noobaa-operator-6977f79d8d-hm4qt                                  1/1     Running   0          160m
ocs-metrics-exporter-7fbbc99c96-fxgkh                             1/1     Running   0          4h
ocs-operator-54f644cfbf-mrxxj                                     1/1     Running   0          161m
odf-console-f54f898df-gj5l9                                       1/1     Running   0          27h
odf-operator-controller-manager-5c565c5b58-4cdxl                  2/2     Running   0          161m
rook-ceph-crashcollector-infra-0.jawwy-ocs-sit-dev.stc.com9zzmj   1/1     Running   0          33h
rook-ceph-crashcollector-infra-0.jawwy-ocs-sit-dev.stc.comjgrhv   1/1     Running   0          33h
rook-ceph-crashcollector-infra-0.jawwy-ocs-sit-dev.stc.comm5g5c   1/1     Running   0          3h58m
rook-ceph-crashcollector-infra-0.jawwy-ocs-sit-dev.stc.comx8xqq   1/1     Running   0          33h
rook-ceph-crashcollector-infra-1.jawwy-ocs-sit-dev.stc.com2fm2q   1/1     Running   0          33h
rook-ceph-crashcollector-infra-1.jawwy-ocs-sit-dev.stc.com42hfm   1/1     Running   0          33h
rook-ceph-crashcollector-infra-1.jawwy-ocs-sit-dev.stc.comcvmdr   1/1     Running   0          3h58m
rook-ceph-crashcollector-infra-1.jawwy-ocs-sit-dev.stc.comqzd88   1/1     Running   0          33h
rook-ceph-crashcollector-infra-1.jawwy-ocs-sit-dev.stc.comw8t8m   1/1     Running   0          33h
rook-ceph-crashcollector-infra-2.jawwy-ocs-sit-dev.stc.comcb7s9   1/1     Running   0          36h
rook-ceph-crashcollector-infra-2.jawwy-ocs-sit-dev.stc.comdggrj   1/1     Running   0          36h
rook-ceph-crashcollector-infra-2.jawwy-ocs-sit-dev.stc.comz5lhn   1/1     Running   0          3h58m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-57f867c7nxjcq   2/2     Running   0          33h
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-5c996df48kg96   2/2     Running   0          33h
rook-ceph-mgr-a-5946474f67-xjvxw                                  2/2     Running   0          33h
rook-ceph-mon-e-6989f7c778-mtcrz                                  2/2     Running   0          33h
rook-ceph-mon-f-65db4c474b-vmrmp                                  2/2     Running   0          33h
rook-ceph-mon-g-7d44688797-4x7hm                                  2/2     Running   0          36h
rook-ceph-operator-7df8bd997b-zhhsj                               1/1     Running   0          4h
rook-ceph-osd-0-66557bf8f4-s5c68                                  2/2     Running   0          36h
rook-ceph-osd-1-64fb9599d7-c5gs5                                  2/2     Running   0          33h
rook-ceph-osd-2-5f6cdb546f-jbbpz                                  2/2     Running   0          33h
rook-ceph-osd-3-74567c8494-dldwr                                  2/2     Running   0          34h
rook-ceph-osd-4-8cbf76749-529jx                                   2/2     Running   0          33h
rook-ceph-osd-5-b55b6d4b-vssjc                                    2/2     Running   0          33h
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-566fff4sk5z9   2/2     Running   0          33h
rook-ceph-tools-8579d5599c-hpbmr                                  1/1     Running   0          3h59m
[sarreddy@scdvlmcd13 ~]$
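Note that no noobaa-core or noobaa-db pods appear in the listing above; those components run as StatefulSets rather than Deployments. A quick check along these lines would confirm whether they exist at all (commands assumed; noobaa-core and noobaa-db-pg are the usual StatefulSet names, not confirmed in this report):

$ oc get statefulsets -n openshift-storage
$ oc describe statefulset noobaa-core -n openshift-storage    # events here often show why pods were never created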



[sarreddy@scdvlmcd13 ~]$ oc logs noobaa-operator-6977f79d8d-wqfm5
time="2023-03-19T03:39:40Z" level=info msg="CLI version: 5.9.2\n"
time="2023-03-19T03:39:40Z" level=info msg="noobaa-image: noobaa/noobaa-core:5.9.0\n"
time="2023-03-19T03:39:40Z" level=info msg="operator-image: noobaa/noobaa-operator:5.9.2\n"
I0319 03:39:41.268357       1 request.go:668] Waited for 1.032685445s due to client-side throttling, not priority and fairness, request: GET:https://10.29.128.1:443/apis/policy/v1?timeout=32s
[sarreddy@scdvlmcd13 ~]$


[sarreddy@scdvlmcd13 ~]$ oc logs ocs-operator-54f644cfbf-tmtkf
{"level":"info","ts":1679197183.7147424,"logger":"cmd","msg":"Go Version: go1.16.12"}
{"level":"info","ts":1679197183.7156096,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
I0319 03:39:44.766062       1 request.go:668] Waited for 1.037647659s due to client-side throttling, not priority and fairness, request: GET:https://10.29.128.1:443/apis/cert-manager.io/v1alpha2?timeout=32s
{"level":"info","ts":1679197187.7708323,"logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":":8080"}
{"level":"info","ts":1679197187.7849748,"logger":"cmd","msg":"OCSInitialization resource already exists"}
{"level":"info","ts":1679197191.84642,"logger":"cmd","msg":"starting manager"}
I0319 03:39:51.846930       1 leaderelection.go:243] attempting to acquire leader lease openshift-storage/ab76f4c9.openshift.io...
{"level":"info","ts":1679197191.846939,"logger":"controller-runtime.manager","msg":"starting metrics server","path":"/metrics"}
I0319 03:40:09.270292       1 leaderelection.go:253] successfully acquired lease openshift-storage/ab76f4c9.openshift.io
{"level":"info","ts":1679197209.2705457,"logger":"controller-runtime.manager.controller.persistentvolume","msg":"Starting EventSource","reconciler group":"","reconciler kind":"PersistentVolume","source":"kind source: /, Kind="}
{"level":"info","ts":1679197209.270645,"logger":"controller-runtime.manager.controller.persistentvolume","msg":"Starting Controller","reconciler group":"","reconciler kind":"PersistentVolume"}
{"level":"info","ts":1679197209.2706227,"logger":"controller-runtime.manager.controller.ocsinitialization","msg":"Starting EventSource","reconciler group":"ocs.openshift.io","reconciler kind":"OCSInitialization","source":"kind source: /, Kind="}


[sarreddy@scdvlmcd13 ~]$ oc get events -n openshift-storage
LAST SEEN   TYPE      REASON               OBJECT                                                                MESSAGE
128m        Normal    LeaderElection       configmap/4fd470de.openshift.io                                       odf-operator-controller-manager-5c565c5b58-z7ch7_8a94ef37-495c-4478-a027-994b6b934bb7 became leader
128m        Normal    LeaderElection       lease/4fd470de.openshift.io                                           odf-operator-controller-manager-5c565c5b58-z7ch7_8a94ef37-495c-4478-a027-994b6b934bb7 became leader
128m        Normal    LeaderElection       configmap/ab76f4c9.openshift.io                                       ocs-operator-54f644cfbf-tmtkf_205bfda0-7165-47de-9fa4-b5ffa962c192 became leader
128m        Normal    LeaderElection       lease/ab76f4c9.openshift.io                                           ocs-operator-54f644cfbf-tmtkf_205bfda0-7165-47de-9fa4-b5ffa962c192 became leader
128m        Normal    InstallSucceeded     clusterserviceversion/mcg-operator.v4.9.11                            install strategy completed with no errors
128m        Warning   ComponentUnhealthy   clusterserviceversion/mcg-operator.v4.9.11                            installing: waiting for deployment noobaa-operator to become ready: deployment "noobaa-operator" not available: Deployment does not have minimum availability.
2m43s       Warning   FailedGetScale       horizontalpodautoscaler/noobaa-endpoint                               deployments/scale.apps "noobaa-endpoint" not found
131m        Normal    Killing              pod/noobaa-operator-6977f79d8d-hpmqz                                  Stopping container noobaa-operator
128m        Normal    Scheduled            pod/noobaa-operator-6977f79d8d-wqfm5                                  Successfully assigned openshift-storage/noobaa-operator-6977f79d8d-wqfm5 to worker-4.jawwy-ocs-sit-dev.stc.com.sa
128m        Normal    AddedInterface       pod/noobaa-operator-6977f79d8d-wqfm5                                  Add eth0 [10.29.19.29/23] from openshift-sdn
128m        Normal    Pulled               pod/noobaa-operator-6977f79d8d-wqfm5                                  Container image "registry.redhat.io/odf4/mcg-rhel8-operator@sha256:376cdd149b926d12575caa90f391f1f5d221eee545a79ffedafca2d4fa296d00" already present on machine
128m        Normal    Created              pod/noobaa-operator-6977f79d8d-wqfm5                                  Created container noobaa-operator
128m        Normal    Started              pod/noobaa-operator-6977f79d8d-wqfm5                                  Started container noobaa-operator
131m        Normal    SuccessfulDelete     replicaset/noobaa-operator-6977f79d8d                                 Deleted pod: noobaa-operator-6977f79d8d-hpmqz
128m        Normal    SuccessfulCreate     replicaset/noobaa-operator-6977f79d8d                                 Created pod: noobaa-operator-6977f79d8d-wqfm5
128m        Normal    ScalingReplicaSet    deployment/noobaa-operator                                            Scaled up replica set noobaa-operator-6977f79d8d to 1
131m        Normal    ScalingReplicaSet    deployment/noobaa-operator                                            Scaled down replica set noobaa-operator-6977f79d8d to 0
131m        Normal    Killing              pod/ocs-operator-54f644cfbf-9qpqs                                     Stopping container ocs-operator
128m        Normal    Scheduled            pod/ocs-operator-54f644cfbf-tmtkf                                     Successfully assigned openshift-storage/ocs-operator-54f644cfbf-tmtkf to worker-4.jawwy-ocs-sit-dev.stc.com.sa
128m        Normal    AddedInterface       pod/ocs-operator-54f644cfbf-tmtkf                                     Add eth0 [10.29.19.28/23] from openshift-sdn
128m        Normal    Pulling              pod/ocs-operator-54f644cfbf-tmtkf                                     Pulling image "registry.redhat.io/odf4/ocs-rhel8-operator@sha256:0b3bb17855057eab6612881b4dccabf472d64cfb5ec8b3db7a799c5ce60e9f29"
128m        Normal    Pulled               pod/ocs-operator-54f644cfbf-tmtkf                                     Successfully pulled image "registry.redhat.io/odf4/ocs-rhel8-operator@sha256:0b3bb17855057eab6612881b4dccabf472d64cfb5ec8b3db7a799c5ce60e9f29" in 4.207312021s
128m        Normal    Created              pod/ocs-operator-54f644cfbf-tmtkf                                     Created container ocs-operator
128m        Normal    Started              pod/ocs-operator-54f644cfbf-tmtkf                                     Started container ocs-operator
131m        Normal    SuccessfulDelete     replicaset/ocs-operator-54f644cfbf                                    Deleted pod: ocs-operator-54f644cfbf-9qpqs
128m        Normal    SuccessfulCreate     replicaset/ocs-operator-54f644cfbf                                    Created pod: ocs-operator-54f644cfbf-tmtkf
128m        Normal    ScalingReplicaSet    deployment/ocs-operator                                               Scaled up replica set ocs-operator-54f644cfbf to 1
131m        Normal    ScalingReplicaSet    deployment/ocs-operator                                               Scaled down replica set ocs-operator-54f644cfbf to 0
128m        Warning   ComponentUnhealthy   clusterserviceversion/ocs-operator.v4.9.11                            installing: waiting for deployment ocs-operator to become ready: deployment "ocs-operator" not available: Deployment does not have minimum availability.
128m        Normal    NeedsReinstall       clusterserviceversion/ocs-operator.v4.9.11                            installing: waiting for deployment ocs-operator to become ready: deployment "ocs-operator" not available: Deployment does not have minimum availability.
7m48s       Warning   ReconcileFailed      cephcluster/ocs-storagecluster-cephcluster                            failed to reconcile cluster "ocs-storagecluster-cephcluster": failed to configure local ceph cluster: failed the ceph version check: failed to complete ceph version job: failed to run CmdReporter rook-ceph-detect-version successfully. failed to delete existing results ConfigMap rook-ceph-detect-version. failed to delete ConfigMap rook-ceph-detect-version. gave up waiting after 20 retries every 2ns seconds. <nil>
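The ReconcileFailed event above shows the ceph version check blocked on deleting the rook-ceph-detect-version ConfigMap. A minimal triage sketch, assuming the usual causes (a leftover ConfigMap, or its owning Job, stuck in deletion); these commands are illustrative, not steps recorded in this bug:

$ oc -n openshift-storage get configmap rook-ceph-detect-version -o yaml    # check for finalizers / a pending deletionTimestamp
$ oc -n openshift-storage get jobs | grep detect-version                    # the version check runs as a Rook CmdReporter Job
$ oc -n openshift-storage delete configmap rook-ceph-detect-version         # only if it is safe to let Rook retry the check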



Version of all relevant components (if applicable):
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.53   True        False         44h     Cluster version is 4.10.53


NAME                            DISPLAY                            VERSION   REPLACES                        PHASE
elasticsearch-operator.5.5.8    OpenShift Elasticsearch Operator   5.5.8     elasticsearch-operator.5.2.13   Succeeded
mcg-operator.v4.9.11            NooBaa Operator                    4.9.11    mcg-operator.v4.9.10            Succeeded
nginx-ingress-operator.v1.1.0   Nginx Ingress Operator             1.1.0                                     Installing
ocs-operator.v4.9.11            OpenShift Container Storage        4.9.11    ocs-operator.v4.9.10            Succeeded
odf-operator.v4.9.11            OpenShift Data Foundation          4.9.11    odf-operator.v4.9.10            Succeeded


Does this issue impact your ability to continue to work with the product?
Yes, the customer is unable to use their cluster or start applications.


Is there any workaround available to the best of your knowledge?
The workaround to rebuild NooBaa (the article linked above) was run with no success; a post-attempt sanity check is sketched below.
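A quick way to confirm the NooBaa system state after such a rebuild attempt (commands assumed, not taken from this report; the NooBaa CR is conventionally named noobaa):

$ oc -n openshift-storage get noobaa
$ oc -n openshift-storage describe noobaa noobaa | grep -i -A 3 phase    # expect Ready once the rebuild has converged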

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug:
4


Is this issue reproducible?
N/A


Can this issue be reproduced from the UI?
No

If this is a regression, please provide more details to justify this:
N/A

Steps to Reproduce:
N/A


Actual results:
The NooBaa deployment is stuck during the ODF upgrade.


Expected results:
The NooBaa deployment should complete successfully as part of the ODF upgrade.

Additional info:

Comment 7 Nitin Goyal 2023-03-30 05:19:10 UTC
Closing it as per comment 6