Bug 1818147
Summary: CSV ocs-operator.v4.3.0-379.ci is stuck in Installing phase after upgrade from 4.2.2 live content

Product: [Red Hat Storage] Red Hat OpenShift Container Storage
Component: Multi-Cloud Object Gateway
Version: 4.3
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Reporter: Petr Balogh <pbalogh>
Assignee: Danny <dzaken>
QA Contact: Petr Balogh <pbalogh>
CC: dzaken, etamir, jarrpa, madam, nbecker, ocs-bugs, owasserm, shmohan, sostapov, tnielsen
Keywords: Automation, Regression, Upgrades
Target Release: OCS 4.3.0
Fixed In Version: 4.3.0-rc5
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2020-04-14 09:48:30 UTC
Description

Petr Balogh, 2020-03-27 20:19:56 UTC

Created attachment 1674175 [details]
Logs from the operators and a few other files about the cluster: some YAML files I collected from my queries about the CSV, subscription, logs, and so on.
I don't see what would be stuck in Installing status. The upgrade seems to be complete from the OCS and Rook perspective. @Jose, do you see what status is causing this issue?

In the StorageCluster status, the reconcile indicates it is completed:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jnk-ai3c33-ua/jnk-ai3c33-ua_20200327T132049/logs/failed_testcase_ocs_logs_1585323953/test_upgrade_ocs_logs/ocs_must_gather/quay-io-rhceph-dev-ocs-must-gather-sha256-4af88c1c34068413c3e838c4e1156937a44c0b53c6fca26a85b08c3f4280d368/storagecluster.yaml

  - lastHeartbeatTime: "2020-03-27T17:13:09Z"
    lastTransitionTime: "2020-03-27T14:16:20Z"
    message: Reconcile completed successfully
    reason: ReconcileCompleted
    status: "True"
    type: ReconcileComplete

The Rook operator was upgraded and completed the upgrade of all the Ceph components. The status on the CephCluster shows Completed:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jnk-ai3c33-ua/jnk-ai3c33-ua_20200327T132049/logs/failed_testcase_ocs_logs_1585323953/test_upgrade_ocs_logs/ocs_must_gather/quay-io-rhceph-dev-ocs-must-gather-sha256-4af88c1c34068413c3e838c4e1156937a44c0b53c6fca26a85b08c3f4280d368/ceph/namespaces/openshift-storage/ceph.rook.io/cephclusters/ocs-storagecluster-cephcluster.yaml

I would look at the olm-operator logs. We should be able to get those from a standard OCP must-gather.
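The condition check described above can also be run as a one-liner against a live cluster. A minimal sketch follows; `oc` is stubbed with the reason value quoted in this bug so the snippet is self-contained, and the real command (which assumes the default `ocs-storagecluster` resource name) is shown in the comment:

```shell
# Stub `oc` with the condition reason quoted in this bug so the snippet
# runs without a cluster; the real invocation is in the comment below.
oc() { echo "ReconcileCompleted"; }

# On a live cluster this would be (assumption: default resource name):
#   oc get storagecluster ocs-storagecluster -n openshift-storage \
#     -o jsonpath='{.status.conditions[?(@.type=="ReconcileComplete")].reason}'
reason=$(oc get storagecluster ocs-storagecluster -n openshift-storage \
  -o jsonpath='{.status.conditions[?(@.type=="ReconcileComplete")].reason}')

if [ "$reason" = "ReconcileCompleted" ]; then
  echo "StorageCluster reconcile completed"
fi
```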
Initial analysis from Jrivera, captured from gchat
=============================

Looking at the storagecluster.yaml:

  - lastHeartbeatTime: "2020-03-27T17:13:09Z"
    lastTransitionTime: "2020-03-27T16:23:20Z"
    message: Waiting on Nooba instance to finish initialization
    reason: NoobaaInitializing
    status: "True"
    type: Progressing

From noobaa.yaml:

  conditions:
  - lastHeartbeatTime: "2020-03-27T14:19:42Z"
    lastTransitionTime: "2020-03-27T16:24:24Z"
    message: Cannot read property 'email' of undefined
    reason: TemporaryError
    status: "False"
    type: Available
  - lastHeartbeatTime: "2020-03-27T14:19:42Z"
    lastTransitionTime: "2020-03-27T16:24:24Z"
    message: Cannot read property 'email' of undefined
    reason: TemporaryError
    status: "True"
    type: Progressing
  - lastHeartbeatTime: "2020-03-27T14:19:42Z"
    lastTransitionTime: "2020-03-27T14:19:42Z"
    message: Cannot read property 'email' of undefined
    reason: TemporaryError
    status: "False"
    type: Degraded
  - lastHeartbeatTime: "2020-03-27T14:19:42Z"
    lastTransitionTime: "2020-03-27T16:24:24Z"
    message: Cannot read property 'email' of undefined
    reason: TemporaryError
    status: "False"
    type: Upgradeable
  observedGeneration: 3
  phase: Configuring

From the noobaa-operator logs:

  time="2020-03-27T17:13:45Z" level=info msg="✈️ RPC: system.read_system() Request: <nil>"
  time="2020-03-27T17:13:45Z" level=error msg="⚠️ RPC: system.read_system() Response Error: Code=INTERNAL Message=Cannot read property 'email' of undefined"
  time="2020-03-27T17:13:45Z" level=error msg="failed to read system info: Cannot read property 'email' of undefined" sys=openshift-storage/noobaa
  time="2020-03-27T17:13:45Z" level=info msg="SetPhase: temporary error during phase \"Configuring\"" sys=openshift-storage/noobaa
  time="2020-03-27T17:13:45Z" level=warning msg="⏳ Temporary Error: Cannot read property 'email' of undefined" sys=openshift-storage/noobaa

In this run it somehow went fine:
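The triage above walks from the StorageCluster condition down to the noobaa CR message. When working from a saved must-gather rather than a live cluster, the blocking message can be pulled out of the dumped YAML directly. A sketch, using a sample file that mirrors the Available condition quoted in this bug (the file path and extraction pattern are illustrative, not from the bug):

```shell
# Sample mirroring the noobaa CR status conditions quoted in this bug
# (assumption: the must-gather noobaa.yaml uses this field layout).
cat > /tmp/noobaa-status.yaml <<'EOF'
conditions:
- lastTransitionTime: "2020-03-27T16:24:24Z"
  message: Cannot read property 'email' of undefined
  reason: TemporaryError
  status: "False"
  type: Available
EOF

# Pull the first condition message out of the saved status:
msg=$(sed -n 's/^  message: //p' /tmp/noobaa-status.yaml | head -n1)
echo "blocking error: $msg"
```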
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jnk-ai3c33-ua/jnk-ai3c33-ua_20200327T200206/logs/failed_testcase_ocs_logs_1585342169/test_upgrade_ocs_logs/

Linking the must-gather so that we can work out the diff between this and the earlier run.

Output from the run https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/6069/, courtesy of @pbalogh:

$ oc get pod -n openshift-storage
NAME  READY  STATUS  RESTARTS  AGE
csi-cephfsplugin-2gkwd  3/3  Running  0  40m
csi-cephfsplugin-8z54x  3/3  Running  0  40m
csi-cephfsplugin-provisioner-65b59d9dc9-7l2r7  5/5  Running  0  40m
csi-cephfsplugin-provisioner-65b59d9dc9-87qcm  5/5  Running  0  40m
csi-cephfsplugin-x5wpx  3/3  Running  0  40m
csi-rbdplugin-62bbp  3/3  Running  0  40m
csi-rbdplugin-9slh5  3/3  Running  0  40m
csi-rbdplugin-provisioner-86c8bc888d-8kqmk  5/5  Running  0  40m
csi-rbdplugin-provisioner-86c8bc888d-p58hj  5/5  Running  0  41m
csi-rbdplugin-tqkbz  3/3  Running  0  40m
lib-bucket-provisioner-55f74d96f6-8ll4m  1/1  Running  0  86m
noobaa-core-0  1/1  Running  0  40m
noobaa-db-0  1/1  Running  0  40m
noobaa-endpoint-64666986b5-r25bb  1/1  Running  0  39m
noobaa-operator-b77ccff86-6gcck  1/1  Running  0  41m
ocs-operator-6dd9fd9d8d-ttzp7  0/1  Running  0  41m
pod-test-cephfs-418a20805bb742b9af40192feb660a30  1/1  Running  0  78m
rook-ceph-crashcollector-ip-10-0-140-120-7bd8c65c8d-n8bgx  1/1  Running  0  35m
rook-ceph-crashcollector-ip-10-0-149-101-74c96b48f4-mtvfl  1/1  Running  0  35m
rook-ceph-crashcollector-ip-10-0-165-13-6cc484f7c6-sd8x2  1/1  Running  0  35m
rook-ceph-drain-canary-060f6bb0e5cb84732c4e59ceb119fdf4-76w7kw5  1/1  Running  0  34m
rook-ceph-drain-canary-8ecf54a1e4f30f0a8d3f1911a74f1b81-57lb2vl  1/1  Running  0  36m
rook-ceph-drain-canary-c0722e044791df50d01ec179fd6b83d7-68gqnn8  1/1  Running  0  36m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-7d5d9c4bn74cg  1/1  Running  0  34m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-6c4cc794v7br6  1/1  Running  0  33m
rook-ceph-mgr-a-66645fdd4b-zrbb5  1/1  Running  0  37m
rook-ceph-mon-a-64c7885cf5-8lhrh  1/1  Running  0  40m
rook-ceph-mon-b-59f987689d-bvzlc  1/1  Running  0  38m
rook-ceph-mon-c-69747d9497-bglzl  1/1  Running  0  37m
rook-ceph-operator-599dbd974f-k7s75  1/1  Running  0  41m
rook-ceph-osd-0-65b4bf455b-m9cll  1/1  Running  0  36m
rook-ceph-osd-1-6bd897d47c-p5jh7  1/1  Running  0  34m
rook-ceph-osd-2-58dccc9d46-7lcxf  1/1  Running  0  36m
rook-ceph-osd-prepare-ocs-deviceset-0-0-gsd7j-85qq7  0/1  Completed  0  82m
rook-ceph-osd-prepare-ocs-deviceset-1-0-9w86m-xrrtz  0/1  Completed  0  82m
rook-ceph-osd-prepare-ocs-deviceset-2-0-gg9sv-qlgv8  0/1  Completed  0  82m
rook-ceph-tools-fc566f885-t6d2p  1/1  Running  0  81m

$ oc get csv -n openshift-storage
NAME  DISPLAY  VERSION  REPLACES  PHASE
lib-bucket-provisioner.v1.0.0  lib-bucket-provisioner  1.0.0    Succeeded
ocs-operator.v4.3.0-379.ci  OpenShift Container Storage  4.3.0-379.ci  ocs-operator.v4.2.2  Installing

Some time later the same query showed the CSV had reached Succeeded:

$ oc get csv -n openshift-storage
NAME  DISPLAY  VERSION  REPLACES  PHASE
lib-bucket-provisioner.v1.0.0  lib-bucket-provisioner  1.0.0    Succeeded
ocs-operator.v4.3.0-379.ci  OpenShift Container Storage  4.3.0-379.ci  ocs-operator.v4.2.2  Succeeded

$ oc get pod -n openshift-storage
NAME  READY  STATUS  RESTARTS  AGE
csi-cephfsplugin-2gkwd  3/3  Running  0  128m
csi-cephfsplugin-8z54x  3/3  Running  0  129m
csi-cephfsplugin-provisioner-65b59d9dc9-7l2r7  5/5  Running  0  129m
csi-cephfsplugin-provisioner-65b59d9dc9-87qcm  5/5  Running  0  129m
csi-cephfsplugin-x5wpx  3/3  Running  0  129m
csi-rbdplugin-62bbp  3/3  Running  0  129m
csi-rbdplugin-9slh5  3/3  Running  0  129m
csi-rbdplugin-provisioner-86c8bc888d-8kqmk  5/5  Running  0  129m
csi-rbdplugin-provisioner-86c8bc888d-p58hj  5/5  Running  0  129m
csi-rbdplugin-tqkbz  3/3  Running  0  129m
lib-bucket-provisioner-55f74d96f6-8ll4m  1/1  Running  0  175m
noobaa-core-0  1/1  Running  0  129m
noobaa-db-0  1/1  Running  0  129m
noobaa-endpoint-64666986b5-r25bb  1/1  Running  0  128m
noobaa-operator-b77ccff86-6gcck  1/1  Running  0  130m
ocs-operator-6dd9fd9d8d-ttzp7  1/1  Running  0  130m
rook-ceph-crashcollector-ip-10-0-140-120-7bd8c65c8d-n8bgx  1/1  Running  0  124m
rook-ceph-crashcollector-ip-10-0-149-101-74c96b48f4-mtvfl  1/1  Running  0  124m
rook-ceph-crashcollector-ip-10-0-165-13-6cc484f7c6-sd8x2  1/1  Running  0  124m
rook-ceph-drain-canary-060f6bb0e5cb84732c4e59ceb119fdf4-76w7kw5  1/1  Running  0  123m
rook-ceph-drain-canary-8ecf54a1e4f30f0a8d3f1911a74f1b81-57lb2vl  1/1  Running  0  125m
rook-ceph-drain-canary-c0722e044791df50d01ec179fd6b83d7-68gqnn8  1/1  Running  0  125m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-7d5d9c4bn74cg  1/1  Running  0  123m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-6c4cc794v7br6  1/1  Running  0  122m
rook-ceph-mgr-a-66645fdd4b-zrbb5  1/1  Running  0  126m
rook-ceph-mon-a-64c7885cf5-8lhrh  1/1  Running  0  129m
rook-ceph-mon-b-59f987689d-bvzlc  1/1  Running  0  127m
rook-ceph-mon-c-69747d9497-bglzl  1/1  Running  0  126m
rook-ceph-operator-599dbd974f-k7s75  1/1  Running  0  130m
rook-ceph-osd-0-65b4bf455b-m9cll  1/1  Running  0  125m
rook-ceph-osd-1-6bd897d47c-p5jh7  1/1  Running  0  123m
rook-ceph-osd-2-58dccc9d46-7lcxf  1/1  Running  0  125m
rook-ceph-osd-prepare-ocs-deviceset-0-0-gsd7j-85qq7  0/1  Completed  0  171m
rook-ceph-osd-prepare-ocs-deviceset-1-0-9w86m-xrrtz  0/1  Completed  0  171m
rook-ceph-osd-prepare-ocs-deviceset-2-0-gg9sv-qlgv8  0/1  Completed  0  171m
rook-ceph-tools-fc566f885-t6d2p  1/1  Running  0  170m

I found the issue in one of noobaa's upgrade scripts. Working on a fix.

(In reply to Danny from comment #8)
> I found the issue in one of noobaa's upgrade scripts. working on a fix.

The existence of a patch seems to imply that we should ACK it for 4.3? Adding an ack.

Running verification job here:
https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/6328/console

Verified - all upgrade tests passed!
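The `oc get csv` output above shows the CSV moving from Installing to Succeeded once the noobaa issue cleared. A small sketch of the phase check that distinguishes in-flight from terminal CSV phases, using the standard OLM phase names (the `oc` command in the comment is how the phase would be fetched on a live cluster):

```shell
# In-flight vs terminal CSV phases, per the standard OLM phase names.
csv_phase_pending() {
  case "$1" in
    Pending|InstallReady|Installing|Replacing) return 0 ;;  # still progressing
    *) return 1 ;;                                          # Succeeded/Failed/etc.
  esac
}

# On a live cluster the phase would come from something like:
#   oc get csv ocs-operator.v4.3.0-379.ci -n openshift-storage \
#     -o jsonpath='{.status.phase}'
csv_phase_pending "Installing" && echo "upgrade still in flight"
csv_phase_pending "Succeeded" || echo "upgrade complete"
```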
https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/6328/testReport/

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1437