Bug 2209846
| Summary: | ODF Operator stuck in 'Unknown Failure' | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Anjali <amenon> |
| Component: | odf-operator | Assignee: | Nitin Goyal <nigoyal> |
| Status: | CLOSED NOTABUG | QA Contact: | Elad <ebenahar> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.10 | CC: | hnallurv, mparida, muagarwa, nigoyal, ocs-bugs, odf-bz-bot |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-07-18 06:25:01 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |

I am still looking into it. As far as I understand, the customer was facing issues upgrading other operators as well as ODF. After some manual fixes, the other operators could be upgraded, but ODF is still not upgrading. So prima facie this looks like an OCP/OLM issue.

Another thing that caught my attention, although it might not be related to the problem, is the high number of restarts of the noobaa-operator pod:

    noobaa-operator-56457bf44b-jj66q 1/1 Running 220 (32h ago) 208d 172.31.12.117 worker4.ocp.rosat.ro <none> <none>

Hello Anjali, can I please get the latest ODF must-gather?

Description of problem (please be as detailed as possible and provide log snippets):

- The customer (Cu) is trying to upgrade the ODF operators from v4.10.7 to v4.10.8. Initially the upgrade was getting stuck with the error "Bundle unpacking failed. Reason: DeadlineExceeded, and Message: Job was active longer than specified deadline".
- All pods in the openshift-storage namespace are up and running.
- The Ceph cluster is healthy:

      [amenon@supportshell-1 must_gather_commands]$ cat ceph_status
      cluster:
        id: c374b71c-19a4-45e9-bc6d-fb3f90d1b0dd
        health: HEALTH_OK
      services:
        mon: 3 daemons, quorum a,b,c (age 6M)
        mgr: a(active, since 7M)
        mds: 1/1 daemons up, 1 hot standby
        osd: 3 osds: 3 up (since 6M), 3 in (since 7M)
        rgw: 1 daemon active (1 hosts, 1 zones)
      data:
        volumes: 1/1 healthy
        pools: 11 pools, 177 pgs
        objects: 9.04k objects, 14 GiB
        usage: 45 GiB used, 3.0 TiB / 3 TiB avail
        pgs: 177 active+clean
      io:
        client: 852 B/s rd, 32 KiB/s wr, 1 op/s rd, 3 op/s wr

- We applied solutions https://access.redhat.com/solutions/6459071 and https://access.redhat.com/solutions/6972585, but they didn't help.
- Then, with the help of SBR-Shift, we followed Steps 1 and 2 from the "For issues related to operator upgrade" section in https://access.redhat.com/solutions/6459071 and then deleted the relevant InstallPlan:

      oc get ip/install-hj4ns -n openshift-storage

  | NAME | CSV | APPROVAL | APPROVED |
  |---|---|---|---|
  | install-hj4ns | [mcg-operator.v4.10.8, ocs-operator.v4.10.8, odf-csi-addons-operator.v4.10.8, odf-operator.v4.10.8] | Automatic | true |

- After this, all OCP operators can be upgraded, but the ODF Operator is stuck in 'Unknown Failure' (attaching screenshot) and its version is still 4.10.7.
- There are no errors or messages in the ODF must-gather (m-g) regarding upgrading to ODF 4.10.8.

The upgrade fails to start, with the UI showing "Upgrade status" as "Unknown failure".

    [amenon@supportshell-1 oc_output]$ cat csv

| NAME | DISPLAY | VERSION | REPLACES | PHASE |
|---|---|---|---|---|
| container-security-operator.v3.8.7 | Red Hat Quay Container Security Operator | 3.8.7 | container-security-operator.v3.8.6 | Succeeded |
| mcg-operator.v4.10.7 | NooBaa Operator | 4.10.7 | mcg-operator.v4.10.6 | Succeeded |
| ocs-operator.v4.10.7 | OpenShift Container Storage | 4.10.7 | ocs-operator.v4.10.6 | Succeeded |
| odf-csi-addons-operator.v4.10.7 | CSI Addons | 4.10.7 | odf-csi-addons-operator.v4.10.6 | Succeeded |
| odf-operator.v4.10.7 | OpenShift Data Foundation | 4.10.7 | odf-operator.v4.10.6 | Succeeded |
| red-hat-camel-k-operator.v1.10.0-0.1682325781.p | Red Hat Integration - Camel K | 1.10.0+0.1682325781.p | red-hat-camel-k-operator.v1.10.0-0.1679561624.p | Succeeded |

    [amenon@supportshell-1 oc_output]$ cat installplan

| NAME | CSV | APPROVAL | APPROVED |
|---|---|---|---|
| install-2xmbm | odf-operator.v4.10.7 | Automatic | true |
| install-q8jr9 | odf-operator.v4.10.6 | Automatic | true |
| install-w5j67 | mcg-operator.v4.10.5 | Automatic | true |

    $ oc get subs -n openshift-storage

| NAME | PACKAGE | SOURCE | CHANNEL |
|---|---|---|---|
| mcg-operator-stable-4.10-redhat-operators-openshift-marketplace | mcg-operator | redhat-operators | stable-4.10 |
| ocs-operator-stable-4.10-redhat-operators-openshift-marketplace | ocs-operator | redhat-operators | stable-4.10 |
| odf-csi-addons-operator-stable-4.10-redhat-operators-openshift-marketplace | odf-csi-addons-operator | redhat-operators | stable-4.10 |

odf-operator (OpenShift Data Foundation) was unlocked but didn't update to the latest version. Its current version is 4.10.7, and the InstallPlan resource that the customer deleted was trying to upgrade it to 4.10.8.

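The listings in this report can be cross-checked against each other. The following is a minimal Python sketch, not part of the original report, that compares the installed CSV versions with the targets of the deleted InstallPlan (install-hj4ns) and notes which packages appear in the `oc get subs` listing as pasted. The names `upgrade_report` and `split_csv_name` are hypothetical helpers, the data is copied by hand from the output in this report, and the subscription list assumes the pasted listing is complete.

```python
# CSV names copied from the report's `cat csv` output (ODF-related rows only).
INSTALLED_CSVS = [
    "mcg-operator.v4.10.7",
    "ocs-operator.v4.10.7",
    "odf-csi-addons-operator.v4.10.7",
    "odf-operator.v4.10.7",
]

# CSVs targeted by the deleted InstallPlan install-hj4ns, as quoted in the report.
INSTALLPLAN_TARGETS = [
    "mcg-operator.v4.10.8",
    "ocs-operator.v4.10.8",
    "odf-csi-addons-operator.v4.10.8",
    "odf-operator.v4.10.8",
]

# PACKAGE column of `oc get subs -n openshift-storage` as pasted in the report.
# Assumption: the pasted listing is complete.
SUBSCRIBED_PACKAGES = ["mcg-operator", "ocs-operator", "odf-csi-addons-operator"]


def split_csv_name(csv_name):
    """Split 'odf-operator.v4.10.7' into ('odf-operator', '4.10.7')."""
    package, _, version = csv_name.partition(".v")
    return package, version


def upgrade_report(installed, targets, subscribed):
    """For each InstallPlan target, record installed version, target version,
    and whether a Subscription for the package appears in the listing."""
    installed_versions = dict(split_csv_name(name) for name in installed)
    report = {}
    for name in targets:
        package, target_version = split_csv_name(name)
        report[package] = {
            "installed": installed_versions.get(package),
            "target": target_version,
            "has_subscription": package in subscribed,
        }
    return report


if __name__ == "__main__":
    rows = upgrade_report(INSTALLED_CSVS, INSTALLPLAN_TARGETS, SUBSCRIBED_PACKAGES)
    for package, row in rows.items():
        print(package, row["installed"], "->", row["target"],
              "subscription listed:", row["has_subscription"])
```

With the data above, every package is still at 4.10.7 against a 4.10.8 target, and odf-operator is the only one of the four without a row in the pasted subscription listing. Whether that is significant depends on the listing being complete, which this report does not confirm.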
Version of all relevant components (if applicable):

    [amenon@supportshell-1 oc_output]$ cat csv

| NAME | DISPLAY | VERSION | REPLACES | PHASE |
|---|---|---|---|---|
| container-security-operator.v3.8.7 | Red Hat Quay Container Security Operator | 3.8.7 | container-security-operator.v3.8.6 | Succeeded |
| mcg-operator.v4.10.7 | NooBaa Operator | 4.10.7 | mcg-operator.v4.10.6 | Succeeded |
| ocs-operator.v4.10.7 | OpenShift Container Storage | 4.10.7 | ocs-operator.v4.10.6 | Succeeded |
| odf-csi-addons-operator.v4.10.7 | CSI Addons | 4.10.7 | odf-csi-addons-operator.v4.10.6 | Succeeded |
| odf-operator.v4.10.7 | OpenShift Data Foundation | 4.10.7 | odf-operator.v4.10.6 | Succeeded |
| red-hat-camel-k-operator.v1.10.0-0.1682325781.p | Red Hat Integration - Camel K | 1.10.0+0.1682325781.p | red-hat-camel-k-operator.v1.10.0-0.1679561624.p | Succeeded |

    $ oc get clusterversion

| NAME | VERSION | AVAILABLE | PROGRESSING | SINCE | STATUS |
|---|---|---|---|---|---|
| version | 4.10.28 | True | False | 206d | Cluster version is 4.10.28 |

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
No, cluster is running fine but Operators cannot be upgraded

Is there any workaround available to the best of your knowledge?
No

Actual results:
Operators are not getting upgraded to 4.10.8

Expected results:
Operators are successfully upgraded to 4.10.8

Additional info:
- All related logs/m-g available in supportshell under ~/03469870