Bug 2142461
Summary: | [MS] Rosa4.11 with RHODF addon deployer version 2.0.9 provider cluster result into prometheus component in Pending state and alertmanager pod stuck in ContainerCreating state | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | suchita <sgatfane> |
Component: | odf-managed-service | Assignee: | Rewant <resoni> |
Status: | CLOSED EOL | QA Contact: | suchita <sgatfane> |
Severity: | unspecified | Docs Contact: | |
Priority: | urgent | ||
Version: | 4.11 | CC: | aeyal, dbindra, fbalak, ikave, lgangava, nberry, odf-bz-bot, resoni, sgatfane |
Target Milestone: | --- | Keywords: | Regression, TestBlocker, Tracking |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2024-07-11 10:26:45 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
suchita
2022-11-14 05:14:25 UTC
*** Bug 2142513 has been marked as a duplicate of this bug. ***

Found a similar issue, https://bugzilla.redhat.com/show_bug.cgi?id=2073452#c23, where a pod gets stuck in the ContainerCreating state on OVN 4.11 clusters.

Copying the content from the closed duplicate bug https://bugzilla.redhat.com/show_bug.cgi?id=2142513
------------------------------------------------------------------------------------------------------------
Description of problem:
After terminating a worker node on the provider, the pod "alertmanager-managed-ocs-alertmanager-0" is stuck in a "ContainerCreating" state, and the pod "prometheus-managed-ocs-prometheus-0" is stuck in an "Init:0/1" state.

Version-Release number of selected component (if applicable):
ROSA cluster OCP4.11, ODF4.10

How reproducible:
Yes. After node termination, the pods "alertmanager-managed-ocs-alertmanager-0" and "prometheus-managed-ocs-prometheus-0" do not recover.

Is there any workaround available to the best of your knowledge?
Yes. After restarting the pods "alertmanager-managed-ocs-alertmanager-0" and "prometheus-managed-ocs-prometheus-0", they went back to a "Running" state.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:
Yes, I didn't see this issue in the previous versions.

Steps to Reproduce:
Terminate one of the worker nodes on the provider.

Actual results:
The pod "alertmanager-managed-ocs-alertmanager-0" is stuck in a "ContainerCreating" state, and/or the pod "prometheus-managed-ocs-prometheus-0" is stuck in an "Init:0/1" state.

Expected results:
All the pods should be in a Completed or Running state.
Additional info:
Jenkins job link to the provider cluster: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/17960/

Versions:

OC version:
Client Version: 4.10.24
Server Version: 4.11.12
Kubernetes Version: v1.24.6+5157800

OCS version:
ocs-operator.v4.10.5 OpenShift Container Storage 4.10.5 ocs-operator.v4.10.4 Succeeded

Cluster version:
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.11.12 True False 5h36m Error while reconciling 4.11.12: the cluster operator monitoring has not yet successfully rolled out

Rook version:
rook: v4.10.5-0.985405daeba3b29a178cb19aa864324e65548a63
go: go1.16.12

Ceph version:
ceph version 16.2.7-126.el8cp (fe0af61d104d48cb9d116cde6e593b5fc8c197e4) pacific (stable)
----------------------------------------------------------------------------------------

I was able to reproduce the bug by terminating the node on which the alertmanager pod is running. We can mark this bug as a tracker for https://issues.redhat.com/browse/OCPBUGS-681. The workaround is to restart the alertmanager pod.

The ODF Managed Service project has been sunset and is now considered obsolete.
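
For reference, the restart workaround mentioned above could be scripted roughly as follows. This is only a sketch under assumptions that the report does not confirm: "restarting" is taken to mean deleting the pod so that its controller recreates it, the helper name restart_commands is hypothetical, and the namespace is passed in by the caller because the report does not state it. The function only builds the `oc delete pod` command lines; it does not run them.

```python
# Pod states treated as healthy, per "Expected results" in this report.
HEALTHY_STATES = {"Running", "Completed"}


def restart_commands(pod_status, namespace):
    """Build one `oc delete pod` command per pod that is not healthy.

    pod_status maps pod name -> STATUS column value from `oc get pods`.
    namespace is supplied by the caller (not stated in the bug report).
    """
    return [
        ["oc", "delete", "pod", name, "-n", namespace]
        for name, status in sorted(pod_status.items())
        if status not in HEALTHY_STATES
    ]


if __name__ == "__main__":
    # Pod names and states are the ones quoted in this bug.
    observed = {
        "alertmanager-managed-ocs-alertmanager-0": "ContainerCreating",
        "prometheus-managed-ocs-prometheus-0": "Init:0/1",
    }
    # "openshift-storage" is an assumed namespace; adjust to the cluster.
    for cmd in restart_commands(observed, "openshift-storage"):
        print(" ".join(cmd))
```

The printed commands can be reviewed and then run by hand. Pods that are already Running or Completed are skipped.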