2142513 – After terminating a worker node on the provider, the pod "alertmanager-managed-ocs-alertmanager-0" is stuck in a "ContainerCreating" state


Keywords:
Status:	CLOSED DUPLICATE of bug 2142461
Alias:	None
Product:	Red Hat OpenShift Data Foundation
Classification:	Red Hat Storage
Component:	odf-managed-service
Sub Component:
Version:	4.10
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	unspecified
Target Milestone:	---
Target Release:	---
Assignee:	Ohad
QA Contact:	Neha Berry
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2022-11-14 09:58 UTC by Itzhak
Modified:	2023-08-09 17:00 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-11-15 07:58:53 UTC
Embargoed:


Description Itzhak 2022-11-14 09:58:19 UTC

Description of problem:
After terminating a worker node on the provider, the pod "alertmanager-managed-ocs-alertmanager-0" is stuck in a "ContainerCreating" state, and the pod "prometheus-managed-ocs-prometheus-0" is stuck in an "Init:0/1" state

Version-Release number of selected component (if applicable):
ROSA cluster OCP4.11, ODF4.10

How reproducible:
Yes. After a node termination, the pods "alertmanager-managed-ocs-alertmanager-0" and "prometheus-managed-ocs-prometheus-0" do not recover.

Is there any workaround available to the best of your knowledge?
Yes, after restarting the pods "alertmanager-managed-ocs-alertmanager-0" and "prometheus-managed-ocs-prometheus-0", they went back to a "Running" state.
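
The workaround above can be sketched as follows. This is only an illustration: it assumes that "restarting" means deleting the pods so that their controller recreates them, and that the pods live in the "openshift-storage" namespace, which this report does not state. The helper names are hypothetical.

```python
import subprocess

# Pod names taken from this report.
STUCK_PODS = [
    "alertmanager-managed-ocs-alertmanager-0",
    "prometheus-managed-ocs-prometheus-0",
]

# Assumption: the namespace is not given in the report.
NAMESPACE = "openshift-storage"


def build_restart_command(pods, namespace):
    """Return the `oc delete pod` argument list for the given pods."""
    return ["oc", "delete", "pod", *pods, "-n", namespace]


def restart_pods(pods=STUCK_PODS, namespace=NAMESPACE):
    """Delete the pods; their controller is expected to recreate them."""
    # check=True raises CalledProcessError if `oc` exits with a non-zero code.
    subprocess.run(build_restart_command(pods, namespace), check=True)
```

Calling restart_pods() on a machine that is logged in to the provider cluster with `oc` would issue a single `oc delete pod` for both pods.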


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)? 1

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:
Yes, I didn't see this issue in the previous versions

Steps to Reproduce:
Terminate one of the worker nodes on the provider. 

Actual results:
The pod "alertmanager-managed-ocs-alertmanager-0" is stuck in a "ContainerCreating" state, and/or the pod "prometheus-managed-ocs-prometheus-0" is stuck in an "Init:0/1" state.


Expected results:
All the pods should be in a Completed or Running state.
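
A minimal sketch of how the expected result could be checked, assuming the usual column layout of `oc get pods` output (NAME, READY, STATUS, RESTARTS, AGE). The function name and the sample input are hypothetical; only the pod names and the two stuck states come from this report.

```python
def find_unhealthy_pods(get_pods_output):
    """Return (name, status) pairs for pods not in Running or Completed."""
    healthy = {"Running", "Completed"}
    unhealthy = []
    lines = get_pods_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue  # ignore blank or malformed lines
        name, status = fields[0], fields[2]
        if status not in healthy:
            unhealthy.append((name, status))
    return unhealthy


# Illustrative input only: READY, RESTARTS and AGE values are made up,
# and "example-running-pod" is not a pod from this cluster.
SAMPLE = """\
NAME                                      READY   STATUS              RESTARTS   AGE
alertmanager-managed-ocs-alertmanager-0   0/2     ContainerCreating   0          10m
prometheus-managed-ocs-prometheus-0       0/3     Init:0/1            0          10m
example-running-pod                       1/1     Running             0          5h
"""

for name, status in find_unhealthy_pods(SAMPLE):
    print(f"{name}: {status}")
```

An empty result would correspond to the expected state, where every pod is either Running or Completed.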

Additional info:

Jenkins job link to the provider cluster: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/17960/

Versions:

OC version:
Client Version: 4.10.24
Server Version: 4.11.12
Kubernetes Version: v1.24.6+5157800

OCS version:
ocs-operator.v4.10.5                      OpenShift Container Storage   4.10.5            ocs-operator.v4.10.4                      Succeeded

Cluster version:
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.12   True        False         5h36m   Error while reconciling 4.11.12: the cluster operator monitoring has not yet successfully rolled out

Rook version:
rook: v4.10.5-0.985405daeba3b29a178cb19aa864324e65548a63
go: go1.16.12

Ceph version:
ceph version 16.2.7-126.el8cp (fe0af61d104d48cb9d116cde6e593b5fc8c197e4) pacific (stable)

Comment 2 suchita 2022-11-15 07:58:53 UTC

Closing this bug as a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=2142461

*** This bug has been marked as a duplicate of bug 2142461 ***
