Created attachment 1737973 [details]
cluster-monitoring-operator describe

Version:
$ ./openshift-baremetal-install version
./openshift-baremetal-install 4.6.0-0.nightly-2020-12-08-021151
built from commit f5ba6239853f0904704c04d8b1c04c78172f1141
release image registry.svc.ci.openshift.org/ocp/release@sha256:bd84091070e50e41cd30bcda6c6bd2b821ad48a0ee9aa7637165db31e7ad51dd

Platform:
IPI Baremetal

What happened?
After the deploy finished and was reported successful, "oc get clusterversion" returns an error:

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-12-08-021151   True        False         43m     Error while reconciling 4.6.0-0.nightly-2020-12-08-021151: the workload openshift-monitoring/cluster-monitoring-operator has not yet successfully rolled out

The cluster-monitoring-operator pod is stuck in CreateContainerConfigError status (the kube-rbac-proxy container failed to start - see the attached cluster-monitoring-operator describe):

$ oc get pods -n openshift-monitoring
NAME                                           READY   STATUS                       RESTARTS   AGE
cluster-monitoring-operator-866c9df665-tpm9m   1/2     CreateContainerConfigError   0          100m

The pod events report:
"Error: container has runAsNonRoot and image has non-numeric user (nobody), cannot verify user is non-root"

All operators are reported as Available:

$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.6.0-0.nightly-2020-12-08-021151   True        False         False      61m
cloud-credential                           4.6.0-0.nightly-2020-12-08-021151   True        False         False      108m
cluster-autoscaler                         4.6.0-0.nightly-2020-12-08-021151   True        False         False      95m
config-operator                            4.6.0-0.nightly-2020-12-08-021151   True        False         False      95m
console                                    4.6.0-0.nightly-2020-12-08-021151   True        False         False      65m
csi-snapshot-controller                    4.6.0-0.nightly-2020-12-08-021151   True        False         False      95m
dns                                        4.6.0-0.nightly-2020-12-08-021151   True        False         False      94m
etcd                                       4.6.0-0.nightly-2020-12-08-021151   True        False         False      93m
image-registry                             4.6.0-0.nightly-2020-12-08-021151   True        False         False      58m
ingress                                    4.6.0-0.nightly-2020-12-08-021151   True        False         False      70m
insights                                   4.6.0-0.nightly-2020-12-08-021151   True        False         False      95m
kube-apiserver                             4.6.0-0.nightly-2020-12-08-021151   True        False         False      92m
kube-controller-manager                    4.6.0-0.nightly-2020-12-08-021151   True        False         False      92m
kube-scheduler                             4.6.0-0.nightly-2020-12-08-021151   True        False         False      92m
kube-storage-version-migrator              4.6.0-0.nightly-2020-12-08-021151   True        False         False      70m
machine-api                                4.6.0-0.nightly-2020-12-08-021151   True        False         False      79m
machine-approver                           4.6.0-0.nightly-2020-12-08-021151   True        False         False      94m
machine-config                             4.6.0-0.nightly-2020-12-08-021151   True        False         False      94m
marketplace                                4.6.0-0.nightly-2020-12-08-021151   True        False         False      93m
monitoring                                 4.6.0-0.nightly-2020-12-08-021151   True        False         False      70m
network                                    4.6.0-0.nightly-2020-12-08-021151   True        False         False      95m
node-tuning                                4.6.0-0.nightly-2020-12-08-021151   True        False         False      95m
openshift-apiserver                        4.6.0-0.nightly-2020-12-08-021151   True        False         False      74m
openshift-controller-manager               4.6.0-0.nightly-2020-12-08-021151   True        False         False      92m
openshift-samples                          4.6.0-0.nightly-2020-12-08-021151   True        False         False      56m
operator-lifecycle-manager                 4.6.0-0.nightly-2020-12-08-021151   True        False         False      94m
operator-lifecycle-manager-catalog         4.6.0-0.nightly-2020-12-08-021151   True        False         False      94m
operator-lifecycle-manager-packageserver   4.6.0-0.nightly-2020-12-08-021151   True        False         False      73m
service-ca                                 4.6.0-0.nightly-2020-12-08-021151   True        False         False      95m
storage                                    4.6.0-0.nightly-2020-12-08-021151   True        False         False      95m

What did you expect to happen?
After the deploy, all pods should be in Running/Completed state.

How to reproduce it (as minimally and precisely as possible)?
1. Deploy OCP 4.6 - disconnected, baremetal network IPv4, provisioning network IPv6
2. oc get clusterversion
3. oc get pods -A | grep -vE "Run|Comp"

Anything else we need to know?
1. It happened in 3 deploys out of 4. In the one deploy that did not hit this problem, the pod was reported as restarted twice, but was in Running state by the end of the deployment.
2. While running must-gather there were errors (attached).
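For reference, the kubelet refuses to start a container when runAsNonRoot is true, the image declares a symbolic (non-numeric) user such as "nobody", and no numeric runAsUser is provided, because it cannot prove that user is non-root. Below is a minimal sketch of a securityContext that reproduces the same error message; the pod and image names are illustrative and not taken from the cluster-monitoring-operator deployment.

apiVersion: v1
kind: Pod
metadata:
  name: runasnonroot-demo            # illustrative name
spec:
  containers:
  - name: proxy
    # Illustrative image whose Dockerfile ends with "USER nobody"
    # (a symbolic, non-numeric user).
    image: example.com/demo/nonroot-image:latest
    securityContext:
      runAsNonRoot: true
      # With no numeric runAsUser set here, the kubelet cannot verify that
      # the image's symbolic user is non-root, and the pod fails with
      # CreateContainerConfigError ("container has runAsNonRoot and image
      # has non-numeric user (nobody), cannot verify user is non-root").
      # Adding an explicit numeric UID, e.g. "runAsUser: 65534", lets the
      # check pass.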
Created attachment 1737975 [details] errors reported by must-gather
must-gather http://rhos-compute-node-10.lab.eng.rdu2.redhat.com/logs/BZ1906130-must-gather.tar.gz
Thanks for the report! If an issue is not clearly specific to baremetal, reports should generally go against the failing operator. This looks like a duplicate of BZ1904538, which the monitoring team has fixed.

*** This bug has been marked as a duplicate of bug 1904538 ***