Bug 1950789 - Deployment using virtualmedia broken - Cluster operator monitoring is not available
Keywords:
Status: CLOSED DUPLICATE of bug 1948311
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Beth White
QA Contact: Amit Ugol
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-04-18 18:47 UTC by Lubov
Modified: 2021-04-20 16:20 UTC
CC List: 1 user

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-04-20 16:20:46 UTC
Target Upstream Version:
Embargoed:


Attachments
openshift_install.log (139.08 KB, text/plain)
2021-04-18 18:47 UTC, Lubov

Description Lubov 2021-04-18 18:47:44 UTC
Created attachment 1773106 [details]
openshift_install.log

Version:

$ ./openshift-baremetal-install version
./openshift-baremetal-install 4.8.0-0.nightly-2021-04-18-101412
built from commit 907ba997eebc2a5795763d8496e36df7d1fdc51f
release image registry.ci.openshift.org/ocp/release@sha256:ec1ce643e584a273039e121ec27d23c8fc94bcf36904b0a8ab25286b49512f42

Platform:
IPI 

What happened?
Deployment using redfish-virtualmedia with the provisioning network disabled fails with the error:
level=fatal msg="failed to initialize the cluster: Cluster operator monitoring is not available"

Reproduced twice

$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.8.0-0.nightly-2021-04-18-101412   True        False         False      30m
baremetal                                  4.8.0-0.nightly-2021-04-18-101412   True        False         False      68m
cloud-credential                           4.8.0-0.nightly-2021-04-18-101412   True        False         False      85m
cluster-autoscaler                         4.8.0-0.nightly-2021-04-18-101412   True        False         False      69m
config-operator                            4.8.0-0.nightly-2021-04-18-101412   True        False         False      69m
console                                    4.8.0-0.nightly-2021-04-18-101412   True        False         False      39m
csi-snapshot-controller                    4.8.0-0.nightly-2021-04-18-101412   True        False         False      69m
dns                                        4.8.0-0.nightly-2021-04-18-101412   True        False         False      69m
etcd                                       4.8.0-0.nightly-2021-04-18-101412   True        False         False      68m
image-registry                             4.8.0-0.nightly-2021-04-18-101412   True        False         False      63m
ingress                                    4.8.0-0.nightly-2021-04-18-101412   True        False         False      44m
insights                                   4.8.0-0.nightly-2021-04-18-101412   True        False         False      63m
kube-apiserver                             4.8.0-0.nightly-2021-04-18-101412   True        False         False      61m
kube-controller-manager                    4.8.0-0.nightly-2021-04-18-101412   True        False         False      67m
kube-scheduler                             4.8.0-0.nightly-2021-04-18-101412   True        False         False      67m
kube-storage-version-migrator              4.8.0-0.nightly-2021-04-18-101412   True        False         False      69m
machine-api                                4.8.0-0.nightly-2021-04-18-101412   True        False         False      62m
machine-approver                           4.8.0-0.nightly-2021-04-18-101412   True        False         False      68m
machine-config                             4.8.0-0.nightly-2021-04-18-101412   True        False         False      69m
marketplace                                4.8.0-0.nightly-2021-04-18-101412   True        False         False      68m
monitoring                                                                     False       True          True       68m
network                                    4.8.0-0.nightly-2021-04-18-101412   True        False         False      70m
node-tuning                                4.8.0-0.nightly-2021-04-18-101412   True        False         False      69m
openshift-apiserver                        4.8.0-0.nightly-2021-04-18-101412   True        False         False      63m
openshift-controller-manager               4.8.0-0.nightly-2021-04-18-101412   True        False         False      68m
openshift-samples                          4.8.0-0.nightly-2021-04-18-101412   True        False         False      54m
operator-lifecycle-manager                 4.8.0-0.nightly-2021-04-18-101412   True        False         False      69m
operator-lifecycle-manager-catalog         4.8.0-0.nightly-2021-04-18-101412   True        False         False      69m
operator-lifecycle-manager-packageserver   4.8.0-0.nightly-2021-04-18-101412   True        False         False      65m
service-ca                                 4.8.0-0.nightly-2021-04-18-101412   True        False         False      69m
storage                                    4.8.0-0.nightly-2021-04-18-101412   True        False         False      69m
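
The underlying failure conditions for the monitoring operator can be inspected with standard oc commands, for example (shown here for reference; output was not captured):

$ oc get clusteroperator monitoring -o yaml
$ oc -n openshift-monitoring get pods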

must-gather message
[must-gather      ] OUT Using must-gather plug-in image: registry.ocp-edge-cluster-0.qe.lab.redhat.com:5000/localimages/local-release-image:4.8.0-0.nightly-2021-04-18-101412
When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information.
ClusterID: d58b4edf-b2c8-44a7-bb4b-3ef750e8dff6
ClusterVersion: Installing "4.8.0-0.nightly-2021-04-18-101412" for 2 hours: Unable to apply 4.8.0-0.nightly-2021-04-18-101412: the cluster operator monitoring has not yet successfully rolled out
ClusterOperators:
	clusteroperator/monitoring is not available (Rollout of the monitoring stack failed and is degraded. Please investigate the degraded status error.) because Failed to rollout the stack. Error: running task Updating prometheus-adapter failed: reconciling PrometheusAdapter Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-adapter: expected 3 replicas, got 1 updated replicas
....
error running backup collection: errors ocurred while gathering data:
    [skipping gathering clusterroles.rbac.authorization.k8s.io/system:registry due to error: clusterroles.rbac.authorization.k8s.io "system:registry" not found, skipping gathering clusterrolebindings.rbac.authorization.k8s.io/registry-registry-role due to error: clusterrolebindings.rbac.authorization.k8s.io "registry-registry-role" not found, skipping gathering secrets/support due to error: secrets "support" not found, skipping gathering podnetworkconnectivitychecks.controlplane.operator.openshift.io due to error: the server doesn't have a resource type "podnetworkconnectivitychecks", skipping gathering namespaces/openshift-marketplace due to error: one or more errors ocurred while gathering pod-specific data for namespace: openshift-marketplace

    [one or more errors ocurred while gathering container data for pod certified-operators-4fjdp:

    [previous terminated container "registry-server" in pod "certified-operators-4fjdp" not found, container "registry-server" in pod "certified-operators-4fjdp" is waiting to start: trying and failing to pull image], one or more errors ocurred while gathering container data for pod certified-operators-6bpmc:

    [container "registry-server" in pod "certified-operators-6bpmc" is waiting to start: trying and failing to pull image, previous terminated container "registry-server" in pod "certified-operators-6bpmc" not found], one or more errors ocurred while gathering container data for pod community-operators-j5gkt:

    [container "registry-server" in pod "community-operators-j5gkt" is waiting to start: trying and failing to pull image, previous terminated container "registry-server" in pod "community-operators-j5gkt" not found], one or more errors ocurred while gathering container data for pod community-operators-z5vhz:

    [container "registry-server" in pod "community-operators-z5vhz" is waiting to start: trying and failing to pull image, previous terminated container "registry-server" in pod "community-operators-z5vhz" not found], one or more errors ocurred while gathering container data for pod redhat-marketplace-cvflv:

    [container "registry-server" in pod "redhat-marketplace-cvflv" is waiting to start: image can't be pulled, previous terminated container "registry-server" in pod "redhat-marketplace-cvflv" not found], one or more errors ocurred while gathering container data for pod redhat-marketplace-mtqdg:

    [previous terminated container "registry-server" in pod "redhat-marketplace-mtqdg" not found, container "registry-server" in pod "redhat-marketplace-mtqdg" is waiting to start: trying and failing to pull image], one or more errors ocurred while gathering container data for pod redhat-operators-9w9lh:

    [container "registry-server" in pod "redhat-operators-9w9lh" is waiting to start: trying and failing to pull image, previous terminated container "registry-server" in pod "redhat-operators-9w9lh" not found], one or more errors ocurred while gathering container data for pod redhat-operators-wc5kv:

    [previous terminated container "registry-server" in pod "redhat-operators-wc5kv" not found, container "registry-server" in pod "redhat-operators-wc5kv" is waiting to start: trying and failing to pull image]], skipping gathering namespaces/openshift-monitoring due to error: one or more errors ocurred while gathering pod-specific data for namespace: openshift-monitoring

    [one or more errors ocurred while gathering container data for pod prometheus-adapter-5c488f64db-pl2wx:

    [previous terminated container "prometheus-adapter" in pod "prometheus-adapter-5c488f64db-pl2wx" not found, container "prometheus-adapter" in pod "prometheus-adapter-5c488f64db-pl2wx" is waiting to start: ContainerCreating], one or more errors ocurred while gathering container data for pod prometheus-adapter-5c488f64db-r9n68:

    [container "prometheus-adapter" in pod "prometheus-adapter-5c488f64db-r9n68" is waiting to start: ContainerCreating, previous terminated container "prometheus-adapter" in pod "prometheus-adapter-5c488f64db-r9n68" not found]], skipping gathering endpoints/host-etcd-2 due to error: endpoints "host-etcd-2" not found]error: gather did not start for pod must-gather-7wzr6: unable to pull image: ImagePullBackOff: Back-off pulling image "registry.ocp-edge-cluster-0.qe.lab.redhat.com:5000/localimages/local-release-image:4.8.0-0.nightly-2021-04-18-101412-must-gather"

What did you expect to happen?
The deployment should succeed.


How to reproduce it (as minimally and precisely as possible)?
Deploy OCP 4.8 using redfish-virtualmedia with the provisioning network disabled.
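
For reference, an illustrative platform stanza of install-config.yaml for this scenario could look like the following (host name, MAC address, VIPs, BMC address and credentials are placeholders, not values from this cluster):

platform:
  baremetal:
    # virtual media deployment, no dedicated provisioning network
    provisioningNetwork: "Disabled"
    apiVIP: 192.168.123.5
    ingressVIP: 192.168.123.10
    hosts:
      - name: openshift-master-0
        role: master
        bmc:
          # redfish-virtualmedia BMC driver
          address: redfish-virtualmedia://192.168.123.1:8000/redfish/v1/Systems/<system-id>
          username: admin
          password: <password>
          disableCertificateVerification: true
        bootMACAddress: 52:54:00:00:00:01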

Comment 2 Arda Guclu 2021-04-20 16:20:46 UTC

*** This bug has been marked as a duplicate of bug 1948311 ***

