Bug 1801898 - [etcd-operator] etcd operator failing due to node name inconsistencies across platforms
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Etcd Operator
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.4.0
Assignee: Sam Batschelet
QA Contact: ge liu
Duplicates: 1802649 1802678
Depends On:
Reported: 2020-02-11 21:02 UTC by Yu Qi Zhang
Modified: 2020-06-02 09:41 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2020-05-04 11:36:06 UTC
Target Upstream Version:


System ID Private Priority Status Summary Last Updated
Github openshift cluster-etcd-operator pull 143 0 None closed Bug 1801898: remove dependency on node internal DNS name 2021-02-06 14:26:35 UTC
Red Hat Product Errata RHBA-2020:0581 0 None None None 2020-05-04 11:36:27 UTC

Description Yu Qi Zhang 2020-02-11 21:02:40 UTC
Description of problem:

On azure/metal/ovirt, installs are failing with:

level=fatal msg="failed to initialize the cluster: Cluster operator etcd is reporting a failure: InstallerControllerDegraded: missing required resources: [configmaps: config-1,etcd-metrics-proxy-client-ca-1,etcd-metrics-proxy-serving-ca-1,etcd-peer-client-ca-1,etcd-pod-1,etcd-serving-ca-1, secrets: etcd-all-peer-1,etcd-all-serving-1,etcd-all-serving-metrics-1]\nStaticPodsDegraded: pods \"etcd-ci-op-7lhbj7qi-761c8-jm6jf-master-2\" not found\nStaticPodsDegraded: pods \"etcd-ci-op-7lhbj7qi-761c8-jm6jf-master-1\" not found\nStaticPodsDegraded: pods \"etcd-ci-op-7lhbj7qi-761c8-jm6jf-master-0\" not found\nRevisionControllerDegraded: configmaps \"etcd-pod\" not found\nTargetConfigControllerDegraded: \"configmap/kube-apiserver-pod\": node/ci-op-7lhbj7qi-761c8-jm6jf-master-2 missing InternalDNS"

Sam did some digging and came up with: https://github.com/openshift/cluster-etcd-operator/issues/115
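
The last line of the error shows a master node with no InternalDNS address, and the linked fix is titled "remove dependency on node internal DNS name". A minimal sketch of that idea follows. It is not the operator's actual code. The address types are those of the Kubernetes Node API (status.addresses), while the function name, the preference order, and the sample addresses are assumptions made for illustration.

```python
# Address types follow the Kubernetes Node API (status.addresses[].type).
# The preference order below is an assumption for illustration only.
PREFERRED_TYPES = ("InternalIP", "InternalDNS", "Hostname")


def pick_node_address(addresses):
    """Return the first address matching the preferred types, or None."""
    by_type = {}
    for entry in addresses:
        # Keep the first address seen for each type.
        by_type.setdefault(entry["type"], entry["address"])
    for addr_type in PREFERRED_TYPES:
        if addr_type in by_type:
            return by_type[addr_type]
    return None


# Hypothetical node on a platform that reports no InternalDNS entry.
node_addresses = [
    {"type": "Hostname", "address": "master-2"},
    {"type": "InternalIP", "address": "10.0.0.7"},
]
print(pick_node_address(node_addresses))
```

With this ordering, a node that lacks InternalDNS still yields a usable address instead of an error.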

This is blocking many of our jobs, example:


Version-Release number of selected component (if applicable):

How reproducible:

Comment 6 Abhinav Dahiya 2020-02-13 17:19:26 UTC
*** Bug 1802678 has been marked as a duplicate of this bug. ***

Comment 7 Abhinav Dahiya 2020-02-13 17:19:28 UTC
*** Bug 1802649 has been marked as a duplicate of this bug. ***

Comment 11 Ray Ashworth 2020-02-17 17:02:58 UTC
Tested the latest nightly build (it looks like it was posted Saturday 2/15) and saw no change. Do we need a new RHCOS image?

Comment 12 isaic 2020-02-17 19:56:06 UTC
Can you confirm what the latest 4.4 CoreOS OVA file should be?  

We were originally told to use the build dated 1/24/20: https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/rhcos-4.4/44.81.202001241431.0/x86_64/rhcos-44.81.202001241431.0-vmware.x86_64.ova  This is the one that fails when we try to install OCP 4.4 on VMware.

We then checked whether a newer CoreOS OVA file was available and noticed one dated 2/07/20 (likely NOT related to this Bugzilla): https://github.com/openshift/installer/blob/master/data/data/rhcos.json#L123-L127


Let us know. Thanks!

Comment 13 ge liu 2020-02-20 07:45:46 UTC
Verified on UPI OSP with 4.4.0-0.nightly-2020-02-19-173908. Also tried other platforms (azure/vsphere/...), but those are blocked by another bug: https://bugzilla.redhat.com/show_bug.cgi?id=1798945.

Comment 15 errata-xmlrpc 2020-05-04 11:36:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.


Comment 16 Nicolas Marcq 2020-05-29 08:06:08 UTC

The mirror still contains only the 4.4.3 image:

Is there a way to specify the image version to pull during the bootstrap installation? I use the latest openshift-install, 4.4.5, but it seems that it is the OVA that actually determines the installed OpenShift version.


Comment 17 Sam Batschelet 2020-05-29 13:55:49 UTC

>The mirror still contains only the 4.4.3 image:

>Is there a way to specify the image version to pull during the bootstrap installation? I use the latest openshift-install, 4.4.5, but it seems that it is the OVA that actually determines the installed OpenShift version.


Thank you for the report; we are looking into this.

Comment 18 Sam Batschelet 2020-05-29 22:12:49 UTC
I spoke to the ART team, which handles these assets. They said that these images, although they reference 4.4.3, are the latest for RHCOS dependencies. In short, what you are seeing is expected.

Comment 19 Nicolas Marcq 2020-06-02 09:41:54 UTC
OK, thanks.

It's just that I still have the issue with the installer 4.4.5 and the RHCOS image 4.4.3.


oc describe co etcd                                                    
Name:         etcd
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
  Creation Timestamp:  2020-06-02T09:11:16Z
  Generation:          1
  Resource Version:    45747
  Self Link:           /apis/config.openshift.io/v1/clusteroperators/etcd
  UID:                 d5f61fc1-5064-409a-b7ca-c7357e22e759
    Last Transition Time:  2020-06-02T09:13:39Z
    Message:               StaticPodsDegraded: pods "etcd-localhost" not found
InstallerControllerDegraded: missing required resources: [configmaps: etcd-scripts,restore-etcd-pod, configmaps: config-1,etcd-metrics-proxy-client-ca-1,etcd-metrics-proxy-serving-ca-1,etcd-peer-client-ca-1,etcd-pod-1,etcd-serving-ca-1, secrets: etcd-all-peer-1,etcd-all-serving-1,etcd-all-serving-metrics-1]
EnvVarControllerDegraded: at least three nodes are required to have a valid configuration
RevisionControllerDegraded: configmaps "etcd-pod" not found
ScriptControllerDegraded: "configmap/etcd-pod": missing env var values
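
The Message field above packs several controller failures into one newline-separated string, which makes it hard to see which controller reports what. A small sketch that groups the lines by their "...Degraded" prefix follows. The function name and the "Unknown" catch-all key are assumptions made for illustration, and the sample message copies only a few of the lines above.

```python
from collections import defaultdict


def group_degraded_reasons(message):
    """Map each 'XxxDegraded' prefix to the list of details reported under it."""
    grouped = defaultdict(list)
    for line in message.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first ": " only, so details that contain colons stay intact.
        prefix, sep, detail = line.partition(": ")
        if not sep:
            # No "Prefix: detail" shape; file the line under a catch-all key.
            prefix, detail = "Unknown", line
        grouped[prefix].append(detail)
    return dict(grouped)


message = (
    'StaticPodsDegraded: pods "etcd-localhost" not found\n'
    "EnvVarControllerDegraded: at least three nodes are required to have a valid configuration\n"
    'RevisionControllerDegraded: configmaps "etcd-pod" not found\n'
    'ScriptControllerDegraded: "configmap/etcd-pod": missing env var values'
)
for controller, details in group_degraded_reasons(message).items():
    print(controller, "->", details)
```

Grouping this way makes it easier to compare the output here with the original failure, for example to see that StaticPodsDegraded now names "etcd-localhost" rather than the master nodes.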

