Description of problem:
During a 4.1.z to 4.2.z upgrade, the MCO (which owns the template and performs the render) renders the etcd-member.yaml master manifest with an empty image reference. As a result, etcd fails to run, etcd-quorum-guard is fooled, and the upgrade completely disrupts the cluster as etcd loses quorum.

Version-Release number of selected component (if applicable):
4.1 onwards

How reproducible:
Roughly 1 in 20 runs

Steps to Reproduce:
1. Run the upgrade in CI and/or manually; roughly 1 in 20 runs fails.

Actual results:
Cluster disruption.

Expected results:
The upgrade succeeds.

Additional info:
Verified that releaseVersion is present in the images.json config map:

$ oc -n openshift-machine-config-operator get cm/machine-config-operator-images -o yaml
apiVersion: v1
data:
  images.json: |
    {
      "releaseVersion": "4.3.0-0.nightly-2019-11-13-233341",
      "machineConfigOperator": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1eda418f9fc8ae45f975009ea5854f29507dc9a4a2cc8fc2b34f148b77355bc4",
      "etcd": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6054ee9ffe238573b9a515987d02a15a40d9d4b30fc8ab857d0701fd807d2271",
      "infraImage": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:69d09a18de7db2cd5bda94ece8bb35b402693d67a72c787b57f94320881bf645",
      "kubeClientAgentImage": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b3be7dd1fd6491e8db1c52900efc89893d382bb00510fe7d9c6b01e9612401f1",
      "clusterEtcdOperatorImage": "",
      "keepalivedImage": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a714bd0acdad76e9946e301f8c8e74cbea7dc7ee78b14d4fdc8a316cf46cc610",
      "corednsImage": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:02b7dd4334f09809db18ddc26c7eb2f730456148fa99e0c2030cbfbedf0e9aed",
      "mdnsPublisherImage": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6b0305d9620a5de8e4c1cc1056560691f4bcb76d80a78e365142e2ce7e5a4365",
      "haproxyImage": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f0369f8bfcad52fce3aad5b341827e865373eac25b0fd095ace0383f0aa2fdfd",
      "baremetalRuntimeCfgImage": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7614f8b98e39bc4c61b8e95a11fc6232a0a147250ca009c26b09286743d4dbe8",
      "oauthProxy": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1f774f8699afa2e7f8d2dd4e46b86430a44e151ebc73371cfa97b41a4d2245dc"
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2019-11-18T16:14:24Z"
  name: machine-config-operator-images
  namespace: openshift-machine-config-operator
  resourceVersion: "42390"
  selfLink: /api/v1/namespaces/openshift-machine-config-operator/configmaps/machine-config-operator-images
  uid: c1a79672-ea70-4626-b292-755aa5214e43

Verified that the MCO logs the expected error after releaseVersion is changed to a value that does not match the operator version:

E1118 18:02:22.977753       1 operator.go:312] refusing to read images.json version "4.3.0-0.nightly-2019-11-13-233341XXX", operator version "4.3.0-0.nightly-2019-11-13-233341"

Verified that the upgrade completes successfully.
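The verification above relies on the operator refusing an images.json whose releaseVersion does not match its own version. A minimal Go sketch of such a check follows, assuming a JSON payload shaped like the config map data shown above. The images struct, readImages, and the extra empty-etcd check are hypothetical; only the error wording is copied from the logged message, and this is not the MCO's actual operator.go code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// images is a hypothetical subset of the images.json payload. json.Unmarshal
// ignores fields not declared here, so the full payload still decodes.
type images struct {
	ReleaseVersion string `json:"releaseVersion"`
	Etcd           string `json:"etcd"`
}

// readImages decodes images.json and refuses to use it unless its
// releaseVersion matches the running operator's version, so a payload from
// a different release is never used to render manifests.
func readImages(raw []byte, operatorVersion string) (*images, error) {
	var imgs images
	if err := json.Unmarshal(raw, &imgs); err != nil {
		return nil, fmt.Errorf("parsing images.json: %w", err)
	}
	if imgs.ReleaseVersion != operatorVersion {
		return nil, fmt.Errorf("refusing to read images.json version %q, operator version %q",
			imgs.ReleaseVersion, operatorVersion)
	}
	// Hypothetical extra guard: never hand an empty etcd image to the renderer.
	if imgs.Etcd == "" {
		return nil, fmt.Errorf("images.json has an empty etcd image")
	}
	return &imgs, nil
}

func main() {
	const operatorVersion = "4.3.0-0.nightly-2019-11-13-233341"

	// Mismatched releaseVersion, mimicking the manual check in the verification.
	stale := []byte(`{"releaseVersion": "4.3.0-0.nightly-2019-11-13-233341XXX", "etcd": "quay.io/example/etcd:placeholder"}`)
	if _, err := readImages(stale, operatorVersion); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
	}

	// Matching releaseVersion: the payload is accepted.
	current := []byte(`{"releaseVersion": "4.3.0-0.nightly-2019-11-13-233341", "etcd": "quay.io/example/etcd:placeholder"}`)
	imgs, err := readImages(current, operatorVersion)
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Println("etcd image:", imgs.Etcd)
}
```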
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0062