
Bug 1763635

Summary: etcd-member manifest rendered with an empty image reference, leading to cluster disruption during a 4.1->4.2 upgrade
Product: OpenShift Container Platform
Reporter: Antonio Murdaca <amurdaca>
Component: Machine Config Operator
Assignee: Antonio Murdaca <amurdaca>
Status: CLOSED ERRATA
QA Contact: Michael Nguyen <mnguyen>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 4.2.0
CC: scuppett
Target Milestone: ---
Target Release: 4.3.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1763636
Environment:
Last Closed: 2020-01-23 11:08:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1763636, 1763638

Description Antonio Murdaca 2019-10-21 08:24:54 UTC
Description of problem:

During a 4.1.z to 4.2.z upgrade, the MCO renders the etcd-member.yaml master manifest with an empty image reference (the MCO is where the template lives and where the render happens).
As a result, etcd fails to run, etcd-quorum-guard is fooled, and the cluster loses quorum, so the upgrade completely disrupts the cluster.
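
To illustrate the failure mode, here is a minimal sketch in Python of a render step that refuses to produce a manifest when the etcd image reference is empty. Everything in it is an assumption made for illustration: the template text is a heavily simplified stand-in for etcd-member.yaml, the function name `render_etcd_member` is hypothetical, and the real MCO renders Go templates rather than Python ones.

```python
from string import Template

# Hypothetical, heavily simplified stand-in for the etcd-member manifest
# template. The real template lives in the MCO and is a Go template.
MANIFEST_TEMPLATE = Template(
    "kind: Pod\n"
    "metadata:\n"
    "  name: etcd-member\n"
    "spec:\n"
    "  containers:\n"
    "  - name: etcd-member\n"
    "    image: $etcd_image\n"
)


def render_etcd_member(images: dict) -> str:
    """Render the manifest, failing loudly instead of emitting an empty image."""
    etcd_image = images.get("etcd", "")
    if not etcd_image.strip():
        # Without this check the output would contain a bare "image: " line,
        # which is the symptom described in this bug.
        raise ValueError(
            "refusing to render etcd-member: empty etcd image reference"
        )
    return MANIFEST_TEMPLATE.substitute(etcd_image=etcd_image)
```

Called with a dictionary that has no "etcd" key (or an empty value), the sketch raises an error rather than returning a manifest that cannot start etcd.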


Version-Release number of selected component (if applicable):

from 4.1 onwards

How reproducible:

1/20

Steps to Reproduce:
1. Run the upgrade in CI and/or manually; about 1 in 20 runs fails.

Actual results:

cluster disruption

Expected results:

upgrade successful


Additional info:

Comment 2 Michael Nguyen 2019-11-18 18:08:11 UTC
Verified that releaseVersion is present in the images.json config map:
$ oc -n openshift-machine-config-operator get cm/machine-config-operator-images -o yaml
apiVersion: v1
data:
  images.json: |
    {
      "releaseVersion": "4.3.0-0.nightly-2019-11-13-233341",
      "machineConfigOperator": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1eda418f9fc8ae45f975009ea5854f29507dc9a4a2cc8fc2b34f148b77355bc4",
      "etcd": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6054ee9ffe238573b9a515987d02a15a40d9d4b30fc8ab857d0701fd807d2271",
      "infraImage": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:69d09a18de7db2cd5bda94ece8bb35b402693d67a72c787b57f94320881bf645",
      "kubeClientAgentImage": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b3be7dd1fd6491e8db1c52900efc89893d382bb00510fe7d9c6b01e9612401f1",
      "clusterEtcdOperatorImage": "",
      "keepalivedImage": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a714bd0acdad76e9946e301f8c8e74cbea7dc7ee78b14d4fdc8a316cf46cc610",
      "corednsImage": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:02b7dd4334f09809db18ddc26c7eb2f730456148fa99e0c2030cbfbedf0e9aed",
      "mdnsPublisherImage": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6b0305d9620a5de8e4c1cc1056560691f4bcb76d80a78e365142e2ce7e5a4365",
      "haproxyImage": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f0369f8bfcad52fce3aad5b341827e865373eac25b0fd095ace0383f0aa2fdfd",
      "baremetalRuntimeCfgImage": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7614f8b98e39bc4c61b8e95a11fc6232a0a147250ca009c26b09286743d4dbe8",
      "oauthProxy": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1f774f8699afa2e7f8d2dd4e46b86430a44e151ebc73371cfa97b41a4d2245dc"
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2019-11-18T16:14:24Z"
  name: machine-config-operator-images
  namespace: openshift-machine-config-operator
  resourceVersion: "42390"
  selfLink: /api/v1/namespaces/openshift-machine-config-operator/configmaps/machine-config-operator-images
  uid: c1a79672-ea70-4626-b292-755aa5214e43


Verified error message is present in MCO logs after changing the releaseVersion:
E1118 18:02:22.977753       1 operator.go:312] refusing to read images.json version "4.3.0-0.nightly-2019-11-13-233341XXX", operator version "4.3.0-0.nightly-2019-11-13-233341"
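
The log line above suggests that the operator compares the releaseVersion in images.json with its own version and refuses to use the file on a mismatch. A minimal sketch of such a guard in Python follows. The function name `load_images` and its signature are hypothetical, and the actual check in operator.go is written in Go and may differ in detail; only the error wording is taken from the log line.

```python
import json


def load_images(images_json: str, operator_version: str) -> dict:
    """Parse images.json and reject it if it belongs to another release."""
    images = json.loads(images_json)
    found = images.get("releaseVersion", "")
    if found != operator_version:
        # Error wording mirrors the MCO log line quoted in this comment.
        raise ValueError(
            f'refusing to read images.json version "{found}", '
            f'operator version "{operator_version}"'
        )
    return images
```

With the values from this verification, an images.json whose releaseVersion is "4.3.0-0.nightly-2019-11-13-233341XXX" is rejected by an operator at version "4.3.0-0.nightly-2019-11-13-233341", while a matching version is accepted.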

Verified successful upgrade

Comment 4 errata-xmlrpc 2020-01-23 11:08:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062