Bug 1763635 - etcd-member manifest rendered with an empty image reference, leading to cluster disruption during a 4.1->4.2 upgrade
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.3.0
Assignee: Antonio Murdaca
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks: 1763636 1763638
Reported: 2019-10-21 08:24 UTC by Antonio Murdaca
Modified: 2020-01-23 11:08 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1763636
Environment:
Last Closed: 2020-01-23 11:08:26 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift machine-config-operator pull 1198 | 0 | None | closed | Bug 1763635: pkg/operator: fix race between images CM and MCO | 2020-06-21 06:12:55 UTC
Red Hat Product Errata | RHBA-2020:0062 | 0 | None | None | None | 2020-01-23 11:08:56 UTC

Description Antonio Murdaca 2019-10-21 08:24:54 UTC
Description of problem:

During a 4.1.z to 4.2.z upgrade, the MCO renders the etcd-member.yaml master manifest with an empty image reference (the MCO is where the template lives and where the render happens).
As a result, etcd fails to run, etcd-quorum-guard is fooled, and the upgrade completely disrupts the cluster because it loses quorum.


Version-Release number of selected component (if applicable):

from 4.1 onwards

How reproducible:

1/20

Steps to Reproduce:
1. Run the upgrade in CI and/or manually; about 1 in 20 attempts fails.

Actual results:

cluster disruption

Expected results:

upgrade successful


Additional info:

Comment 2 Michael Nguyen 2019-11-18 18:08:11 UTC
Verified releaseVersion is present in images.json config map
$ oc -n openshift-machine-config-operator get cm/machine-config-operator-images -o yaml
apiVersion: v1
data:
  images.json: |
    {
      "releaseVersion": "4.3.0-0.nightly-2019-11-13-233341",
      "machineConfigOperator": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1eda418f9fc8ae45f975009ea5854f29507dc9a4a2cc8fc2b34f148b77355bc4",
      "etcd": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6054ee9ffe238573b9a515987d02a15a40d9d4b30fc8ab857d0701fd807d2271",
      "infraImage": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:69d09a18de7db2cd5bda94ece8bb35b402693d67a72c787b57f94320881bf645",
      "kubeClientAgentImage": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b3be7dd1fd6491e8db1c52900efc89893d382bb00510fe7d9c6b01e9612401f1",
      "clusterEtcdOperatorImage": "",
      "keepalivedImage": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a714bd0acdad76e9946e301f8c8e74cbea7dc7ee78b14d4fdc8a316cf46cc610",
      "corednsImage": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:02b7dd4334f09809db18ddc26c7eb2f730456148fa99e0c2030cbfbedf0e9aed",
      "mdnsPublisherImage": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6b0305d9620a5de8e4c1cc1056560691f4bcb76d80a78e365142e2ce7e5a4365",
      "haproxyImage": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f0369f8bfcad52fce3aad5b341827e865373eac25b0fd095ace0383f0aa2fdfd",
      "baremetalRuntimeCfgImage": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7614f8b98e39bc4c61b8e95a11fc6232a0a147250ca009c26b09286743d4dbe8",
      "oauthProxy": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1f774f8699afa2e7f8d2dd4e46b86430a44e151ebc73371cfa97b41a4d2245dc"
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2019-11-18T16:14:24Z"
  name: machine-config-operator-images
  namespace: openshift-machine-config-operator
  resourceVersion: "42390"
  selfLink: /api/v1/namespaces/openshift-machine-config-operator/configmaps/machine-config-operator-images
  uid: c1a79672-ea70-4626-b292-755aa5214e43


Verified error message is present in MCO logs after changing the releaseVersion:
E1118 18:02:22.977753       1 operator.go:312] refusing to read images.json version "4.3.0-0.nightly-2019-11-13-233341XXX", operator version "4.3.0-0.nightly-2019-11-13-233341"

Verified successful upgrade
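
The check verified above can be illustrated with a minimal Go sketch. It is not the MCO's actual code: the `images` struct, the `parseImages` helper, and the example image reference are hypothetical names introduced here, and the explicit empty-etcd check is an added assumption. The idea, as far as the log line suggests, is that the operator refuses to use an images.json whose releaseVersion does not match its own version, so a ConfigMap from a different release is never used to render manifests.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// images mirrors two of the keys visible in images.json above.
// Hypothetical type for illustration only.
type images struct {
	ReleaseVersion string `json:"releaseVersion"`
	Etcd           string `json:"etcd"`
}

// parseImages is a hypothetical helper: it decodes images.json and
// refuses it when releaseVersion differs from the operator version.
func parseImages(raw []byte, operatorVersion string) (*images, error) {
	var imgs images
	if err := json.Unmarshal(raw, &imgs); err != nil {
		return nil, fmt.Errorf("could not parse images.json: %w", err)
	}
	if imgs.ReleaseVersion != operatorVersion {
		// Message format copied from the log line in this comment.
		return nil, fmt.Errorf("refusing to read images.json version %q, operator version %q",
			imgs.ReleaseVersion, operatorVersion)
	}
	// Assumption: an extra guard against the symptom in this bug,
	// not something the bug report confirms the fix contains.
	if imgs.Etcd == "" {
		return nil, errors.New("images.json has an empty etcd image reference")
	}
	return &imgs, nil
}

func main() {
	// Placeholder image reference; the version strings come from the log above.
	raw := []byte(`{"releaseVersion": "4.3.0-0.nightly-2019-11-13-233341XXX", "etcd": "quay.io/example/etcd@sha256:0000"}`)
	if _, err := parseImages(raw, "4.3.0-0.nightly-2019-11-13-233341"); err != nil {
		fmt.Println(err)
	}
}
```

With the mismatched version used in `main`, the helper returns an error instead of handing a possibly stale set of image references to the renderer.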

Comment 4 errata-xmlrpc 2020-01-23 11:08:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062

