Bug 1713039
Summary: | etcd quorum guard test does not correctly make nodes unschedulable | | |
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Robert Krawitz <rkrawitz> |
Component: | Machine Config Operator | Assignee: | Robert Krawitz <rkrawitz> |
Status: | CLOSED ERRATA | QA Contact: | Micah Abbott <miabbott> |
Severity: | urgent | Docs Contact: | |
Priority: | unspecified | | |
Version: | 4.1.0 | CC: | amurdaca, ccoleman, erich, sponnaga, wking |
Target Milestone: | --- | Keywords: | OSE41z_next |
Target Release: | 4.1.z | | |
Hardware: | All | | |
OS: | All | | |
Whiteboard: | 4.1.3 | | |
Fixed In Version: | | Doc Type: | No Doc Update |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2019-06-26 08:50:22 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Robert Krawitz
2019-05-22 18:00:43 UTC
PR merged.

I searched through the last 14d of CI results for log messages that were removed/changed in the PR (https://github.com/openshift/machine-config-operator/pull/822):

- "etcdQuotaGard deployment not present"
- "Node object was modified and not up to date; retrying"
- "Failed to make node %s %sschedulable"

I was unable to find any evidence of those messages.

Additionally, I pulled the machine-config-operator image included in the 4.1.0-0.nightly-2019-06-19-033215 release and inspected the contents of the changed manifest:

```
$ ./oc image info -a ../all-the-pull-secrets.json $(./oc adm release info -a ../all-the-pull-secrets.json --image-for=machine-config-operator registry.svc.ci.openshift.org/ocp/release:4.1.0-0.nightly-2019-06-19-033215) | grep Name
Name: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:976cd21a9b96fa2e4e1bed568e3f34b9087703f4d18c914beb0379e05b43aeaf

$ sudo podman pull --authfile ../all-the-pull-secrets.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:976cd21a9b96fa2e4e1bed568e3f34b9087703f4d18c914beb0379e05b43aeaf

$ ctr=$(sudo podman create quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:976cd21a9b96fa2e4e1bed568e3f34b9087703f4d18c914beb0379e05b43aeaf)

$ mnt=$(sudo podman mount $ctr)

$ sudo grep -C 10 TERM $mnt/manifests/0000_80_machine-config-operator_07_etcdquorumguard_deployment.yaml
imagePullPolicy: IfNotPresent
name: guard
volumeMounts:
- mountPath: /mnt/kube
  name: kubecerts
command:
- /bin/bash
args:
- -c
- |
  # properly handle TERM and exit as soon as it is signaled
  set -euo pipefail
  trap 'jobs -p | xargs -r kill; exit 0' TERM
  sleep infinity & wait
readinessProbe:
  exec:
    command:
    - /bin/sh
    - -c
    - |
      declare -r croot=/mnt/kube
      declare -r health_endpoint="https://127.0.0.1:2379/health"
      declare -r cert="$(find $croot -name 'system:etcd-peer*.crt' -print -quit)"
```

This confirms the manifest has the changes included in https://github.com/openshift/machine-config-operator/pull/822

Moving to VERIFIED

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1589
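
For anyone who wants to see the TERM handling from the guard container's `args` outside a cluster, here is a minimal local sketch. It is not taken from the PR: the variable names `guard_pid` and `rc` are made up for illustration, and `sleep 3600` stands in for `sleep infinity` on the assumption that not every `sleep` accepts `infinity`. It starts a child bash with the same `trap`/`sleep &`/`wait` pattern, sends it TERM, and reports the exit status.

```shell
#!/usr/bin/env bash
# Local sketch of the trap pattern from the guard container's args.
# guard_pid and rc are illustrative names, not taken from the manifest.
set -euo pipefail

# Start a child bash that mirrors the container command:
# install a TERM trap, background a long sleep, then block in `wait`.
bash -c '
  set -euo pipefail
  trap "jobs -p | xargs -r kill; exit 0" TERM
  sleep 3600 &   # stands in for "sleep infinity"
  wait
' &
guard_pid=$!

# Give the child a moment to install its trap, then signal it.
sleep 1
kill -TERM "$guard_pid"

# Collect the exit status without tripping errexit in this script.
rc=0
wait "$guard_pid" || rc=$?
echo "guard exit status: $rc"
```

Because the shell is blocked in the `wait` builtin rather than in a foreground `sleep`, the trap can run as soon as TERM arrives instead of after the sleep finishes, which is presumably what the manifest comment means by exiting "as soon as it is signaled".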