Bug 1847672
Summary: Changes to probe fields in operator manifests are not applied during upgrade
Product: OpenShift Container Platform
Reporter: Dan Mace <dmace>
Component: Cluster Version Operator
Assignee: Dan Mace <dmace>
Status: CLOSED ERRATA
QA Contact: ge liu <geliu>
Severity: high
Priority: medium
Version: 4.5
CC: aos-bugs, jokerman, wking
Target Release: 4.6.0
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Doc Text:
Cause: The cluster-version operator ignored several probe properties, including timeoutSeconds.
Consequence: When operators changed those properties in their release manifests, the changes were not applied to clusters updating to the new release image.
Fix: The cluster-version operator now applies these probe properties.
Result: The cluster-version operator ensures that the in-cluster probe state matches the state requested in the operator's release manifests.
Last Closed: 2020-10-27 16:07:36 UTC
Type: Bug
Bug Blocks: 1829923, 1848729, 1849619
Description
Dan Mace
2020-06-16 18:55:29 UTC
This blocks a fix for quorum-guard that involves changing probe timeout values (https://bugzilla.redhat.com/show_bug.cgi?id=1829923). https://github.com/openshift/cluster-version-operator/pull/383 is an attempt to fix the problem, but the fix doesn't appear to work and we don't yet know why.

Looks good in CI [1], where the 4.5.0-rc.1 -> 4.6.0-0.ci-2020-06-18-154744 update started with:

$ oc adm release extract --to=4.5 quay.io/openshift-release-dev/ocp-release:4.5.0-rc.1-x86_64
Extracted release payload from digest sha256:7ea01a3c4d91f852f480ea40189f1762fcd2e77b8843a0662c471889f0b72028 created at 2020-06-05T17:58:18Z
$ oc adm release extract --to=4.6 registry.svc.ci.openshift.org/ocp/release:4.6.0-0.ci-2020-06-18-154744
Extracted release payload from digest sha256:25910e71a3bd53e86bdad8aeb4ea6453b944e54b6c0a70806bc8d673dcf17c28 created at 2020-06-18T15:48:15Z
$ diff -u 4.{5,6}/0000_80_machine-config-operator_07_etcdquorumguard_deployment.yaml
--- 4.5/0000_80_machine-config-operator_07_etcdquorumguard_deployment.yaml 2020-06-05 00:05:51.000000000 -0700
+++ 4.6/0000_80_machine-config-operator_07_etcdquorumguard_deployment.yaml 2020-06-18 06:02:23.000000000 -0700
@@ -52,9 +52,9 @@
         operator: Exists
         effect: NoSchedule
       containers:
-      - image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c4520124123a1425128d1f90740530069be125c911fe3e1d760d9bf6d1ce19c1
+      - name: guard
+        image: registry.svc.ci.openshift.org/ocp/4.6-2020-06-18-154744@sha256:e4e40d4fd585029f7287f7bcdb45067c696d126869a3d817891049cd5039f04d
         imagePullPolicy: IfNotPresent
-        name: guard
         terminationMessagePolicy: FallbackToLogsOnError
         volumeMounts:
         - mountPath: /mnt/kube
@@ -82,8 +82,10 @@
               export NSS_SDB_USE_CACHE=no
               [[ -z $cert || -z $key ]] && exit 1
               curl --max-time 2 --silent --cert "${cert//:/\:}" --key "$key" --cacert "$cacert" "$health_endpoint" |grep '{ *"health" *: *"true" *}'
-          initialDelaySecond: 5
-          periodSecond: 5
+          initialDelaySeconds: 5
+          periodSeconds: 5
+          failureThreshold: 3
+          timeoutSeconds: 3
         resources:
           requests:
             cpu: 10m
$ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-launch-gcp/1273719951609303040/artifacts/launch/pods.json | jq -r '.items[] | select(.metadata.name | startswith("etcd-quorum-guard")).spec.containers[].readinessProbe | {initialDelaySeconds, periodSeconds, failureThreshold, timeoutSeconds} | tostring' | uniq
{"initialDelaySeconds":5,"periodSeconds":5,"failureThreshold":3,"timeoutSeconds":3}

[1]: https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-launch-gcp/1273719951609303040

Thanks for the follow-up, Trevor. According to comment 5, the verification steps on the QE side should be:

1. Install OCP 4.5 without the backport PR (for example, 4.5.0-rc.1).
2. Upgrade to the latest 4.6 nightly build that includes PR 383.
3. Check whether the probe integer fields (initialDelaySeconds, periodSeconds, failureThreshold, and timeoutSeconds) of etcd-quorum-guard were updated.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196