Bug 2076521
| Summary: | Nodes in the same zone are not updated in the right order | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Sergio <sregidor> |
| Component: | Machine Config Operator | Assignee: | Kirsten Garrison <kgarriso> |
| Machine Config Operator sub component: | Machine Config Operator | QA Contact: | Sergio <sregidor> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | low | ||
| Priority: | high | CC: | aos-bugs, kgarriso, mkrejci, rioliu |
| Version: | 4.11 | ||
| Target Milestone: | --- | ||
| Target Release: | 4.11.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-08-10 11:07:40 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||

Verified using build:

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-05-04-060900   True        False         141m    Cluster version is 4.11.0-0.nightly-2022-05-04-060900

Nodes in the same zone are updated "oldest first".

We move the issue to VERIFIED status.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

Description of problem:

In 4.11 when a new machine configuration is provided the nodes should be updated in this order:

1. Upgrade nodes in topology.kubernetes.io/zone order and then by node age (oldest first).
2. If zones are not present (for ex: baremetal deployments) upgrade nodes by age oldest first.

The problem is that now when there are several nodes in the same zone, those nodes are updated in a random order (usually newest first, but not always). They should be updated oldest first.

Version-Release number of MCO (Machine Config Operator) (if applicable):

$ oc get co machine-config
NAME             VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
machine-config   4.11.0-0.nightly-2022-04-18-091618   True        False         False      80m

Platform (AWS, VSphere, Metal, etc.): All platforms where topology.kubernetes.io/zone is defined.

Are you certain that the root cause of the issue being reported is the MCO (Machine Config Operator)? (Y/N/Not sure): Yes

How reproducible: Always

Did you catch this issue by running a Jenkins job? If yes, please list:
1. Jenkins job:
2. Profile:

Steps to Reproduce:

1. Scale one machineset to 3 (for example), so that we make sure that there are 3 nodes in the same zone.

$ oc get machineset -n openshift-machine-api
NAME                    DESIRED   CURRENT   READY   AVAILABLE   AGE
test22-b6886-worker-a   1         1         1       1           89m
test22-b6886-worker-b   1         1         1       1           89m
test22-b6886-worker-c   1         1         1       1           89m
test22-b6886-worker-f   0         0                             89m

$ oc scale machineset -n openshift-machine-api test22-b6886-worker-a --replicas=3
machineset.machine.openshift.io/test22-b6886-worker-a scaled

2. Create a machineconfig to force an update in the nodes' configuration

$ cat << EOF | oc create -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: mco-test-file
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
      - contents:
          source: data:,MCO%20test%20file%20order%0A
        path: /etc/mco-test-file-order
EOF

3. Check the order used to update the nodes

$ watch "oc get node -l node-role.kubernetes.io/worker --sort-by '.metadata.labels.topology\.kubernetes\.io/zone' -ocustom-columns='ZONE:.metadata.labels.topology\.kubernetes\.io/zone,TIMES:.metadata.creationTimestamp,NAME:.metadata.name,MCO-STATE:.metadata.annotations.machineconfiguration\.openshift\.io/state'"
ZONE            TIMES                  NAME                                                  MCO-STATE
us-central1-a   2022-04-19T08:56:44Z   test22-b6886-worker-a-5rndd.c.openshift-qe.internal   Done
us-central1-a   2022-04-19T08:56:52Z   test22-b6886-worker-a-lzglj.c.openshift-qe.internal   Working
us-central1-a   2022-04-19T07:35:56Z   test22-b6886-worker-a-sgvs8.c.openshift-qe.internal   Done
us-central1-b   2022-04-19T07:36:00Z   test22-b6886-worker-b-g7zfv.c.openshift-qe.internal   Done
us-central1-c   2022-04-19T07:35:53Z   test22-b6886-worker-c-jtd7b.c.openshift-qe.internal   Done

Actual results:

The order in which the nodes in the same zone are updated is not always "oldest first".

Expected results:

Nodes in the same zone should be updated by node age (oldest first).

Additional info:
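
For reference, the expected ordering rule from the description (zone first, then oldest first within a zone) can be sketched as a two-key sort. This is only an illustration in Python and is not the MCO implementation. The `Node` type and the `expected_update_order` helper are hypothetical names; the node names are shortened from the output above; and placing nodes without a zone label under an empty zone string is an assumption made so that zoneless clusters fall back to age ordering.

```python
from datetime import datetime, timezone
from typing import Dict, List, NamedTuple

ZONE_LABEL = "topology.kubernetes.io/zone"


class Node(NamedTuple):
    """Hypothetical minimal view of a node: name, creationTimestamp, labels."""
    name: str
    created: str  # metadata.creationTimestamp, e.g. "2022-04-19T07:35:56Z"
    labels: Dict[str, str]


def parse_ts(ts: str) -> datetime:
    # creationTimestamp is reported in UTC with a trailing "Z".
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def expected_update_order(nodes: List[Node]) -> List[Node]:
    # Primary key: zone label. Assumption: a missing label sorts as "".
    # Secondary key: creation time, so the oldest node in a zone comes first.
    # Tertiary key: name, only to make ties deterministic in this sketch.
    return sorted(
        nodes,
        key=lambda n: (n.labels.get(ZONE_LABEL, ""), parse_ts(n.created), n.name),
    )


# Worker nodes from step 3 of the report (domain suffix dropped for brevity).
nodes = [
    Node("test22-b6886-worker-a-5rndd", "2022-04-19T08:56:44Z", {ZONE_LABEL: "us-central1-a"}),
    Node("test22-b6886-worker-a-lzglj", "2022-04-19T08:56:52Z", {ZONE_LABEL: "us-central1-a"}),
    Node("test22-b6886-worker-a-sgvs8", "2022-04-19T07:35:56Z", {ZONE_LABEL: "us-central1-a"}),
    Node("test22-b6886-worker-b-g7zfv", "2022-04-19T07:36:00Z", {ZONE_LABEL: "us-central1-b"}),
    Node("test22-b6886-worker-c-jtd7b", "2022-04-19T07:35:53Z", {ZONE_LABEL: "us-central1-c"}),
]

ordered_names = [n.name for n in expected_update_order(nodes)]
for name in ordered_names:
    print(name)
```

Under these assumptions, the three nodes in us-central1-a would be expected in the order sgvs8, 5rndd, lzglj, followed by the single nodes in us-central1-b and us-central1-c.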