Bug 1743361
| Summary: | Incorrect reporting in ClusterOperator | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Alex Crawford <crawford> |
| Component: | Machine Config Operator | Assignee: | Kirsten Garrison <kgarriso> |
| Status: | CLOSED ERRATA | QA Contact: | Micah Abbott <miabbott> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.1.z | CC: | kgarriso, mnguyen |
| Target Milestone: | --- | ||
| Target Release: | 4.2.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Last Closed: | 2019-10-16 06:36:35 UTC | Type: | Bug |
Do you happen to have the must-gather for this cluster, or the MCD logs?

I believe that I've reproduced this issue locally with a non-FIPS MC. Will do more testing and dig into this further.

Can confirm that this affects masters as well:

```
master: 0 (ready 0) out of 3 nodes are updating to latest configuration rendered-master-a8f0ae377e3f2b0af5b93ca37709c2f0
```

https://github.com/openshift/machine-config-operator/pull/1066 - WIP, still need to test.

However, in my tests I can confirm that when masters are updated we do see:

```
$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED
master   rendered-master-a8f0ae377e3f2b0af5b93ca37709c2f0   False     True       False
```

As a brief update: I fixed the output issue in my PR. However, to track the progress of a new MC we'd recommend using $ oc get mcp and watching the UPDATING column. I believe Progressing, as reflected in the CVO, represents an operator version change, which adding a new MC does not do: https://github.com/openshift/machine-config-operator/blob/093e96ef4cdbd15ecda18323dadf4d552fcfd327/pkg/operator/status.go#L117

We are lacking clear documentation explaining this, and I will add a doc for this in a follow-on PR.

Verified on 4.2.0-0.nightly-2019-09-04-102339.
oc get co machine-config now indicates the correct machineconfig in the co/machine-config status.
$ cat file-drop.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: test-file
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf;base64,c2VydmVyIGZvby5leGFtcGxlLm5ldCBtYXhkZWxheSAwLjQgb2ZmbGluZQpzZXJ2ZXIgYmFyLmV4YW1wbGUubmV0IG1heGRlbGF5IDAuNCBvZmZsaW5lCnNlcnZlciBiYXouZXhhbXBsZS5uZXQgbWF4ZGVsYXkgMC40IG9mZmxpbmUK
        filesystem: root
        mode: 0644
        path: /etc/test
$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED
master   rendered-master-4fb70e93cdca2867233058ac88786760   True      False      False
worker   rendered-worker-be34dec52f9201d3ecaaa5874da6ccb6   True      False      False
$ oc get co machine-config -o yaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  creationTimestamp: "2019-09-04T17:13:44Z"
  generation: 1
  name: machine-config
  resourceVersion: "13923"
  selfLink: /apis/config.openshift.io/v1/clusteroperators/machine-config
  uid: 59c96d47-cf37-11e9-95b3-02d2aae96faa
spec: {}
status:
  conditions:
  - lastTransitionTime: "2019-09-04T17:14:28Z"
    message: Cluster has deployed 4.2.0-0.nightly-2019-09-04-102339
    status: "True"
    type: Available
  - lastTransitionTime: "2019-09-04T17:14:28Z"
    message: Cluster version is 4.2.0-0.nightly-2019-09-04-102339
    status: "False"
    type: Progressing
  - lastTransitionTime: "2019-09-04T17:13:44Z"
    status: "False"
    type: Degraded
  - lastTransitionTime: "2019-09-04T17:14:28Z"
    reason: AsExpected
    status: "True"
    type: Upgradeable
  extension:
    master: all 3 nodes are at latest configuration rendered-master-4fb70e93cdca2867233058ac88786760
    worker: all 3 nodes are at latest configuration rendered-worker-be34dec52f9201d3ecaaa5874da6ccb6
  relatedObjects:
  - group: ""
    name: openshift-machine-config-operator
    resource: namespaces
  - group: machineconfiguration.openshift.io
    name: master
    resource: machineconfigpools
  - group: machineconfiguration.openshift.io
    name: worker
    resource: machineconfigpools
  - group: machineconfiguration.openshift.io
    name: machine-config-controller
    resource: controllerconfigs
  versions:
  - name: operator
    version: 4.2.0-0.nightly-2019-09-04-102339
$ oc apply -f file-drop.yaml
machineconfig.machineconfiguration.openshift.io/test-file created
$ oc get mc
NAME                                                        GENERATEDBYCONTROLLER                      IGNITIONVERSION   CREATED
00-master                                                   3b375b425a3bf6ca4189206f8ea4c499376eb71c   2.2.0             13m
00-worker                                                   3b375b425a3bf6ca4189206f8ea4c499376eb71c   2.2.0             13m
01-master-container-runtime                                 3b375b425a3bf6ca4189206f8ea4c499376eb71c   2.2.0             13m
01-master-kubelet                                           3b375b425a3bf6ca4189206f8ea4c499376eb71c   2.2.0             13m
01-worker-container-runtime                                 3b375b425a3bf6ca4189206f8ea4c499376eb71c   2.2.0             13m
01-worker-kubelet                                           3b375b425a3bf6ca4189206f8ea4c499376eb71c   2.2.0             13m
99-master-5ce72f89-cf37-11e9-95b3-02d2aae96faa-registries   3b375b425a3bf6ca4189206f8ea4c499376eb71c   2.2.0             13m
99-master-ssh                                                                                          2.2.0             14m
99-worker-5ce8ba94-cf37-11e9-95b3-02d2aae96faa-registries   3b375b425a3bf6ca4189206f8ea4c499376eb71c   2.2.0             13m
99-worker-ssh                                                                                          2.2.0             14m
rendered-master-4fb70e93cdca2867233058ac88786760            3b375b425a3bf6ca4189206f8ea4c499376eb71c   2.2.0             13m
rendered-master-ab16e8a420ec8a6db44ab06d6e45ec93            3b375b425a3bf6ca4189206f8ea4c499376eb71c   2.2.0             0s
rendered-worker-be34dec52f9201d3ecaaa5874da6ccb6            3b375b425a3bf6ca4189206f8ea4c499376eb71c   2.2.0             13m
test-file                                                                                              2.2.0             5s
$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED
master   rendered-master-4fb70e93cdca2867233058ac88786760   False     True       False
worker   rendered-worker-be34dec52f9201d3ecaaa5874da6ccb6   True      False      False
$ oc get node
NAME                                         STATUS                     ROLES    AGE     VERSION
ip-10-0-133-250.us-west-2.compute.internal   Ready                      master   15m     v1.14.0+7fe5fb087
ip-10-0-133-56.us-west-2.compute.internal    Ready                      worker   8m38s   v1.14.0+7fe5fb087
ip-10-0-145-112.us-west-2.compute.internal   Ready                      master   15m     v1.14.0+7fe5fb087
ip-10-0-148-159.us-west-2.compute.internal   Ready                      worker   8m50s   v1.14.0+7fe5fb087
ip-10-0-169-149.us-west-2.compute.internal   Ready                      worker   8m42s   v1.14.0+7fe5fb087
ip-10-0-170-68.us-west-2.compute.internal    Ready,SchedulingDisabled   master   15m     v1.14.0+7fe5fb087
$ oc get co machine-config -o yaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  creationTimestamp: "2019-09-04T17:13:44Z"
  generation: 1
  name: machine-config
  resourceVersion: "16682"
  selfLink: /apis/config.openshift.io/v1/clusteroperators/machine-config
  uid: 59c96d47-cf37-11e9-95b3-02d2aae96faa
spec: {}
status:
  conditions:
  - lastTransitionTime: "2019-09-04T17:14:28Z"
    message: Cluster has deployed 4.2.0-0.nightly-2019-09-04-102339
    status: "True"
    type: Available
  - lastTransitionTime: "2019-09-04T17:14:28Z"
    message: Cluster version is 4.2.0-0.nightly-2019-09-04-102339
    status: "False"
    type: Progressing
  - lastTransitionTime: "2019-09-04T17:13:44Z"
    status: "False"
    type: Degraded
  - lastTransitionTime: "2019-09-04T17:14:28Z"
    reason: AsExpected
    status: "True"
    type: Upgradeable
  extension:
    lastSyncError: 'error pool master is not ready, retrying. Status: (pool degraded:
      false total: 3, ready 0, updated: 0, unavailable: 0)'
    master: 0 (ready 0) out of 3 nodes are updating to latest configuration rendered-master-ab16e8a420ec8a6db44ab06d6e45ec93
    worker: all 3 nodes are at latest configuration rendered-worker-be34dec52f9201d3ecaaa5874da6ccb6
  relatedObjects:
  - group: ""
    name: openshift-machine-config-operator
    resource: namespaces
  - group: machineconfiguration.openshift.io
    name: master
    resource: machineconfigpools
  - group: machineconfiguration.openshift.io
    name: worker
    resource: machineconfigpools
  - group: machineconfiguration.openshift.io
    name: machine-config-controller
    resource: controllerconfigs
  versions:
  - name: operator
    version: 4.2.0-0.nightly-2019-09-04-102339
$ oc get mcp master -o yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  creationTimestamp: "2019-09-04T17:13:50Z"
  generation: 3
  labels:
    machineconfiguration.openshift.io/mco-built-in: ""
    operator.machineconfiguration.openshift.io/required-for-upgrade: ""
  name: master
  resourceVersion: "16931"
  selfLink: /apis/machineconfiguration.openshift.io/v1/machineconfigpools/master
  uid: 5ce72f89-cf37-11e9-95b3-02d2aae96faa
spec:
  configuration:
    name: rendered-master-ab16e8a420ec8a6db44ab06d6e45ec93
    source:
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 00-master
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 01-master-container-runtime
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 01-master-kubelet
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 99-master-5ce72f89-cf37-11e9-95b3-02d2aae96faa-registries
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 99-master-ssh
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: test-file
  machineConfigSelector:
    matchLabels:
      machineconfiguration.openshift.io/role: master
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/master: ""
  paused: false
status:
  conditions:
  - lastTransitionTime: "2019-09-04T17:14:22Z"
    message: ""
    reason: ""
    status: "False"
    type: RenderDegraded
  - lastTransitionTime: "2019-09-04T17:14:27Z"
    message: ""
    reason: ""
    status: "False"
    type: NodeDegraded
  - lastTransitionTime: "2019-09-04T17:14:27Z"
    message: ""
    reason: ""
    status: "False"
    type: Degraded
  - lastTransitionTime: "2019-09-04T17:28:07Z"
    message: ""
    reason: ""
    status: "False"
    type: Updated
  - lastTransitionTime: "2019-09-04T17:28:07Z"
    message: All nodes are updating to rendered-master-ab16e8a420ec8a6db44ab06d6e45ec93
    reason: ""
    status: "True"
    type: Updating
  configuration:
    name: rendered-master-4fb70e93cdca2867233058ac88786760
    source:
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 00-master
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 01-master-container-runtime
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 01-master-kubelet
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 99-master-5ce72f89-cf37-11e9-95b3-02d2aae96faa-registries
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 99-master-ssh
  degradedMachineCount: 0
  machineCount: 3
  observedGeneration: 3
  readyMachineCount: 0
  unavailableMachineCount: 1
  updatedMachineCount: 0
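
The recommendation earlier in this thread, to track a new MC by watching the UPDATING column of oc get mcp, could be scripted roughly as follows. This is only a sketch: the helper names `parse_mcp_table` and `updating_pools` are made up for illustration, and it assumes the five-column, whitespace-separated table layout shown in the transcripts above with no empty cells. Other releases may print a different layout.

```python
def parse_mcp_table(text):
    """Parse the plain-text table printed by `oc get mcp` into a list of dicts.

    Assumes whitespace-separated columns with no empty cells, as in the
    output captured in this bug. Hypothetical helper, not part of any tool.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def updating_pools(text):
    """Return the names of pools whose UPDATING column reads True."""
    return [row["NAME"] for row in parse_mcp_table(text)
            if row.get("UPDATING") == "True"]


# Sample copied from the verification transcript above.
SAMPLE = """\
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED
master   rendered-master-4fb70e93cdca2867233058ac88786760   False     True       False
worker   rendered-worker-be34dec52f9201d3ecaaa5874da6ccb6   True      False      False
"""

if __name__ == "__main__":
    print(updating_pools(SAMPLE))
```

Feeding the function the text captured from a live oc get mcp run, rather than the pasted sample, gives the same kind of answer for a running cluster.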
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922
Description of problem:
The MCO is improperly reporting the status (both the state of the nodes and the target MachineConfig) in the extension field of its ClusterOperator.

Version-Release number of selected component (if applicable):
4.2.0-0.okd-2019-08-19-143649

How reproducible:
2/2

Steps to Reproduce:
1. Boot cluster
2. Create a MachineConfig (I only enabled FIPS in mine)

Actual results (and expected):
Before creating the extra MachineConfig, here's how the MachineConfigs looked:

$ oc get machineconfigs
NAME                                                        GENERATEDBYCONTROLLER                      IGNITIONVERSION   CREATED
00-master                                                   bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121m
00-worker                                                   bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121m
01-master-container-runtime                                 bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121m
01-master-kubelet                                           bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121m
01-worker-container-runtime                                 bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121m
01-worker-kubelet                                           bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121m
99-master-9e68df9c-c295-11e9-8f72-5254003ec71c-registries   bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121m
99-master-ssh                                                                                          2.2.0             121m
99-worker-9e6b6010-c295-11e9-8f72-5254003ec71c-registries   bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121m
99-worker-ssh                                                                                          2.2.0             121m
rendered-master-07a405ed1f29060653a45c236bf5fd6e            bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121m
rendered-worker-2fe2c50ca1b300646e59b6fad007d537            bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121

And after:

$ oc get machineconfigs
NAME                                                        GENERATEDBYCONTROLLER                      IGNITIONVERSION   CREATED
00-master                                                   bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121m
00-worker                                                   bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121m
01-master-container-runtime                                 bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121m
01-master-kubelet                                           bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121m
01-worker-container-runtime                                 bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121m
01-worker-kubelet                                           bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121m
99-master-9e68df9c-c295-11e9-8f72-5254003ec71c-registries   bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121m
99-master-ssh                                                                                          2.2.0             121m
99-worker-9e6b6010-c295-11e9-8f72-5254003ec71c-registries   bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121m
99-worker-fips                                                                                                           17s
99-worker-ssh                                                                                          2.2.0             121m
rendered-master-07a405ed1f29060653a45c236bf5fd6e            bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121m
rendered-worker-2fe2c50ca1b300646e59b6fad007d537            bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121m
rendered-worker-b3530b52778ea08aba671f744d5d536a            bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             12s

So far, so good. But despite there being a change, the MCO did not report that it was progressing:

$ oc get co machine-config
NAME             VERSION                         AVAILABLE   PROGRESSING   DEGRADED   SINCE
machine-config   4.2.0-0.okd-2019-08-19-143649   True        False         False      118m

And looking more closely at the ClusterOperator, I see the following:

$ oc get co machine-config -o yaml | jq .status
status:
  conditions:
  - lastTransitionTime: "2019-08-19T15:30:33Z"
    message: Cluster has deployed 4.2.0-0.okd-2019-08-19-143649
    status: "True"
    type: Available
  - lastTransitionTime: "2019-08-19T15:26:54Z"
    message: Cluster version is 4.2.0-0.okd-2019-08-19-143649
    status: "False"
    type: Progressing
  - lastTransitionTime: "2019-08-19T15:30:32Z"
    status: "False"
    type: Degraded
  - lastTransitionTime: "2019-08-19T15:26:54Z"
    reason: AsExpected
    status: "True"
    type: Upgradeable
  extension:
    master: all 3 nodes are at latest configuration rendered-master-07a405ed1f29060653a45c236bf5fd6e
    worker: 0 (ready 0) out of 2 nodes are updating to latest configuration rendered-worker-2fe2c50ca1b300646e59b6fad007d537

So it says that none of the machines are updating (even though I could see the machine was rebooting) and it says that it's progressing toward rendered-worker-2fe2c50ca1b300646e59b6fad007d537 (when it should have said rendered-worker-b3530b52778ea08aba671f744d5d536a).

After that first node came back, I checked again:

$ oc get co machine-config -o yaml | jq .status
status:
  conditions:
  - lastTransitionTime: "2019-08-19T15:30:33Z"
    message: Cluster has deployed 4.2.0-0.okd-2019-08-19-143649
    status: "True"
    type: Available
  - lastTransitionTime: "2019-08-19T15:26:54Z"
    message: Cluster version is 4.2.0-0.okd-2019-08-19-143649
    status: "False"
    type: Progressing
  - lastTransitionTime: "2019-08-19T15:30:32Z"
    status: "False"
    type: Degraded
  - lastTransitionTime: "2019-08-19T15:26:54Z"
    reason: AsExpected
    status: "True"
    type: Upgradeable
  extension:
    master: all 3 nodes are at latest configuration rendered-master-07a405ed1f29060653a45c236bf5fd6e
    worker: 1 (ready 1) out of 2 nodes are updating to latest configuration rendered-worker-2fe2c50ca1b300646e59b6fad007d537

So again, it still says that it's progressing to rendered-worker-2fe2c50ca1b300646e59b6fad007d537.
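
The mismatch described above (the extension message naming the old rendered config rather than the pool's new target) can be checked mechanically. The following is a minimal sketch, not the MCO's own logic: the function names are hypothetical, and the regular expression is written only against the two message shapes quoted in this report, so other releases may word the message differently.

```python
import re

# Matches the two extension message shapes seen in this report:
#   "all 3 nodes are at latest configuration <name>"
#   "0 (ready 0) out of 2 nodes are updating to latest configuration <name>"
EXTENSION_RE = re.compile(
    r"^(?:all (?P<all>\d+) nodes are at"
    r"|(?P<updated>\d+) \(ready (?P<ready>\d+)\) out of (?P<total>\d+)"
    r" nodes are updating to)"
    r" latest configuration (?P<config>\S+)$"
)


def parse_extension(message):
    """Split an extension message into node counts and the named config."""
    m = EXTENSION_RE.match(message.strip())
    if m is None:
        raise ValueError("unrecognized extension message: %r" % message)
    if m.group("all") is not None:
        # The message gives only a total here; treating every node as
        # updated and ready is this sketch's interpretation.
        total = int(m.group("all"))
        return {"updated": total, "ready": total, "total": total,
                "config": m.group("config")}
    return {"updated": int(m.group("updated")),
            "ready": int(m.group("ready")),
            "total": int(m.group("total")),
            "config": m.group("config")}


def reports_expected_target(message, expected_config):
    """True if the message names the config the pool is expected to reach."""
    return parse_extension(message)["config"] == expected_config


if __name__ == "__main__":
    reported = ("0 (ready 0) out of 2 nodes are updating to latest "
                "configuration rendered-worker-2fe2c50ca1b300646e59b6fad007d537")
    print(reports_expected_target(
        reported, "rendered-worker-b3530b52778ea08aba671f744d5d536a"))
```

In practice the expected name would come from the pool's spec.configuration.name (as printed by oc get mcp worker -o yaml) and the message from the ClusterOperator's status.extension.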