Bug 1743361 - Incorrect reporting in ClusterOperator
Summary: Incorrect reporting in ClusterOperator
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.1.z
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.2.0
Assignee: Kirsten Garrison
QA Contact: Micah Abbott
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-08-19 17:56 UTC by Alex Crawford
Modified: 2019-10-16 06:37 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-16 06:36:35 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github openshift machine-config-operator pull 1066 0 None closed Bug 1743361: fix updating version output 2020-08-12 01:36:43 UTC
Red Hat Product Errata RHBA-2019:2922 0 None None None 2019-10-16 06:37:22 UTC

Description Alex Crawford 2019-08-19 17:56:26 UTC
Description of problem: The MCO is improperly reporting the status (both the state of the nodes and the target MachineConfig) in the extension field of its ClusterOperator.


Version-Release number of selected component (if applicable): 4.2.0-0.okd-2019-08-19-143649


How reproducible: 2/2


Steps to Reproduce:
1. Boot cluster
2. Create a MachineConfig (I only enabled FIPS in mine)

Actual results (and expected):

Before creating the extra MachineConfig, here's how the MachineConfigs looked:

    $ oc get machineconfigs
    NAME                                                        GENERATEDBYCONTROLLER                      IGNITIONVERSION   CREATED
    00-master                                                   bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121m
    00-worker                                                   bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121m
    01-master-container-runtime                                 bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121m
    01-master-kubelet                                           bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121m
    01-worker-container-runtime                                 bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121m
    01-worker-kubelet                                           bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121m
    99-master-9e68df9c-c295-11e9-8f72-5254003ec71c-registries   bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121m
    99-master-ssh                                                                                          2.2.0             121m
    99-worker-9e6b6010-c295-11e9-8f72-5254003ec71c-registries   bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121m
    99-worker-ssh                                                                                          2.2.0             121m
    rendered-master-07a405ed1f29060653a45c236bf5fd6e            bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121m
    rendered-worker-2fe2c50ca1b300646e59b6fad007d537            bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121m

And after:

    $ oc get machineconfigs
    NAME                                                        GENERATEDBYCONTROLLER                      IGNITIONVERSION   CREATED
    00-master                                                   bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121m
    00-worker                                                   bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121m
    01-master-container-runtime                                 bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121m
    01-master-kubelet                                           bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121m
    01-worker-container-runtime                                 bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121m
    01-worker-kubelet                                           bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121m
    99-master-9e68df9c-c295-11e9-8f72-5254003ec71c-registries   bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121m
    99-master-ssh                                                                                          2.2.0             121m
    99-worker-9e6b6010-c295-11e9-8f72-5254003ec71c-registries   bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121m
    99-worker-fips                                                                                                           17s
    99-worker-ssh                                                                                          2.2.0             121m
    rendered-master-07a405ed1f29060653a45c236bf5fd6e            bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121m
    rendered-worker-2fe2c50ca1b300646e59b6fad007d537            bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             121m
    rendered-worker-b3530b52778ea08aba671f744d5d536a            bea1c28437c25c31bc02b87d39004ac0479b5395   2.2.0             12s

So far, so good. But despite there being a change, the MCO did not report that it was progressing:

    $ oc get co machine-config
    NAME                                       VERSION                         AVAILABLE   PROGRESSING   DEGRADED   SINCE
    machine-config                             4.2.0-0.okd-2019-08-19-143649   True        False         False      118m

And looking more closely at the ClusterOperator, I see the following:

    $ oc get co machine-config -o yaml | jq .status
    status:
      conditions:
      - lastTransitionTime: "2019-08-19T15:30:33Z"
        message: Cluster has deployed 4.2.0-0.okd-2019-08-19-143649
        status: "True"
        type: Available
      - lastTransitionTime: "2019-08-19T15:26:54Z"
        message: Cluster version is 4.2.0-0.okd-2019-08-19-143649
        status: "False"
        type: Progressing
      - lastTransitionTime: "2019-08-19T15:30:32Z"
        status: "False"
        type: Degraded
      - lastTransitionTime: "2019-08-19T15:26:54Z"
        reason: AsExpected
        status: "True"
        type: Upgradeable
      extension:
        master: all 3 nodes are at latest configuration rendered-master-07a405ed1f29060653a45c236bf5fd6e
        worker: 0 (ready 0) out of 2 nodes are updating to latest configuration rendered-worker-2fe2c50ca1b300646e59b6fad007d537

So it says that none of the machines are updating, even though I could see a machine rebooting. It also says that it's progressing toward rendered-worker-2fe2c50ca1b300646e59b6fad007d537, when it should have said rendered-worker-b3530b52778ea08aba671f744d5d536a. After that first node came back, I checked again:

    $ oc get co machine-config -o yaml | jq .status
    status:
      conditions:
      - lastTransitionTime: "2019-08-19T15:30:33Z"
        message: Cluster has deployed 4.2.0-0.okd-2019-08-19-143649
        status: "True"
        type: Available
      - lastTransitionTime: "2019-08-19T15:26:54Z"
        message: Cluster version is 4.2.0-0.okd-2019-08-19-143649
        status: "False"
        type: Progressing
      - lastTransitionTime: "2019-08-19T15:30:32Z"
        status: "False"
        type: Degraded
      - lastTransitionTime: "2019-08-19T15:26:54Z"
        reason: AsExpected
        status: "True"
        type: Upgradeable
      extension:
        master: all 3 nodes are at latest configuration rendered-master-07a405ed1f29060653a45c236bf5fd6e
        worker: 1 (ready 1) out of 2 nodes are updating to latest configuration rendered-worker-2fe2c50ca1b300646e59b6fad007d537

So it still says that it's progressing to rendered-worker-2fe2c50ca1b300646e59b6fad007d537, the old configuration.
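
The mismatch described above can be checked mechanically. The following is a minimal Python sketch, not part of the MCO: it assumes the extension message always ends with the rendered config name, as in the output above, and the helper name `reported_target` is hypothetical.

```python
import re

# Hypothetical helper: pull the rendered config name out of an extension
# message. Assumes the name is the last whitespace-delimited token.
def reported_target(message: str) -> str:
    match = re.search(r"(rendered-\S+)\s*$", message)
    if match is None:
        raise ValueError(f"no rendered config name in: {message!r}")
    return match.group(1)

# Values copied from this report.
worker_message = (
    "0 (ready 0) out of 2 nodes are updating to latest configuration "
    "rendered-worker-2fe2c50ca1b300646e59b6fad007d537"
)
expected_target = "rendered-worker-b3530b52778ea08aba671f744d5d536a"

# True when the extension field names something other than the new config.
stale = reported_target(worker_message) != expected_target
print("extension reports a stale target:", stale)
```

With the values from this report, the extension names the old rendered config, so the check flags it as stale.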

Comment 1 Kirsten Garrison 2019-08-19 18:08:08 UTC
Do you happen to have the must-gather for this cluster?

Comment 2 Kirsten Garrison 2019-08-19 19:14:50 UTC
Or the MCD logs?

Comment 3 Kirsten Garrison 2019-08-19 22:07:11 UTC
I believe that I've reproduced this issue locally with a non-FIPS MC...

Will do more testing and dig into this further.

Comment 4 Kirsten Garrison 2019-08-19 23:06:11 UTC
Can confirm that this affects masters as well:
```
   master: 0 (ready 0) out of 3 nodes are updating to latest configuration rendered-master-a8f0ae377e3f2b0af5b93ca37709c2f0
```

Comment 5 Kirsten Garrison 2019-08-19 23:07:26 UTC
https://github.com/openshift/machine-config-operator/pull/1066 - WIP, still need to test

Comment 6 Kirsten Garrison 2019-08-19 23:35:19 UTC
However, in my tests I can confirm that when masters are updated, we do see:

```
$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED
master   rendered-master-a8f0ae377e3f2b0af5b93ca37709c2f0   False     True       False
```

Comment 7 Kirsten Garrison 2019-08-20 23:43:27 UTC
A brief update:
I fixed the output issue in my PR.

However, to track the progress of a new MC, we'd recommend running
$ oc get mcp

and watching the UPDATING column.

I believe Progressing, as reflected in the CVO, represents an operator version change, which adding a new MC does not cause:

https://github.com/openshift/machine-config-operator/blob/093e96ef4cdbd15ecda18323dadf4d552fcfd327/pkg/operator/status.go#L117


We lack clear documentation explaining this, and I will add a doc for it in a follow-on PR.
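
For anyone scripting the `oc get mcp` check recommended here, a minimal Python sketch follows. It is illustrative only: the function name `updating_pools` is hypothetical, the sample text is the output from Comment 6, and the whitespace-based parsing assumes every column is populated.

```python
from typing import List

# Sample `oc get mcp` output copied from Comment 6.
SAMPLE = """\
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED
master   rendered-master-a8f0ae377e3f2b0af5b93ca37709c2f0   False     True       False
"""

def updating_pools(table: str) -> List[str]:
    """Return the names of pools whose UPDATING column reads True."""
    lines = [line for line in table.splitlines() if line.strip()]
    header = lines[0].split()
    name_idx = header.index("NAME")
    updating_idx = header.index("UPDATING")
    pools = []
    for line in lines[1:]:
        fields = line.split()
        if fields[updating_idx] == "True":
            pools.append(fields[name_idx])
    return pools

print(updating_pools(SAMPLE))
```

On a live cluster the table text could come from something like `subprocess.run(["oc", "get", "mcp"], capture_output=True, text=True).stdout`. Requesting structured output and reading the pool conditions would likely be more robust than splitting columns.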

Comment 9 Michael Nguyen 2019-09-04 17:34:26 UTC
Verified on 4.2.0-0.nightly-2019-09-04-102339.

oc get co machine-config now indicates the correct machineconfig in the co/machine-config status.


$ cat file-drop.yaml 
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: test-file
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf;base64,c2VydmVyIGZvby5leGFtcGxlLm5ldCBtYXhkZWxheSAwLjQgb2ZmbGluZQpzZXJ2ZXIgYmFyLmV4YW1wbGUubmV0IG1heGRlbGF5IDAuNCBvZmZsaW5lCnNlcnZlciBiYXouZXhhbXBsZS5uZXQgbWF4ZGVsYXkgMC40IG9mZmxpbmUK
        filesystem: root
        mode: 0644
        path: /etc/test
$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED
master   rendered-master-4fb70e93cdca2867233058ac88786760   True      False      False
worker   rendered-worker-be34dec52f9201d3ecaaa5874da6ccb6   True      False      False
$ oc get co machine-config -o yaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  creationTimestamp: "2019-09-04T17:13:44Z"
  generation: 1
  name: machine-config
  resourceVersion: "13923"
  selfLink: /apis/config.openshift.io/v1/clusteroperators/machine-config
  uid: 59c96d47-cf37-11e9-95b3-02d2aae96faa
spec: {}
status:
  conditions:
  - lastTransitionTime: "2019-09-04T17:14:28Z"
    message: Cluster has deployed 4.2.0-0.nightly-2019-09-04-102339
    status: "True"
    type: Available
  - lastTransitionTime: "2019-09-04T17:14:28Z"
    message: Cluster version is 4.2.0-0.nightly-2019-09-04-102339
    status: "False"
    type: Progressing
  - lastTransitionTime: "2019-09-04T17:13:44Z"
    status: "False"
    type: Degraded
  - lastTransitionTime: "2019-09-04T17:14:28Z"
    reason: AsExpected
    status: "True"
    type: Upgradeable
  extension:
    master: all 3 nodes are at latest configuration rendered-master-4fb70e93cdca2867233058ac88786760
    worker: all 3 nodes are at latest configuration rendered-worker-be34dec52f9201d3ecaaa5874da6ccb6
  relatedObjects:
  - group: ""
    name: openshift-machine-config-operator
    resource: namespaces
  - group: machineconfiguration.openshift.io
    name: master
    resource: machineconfigpools
  - group: machineconfiguration.openshift.io
    name: worker
    resource: machineconfigpools
  - group: machineconfiguration.openshift.io
    name: machine-config-controller
    resource: controllerconfigs
  versions:
  - name: operator
    version: 4.2.0-0.nightly-2019-09-04-102339
$ oc apply -f file-drop.yaml 
machineconfig.machineconfiguration.openshift.io/test-file created
$ oc get mc
NAME                                                        GENERATEDBYCONTROLLER                      IGNITIONVERSION   CREATED
00-master                                                   3b375b425a3bf6ca4189206f8ea4c499376eb71c   2.2.0             13m
00-worker                                                   3b375b425a3bf6ca4189206f8ea4c499376eb71c   2.2.0             13m
01-master-container-runtime                                 3b375b425a3bf6ca4189206f8ea4c499376eb71c   2.2.0             13m
01-master-kubelet                                           3b375b425a3bf6ca4189206f8ea4c499376eb71c   2.2.0             13m
01-worker-container-runtime                                 3b375b425a3bf6ca4189206f8ea4c499376eb71c   2.2.0             13m
01-worker-kubelet                                           3b375b425a3bf6ca4189206f8ea4c499376eb71c   2.2.0             13m
99-master-5ce72f89-cf37-11e9-95b3-02d2aae96faa-registries   3b375b425a3bf6ca4189206f8ea4c499376eb71c   2.2.0             13m
99-master-ssh                                                                                          2.2.0             14m
99-worker-5ce8ba94-cf37-11e9-95b3-02d2aae96faa-registries   3b375b425a3bf6ca4189206f8ea4c499376eb71c   2.2.0             13m
99-worker-ssh                                                                                          2.2.0             14m
rendered-master-4fb70e93cdca2867233058ac88786760            3b375b425a3bf6ca4189206f8ea4c499376eb71c   2.2.0             13m
rendered-master-ab16e8a420ec8a6db44ab06d6e45ec93            3b375b425a3bf6ca4189206f8ea4c499376eb71c   2.2.0             0s
rendered-worker-be34dec52f9201d3ecaaa5874da6ccb6            3b375b425a3bf6ca4189206f8ea4c499376eb71c   2.2.0             13m
test-file                                                                                              2.2.0             5s
$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED
master   rendered-master-4fb70e93cdca2867233058ac88786760   False     True       False
worker   rendered-worker-be34dec52f9201d3ecaaa5874da6ccb6   True      False      False
$ oc get node
NAME                                         STATUS                     ROLES    AGE     VERSION
ip-10-0-133-250.us-west-2.compute.internal   Ready                      master   15m     v1.14.0+7fe5fb087
ip-10-0-133-56.us-west-2.compute.internal    Ready                      worker   8m38s   v1.14.0+7fe5fb087
ip-10-0-145-112.us-west-2.compute.internal   Ready                      master   15m     v1.14.0+7fe5fb087
ip-10-0-148-159.us-west-2.compute.internal   Ready                      worker   8m50s   v1.14.0+7fe5fb087
ip-10-0-169-149.us-west-2.compute.internal   Ready                      worker   8m42s   v1.14.0+7fe5fb087
ip-10-0-170-68.us-west-2.compute.internal    Ready,SchedulingDisabled   master   15m     v1.14.0+7fe5fb087
$ oc get co machine-config -o yaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  creationTimestamp: "2019-09-04T17:13:44Z"
  generation: 1
  name: machine-config
  resourceVersion: "16682"
  selfLink: /apis/config.openshift.io/v1/clusteroperators/machine-config
  uid: 59c96d47-cf37-11e9-95b3-02d2aae96faa
spec: {}
status:
  conditions:
  - lastTransitionTime: "2019-09-04T17:14:28Z"
    message: Cluster has deployed 4.2.0-0.nightly-2019-09-04-102339
    status: "True"
    type: Available
  - lastTransitionTime: "2019-09-04T17:14:28Z"
    message: Cluster version is 4.2.0-0.nightly-2019-09-04-102339
    status: "False"
    type: Progressing
  - lastTransitionTime: "2019-09-04T17:13:44Z"
    status: "False"
    type: Degraded
  - lastTransitionTime: "2019-09-04T17:14:28Z"
    reason: AsExpected
    status: "True"
    type: Upgradeable
  extension:
    lastSyncError: 'error pool master is not ready, retrying. Status: (pool degraded:
      false total: 3, ready 0, updated: 0, unavailable: 0)'
    master: 0 (ready 0) out of 3 nodes are updating to latest configuration rendered-master-ab16e8a420ec8a6db44ab06d6e45ec93
    worker: all 3 nodes are at latest configuration rendered-worker-be34dec52f9201d3ecaaa5874da6ccb6
  relatedObjects:
  - group: ""
    name: openshift-machine-config-operator
    resource: namespaces
  - group: machineconfiguration.openshift.io
    name: master
    resource: machineconfigpools
  - group: machineconfiguration.openshift.io
    name: worker
    resource: machineconfigpools
  - group: machineconfiguration.openshift.io
    name: machine-config-controller
    resource: controllerconfigs
  versions:
  - name: operator
    version: 4.2.0-0.nightly-2019-09-04-102339

$ oc get mcp master -o yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  creationTimestamp: "2019-09-04T17:13:50Z"
  generation: 3
  labels:
    machineconfiguration.openshift.io/mco-built-in: ""
    operator.machineconfiguration.openshift.io/required-for-upgrade: ""
  name: master
  resourceVersion: "16931"
  selfLink: /apis/machineconfiguration.openshift.io/v1/machineconfigpools/master
  uid: 5ce72f89-cf37-11e9-95b3-02d2aae96faa
spec:
  configuration:
    name: rendered-master-ab16e8a420ec8a6db44ab06d6e45ec93
    source:
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 00-master
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 01-master-container-runtime
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 01-master-kubelet
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 99-master-5ce72f89-cf37-11e9-95b3-02d2aae96faa-registries
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 99-master-ssh
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: test-file
  machineConfigSelector:
    matchLabels:
      machineconfiguration.openshift.io/role: master
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/master: ""
  paused: false
status:
  conditions:
  - lastTransitionTime: "2019-09-04T17:14:22Z"
    message: ""
    reason: ""
    status: "False"
    type: RenderDegraded
  - lastTransitionTime: "2019-09-04T17:14:27Z"
    message: ""
    reason: ""
    status: "False"
    type: NodeDegraded
  - lastTransitionTime: "2019-09-04T17:14:27Z"
    message: ""
    reason: ""
    status: "False"
    type: Degraded
  - lastTransitionTime: "2019-09-04T17:28:07Z"
    message: ""
    reason: ""
    status: "False"
    type: Updated
  - lastTransitionTime: "2019-09-04T17:28:07Z"
    message: All nodes are updating to rendered-master-ab16e8a420ec8a6db44ab06d6e45ec93
    reason: ""
    status: "True"
    type: Updating
  configuration:
    name: rendered-master-4fb70e93cdca2867233058ac88786760
    source:
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 00-master
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 01-master-container-runtime
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 01-master-kubelet
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 99-master-5ce72f89-cf37-11e9-95b3-02d2aae96faa-registries
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 99-master-ssh
  degradedMachineCount: 0
  machineCount: 3
  observedGeneration: 3
  readyMachineCount: 0
  unavailableMachineCount: 1
  updatedMachineCount: 0
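
The two message shapes seen in the extension field can be related to the pool fields dumped above. The following Python sketch is an inference from the output in this report, not the operator's actual code: the function name `pool_message` is hypothetical, and mapping the first number to `updatedMachineCount` is an assumption. It reads the target from `spec.configuration.name`, which in the dump above holds the new rendered config while `status.configuration.name` still holds the previous one.

```python
# Hypothetical sketch: build the extension message for one pool from its
# MachineConfigPool fields. Field names match the YAML above.
def pool_message(pool: dict) -> str:
    target = pool["spec"]["configuration"]["name"]  # desired rendered config
    status = pool["status"]
    total = status["machineCount"]
    updated = status["updatedMachineCount"]
    ready = status["readyMachineCount"]
    if updated == total:
        return f"all {total} nodes are at latest configuration {target}"
    return (
        f"{updated} (ready {ready}) out of {total} nodes are updating "
        f"to latest configuration {target}"
    )

# Values copied from the master pool dump above.
master_pool = {
    "spec": {
        "configuration": {"name": "rendered-master-ab16e8a420ec8a6db44ab06d6e45ec93"}
    },
    "status": {
        "configuration": {"name": "rendered-master-4fb70e93cdca2867233058ac88786760"},
        "machineCount": 3,
        "readyMachineCount": 0,
        "updatedMachineCount": 0,
    },
}

print(pool_message(master_pool))
```

With these values the sketch reproduces the `master:` line shown in the ClusterOperator extension after the fix.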

Comment 10 errata-xmlrpc 2019-10-16 06:36:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

