Bug 1869876 - Cluster upgrades silently stall when machine config pools are paused
Summary: Cluster upgrades silently stall when machine config pools are paused
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.6.0
Assignee: Antonio Murdaca
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-08-18 20:58 UTC by rvanderp
Modified: 2020-10-27 16:29 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 16:29:05 UTC
Target Upstream Version:


Links
System ID Private Priority Status Summary Last Updated
Github openshift machine-config-operator pull 2009 0 None closed Bug 1869876: Cluster upgrades silently stall when machine config pools are paused 2021-02-18 06:17:59 UTC
Red Hat Product Errata RHBA-2020:4196 0 None None None 2020-10-27 16:29:38 UTC

Description rvanderp 2020-08-18 20:58:42 UTC
Description of problem:
When performing a cluster upgrade with machine config pools configured as paused, the upgrade stalls while trying to upgrade the MCO. Nothing in the events or logs associated with the MCO indicates why the upgrade is not progressing.

Version-Release number of selected component (if applicable):
4.6

How reproducible:
On demand 

Steps to Reproduce:
1. Configure the `worker` machine config pool to be 'Paused'
2. Update the `99-worker-ssh` machine config
3. Wait for the machine config pool to progress to 'Updating=true'
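
Step 1 can be scripted. The following is a minimal Python sketch, assuming the `oc` client is installed and logged in to the cluster; the helper name `pause_pool_command` is hypothetical and not part of the MCO. It only builds the `oc patch` invocation that sets `spec.paused` on a pool, so the command can be inspected before it is run.

```python
import json
import shlex
from typing import List


def pause_pool_command(pool: str, paused: bool = True) -> List[str]:
    """Build an `oc patch` command that sets spec.paused on a MachineConfigPool."""
    # A JSON merge patch changes only the field named here.
    patch = json.dumps({"spec": {"paused": paused}})
    return ["oc", "patch", "machineconfigpool", pool,
            "--type", "merge", "--patch", patch]


if __name__ == "__main__":
    # Print the shell-quoted command instead of running it.
    # To apply it, pass the list to subprocess.run(..., check=True).
    print(shlex.join(pause_pool_command("worker")))
```

Calling `pause_pool_command("worker", paused=False)` builds the matching command to unpause the pool again.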

Actual results:
Machine config pool transitions to 'Updating=true' but never completes. Neither the machine-config-controller logs nor the events in the openshift-machine-config-operator namespace give any indication of this condition.

Expected results:
machine-config-controller should communicate that the 'Updating' state will never progress to completion while the pool is paused.
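
Until the controller reports this itself, the condition can be checked from outside. The following is a minimal Python sketch, not the MCO's own logic: it takes the parsed output of `oc get mcp -o json` and flags pools that are paused while the desired rendered config (`spec.configuration.name`) differs from the applied one (`status.configuration.name`). The function name and the sample data are hypothetical, and the sketch assumes the render controller keeps updating `spec.configuration.name` while a pool is paused.

```python
from typing import List


def stalled_paused_pools(pool_list: dict) -> List[str]:
    """Return names of pools that are paused while an update is pending."""
    stalled = []
    for pool in pool_list.get("items", []):
        spec = pool.get("spec", {})
        status = pool.get("status", {})
        desired = spec.get("configuration", {}).get("name")
        applied = status.get("configuration", {}).get("name")
        # Paused plus a pending config change means the update cannot finish.
        if spec.get("paused", False) and desired != applied:
            stalled.append(pool["metadata"]["name"])
    return stalled


# Hand-written sample (an assumption, not captured cluster output).
SAMPLE = {
    "items": [
        {
            "metadata": {"name": "master"},
            "spec": {"paused": False, "configuration": {"name": "rendered-master-a"}},
            "status": {"configuration": {"name": "rendered-master-a"}},
        },
        {
            "metadata": {"name": "worker"},
            "spec": {"paused": True, "configuration": {"name": "rendered-worker-b"}},
            "status": {"configuration": {"name": "rendered-worker-a"}},
        },
    ]
}

if __name__ == "__main__":
    for name in stalled_paused_pools(SAMPLE):
        print(f"Pool {name} is paused and has a pending update")
```

On a live cluster the input could come from `json.loads(subprocess.check_output(["oc", "get", "mcp", "-o", "json"]))`.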

Additional info:

Comment 3 Antonio Murdaca 2020-08-25 11:05:16 UTC
Tentatively targeting 4.6, but given the low priority we might push this forward to a z release or 4.7.

Thanks for helping on this, much appreciated!

Comment 6 Michael Nguyen 2020-09-25 14:28:02 UTC
Verified on 4.6.0-0.nightly-2020-09-25-085318

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-09-25-085318   True        False         62m     Cluster version is 4.6.0-0.nightly-2020-09-25-085318

$ oc describe mcp/worker
Name:         worker
Namespace:    
Labels:       machineconfiguration.openshift.io/mco-built-in=
              pools.operator.machineconfiguration.openshift.io/worker=
Annotations:  <none>
API Version:  machineconfiguration.openshift.io/v1
Kind:         MachineConfigPool
Metadata:
  Creation Timestamp:  2020-09-25T12:42:13Z
  Generation:          3
  Managed Fields:
    API Version:  machineconfiguration.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .:
          f:machineconfiguration.openshift.io/mco-built-in:
          f:pools.operator.machineconfiguration.openshift.io/worker:
      f:spec:
        .:
        f:configuration:
        f:machineConfigSelector:
          .:
          f:matchLabels:
            .:
            f:machineconfiguration.openshift.io/role:
        f:nodeSelector:
          .:
          f:matchLabels:
            .:
            f:node-role.kubernetes.io/worker:
        f:paused:
    Manager:      machine-config-operator
    Operation:    Update
    Time:         2020-09-25T12:42:13Z
    API Version:  machineconfiguration.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        f:configuration:
          f:name:
          f:source:
      f:status:
        .:
        f:conditions:
        f:configuration:
          .:
          f:name:
          f:source:
        f:degradedMachineCount:
        f:machineCount:
        f:observedGeneration:
        f:readyMachineCount:
        f:unavailableMachineCount:
        f:updatedMachineCount:
    Manager:         machine-config-controller
    Operation:       Update
    Time:            2020-09-25T12:54:29Z
  Resource Version:  27091
  Self Link:         /apis/machineconfiguration.openshift.io/v1/machineconfigpools/worker
  UID:               0ef14384-761e-43fb-abe0-ebd88d2302dd
Spec:
  Configuration:
    Name:  rendered-worker-0825ccbd713febb2260f339945806b66
    Source:
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         00-worker
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-worker-container-runtime
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-worker-kubelet
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-generated-registries
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-ssh
  Machine Config Selector:
    Match Labels:
      machineconfiguration.openshift.io/role:  worker
  Node Selector:
    Match Labels:
      node-role.kubernetes.io/worker:  
  Paused:                              false
Status:
  Conditions:
    Last Transition Time:  2020-09-25T12:44:22Z
    Message:               
    Reason:                
    Status:                False
    Type:                  NodeDegraded
    Last Transition Time:  2020-09-25T12:44:22Z
    Message:               
    Reason:                
    Status:                False
    Type:                  Degraded
    Last Transition Time:  2020-09-25T12:44:33Z
    Message:               
    Reason:                
    Status:                False
    Type:                  RenderDegraded
    Last Transition Time:  2020-09-25T12:54:29Z
    Message:               All nodes are updated with rendered-worker-0825ccbd713febb2260f339945806b66
    Reason:                
    Status:                True
    Type:                  Updated
    Last Transition Time:  2020-09-25T12:54:29Z
    Message:               
    Reason:                
    Status:                False
    Type:                  Updating
  Configuration:
    Name:  rendered-worker-0825ccbd713febb2260f339945806b66
    Source:
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   00-worker
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   01-worker-container-runtime
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   01-worker-kubelet
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   99-worker-generated-registries
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   99-worker-ssh
  Degraded Machine Count:     0
  Machine Count:              3
  Observed Generation:        3
  Ready Machine Count:        3
  Unavailable Machine Count:  0
  Updated Machine Count:      3
Events:                       <none>

$ oc edit mcp/worker
machineconfigpool.machineconfiguration.openshift.io/worker edited
$ oc describe mcp/worker
Name:         worker
Namespace:    
Labels:       machineconfiguration.openshift.io/mco-built-in=
              pools.operator.machineconfiguration.openshift.io/worker=
Annotations:  <none>
API Version:  machineconfiguration.openshift.io/v1
Kind:         MachineConfigPool
Metadata:
  Creation Timestamp:  2020-09-25T12:42:13Z
  Generation:          4
  Managed Fields:
    API Version:  machineconfiguration.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .:
          f:machineconfiguration.openshift.io/mco-built-in:
          f:pools.operator.machineconfiguration.openshift.io/worker:
      f:spec:
        .:
        f:configuration:
        f:machineConfigSelector:
          .:
          f:matchLabels:
            .:
            f:machineconfiguration.openshift.io/role:
        f:nodeSelector:
          .:
          f:matchLabels:
            .:
            f:node-role.kubernetes.io/worker:
    Manager:      machine-config-operator
    Operation:    Update
    Time:         2020-09-25T12:42:13Z
    API Version:  machineconfiguration.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        f:configuration:
          f:name:
          f:source:
      f:status:
        .:
        f:conditions:
        f:configuration:
          .:
          f:name:
          f:source:
        f:degradedMachineCount:
        f:machineCount:
        f:observedGeneration:
        f:readyMachineCount:
        f:unavailableMachineCount:
        f:updatedMachineCount:
    Manager:      machine-config-controller
    Operation:    Update
    Time:         2020-09-25T12:54:29Z
    API Version:  machineconfiguration.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        f:paused:
    Manager:         oc
    Operation:       Update
    Time:            2020-09-25T14:13:37Z
  Resource Version:  93236
  Self Link:         /apis/machineconfiguration.openshift.io/v1/machineconfigpools/worker
  UID:               0ef14384-761e-43fb-abe0-ebd88d2302dd
Spec:
  Configuration:
    Name:  rendered-worker-0825ccbd713febb2260f339945806b66
    Source:
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         00-worker
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-worker-container-runtime
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-worker-kubelet
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-generated-registries
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-ssh
  Machine Config Selector:
    Match Labels:
      machineconfiguration.openshift.io/role:  worker
  Node Selector:
    Match Labels:
      node-role.kubernetes.io/worker:  
  Paused:                              true
Status:
  Conditions:
    Last Transition Time:  2020-09-25T12:44:22Z
    Message:               
    Reason:                
    Status:                False
    Type:                  NodeDegraded
    Last Transition Time:  2020-09-25T12:44:22Z
    Message:               
    Reason:                
    Status:                False
    Type:                  Degraded
    Last Transition Time:  2020-09-25T12:44:33Z
    Message:               
    Reason:                
    Status:                False
    Type:                  RenderDegraded
    Last Transition Time:  2020-09-25T12:54:29Z
    Message:               All nodes are updated with rendered-worker-0825ccbd713febb2260f339945806b66
    Reason:                
    Status:                True
    Type:                  Updated
    Last Transition Time:  2020-09-25T12:54:29Z
    Message:               
    Reason:                
    Status:                False
    Type:                  Updating
  Configuration:
    Name:  rendered-worker-0825ccbd713febb2260f339945806b66
    Source:
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   00-worker
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   01-worker-container-runtime
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   01-worker-kubelet
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   99-worker-generated-registries
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   99-worker-ssh
  Degraded Machine Count:     0
  Machine Count:              3
  Observed Generation:        3
  Ready Machine Count:        3
  Unavailable Machine Count:  0
  Updated Machine Count:      3
Events:                       <none>
$ cp ../file.yaml .
$ cat file.yaml 
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: test-file
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf;base64,c2VydmVyIGZvby5leGFtcGxlLm5ldCBtYXhkZWxheSAwLjQgb2ZmbGluZQpzZXJ2ZXIgYmFyLmV4YW1wbGUubmV0IG1heGRlbGF5IDAuNCBvZmZsaW5lCnNlcnZlciBiYXouZXhhbXBsZS5uZXQgbWF4ZGVsYXkgMC40IG9mZmxpbmUK
        filesystem: root
        mode: 0644
        path: /etc/test

$ oc create -f file.yaml
machineconfig.machineconfiguration.openshift.io/test-file created
$ oc get mc
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          a3c9532c8e8f2efe9b0f739fbd761b32cc0bfa2b   3.1.0             92m
00-worker                                          a3c9532c8e8f2efe9b0f739fbd761b32cc0bfa2b   3.1.0             92m
01-master-container-runtime                        a3c9532c8e8f2efe9b0f739fbd761b32cc0bfa2b   3.1.0             92m
01-master-kubelet                                  a3c9532c8e8f2efe9b0f739fbd761b32cc0bfa2b   3.1.0             92m
01-worker-container-runtime                        a3c9532c8e8f2efe9b0f739fbd761b32cc0bfa2b   3.1.0             92m
01-worker-kubelet                                  a3c9532c8e8f2efe9b0f739fbd761b32cc0bfa2b   3.1.0             92m
99-master-generated-registries                     a3c9532c8e8f2efe9b0f739fbd761b32cc0bfa2b   3.1.0             92m
99-master-ssh                                                                                 3.1.0             102m
99-worker-generated-registries                     a3c9532c8e8f2efe9b0f739fbd761b32cc0bfa2b   3.1.0             92m
99-worker-ssh                                                                                 3.1.0             102m
rendered-master-ec4d762b46b2b709eb29fed299628864   a3c9532c8e8f2efe9b0f739fbd761b32cc0bfa2b   3.1.0             92m
rendered-worker-0825ccbd713febb2260f339945806b66   a3c9532c8e8f2efe9b0f739fbd761b32cc0bfa2b   3.1.0             92m
rendered-worker-669f746a17a568d6e4e3b34fe3e2ed7b   a3c9532c8e8f2efe9b0f739fbd761b32cc0bfa2b   3.1.0             4s
test-file                                                                                     2.2.0             9s

$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-ec4d762b46b2b709eb29fed299628864   True      False      False      3              3                   3                     0                      94m
worker   rendered-worker-0825ccbd713febb2260f339945806b66   False     True       False      3              0                   0                     0                      94m

$ oc -n openshift-machine-config-operator get pods
NAME                                         READY   STATUS    RESTARTS   AGE
machine-config-controller-5555dd5b85-zrjtz   1/1     Running   0          96m
machine-config-daemon-5ct9t                  2/2     Running   0          87m
machine-config-daemon-7s4fd                  2/2     Running   0          97m
machine-config-daemon-g6vtj                  2/2     Running   0          86m
machine-config-daemon-gpzp4                  2/2     Running   0          97m
machine-config-daemon-nxr7j                  2/2     Running   0          86m
machine-config-daemon-pcq79                  2/2     Running   0          97m
machine-config-operator-5749976cd6-m225p     1/1     Running   0          106m
machine-config-server-7k9h6                  1/1     Running   0          95m
machine-config-server-d42m2                  1/1     Running   0          95m
machine-config-server-dm4f6                  1/1     Running   0          95m


$ oc -n openshift-machine-config-operator logs machine-config-controller-5555dd5b85-zrjtz
...SNIP...
E0925 14:16:34.684307       1 render_controller.go:459] Error updating MachineConfigPool worker: Operation cannot be fulfilled on machineconfigpools.machineconfiguration.openshift.io "worker": the object has been modified; please apply your changes to the latest version and try again
I0925 14:16:34.684329       1 render_controller.go:376] Error syncing machineconfigpool worker: Operation cannot be fulfilled on machineconfigpools.machineconfiguration.openshift.io "worker": the object has been modified; please apply your changes to the latest version and try again
I0925 14:16:39.635872       1 node_controller.go:740] Pool worker is paused and will not update.
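
The last log line above is the new message that makes the stall visible. The following is a minimal Python sketch for picking that message out of saved controller logs; the function name is hypothetical, and the pattern assumes the message wording stays exactly as shown in this log, which may differ in other releases.

```python
import re
from typing import List

# Matches the message text from the verified build's node_controller.go output.
PAUSED_RE = re.compile(r"Pool (\S+) is paused and will not update\.")


def paused_pools_from_log(log_text: str) -> List[str]:
    """Return pool names reported as paused, in first-seen order, without duplicates."""
    seen = []
    for match in PAUSED_RE.finditer(log_text):
        name = match.group(1)
        if name not in seen:
            seen.append(name)
    return seen


# The log line is copied from the controller output in this comment.
SAMPLE_LOG = (
    "I0925 14:16:39.635872       1 node_controller.go:740] "
    "Pool worker is paused and will not update.\n"
)

if __name__ == "__main__":
    print(paused_pools_from_log(SAMPLE_LOG))
```

The log text could be captured with the same `oc -n openshift-machine-config-operator logs` command used above and passed to `paused_pools_from_log`.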

Comment 8 errata-xmlrpc 2020-10-27 16:29:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

