Bug 1909943 - Upgrade from 4.6 to 4.7 stuck due to write /sys/devices/xxxx/block/sda/queue/scheduler: invalid argument
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: Ben Howard
QA Contact: Michael Nguyen
Blocks: 1913316
Reported: 2020-12-22 06:37 UTC by Sunil Choudhary
Modified: 2021-04-05 17:36 UTC (History)

Doc Type: No Doc Update
Last Closed: 2021-02-24 15:47:55 UTC

Links
System                   ID                                            Private   Priority   Status   Summary                                                   Last Updated
Github                   openshift machine-config-operator pull 2317   0         None       closed   Bug 1909943: check for scheduler support before setting   2021-01-27 09:12:16 UTC
Red Hat Product Errata   RHSA-2020:5633                                0         None       None     None                                                      2021-02-24 15:48:11 UTC

Description Sunil Choudhary 2020-12-22 06:37:14 UTC
Description of problem:

While upgrading from 4.6.0-0.nightly-2020-12-20-032710 to 4.7.0-0.nightly-2020-12-21-131655, the Machine Config Operator became degraded with the following message:
Node upgrade45-chuo-x5bnh-w-a-l-rhel-0 is reporting: "write /sys/devices/pci0000:00/0000:00:03.0/virtio0/host0/target0:0:1/0:0:1:0/block/sda/queue/scheduler: invalid argument"
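The kernel typically rejects a write to queue/scheduler with EINVAL ("invalid argument") when the requested scheduler is not among those the device offers. The linked pull request is titled "check for scheduler support before setting"; what follows is a minimal Python sketch of that idea, not the operator's actual code. The function names, the sample file contents, and the choice of "bfq" as the requested scheduler are assumptions made for illustration only.

```python
def parse_schedulers(sysfs_text):
    """Split the contents of a queue/scheduler file into (available, active).

    The kernel lists every scheduler the device accepts and wraps the
    active one in square brackets, e.g. "noop [deadline] cfq".
    """
    available = []
    active = None
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return available, active


def scheduler_supported(sysfs_text, wanted):
    """True if `wanted` is listed, so writing it should be accepted."""
    available, _ = parse_schedulers(sysfs_text)
    return wanted in available


# Illustrative file contents, not captured from the affected node.
older_kernel = "noop [deadline] cfq\n"
newer_kernel = "[mq-deadline] kyber bfq none\n"

# "bfq" is a hypothetical target scheduler chosen for the example.
print(scheduler_supported(older_kernel, "bfq"))  # False
print(scheduler_supported(newer_kernel, "bfq"))  # True
```

Checking the list first lets a caller skip the write on a node that does not offer the scheduler, instead of failing the whole sync.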

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-12-20-032710   True        True          3h55m   Working towards 4.7.0-0.nightly-2020-12-21-131655: 30% complete

$ oc get nodes -o wide
NAME                                                 STATUS                     ROLES    AGE   VERSION           INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
upgrade45-chuo-x5bnh-m-0.c.openshift-qe.internal     Ready                      master   25h   v1.19.0+9c69bdc   10.0.0.108    <none>        Red Hat Enterprise Linux CoreOS 46.82.202012191219-0 (Ootpa)   4.18.0-193.37.1.el8_2.x86_64   cri-o://1.19.0-26.rhaos4.6.git8a05a29.el8
upgrade45-chuo-x5bnh-m-1.c.openshift-qe.internal     Ready                      master   25h   v1.19.0+9c69bdc   10.0.0.109    <none>        Red Hat Enterprise Linux CoreOS 46.82.202012191219-0 (Ootpa)   4.18.0-193.37.1.el8_2.x86_64   cri-o://1.19.0-26.rhaos4.6.git8a05a29.el8
upgrade45-chuo-x5bnh-m-2.c.openshift-qe.internal     Ready                      master   25h   v1.19.0+9c69bdc   10.0.0.107    <none>        Red Hat Enterprise Linux CoreOS 46.82.202012191219-0 (Ootpa)   4.18.0-193.37.1.el8_2.x86_64   cri-o://1.19.0-26.rhaos4.6.git8a05a29.el8
upgrade45-chuo-x5bnh-w-a-0.c.openshift-qe.internal   NotReady                   worker   24h   v1.19.0+9c69bdc   10.0.32.37    <none>        Red Hat Enterprise Linux CoreOS 46.82.202012191219-0 (Ootpa)   4.18.0-193.37.1.el8_2.x86_64   cri-o://1.19.0-26.rhaos4.6.git8a05a29.el8
upgrade45-chuo-x5bnh-w-a-l-rhel-0                    Ready,SchedulingDisabled   worker   24h   v1.18.3+86dc8d1   10.0.32.102                 Red Hat Enterprise Linux Server 7.9 (Maipo)                    3.10.0-1160.11.1.el7.x86_64    cri-o://1.18.4-4.rhaos4.5.git6dee389.el7
upgrade45-chuo-x5bnh-w-b-1.c.openshift-qe.internal   NotReady                   worker   24h   v1.18.3+86dc8d1   10.0.32.101                 Red Hat Enterprise Linux CoreOS 45.82.202012172327-0 (Ootpa)   4.18.0-193.37.1.el8_2.x86_64   cri-o://1.18.4-4.rhaos4.5.git6dee389.el8
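The node named in the error is the only one in this listing that runs RHEL Server 7.9 with a 3.10 kernel rather than RHEL CoreOS. As a rough sketch, such nodes could be picked out from the JSON form of the same listing (for example, the output of `oc get nodes -o json`). The helper name and the trimmed sample document are assumptions; the `status.nodeInfo.osImage` and `status.nodeInfo.kernelVersion` fields are the ones that back the OS-IMAGE and KERNEL-VERSION columns.

```python
import json


def non_coreos_nodes(nodes_json):
    """Return (name, osImage, kernelVersion) for nodes not running CoreOS."""
    items = json.loads(nodes_json).get("items", [])
    result = []
    for node in items:
        info = node.get("status", {}).get("nodeInfo", {})
        if "CoreOS" not in info.get("osImage", ""):
            result.append(
                (node["metadata"]["name"],
                 info.get("osImage"),
                 info.get("kernelVersion"))
            )
    return result


# Trimmed sample built from two rows of the listing above.
sample = json.dumps({"items": [
    {"metadata": {"name": "upgrade45-chuo-x5bnh-m-0.c.openshift-qe.internal"},
     "status": {"nodeInfo": {
         "osImage": "Red Hat Enterprise Linux CoreOS 46.82.202012191219-0 (Ootpa)",
         "kernelVersion": "4.18.0-193.37.1.el8_2.x86_64"}}},
    {"metadata": {"name": "upgrade45-chuo-x5bnh-w-a-l-rhel-0"},
     "status": {"nodeInfo": {
         "osImage": "Red Hat Enterprise Linux Server 7.9 (Maipo)",
         "kernelVersion": "3.10.0-1160.11.1.el7.x86_64"}}},
]})

for name, os_image, kernel in non_coreos_nodes(sample):
    print(name, os_image, kernel)
```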


$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.7.0-0.nightly-2020-12-21-131655   False       True          True       130m
baremetal                                  4.7.0-0.nightly-2020-12-21-131655   True        False         False      137m
cloud-credential                           4.7.0-0.nightly-2020-12-21-131655   True        False         False      24h
cluster-autoscaler                         4.7.0-0.nightly-2020-12-21-131655   True        False         False      23h
config-operator                            4.7.0-0.nightly-2020-12-21-131655   True        False         False      23h
console                                    4.7.0-0.nightly-2020-12-21-131655   True        False         True       134m
csi-snapshot-controller                    4.7.0-0.nightly-2020-12-21-131655   True        False         False      134m
dns                                        4.6.0-0.nightly-2020-12-20-032710   True        False         True       18h
etcd                                       4.7.0-0.nightly-2020-12-21-131655   True        False         False      24h
image-registry                             4.7.0-0.nightly-2020-12-21-131655   False       True          True       130m
ingress                                    4.7.0-0.nightly-2020-12-21-131655   False       True          True       130m
insights                                   4.7.0-0.nightly-2020-12-21-131655   True        False         False      24h
kube-apiserver                             4.7.0-0.nightly-2020-12-21-131655   True        True          False      24h
kube-controller-manager                    4.7.0-0.nightly-2020-12-21-131655   True        False         False      24h
kube-scheduler                             4.7.0-0.nightly-2020-12-21-131655   True        False         False      24h
kube-storage-version-migrator              4.7.0-0.nightly-2020-12-21-131655   False       False         False      130m
machine-api                                4.7.0-0.nightly-2020-12-21-131655   True        False         False      23h
machine-approver                           4.7.0-0.nightly-2020-12-21-131655   True        False         False      24h
machine-config                             4.6.0-0.nightly-2020-12-20-032710   False       False         True       18h
marketplace                                4.7.0-0.nightly-2020-12-21-131655   True        False         False      134m
monitoring                                 4.6.0-0.nightly-2020-12-20-032710   False       True          True       18h
network                                    4.6.0-0.nightly-2020-12-20-032710   True        True          True       24h
node-tuning                                4.7.0-0.nightly-2020-12-21-131655   True        True          False      136m
openshift-apiserver                        4.7.0-0.nightly-2020-12-21-131655   True        False         False      144m
openshift-controller-manager               4.7.0-0.nightly-2020-12-21-131655   True        False         False      5h4m
openshift-samples                          4.7.0-0.nightly-2020-12-21-131655   True        False         False      134m
operator-lifecycle-manager                 4.7.0-0.nightly-2020-12-21-131655   True        False         False      24h
operator-lifecycle-manager-catalog         4.7.0-0.nightly-2020-12-21-131655   True        False         False      24h
operator-lifecycle-manager-packageserver   4.7.0-0.nightly-2020-12-21-131655   True        False         False      15m
service-ca                                 4.7.0-0.nightly-2020-12-21-131655   True        False         False      24h
storage                                    4.7.0-0.nightly-2020-12-21-131655   True        True          False      134m
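To spot the operators that are holding up the upgrade in a table like the one above, the rows can be filtered on the VERSION, AVAILABLE, and DEGRADED columns. This is a small sketch that assumes the plain-text layout shown here (whitespace-separated columns, no spaces inside a cell); the function name is hypothetical and the sample is a three-row excerpt of the output above.

```python
def lagging_operators(table_text, target_version):
    """Names of operators that are not at the target version, not available, or degraded."""
    lines = [ln for ln in table_text.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = [dict(zip(header, ln.split())) for ln in lines[1:]]
    return [
        row["NAME"]
        for row in rows
        if row["VERSION"] != target_version
        or row["AVAILABLE"] != "True"
        or row["DEGRADED"] == "True"
    ]


# Three rows copied from the `oc get co` output above.
excerpt = """\
NAME             VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
dns              4.6.0-0.nightly-2020-12-20-032710   True        False         True       18h
etcd             4.7.0-0.nightly-2020-12-21-131655   True        False         False      24h
machine-config   4.6.0-0.nightly-2020-12-20-032710   False       False         True       18h
"""

print(lagging_operators(excerpt, "4.7.0-0.nightly-2020-12-21-131655"))
# ['dns', 'machine-config']
```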
	
$ oc get mc
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          eb9778355a9020673e8ce9aee092cb98d80cde5e   3.1.0             25h
00-worker                                          eb9778355a9020673e8ce9aee092cb98d80cde5e   3.1.0             25h
01-master-container-runtime                        eb9778355a9020673e8ce9aee092cb98d80cde5e   3.1.0             25h
01-master-kubelet                                  eb9778355a9020673e8ce9aee092cb98d80cde5e   3.1.0             25h
01-worker-container-runtime                        eb9778355a9020673e8ce9aee092cb98d80cde5e   3.1.0             25h
01-worker-kubelet                                  eb9778355a9020673e8ce9aee092cb98d80cde5e   3.1.0             25h
99-master-generated-crio-capabilities                                                         2.2.0             25h
99-master-generated-registries                     eb9778355a9020673e8ce9aee092cb98d80cde5e   3.1.0             19h
99-master-ssh                                                                                 2.2.0             25h
99-worker-generated-crio-capabilities                                                         2.2.0             25h
99-worker-generated-registries                     eb9778355a9020673e8ce9aee092cb98d80cde5e   3.1.0             19h
99-worker-ssh                                                                                 2.2.0             25h
rendered-master-0495d6d377fa23ebd92fd9c500e299b8   eb9778355a9020673e8ce9aee092cb98d80cde5e   3.1.0             19h
rendered-master-8ce7cd4511a9bfac379a3acfab1c645e   d7ca39367eb7368c1bfdb8b854faa8af9526fa5e   2.2.0             19h
rendered-master-b90fe26e5becf05bc0058e55103ceb04   d7ca39367eb7368c1bfdb8b854faa8af9526fa5e   2.2.0             25h
rendered-worker-93403b92e87cea3f791bb212d05bc44f   d7ca39367eb7368c1bfdb8b854faa8af9526fa5e   2.2.0             25h
rendered-worker-9edb72930638049779a56f9b0d0690a5   eb9778355a9020673e8ce9aee092cb98d80cde5e   3.1.0             19h


$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-0495d6d377fa23ebd92fd9c500e299b8   True      False      False      3              3                   3                     0                      23h
worker   rendered-worker-93403b92e87cea3f791bb212d05bc44f   False     True       True       3              0                   1                     1                      23h

$ oc describe mcp worker
Name:         worker
Namespace:    
Labels:       machineconfiguration.openshift.io/mco-built-in=
              pools.operator.machineconfiguration.openshift.io/worker=
Annotations:  <none>
API Version:  machineconfiguration.openshift.io/v1
Kind:         MachineConfigPool
Metadata:
  Creation Timestamp:  2020-12-21T05:27:50Z
  Generation:          3
  Managed Fields:
    API Version:  machineconfiguration.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .:
          f:machineconfiguration.openshift.io/mco-built-in:
          f:pools.operator.machineconfiguration.openshift.io/worker:
      f:spec:
        .:
        f:configuration:
        f:machineConfigSelector:
          .:
          f:matchLabels:
            .:
            f:machineconfiguration.openshift.io/role:
        f:nodeSelector:
          .:
          f:matchLabels:
            .:
            f:node-role.kubernetes.io/worker:
        f:paused:
    Manager:      machine-config-operator
    Operation:    Update
    Time:         2020-12-21T10:30:03Z
    API Version:  machineconfiguration.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        f:configuration:
          f:name:
          f:source:
      f:status:
        .:
        f:conditions:
        f:configuration:
          .:
          f:name:
          f:source:
        f:degradedMachineCount:
        f:machineCount:
        f:observedGeneration:
        f:readyMachineCount:
        f:unavailableMachineCount:
        f:updatedMachineCount:
    Manager:         machine-config-controller
    Operation:       Update
    Time:            2020-12-21T10:42:57Z
  Resource Version:  438812
  UID:               acf4cf2e-b5c8-475f-a255-a5ae7c0f8ba3
Spec:
  Configuration:
    Name:  rendered-worker-9edb72930638049779a56f9b0d0690a5
    Source:
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         00-worker
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-worker-container-runtime
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-worker-kubelet
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-generated-crio-capabilities
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-generated-registries
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-ssh
  Machine Config Selector:
    Match Labels:
      machineconfiguration.openshift.io/role:  worker
  Node Selector:
    Match Labels:
      node-role.kubernetes.io/worker:  
  Paused:                              false
Status:
  Conditions:
    Last Transition Time:  2020-12-21T05:28:12Z
    Message:               
    Reason:                
    Status:                False
    Type:                  RenderDegraded
    Last Transition Time:  2020-12-21T10:33:40Z
    Message:               
    Reason:                
    Status:                False
    Type:                  Updated
    Last Transition Time:  2020-12-21T10:33:40Z
    Message:               All nodes are updating to rendered-worker-9edb72930638049779a56f9b0d0690a5
    Reason:                
    Status:                True
    Type:                  Updating
    Last Transition Time:  2020-12-21T10:39:24Z
    Message:               Node upgrade45-chuo-x5bnh-w-a-l-rhel-0 is reporting: "write /sys/devices/pci0000:00/0000:00:03.0/virtio0/host0/target0:0:1/0:0:1:0/block/sda/queue/scheduler: invalid argument"
    Reason:                1 nodes are reporting degraded status on sync
    Status:                True
    Type:                  NodeDegraded
    Last Transition Time:  2020-12-21T10:39:24Z
    Message:               
    Reason:                
    Status:                True
    Type:                  Degraded
  Configuration:
    Name:  rendered-worker-93403b92e87cea3f791bb212d05bc44f
    Source:
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   00-worker
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   01-worker-container-runtime
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   01-worker-kubelet
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   99-worker-acf4cf2e-b5c8-475f-a255-a5ae7c0f8ba3-registries
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   99-worker-generated-crio-capabilities
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   99-worker-ssh
  Degraded Machine Count:     1
  Machine Count:              3
  Observed Generation:        3
  Ready Machine Count:        0
  Unavailable Machine Count:  3
  Updated Machine Count:      1
Events:                       <none>
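The failure reason in the describe output above sits in the NodeDegraded condition. A minimal sketch of pulling that message out programmatically, assuming the pool is fetched as JSON (for example with `oc get mcp worker -o json`) and that conditions appear under `status.conditions` with `type`, `status`, and `message` keys; the function name and the trimmed sample document are illustrative.

```python
import json


def node_degraded_message(mcp_json):
    """Return the NodeDegraded message if that condition is True, else None."""
    pool = json.loads(mcp_json)
    for cond in pool.get("status", {}).get("conditions", []):
        if cond.get("type") == "NodeDegraded" and cond.get("status") == "True":
            return cond.get("message", "")
    return None


# Trimmed sample carrying two of the conditions shown above.
sample_pool = json.dumps({"status": {"conditions": [
    {"type": "Updating", "status": "True",
     "message": "All nodes are updating to "
                "rendered-worker-9edb72930638049779a56f9b0d0690a5"},
    {"type": "NodeDegraded", "status": "True",
     "reason": "1 nodes are reporting degraded status on sync",
     "message": 'Node upgrade45-chuo-x5bnh-w-a-l-rhel-0 is reporting: '
                '"write /sys/devices/pci0000:00/0000:00:03.0/virtio0/host0/'
                'target0:0:1/0:0:1:0/block/sda/queue/scheduler: invalid argument"'},
]}})

print(node_degraded_message(sample_pool))
```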


Version-Release number of selected component (if applicable):

UPI on GCP

Comment 2 Sunil Choudhary 2021-01-07 11:40:25 UTC
I still see the issue with 4.7.0-0.nightly-2021-01-06-222035. In this instance, however, the error is "error enabling unit: Failed to execute operation: File exists\n".

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.9     True        True          64m     Working towards 4.7.0-0.nightly-2021-01-06-222035: 84% complete


$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.7.0-0.nightly-2021-01-06-222035   True        False         False      2m33s
baremetal                                  4.7.0-0.nightly-2021-01-06-222035   True        False         False      41m
cloud-credential                           4.7.0-0.nightly-2021-01-06-222035   True        False         False      126m
cluster-autoscaler                         4.7.0-0.nightly-2021-01-06-222035   True        False         False      122m
config-operator                            4.7.0-0.nightly-2021-01-06-222035   True        False         False      123m
console                                    4.7.0-0.nightly-2021-01-06-222035   True        False         False      7m31s
csi-snapshot-controller                    4.7.0-0.nightly-2021-01-06-222035   True        False         False      7m26s
dns                                        4.7.0-0.nightly-2021-01-06-222035   True        False         False      121m
etcd                                       4.7.0-0.nightly-2021-01-06-222035   True        False         False      121m
image-registry                             4.7.0-0.nightly-2021-01-06-222035   True        False         False      113m
ingress                                    4.7.0-0.nightly-2021-01-06-222035   True        False         False      113m
insights                                   4.7.0-0.nightly-2021-01-06-222035   True        False         False      123m
kube-apiserver                             4.7.0-0.nightly-2021-01-06-222035   True        False         False      120m
kube-controller-manager                    4.7.0-0.nightly-2021-01-06-222035   True        False         False      120m
kube-scheduler                             4.7.0-0.nightly-2021-01-06-222035   True        False         False      120m
kube-storage-version-migrator              4.7.0-0.nightly-2021-01-06-222035   True        False         False      12m
machine-api                                4.7.0-0.nightly-2021-01-06-222035   True        False         False      119m
machine-approver                           4.7.0-0.nightly-2021-01-06-222035   True        False         False      122m
machine-config                             4.6.9                               False       True          True       25m
marketplace                                4.7.0-0.nightly-2021-01-06-222035   True        False         False      13m
monitoring                                 4.7.0-0.nightly-2021-01-06-222035   True        False         False      112m
network                                    4.7.0-0.nightly-2021-01-06-222035   True        False         False      28m
node-tuning                                4.7.0-0.nightly-2021-01-06-222035   True        False         False      38m
openshift-apiserver                        4.7.0-0.nightly-2021-01-06-222035   True        False         False      5m20s
openshift-controller-manager               4.7.0-0.nightly-2021-01-06-222035   True        False         False      121m
openshift-samples                          4.7.0-0.nightly-2021-01-06-222035   True        False         False      38m
operator-lifecycle-manager                 4.7.0-0.nightly-2021-01-06-222035   True        False         False      122m
operator-lifecycle-manager-catalog         4.7.0-0.nightly-2021-01-06-222035   True        False         False      122m
operator-lifecycle-manager-packageserver   4.7.0-0.nightly-2021-01-06-222035   True        False         False      7m6s
service-ca                                 4.7.0-0.nightly-2021-01-06-222035   True        False         False      123m
storage                                    4.7.0-0.nightly-2021-01-06-222035   True        False         False      6m33s


$ oc get nodes -o wide
NAME                                        STATUS                     ROLES    AGE    VERSION           INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
ip-10-0-52-165.us-east-2.compute.internal   Ready                      master   126m   v1.20.0+b1e9f0d   10.0.52.165   <none>        Red Hat Enterprise Linux CoreOS 47.83.202101060443-0 (Ootpa)   4.18.0-240.10.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39
ip-10-0-53-183.us-east-2.compute.internal   Ready                      worker   67m    v1.19.0+9c69bdc   10.0.53.183   <none>        Red Hat Enterprise Linux Server 7.9 (Maipo)                    3.10.0-1160.11.1.el7.x86_64    cri-o://1.19.0-118.rhaos4.6.gitf51f94a.el7
ip-10-0-60-33.us-east-2.compute.internal    Ready,SchedulingDisabled   worker   67m    v1.19.0+9c69bdc   10.0.60.33    <none>        Red Hat Enterprise Linux Server 7.9 (Maipo)                    3.10.0-1160.11.1.el7.x86_64    cri-o://1.19.0-118.rhaos4.6.gitf51f94a.el7
ip-10-0-63-181.us-east-2.compute.internal   Ready                      master   126m   v1.20.0+b1e9f0d   10.0.63.181   <none>        Red Hat Enterprise Linux CoreOS 47.83.202101060443-0 (Ootpa)   4.18.0-240.10.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39
ip-10-0-74-41.us-east-2.compute.internal    Ready                      master   126m   v1.20.0+b1e9f0d   10.0.74.41    <none>        Red Hat Enterprise Linux CoreOS 47.83.202101060443-0 (Ootpa)   4.18.0-240.10.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39


$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-15480014538085b4c551d98d493e248d   True      False      False      3              3                   3                     0                      124m
worker   rendered-worker-8057ddcdd5ed8f83f444d8ea4f9963c4   False     True       True       2              0                   0                     1                      124m


$ oc describe mcp worker
Name:         worker
Namespace:    
Labels:       machineconfiguration.openshift.io/mco-built-in=
              pools.operator.machineconfiguration.openshift.io/worker=
Annotations:  <none>
API Version:  machineconfiguration.openshift.io/v1
Kind:         MachineConfigPool
Metadata:
  Creation Timestamp:  2021-01-07T09:32:57Z
  Generation:          5
  Managed Fields:
    API Version:  machineconfiguration.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .:
          f:machineconfiguration.openshift.io/mco-built-in:
          f:pools.operator.machineconfiguration.openshift.io/worker:
      f:spec:
        .:
        f:configuration:
        f:machineConfigSelector:
          .:
          f:matchLabels:
            .:
            f:machineconfiguration.openshift.io/role:
        f:nodeSelector:
          .:
          f:matchLabels:
            .:
            f:node-role.kubernetes.io/worker:
        f:paused:
    Manager:      machine-config-operator
    Operation:    Update
    Time:         2021-01-07T09:32:57Z
    API Version:  machineconfiguration.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        f:configuration:
          f:name:
          f:source:
      f:status:
        .:
        f:conditions:
        f:configuration:
          .:
          f:name:
          f:source:
        f:degradedMachineCount:
        f:machineCount:
        f:observedGeneration:
        f:readyMachineCount:
        f:unavailableMachineCount:
        f:updatedMachineCount:
    Manager:         machine-config-controller
    Operation:       Update
    Time:            2021-01-07T10:32:30Z
  Resource Version:  74388
  UID:               42d6af38-c165-46a3-8d1d-9c86e156411b
Spec:
  Configuration:
    Name:  rendered-worker-3b8eb936c7f50c4d1664fb66186e51fc
    Source:
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         00-worker
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-worker-container-runtime
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-worker-kubelet
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-fips
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-generated-registries
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-ssh
  Machine Config Selector:
    Match Labels:
      machineconfiguration.openshift.io/role:  worker
  Node Selector:
    Match Labels:
      node-role.kubernetes.io/worker:  
  Paused:                              false
Status:
  Conditions:
    Last Transition Time:  2021-01-07T09:35:17Z
    Message:               
    Reason:                
    Status:                False
    Type:                  RenderDegraded
    Last Transition Time:  2021-01-07T11:16:38Z
    Message:               
    Reason:                
    Status:                False
    Type:                  Updated
    Last Transition Time:  2021-01-07T11:16:38Z
    Message:               All nodes are updating to rendered-worker-3b8eb936c7f50c4d1664fb66186e51fc
    Reason:                
    Status:                True
    Type:                  Updating
    Last Transition Time:  2021-01-07T11:18:48Z
    Message:               Node ip-10-0-60-33.us-east-2.compute.internal is reporting: "error enabling unit: Failed to execute operation: File exists\n"
    Reason:                1 nodes are reporting degraded status on sync
    Status:                True
    Type:                  NodeDegraded
    Last Transition Time:  2021-01-07T11:18:48Z
    Message:               
    Reason:                
    Status:                True
    Type:                  Degraded
  Configuration:
    Name:  rendered-worker-8057ddcdd5ed8f83f444d8ea4f9963c4
    Source:
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   00-worker
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   01-worker-container-runtime
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   01-worker-kubelet
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   99-worker-fips
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   99-worker-generated-registries
      API Version:            machineconfiguration.openshift.io/v1
      Kind:                   MachineConfig
      Name:                   99-worker-ssh
  Degraded Machine Count:     1
  Machine Count:              2
  Observed Generation:        5
  Ready Machine Count:        0
  Unavailable Machine Count:  1
  Updated Machine Count:      0
Events:
  Type    Reason            Age   From                                    Message
  ----    ------            ----  ----                                    -------
  Normal  SetDesiredConfig  97m   machineconfigcontroller-nodecontroller  Targeted node ip-10-0-63-136.us-east-2.compute.internal to config rendered-worker-8057ddcdd5ed8f83f444d8ea4f9963c4
  Normal  SetDesiredConfig  94m   machineconfigcontroller-nodecontroller  Targeted node ip-10-0-71-35.us-east-2.compute.internal to config rendered-worker-8057ddcdd5ed8f83f444d8ea4f9963c4
  Normal  SetDesiredConfig  92m   machineconfigcontroller-nodecontroller  Targeted node ip-10-0-53-113.us-east-2.compute.internal to config rendered-worker-8057ddcdd5ed8f83f444d8ea4f9963c4
  Normal  SetDesiredConfig  20m   machineconfigcontroller-nodecontroller  Targeted node ip-10-0-60-33.us-east-2.compute.internal to config rendered-worker-3b8eb936c7f50c4d1664fb66186e51fc


$ oc describe node ip-10-0-60-33.us-east-2.compute.internal
Name:               ip-10-0-60-33.us-east-2.compute.internal
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=m4.xlarge
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=us-east-2
                    failure-domain.beta.kubernetes.io/zone=us-east-2a
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-10-0-60-33.us-east-2.compute.internal
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/worker=
                    node.kubernetes.io/instance-type=m4.xlarge
                    node.openshift.io/os_id=rhel
                    topology.ebs.csi.aws.com/zone=us-east-2a
                    topology.kubernetes.io/region=us-east-2
                    topology.kubernetes.io/zone=us-east-2a
Annotations:        csi.volume.kubernetes.io/nodeid: {"ebs.csi.aws.com":"i-06d3f727f99a1d2da"}
                    k8s.ovn.org/l3-gateway-config:
                      {"default":{"mode":"shared","interface-id":"br-ex_ip-10-0-60-33.us-east-2.compute.internal","mac-address":"02:20:5c:5f:8e:92","ip-addresse...
                    k8s.ovn.org/node-chassis-id: 80072e11-3efc-4667-9f3f-63f3d8a2282a
                    k8s.ovn.org/node-local-nat-ip: {"default":["169.254.10.233"]}
                    k8s.ovn.org/node-mgmt-port-mac-address: 02:ef:71:ca:a0:b9
                    k8s.ovn.org/node-primary-ifaddr: {"ipv4":"10.0.60.33/20"}
                    k8s.ovn.org/node-subnets: {"default":"10.131.2.0/23"}
                    machineconfiguration.openshift.io/currentConfig: rendered-worker-8057ddcdd5ed8f83f444d8ea4f9963c4
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-3b8eb936c7f50c4d1664fb66186e51fc
                    machineconfiguration.openshift.io/reason: error enabling unit: Failed to execute operation: File exists
                    machineconfiguration.openshift.io/ssh: accessed
                    machineconfiguration.openshift.io/state: Degraded
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 07 Jan 2021 15:59:02 +0530
Taints:             node.kubernetes.io/unschedulable:NoSchedule
Unschedulable:      true
Lease:
  HolderIdentity:  ip-10-0-60-33.us-east-2.compute.internal
  AcquireTime:     <unset>
  RenewTime:       Thu, 07 Jan 2021 17:07:14 +0530
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Thu, 07 Jan 2021 17:05:04 +0530   Thu, 07 Jan 2021 15:59:02 +0530   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Thu, 07 Jan 2021 17:05:04 +0530   Thu, 07 Jan 2021 15:59:02 +0530   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Thu, 07 Jan 2021 17:05:04 +0530   Thu, 07 Jan 2021 15:59:02 +0530   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Thu, 07 Jan 2021 17:05:04 +0530   Thu, 07 Jan 2021 15:59:52 +0530   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:   10.0.60.33
  Hostname:     ip-10-0-60-33.us-east-2.compute.internal
  InternalDNS:  ip-10-0-60-33.us-east-2.compute.internal
Capacity:
  attachable-volumes-aws-ebs:  39
  cpu:                         4
  ephemeral-storage:           31444972Ki
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      16264968Ki
  pods:                        250
Allocatable:
  attachable-volumes-aws-ebs:  39
  cpu:                         3500m
  ephemeral-storage:           27905944324
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      15113992Ki
  pods:                        250
System Info:
  Machine ID:                             9a1c7f6b38b4416bb786db538b6ff55a
  System UUID:                            EC2BE0A1-C9B2-442F-0610-ED4DD17F8AB7
  Boot ID:                                37e728b8-9ae2-4ea6-a7da-4e4ed61e5e1e
  Kernel Version:                         3.10.0-1160.11.1.el7.x86_64
  OS Image:                               Red Hat Enterprise Linux Server 7.9 (Maipo)
  Operating System:                       linux
  Architecture:                           amd64
  Container Runtime Version:              cri-o://1.19.0-118.rhaos4.6.gitf51f94a.el7
  Kubelet Version:                        v1.19.0+9c69bdc
  Kube-Proxy Version:                     v1.19.0+9c69bdc
ProviderID:                               aws:///us-east-2a/i-06d3f727f99a1d2da
Non-terminated Pods:                      (12 in total)
  Namespace                               Name                             CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                               ----                             ------------  ----------  ---------------  -------------  ---
  openshift-cluster-csi-drivers           aws-ebs-csi-driver-node-jpr6b    30m (0%)      0 (0%)      150Mi (1%)       0 (0%)         38m
  openshift-cluster-node-tuning-operator  tuned-zww9c                      10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         38m
  openshift-dns                           dns-default-j47hl                65m (1%)      0 (0%)      131Mi (0%)       0 (0%)         24m
  openshift-image-registry                node-ca-m2jps                    10m (0%)      0 (0%)      10Mi (0%)        0 (0%)         38m
  openshift-ingress-canary                ingress-canary-44gwr             10m (0%)      0 (0%)      20Mi (0%)        0 (0%)         39m
  openshift-machine-config-operator       machine-config-daemon-g8tcl      40m (1%)      0 (0%)      100Mi (0%)       0 (0%)         23m
  openshift-monitoring                    node-exporter-fd95q              9m (0%)       0 (0%)      210Mi (1%)       0 (0%)         39m
  openshift-multus                        multus-8pgxt                     10m (0%)      0 (0%)      150Mi (1%)       0 (0%)         32m
  openshift-multus                        network-metrics-daemon-6vqj9     20m (0%)      0 (0%)      120Mi (0%)       0 (0%)         33m
  openshift-network-diagnostics           network-check-target-mzzdv       10m (0%)      0 (0%)      150Mi (1%)       0 (0%)         34m
  openshift-ovn-kubernetes                ovnkube-node-vs6mv               30m (0%)      0 (0%)      620Mi (4%)       0 (0%)         34m
  openshift-ovn-kubernetes                ovs-node-46vsh                   100m (2%)     0 (0%)      300Mi (2%)       0 (0%)         33m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests      Limits
  --------                    --------      ------
  cpu                         344m (9%)     0 (0%)
  memory                      2011Mi (13%)  0 (0%)
  ephemeral-storage           0 (0%)        0 (0%)
  hugepages-1Gi               0 (0%)        0 (0%)
  hugepages-2Mi               0 (0%)        0 (0%)
  attachable-volumes-aws-ebs  0             0
Events:
  Type    Reason              Age   From                                               Message
  ----    ------              ----  ----                                               -------
  Normal  NodeNotSchedulable  20m   kubelet, ip-10-0-60-33.us-east-2.compute.internal  Node ip-10-0-60-33.us-east-2.compute.internal status is now: NodeNotSchedulable

Comment 3 Sunil Choudhary 2021-01-07 11:53:23 UTC
To add to my previous comment, the error I see is reported in bug https://bugzilla.redhat.com/show_bug.cgi?id=1913536.

@Ben Howard, could you please review this? I did not see the scheduler write error originally reported in this bug, so I assume it is fixed here.

Comment 4 Yu Qi Zhang 2021-01-07 17:27:41 UTC
Yes, I believe you are running into https://bugzilla.redhat.com/show_bug.cgi?id=1913536. I will try to verify this fix by testing a 4.6->4.6 upgrade with the backported fix in https://bugzilla.redhat.com/show_bug.cgi?id=1913316, as 4.6 should not be affected by this.

Comment 5 Seth Jennings 2021-01-07 18:41:23 UTC
I was able to confirm this fix.

I created a release image from the 4.6.0-0.ci release stream that included this fix, then upgraded my 4.6.9 cluster to it.

The upgrade was successful, even with the RHEL worker (though I had to work around https://bugzilla.redhat.com/show_bug.cgi?id=1913154).

I see this in the MCD logs on the RHEL node:

$ oc logs machine-config-daemon-jl7rb -c machine-config-daemon  | grep sched
I0107 17:49:12.140304    1880 controlplane.go:50] Device /sys/devices/pci0000:00/0000:00:1e.0/0000:05:01.0/0000:06:0a.0/virtio1/block/vda does not support the bfq scheduler

I then rolled back to 4.6.9, where I hit the error (as expected, since 4.6.9 does not contain the fix):

E0107 18:36:01.363148   41769 writer.go:135] Marking Degraded due to: write /sys/devices/pci0000:00/0000:00:1e.0/0000:05:01.0/0000:06:0a.0/virtio1/block/vda/queue/scheduler: invalid argument

Sunil, you can consider this verified by me.
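
For readers following along: the two log lines in this comment are consistent with the linked PR title ("check for scheduler support before setting"). Below is a minimal Go sketch of that kind of check, not the actual machine-config-daemon code. The function names `supportsScheduler` and `setScheduler` are hypothetical, and the sample file contents in `main` are illustrative values rather than output captured from the affected nodes. It assumes the usual sysfs layout in which `queue/scheduler` lists the available schedulers separated by spaces, with the active one in square brackets.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// supportsScheduler reports whether want appears in the contents of a
// queue/scheduler sysfs file, for example "noop deadline [cfq]".
// The active scheduler is shown in square brackets, so they are trimmed.
// Hypothetical helper for illustration only.
func supportsScheduler(contents, want string) bool {
	for _, field := range strings.Fields(contents) {
		if strings.Trim(field, "[]") == want {
			return true
		}
	}
	return false
}

// setScheduler writes want to the scheduler file at path only when the
// kernel advertises it. It returns false with a nil error when the
// scheduler is unsupported, so the caller can log and continue instead
// of failing with "invalid argument".
// Hypothetical helper for illustration only.
func setScheduler(path, want string) (bool, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return false, err
	}
	if !supportsScheduler(string(data), want) {
		return false, nil
	}
	return true, os.WriteFile(path, []byte(want), 0o644)
}

func main() {
	// Illustrative contents: a device that does not offer bfq.
	fmt.Println(supportsScheduler("noop deadline [cfq]", "bfq"))
	// Illustrative contents: a device that does offer bfq.
	fmt.Println(supportsScheduler("[mq-deadline] kyber bfq none", "bfq"))
}
```

Skipping the write when the scheduler is not listed avoids the "invalid argument" error that marked the node Degraded, at the cost of leaving the device on its current scheduler.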

Comment 6 Sunil Choudhary 2021-01-08 05:11:48 UTC
Thank you, Seth.

I will mark this as Verified, since the error I am seeing will be fixed in https://bugzilla.redhat.com/show_bug.cgi?id=1913536

Comment 7 Ben Howard 2021-01-18 22:51:20 UTC
No docs needed.

Comment 10 errata-xmlrpc 2021-02-24 15:47:55 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

Comment 11 W. Trevor King 2021-04-05 17:36:44 UTC
Removing UpgradeBlocker from this older bug, to remove it from the suspect queue described in [1].  If you feel this bug still needs to be a suspect, please add the keyword again.

[1]: https://github.com/openshift/enhancements/pull/475

