Description of problem:
While upgrading from 4.6.0-0.nightly-2020-12-20-032710 to 4.7.0-0.nightly-2020-12-21-131655, the machine-config operator is in a degraded state with the message below.

Node upgrade45-chuo-x5bnh-w-a-l-rhel-0 is reporting: "write /sys/devices/pci0000:00/0000:00:03.0/virtio0/host0/target0:0:1/0:0:1:0/block/sda/queue/scheduler: invalid argument"

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE    STATUS
version   4.6.0-0.nightly-2020-12-20-032710   True        True          3h55m    Working towards 4.7.0-0.nightly-2020-12-21-131655: 30% complete

$ oc get nodes -o wide
NAME                                                 STATUS                     ROLES    AGE   VERSION           INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
upgrade45-chuo-x5bnh-m-0.c.openshift-qe.internal     Ready                      master   25h   v1.19.0+9c69bdc   10.0.0.108    <none>        Red Hat Enterprise Linux CoreOS 46.82.202012191219-0 (Ootpa)   4.18.0-193.37.1.el8_2.x86_64   cri-o://1.19.0-26.rhaos4.6.git8a05a29.el8
upgrade45-chuo-x5bnh-m-1.c.openshift-qe.internal     Ready                      master   25h   v1.19.0+9c69bdc   10.0.0.109    <none>        Red Hat Enterprise Linux CoreOS 46.82.202012191219-0 (Ootpa)   4.18.0-193.37.1.el8_2.x86_64   cri-o://1.19.0-26.rhaos4.6.git8a05a29.el8
upgrade45-chuo-x5bnh-m-2.c.openshift-qe.internal     Ready                      master   25h   v1.19.0+9c69bdc   10.0.0.107    <none>        Red Hat Enterprise Linux CoreOS 46.82.202012191219-0 (Ootpa)   4.18.0-193.37.1.el8_2.x86_64   cri-o://1.19.0-26.rhaos4.6.git8a05a29.el8
upgrade45-chuo-x5bnh-w-a-0.c.openshift-qe.internal   NotReady                   worker   24h   v1.19.0+9c69bdc   10.0.32.37    <none>        Red Hat Enterprise Linux CoreOS 46.82.202012191219-0 (Ootpa)   4.18.0-193.37.1.el8_2.x86_64   cri-o://1.19.0-26.rhaos4.6.git8a05a29.el8
upgrade45-chuo-x5bnh-w-a-l-rhel-0                    Ready,SchedulingDisabled   worker   24h   v1.18.3+86dc8d1   10.0.32.102                 Red Hat Enterprise Linux Server 7.9 (Maipo)                    3.10.0-1160.11.1.el7.x86_64    cri-o://1.18.4-4.rhaos4.5.git6dee389.el7
upgrade45-chuo-x5bnh-w-b-1.c.openshift-qe.internal   NotReady                   worker   24h   v1.18.3+86dc8d1   10.0.32.101                 Red Hat Enterprise Linux CoreOS 45.82.202012172327-0 (Ootpa)   4.18.0-193.37.1.el8_2.x86_64   cri-o://1.18.4-4.rhaos4.5.git6dee389.el8

$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.7.0-0.nightly-2020-12-21-131655   False       True          True       130m
baremetal                                  4.7.0-0.nightly-2020-12-21-131655   True        False         False      137m
cloud-credential                           4.7.0-0.nightly-2020-12-21-131655   True        False         False      24h
cluster-autoscaler                         4.7.0-0.nightly-2020-12-21-131655   True        False         False      23h
config-operator                            4.7.0-0.nightly-2020-12-21-131655   True        False         False      23h
console                                    4.7.0-0.nightly-2020-12-21-131655   True        False         True       134m
csi-snapshot-controller                    4.7.0-0.nightly-2020-12-21-131655   True        False         False      134m
dns                                        4.6.0-0.nightly-2020-12-20-032710   True        False         True       18h
etcd                                       4.7.0-0.nightly-2020-12-21-131655   True        False         False      24h
image-registry                             4.7.0-0.nightly-2020-12-21-131655   False       True          True       130m
ingress                                    4.7.0-0.nightly-2020-12-21-131655   False       True          True       130m
insights                                   4.7.0-0.nightly-2020-12-21-131655   True        False         False      24h
kube-apiserver                             4.7.0-0.nightly-2020-12-21-131655   True        True          False      24h
kube-controller-manager                    4.7.0-0.nightly-2020-12-21-131655   True        False         False      24h
kube-scheduler                             4.7.0-0.nightly-2020-12-21-131655   True        False         False      24h
kube-storage-version-migrator              4.7.0-0.nightly-2020-12-21-131655   False       False         False      130m
machine-api                                4.7.0-0.nightly-2020-12-21-131655   True        False         False      23h
machine-approver                           4.7.0-0.nightly-2020-12-21-131655   True        False         False      24h
machine-config                             4.6.0-0.nightly-2020-12-20-032710   False       False         True       18h
marketplace                                4.7.0-0.nightly-2020-12-21-131655   True        False         False      134m
monitoring                                 4.6.0-0.nightly-2020-12-20-032710   False       True          True       18h
network                                    4.6.0-0.nightly-2020-12-20-032710   True        True          True       24h
node-tuning                                4.7.0-0.nightly-2020-12-21-131655   True        True          False      136m
openshift-apiserver                        4.7.0-0.nightly-2020-12-21-131655   True        False         False      144m
openshift-controller-manager               4.7.0-0.nightly-2020-12-21-131655   True        False         False      5h4m
openshift-samples                          4.7.0-0.nightly-2020-12-21-131655   True        False         False      134m
operator-lifecycle-manager                 4.7.0-0.nightly-2020-12-21-131655   True        False         False      24h
operator-lifecycle-manager-catalog         4.7.0-0.nightly-2020-12-21-131655   True        False         False      24h
operator-lifecycle-manager-packageserver   4.7.0-0.nightly-2020-12-21-131655   True        False         False      15m
service-ca                                 4.7.0-0.nightly-2020-12-21-131655   True        False         False      24h
storage                                    4.7.0-0.nightly-2020-12-21-131655   True        True          False      134m

$ oc get mc
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          eb9778355a9020673e8ce9aee092cb98d80cde5e   3.1.0             25h
00-worker                                          eb9778355a9020673e8ce9aee092cb98d80cde5e   3.1.0             25h
01-master-container-runtime                        eb9778355a9020673e8ce9aee092cb98d80cde5e   3.1.0             25h
01-master-kubelet                                  eb9778355a9020673e8ce9aee092cb98d80cde5e   3.1.0             25h
01-worker-container-runtime                        eb9778355a9020673e8ce9aee092cb98d80cde5e   3.1.0             25h
01-worker-kubelet                                  eb9778355a9020673e8ce9aee092cb98d80cde5e   3.1.0             25h
99-master-generated-crio-capabilities                                                         2.2.0             25h
99-master-generated-registries                     eb9778355a9020673e8ce9aee092cb98d80cde5e   3.1.0             19h
99-master-ssh                                                                                 2.2.0             25h
99-worker-generated-crio-capabilities                                                         2.2.0             25h
99-worker-generated-registries                     eb9778355a9020673e8ce9aee092cb98d80cde5e   3.1.0             19h
99-worker-ssh                                                                                 2.2.0             25h
rendered-master-0495d6d377fa23ebd92fd9c500e299b8   eb9778355a9020673e8ce9aee092cb98d80cde5e   3.1.0             19h
rendered-master-8ce7cd4511a9bfac379a3acfab1c645e   d7ca39367eb7368c1bfdb8b854faa8af9526fa5e   2.2.0             19h
rendered-master-b90fe26e5becf05bc0058e55103ceb04   d7ca39367eb7368c1bfdb8b854faa8af9526fa5e   2.2.0             25h
rendered-worker-93403b92e87cea3f791bb212d05bc44f   d7ca39367eb7368c1bfdb8b854faa8af9526fa5e   2.2.0             25h
rendered-worker-9edb72930638049779a56f9b0d0690a5   eb9778355a9020673e8ce9aee092cb98d80cde5e   3.1.0             19h

$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-0495d6d377fa23ebd92fd9c500e299b8   True      False      False      3              3                   3                     0                      23h
worker   rendered-worker-93403b92e87cea3f791bb212d05bc44f   False     True       True       3              0                   1                     1                      23h

$ oc describe mcp worker
Name:         worker
Namespace:
Labels:       machineconfiguration.openshift.io/mco-built-in=
              pools.operator.machineconfiguration.openshift.io/worker=
Annotations:  <none>
API Version:  machineconfiguration.openshift.io/v1
Kind:         MachineConfigPool
Metadata:
  Creation Timestamp:  2020-12-21T05:27:50Z
  Generation:          3
  Managed Fields:
    API Version:  machineconfiguration.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .:
          f:machineconfiguration.openshift.io/mco-built-in:
          f:pools.operator.machineconfiguration.openshift.io/worker:
      f:spec:
        .:
        f:configuration:
        f:machineConfigSelector:
          .:
          f:matchLabels:
            .:
            f:machineconfiguration.openshift.io/role:
        f:nodeSelector:
          .:
          f:matchLabels:
            .:
            f:node-role.kubernetes.io/worker:
        f:paused:
    Manager:      machine-config-operator
    Operation:    Update
    Time:         2020-12-21T10:30:03Z
    API Version:  machineconfiguration.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        f:configuration:
          f:name:
          f:source:
      f:status:
        .:
        f:conditions:
        f:configuration:
          .:
          f:name:
          f:source:
        f:degradedMachineCount:
        f:machineCount:
        f:observedGeneration:
        f:readyMachineCount:
        f:unavailableMachineCount:
        f:updatedMachineCount:
    Manager:         machine-config-controller
    Operation:       Update
    Time:            2020-12-21T10:42:57Z
  Resource Version:  438812
  UID:               acf4cf2e-b5c8-475f-a255-a5ae7c0f8ba3
Spec:
  Configuration:
    Name:  rendered-worker-9edb72930638049779a56f9b0d0690a5
    Source:
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         00-worker
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-worker-container-runtime
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-worker-kubelet
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-generated-crio-capabilities
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-generated-registries
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-ssh
  Machine Config Selector:
    Match Labels:
      machineconfiguration.openshift.io/role:  worker
  Node Selector:
    Match Labels:
      node-role.kubernetes.io/worker:
  Paused:  false
Status:
  Conditions:
    Last Transition Time:  2020-12-21T05:28:12Z
    Message:
    Reason:
    Status:                False
    Type:                  RenderDegraded
    Last Transition Time:  2020-12-21T10:33:40Z
    Message:
    Reason:
    Status:                False
    Type:                  Updated
    Last Transition Time:  2020-12-21T10:33:40Z
    Message:               All nodes are updating to rendered-worker-9edb72930638049779a56f9b0d0690a5
    Reason:
    Status:                True
    Type:                  Updating
    Last Transition Time:  2020-12-21T10:39:24Z
    Message:               Node upgrade45-chuo-x5bnh-w-a-l-rhel-0 is reporting: "write /sys/devices/pci0000:00/0000:00:03.0/virtio0/host0/target0:0:1/0:0:1:0/block/sda/queue/scheduler: invalid argument"
    Reason:                1 nodes are reporting degraded status on sync
    Status:                True
    Type:                  NodeDegraded
    Last Transition Time:  2020-12-21T10:39:24Z
    Message:
    Reason:
    Status:                True
    Type:                  Degraded
  Configuration:
    Name:  rendered-worker-93403b92e87cea3f791bb212d05bc44f
    Source:
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         00-worker
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-worker-container-runtime
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-worker-kubelet
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-acf4cf2e-b5c8-475f-a255-a5ae7c0f8ba3-registries
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-generated-crio-capabilities
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-ssh
  Degraded Machine Count:     1
  Machine Count:              3
  Observed Generation:        3
  Ready Machine Count:        0
  Unavailable Machine Count:  3
  Updated Machine Count:      1
Events:  <none>

Version-Release number of selected component (if applicable):
UPI on GCP
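The "invalid argument" above is the kernel rejecting a scheduler name that the running kernel does not offer for that device: the queue/scheduler sysfs file itself lists the only valid choices, and writing anything else returns EINVAL. A minimal sketch of the guard (the set_scheduler helper is hypothetical, not MCD code; the two sample scheduler lists mimic a RHEL 7 legacy-elevator kernel and a blk-mq kernel):

```shell
# Hypothetical helper: report "set" only when the requested scheduler
# appears in the kernel's advertised list (the contents of
# /sys/block/<dev>/queue/scheduler, where the active one is bracketed).
# Writing any other name fails with EINVAL, as seen in this bug.
set_scheduler() {
  available="$1"   # e.g. the contents of queue/scheduler
  wanted="$2"
  case " $available " in
    *" $wanted "* | *"[$wanted]"*) echo "set" ;;
    *) echo "unsupported" ;;
  esac
}

# RHEL 7's 3.10 kernel exposes only the legacy elevators, so bfq fails:
set_scheduler "noop [deadline] cfq" "bfq"             # -> unsupported
# A newer blk-mq kernel (as on RHCOS) does list bfq:
set_scheduler "mq-deadline kyber [bfq] none" "bfq"    # -> set
```

This matches the node mix above: the write succeeds on the RHCOS workers (4.18 kernel) and fails only on the RHEL 7.9 worker (3.10 kernel).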
I still see the issue with 4.7.0-0.nightly-2021-01-06-222035; however, in this instance the error I see is "error enabling unit: Failed to execute operation: File exists\n".

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.9     True        True          64m     Working towards 4.7.0-0.nightly-2021-01-06-222035: 84% complete

$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.7.0-0.nightly-2021-01-06-222035   True        False         False      2m33s
baremetal                                  4.7.0-0.nightly-2021-01-06-222035   True        False         False      41m
cloud-credential                           4.7.0-0.nightly-2021-01-06-222035   True        False         False      126m
cluster-autoscaler                         4.7.0-0.nightly-2021-01-06-222035   True        False         False      122m
config-operator                            4.7.0-0.nightly-2021-01-06-222035   True        False         False      123m
console                                    4.7.0-0.nightly-2021-01-06-222035   True        False         False      7m31s
csi-snapshot-controller                    4.7.0-0.nightly-2021-01-06-222035   True        False         False      7m26s
dns                                        4.7.0-0.nightly-2021-01-06-222035   True        False         False      121m
etcd                                       4.7.0-0.nightly-2021-01-06-222035   True        False         False      121m
image-registry                             4.7.0-0.nightly-2021-01-06-222035   True        False         False      113m
ingress                                    4.7.0-0.nightly-2021-01-06-222035   True        False         False      113m
insights                                   4.7.0-0.nightly-2021-01-06-222035   True        False         False      123m
kube-apiserver                             4.7.0-0.nightly-2021-01-06-222035   True        False         False      120m
kube-controller-manager                    4.7.0-0.nightly-2021-01-06-222035   True        False         False      120m
kube-scheduler                             4.7.0-0.nightly-2021-01-06-222035   True        False         False      120m
kube-storage-version-migrator              4.7.0-0.nightly-2021-01-06-222035   True        False         False      12m
machine-api                                4.7.0-0.nightly-2021-01-06-222035   True        False         False      119m
machine-approver                           4.7.0-0.nightly-2021-01-06-222035   True        False         False      122m
machine-config                             4.6.9                               False       True          True       25m
marketplace                                4.7.0-0.nightly-2021-01-06-222035   True        False         False      13m
monitoring                                 4.7.0-0.nightly-2021-01-06-222035   True        False         False      112m
network                                    4.7.0-0.nightly-2021-01-06-222035   True        False         False      28m
node-tuning                                4.7.0-0.nightly-2021-01-06-222035   True        False         False      38m
openshift-apiserver                        4.7.0-0.nightly-2021-01-06-222035   True        False         False      5m20s
openshift-controller-manager               4.7.0-0.nightly-2021-01-06-222035   True        False         False      121m
openshift-samples                          4.7.0-0.nightly-2021-01-06-222035   True        False         False      38m
operator-lifecycle-manager                 4.7.0-0.nightly-2021-01-06-222035   True        False         False      122m
operator-lifecycle-manager-catalog         4.7.0-0.nightly-2021-01-06-222035   True        False         False      122m
operator-lifecycle-manager-packageserver   4.7.0-0.nightly-2021-01-06-222035   True        False         False      7m6s
service-ca                                 4.7.0-0.nightly-2021-01-06-222035   True        False         False      123m
storage                                    4.7.0-0.nightly-2021-01-06-222035   True        False         False      6m33s

$ oc get nodes -o wide
NAME                                        STATUS                     ROLES    AGE    VERSION           INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
ip-10-0-52-165.us-east-2.compute.internal   Ready                      master   126m   v1.20.0+b1e9f0d   10.0.52.165   <none>        Red Hat Enterprise Linux CoreOS 47.83.202101060443-0 (Ootpa)   4.18.0-240.10.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39
ip-10-0-53-183.us-east-2.compute.internal   Ready                      worker   67m    v1.19.0+9c69bdc   10.0.53.183   <none>        Red Hat Enterprise Linux Server 7.9 (Maipo)                    3.10.0-1160.11.1.el7.x86_64    cri-o://1.19.0-118.rhaos4.6.gitf51f94a.el7
ip-10-0-60-33.us-east-2.compute.internal    Ready,SchedulingDisabled   worker   67m    v1.19.0+9c69bdc   10.0.60.33    <none>        Red Hat Enterprise Linux Server 7.9 (Maipo)                    3.10.0-1160.11.1.el7.x86_64    cri-o://1.19.0-118.rhaos4.6.gitf51f94a.el7
ip-10-0-63-181.us-east-2.compute.internal   Ready                      master   126m   v1.20.0+b1e9f0d   10.0.63.181   <none>        Red Hat Enterprise Linux CoreOS 47.83.202101060443-0 (Ootpa)   4.18.0-240.10.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39
ip-10-0-74-41.us-east-2.compute.internal    Ready                      master   126m   v1.20.0+b1e9f0d   10.0.74.41    <none>        Red Hat Enterprise Linux CoreOS 47.83.202101060443-0 (Ootpa)   4.18.0-240.10.1.el8_3.x86_64   cri-o://1.20.0-0.rhaos4.7.gitd388528.el8.39

$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-15480014538085b4c551d98d493e248d   True      False      False      3              3                   3                     0                      124m
worker   rendered-worker-8057ddcdd5ed8f83f444d8ea4f9963c4   False     True       True       2              0                   0                     1                      124m

$ oc describe mcp worker
Name:         worker
Namespace:
Labels:       machineconfiguration.openshift.io/mco-built-in=
              pools.operator.machineconfiguration.openshift.io/worker=
Annotations:  <none>
API Version:  machineconfiguration.openshift.io/v1
Kind:         MachineConfigPool
Metadata:
  Creation Timestamp:  2021-01-07T09:32:57Z
  Generation:          5
  Managed Fields:
    API Version:  machineconfiguration.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .:
          f:machineconfiguration.openshift.io/mco-built-in:
          f:pools.operator.machineconfiguration.openshift.io/worker:
      f:spec:
        .:
        f:configuration:
        f:machineConfigSelector:
          .:
          f:matchLabels:
            .:
            f:machineconfiguration.openshift.io/role:
        f:nodeSelector:
          .:
          f:matchLabels:
            .:
            f:node-role.kubernetes.io/worker:
        f:paused:
    Manager:      machine-config-operator
    Operation:    Update
    Time:         2021-01-07T09:32:57Z
    API Version:  machineconfiguration.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        f:configuration:
          f:name:
          f:source:
      f:status:
        .:
        f:conditions:
        f:configuration:
          .:
          f:name:
          f:source:
        f:degradedMachineCount:
        f:machineCount:
        f:observedGeneration:
        f:readyMachineCount:
        f:unavailableMachineCount:
        f:updatedMachineCount:
    Manager:         machine-config-controller
    Operation:       Update
    Time:            2021-01-07T10:32:30Z
  Resource Version:  74388
  UID:               42d6af38-c165-46a3-8d1d-9c86e156411b
Spec:
  Configuration:
    Name:  rendered-worker-3b8eb936c7f50c4d1664fb66186e51fc
    Source:
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         00-worker
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-worker-container-runtime
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-worker-kubelet
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-fips
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-generated-registries
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-ssh
  Machine Config Selector:
    Match Labels:
      machineconfiguration.openshift.io/role:  worker
  Node Selector:
    Match Labels:
      node-role.kubernetes.io/worker:
  Paused:  false
Status:
  Conditions:
    Last Transition Time:  2021-01-07T09:35:17Z
    Message:
    Reason:
    Status:                False
    Type:                  RenderDegraded
    Last Transition Time:  2021-01-07T11:16:38Z
    Message:
    Reason:
    Status:                False
    Type:                  Updated
    Last Transition Time:  2021-01-07T11:16:38Z
    Message:               All nodes are updating to rendered-worker-3b8eb936c7f50c4d1664fb66186e51fc
    Reason:
    Status:                True
    Type:                  Updating
    Last Transition Time:  2021-01-07T11:18:48Z
    Message:               Node ip-10-0-60-33.us-east-2.compute.internal is reporting: "error enabling unit: Failed to execute operation: File exists\n"
    Reason:                1 nodes are reporting degraded status on sync
    Status:                True
    Type:                  NodeDegraded
    Last Transition Time:  2021-01-07T11:18:48Z
    Message:
    Reason:
    Status:                True
    Type:                  Degraded
  Configuration:
    Name:  rendered-worker-8057ddcdd5ed8f83f444d8ea4f9963c4
    Source:
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         00-worker
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-worker-container-runtime
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         01-worker-kubelet
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-fips
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-generated-registries
      API Version:  machineconfiguration.openshift.io/v1
      Kind:         MachineConfig
      Name:         99-worker-ssh
  Degraded Machine Count:     1
  Machine Count:              2
  Observed Generation:        5
  Ready Machine Count:        0
  Unavailable Machine Count:  1
  Updated Machine Count:      0
Events:
  Type    Reason            Age   From                                    Message
  ----    ------            ----  ----                                    -------
  Normal  SetDesiredConfig  97m   machineconfigcontroller-nodecontroller  Targeted node ip-10-0-63-136.us-east-2.compute.internal to config rendered-worker-8057ddcdd5ed8f83f444d8ea4f9963c4
  Normal  SetDesiredConfig  94m   machineconfigcontroller-nodecontroller  Targeted node ip-10-0-71-35.us-east-2.compute.internal to config rendered-worker-8057ddcdd5ed8f83f444d8ea4f9963c4
  Normal  SetDesiredConfig  92m   machineconfigcontroller-nodecontroller  Targeted node ip-10-0-53-113.us-east-2.compute.internal to config rendered-worker-8057ddcdd5ed8f83f444d8ea4f9963c4
  Normal  SetDesiredConfig  20m   machineconfigcontroller-nodecontroller  Targeted node ip-10-0-60-33.us-east-2.compute.internal to config rendered-worker-3b8eb936c7f50c4d1664fb66186e51fc

$ oc describe node ip-10-0-60-33.us-east-2.compute.internal
Name:               ip-10-0-60-33.us-east-2.compute.internal
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=m4.xlarge
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=us-east-2
                    failure-domain.beta.kubernetes.io/zone=us-east-2a
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-10-0-60-33.us-east-2.compute.internal
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/worker=
                    node.kubernetes.io/instance-type=m4.xlarge
                    node.openshift.io/os_id=rhel
                    topology.ebs.csi.aws.com/zone=us-east-2a
                    topology.kubernetes.io/region=us-east-2
                    topology.kubernetes.io/zone=us-east-2a
Annotations:        csi.volume.kubernetes.io/nodeid: {"ebs.csi.aws.com":"i-06d3f727f99a1d2da"}
                    k8s.ovn.org/l3-gateway-config: {"default":{"mode":"shared","interface-id":"br-ex_ip-10-0-60-33.us-east-2.compute.internal","mac-address":"02:20:5c:5f:8e:92","ip-addresse...
                    k8s.ovn.org/node-chassis-id: 80072e11-3efc-4667-9f3f-63f3d8a2282a
                    k8s.ovn.org/node-local-nat-ip: {"default":["169.254.10.233"]}
                    k8s.ovn.org/node-mgmt-port-mac-address: 02:ef:71:ca:a0:b9
                    k8s.ovn.org/node-primary-ifaddr: {"ipv4":"10.0.60.33/20"}
                    k8s.ovn.org/node-subnets: {"default":"10.131.2.0/23"}
                    machineconfiguration.openshift.io/currentConfig: rendered-worker-8057ddcdd5ed8f83f444d8ea4f9963c4
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-3b8eb936c7f50c4d1664fb66186e51fc
                    machineconfiguration.openshift.io/reason: error enabling unit: Failed to execute operation: File exists
                    machineconfiguration.openshift.io/ssh: accessed
                    machineconfiguration.openshift.io/state: Degraded
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 07 Jan 2021 15:59:02 +0530
Taints:             node.kubernetes.io/unschedulable:NoSchedule
Unschedulable:      true
Lease:
  HolderIdentity:  ip-10-0-60-33.us-east-2.compute.internal
  AcquireTime:     <unset>
  RenewTime:       Thu, 07 Jan 2021 17:07:14 +0530
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Thu, 07 Jan 2021 17:05:04 +0530   Thu, 07 Jan 2021 15:59:02 +0530   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Thu, 07 Jan 2021 17:05:04 +0530   Thu, 07 Jan 2021 15:59:02 +0530   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Thu, 07 Jan 2021 17:05:04 +0530   Thu, 07 Jan 2021 15:59:02 +0530   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Thu, 07 Jan 2021 17:05:04 +0530   Thu, 07 Jan 2021 15:59:52 +0530   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:   10.0.60.33
  Hostname:     ip-10-0-60-33.us-east-2.compute.internal
  InternalDNS:  ip-10-0-60-33.us-east-2.compute.internal
Capacity:
  attachable-volumes-aws-ebs:  39
  cpu:                         4
  ephemeral-storage:           31444972Ki
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      16264968Ki
  pods:                        250
Allocatable:
  attachable-volumes-aws-ebs:  39
  cpu:                         3500m
  ephemeral-storage:           27905944324
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      15113992Ki
  pods:                        250
System Info:
  Machine ID:                 9a1c7f6b38b4416bb786db538b6ff55a
  System UUID:                EC2BE0A1-C9B2-442F-0610-ED4DD17F8AB7
  Boot ID:                    37e728b8-9ae2-4ea6-a7da-4e4ed61e5e1e
  Kernel Version:             3.10.0-1160.11.1.el7.x86_64
  OS Image:                   Red Hat Enterprise Linux Server 7.9 (Maipo)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  cri-o://1.19.0-118.rhaos4.6.gitf51f94a.el7
  Kubelet Version:            v1.19.0+9c69bdc
  Kube-Proxy Version:         v1.19.0+9c69bdc
ProviderID:                   aws:///us-east-2a/i-06d3f727f99a1d2da
Non-terminated Pods:          (12 in total)
  Namespace                               Name                           CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                               ----                           ------------  ----------  ---------------  -------------  ---
  openshift-cluster-csi-drivers           aws-ebs-csi-driver-node-jpr6b  30m (0%)      0 (0%)      150Mi (1%)       0 (0%)         38m
  openshift-cluster-node-tuning-operator  tuned-zww9c                    10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         38m
  openshift-dns                           dns-default-j47hl              65m (1%)      0 (0%)      131Mi (0%)       0 (0%)         24m
  openshift-image-registry                node-ca-m2jps                  10m (0%)      0 (0%)      10Mi (0%)        0 (0%)         38m
  openshift-ingress-canary                ingress-canary-44gwr           10m (0%)      0 (0%)      20Mi (0%)        0 (0%)         39m
  openshift-machine-config-operator       machine-config-daemon-g8tcl    40m (1%)      0 (0%)      100Mi (0%)       0 (0%)         23m
  openshift-monitoring                    node-exporter-fd95q            9m (0%)       0 (0%)      210Mi (1%)       0 (0%)         39m
  openshift-multus                        multus-8pgxt                   10m (0%)      0 (0%)      150Mi (1%)       0 (0%)         32m
  openshift-multus                        network-metrics-daemon-6vqj9   20m (0%)      0 (0%)      120Mi (0%)       0 (0%)         33m
  openshift-network-diagnostics           network-check-target-mzzdv     10m (0%)      0 (0%)      150Mi (1%)       0 (0%)         34m
  openshift-ovn-kubernetes                ovnkube-node-vs6mv             30m (0%)      0 (0%)      620Mi (4%)       0 (0%)         34m
  openshift-ovn-kubernetes                ovs-node-46vsh                 100m (2%)     0 (0%)      300Mi (2%)       0 (0%)         33m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests      Limits
  --------                    --------      ------
  cpu                         344m (9%)     0 (0%)
  memory                      2011Mi (13%)  0 (0%)
  ephemeral-storage           0 (0%)        0 (0%)
  hugepages-1Gi               0 (0%)        0 (0%)
  hugepages-2Mi               0 (0%)        0 (0%)
  attachable-volumes-aws-ebs  0             0
Events:
  Type    Reason              Age   From                                               Message
  ----    ------              ----  ----                                               -------
  Normal  NodeNotSchedulable  20m   kubelet, ip-10-0-60-33.us-east-2.compute.internal  Node ip-10-0-60-33.us-east-2.compute.internal status is now: NodeNotSchedulable
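For context on "Failed to execute operation: File exists": systemd reports this when an enable operation tries to create a unit symlink that is already present, i.e. the non-idempotent path. The same failure mode can be reproduced with a plain symlink, which is what unit enablement creates under the hood. This is only an illustration of the EEXIST behavior, using scratch files rather than real systemd directories:

```shell
# Simulate a non-idempotent "enable": creating a symlink that already
# exists fails with EEXIST ("File exists"), matching the MCD error text.
tmp=$(mktemp -d)
touch "$tmp/unit.service"

ln -s "$tmp/unit.service" "$tmp/unit.link"        # first "enable" succeeds
if ln -s "$tmp/unit.service" "$tmp/unit.link" 2>/dev/null; then
  second="ok"
else
  second="File exists"                            # retry hits EEXIST
fi

# Forcing the link (ln -sf) is the idempotent variant: it replaces the
# existing symlink instead of failing.
ln -sf "$tmp/unit.service" "$tmp/unit.link" && third="ok"

echo "retry: $second, forced: $third"
rm -rf "$tmp"
```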
To add to my previous comment: the error I now see is the one reported in bug https://bugzilla.redhat.com/show_bug.cgi?id=1913536. @Ben Howard, could you please review? I did not see the write-to-scheduler error reported in that bug, so I assume the scheduler error is the one fixed by this bug.
Yes, I believe you are running into https://bugzilla.redhat.com/show_bug.cgi?id=1913536. I will try to verify this fix by performing a 4.6->4.6 upgrade with the backported fix in https://bugzilla.redhat.com/show_bug.cgi?id=1913316, as 4.6 should not be affected by this.
I was able to confirm this fix. I created a release image that included the fix from the 4.6.0-0.ci release stream, then upgraded my 4.6.9 cluster to it. The upgrade succeeded, even with the RHEL worker (though I had to work around https://bugzilla.redhat.com/show_bug.cgi?id=1913154). I see this in the MCD logs on the RHEL node:

$ oc logs machine-config-daemon-jl7rb -c machine-config-daemon | grep sched
I0107 17:49:12.140304 1880 controlplane.go:50] Device /sys/devices/pci0000:00/0000:00:1e.0/0000:05:01.0/0000:06:0a.0/virtio1/block/vda does not support the bfq scheduler

I then rolled back to 4.6.9, where I hit the error (expected, as 4.6.9 does not contain the fix):

E0107 18:36:01.363148 41769 writer.go:135] Marking Degraded due to: write /sys/devices/pci0000:00/0000:00:1e.0/0000:05:01.0/0000:06:0a.0/virtio1/block/vda/queue/scheduler: invalid argument

Sunil, you can consider this verified by me.
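For reference, the degraded reason can be pulled out of captured MCD log text mechanically; a small sketch against the two log lines quoted above (in practice the input would come from `oc logs <machine-config-daemon-pod> -c machine-config-daemon`):

```shell
# Extract the "Marking Degraded due to:" reason from MCD log text.
# The sample log below is the pair of lines quoted in this comment.
log='I0107 17:49:12.140304 1880 controlplane.go:50] Device /sys/devices/pci0000:00/0000:00:1e.0/0000:05:01.0/0000:06:0a.0/virtio1/block/vda does not support the bfq scheduler
E0107 18:36:01.363148 41769 writer.go:135] Marking Degraded due to: write /sys/devices/pci0000:00/0000:00:1e.0/0000:05:01.0/0000:06:0a.0/virtio1/block/vda/queue/scheduler: invalid argument'

# Print only the text after the marker on matching lines.
reason=$(printf '%s\n' "$log" | sed -n 's/.*Marking Degraded due to: //p')
echo "$reason"
```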
Thank you, Seth. I will mark this as Verified, as the error I see will be fixed in https://bugzilla.redhat.com/show_bug.cgi?id=1913536.
No docs needed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633
Removing UpgradeBlocker from this older bug, to remove it from the suspect queue described in [1]. If you feel this bug still needs to be a suspect, please add the keyword again.

[1]: https://github.com/openshift/enhancements/pull/475