Description of problem:
After a 'node crash' test on AWS, the machine-config operator is not available. The worker node is in NotReady status.

Version-Release number of selected component (if applicable):
4.9.0-0.nightly-2021-08-25-093627

How reproducible:
So far one run - 100%

Steps to Reproduce:
1. Clone the git repository:
$ git clone https://github.com/openshift-scale/kraken.git
$ cd kraken
2. Edit the configuration file:
$ vim config/config.yaml
   - update the kubeconfig path
   - from `chaos_scenarios` leave only the node scenarios
3. Run kraken:
$ python3 run_kraken.py --config config/config.yaml

This is an automated crash test. In general it will run the following command on workers (a rough manual approximation is sketched below, after the Expected results):
$ oc debug node/$worker_node -- chroot /host -- dd if=/dev/urandom of=/proc/sysrq-trigger

Actual results:
$ oc get nodes
NAME                                         STATUS     ROLES    AGE     VERSION
ip-10-0-139-15.us-east-2.compute.internal    Ready      master   6h3m    v1.22.0-rc.0+5c2f7cd
ip-10-0-149-240.us-east-2.compute.internal   NotReady   worker   5h52m   v1.22.0-rc.0+5c2f7cd
ip-10-0-177-74.us-east-2.compute.internal    Ready      worker   5h51m   v1.22.0-rc.0+5c2f7cd
ip-10-0-182-231.us-east-2.compute.internal   Ready      master   6h3m    v1.22.0-rc.0+5c2f7cd
ip-10-0-200-25.us-east-2.compute.internal    Ready      worker   5h51m   v1.22.0-rc.0+5c2f7cd
ip-10-0-201-102.us-east-2.compute.internal   Ready      master   6h3m    v1.22.0-rc.0+5c2f7cd

$ oc get co machine-config
NAME             VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
machine-config   4.9.0-0.nightly-2021-08-25-093627   False       False         True       36m     Cluster not available for 4.9.0-0.nightly-2021-08-25-093627

$ oc describe node ip-10-0-149-240.us-east-2.compute.internal
Name:               ip-10-0-149-240.us-east-2.compute.internal
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=m5.large
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=us-east-2
                    failure-domain.beta.kubernetes.io/zone=us-east-2a
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-10-0-149-240.us-east-2.compute.internal
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/worker=
                    node.kubernetes.io/instance-type=m5.large
                    node.openshift.io/os_id=rhcos
                    topology.ebs.csi.aws.com/zone=us-east-2a
                    topology.kubernetes.io/region=us-east-2
                    topology.kubernetes.io/zone=us-east-2a
Annotations:        csi.volume.kubernetes.io/nodeid: {"ebs.csi.aws.com":"i-019458366e30d5e26"}
                    k8s.ovn.org/host-addresses: ["10.0.149.240"]
                    k8s.ovn.org/l3-gateway-config: {"default":{"mode":"shared","interface-id":"br-ex_ip-10-0-149-240.us-east-2.compute.internal","mac-address":"02:81:9b:f4:23:fc","ip-addres...
                    k8s.ovn.org/node-chassis-id: 10079734-5849-4928-8c5c-07a366df1c47
                    k8s.ovn.org/node-mgmt-port-mac-address: 86:3e:3a:b8:d2:22
                    k8s.ovn.org/node-primary-ifaddr: {"ipv4":"10.0.149.240/19"}
                    k8s.ovn.org/node-subnets: {"default":"10.131.0.0/23"}
                    machine.openshift.io/machine: openshift-machine-api/skordas826a-nwzxp-worker-us-east-2a-cq88n
                    machineconfiguration.openshift.io/controlPlaneTopology: HighlyAvailable
                    machineconfiguration.openshift.io/currentConfig: rendered-worker-d8d7f496556446a0a6443be920044ad3
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-d8d7f496556446a0a6443be920044ad3
                    machineconfiguration.openshift.io/reason:
                    machineconfiguration.openshift.io/state: Done
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 26 Aug 2021 07:33:46 -0400
Taints:             node.kubernetes.io/unreachable:NoExecute
                    k8s.ovn.org/network-unavailable:NoSchedule
                    node.kubernetes.io/unreachable:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  ip-10-0-149-240.us-east-2.compute.internal
  AcquireTime:     <unset>
  RenewTime:       Thu, 26 Aug 2021 12:41:25 -0400
Conditions:
  Type             Status    LastHeartbeatTime                 LastTransitionTime                Reason              Message
  ----             ------    -----------------                 ------------------                ------              -------
  MemoryPressure   Unknown   Thu, 26 Aug 2021 12:41:25 -0400   Thu, 26 Aug 2021 12:42:08 -0400   NodeStatusUnknown   Kubelet stopped posting node status.
  DiskPressure     Unknown   Thu, 26 Aug 2021 12:41:25 -0400   Thu, 26 Aug 2021 12:42:08 -0400   NodeStatusUnknown   Kubelet stopped posting node status.
  PIDPressure      Unknown   Thu, 26 Aug 2021 12:41:25 -0400   Thu, 26 Aug 2021 12:42:08 -0400   NodeStatusUnknown   Kubelet stopped posting node status.
  Ready            Unknown   Thu, 26 Aug 2021 12:41:25 -0400   Thu, 26 Aug 2021 12:42:08 -0400   NodeStatusUnknown   Kubelet stopped posting node status.
Addresses:
  InternalIP:   10.0.149.240
  Hostname:     ip-10-0-149-240.us-east-2.compute.internal
  InternalDNS:  ip-10-0-149-240.us-east-2.compute.internal
Capacity:
  attachable-volumes-aws-ebs:  25
  cpu:                         2
  ephemeral-storage:           125293548Ki
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      7935212Ki
  pods:                        250
Allocatable:
  attachable-volumes-aws-ebs:  25
  cpu:                         1500m
  ephemeral-storage:           115470533646
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      6784236Ki
  pods:                        250
System Info:
  Machine ID:                 ec291f631345c087942a9fea8fe2a126
  System UUID:                ec291f63-1345-c087-942a-9fea8fe2a126
  Boot ID:                    c8dc3152-f60e-48b7-83ec-a2bd57720055
  Kernel Version:             4.18.0-305.12.1.el8_4.x86_64
  OS Image:                   Red Hat Enterprise Linux CoreOS 49.84.202108221651-0 (Ootpa)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  cri-o://1.22.0-53.rhaos4.9.git2d289a2.el8
  Kubelet Version:            v1.22.0-rc.0+5c2f7cd
  Kube-Proxy Version:         v1.22.0-rc.0+5c2f7cd
ProviderID:                   aws:///us-east-2a/i-019458366e30d5e26
Non-terminated Pods:          (25 in total)
  Namespace                                Name                                             CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                                ----                                             ------------  ----------  ---------------  -------------  ---
  default                                  ip-10-0-149-240us-east-2computeinternal-debug   0 (0%)        0 (0%)      0 (0%)           0 (0%)         48m
  openshift-cluster-csi-drivers            aws-ebs-csi-driver-node-wch9k                    30m (2%)      0 (0%)      150Mi (2%)       0 (0%)         5h52m
  openshift-cluster-node-tuning-operator   tuned-jjvtw                                      10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         5h52m
  openshift-dns                            dns-default-dcxtb                                60m (4%)      0 (0%)      110Mi (1%)       0 (0%)         5h51m
  openshift-dns                            node-resolver-pvbxx                              5m (0%)       0 (0%)      21Mi (0%)        0 (0%)         5h52m
  openshift-image-registry                 image-registry-568957b9d6-rdkdx                  100m (6%)     0 (0%)      256Mi (3%)       0 (0%)         5h52m
  openshift-image-registry                 node-ca-875ck                                    10m (0%)      0 (0%)      10Mi (0%)        0 (0%)         5h52m
  openshift-ingress-canary                 ingress-canary-8zj6c                             10m (0%)      0 (0%)      20Mi (0%)        0 (0%)         5h51m
  openshift-ingress                        router-default-65bdc775fd-dwrbr                  100m (6%)     0 (0%)      256Mi (3%)       0 (0%)         5h51m
  openshift-machine-config-operator        machine-config-daemon-ngrhx                      40m (2%)      0 (0%)      100Mi (1%)       0 (0%)         5h52m
  openshift-marketplace                    certified-operators-f6smg                        10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         68m
  openshift-marketplace                    community-operators-rrmq2                        10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         5h59m
  openshift-marketplace                    redhat-marketplace-vck2g                         10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         5h59m
  openshift-marketplace                    redhat-operators-n2kvh                           10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         92m
  openshift-monitoring                     alertmanager-main-2                              8m (0%)       0 (0%)      105Mi (1%)       0 (0%)         5h49m
  openshift-monitoring                     kube-state-metrics-59b87859b8-kgw7r              4m (0%)       0 (0%)      110Mi (1%)       0 (0%)         6h
  openshift-monitoring                     node-exporter-qm46v                              9m (0%)       0 (0%)      47Mi (0%)        0 (0%)         5h52m
  openshift-monitoring                     openshift-state-metrics-66585c8c7c-hdv4x         3m (0%)       0 (0%)      72Mi (1%)        0 (0%)         6h
  openshift-monitoring                     telemeter-client-6bd9bc5f84-557hq                3m (0%)       0 (0%)      70Mi (1%)        0 (0%)         6h
  openshift-multus                         multus-additional-cni-plugins-pg4qm              10m (0%)      0 (0%)      10Mi (0%)        0 (0%)         5h52m
  openshift-multus                         multus-bx467                                     10m (0%)      0 (0%)      65Mi (0%)        0 (0%)         5h52m
  openshift-multus                         network-metrics-daemon-c447r                     20m (1%)      0 (0%)      120Mi (1%)       0 (0%)         5h52m
  openshift-network-diagnostics            network-check-source-75749bc6b4-c8jrf            10m (0%)      0 (0%)      40Mi (0%)        0 (0%)         6h4m
  openshift-network-diagnostics            network-check-target-tm2fs                       10m (0%)      0 (0%)      15Mi (0%)        0 (0%)         5h52m
  openshift-ovn-kubernetes                 ovnkube-node-p2bmb                               40m (2%)      0 (0%)      640Mi (9%)       0 (0%)         5h52m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests      Limits
  --------                    --------      ------
  cpu                         532m (35%)    0 (0%)
  memory                      2467Mi (37%)  0 (0%)
  ephemeral-storage           0 (0%)        0 (0%)
  hugepages-1Gi               0 (0%)        0 (0%)
  hugepages-2Mi               0 (0%)        0 (0%)
  attachable-volumes-aws-ebs  0             0
Events:
  Type     Reason                   Age                From     Message
  ----     ------                   ----               ----     -------
  Normal   Starting                 49m                kubelet  Starting kubelet.
  Normal   NodeAllocatableEnforced  49m                kubelet  Updated Node Allocatable limit across pods
  Normal   NodeHasSufficientMemory  49m (x2 over 49m)  kubelet  Node ip-10-0-149-240.us-east-2.compute.internal status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    49m (x2 over 49m)  kubelet  Node ip-10-0-149-240.us-east-2.compute.internal status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     49m (x2 over 49m)  kubelet  Node ip-10-0-149-240.us-east-2.compute.internal status is now: NodeHasSufficientPID
  Warning  Rebooted                 49m                kubelet  Node ip-10-0-149-240.us-east-2.compute.internal has been rebooted, boot id: 19ae10f2-e530-4b99-b4e5-0ab011386454
  Normal   NodeReady                49m                kubelet  Node ip-10-0-149-240.us-east-2.compute.internal status is now: NodeReady
  Normal   Starting                 45m                kubelet  Starting kubelet.
  Normal   NodeAllocatableEnforced  45m                kubelet  Updated Node Allocatable limit across pods
  Normal   NodeHasSufficientMemory  45m (x2 over 45m)  kubelet  Node ip-10-0-149-240.us-east-2.compute.internal status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    45m (x2 over 45m)  kubelet  Node ip-10-0-149-240.us-east-2.compute.internal status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     45m (x2 over 45m)  kubelet  Node ip-10-0-149-240.us-east-2.compute.internal status is now: NodeHasSufficientPID
  Warning  Rebooted                 45m                kubelet  Node ip-10-0-149-240.us-east-2.compute.internal has been rebooted, boot id: c8dc3152-f60e-48b7-83ec-a2bd57720055
  Normal   NodeReady                45m                kubelet  Node ip-10-0-149-240.us-east-2.compute.internal status is now: NodeReady

$ oc describe co machine-config
Name:         machine-config
Namespace:
Labels:       <none>
Annotations:  exclude.release.openshift.io/internal-openshift-hosted: true
              include.release.openshift.io/self-managed-high-availability: true
              include.release.openshift.io/single-node-developer: true
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2021-08-26T11:20:51Z
  Generation:          1
  Managed Fields:
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:exclude.release.openshift.io/internal-openshift-hosted:
          f:include.release.openshift.io/self-managed-high-availability:
          f:include.release.openshift.io/single-node-developer:
        f:ownerReferences:
          .:
          k:{"uid":"4ae87e73-9fde-4bb0-990c-c6045f5299c6"}:
      f:spec:
    Manager:      cluster-version-operator
    Operation:    Update
    Time:         2021-08-26T11:20:51Z
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
    Manager:      cluster-version-operator
    Operation:    Update
    Subresource:  status
    Time:         2021-08-26T11:20:52Z
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
        f:extension:
          .:
          f:master:
          f:worker:
        f:relatedObjects:
        f:versions:
    Manager:      machine-config-operator
    Operation:    Update
    Subresource:  status
    Time:         2021-08-26T11:28:22Z
  Owner References:
    API Version:  config.openshift.io/v1
    Kind:         ClusterVersion
    Name:         version
    UID:          4ae87e73-9fde-4bb0-990c-c6045f5299c6
  Resource Version:  139116
  UID:               512d0dab-67b1-452b-bce9-834c773d47fb
Spec:
Status:
  Conditions:
    Last Transition Time:  2021-08-26T11:28:23Z
    Message:               Cluster version is 4.9.0-0.nightly-2021-08-25-093627
    Status:                False
    Type:                  Progressing
    Last Transition Time:  2021-08-26T16:49:53Z
    Message:               One or more machine config pools are updating, please see `oc get mcp` for further details
    Reason:                PoolUpdating
    Status:                False
    Type:                  Upgradeable
    Last Transition Time:  2021-08-26T16:49:53Z
    Message:               Failed to resync 4.9.0-0.nightly-2021-08-25-093627 because: timed out waiting for the condition during waitForDaemonsetRollout: Daemonset machine-config-daemon is not ready. status: (desired: 6, updated: 6, ready: 5, unavailable: 1)
    Reason:                MachineConfigDaemonFailed
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2021-08-26T16:49:53Z
    Message:               Cluster not available for 4.9.0-0.nightly-2021-08-25-093627
    Status:                False
    Type:                  Available
  Extension:
    Master:  all 3 nodes are at latest configuration rendered-master-099495b629c595198096d3816ac65c45
    Worker:  3 (ready 2) out of 3 nodes are updating to latest configuration rendered-worker-d8d7f496556446a0a6443be920044ad3
  Related Objects:
    Group:
    Name:      openshift-machine-config-operator
    Resource:  namespaces
    Group:     machineconfiguration.openshift.io
    Name:
    Resource:  machineconfigpools
    Group:     machineconfiguration.openshift.io
    Name:
    Resource:  controllerconfigs
    Group:     machineconfiguration.openshift.io
    Name:
    Resource:  kubeletconfigs
    Group:     machineconfiguration.openshift.io
    Name:
    Resource:  containerruntimeconfigs
    Group:     machineconfiguration.openshift.io
    Name:
    Resource:  machineconfigs
    Group:
    Name:
    Resource:  nodes
    Group:
    Name:      openshift-kni-infra
    Resource:  namespaces
    Group:
    Name:      openshift-openstack-infra
    Resource:  namespaces
    Group:
    Name:      openshift-ovirt-infra
    Resource:  namespaces
    Group:
    Name:      openshift-vsphere-infra
    Resource:  namespaces
  Versions:
    Name:     operator
    Version:  4.9.0-0.nightly-2021-08-25-093627
Events:  <none>

Expected results:
Node should reboot and be ready.
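For reference, a rough manual approximation of the crash step that kraken performs, plus the checks run afterwards (a sketch only; the label selector, the WORKER variable, and the follow-up commands are illustrative and not taken from the kraken scenario):

$ # pick one worker node to crash (kraken selects nodes according to its scenario config)
$ WORKER=$(oc get nodes -l node-role.kubernetes.io/worker -o jsonpath='{.items[0].metadata.name}')
$ # writing random bytes to /proc/sysrq-trigger fires random sysrq actions; in practice this crashes/reboots the node
$ oc debug node/$WORKER -- chroot /host -- dd if=/dev/urandom of=/proc/sysrq-trigger
$ # the node is expected to go NotReady, reboot, and come back Ready
$ oc get nodes -w
$ # afterwards, check the operator and the worker pool that the degraded message points at
$ oc get co machine-config
$ oc get mcp worker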
The kubelet status from your node describe makes it look like you have more problems than just the MCO being degraded, and that node ip-10-0-149-240.us-east-2.compute.internal might not be okay:

Conditions:
  Type             Status    LastHeartbeatTime                 LastTransitionTime                Reason              Message
  ----             ------    -----------------                 ------------------                ------              -------
  MemoryPressure   Unknown   Thu, 26 Aug 2021 12:41:25 -0400   Thu, 26 Aug 2021 12:42:08 -0400   NodeStatusUnknown   Kubelet stopped posting node status.
  DiskPressure     Unknown   Thu, 26 Aug 2021 12:41:25 -0400   Thu, 26 Aug 2021 12:42:08 -0400   NodeStatusUnknown   Kubelet stopped posting node status.
  PIDPressure      Unknown   Thu, 26 Aug 2021 12:41:25 -0400   Thu, 26 Aug 2021 12:42:08 -0400   NodeStatusUnknown   Kubelet stopped posting node status.
  Ready            Unknown   Thu, 26 Aug 2021 12:41:25 -0400   Thu, 26 Aug 2021 12:42:08 -0400   NodeStatusUnknown   Kubelet stopped posting node status.

Also, with the exception of tuned, it looks like none of your pods on ip-10-0-149-240.us-east-2.compute.internal are ready:

[jkyros@jkyros-t590 masters]$ omg get pods -A -o wide | grep 'ip-10-0-149-240.us-east-2.compute.internal'
default                                  ip-10-0-149-240us-east-2computeinternal-debug   0/1  Pending  0  35m    10.0.149.240  ip-10-0-149-240.us-east-2.compute.internal
openshift-cluster-csi-drivers            aws-ebs-csi-driver-node-wch9k                    0/3  Running  2  5h41m  10.0.149.240  ip-10-0-149-240.us-east-2.compute.internal
openshift-cluster-node-tuning-operator   tuned-jjvtw                                      1/1  Running  2  5h40m  10.0.149.240  ip-10-0-149-240.us-east-2.compute.internal
openshift-dns                            dns-default-dcxtb                                0/2  Running  2  5h39m  ip-10-0-149-240.us-east-2.compute.internal
openshift-dns                            node-resolver-pvbxx                              0/1  Running  2  5h40m  10.0.149.240  ip-10-0-149-240.us-east-2.compute.internal
openshift-image-registry                 image-registry-568957b9d6-rdkdx                  0/1  Running  2  5h40m  ip-10-0-149-240.us-east-2.compute.internal
openshift-image-registry                 node-ca-875ck                                    0/1  Running  2  5h40m  10.0.149.240  ip-10-0-149-240.us-east-2.compute.internal
openshift-ingress-canary                 ingress-canary-8zj6c                             0/1  Running  2  5h38m  ip-10-0-149-240.us-east-2.compute.internal
openshift-ingress                        router-default-65bdc775fd-dwrbr                  0/1  Running  2  5h38m  ip-10-0-149-240.us-east-2.compute.internal
openshift-machine-config-operator        machine-config-daemon-ngrhx                      0/2  Running  2  5h40m  10.0.149.240  ip-10-0-149-240.us-east-2.compute.internal
openshift-marketplace                    certified-operators-f6smg                        0/1  Running  2  55m    ip-10-0-149-240.us-east-2.compute.internal
openshift-marketplace                    community-operators-rrmq2                        0/1  Running  2  5h47m  ip-10-0-149-240.us-east-2.compute.internal
openshift-marketplace                    redhat-marketplace-vck2g                         0/1  Running  2  5h47m  ip-10-0-149-240.us-east-2.compute.internal
openshift-marketplace                    redhat-operators-n2kvh                           0/1  Running  2  1h20m  ip-10-0-149-240.us-east-2.compute.internal
openshift-monitoring                     alertmanager-main-2                              0/5  Running  2  5h37m  ip-10-0-149-240.us-east-2.compute.internal
openshift-monitoring                     kube-state-metrics-59b87859b8-kgw7r              0/3  Running  2  5h47m  ip-10-0-149-240.us-east-2.compute.internal
openshift-monitoring                     node-exporter-qm46v                              0/2  Running  2  5h40m  10.0.149.240  ip-10-0-149-240.us-east-2.compute.internal
openshift-monitoring                     openshift-state-metrics-66585c8c7c-hdv4x         0/3  Running  2  5h47m  ip-10-0-149-240.us-east-2.compute.internal
openshift-monitoring                     telemeter-client-6bd9bc5f84-557hq                0/3  Running  2  5h47m  ip-10-0-149-240.us-east-2.compute.internal
openshift-multus                         multus-additional-cni-plugins-pg4qm              0/1  Pending  2  5h40m  10.0.149.240  ip-10-0-149-240.us-east-2.compute.internal
openshift-multus                         multus-bx467                                     0/1  Running  2  5h40m  10.0.149.240  ip-10-0-149-240.us-east-2.compute.internal
openshift-multus                         network-metrics-daemon-c447r                     0/2  Running  2  5h40m  ip-10-0-149-240.us-east-2.compute.internal
openshift-network-diagnostics            network-check-source-75749bc6b4-c8jrf            0/1  Running  2  5h51m  ip-10-0-149-240.us-east-2.compute.internal
openshift-network-diagnostics            network-check-target-tm2fs                       0/1  Running  2  5h40m  ip-10-0-149-240.us-east-2.compute.internal
openshift-ovn-kubernetes                 ovnkube-node-p2bmb                               0/4  Running  3  5h40m  10.0.149.240  ip-10-0-149-240.us-east-2.compute.internal

This doesn't look like the MCO is the cause -- it looks more like the MCO is a victim here and is complaining about it. Can you still get into the "problem node" ip-10-0-149-240.us-east-2.compute.internal? Would you be able to upload the journal logs or take a sosreport of that node so we can see what's going on in there?
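If it helps, the journal can usually be pulled with something like the following (a sketch; the output file name is illustrative):

$ oc adm node-logs ip-10-0-149-240.us-east-2.compute.internal > node-journal.log

or, if `oc debug` still works against the node:

$ oc debug node/ip-10-0-149-240.us-east-2.compute.internal -- chroot /host journalctl --no-pager > node-journal.log

For a sosreport on RHCOS, the usual flow is to open a debug shell on the node (`oc debug node/<name>`), `chroot /host`, run `toolbox`, and then run `sosreport` inside the toolbox container.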
This hit in the middle of whatever stability issues were happening with [1], and it appeared to resolve with later nightlies, so I suspect the root cause here was similar (and, regardless, outside the MCO). I'm going to close this, as I believe the underlying problems have been resolved, but if you manage to reproduce it on a current nightly, please reopen it. Thanks!

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1997905
Unfortunately I'm getting the same results with version 4.10.0-0.nightly-2022-03-09-224546.

$ oc version
Client Version: 4.9.0-0.nightly-2021-07-20-014024
Server Version: 4.10.0-0.nightly-2022-03-09-224546
Kubernetes Version: v1.23.3+e419edf
Simon, can you clarify what exactly the bug is? Is it that the MCO goes degraded when a node crashes and is unavailable (which would be expected behaviour), or something else? Secondly, can you explain what you are doing to your clusters? This is a crash test, so it seems like the node crashed as intended. We need some extra details here so we can figure out what exactly is going on and whether this is a bug or not. Thanks!
Sorry for this one. That was a retest. Next time I see the same issue, I'll start from the node...