Bug 2035494 - [SDN Migration] ovnkube-node pods CrashLoopBackOff after SDN migrated to OVN for RHEL workers
Summary: [SDN Migration] ovnkube-node pods CrashLoopBackOff after SDN migrated to OVN for RHEL workers
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.10.0
Assignee: Peng Liu
QA Contact: huirwang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-12-24 09:24 UTC by huirwang
Modified: 2022-03-10 16:36 UTC
CC: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-10 16:36:35 UTC
Target Upstream Version:
Embargoed:


Links:
Github openshift/openshift-ansible pull 12362 (open): Bug 2035494: Set keyfile to be the primary NetworkManager plugin (last updated 2021-12-28 09:31:00 UTC)
Red Hat Product Errata RHSA-2022:0056 (last updated 2022-03-10 16:36:48 UTC)

Description huirwang 2021-12-24 09:24:45 UTC
Description of problem:
ovnkube-node pods go into CrashLoopBackOff after the SDN is migrated to OVN-Kubernetes on clusters with RHEL 8 workers.

Version-Release number of selected component (if applicable):
4.10.0-0.nightly-2021-12-23-153012 

How reproducible:
Always

Steps to Reproduce:
Follow https://docs.openshift.com/container-platform/4.9/networking/ovn_kubernetes_network_provider/migrate-from-openshift-sdn.html to perform the SDN migration. After step 8 (rebooting all the nodes), the ovnkube-node pods located on the RHEL 8 workers were in CrashLoopBackOff.
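For reference, the heart of the documented procedure is switching the cluster network type through the Network operator configuration and then rebooting every node. A rough sketch (paraphrased from the linked guide; exact steps and flags may differ by release):

$ oc patch Network.operator.openshift.io cluster --type='merge' \
    --patch '{"spec":{"migration":{"networkType":"OVNKubernetes"}}}'
$ oc patch Network.config.openshift.io cluster --type='merge' \
    --patch '{"spec":{"networkType":"OVNKubernetes"}}'
# then reboot all master and worker nodes (step 8 in the guide)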

$ oc get nodes -o wide
NAME                                        STATUS   ROLES    AGE    VERSION           INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                        KERNEL-VERSION                 CONTAINER-RUNTIME
ip-10-0-49-63.us-east-2.compute.internal    Ready    worker   81m    v1.22.1+6859754   10.0.49.63    <none>        Red Hat Enterprise Linux 8.4 (Ootpa)                            4.18.0-348.7.1.el8_5.x86_64    cri-o://1.23.0-98.rhaos4.10.git9b7f5ae.el8
ip-10-0-53-118.us-east-2.compute.internal   Ready    worker   126m   v1.22.1+6859754   10.0.53.118   <none>        Red Hat Enterprise Linux CoreOS 410.84.202112230202-0 (Ootpa)   4.18.0-305.30.1.el8_4.x86_64   cri-o://1.23.0-98.rhaos4.10.git9b7f5ae.el8
ip-10-0-57-108.us-east-2.compute.internal   Ready    worker   81m    v1.22.1+6859754   10.0.57.108   <none>        Red Hat Enterprise Linux 8.4 (Ootpa)                            4.18.0-348.7.1.el8_5.x86_64    cri-o://1.23.0-98.rhaos4.10.git9b7f5ae.el8
ip-10-0-57-131.us-east-2.compute.internal   Ready    master   131m   v1.22.1+6859754   10.0.57.131   <none>        Red Hat Enterprise Linux CoreOS 410.84.202112230202-0 (Ootpa)   4.18.0-305.30.1.el8_4.x86_64   cri-o://1.23.0-98.rhaos4.10.git9b7f5ae.el8
ip-10-0-61-20.us-east-2.compute.internal    Ready    worker   126m   v1.22.1+6859754   10.0.61.20    <none>        Red Hat Enterprise Linux CoreOS 410.84.202112230202-0 (Ootpa)   4.18.0-305.30.1.el8_4.x86_64   cri-o://1.23.0-98.rhaos4.10.git9b7f5ae.el8
ip-10-0-61-89.us-east-2.compute.internal    Ready    master   131m   v1.22.1+6859754   10.0.61.89    <none>        Red Hat Enterprise Linux CoreOS 410.84.202112230202-0 (Ootpa)   4.18.0-305.30.1.el8_4.x86_64   cri-o://1.23.0-98.rhaos4.10.git9b7f5ae.el8
ip-10-0-69-211.us-east-2.compute.internal   Ready    master   131m   v1.22.1+6859754   10.0.69.211   <none>        Red Hat Enterprise Linux CoreOS 410.84.202112230202-0 (Ootpa)   4.18.0-305.30.1.el8_4.x86_64   cri-o://1.23.0-98.rhaos4.10.git9b7f5ae.el8
ip-10-0-77-246.us-east-2.compute.internal   Ready    worker   126m   v1.22.1+6859754   10.0.77.246   <none>        Red Hat Enterprise Linux CoreOS 410.84.202112230202-0 (Ootpa)   4.18.0-305.30.1.el8_4.x86_64   cri-o://1.23.0-98.rhaos4.10.git9b7f5ae.el8


$ oc get pods -n openshift-ovn-kubernetes -o wide
NAME                   READY   STATUS             RESTARTS         AGE   IP            NODE                                        NOMINATED NODE   READINESS GATES
ovnkube-master-hc9zv   6/6     Running            6                50m   10.0.57.131   ip-10-0-57-131.us-east-2.compute.internal   <none>           <none>
ovnkube-master-kwghr   6/6     Running            14 (42m ago)     50m   10.0.61.89    ip-10-0-61-89.us-east-2.compute.internal    <none>           <none>
ovnkube-master-zrvgw   6/6     Running            14 (42m ago)     50m   10.0.69.211   ip-10-0-69-211.us-east-2.compute.internal   <none>           <none>
ovnkube-node-6nknf     5/5     Running            10 (42m ago)     50m   10.0.77.246   ip-10-0-77-246.us-east-2.compute.internal   <none>           <none>
ovnkube-node-dbfc6     4/5     CrashLoopBackOff   22 (12s ago)     50m   10.0.57.108   ip-10-0-57-108.us-east-2.compute.internal   <none>           <none>
ovnkube-node-gfqqc     5/5     Running            10 (42m ago)     50m   10.0.57.131   ip-10-0-57-131.us-east-2.compute.internal   <none>           <none>
ovnkube-node-khz8j     5/5     Running            10 (42m ago)     50m   10.0.69.211   ip-10-0-69-211.us-east-2.compute.internal   <none>           <none>
ovnkube-node-qdjcp     5/5     Running            10 (42m ago)     50m   10.0.61.89    ip-10-0-61-89.us-east-2.compute.internal    <none>           <none>
ovnkube-node-qm6jj     5/5     Running            10 (42m ago)     50m   10.0.53.118   ip-10-0-53-118.us-east-2.compute.internal   <none>           <none>
ovnkube-node-z826v     4/5     CrashLoopBackOff   21 (4m30s ago)   50m   10.0.49.63    ip-10-0-49-63.us-east-2.compute.internal    <none>           <none>
ovnkube-node-zbxx5     5/5     Running            9                50m   10.0.61.20    ip-10-0-61-20.us-east-2.compute.internal    <none>           <none>

$ oc describe pod ovnkube-node-dbfc6 -n openshift-ovn-kubernetes
Name:                 ovnkube-node-dbfc6
Namespace:            openshift-ovn-kubernetes
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 ip-10-0-57-108.us-east-2.compute.internal/10.0.57.108
Start Time:           Fri, 24 Dec 2021 08:33:30 +0000
Labels:               app=ovnkube-node
                      component=network
                      controller-revision-hash=59bf78fddb
                      kubernetes.io/os=linux
                      openshift.io/component=network
                      pod-template-generation=1
                      type=infra
Annotations:          networkoperator.openshift.io/ip-family-mode: single-stack
Status:               Running
IP:                   10.0.57.108
IPs:
  IP:           10.0.57.108
Controlled By:  DaemonSet/ovnkube-node
Containers:
  ovn-controller:
    Container ID:  cri-o://e3a6d496dc6dc0cffaef9346a4085678566ee6f790c9164a5f00fb60aa2801c7
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:763084f718a274941e70cbf9da008c69348d0acdbbe48175336d841681187f48
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:763084f718a274941e70cbf9da008c69348d0acdbbe48175336d841681187f48
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/bash
      -c
      set -e
      if [[ -f "/env/${K8S_NODE}" ]]; then
        set -o allexport
        source "/env/${K8S_NODE}"
        set +o allexport
      fi  
      
      echo "$(date -Iseconds) - starting ovn-controller"
      exec ovn-controller unix:/var/run/openvswitch/db.sock -vfile:off \
        --no-chdir --pidfile=/var/run/ovn/ovn-controller.pid \
        --syslog-method="null" \
        --log-file=/var/log/ovn/acl-audit-log.log \
        -vFACILITY:"local0" \
        -p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt \
        -vconsole:"${OVN_LOG_LEVEL}" -vconsole:"acl_log:off" \
        -vPATTERN:console:"%D{%Y-%m-%dT%H:%M:%S.###Z}|%05N|%c%T|%p|%m" \
        -vsyslog:"acl_log:info" \
        -vfile:"acl_log:info"
      
    State:          Running
      Started:      Fri, 24 Dec 2021 08:41:16 +0000
    Ready:          True
    Restart Count:  1
    Requests:
      cpu:     10m
      memory:  300Mi
    Environment:
      OVN_LOG_LEVEL:  info
      K8S_NODE:        (v1:spec.nodeName)
    Mounts:
      /dev/log from log-socket (rw)
      /env from env-overrides (rw)
      /etc/openvswitch from etc-openvswitch (rw)
      /etc/ovn/ from etc-openvswitch (rw)
      /ovn-ca from ovn-ca (rw)
      /ovn-cert from ovn-cert (rw)
      /run/openvswitch from run-openvswitch (rw)
      /run/ovn/ from run-ovn (rw)
      /var/lib/openvswitch from var-lib-openvswitch (rw)
      /var/log/ovn from node-log (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-2qmxp (ro)
  ovn-acl-logging:
    Container ID:  cri-o://676fda8f0c7bf11462e18bd6421f4fb6db67f70ccdfd366fa5d147d1e0e0347d
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:763084f718a274941e70cbf9da008c69348d0acdbbe48175336d841681187f48
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:763084f718a274941e70cbf9da008c69348d0acdbbe48175336d841681187f48
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/bash
      -c
      set -euo pipefail
      
      # Rotate audit log files when then get to max size (in bytes)
      MAXFILESIZE=$(( "50"*1000000 )) 
      LOGFILE=/var/log/ovn/acl-audit-log.log
      CONTROLLERPID=$(cat /run/ovn/ovn-controller.pid)
      
      # Redirect err to null so no messages are shown upon rotation
      tail -F ${LOGFILE} 2> /dev/null &
      
      while true
      do
        # Make sure ovn-controller's logfile exists, and get current size in bytes 
        if [ -f "$LOGFILE" ]; then 
          file_size=`du -b ${LOGFILE} | tr -s '\t' ' ' | cut -d' ' -f1`
        else 
          ovs-appctl -t /var/run/ovn/ovn-controller.${CONTROLLERPID}.ctl vlog/reopen
          file_size=`du -b ${LOGFILE} | tr -s '\t' ' ' | cut -d' ' -f1`
        fi 
        
        if [ $file_size -gt $MAXFILESIZE ];then
          echo "Rotating OVN ACL Log File"
          timestamp=`date '+%Y-%m-%dT%H-%M-%S'`
          mv ${LOGFILE} /var/log/ovn/acl-audit-log.$timestamp.log
          ovs-appctl -t /run/ovn/ovn-controller.${CONTROLLERPID}.ctl vlog/reopen
          CONTROLLERPID=$(cat /run/ovn/ovn-controller.pid)
        fi
      
        # sleep for 30 seconds to avoid wasting CPU 
        sleep 30 
      done
      
    State:          Running
      Started:      Fri, 24 Dec 2021 08:41:17 +0000
    Ready:          True
    Restart Count:  1
    Requests:
      cpu:        10m
      memory:     20Mi
    Environment:  <none>
    Mounts:
      /run/ovn/ from run-ovn (rw)
      /var/log/ovn from node-log (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-2qmxp (ro)
  kube-rbac-proxy:
    Container ID:  cri-o://e70841efc791cdc5be3970fc2eb42477a525e47beb4390b72d917ae6f03b8a4f
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ad4de366563e3175887872d54a03d4e6d79195ea011f28510118a78e3d4d0269
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ad4de366563e3175887872d54a03d4e6d79195ea011f28510118a78e3d4d0269
    Port:          9103/TCP
    Host Port:     9103/TCP
    Command:
      /bin/bash
      -c
      #!/bin/bash
      set -euo pipefail
      TLS_PK=/etc/pki/tls/metrics-cert/tls.key
      TLS_CERT=/etc/pki/tls/metrics-cert/tls.crt
      # As the secret mount is optional we must wait for the files to be present.
      # The service is created in monitor.yaml and this is created in sdn.yaml.
      # If it isn't created there is probably an issue so we want to crashloop.
      retries=0
      TS=$(date +%s)
      WARN_TS=$(( ${TS} + $(( 20 * 60)) ))
      HAS_LOGGED_INFO=0
      
      log_missing_certs(){
          CUR_TS=$(date +%s)
          if [[ "${CUR_TS}" -gt "WARN_TS"  ]]; then
            echo $(date -Iseconds) WARN: ovn-node-metrics-cert not mounted after 20 minutes.
          elif [[ "${HAS_LOGGED_INFO}" -eq 0 ]] ; then
            echo $(date -Iseconds) INFO: ovn-node-metrics-cert not mounted. Waiting one hour.
            HAS_LOGGED_INFO=1
          fi
      }
      while [[ ! -f "${TLS_PK}" ||  ! -f "${TLS_CERT}" ]] ; do
        log_missing_certs
        sleep 5
      done
      
      echo $(date -Iseconds) INFO: ovn-node-metrics-certs mounted, starting kube-rbac-proxy
      exec /usr/bin/kube-rbac-proxy \
        --logtostderr \
        --secure-listen-address=:9103 \
        --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256 \
        --upstream=http://127.0.0.1:29103/ \
        --tls-private-key-file=${TLS_PK} \
        --tls-cert-file=${TLS_CERT}
      
    State:          Running
      Started:      Fri, 24 Dec 2021 08:41:17 +0000
    Ready:          True
    Restart Count:  1
    Requests:
      cpu:        10m
      memory:     20Mi
    Environment:  <none>
    Mounts:
      /etc/pki/tls/metrics-cert from ovn-node-metrics-cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-2qmxp (ro)
  kube-rbac-proxy-ovn-metrics:
    Container ID:  cri-o://98211440f5388a4430fa33f37745ba561f27768b4069f7da072c661000e636dc
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ad4de366563e3175887872d54a03d4e6d79195ea011f28510118a78e3d4d0269
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ad4de366563e3175887872d54a03d4e6d79195ea011f28510118a78e3d4d0269
    Port:          9105/TCP
    Host Port:     9105/TCP
    Command:
      /bin/bash
      -c
      #!/bin/bash
      set -euo pipefail
      TLS_PK=/etc/pki/tls/metrics-cert/tls.key
      TLS_CERT=/etc/pki/tls/metrics-cert/tls.crt
      # As the secret mount is optional we must wait for the files to be present.
      # The service is created in monitor.yaml and this is created in sdn.yaml.
      # If it isn't created there is probably an issue so we want to crashloop.
      retries=0
      TS=$(date +%s)
      WARN_TS=$(( ${TS} + $(( 20 * 60)) ))
      HAS_LOGGED_INFO=0
      
      log_missing_certs(){
          CUR_TS=$(date +%s)
          if [[ "${CUR_TS}" -gt "WARN_TS"  ]]; then
            echo $(date -Iseconds) WARN: ovn-node-metrics-cert not mounted after 20 minutes.
          elif [[ "${HAS_LOGGED_INFO}" -eq 0 ]] ; then
            echo $(date -Iseconds) INFO: ovn-node-metrics-cert not mounted. Waiting one hour.
            HAS_LOGGED_INFO=1
          fi
      }
      while [[ ! -f "${TLS_PK}" ||  ! -f "${TLS_CERT}" ]] ; do
        log_missing_certs
        sleep 5
      done
      
      echo $(date -Iseconds) INFO: ovn-node-metrics-certs mounted, starting kube-rbac-proxy
      exec /usr/bin/kube-rbac-proxy \
        --logtostderr \
        --secure-listen-address=:9105 \
        --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256 \
        --upstream=http://127.0.0.1:29105/ \
        --tls-private-key-file=${TLS_PK} \
        --tls-cert-file=${TLS_CERT}
      
    State:          Running
      Started:      Fri, 24 Dec 2021 08:41:17 +0000
    Ready:          True
    Restart Count:  1
    Requests:
      cpu:        10m
      memory:     20Mi
    Environment:  <none>
    Mounts:
      /etc/pki/tls/metrics-cert from ovn-node-metrics-cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-2qmxp (ro)
  ovnkube-node:
    Container ID:  cri-o://0153a86c42278b5c782fa58b2332c761cbc356285933805be18dcaffaae562dc
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:763084f718a274941e70cbf9da008c69348d0acdbbe48175336d841681187f48
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:763084f718a274941e70cbf9da008c69348d0acdbbe48175336d841681187f48
    Port:          29103/TCP
    Host Port:     29103/TCP
    Command:
      /bin/bash
      -c
      set -xe
      if [[ -f "/env/${K8S_NODE}" ]]; then
        set -o allexport
        source "/env/${K8S_NODE}"
        set +o allexport
      fi
      echo "I$(date "+%m%d %H:%M:%S.%N") - waiting for db_ip addresses"
      cp -f /usr/libexec/cni/ovn-k8s-cni-overlay /cni-bin-dir/
      ovn_config_namespace=openshift-ovn-kubernetes
      echo "I$(date "+%m%d %H:%M:%S.%N") - disable conntrack on geneve port"
      iptables -t raw -A PREROUTING -p udp --dport 6081 -j NOTRACK
      iptables -t raw -A OUTPUT -p udp --dport 6081 -j NOTRACK
      ip6tables -t raw -A PREROUTING -p udp --dport 6081 -j NOTRACK
      ip6tables -t raw -A OUTPUT -p udp --dport 6081 -j NOTRACK
      retries=0
      while true; do
        # TODO: change to use '--request-timeout=30s', if https://github.com/kubernetes/kubernetes/issues/49343 is fixed. 
        db_ip=$(timeout 30 kubectl get ep -n ${ovn_config_namespace} ovnkube-db -o jsonpath='{.subsets[0].addresses[0].ip}')
        if [[ -n "${db_ip}" ]]; then
          break
        fi
        (( retries += 1 ))
        if [[ "${retries}" -gt 40 ]]; then
          echo "E$(date "+%m%d %H:%M:%S.%N") - db endpoint never came up"
          exit 1
        fi
        echo "I$(date "+%m%d %H:%M:%S.%N") - waiting for db endpoint"
        sleep 5
      done
      
      echo "I$(date "+%m%d %H:%M:%S.%N") - starting ovnkube-node db_ip ${db_ip}"
      
      if [ "shared" == "shared" ]; then
        gateway_mode_flags="--gateway-mode shared --gateway-interface br-ex"
      elif [ "shared" == "local" ]; then
        gateway_mode_flags="--gateway-mode local --gateway-interface br-ex"
      else
        echo "Invalid OVN_GATEWAY_MODE: \"shared\". Must be \"local\" or \"shared\"."
        exit 1
      fi
      
      export_network_flows_flags=
      if [[ -n "${NETFLOW_COLLECTORS}" ]] ; then
        export_network_flows_flags="--netflow-targets ${NETFLOW_COLLECTORS}"
      fi
      if [[ -n "${SFLOW_COLLECTORS}" ]] ; then
        export_network_flows_flags="$export_network_flows_flags --sflow-targets ${SFLOW_COLLECTORS}"
      fi
      if [[ -n "${IPFIX_COLLECTORS}" ]] ; then
        export_network_flows_flags="$export_network_flows_flags --ipfix-targets ${IPFIX_COLLECTORS}"
      fi
      if [[ -n "${IPFIX_CACHE_MAX_FLOWS}" ]] ; then
        export_network_flows_flags="$export_network_flows_flags --ipfix-cache-max-flows ${IPFIX_CACHE_MAX_FLOWS}"
      fi
      if [[ -n "${IPFIX_CACHE_ACTIVE_TIMEOUT}" ]] ; then
        export_network_flows_flags="$export_network_flows_flags --ipfix-cache-active-timeout ${IPFIX_CACHE_ACTIVE_TIMEOUT}"
      fi
      if [[ -n "${IPFIX_SAMPLING}" ]] ; then
        export_network_flows_flags="$export_network_flows_flags --ipfix-sampling ${IPFIX_SAMPLING}"
      fi
      gw_interface_flag=
      # if br-ex1 is configured on the node, we want to use it for external gateway traffic
      if [ -d /sys/class/net/br-ex1 ]; then
        gw_interface_flag="--exgw-interface=br-ex1"
      fi
      
      node_mgmt_port_netdev_flags=
      if [[ -n "${OVNKUBE_NODE_MGMT_PORT_NETDEV}" ]] ; then
        node_mgmt_port_netdev_flags="--ovnkube-node-mgmt-port-netdev ${OVNKUBE_NODE_MGMT_PORT_NETDEV}"
      fi
      
      exec /usr/bin/ovnkube --init-node "${K8S_NODE}" \
        --nb-address "ssl:10.0.57.131:9641,ssl:10.0.61.89:9641,ssl:10.0.69.211:9641" \
        --sb-address "ssl:10.0.57.131:9642,ssl:10.0.61.89:9642,ssl:10.0.69.211:9642" \
        --nb-client-privkey /ovn-cert/tls.key \
        --nb-client-cert /ovn-cert/tls.crt \
        --nb-client-cacert /ovn-ca/ca-bundle.crt \
        --nb-cert-common-name "ovn" \
        --sb-client-privkey /ovn-cert/tls.key \
        --sb-client-cert /ovn-cert/tls.crt \
        --sb-client-cacert /ovn-ca/ca-bundle.crt \
        --sb-cert-common-name "ovn" \
        --config-file=/run/ovnkube-config/ovnkube.conf \
        --loglevel "${OVN_KUBE_LOG_LEVEL}" \
        --inactivity-probe="${OVN_CONTROLLER_INACTIVITY_PROBE}" \
        ${gateway_mode_flags} \
        --metrics-bind-address "127.0.0.1:29103" \
        --ovn-metrics-bind-address "127.0.0.1:29105" \
        --metrics-enable-pprof \
        ${export_network_flows_flags} \
        ${gw_interface_flag}
      
    State:       Waiting
      Reason:    CrashLoopBackOff
    Last State:  Terminated
      Reason:    Error
      Message:   e\n\nup                  : true\n\nup                  : true\n\nup                  : false\n\nup                  : true\n\nup                  : true\n\nup                  : true\n"
I1224 09:13:48.544922   21412 ovs.go:208] exec(3): stderr: ""
I1224 09:13:48.544944   21412 node.go:315] Detected support for port binding with external IDs
I1224 09:13:48.545048   21412 ovs.go:204] exec(4): /usr/bin/ovs-vsctl --timeout=15 -- --if-exists del-port br-int k8s-ip-10-0-57- -- --may-exist add-port br-int ovn-k8s-mp0 -- set interface ovn-k8s-mp0 type=internal mtu_request=8901 external-ids:iface-id=k8s-ip-10-0-57-108.us-east-2.compute.internal
I1224 09:13:48.550909   21412 ovs.go:207] exec(4): stdout: ""
I1224 09:13:48.550934   21412 ovs.go:208] exec(4): stderr: ""
I1224 09:13:48.550947   21412 ovs.go:204] exec(5): /usr/bin/ovs-vsctl --timeout=15 --if-exists get interface ovn-k8s-mp0 mac_in_use
I1224 09:13:48.556400   21412 ovs.go:207] exec(5): stdout: "\"ae:0d:df:25:3b:7d\"\n"
I1224 09:13:48.556436   21412 ovs.go:208] exec(5): stderr: ""
I1224 09:13:48.556461   21412 ovs.go:204] exec(6): /usr/bin/ovs-vsctl --timeout=15 set interface ovn-k8s-mp0 mac=ae\:0d\:df\:25\:3b\:7d
I1224 09:13:48.561730   21412 ovs.go:207] exec(6): stdout: ""
I1224 09:13:48.561766   21412 ovs.go:208] exec(6): stderr: ""
I1224 09:13:48.597987   21412 gateway_init.go:261] Initializing Gateway Functionality
I1224 09:13:48.598172   21412 gateway_localnet.go:131] Node local addresses initialized to: map[10.0.57.108:{10.0.48.0 fffff000} 10.128.0.2:{10.128.0.0 fffffe00} 127.0.0.1:{127.0.0.0 ff000000} ::1:{::1 ffffffffffffffffffffffffffffffff} fe80::a3:e3ff:fe4c:4e0e:{fe80:: ffffffffffffffff0000000000000000} fe80::a8b9:7fff:fe7a:66f9:{fe80:: ffffffffffffffff0000000000000000} fe80::ac0d:dfff:fe25:3b7d:{fe80:: ffffffffffffffff0000000000000000}]
I1224 09:13:48.598364   21412 helper_linux.go:74] Found default gateway interface eth0 10.0.48.1
F1224 09:13:48.598420   21412 ovnkube.go:133] could not find IP addresses: failed to lookup link br-ex: Link not found

      Exit Code:    1
      Started:      Fri, 24 Dec 2021 09:13:47 +0000
      Finished:     Fri, 24 Dec 2021 09:13:48 +0000
    Ready:          False
    Restart Count:  16
    Requests:
      cpu:      10m
      memory:   300Mi
    Readiness:  exec [test -f /etc/cni/net.d/10-ovn-kubernetes.conf] delay=5s timeout=1s period=5s #success=1 #failure=3
    Environment:
      KUBERNETES_SERVICE_PORT:          6443
      KUBERNETES_SERVICE_HOST:          api-int.huirwang-1224b.qe.devcluster.openshift.com
      OVN_CONTROLLER_INACTIVITY_PROBE:  180000
      OVN_KUBE_LOG_LEVEL:               4
      K8S_NODE:                          (v1:spec.nodeName)
    Mounts:
      /cni-bin-dir from host-cni-bin (rw)
      /env from env-overrides (rw)
      /etc/cni/net.d from host-cni-netd (rw)
      /etc/openvswitch from etc-openvswitch (rw)
      /etc/ovn/ from etc-openvswitch (rw)
      /etc/systemd/system from systemd-units (ro)
      /host from host-slash (ro)
      /ovn-ca from ovn-ca (rw)
      /ovn-cert from ovn-cert (rw)
      /run/netns from host-run-netns (ro)
      /run/openvswitch from run-openvswitch (rw)
      /run/ovn-kubernetes/ from host-run-ovn-kubernetes (rw)
      /run/ovn/ from run-ovn (rw)
      /run/ovnkube-config/ from ovnkube-config (rw)
      /var/lib/cni/networks/ovn-k8s-cni-overlay from host-var-lib-cni-networks-ovn-kubernetes (rw)
      /var/lib/openvswitch from var-lib-openvswitch (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-2qmxp (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  systemd-units:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/systemd/system
    HostPathType:  
  host-slash:
    Type:          HostPath (bare host directory volume)
    Path:          /
    HostPathType:  
  host-run-netns:
    Type:          HostPath (bare host directory volume)
    Path:          /run/netns
    HostPathType:  
  var-lib-openvswitch:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/openvswitch/data
    HostPathType:  
  etc-openvswitch:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/openvswitch
    HostPathType:  
  run-openvswitch:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/openvswitch
    HostPathType:  
  run-ovn:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/ovn
    HostPathType:  
  node-log:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/ovn
    HostPathType:  
  log-socket:
    Type:          HostPath (bare host directory volume)
    Path:          /dev/log
    HostPathType:  
  host-run-ovn-kubernetes:
    Type:          HostPath (bare host directory volume)
    Path:          /run/ovn-kubernetes
    HostPathType:  
  host-cni-bin:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/cni/bin
    HostPathType:  
  host-cni-netd:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/multus/cni/net.d
    HostPathType:  
  host-var-lib-cni-networks-ovn-kubernetes:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/cni/networks/ovn-k8s-cni-overlay
    HostPathType:  
  ovnkube-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      ovnkube-config
    Optional:  false
  env-overrides:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      env-overrides
    Optional:  true
  ovn-ca:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      ovn-ca
    Optional:  false
  ovn-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ovn-cert
    Optional:    false
  ovn-node-metrics-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ovn-node-metrics-cert
    Optional:    true
  kube-api-access-2qmxp:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   Burstable
Node-Selectors:              beta.kubernetes.io/os=linux
Tolerations:                 op=Exists
Events:
  Type     Reason     Age                        From               Message
  ----     ------     ----                       ----               -------
  Normal   Scheduled  37m                        default-scheduler  Successfully assigned openshift-ovn-kubernetes/ovnkube-node-dbfc6 to ip-10-0-57-108.us-east-2.compute.internal
  Normal   Pulling    37m                        kubelet            Pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:763084f718a274941e70cbf9da008c69348d0acdbbe48175336d841681187f48"
  Normal   Pulled     37m                        kubelet            Successfully pulled image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:763084f718a274941e70cbf9da008c69348d0acdbbe48175336d841681187f48" in 14.722538563s
  Normal   Created    37m                        kubelet            Created container ovn-controller
  Normal   Started    37m                        kubelet            Started container ovn-controller
  Normal   Pulled     37m                        kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:763084f718a274941e70cbf9da008c69348d0acdbbe48175336d841681187f48" already present on machine
  Normal   Created    37m                        kubelet            Created container ovn-acl-logging
  Normal   Started    37m                        kubelet            Started container ovn-acl-logging
  Normal   Pulled     37m                        kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ad4de366563e3175887872d54a03d4e6d79195ea011f28510118a78e3d4d0269" already present on machine
  Normal   Created    37m                        kubelet            Created container kube-rbac-proxy
  Normal   Started    37m                        kubelet            Started container kube-rbac-proxy
  Normal   Pulled     37m                        kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ad4de366563e3175887872d54a03d4e6d79195ea011f28510118a78e3d4d0269" already present on machine
  Normal   Created    37m                        kubelet            Created container kube-rbac-proxy-ovn-metrics
  Normal   Started    37m                        kubelet            Started container kube-rbac-proxy-ovn-metrics
  Normal   Pulled     37m                        kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:763084f718a274941e70cbf9da008c69348d0acdbbe48175336d841681187f48" already present on machine
  Normal   Created    37m                        kubelet            Created container ovnkube-node
  Normal   Started    37m                        kubelet            Started container ovnkube-node
  Warning  Unhealthy  36m (x9 over 37m)          kubelet            Readiness probe failed:
  Normal   Created    29m                        kubelet            Created container ovn-controller
  Normal   Started    29m                        kubelet            Started container ovn-controller
  Normal   Pulled     29m                        kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:763084f718a274941e70cbf9da008c69348d0acdbbe48175336d841681187f48" already present on machine
  Normal   Pulled     29m                        kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:763084f718a274941e70cbf9da008c69348d0acdbbe48175336d841681187f48" already present on machine
  Normal   Started    29m                        kubelet            Started container kube-rbac-proxy-ovn-metrics
  Normal   Created    29m                        kubelet            Created container ovn-acl-logging
  Normal   Started    29m                        kubelet            Started container ovn-acl-logging
  Normal   Pulled     29m                        kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ad4de366563e3175887872d54a03d4e6d79195ea011f28510118a78e3d4d0269" already present on machine
  Normal   Created    29m                        kubelet            Created container kube-rbac-proxy
  Normal   Pulled     29m                        kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ad4de366563e3175887872d54a03d4e6d79195ea011f28510118a78e3d4d0269" already present on machine
  Normal   Created    29m                        kubelet            Created container kube-rbac-proxy-ovn-metrics
  Normal   Started    29m                        kubelet            Started container kube-rbac-proxy
  Normal   Created    29m (x2 over 29m)          kubelet            Created container ovnkube-node
  Normal   Started    29m (x2 over 29m)          kubelet            Started container ovnkube-node
  Warning  Unhealthy  29m (x5 over 29m)          kubelet            Readiness probe failed:
  Normal   Pulled     28m (x3 over 29m)          kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:763084f718a274941e70cbf9da008c69348d0acdbbe48175336d841681187f48" already present on machine
  Warning  BackOff    <invalid> (x141 over 29m)  kubelet            Back-off restarting failed container
Actual results:
ovnkube-node pods on the RHEL 8 workers stay in CrashLoopBackOff; the ovnkube-node container exits with "could not find IP addresses: failed to lookup link br-ex: Link not found".

Expected results:
SDN migration should be successful.

Additional info:
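A hedged way to confirm the root symptom on an affected RHEL 8 worker (the fatal log line reports "failed to lookup link br-ex: Link not found"); these commands are illustrative and were not captured as part of this report:

$ ip link show br-ex                          # fails if the bridge was never created
$ nmcli -f NAME,TYPE,DEVICE connection show   # br-ex should normally appear as an ovs-bridge connection
$ cat /etc/NetworkManager/NetworkManager.conf # check which plugin(s) NetworkManager is configured to use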

Comment 3 Peng Liu 2021-12-28 03:32:42 UTC
This is a regression introduced by https://github.com/openshift/machine-config-operator/pull/2742. We need to configure NetworkManager to use 'keyfile' as the primary plugin for RHEL workers; see the sketch below.
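Making 'keyfile' primary means listing it first in NetworkManager's main.plugins option. A minimal sketch of what the fix amounts to on a RHEL worker (the drop-in file name below is hypothetical; the actual change is delivered through the openshift-ansible pull request linked above):

$ cat /etc/NetworkManager/conf.d/99-keyfile.conf   # hypothetical drop-in path
[main]
plugins=keyfile,ifcfg-rh
$ systemctl reload NetworkManager                  # pick up the new plugin order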

Comment 10 errata-xmlrpc 2022-03-10 16:36:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

