Description of problem:
ovnkube-node pods are in CrashLoopBackOff on the RHEL 8 workers after the cluster is migrated from OpenShift SDN to OVN-Kubernetes.

Version-Release number of selected component (if applicable):
4.10.0-0.nightly-2021-12-23-153012

How reproducible:
Always

Steps to Reproduce:
Follow https://docs.openshift.com/container-platform/4.9/networking/ovn_kubernetes_network_provider/migrate-from-openshift-sdn.html to perform the SDN migration (the patch commands that trigger the migration are sketched under Additional info below). After step 8, which reboots all the nodes, the ovnkube-node pods located on the RHEL 8 workers were in CrashLoopBackOff:

$ oc get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
ip-10-0-49-63.us-east-2.compute.internal Ready worker 81m v1.22.1+6859754 10.0.49.63 <none> Red Hat Enterprise Linux 8.4 (Ootpa) 4.18.0-348.7.1.el8_5.x86_64 cri-o://1.23.0-98.rhaos4.10.git9b7f5ae.el8
ip-10-0-53-118.us-east-2.compute.internal Ready worker 126m v1.22.1+6859754 10.0.53.118 <none> Red Hat Enterprise Linux CoreOS 410.84.202112230202-0 (Ootpa) 4.18.0-305.30.1.el8_4.x86_64 cri-o://1.23.0-98.rhaos4.10.git9b7f5ae.el8
ip-10-0-57-108.us-east-2.compute.internal Ready worker 81m v1.22.1+6859754 10.0.57.108 <none> Red Hat Enterprise Linux 8.4 (Ootpa) 4.18.0-348.7.1.el8_5.x86_64 cri-o://1.23.0-98.rhaos4.10.git9b7f5ae.el8
ip-10-0-57-131.us-east-2.compute.internal Ready master 131m v1.22.1+6859754 10.0.57.131 <none> Red Hat Enterprise Linux CoreOS 410.84.202112230202-0 (Ootpa) 4.18.0-305.30.1.el8_4.x86_64 cri-o://1.23.0-98.rhaos4.10.git9b7f5ae.el8
ip-10-0-61-20.us-east-2.compute.internal Ready worker 126m v1.22.1+6859754 10.0.61.20 <none> Red Hat Enterprise Linux CoreOS 410.84.202112230202-0 (Ootpa) 4.18.0-305.30.1.el8_4.x86_64 cri-o://1.23.0-98.rhaos4.10.git9b7f5ae.el8
ip-10-0-61-89.us-east-2.compute.internal Ready master 131m v1.22.1+6859754 10.0.61.89 <none> Red Hat Enterprise Linux CoreOS 410.84.202112230202-0 (Ootpa) 4.18.0-305.30.1.el8_4.x86_64 cri-o://1.23.0-98.rhaos4.10.git9b7f5ae.el8
ip-10-0-69-211.us-east-2.compute.internal Ready master 131m v1.22.1+6859754 10.0.69.211 <none> Red Hat Enterprise Linux CoreOS 410.84.202112230202-0 (Ootpa) 4.18.0-305.30.1.el8_4.x86_64 cri-o://1.23.0-98.rhaos4.10.git9b7f5ae.el8
ip-10-0-77-246.us-east-2.compute.internal Ready worker 126m v1.22.1+6859754 10.0.77.246 <none> Red Hat Enterprise Linux CoreOS 410.84.202112230202-0 (Ootpa) 4.18.0-305.30.1.el8_4.x86_64 cri-o://1.23.0-98.rhaos4.10.git9b7f5ae.el8

$ oc get pods -n openshift-ovn-kubernetes -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
ovnkube-master-hc9zv 6/6 Running 6 50m 10.0.57.131 ip-10-0-57-131.us-east-2.compute.internal <none> <none>
ovnkube-master-kwghr 6/6 Running 14 (42m ago) 50m 10.0.61.89 ip-10-0-61-89.us-east-2.compute.internal <none> <none>
ovnkube-master-zrvgw 6/6 Running 14 (42m ago) 50m 10.0.69.211 ip-10-0-69-211.us-east-2.compute.internal <none> <none>
ovnkube-node-6nknf 5/5 Running 10 (42m ago) 50m 10.0.77.246 ip-10-0-77-246.us-east-2.compute.internal <none> <none>
ovnkube-node-dbfc6 4/5 CrashLoopBackOff 22 (12s ago) 50m 10.0.57.108 ip-10-0-57-108.us-east-2.compute.internal <none> <none>
ovnkube-node-gfqqc 5/5 Running 10 (42m ago) 50m 10.0.57.131 ip-10-0-57-131.us-east-2.compute.internal <none> <none>
ovnkube-node-khz8j 5/5 Running 10 (42m ago) 50m 10.0.69.211 ip-10-0-69-211.us-east-2.compute.internal <none> <none>
ovnkube-node-qdjcp 5/5 Running 10 (42m ago) 50m 10.0.61.89 ip-10-0-61-89.us-east-2.compute.internal <none> <none>
ovnkube-node-qm6jj 5/5 Running 10 (42m ago) 50m 10.0.53.118 ip-10-0-53-118.us-east-2.compute.internal <none>
<none> ovnkube-node-z826v 4/5 CrashLoopBackOff 21 (4m30s ago) 50m 10.0.49.63 ip-10-0-49-63.us-east-2.compute.internal <none> <none> ovnkube-node-zbxx5 5/5 Running 9 50m 10.0.61.20 ip-10-0-61-20.us-east-2.compute.internal <none> <none> 9 40m oc describe pod ovnkube-node-dbfc6 -n openshift-ovn-kubernetes Name: ovnkube-node-dbfc6 Namespace: openshift-ovn-kubernetes Priority: 2000001000 Priority Class Name: system-node-critical Node: ip-10-0-57-108.us-east-2.compute.internal/10.0.57.108 Start Time: Fri, 24 Dec 2021 08:33:30 +0000 Labels: app=ovnkube-node component=network controller-revision-hash=59bf78fddb kubernetes.io/os=linux openshift.io/component=network pod-template-generation=1 type=infra Annotations: networkoperator.openshift.io/ip-family-mode: single-stack Status: Running IP: 10.0.57.108 IPs: IP: 10.0.57.108 Controlled By: DaemonSet/ovnkube-node Containers: ovn-controller: Container ID: cri-o://e3a6d496dc6dc0cffaef9346a4085678566ee6f790c9164a5f00fb60aa2801c7 Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:763084f718a274941e70cbf9da008c69348d0acdbbe48175336d841681187f48 Image ID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:763084f718a274941e70cbf9da008c69348d0acdbbe48175336d841681187f48 Port: <none> Host Port: <none> Command: /bin/bash -c set -e if [[ -f "/env/${K8S_NODE}" ]]; then set -o allexport source "/env/${K8S_NODE}" set +o allexport fi echo "$(date -Iseconds) - starting ovn-controller" exec ovn-controller unix:/var/run/openvswitch/db.sock -vfile:off \ --no-chdir --pidfile=/var/run/ovn/ovn-controller.pid \ --syslog-method="null" \ --log-file=/var/log/ovn/acl-audit-log.log \ -vFACILITY:"local0" \ -p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt \ -vconsole:"${OVN_LOG_LEVEL}" -vconsole:"acl_log:off" \ -vPATTERN:console:"%D{%Y-%m-%dT%H:%M:%S.###Z}|%05N|%c%T|%p|%m" \ -vsyslog:"acl_log:info" \ -vfile:"acl_log:info" State: Running Started: Fri, 24 Dec 2021 08:41:16 +0000 Ready: True Restart Count: 1 Requests: cpu: 10m memory: 300Mi Environment: OVN_LOG_LEVEL: info K8S_NODE: (v1:spec.nodeName) Mounts: /dev/log from log-socket (rw) /env from env-overrides (rw) /etc/openvswitch from etc-openvswitch (rw) /etc/ovn/ from etc-openvswitch (rw) /ovn-ca from ovn-ca (rw) /ovn-cert from ovn-cert (rw) /run/openvswitch from run-openvswitch (rw) /run/ovn/ from run-ovn (rw) /var/lib/openvswitch from var-lib-openvswitch (rw) /var/log/ovn from node-log (rw) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-2qmxp (ro) ovn-acl-logging: Container ID: cri-o://676fda8f0c7bf11462e18bd6421f4fb6db67f70ccdfd366fa5d147d1e0e0347d Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:763084f718a274941e70cbf9da008c69348d0acdbbe48175336d841681187f48 Image ID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:763084f718a274941e70cbf9da008c69348d0acdbbe48175336d841681187f48 Port: <none> Host Port: <none> Command: /bin/bash -c set -euo pipefail # Rotate audit log files when then get to max size (in bytes) MAXFILESIZE=$(( "50"*1000000 )) LOGFILE=/var/log/ovn/acl-audit-log.log CONTROLLERPID=$(cat /run/ovn/ovn-controller.pid) # Redirect err to null so no messages are shown upon rotation tail -F ${LOGFILE} 2> /dev/null & while true do # Make sure ovn-controller's logfile exists, and get current size in bytes if [ -f "$LOGFILE" ]; then file_size=`du -b ${LOGFILE} | tr -s '\t' ' ' | cut -d' ' -f1` else ovs-appctl -t /var/run/ovn/ovn-controller.${CONTROLLERPID}.ctl vlog/reopen file_size=`du -b ${LOGFILE} | tr -s '\t' ' ' | cut -d' ' -f1` fi if [ $file_size 
-gt $MAXFILESIZE ];then echo "Rotating OVN ACL Log File" timestamp=`date '+%Y-%m-%dT%H-%M-%S'` mv ${LOGFILE} /var/log/ovn/acl-audit-log.$timestamp.log ovs-appctl -t /run/ovn/ovn-controller.${CONTROLLERPID}.ctl vlog/reopen CONTROLLERPID=$(cat /run/ovn/ovn-controller.pid) fi # sleep for 30 seconds to avoid wasting CPU sleep 30 done State: Running Started: Fri, 24 Dec 2021 08:41:17 +0000 Ready: True Restart Count: 1 Requests: cpu: 10m memory: 20Mi Environment: <none> Mounts: /run/ovn/ from run-ovn (rw) /var/log/ovn from node-log (rw) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-2qmxp (ro) kube-rbac-proxy: Container ID: cri-o://e70841efc791cdc5be3970fc2eb42477a525e47beb4390b72d917ae6f03b8a4f Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ad4de366563e3175887872d54a03d4e6d79195ea011f28510118a78e3d4d0269 Image ID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ad4de366563e3175887872d54a03d4e6d79195ea011f28510118a78e3d4d0269 Port: 9103/TCP Host Port: 9103/TCP Command: /bin/bash -c #!/bin/bash set -euo pipefail TLS_PK=/etc/pki/tls/metrics-cert/tls.key TLS_CERT=/etc/pki/tls/metrics-cert/tls.crt # As the secret mount is optional we must wait for the files to be present. # The service is created in monitor.yaml and this is created in sdn.yaml. # If it isn't created there is probably an issue so we want to crashloop. retries=0 TS=$(date +%s) WARN_TS=$(( ${TS} + $(( 20 * 60)) )) HAS_LOGGED_INFO=0 log_missing_certs(){ CUR_TS=$(date +%s) if [[ "${CUR_TS}" -gt "WARN_TS" ]]; then echo $(date -Iseconds) WARN: ovn-node-metrics-cert not mounted after 20 minutes. elif [[ "${HAS_LOGGED_INFO}" -eq 0 ]] ; then echo $(date -Iseconds) INFO: ovn-node-metrics-cert not mounted. Waiting one hour. HAS_LOGGED_INFO=1 fi } while [[ ! -f "${TLS_PK}" || ! -f "${TLS_CERT}" ]] ; do log_missing_certs sleep 5 done echo $(date -Iseconds) INFO: ovn-node-metrics-certs mounted, starting kube-rbac-proxy exec /usr/bin/kube-rbac-proxy \ --logtostderr \ --secure-listen-address=:9103 \ --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256 \ --upstream=http://127.0.0.1:29103/ \ --tls-private-key-file=${TLS_PK} \ --tls-cert-file=${TLS_CERT} State: Running Started: Fri, 24 Dec 2021 08:41:17 +0000 Ready: True Restart Count: 1 Requests: cpu: 10m memory: 20Mi Environment: <none> Mounts: /etc/pki/tls/metrics-cert from ovn-node-metrics-cert (ro) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-2qmxp (ro) kube-rbac-proxy-ovn-metrics: Container ID: cri-o://98211440f5388a4430fa33f37745ba561f27768b4069f7da072c661000e636dc Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ad4de366563e3175887872d54a03d4e6d79195ea011f28510118a78e3d4d0269 Image ID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ad4de366563e3175887872d54a03d4e6d79195ea011f28510118a78e3d4d0269 Port: 9105/TCP Host Port: 9105/TCP Command: /bin/bash -c #!/bin/bash set -euo pipefail TLS_PK=/etc/pki/tls/metrics-cert/tls.key TLS_CERT=/etc/pki/tls/metrics-cert/tls.crt # As the secret mount is optional we must wait for the files to be present. # The service is created in monitor.yaml and this is created in sdn.yaml. # If it isn't created there is probably an issue so we want to crashloop. 
retries=0 TS=$(date +%s) WARN_TS=$(( ${TS} + $(( 20 * 60)) )) HAS_LOGGED_INFO=0 log_missing_certs(){ CUR_TS=$(date +%s) if [[ "${CUR_TS}" -gt "WARN_TS" ]]; then echo $(date -Iseconds) WARN: ovn-node-metrics-cert not mounted after 20 minutes. elif [[ "${HAS_LOGGED_INFO}" -eq 0 ]] ; then echo $(date -Iseconds) INFO: ovn-node-metrics-cert not mounted. Waiting one hour. HAS_LOGGED_INFO=1 fi } while [[ ! -f "${TLS_PK}" || ! -f "${TLS_CERT}" ]] ; do log_missing_certs sleep 5 done echo $(date -Iseconds) INFO: ovn-node-metrics-certs mounted, starting kube-rbac-proxy exec /usr/bin/kube-rbac-proxy \ --logtostderr \ --secure-listen-address=:9105 \ --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256 \ --upstream=http://127.0.0.1:29105/ \ --tls-private-key-file=${TLS_PK} \ --tls-cert-file=${TLS_CERT} State: Running Started: Fri, 24 Dec 2021 08:41:17 +0000 Ready: True Restart Count: 1 Requests: cpu: 10m memory: 20Mi Environment: <none> Mounts: /etc/pki/tls/metrics-cert from ovn-node-metrics-cert (ro) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-2qmxp (ro) ovnkube-node: Container ID: cri-o://0153a86c42278b5c782fa58b2332c761cbc356285933805be18dcaffaae562dc Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:763084f718a274941e70cbf9da008c69348d0acdbbe48175336d841681187f48 Image ID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:763084f718a274941e70cbf9da008c69348d0acdbbe48175336d841681187f48 Port: 29103/TCP Host Port: 29103/TCP Command: /bin/bash -c set -xe if [[ -f "/env/${K8S_NODE}" ]]; then set -o allexport source "/env/${K8S_NODE}" set +o allexport fi echo "I$(date "+%m%d %H:%M:%S.%N") - waiting for db_ip addresses" cp -f /usr/libexec/cni/ovn-k8s-cni-overlay /cni-bin-dir/ ovn_config_namespace=openshift-ovn-kubernetes echo "I$(date "+%m%d %H:%M:%S.%N") - disable conntrack on geneve port" iptables -t raw -A PREROUTING -p udp --dport 6081 -j NOTRACK iptables -t raw -A OUTPUT -p udp --dport 6081 -j NOTRACK ip6tables -t raw -A PREROUTING -p udp --dport 6081 -j NOTRACK ip6tables -t raw -A OUTPUT -p udp --dport 6081 -j NOTRACK retries=0 while true; do # TODO: change to use '--request-timeout=30s', if https://github.com/kubernetes/kubernetes/issues/49343 is fixed. db_ip=$(timeout 30 kubectl get ep -n ${ovn_config_namespace} ovnkube-db -o jsonpath='{.subsets[0].addresses[0].ip}') if [[ -n "${db_ip}" ]]; then break fi (( retries += 1 )) if [[ "${retries}" -gt 40 ]]; then echo "E$(date "+%m%d %H:%M:%S.%N") - db endpoint never came up" exit 1 fi echo "I$(date "+%m%d %H:%M:%S.%N") - waiting for db endpoint" sleep 5 done echo "I$(date "+%m%d %H:%M:%S.%N") - starting ovnkube-node db_ip ${db_ip}" if [ "shared" == "shared" ]; then gateway_mode_flags="--gateway-mode shared --gateway-interface br-ex" elif [ "shared" == "local" ]; then gateway_mode_flags="--gateway-mode local --gateway-interface br-ex" else echo "Invalid OVN_GATEWAY_MODE: \"shared\". Must be \"local\" or \"shared\"." 
exit 1 fi export_network_flows_flags= if [[ -n "${NETFLOW_COLLECTORS}" ]] ; then export_network_flows_flags="--netflow-targets ${NETFLOW_COLLECTORS}" fi if [[ -n "${SFLOW_COLLECTORS}" ]] ; then export_network_flows_flags="$export_network_flows_flags --sflow-targets ${SFLOW_COLLECTORS}" fi if [[ -n "${IPFIX_COLLECTORS}" ]] ; then export_network_flows_flags="$export_network_flows_flags --ipfix-targets ${IPFIX_COLLECTORS}" fi if [[ -n "${IPFIX_CACHE_MAX_FLOWS}" ]] ; then export_network_flows_flags="$export_network_flows_flags --ipfix-cache-max-flows ${IPFIX_CACHE_MAX_FLOWS}" fi if [[ -n "${IPFIX_CACHE_ACTIVE_TIMEOUT}" ]] ; then export_network_flows_flags="$export_network_flows_flags --ipfix-cache-active-timeout ${IPFIX_CACHE_ACTIVE_TIMEOUT}" fi if [[ -n "${IPFIX_SAMPLING}" ]] ; then export_network_flows_flags="$export_network_flows_flags --ipfix-sampling ${IPFIX_SAMPLING}" fi gw_interface_flag= # if br-ex1 is configured on the node, we want to use it for external gateway traffic if [ -d /sys/class/net/br-ex1 ]; then gw_interface_flag="--exgw-interface=br-ex1" fi node_mgmt_port_netdev_flags= if [[ -n "${OVNKUBE_NODE_MGMT_PORT_NETDEV}" ]] ; then node_mgmt_port_netdev_flags="--ovnkube-node-mgmt-port-netdev ${OVNKUBE_NODE_MGMT_PORT_NETDEV}" fi exec /usr/bin/ovnkube --init-node "${K8S_NODE}" \ --nb-address "ssl:10.0.57.131:9641,ssl:10.0.61.89:9641,ssl:10.0.69.211:9641" \ --sb-address "ssl:10.0.57.131:9642,ssl:10.0.61.89:9642,ssl:10.0.69.211:9642" \ --nb-client-privkey /ovn-cert/tls.key \ --nb-client-cert /ovn-cert/tls.crt \ --nb-client-cacert /ovn-ca/ca-bundle.crt \ --nb-cert-common-name "ovn" \ --sb-client-privkey /ovn-cert/tls.key \ --sb-client-cert /ovn-cert/tls.crt \ --sb-client-cacert /ovn-ca/ca-bundle.crt \ --sb-cert-common-name "ovn" \ --config-file=/run/ovnkube-config/ovnkube.conf \ --loglevel "${OVN_KUBE_LOG_LEVEL}" \ --inactivity-probe="${OVN_CONTROLLER_INACTIVITY_PROBE}" \ ${gateway_mode_flags} \ --metrics-bind-address "127.0.0.1:29103" \ --ovn-metrics-bind-address "127.0.0.1:29105" \ --metrics-enable-pprof \ ${export_network_flows_flags} \ ${gw_interface_flag} State: Waiting Reason: CrashLoopBackOff Last State: Terminated Reason: Error Message: e\n\nup : true\n\nup : true\n\nup : false\n\nup : true\n\nup : true\n\nup : true\n" I1224 09:13:48.544922 21412 ovs.go:208] exec(3): stderr: "" I1224 09:13:48.544944 21412 node.go:315] Detected support for port binding with external IDs I1224 09:13:48.545048 21412 ovs.go:204] exec(4): /usr/bin/ovs-vsctl --timeout=15 -- --if-exists del-port br-int k8s-ip-10-0-57- -- --may-exist add-port br-int ovn-k8s-mp0 -- set interface ovn-k8s-mp0 type=internal mtu_request=8901 external-ids:iface-id=k8s-ip-10-0-57-108.us-east-2.compute.internal I1224 09:13:48.550909 21412 ovs.go:207] exec(4): stdout: "" I1224 09:13:48.550934 21412 ovs.go:208] exec(4): stderr: "" I1224 09:13:48.550947 21412 ovs.go:204] exec(5): /usr/bin/ovs-vsctl --timeout=15 --if-exists get interface ovn-k8s-mp0 mac_in_use I1224 09:13:48.556400 21412 ovs.go:207] exec(5): stdout: "\"ae:0d:df:25:3b:7d\"\n" I1224 09:13:48.556436 21412 ovs.go:208] exec(5): stderr: "" I1224 09:13:48.556461 21412 ovs.go:204] exec(6): /usr/bin/ovs-vsctl --timeout=15 set interface ovn-k8s-mp0 mac=ae\:0d\:df\:25\:3b\:7d I1224 09:13:48.561730 21412 ovs.go:207] exec(6): stdout: "" I1224 09:13:48.561766 21412 ovs.go:208] exec(6): stderr: "" I1224 09:13:48.597987 21412 gateway_init.go:261] Initializing Gateway Functionality I1224 09:13:48.598172 21412 gateway_localnet.go:131] Node local addresses initialized to: 
map[10.0.57.108:{10.0.48.0 fffff000} 10.128.0.2:{10.128.0.0 fffffe00} 127.0.0.1:{127.0.0.0 ff000000} ::1:{::1 ffffffffffffffffffffffffffffffff} fe80::a3:e3ff:fe4c:4e0e:{fe80:: ffffffffffffffff0000000000000000} fe80::a8b9:7fff:fe7a:66f9:{fe80:: ffffffffffffffff0000000000000000} fe80::ac0d:dfff:fe25:3b7d:{fe80:: ffffffffffffffff0000000000000000}] I1224 09:13:48.598364 21412 helper_linux.go:74] Found default gateway interface eth0 10.0.48.1 F1224 09:13:48.598420 21412 ovnkube.go:133] could not find IP addresses: failed to lookup link br-ex: Link not found Exit Code: 1 Started: Fri, 24 Dec 2021 09:13:47 +0000 Finished: Fri, 24 Dec 2021 09:13:48 +0000 Ready: False Restart Count: 16 Requests: cpu: 10m memory: 300Mi Readiness: exec [test -f /etc/cni/net.d/10-ovn-kubernetes.conf] delay=5s timeout=1s period=5s #success=1 #failure=3 Environment: KUBERNETES_SERVICE_PORT: 6443 KUBERNETES_SERVICE_HOST: api-int.huirwang-1224b.qe.devcluster.openshift.com OVN_CONTROLLER_INACTIVITY_PROBE: 180000 OVN_KUBE_LOG_LEVEL: 4 K8S_NODE: (v1:spec.nodeName) Mounts: /cni-bin-dir from host-cni-bin (rw) /env from env-overrides (rw) /etc/cni/net.d from host-cni-netd (rw) /etc/openvswitch from etc-openvswitch (rw) /etc/ovn/ from etc-openvswitch (rw) /etc/systemd/system from systemd-units (ro) /host from host-slash (ro) /ovn-ca from ovn-ca (rw) /ovn-cert from ovn-cert (rw) /run/netns from host-run-netns (ro) /run/openvswitch from run-openvswitch (rw) /run/ovn-kubernetes/ from host-run-ovn-kubernetes (rw) /run/ovn/ from run-ovn (rw) /run/ovnkube-config/ from ovnkube-config (rw) /var/lib/cni/networks/ovn-k8s-cni-overlay from host-var-lib-cni-networks-ovn-kubernetes (rw) /var/lib/openvswitch from var-lib-openvswitch (rw) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-2qmxp (ro) Conditions: Type Status Initialized True Ready False ContainersReady False PodScheduled True Volumes: systemd-units: Type: HostPath (bare host directory volume) Path: /etc/systemd/system HostPathType: host-slash: Type: HostPath (bare host directory volume) Path: / HostPathType: host-run-netns: Type: HostPath (bare host directory volume) Path: /run/netns HostPathType: var-lib-openvswitch: Type: HostPath (bare host directory volume) Path: /var/lib/openvswitch/data HostPathType: etc-openvswitch: Type: HostPath (bare host directory volume) Path: /etc/openvswitch HostPathType: run-openvswitch: Type: HostPath (bare host directory volume) Path: /var/run/openvswitch HostPathType: run-ovn: Type: HostPath (bare host directory volume) Path: /var/run/ovn HostPathType: node-log: Type: HostPath (bare host directory volume) Path: /var/log/ovn HostPathType: log-socket: Type: HostPath (bare host directory volume) Path: /dev/log HostPathType: host-run-ovn-kubernetes: Type: HostPath (bare host directory volume) Path: /run/ovn-kubernetes HostPathType: host-cni-bin: Type: HostPath (bare host directory volume) Path: /var/lib/cni/bin HostPathType: host-cni-netd: Type: HostPath (bare host directory volume) Path: /var/run/multus/cni/net.d HostPathType: host-var-lib-cni-networks-ovn-kubernetes: Type: HostPath (bare host directory volume) Path: /var/lib/cni/networks/ovn-k8s-cni-overlay HostPathType: ovnkube-config: Type: ConfigMap (a volume populated by a ConfigMap) Name: ovnkube-config Optional: false env-overrides: Type: ConfigMap (a volume populated by a ConfigMap) Name: env-overrides Optional: true ovn-ca: Type: ConfigMap (a volume populated by a ConfigMap) Name: ovn-ca Optional: false ovn-cert: Type: Secret (a volume populated by a Secret) SecretName: 
ovn-cert Optional: false ovn-node-metrics-cert: Type: Secret (a volume populated by a Secret) SecretName: ovn-node-metrics-cert Optional: true kube-api-access-2qmxp: Type: Projected (a volume that contains injected data from multiple sources) TokenExpirationSeconds: 3607 ConfigMapName: kube-root-ca.crt ConfigMapOptional: <nil> DownwardAPI: true ConfigMapName: openshift-service-ca.crt ConfigMapOptional: <nil> QoS Class: Burstable Node-Selectors: beta.kubernetes.io/os=linux Tolerations: op=Exists Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 37m default-scheduler Successfully assigned openshift-ovn-kubernetes/ovnkube-node-dbfc6 to ip-10-0-57-108.us-east-2.compute.internal Normal Pulling 37m kubelet Pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:763084f718a274941e70cbf9da008c69348d0acdbbe48175336d841681187f48" Normal Pulled 37m kubelet Successfully pulled image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:763084f718a274941e70cbf9da008c69348d0acdbbe48175336d841681187f48" in 14.722538563s Normal Created 37m kubelet Created container ovn-controller Normal Started 37m kubelet Started container ovn-controller Normal Pulled 37m kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:763084f718a274941e70cbf9da008c69348d0acdbbe48175336d841681187f48" already present on machine Normal Created 37m kubelet Created container ovn-acl-logging Normal Started 37m kubelet Started container ovn-acl-logging Normal Pulled 37m kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ad4de366563e3175887872d54a03d4e6d79195ea011f28510118a78e3d4d0269" already present on machine Normal Created 37m kubelet Created container kube-rbac-proxy Normal Started 37m kubelet Started container kube-rbac-proxy Normal Pulled 37m kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ad4de366563e3175887872d54a03d4e6d79195ea011f28510118a78e3d4d0269" already present on machine Normal Created 37m kubelet Created container kube-rbac-proxy-ovn-metrics Normal Started 37m kubelet Started container kube-rbac-proxy-ovn-metrics Normal Pulled 37m kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:763084f718a274941e70cbf9da008c69348d0acdbbe48175336d841681187f48" already present on machine Normal Created 37m kubelet Created container ovnkube-node Normal Started 37m kubelet Started container ovnkube-node Warning Unhealthy 36m (x9 over 37m) kubelet Readiness probe failed: Normal Created 29m kubelet Created container ovn-controller Normal Started 29m kubelet Started container ovn-controller Normal Pulled 29m kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:763084f718a274941e70cbf9da008c69348d0acdbbe48175336d841681187f48" already present on machine Normal Pulled 29m kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:763084f718a274941e70cbf9da008c69348d0acdbbe48175336d841681187f48" already present on machine Normal Started 29m kubelet Started container kube-rbac-proxy-ovn-metrics Normal Created 29m kubelet Created container ovn-acl-logging Normal Started 29m kubelet Started container ovn-acl-logging Normal Pulled 29m kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ad4de366563e3175887872d54a03d4e6d79195ea011f28510118a78e3d4d0269" already present on machine Normal Created 29m kubelet Created container kube-rbac-proxy Normal Pulled 29m kubelet Container image 
"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ad4de366563e3175887872d54a03d4e6d79195ea011f28510118a78e3d4d0269" already present on machine Normal Created 29m kubelet Created container kube-rbac-proxy-ovn-metrics Normal Started 29m kubelet Started container kube-rbac-proxy Normal Created 29m (x2 over 29m) kubelet Created container ovnkube-node Normal Started 29m (x2 over 29m) kubelet Started container ovnkube-node Warning Unhealthy 29m (x5 over 29m) kubelet Readiness probe failed: Normal Pulled 28m (x3 over 29m) kubelet Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:763084f718a274941e70cbf9da008c69348d0acdbbe48175336d841681187f48" already present on machine Warning BackOff <invalid> (x141 over 29m) kubelet Back-off restarting failed container Actual results: Expected results: SDN migration should be successful. Additional info:
This is a regression introduced by https://github.com/openshift/machine-config-operator/pull/2742. We need to configure NetworkManager to use 'keyfile' as the primary plugin on RHEL workers.
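For anyone hitting this before a fixed build is available, below is a minimal sketch of the kind of NetworkManager drop-in that makes 'keyfile' the primary plugin on a RHEL 8 worker. The file name and the exact contents shipped by the machine-config-operator fix are assumptions for illustration, not taken from the PR:

# /etc/NetworkManager/conf.d/99-keyfile-plugin.conf  (illustrative path and name)
# List 'keyfile' first so it becomes the primary connection-profile plugin,
# while keeping the legacy ifcfg-rh plugin available for existing profiles.
[main]
plugins=keyfile,ifcfg-rh

After adding such a drop-in, restart NetworkManager (systemctl restart NetworkManager) or reboot the node; the effective plugin list can be checked with 'NetworkManager --print-config'.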
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056