Description of problem:

Installing OCP 4.4.0-0.nightly-2020-01-06-072200 with OVNKubernetes on AWS fails with the following error:

ERROR Cluster operator network Degraded is True with RolloutHung: DaemonSet "openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress

Version-Release number of selected component (if applicable):

# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          3m10s   Unable to apply 4.4.0-0.nightly-2020-01-06-072200: an unknown error has occurred

How reproducible:
Always

Steps to Reproduce:
1. Install OCP 4.4.0-0.nightly-2020-01-06-072200 with OVNKubernetes on AWS.
2. The installation fails with the error:
   ERROR Cluster operator network Degraded is True with RolloutHung: DaemonSet "openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress
3. On the bootstrap machine, the ovnkube-node pods are in CrashLoopBackOff status. The logs of the ovnkube-node container show "kubectl: command not found", so it appears the kubectl binary is missing from image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:55e9bb82599f0f3ddd65ea9b7085290f770228a163a0ca0c8b810e34ab9f38d9.

# oc get pod -o wide -n openshift-ovn-kubernetes
NAME                   READY   STATUS             RESTARTS   AGE     IP             NODE                                         NOMINATED NODE   READINESS GATES
ovnkube-master-7djdn   4/4     Running            0          2m24s   10.0.130.88    ip-10-0-130-88.us-east-2.compute.internal    <none>           <none>
ovnkube-master-fkz47   4/4     Running            0          2m24s   10.0.149.152   ip-10-0-149-152.us-east-2.compute.internal   <none>           <none>
ovnkube-master-q488f   4/4     Running            0          2m24s   10.0.162.43    ip-10-0-162-43.us-east-2.compute.internal    <none>           <none>
ovnkube-node-5qkll     2/3     CrashLoopBackOff   4          2m24s   10.0.130.88    ip-10-0-130-88.us-east-2.compute.internal    <none>           <none>
ovnkube-node-h49hs     2/3     CrashLoopBackOff   4          2m24s   10.0.162.43    ip-10-0-162-43.us-east-2.compute.internal    <none>           <none>
ovnkube-node-xp4fv     2/3     CrashLoopBackOff   4          2m24s   10.0.149.152   ip-10-0-149-152.us-east-2.compute.internal   <none>           <none>

# oc get pods ovnkube-node-5qkll -n openshift-ovn-kubernetes -o jsonpath='{.spec.containers[*].name}'
ovs-daemons ovn-controller ovnkube-node

# oc logs ovnkube-node-5qkll -c ovnkube-node -n openshift-ovn-kubernetes
+ [[ -f /env/ip-10-0-130-88.us-east-2.compute.internal ]]
+ cp -f /usr/libexec/cni/ovn-k8s-cni-overlay /cni-bin-dir/
+ ovn_config_namespace=openshift-ovn-kubernetes
+ retries=0
+ true
++ kubectl get ep -n openshift-ovn-kubernetes ovnkube-db -o 'jsonpath={.subsets[0].addresses[0].ip}'
/bin/bash: line 10: kubectl: command not found
+ db_ip=

[root@ip-10-0-9-118 ~]# oc describe pod ovnkube-node-5qkll -c ovnkube-node -n openshift-ovn-kubernetes
Error: unknown shorthand flag: 'c' in -c
See 'oc describe --help' for usage.

[root@ip-10-0-9-118 ~]# oc describe pod ovnkube-node-5qkll -n openshift-ovn-kubernetes
Name:                 ovnkube-node-5qkll
Namespace:            openshift-ovn-kubernetes
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 ip-10-0-130-88.us-east-2.compute.internal/10.0.130.88
Start Time:           Wed, 08 Jan 2020 02:55:42 +0000
Labels:               app=ovnkube-node
                      component=network
                      controller-revision-hash=747fb98b88
                      kubernetes.io/os=linux
                      openshift.io/component=network
                      pod-template-generation=1
                      type=infra
Annotations:          <none>
Status:               Running
IP:                   10.0.130.88
IPs:
  IP:           10.0.130.88
Controlled By:  DaemonSet/ovnkube-node
Containers:
  ovs-daemons:
    Container ID:  cri-o://7c4b8ef9ee57640d8d800e96fa9b787f34d9c9b5f0525921cc03cd98be704959
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:55e9bb82599f0f3ddd65ea9b7085290f770228a163a0ca0c8b810e34ab9f38d9
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:55e9bb82599f0f3ddd65ea9b7085290f770228a163a0ca0c8b810e34ab9f38d9
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/bash
      -c
      #!/bin/bash
      set -e
      if [[ -f "/env/${K8S_NODE}" ]]; then
        set -o allexport
        source "/env/${K8S_NODE}"
        set +o allexport
      fi
      if [[ -f "/old/openvswitch/conf.db" && ! -f "/etc/openvswitch/conf.db" ]]; then
        mv /old/openvswitch/conf.db /etc/openvswitch/conf.db
      fi
      chown -R openvswitch:openvswitch /run/openvswitch
      chown -R openvswitch:openvswitch /etc/openvswitch
      function quit {
        /usr/share/openvswitch/scripts/ovs-ctl stop
        exit 0
      }
      trap quit SIGTERM
      /usr/share/openvswitch/scripts/ovs-ctl start --ovs-user=openvswitch:openvswitch --system-id=random
      ovs-appctl vlog/set "file:${OVS_LOG_LEVEL}"
      /usr/share/openvswitch/scripts/ovs-ctl --protocol=udp --dport=6081 enable-protocol
      tail -F --pid=$(cat /var/run/openvswitch/ovs-vswitchd.pid) /var/log/openvswitch/ovs-vswitchd.log &
      tail -F --pid=$(cat /var/run/openvswitch/ovsdb-server.pid) /var/log/openvswitch/ovsdb-server.log &
      wait
    State:          Running
      Started:      Wed, 08 Jan 2020 02:55:55 +0000
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:      100m
      memory:   300Mi
    Liveness:   exec [/usr/share/openvswitch/scripts/ovs-ctl status] delay=15s timeout=1s period=5s #success=1 #failure=3
    Readiness:  exec [/usr/share/openvswitch/scripts/ovs-ctl status] delay=15s timeout=1s period=5s #success=1 #failure=3
    Environment:
      OVS_LOG_LEVEL:  info
      K8S_NODE:       (v1:spec.nodeName)
    Mounts:
      /env from env-overrides (rw)
      /etc/openvswitch from etc-openvswitch (rw)
      /lib/modules from host-modules (ro)
      /old/openvswitch from old-openvswitch-database (rw)
      /run/openvswitch from run-openvswitch (rw)
      /sys from host-sys (ro)
      /var/lib/openvswitch from var-lib-openvswitch (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from ovn-kubernetes-node-token-dwfgf (ro)
  ovn-controller:
    Container ID:  cri-o://d4d7969789fffa369f6cdec23932803f812644943e8902f154c0f81b4716dd7b
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:55e9bb82599f0f3ddd65ea9b7085290f770228a163a0ca0c8b810e34ab9f38d9
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:55e9bb82599f0f3ddd65ea9b7085290f770228a163a0ca0c8b810e34ab9f38d9
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/bash
      -c
      set -e
      if [[ -f "/env/${K8S_NODE}" ]]; then
        set -o allexport
        source "/env/${K8S_NODE}"
        set +o allexport
      fi
      echo /ovn-cert/tls.key
      cat /ovn-cert/tls.key
      echo /ovn-cert/tls.crt
      cat /ovn-cert/tls.crt
      echo /ovn-ca/ca-bundle.crt
      cat /ovn-ca/ca-bundle.crt
      exec ovn-controller unix:/var/run/openvswitch/db.sock -vfile:off \
        --no-chdir --pidfile=/var/run/openvswitch/ovn-controller.pid \
        -p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt \
        -vconsole:"${OVN_LOG_LEVEL}"
    State:          Running
      Started:      Wed, 08 Jan 2020 02:55:55 +0000
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:     100m
      memory:  300Mi
    Environment:
      OVN_LOG_LEVEL:  info
      K8S_NODE:       (v1:spec.nodeName)
    Mounts:
      /env from env-overrides (rw)
      /etc/openvswitch from etc-openvswitch (rw)
      /ovn-ca from ovn-ca (rw)
      /ovn-cert from ovn-cert (rw)
      /run/openvswitch from run-openvswitch (rw)
      /var/lib/openvswitch from var-lib-openvswitch (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from ovn-kubernetes-node-token-dwfgf (ro)
  ovnkube-node:
    Container ID:  cri-o://9741afe526e2f7a04ec7dd07e537e0018ffd4344ba65c8f9b56f2c1480edd6a7
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:55e9bb82599f0f3ddd65ea9b7085290f770228a163a0ca0c8b810e34ab9f38d9
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:55e9bb82599f0f3ddd65ea9b7085290f770228a163a0ca0c8b810e34ab9f38d9
    Port:          9101/TCP
    Host Port:     9101/TCP
    Command:
      /bin/bash
      -c
      set -xe
      if [[ -f "/env/${K8S_NODE}" ]]; then
        set -o allexport
        source "/env/${K8S_NODE}"
        set +o allexport
      fi
      cp -f /usr/libexec/cni/ovn-k8s-cni-overlay /cni-bin-dir/
      ovn_config_namespace=openshift-ovn-kubernetes
      retries=0
      while true; do
        db_ip=$(kubectl get ep -n ${ovn_config_namespace} ovnkube-db -o jsonpath='{.subsets[0].addresses[0].ip}')
        if [[ -n "${db_ip}" ]]; then
          break
        fi
        (( retries += 1 ))
        if [[ "${retries}" -gt 40 ]]; then
          echo "db endpoint never came up"
          exit 1
        fi
        echo "waiting for db endpoint"
        sleep 5
      done
      hybrid_overlay_flags=
      if [[ -n "" ]]; then
        hybrid_overlay_flags="--enable-hybrid-overlay"
        if [[ -n "" ]]; then
          hybrid_overlay_flags="${hybrid_overlay_flags} --hybrid-overlay-cluster-subnets="
        fi
      fi
      OVN_NODES_ARRAY=(ip-10-0-130-88.us-east-2.compute.internal ip-10-0-149-152.us-east-2.compute.internal ip-10-0-162-43.us-east-2.compute.internal)
      nb_addr_list=""
      sb_addr_list=""
      for i in "${!OVN_NODES_ARRAY[@]}"; do
        if [[ $i != 0 ]]; then
          nb_addr_list="${nb_addr_list},"
          sb_addr_list="${sb_addr_list},"
        fi
        host=$(getent ahostsv4 "${OVN_NODES_ARRAY[$i]}" | grep RAW | awk '{print $1}')
        nb_addr_list="${nb_addr_list}ssl://${host}:9641"
        sb_addr_list="${sb_addr_list}ssl://${host}:9642"
      done
      echo /ovn-cert/tls.key
      cat /ovn-cert/tls.key
      echo /ovn-cert/tls.crt
      cat /ovn-cert/tls.crt
      echo /ovn-ca/ca-bundle.crt
      cat /ovn-ca/ca-bundle.crt
      exec /usr/bin/ovnkube --init-node "${K8S_NODE}" \
        --cluster-subnets "${OVN_NET_CIDR}" \
        --k8s-service-cidr "${OVN_SVC_CIDR}" \
        --k8s-apiserver "https://api-int.sgao-cluster.qe.devcluster.openshift.com:6443" \
        --ovn-config-namespace ${ovn_config_namespace} \
        --nb-address "${nb_addr_list}" \
        --sb-address "${sb_addr_list}" \
        --nb-client-privkey /ovn-cert/tls.key \
        --nb-client-cert /ovn-cert/tls.crt \
        --nb-client-cacert /ovn-ca/ca-bundle.crt \
        --sb-client-privkey /ovn-cert/tls.key \
        --sb-client-cert /ovn-cert/tls.crt \
        --sb-client-cacert /ovn-ca/ca-bundle.crt \
        --config-file=/run/ovnkube-config/ovnkube.conf \
        --loglevel "${OVN_KUBE_LOG_LEVEL}" \
        ${hybrid_overlay_flags} \
        --pidfile /var/run/openvswitch/ovnkube-node.pid \
        --metrics-bind-address "0.0.0.0:9101"
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Message:      + [[ -f /env/ip-10-0-130-88.us-east-2.compute.internal ]]
                    + cp -f /usr/libexec/cni/ovn-k8s-cni-overlay /cni-bin-dir/
                    + ovn_config_namespace=openshift-ovn-kubernetes
                    + retries=0
                    + true
                    ++ kubectl get ep -n openshift-ovn-kubernetes ovnkube-db -o 'jsonpath={.subsets[0].addresses[0].ip}'
                    /bin/bash: line 10: kubectl: command not found
                    + db_ip=
      Exit Code:    127
      Started:      Wed, 08 Jan 2020 03:27:02 +0000
      Finished:     Wed, 08 Jan 2020 03:27:02 +0000
    Ready:          False
    Restart Count:  11
    Requests:
      cpu:      100m
      memory:   300Mi
    Readiness:  exec [test -f /etc/cni/net.d/10-ovn-kubernetes.conf] delay=5s timeout=1s period=5s #success=1 #failure=3
    Environment:
      OVN_HYBRID_OVERLAY_ENABLE:
      OVN_HYBRID_OVERLAY_NET_CIDR:
      KUBERNETES_SERVICE_PORT:      6443
      KUBERNETES_SERVICE_HOST:      api-int.sgao-cluster.qe.devcluster.openshift.com
      OVN_KUBE_LOG_LEVEL:           4
      K8S_NODE:                     (v1:spec.nodeName)
    Mounts:
      /cni-bin-dir from host-cni-bin (rw)
      /env from env-overrides (rw)
      /etc/cni/net.d from host-cni-netd (rw)
      /etc/openvswitch from etc-openvswitch (rw)
      /host from host-slash (ro)
      /ovn-ca from ovn-ca (rw)
      /ovn-cert from ovn-cert (rw)
      /run/netns from host-run-netns (ro)
      /run/openvswitch from run-openvswitch (rw)
      /run/ovn-kubernetes/ from host-run-ovn-kubernetes (rw)
      /run/ovnkube-config/ from ovnkube-config (rw)
      /var/lib/cni/networks/ovn-k8s-cni-overlay from host-var-lib-cni-networks-ovn-kubernetes (rw)
      /var/lib/openvswitch from var-lib-openvswitch (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from ovn-kubernetes-node-token-dwfgf (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  host-slash:
    Type:          HostPath (bare host directory volume)
    Path:          /
    HostPathType:
  host-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:
  host-run-netns:
    Type:          HostPath (bare host directory volume)
    Path:          /run/netns
    HostPathType:
  var-lib-openvswitch:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/openvswitch/data
    HostPathType:
  etc-openvswitch:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/openvswitch/etc
    HostPathType:
  run-openvswitch:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  old-openvswitch-database:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  host-run-ovn-kubernetes:
    Type:          HostPath (bare host directory volume)
    Path:          /run/ovn-kubernetes
    HostPathType:
  host-sys:
    Type:          HostPath (bare host directory volume)
    Path:          /sys
    HostPathType:
  host-cni-bin:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/cni/bin
    HostPathType:
  host-cni-netd:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/multus/cni/net.d
    HostPathType:
  host-var-lib-cni-networks-ovn-kubernetes:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/cni/networks/ovn-k8s-cni-overlay
    HostPathType:
  ovnkube-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      ovnkube-config
    Optional:  false
  env-overrides:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      env-overrides
    Optional:  true
  ovn-ca:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      ovn-ca
    Optional:  false
  ovn-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ovn-cert
    Optional:    false
  ovn-kubernetes-node-token-dwfgf:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ovn-kubernetes-node-token-dwfgf
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/os=linux
Tolerations:
Events:
  Type     Reason     Age                  From                                                Message
  ----     ------     ----                 ----                                                -------
  Normal   Scheduled  <unknown>            default-scheduler                                   Successfully assigned openshift-ovn-kubernetes/ovnkube-node-5qkll to ip-10-0-130-88.us-east-2.compute.internal
  Normal   Pulling    36m                  kubelet, ip-10-0-130-88.us-east-2.compute.internal  Pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:55e9bb82599f0f3ddd65ea9b7085290f770228a163a0ca0c8b810e34ab9f38d9"
  Normal   Pulled     35m                  kubelet, ip-10-0-130-88.us-east-2.compute.internal  Successfully pulled image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:55e9bb82599f0f3ddd65ea9b7085290f770228a163a0ca0c8b810e34ab9f38d9"
  Normal   Created    35m                  kubelet, ip-10-0-130-88.us-east-2.compute.internal  Created container ovs-daemons
  Normal   Started    35m                  kubelet, ip-10-0-130-88.us-east-2.compute.internal  Started container ovs-daemons
  Normal   Pulled     35m                  kubelet, ip-10-0-130-88.us-east-2.compute.internal  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:55e9bb82599f0f3ddd65ea9b7085290f770228a163a0ca0c8b810e34ab9f38d9" already present on machine
  Normal   Created    35m                  kubelet, ip-10-0-130-88.us-east-2.compute.internal  Created container ovn-controller
  Normal   Started    35m                  kubelet, ip-10-0-130-88.us-east-2.compute.internal  Started container ovn-controller
  Normal   Pulled     35m (x4 over 35m)    kubelet, ip-10-0-130-88.us-east-2.compute.internal  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:55e9bb82599f0f3ddd65ea9b7085290f770228a163a0ca0c8b810e34ab9f38d9" already present on machine
  Normal   Created    35m (x4 over 35m)    kubelet, ip-10-0-130-88.us-east-2.compute.internal  Created container ovnkube-node
  Normal   Started    35m (x4 over 35m)    kubelet, ip-10-0-130-88.us-east-2.compute.internal  Started container ovnkube-node
  Warning  BackOff    53s (x162 over 35m)  kubelet, ip-10-0-130-88.us-east-2.compute.internal  Back-off restarting failed container

Actual results:
Installation fails.

Expected results:
Installation succeeds.

Additional info:
This bug is fixed in OCP 4.4.0-0.nightly-2020-01-12-032939.

Version:
# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-01-12-032939   True        False         17m     Cluster version is 4.4.0-0.nightly-2020-01-12-032939

Steps:
1. Install OCP 4.4 4.4.0-0.nightly-2020-01-12-032939; the installation succeeds.
2. Check that the ovnkube-node image has been updated:
# oc describe pod ovnkube-node-2sm9v -n openshift-ovn-kubernetes | grep "Image ID"
Image ID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ba36719f8d038c93b2ba4b8de7f12846d7b96fe812c64b2c74242d31e3061092
Image ID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ba36719f8d038c93b2ba4b8de7f12846d7b96fe812c64b2c74242d31e3061092
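Beyond comparing image digests, the fix can be confirmed directly by checking that the kubectl binary is now present inside the running container. A sketch, assuming cluster access and reusing the pod name from the verification output above:

```shell
# Run kubectl inside the ovnkube-node container of the updated image;
# a successful "version --client" call proves the binary now ships in it.
oc exec -n openshift-ovn-kubernetes ovnkube-node-2sm9v -c ovnkube-node -- \
  kubectl version --client
```

If the binary were still missing, this command would fail with the same "command not found" error seen during installation.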
Hey @gaoshang, so is this fixed then, can we close the bug?
(In reply to Ricardo Carrillo Cruz from comment #2)
> Hey @gaoshang, so is this fixed then, can we close the bug?

According to the above comment, the bug status has been moved to VERIFIED, thanks.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581