Created attachment 1762537 [details]
oc adm must gather logs

Description of problem:
machine-config pod stuck in ContainerCreating state after migration

Version-Release number of selected component (if applicable):
4.8.0-0.nightly-ppc64le-2021-03-08-045421

How reproducible:
After the OVNKube migration completes, cluster operators become degraded and unstable, and pods get stuck in the ContainerCreating state.

[root@ktania-48-bastion ~]# oc get network.config/cluster -o jsonpath='{.status.networkType}{"\n"}'
OVNKubernetes

[root@ktania-48-bastion ~]# oc get machineconfigpool -n openshift-machine-config-operator
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-39eb02fc1740972313bfb43b25984015   True      False      False      3              3                   3                     0                      40h
worker   rendered-worker-0eaf2d65761bd2fbb9984835a4986e26   True      False      False      2              2                   2                     0                      40h

[root@ktania-48-bastion ~]# oc get csr | grep "Pending"

[root@ktania-48-bastion ~]# oc get nodes
NAME       STATUS   ROLES    AGE   VERSION
master-0   Ready    master   42h   v1.20.0+69d7e87
master-1   Ready    master   42h   v1.20.0+69d7e87
master-2   Ready    master   42h   v1.20.0+69d7e87
worker-0   Ready    worker   42h   v1.20.0+69d7e87
worker-1   Ready    worker   42h   v1.20.0+69d7e87

[root@ktania-48-bastion ~]# oc get co
NAME                                       VERSION                                     AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.8.0-0.nightly-ppc64le-2021-03-08-045421   False       True          False      23h
baremetal                                  4.8.0-0.nightly-ppc64le-2021-03-08-045421   True        False         False      40h
cloud-credential                           4.8.0-0.nightly-ppc64le-2021-03-08-045421   True        False         False      40h
cluster-autoscaler                         4.8.0-0.nightly-ppc64le-2021-03-08-045421   True        False         False      40h
config-operator                            4.8.0-0.nightly-ppc64le-2021-03-08-045421   True        False         False      40h
console                                    4.8.0-0.nightly-ppc64le-2021-03-08-045421   True        False         True       39h
csi-snapshot-controller                    4.8.0-0.nightly-ppc64le-2021-03-08-045421   True        False         False      40h
dns                                        4.8.0-0.nightly-ppc64le-2021-03-08-045421   True        False         False      40h
etcd                                       4.8.0-0.nightly-ppc64le-2021-03-08-045421   True        False         False      40h
image-registry                             4.8.0-0.nightly-ppc64le-2021-03-08-045421   True        False         False      39h
ingress                                    4.8.0-0.nightly-ppc64le-2021-03-08-045421   True        False         False      39h
insights                                   4.8.0-0.nightly-ppc64le-2021-03-08-045421   True        False         False      39h
kube-apiserver                             4.8.0-0.nightly-ppc64le-2021-03-08-045421   True        False         False      40h
kube-controller-manager                    4.8.0-0.nightly-ppc64le-2021-03-08-045421   True        False         False      40h
kube-scheduler                             4.8.0-0.nightly-ppc64le-2021-03-08-045421   True        False         False      40h
kube-storage-version-migrator              4.8.0-0.nightly-ppc64le-2021-03-08-045421   True        False         False      39h
machine-api                                4.8.0-0.nightly-ppc64le-2021-03-08-045421   True        False         False      40h
machine-approver                           4.8.0-0.nightly-ppc64le-2021-03-08-045421   True        False         False      40h
machine-config                             4.8.0-0.nightly-ppc64le-2021-03-08-045421   True        False         False      39h
marketplace                                4.8.0-0.nightly-ppc64le-2021-03-08-045421   True        False         False      40h
monitoring                                 4.8.0-0.nightly-ppc64le-2021-03-08-045421   True        False         False      39h
network                                    4.8.0-0.nightly-ppc64le-2021-03-08-045421   True        True          True       40h
node-tuning                                4.8.0-0.nightly-ppc64le-2021-03-08-045421   True        False         False      40h
openshift-apiserver                        4.8.0-0.nightly-ppc64le-2021-03-08-045421   False       False         False      23h
openshift-controller-manager               4.8.0-0.nightly-ppc64le-2021-03-08-045421   True        False         False      40h
openshift-samples                          4.8.0-0.nightly-ppc64le-2021-03-08-045421   True        False         False      40h
operator-lifecycle-manager                 4.8.0-0.nightly-ppc64le-2021-03-08-045421   True        False         False      40h
operator-lifecycle-manager-catalog         4.8.0-0.nightly-ppc64le-2021-03-08-045421   True        False         False      40h
operator-lifecycle-manager-packageserver   4.8.0-0.nightly-ppc64le-2021-03-08-045421   True        False         False      23h
service-ca                                 4.8.0-0.nightly-ppc64le-2021-03-08-045421   True        False         False      40h
storage                                    4.8.0-0.nightly-ppc64le-2021-03-08-045421   True        False         False      40h

[root@ktania-48-bastion ~]# oc get pod -n openshift-machine-config-operator
NAME                                         READY   STATUS              RESTARTS   AGE
machine-config-controller-6f467668cd-977kr   0/1     ContainerCreating   0          6h16m
machine-config-daemon-8d9bp                  2/2     Running             0          42h
machine-config-daemon-9nqkg                  2/2     Running             0          42h
machine-config-daemon-bsvhl                  2/2     Running             0          42h
machine-config-daemon-cgzsq                  2/2     Running             0          42h
machine-config-daemon-f7b5h                  2/2     Running             0          42h
machine-config-operator-86c7698f5f-hlt2n     0/1     ContainerCreating   0          26m
machine-config-server-kp9s7                  1/1     Running             0          42h
machine-config-server-qw4m6                  1/1     Running             0          42h
machine-config-server-tnvg8                  1/1     Running             0          42h

[root@ktania-48-bastion ~]# oc describe pod machine-config-controller-6f467668cd-ws6z9 -n openshift-machine-config-operator
Name:                 machine-config-controller-6f467668cd-ws6z9
Namespace:            openshift-machine-config-operator
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 master-1/9.114.99.134
Start Time:           Wed, 10 Mar 2021 01:05:14 -0500
Labels:               k8s-app=machine-config-controller
                      pod-template-hash=6f467668cd
Annotations:          k8s.ovn.org/pod-networks:
                        {"default":{"ip_addresses":["10.131.0.24/23"],"mac_address":"0a:58:0a:83:00:18","gateway_ips":["10.131.0.1"],"ip_address":"10.131.0.24/23"...
Status:                    Terminating (lasts 4h2m)
Termination Grace Period:  30s
IP:
IPs:                       <none>
Controlled By:             ReplicaSet/machine-config-controller-6f467668cd
Containers:
  machine-config-controller:
    Container ID:
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3a821d740256cb7b9951add0ef0af6bbe1870433c439b0aaf7b8ffeb5ef1655e
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Command:
      /usr/bin/machine-config-controller
    Args:
      start
      --resourcelock-namespace=openshift-machine-config-operator
      --v=2
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:        20m
      memory:     50Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from machine-config-controller-token-lrjn9 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  machine-config-controller-token-lrjn9:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  machine-config-controller-token-lrjn9
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  node-role.kubernetes.io/master=
Tolerations:     node-role.kubernetes.io/master:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 120s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 120s
Events:          <none>

[root@ktania-48-bastion ~]# oc get pods -A | grep -v "Running\|Completed"
NAMESPACE   NAME   READY   STATUS   RESTARTS   AGE
nfs-provisioner   nfs-client-provisioner-65ddb449dd-5b5t8   0/1   ContainerCreating   0   22h
openshift-apiserver-operator   openshift-apiserver-operator-77798d5ddf-vswwc   0/1   ContainerCreating   0   23h
openshift-apiserver   apiserver-6cbd49df69-44x5d   0/2   Pending   0   4h2m
openshift-apiserver   apiserver-6cbd49df69-7fc9s   0/2   Terminating   0   23h
openshift-apiserver   apiserver-6cbd49df69-7mmq4   0/2   Init:0/1   0   23h
openshift-apiserver   apiserver-6cbd49df69-jttqx   0/2   Init:0/1   0   23h
openshift-authentication-operator   authentication-operator-644777f6ff-jf742   0/1   ContainerCreating   0   23h
openshift-authentication
oauth-openshift-659b4b6565-chchm   0/1   Pending   0   4h2m
openshift-authentication   oauth-openshift-659b4b6565-fgw5d   0/1   ContainerCreating   0   23h
openshift-authentication   oauth-openshift-659b4b6565-qkl59   0/1   Terminating   0   23h
openshift-cloud-credential-operator   cloud-credential-operator-7844cd5f7b-7n65x   0/2   ContainerCreating   0   23h
openshift-cluster-machine-approver   machine-approver-568b48b94d-dbgnw   1/2   CrashLoopBackOff   77   23h
openshift-cluster-node-tuning-operator   cluster-node-tuning-operator-777f99848d-948tk   0/1   Terminating   0   23h
openshift-cluster-node-tuning-operator   cluster-node-tuning-operator-777f99848d-jvhfs   0/1   Pending   0   4h2m
openshift-cluster-samples-operator   cluster-samples-operator-556b4f4958-fwfqn   0/2   ContainerCreating   0   23h
openshift-cluster-storage-operator   cluster-storage-operator-57bb9bf6d5-qcdbf   0/1   Pending   0   4h2m
openshift-cluster-storage-operator   cluster-storage-operator-57bb9bf6d5-xgppm   0/1   Terminating   0   23h
openshift-cluster-storage-operator   csi-snapshot-controller-5ff576bc58-qmlzp   0/1   Terminating   0   23h
openshift-cluster-storage-operator   csi-snapshot-controller-5ff576bc58-wst8r   0/1   Pending   0   4h2m
openshift-cluster-storage-operator   csi-snapshot-controller-operator-5bc6875cd6-qrgd4   0/1   ContainerCreating   0   23h
openshift-cluster-storage-operator   csi-snapshot-webhook-86dd954b68-8s2pk   0/1   Terminating   0   23h
openshift-cluster-storage-operator   csi-snapshot-webhook-86dd954b68-dnwsg   0/1   Pending   0   4h2m
openshift-config-operator   openshift-config-operator-654744db69-lpt9w   0/1   ContainerCreating   0   23h
openshift-console-operator   console-operator-75cfcc996f-cw9t7   0/1   Terminating   0   23h
openshift-console-operator   console-operator-75cfcc996f-zxrf5   0/1   Pending   0   4h2m
openshift-console   console-7c84c6885-92w2m   0/1   Pending   0   4h2m
openshift-console   console-7c84c6885-f2snv   0/1   Terminating   0   23h
openshift-console   console-7c84c6885-jkbmk   0/1   Pending   0   4h2m
openshift-console   console-7c84c6885-lnn6l   0/1   Terminating   0   23h
openshift-console   downloads-77766fb9b9-d4jcw   0/1   Terminating   0   22h
openshift-console   downloads-77766fb9b9-hnr8g   0/1   Pending   0   4h2m
openshift-console   downloads-77766fb9b9-j8qxb   0/1   Terminating   0   22h
openshift-console   downloads-77766fb9b9-jxq6r   0/1   Pending   0   4h2m
openshift-controller-manager-operator   openshift-controller-manager-operator-5b95959987-j8s8d   0/1   ContainerCreating   0   23h
openshift-controller-manager   controller-manager-24xbv   0/1   ContainerCreating   0   39h
openshift-controller-manager   controller-manager-c8xwj   0/1   ContainerCreating   0   39h
openshift-controller-manager   controller-manager-gll8s   0/1   ContainerCreating   0   39h
openshift-dns-operator   dns-operator-8655d97566-b4qqm   0/2   Pending   0   4h2m
openshift-dns-operator   dns-operator-8655d97566-xbfrp   0/2   Terminating   0   23h
openshift-dns   dns-default-2b9xt   0/3   ContainerCreating   0   40h
openshift-dns   dns-default-7fdcj   0/3   ContainerCreating   0   40h
openshift-dns   dns-default-7sfxp   0/3   ContainerCreating   0   40h
openshift-dns   dns-default-9h6xq   0/3   ContainerCreating   0   39h
openshift-dns   dns-default-w7wcj   0/3   ContainerCreating   0   39h
openshift-etcd-operator   etcd-operator-5fb99985b4-fkb8d   0/1   ContainerCreating   0   23h
openshift-image-registry   cluster-image-registry-operator-6775554c75-xq9r6   0/1   ContainerCreating   0   23h
openshift-image-registry   image-pruner-1615420800-wk462   0/1   Pending   0   5h16m
openshift-image-registry   image-registry-ffd7ccd7d-drt6l   0/1   ContainerCreating   0   22h
openshift-ingress-canary   ingress-canary-6wbdz   0/1   ContainerCreating   0   39h
openshift-ingress-canary   ingress-canary-p99m7   0/1   ContainerCreating   0   39h
openshift-ingress-operator   ingress-operator-6588d5bc87-5qdq8   0/2   ContainerCreating   0   23h
openshift-ingress   router-default-9966b6b6b-n69db   0/1   CrashLoopBackOff   125   23h
openshift-ingress   router-default-9966b6b6b-pnnl9   0/1   CrashLoopBackOff   103   22h
openshift-insights   insights-operator-65cf84578d-bkpnh   0/1   ContainerCreating   1   40h
openshift-kube-apiserver-operator   kube-apiserver-operator-798b887d75-5bln8   0/1   ContainerCreating   0   23h
openshift-kube-controller-manager-operator   kube-controller-manager-operator-6b5546947d-lqkrw   0/1   ContainerCreating   0   23h
openshift-kube-scheduler-operator   openshift-kube-scheduler-operator-7d6d89856c-5dntc   0/1   Pending   0   4h2m
openshift-kube-scheduler-operator   openshift-kube-scheduler-operator-7d6d89856c-x8bbm   0/1   Terminating   0   23h
openshift-kube-storage-version-migrator-operator   kube-storage-version-migrator-operator-6f6b6d7f5c-92gvr   0/1   ContainerCreating   0   23h
openshift-kube-storage-version-migrator   migrator-84d8f6c6dc-672vg   0/1   ContainerCreating   0   22h
openshift-machine-api   cluster-autoscaler-operator-dc9f865cf-56j9k   0/2   ContainerCreating   0   23h
openshift-machine-api   cluster-baremetal-operator-664c5999c8-jw5jt   0/2   ContainerCreating   0   23h
openshift-machine-api   machine-api-operator-689bccd7f5-t25m4   0/2   ContainerCreating   0   23h
openshift-machine-config-operator   machine-config-controller-6f467668cd-977kr   0/1   Pending   0   4h2m
openshift-machine-config-operator   machine-config-controller-6f467668cd-ws6z9   0/1   Terminating   0   23h
openshift-machine-config-operator   machine-config-operator-86c7698f5f-rrwfm   0/1   ContainerCreating   0   23h
openshift-marketplace   marketplace-operator-5b5dd7bcc9-f2wj2   0/1   ContainerCreating   0   23h
openshift-monitoring   alertmanager-main-0   0/5   ContainerCreating   0   23h
openshift-monitoring   alertmanager-main-1   0/5   ContainerCreating   0   22h
openshift-monitoring   alertmanager-main-2   0/5   ContainerCreating   0   22h
openshift-monitoring   cluster-monitoring-operator-854c6c68b5-fj7hr   0/2   ContainerCreating   0   23h
openshift-monitoring   grafana-989865765-vdhrs   0/2   ContainerCreating   0   22h
openshift-monitoring   kube-state-metrics-5bb8cb9bc5-ppld8   0/3   ContainerCreating   0   22h
openshift-monitoring   openshift-state-metrics-848bd7d949-pfqxn   0/3   ContainerCreating   0   22h
openshift-monitoring   prometheus-adapter-84c57d866f-4xp9z   0/1   ContainerCreating   0   22h
openshift-monitoring   prometheus-adapter-84c57d866f-fs2sz   0/1   ContainerCreating   0   22h
openshift-monitoring   prometheus-k8s-0   0/7   ContainerCreating   0   22h
openshift-monitoring   prometheus-k8s-1   0/7   ContainerCreating   0   23h
openshift-monitoring   prometheus-operator-dbb5d666b-6wdr4   0/2   Terminating   0   23h
openshift-monitoring   prometheus-operator-dbb5d666b-c84rh   0/2   Pending   0   3h59m
openshift-monitoring   telemeter-client-78f9657f88-skrg6   0/3   ContainerCreating   0   22h
openshift-monitoring   thanos-querier-54f8c6c887-4r46v   0/5   ContainerCreating   0   22h
openshift-monitoring   thanos-querier-54f8c6c887-rqvzv   0/5   ContainerCreating   0   22h
openshift-multus   multus-admission-controller-8k5vq   0/2   ContainerCreating   0   40h
openshift-multus   multus-admission-controller-fksxv   0/2   ContainerCreating   0   40h
openshift-multus   multus-admission-controller-s45pz   0/2   ContainerCreating   0   40h
openshift-multus   network-metrics-daemon-8gfb9   0/2   ContainerCreating   0   40h
openshift-multus   network-metrics-daemon-gz6bp   0/2   ContainerCreating   0   40h
openshift-multus   network-metrics-daemon-kqbwj   0/2   ContainerCreating   0   40h
openshift-multus   network-metrics-daemon-nkctp   0/2   ContainerCreating   0   39h
openshift-multus   network-metrics-daemon-tppj9   0/2   ContainerCreating   0   39h
openshift-network-diagnostics   network-check-source-5ccc7fb9cd-rdnqr   0/1   ContainerCreating   0   22h
openshift-network-diagnostics   network-check-target-59tzc   0/1   ContainerCreating   0   40h
openshift-network-diagnostics   network-check-target-5vsbs   0/1   ContainerCreating   0   39h
openshift-network-diagnostics   network-check-target-rvnsf   0/1   ContainerCreating   0   40h
openshift-network-diagnostics   network-check-target-vs6tq   0/1   ContainerCreating   0   40h
openshift-network-diagnostics   network-check-target-zzrfr   0/1   ContainerCreating   0   39h
openshift-oauth-apiserver   apiserver-6cf594b4b5-6gwrj   0/1   Init:0/1   0   23h
openshift-oauth-apiserver   apiserver-6cf594b4b5-czcc9   0/1   Init:0/1   0   23h
openshift-oauth-apiserver   apiserver-6cf594b4b5-gnwmh   0/1   Terminating   0   23h
openshift-oauth-apiserver   apiserver-6cf594b4b5-rpfp9   0/1   Pending   0   4h2m
openshift-operator-lifecycle-manager   catalog-operator-9574c4ff5-n478j   0/1   ContainerCreating   0   23h
openshift-operator-lifecycle-manager   olm-operator-675f7cb4cf-fbdxn   0/1   Pending   0   4h2m
openshift-operator-lifecycle-manager   olm-operator-675f7cb4cf-svnp5   0/1   Terminating   0   23h
openshift-operator-lifecycle-manager   packageserver-5567566c55-8jqxs   0/1   ContainerCreating   0   23h
openshift-operator-lifecycle-manager   packageserver-5567566c55-dpmxw   0/1   Terminating   0   23h
openshift-operator-lifecycle-manager   packageserver-5567566c55-vhgsl   0/1   Pending   0   4h2m
openshift-ovn-kubernetes   ovnkube-node-dzm9t   2/3   CrashLoopBackOff   81   23h
openshift-ovn-kubernetes   ovnkube-node-fh2cw   2/3   CrashLoopBackOff   85   23h
openshift-ovn-kubernetes   ovnkube-node-mr8vm   2/3   CrashLoopBackOff   86   23h
openshift-ovn-kubernetes   ovnkube-node-smhgn   2/3   Error   83   23h
openshift-ovn-kubernetes   ovnkube-node-v95rn   2/3   CrashLoopBackOff   84   23h
openshift-service-ca-operator   service-ca-operator-6f4cdfb89c-mll8t   0/1   ContainerCreating   0   23h
openshift-service-ca   service-ca-6c94fc887d-hfns4   0/1   ContainerCreating   0   23h

[root@ktania-48-bastion ~]# oc describe pod ovnkube-node-mr8vm -n openshift-ovn-kubernetes
Name:                 ovnkube-node-mr8vm
Namespace:            openshift-ovn-kubernetes
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 master-0/9.114.99.70
Start Time:           Wed, 10 Mar 2021 00:54:22 -0500
Labels:               app=ovnkube-node
                      component=network
                      controller-revision-hash=db4c44784
                      kubernetes.io/os=linux
                      openshift.io/component=network
                      pod-template-generation=1
                      type=infra
Annotations:          <none>
Status:               Running
IP:                   9.114.99.70
IPs:
  IP:  9.114.99.70
Controlled By:  DaemonSet/ovnkube-node
Containers:
  ovn-controller:
    Container ID:  cri-o://3a3fe21c490972caf3e88385e5c1fad0c11e80cbbb2285007952a128230acbf9
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bc9b1c6f7e550147e8e1b53e668a44037e02912487e990bb1c24771656573a6c
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bc9b1c6f7e550147e8e1b53e668a44037e02912487e990bb1c24771656573a6c
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/bash
      -c
      set -e
      if [[ -f "/env/${K8S_NODE}" ]]; then
        set -o allexport
        source
      "/env/${K8S_NODE}"
        set +o allexport
      fi
      echo "$(date -Iseconds) - starting ovn-controller"
      exec ovn-controller unix:/var/run/openvswitch/db.sock -vfile:off \
        --no-chdir --pidfile=/var/run/ovn/ovn-controller.pid \
        -p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt \
        -vconsole:"${OVN_LOG_LEVEL}"
    State:          Running
      Started:      Wed, 10 Mar 2021 00:54:54 -0500
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:     10m
      memory:  300Mi
    Environment:
      OVN_LOG_LEVEL:  info
      K8S_NODE:        (v1:spec.nodeName)
    Mounts:
      /env from env-overrides (rw)
      /etc/openvswitch from etc-openvswitch (rw)
      /etc/ovn/ from etc-openvswitch (rw)
      /ovn-ca from ovn-ca (rw)
      /ovn-cert from ovn-cert (rw)
      /run/openvswitch from run-openvswitch (rw)
      /run/ovn/ from run-ovn (rw)
      /var/lib/openvswitch from var-lib-openvswitch (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from ovn-kubernetes-node-token-22qmz (ro)
  kube-rbac-proxy:
    Container ID:  cri-o://71d1d89ecac60830f14ba9ce917ea0f3962b0bb261ef7647ed4b42749f1dd440
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cedc48906e2064b38d982dff88e74e639897039f602928414581fc0a6330d1ed
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cedc48906e2064b38d982dff88e74e639897039f602928414581fc0a6330d1ed
    Port:          9103/TCP
    Host Port:     9103/TCP
    Command:
      /bin/bash
      -c
      #!/bin/bash
      set -euo pipefail
      TLS_PK=/etc/pki/tls/metrics-cert/tls.key
      TLS_CERT=/etc/pki/tls/metrics-cert/tls.crt
      # As the secret mount is optional we must wait for the files to be present.
      # The service is created in monitor.yaml and this is created in sdn.yaml.
      # If it isn't created there is probably an issue so we want to crashloop.
      retries=0
      TS=$(date +%s)
      WARN_TS=$(( ${TS} + $(( 20 * 60)) ))
      HAS_LOGGED_INFO=0
      log_missing_certs(){
        CUR_TS=$(date +%s)
        if [[ "${CUR_TS}" -gt "WARN_TS" ]]; then
          echo $(date -Iseconds) WARN: ovn-node-metrics-cert not mounted after 20 minutes.
        elif [[ "${HAS_LOGGED_INFO}" -eq 0 ]] ; then
          echo $(date -Iseconds) INFO: ovn-node-metrics-cert not mounted. Waiting one hour.
          HAS_LOGGED_INFO=1
        fi
      }
      while [[ ! -f "${TLS_PK}" || ! -f "${TLS_CERT}" ]] ; do
        log_missing_certs
        sleep 5
      done
      echo $(date -Iseconds) INFO: ovn-node-metrics-certs mounted, starting kube-rbac-proxy
      exec /usr/bin/kube-rbac-proxy \
        --logtostderr \
        --secure-listen-address=:9103 \
        --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256 \
        --upstream=http://127.0.0.1:29103/ \
        --tls-private-key-file=${TLS_PK} \
        --tls-cert-file=${TLS_CERT}
    State:          Running
      Started:      Wed, 10 Mar 2021 00:54:54 -0500
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:     10m
      memory:  20Mi
    Environment:  <none>
    Mounts:
      /etc/pki/tls/metrics-cert from ovn-node-metrics-cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from ovn-kubernetes-node-token-22qmz (ro)
  ovnkube-node:
    Container ID:  cri-o://38338a9cadfdd5ec01f7185dc9e67295c7bde937d762688cd6a2daa8aef4ba5d
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bc9b1c6f7e550147e8e1b53e668a44037e02912487e990bb1c24771656573a6c
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bc9b1c6f7e550147e8e1b53e668a44037e02912487e990bb1c24771656573a6c
    Port:          29103/TCP
    Host Port:     29103/TCP
    Command:
      /bin/bash
      -c
      set -xe
      if [[ -f "/env/${K8S_NODE}" ]]; then
        set -o allexport
        source "/env/${K8S_NODE}"
        set +o allexport
      fi
      echo "I$(date "+%m%d %H:%M:%S.%N") - waiting for db_ip addresses"
      cp -f /usr/libexec/cni/ovn-k8s-cni-overlay /cni-bin-dir/
      ovn_config_namespace=openshift-ovn-kubernetes
      echo "I$(date "+%m%d %H:%M:%S.%N") - disable conntrack on geneve port"
      iptables -t raw -A PREROUTING -p udp --dport 6081 -j NOTRACK
      iptables -t raw -A OUTPUT -p udp --dport 6081 -j NOTRACK
      retries=0
      while true; do
        # TODO: change to use '--request-timeout=30s', if https://github.com/kubernetes/kubernetes/issues/49343 is fixed.
        db_ip=$(timeout 30 kubectl get ep -n ${ovn_config_namespace} ovnkube-db -o jsonpath='{.subsets[0].addresses[0].ip}')
        if [[ -n "${db_ip}" ]]; then
          break
        fi
        (( retries += 1 ))
        if [[ "${retries}" -gt 40 ]]; then
          echo "E$(date "+%m%d %H:%M:%S.%N") - db endpoint never came up"
          exit 1
        fi
        echo "I$(date "+%m%d %H:%M:%S.%N") - waiting for db endpoint"
        sleep 5
      done
      echo "I$(date "+%m%d %H:%M:%S.%N") - starting ovnkube-node db_ip ${db_ip}"
      gateway_mode_flags=
      # Check to see if ovs is provided by the node. This is only for upgrade from 4.5->4.6 or
      # openshift-sdn to ovn-kube conversion
      if grep -q OVNKubernetes /etc/systemd/system/ovs-configuration.service ; then
        gateway_mode_flags="--gateway-mode local --gateway-interface br-ex"
      else
        gateway_mode_flags="--gateway-mode local --gateway-interface none"
      fi
      exec /usr/bin/ovnkube --init-node "${K8S_NODE}" \
        --nb-address "ssl:9.114.99.105:9641,ssl:9.114.99.134:9641,ssl:9.114.99.70:9641" \
        --sb-address "ssl:9.114.99.105:9642,ssl:9.114.99.134:9642,ssl:9.114.99.70:9642" \
        --nb-client-privkey /ovn-cert/tls.key \
        --nb-client-cert /ovn-cert/tls.crt \
        --nb-client-cacert /ovn-ca/ca-bundle.crt \
        --nb-cert-common-name "ovn" \
        --sb-client-privkey /ovn-cert/tls.key \
        --sb-client-cert /ovn-cert/tls.crt \
        --sb-client-cacert /ovn-ca/ca-bundle.crt \
        --sb-cert-common-name "ovn" \
        --config-file=/run/ovnkube-config/ovnkube.conf \
        --loglevel "${OVN_KUBE_LOG_LEVEL}" \
        --inactivity-probe="${OVN_CONTROLLER_INACTIVITY_PROBE}" \
        ${gateway_mode_flags} \
        --metrics-bind-address "127.0.0.1:29103"
    State:       Terminated
      Reason:    Error
      Message:   9] exec(3): stdout: ""
        I0311 07:29:01.685654  979736 ovs.go:170] exec(3): stderr: ""
        I0311 07:29:01.685680  979736 ovs.go:166] exec(4): /usr/bin/ovs-ofctl dump-aggregate br-int
        I0311 07:29:01.690370  979736 ovs.go:169] exec(4): stdout: "NXST_AGGREGATE reply (xid=0x4): packet_count=0 byte_count=0 flow_count=2382\n"
        I0311 07:29:01.690418  979736 ovs.go:170] exec(4): stderr: ""
        I0311 07:29:01.690462  979736 ovs.go:166] exec(5): /usr/bin/ovs-vsctl --timeout=15 -- --if-exists del-port br-int k8s-master-0 -- --may-exist add-port br-int ovn-k8s-mp0 -- set interface ovn-k8s-mp0 type=internal mtu_request=1400 external-ids:iface-id=k8s-master-0
        I0311 07:29:01.697343  979736 ovs.go:169] exec(5): stdout: ""
        I0311 07:29:01.697380  979736 ovs.go:170] exec(5): stderr: ""
        I0311 07:29:01.697404  979736 ovs.go:166] exec(6): /usr/bin/ovs-vsctl --timeout=15 --if-exists get interface ovn-k8s-mp0 mac_in_use
        I0311 07:29:01.703189  979736 ovs.go:169] exec(6): stdout: "\"02:d7:2b:fa:dd:74\"\n"
        I0311 07:29:01.703241  979736 ovs.go:170] exec(6): stderr: ""
        I0311 07:29:01.703282  979736 ovs.go:166] exec(7): /usr/bin/ovs-vsctl --timeout=15 set interface ovn-k8s-mp0 mac=02\:d7\:2b\:fa\:dd\:74
        I0311 07:29:01.711008  979736 ovs.go:169] exec(7): stdout: ""
        I0311 07:29:01.711056  979736 ovs.go:170] exec(7): stderr: ""
        I0311 07:29:01.759473  979736 gateway_init.go:162] Initializing Gateway Functionality
        I0311 07:29:01.759824  979736 gateway_localnet.go:184] Node local addresses initialized to: map[10.129.0.2:{10.129.0.0 fffffe00} 127.0.0.1:{127.0.0.0 ff000000} 9.114.99.70:{9.114.96.0 fffffc00} ::1:{::1 ffffffffffffffffffffffffffffffff} fe80::50b2:16ff:fe43:a8a3:{fe80:: ffffffffffffffff0000000000000000} fe80::b47a:e4de:548:45a5:{fe80:: ffffffffffffffff0000000000000000} fe80::d7:2bff:fefa:dd74:{fe80:: ffffffffffffffff0000000000000000}]
        I0311 07:29:01.760000  979736 helper_linux.go:73] Found default gateway interface env32 9.114.96.1
        F0311 07:29:01.760053  979736 ovnkube.go:130] could not find IP addresses: failed to lookup link none: Link not found
      Exit Code:  1
      Started:    Thu, 11 Mar 2021 02:29:00 -0500
      Finished:   Thu, 11 Mar 2021 02:29:01 -0500
    Last State:  Terminated
      Reason:    Error
      Message:   9] exec(3): stdout: ""
        I0311 07:28:09.571730  978518 ovs.go:170] exec(3): stderr: ""
        I0311 07:28:09.571751  978518 ovs.go:166] exec(4): /usr/bin/ovs-ofctl dump-aggregate br-int
        I0311 07:28:09.576322  978518 ovs.go:169] exec(4): stdout: "NXST_AGGREGATE reply (xid=0x4): packet_count=0 byte_count=0 flow_count=2382\n"
        I0311 07:28:09.576395  978518 ovs.go:170] exec(4): stderr: ""
        I0311 07:28:09.576468  978518 ovs.go:166] exec(5): /usr/bin/ovs-vsctl --timeout=15 -- --if-exists del-port br-int k8s-master-0 -- --may-exist add-port br-int ovn-k8s-mp0 -- set interface ovn-k8s-mp0 type=internal mtu_request=1400 external-ids:iface-id=k8s-master-0
        I0311 07:28:09.583200  978518 ovs.go:169] exec(5): stdout: ""
        I0311 07:28:09.583276  978518 ovs.go:170] exec(5): stderr: ""
        I0311 07:28:09.583321  978518 ovs.go:166] exec(6): /usr/bin/ovs-vsctl --timeout=15 --if-exists get interface ovn-k8s-mp0 mac_in_use
        I0311 07:28:09.588979  978518 ovs.go:169] exec(6): stdout: "\"02:d7:2b:fa:dd:74\"\n"
        I0311 07:28:09.589011  978518 ovs.go:170] exec(6): stderr: ""
        I0311 07:28:09.589045  978518 ovs.go:166] exec(7): /usr/bin/ovs-vsctl --timeout=15 set interface ovn-k8s-mp0 mac=02\:d7\:2b\:fa\:dd\:74
        I0311 07:28:09.594582  978518 ovs.go:169] exec(7): stdout: ""
        I0311 07:28:09.594609  978518 ovs.go:170] exec(7): stderr: ""
        I0311 07:28:09.638485  978518 gateway_init.go:162] Initializing Gateway Functionality
        I0311 07:28:09.638959  978518 gateway_localnet.go:184] Node local addresses initialized to: map[10.129.0.2:{10.129.0.0 fffffe00} 127.0.0.1:{127.0.0.0 ff000000} 9.114.99.70:{9.114.96.0 fffffc00} ::1:{::1 ffffffffffffffffffffffffffffffff} fe80::50b2:16ff:fe43:a8a3:{fe80:: ffffffffffffffff0000000000000000} fe80::b47a:e4de:548:45a5:{fe80:: ffffffffffffffff0000000000000000} fe80::d7:2bff:fefa:dd74:{fe80:: ffffffffffffffff0000000000000000}]
        I0311 07:28:09.639157  978518 helper_linux.go:73] Found default gateway interface env32 9.114.96.1
        F0311 07:28:09.639228  978518 ovnkube.go:130] could not find IP addresses: failed to lookup link none: Link not found
      Exit Code:  1
      Started:    Thu, 11 Mar 2021 02:28:08 -0500
      Finished:   Thu, 11 Mar 2021 02:28:09 -0500
    Ready:          False
    Restart Count:  91
    Requests:
      cpu:     10m
      memory:  300Mi
    Readiness:  exec [test -f /etc/cni/net.d/10-ovn-kubernetes.conf] delay=5s timeout=1s period=5s #success=1 #failure=3
    Environment:
      KUBERNETES_SERVICE_PORT:          6443
      KUBERNETES_SERVICE_HOST:          api-int.ktania-48.redhat.com
      OVN_CONTROLLER_INACTIVITY_PROBE:  30000
      OVN_KUBE_LOG_LEVEL:               4
      K8S_NODE:                          (v1:spec.nodeName)
    Mounts:
      /cni-bin-dir from host-cni-bin (rw)
      /env from env-overrides (rw)
      /etc/cni/net.d from host-cni-netd (rw)
      /etc/openvswitch from etc-openvswitch (rw)
      /etc/ovn/ from etc-openvswitch (rw)
      /etc/systemd/system from systemd-units (ro)
      /host from host-slash (ro)
      /ovn-ca from ovn-ca (rw)
      /ovn-cert from ovn-cert (rw)
      /run/netns from host-run-netns (ro)
      /run/openvswitch from run-openvswitch (rw)
      /run/ovn-kubernetes/ from host-run-ovn-kubernetes (rw)
      /run/ovn/ from run-ovn (rw)
      /run/ovnkube-config/ from ovnkube-config (rw)
      /var/lib/cni/networks/ovn-k8s-cni-overlay from host-var-lib-cni-networks-ovn-kubernetes (rw)
      /var/lib/openvswitch from var-lib-openvswitch (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from ovn-kubernetes-node-token-22qmz (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  systemd-units:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/systemd/system
    HostPathType:
  host-slash:
    Type:          HostPath (bare host directory volume)
    Path:          /
    HostPathType:
  host-run-netns:
    Type:          HostPath (bare host directory volume)
    Path:          /run/netns
    HostPathType:
  var-lib-openvswitch:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/openvswitch/data
    HostPathType:
  etc-openvswitch:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/openvswitch/etc
    HostPathType:
  run-openvswitch:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/openvswitch
    HostPathType:
  run-ovn:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/ovn
    HostPathType:
  host-run-ovn-kubernetes:
    Type:          HostPath (bare host directory volume)
    Path:          /run/ovn-kubernetes
    HostPathType:
  host-cni-bin:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/cni/bin
    HostPathType:
  host-cni-netd:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/multus/cni/net.d
    HostPathType:
  host-var-lib-cni-networks-ovn-kubernetes:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/cni/networks/ovn-k8s-cni-overlay
    HostPathType:
  ovnkube-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      ovnkube-config
    Optional:  false
  env-overrides:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      env-overrides
    Optional:  true
  ovn-ca:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      ovn-ca
    Optional:  false
  ovn-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ovn-cert
    Optional:    false
  ovn-node-metrics-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ovn-node-metrics-cert
    Optional:    true
  ovn-kubernetes-node-token-22qmz:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ovn-kubernetes-node-token-22qmz
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/os=linux
Tolerations:     op=Exists
Events:
  Type     Reason   Age                 From     Message
  ----     ------   ----                ----     -------
  Normal   Pulled   12s (x4 over 109s)  kubelet  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bc9b1c6f7e550147e8e1b53e668a44037e02912487e990bb1c24771656573a6c" already present on machine
  Normal   Created  12s (x4 over 108s)  kubelet  Created container ovnkube-node
  Normal   Started  12s (x4 over 108s)  kubelet  Started container ovnkube-node
  Warning  BackOff  10s (x8 over 107s)  kubelet  Back-off restarting failed container
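The fatal log above ("could not find IP addresses: failed to lookup link none: Link not found") lines up with the gateway-interface selection branch in the ovnkube-node startup script: when /etc/systemd/system/ovs-configuration.service does not mention OVNKubernetes, the script passes --gateway-interface none, which ovnkube then tries to resolve as a real network link. A minimal, self-contained sketch of that branch (the systemd unit content below is a hypothetical stand-in, not taken from this cluster):

```shell
#!/bin/bash
# Reproduce the flag-selection logic from the ovnkube-node container script
# using a temporary stand-in for /etc/systemd/system/ovs-configuration.service.
unit=$(mktemp)
echo "ExecStart=/usr/local/bin/configure-ovs.sh OpenShiftSDN" > "$unit"  # hypothetical content

# Same branch as the pod spec: no "OVNKubernetes" marker means the literal
# string "none" is handed to ovnkube as the gateway interface name.
if grep -q OVNKubernetes "$unit"; then
  gateway_mode_flags="--gateway-mode local --gateway-interface br-ex"
else
  gateway_mode_flags="--gateway-mode local --gateway-interface none"
fi
echo "$gateway_mode_flags"   # prints: --gateway-mode local --gateway-interface none
rm -f "$unit"
```

On a migrated node the unit still describes the old SDN setup, so the else branch is taken and ovnkube crashes trying to look up a link literally named "none".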
pliu, any updates?
In 4.8.0, some ovn-kube changes break the current SDN migration approach, so the migration solution needs to be refactored. Here is the PR I proposed: https://github.com/openshift/cluster-network-operator/pull/763. Once this PR is merged, there will be a new operational procedure for 4.8, which I believe will fix this BZ.
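For reference, the SDN-to-OVNKubernetes migration that produced this state is kicked off by patching the cluster network operator configuration. A hedged sketch of the trigger and the post-migration check; the `oc` commands are left commented because they assume a live cluster, and the exact procedure may change once the refactor PR lands:

```shell
#!/bin/bash
# Hedged sketch: build the migration patch used by the documented
# SDN -> OVNKubernetes procedure. The patch shape below is an assumption
# based on the pre-refactor flow and may be superseded by PR 763.
PATCH='{"spec":{"migration":{"networkType":"OVNKubernetes"}}}'

# On a real cluster the migration would be triggered with:
#   oc patch Network.operator.openshift.io cluster --type=merge --patch "$PATCH"
# and, after the machine-config rollout completes, verified with:
#   oc get network.config/cluster -o jsonpath='{.status.networkType}{"\n"}'
echo "$PATCH"
```

The status check is the same command shown at the top of this report, where the cluster already reports OVNKubernetes even though the rollout left pods stuck.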
Part of our testing is to validate the migration from OpenShiftSDN to OVNKube. Installation with OpenShiftSDN was successful; however, the migration failed with this BZ, so it is currently blocking a regression validation story. Adding this info here. I can see activity happening on the PR.
The IBM Z team has also reported observing this issue on the s390x platform. Should we escalate this bug to a blocker?
This BZ should be a blocker.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438