Description of problem:
Pod gets stuck in ContainerCreating when a gateway IP is single stack (i.e. IPv4 only) in a dual-stack cluster.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
On a dual-stack cluster:
1. Create a gw pod with annotations like:
     k8s.ovn.org/routing-namespaces: test
     k8s.ovn.org/routing-network: foo
     k8s.v1.cni.cncf.io/network-status: '[{"name":"foo","interface":"net1","ips":["172.19.0.5"],"mac":"01:23:45:67:89:10"}]'
2. Create a pod in the test namespace

Actual results:
Pod gets stuck in ContainerCreating

Expected results:
Pod goes to Running

Additional info:
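For reference, a minimal gw pod manifest matching step 1 could look like the sketch below. The pod name, container name, and image are illustrative assumptions; only the annotations (taken verbatim from the steps above) matter for reproducing the issue:

```yaml
# Hypothetical gw pod sketch; the annotations are what OVN-Kubernetes reads.
apiVersion: v1
kind: Pod
metadata:
  name: gw-pod                     # illustrative name
  namespace: default               # any namespace; routing targets "test"
  annotations:
    k8s.ovn.org/routing-namespaces: test
    k8s.ovn.org/routing-network: foo
    # Forged single-stack (IPv4-only) network-status on a dual-stack cluster:
    k8s.v1.cni.cncf.io/network-status: '[{"name":"foo","interface":"net1","ips":["172.19.0.5"],"mac":"01:23:45:67:89:10"}]'
spec:
  containers:
  - name: gw                       # illustrative; any long-running image works
    image: quay.io/openshifttest/hello-sdn@sha256:d5785550cf77b7932b090fcd1a2625472912fb3189d5973f177a5a2c347a1f95
```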
yprokule hi, any chance you can help verify this issue on a dual-stack cluster?
@Federico Still see the same error in 4.9.0-0.nightly-2021-08-07-175228:

[root@ocp-edge50 auth]# oc get pod
NAME                READY   STATUS              RESTARTS   AGE
runtimeconfig-pod   0/1     ContainerCreating   0          12s

[root@ocp-edge50 auth]# oc describe pod runtimeconfig-pod
Name:         runtimeconfig-pod
Namespace:    test
Priority:     0
Node:         worker-0-0.ocp-edge-cluster-0.qe.lab.redhat.com/192.168.123.146
Start Time:   Thu, 12 Aug 2021 21:05:01 +0300
Labels:       <none>
Annotations:  k8s.ovn.org/pod-networks: {"default":{"ip_addresses":["10.128.2.13/23","fd01:0:0:5::d/64"],"mac_address":"0a:58:0a:80:02:0d","gateway_ips":["10.128.2.1","fd01:0:0:5...
              k8s.ovn.org/routing-namespaces: test
              k8s.ovn.org/routing-network: runtimeconfig-def
              k8s.v1.cni.cncf.io/networks: [ { "name": "runtimeconfig-def", "ips": [ "192.168.22.2/24" ], "mac": "CA:FE:C0:FF:EE:00" } ]
              openshift.io/scc: privileged
Status:       Pending
IP:
IPs:          <none>
Containers:
  runtimeconfig-pod:
    Container ID:
    Image:          quay.io/openshifttest/hello-sdn@sha256:d5785550cf77b7932b090fcd1a2625472912fb3189d5973f177a5a2c347a1f95
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9p74k (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-9p74k:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age   From               Message
  ----     ------     ----  ----               -------
  Normal   Scheduled  22s   default-scheduler  Successfully assigned test/runtimeconfig-pod to worker-0-0.ocp-edge-cluster-0.qe.lab.redhat.com
  Warning  ErrorAddingLogicalPort  23s (x3 over 23s)  controlplane  failed to handle external GW check: unable to unmarshall annotation k8s.v1.cni.cncf.io/network-status on pod runtimeconfig-pod: unexpected end of JSON input
  Warning  FailedCreatePodSandBox  2s                 kubelet       Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_runtimeconfig-pod_test_91a57da8-8183-4440-b768-c4b0e73fdad0_0(9953452b3b7ee7e5319193fc79c775e9965a96272c863789f18aa9fb06d34e4d): error adding pod test_runtimeconfig-pod to CNI network "multus-cni-network": [test/runtimeconfig-pod:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[test/runtimeconfig-pod 9953452b3b7ee7e5319193fc79c775e9965a96272c863789f18aa9fb06d34e4d] [test/runtimeconfig-pod 9953452b3b7ee7e5319193fc79c775e9965a96272c863789f18aa9fb06d34e4d] failed to configure pod interface: timed out waiting for OVS port binding (ovn-installed) for 0a:58:0a:80:02:0d [10.128.2.13/23 fd01:0:0:5::d/64] '

[root@ocp-edge50 auth]# oc project
Using project "test" on server "https://api.ocp-edge-cluster-0.qe.lab.redhat.com:6443".

[root@ocp-edge50 auth]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-08-07-175228   True        False         129m    Cluster version is 4.9.0-0.nightly-2021-08-07-175228
[root@ocp-edge50 auth]#
@Federico Moving this to ASSIGNED; could you help check this?
@zzhao I think it's failing because:

  Warning  ErrorAddingLogicalPort  23s (x3 over 23s)  controlplane  failed to handle external GW check: unable to unmarshall annotation k8s.v1.cni.cncf.io/network-status on pod runtimeconfig-pod: unexpected end of JSON input

This is because your pod doesn't have the network-status annotation, so the failure is legitimate. The `k8s.ovn.org/routing-network` annotation must refer to a secondary network. If you don't have SR-IOV, you can either try with macvlan or, even easier, forge the annotation as in https://bugzilla.redhat.com/show_bug.cgi?id=1977330#c2 :

1. Create a gw pod with annotations like:
     k8s.ovn.org/routing-namespaces: test
     k8s.ovn.org/routing-network: foo
     k8s.v1.cni.cncf.io/network-status: '[{"name":"foo","interface":"net1","ips":["172.19.0.5"],"mac":"01:23:45:67:89:10"}]'

OVNK cares only about the routing network and the IP, not whether it binds to a real secondary network.
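To illustrate why a missing annotation trips the external GW check: the controller has to JSON-decode the network-status annotation before it can find the routing network's IP, and an absent annotation decodes as an empty string. This is a Python sketch of that check, not the actual OVN-Kubernetes Go code; `find_gateway_ips` is a hypothetical helper name:

```python
import json

def find_gateway_ips(network_status_json, routing_network):
    """Sketch of the external-GW check: parse the
    k8s.v1.cni.cncf.io/network-status annotation and return the IPs
    of the entry whose name matches the routing network."""
    # A missing annotation arrives as an empty string, and decoding it
    # fails -- analogous to Go's "unexpected end of JSON input" in the
    # ErrorAddingLogicalPort event above.
    statuses = json.loads(network_status_json)
    for status in statuses:
        if status.get("name") == routing_network:
            return status.get("ips", [])
    return []

# The forged annotation from the comment above parses fine:
annotation = '[{"name":"foo","interface":"net1","ips":["172.19.0.5"],"mac":"01:23:45:67:89:10"}]'
print(find_gateway_ips(annotation, "foo"))  # ['172.19.0.5']

# An absent (empty) annotation cannot be decoded at all:
try:
    find_gateway_ips("", "foo")
except json.JSONDecodeError as err:
    print("unmarshal failed:", err)
```

Note that only the `name` and `ips` fields are consulted here, which matches the point above: the annotation can be forged as long as those two fields are well-formed JSON.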
Followed the steps in comment 7; testing passed on a dual-stack cluster using 4.9.0-0.nightly-2021-08-31-081832:

[root@ocp-edge50 ~]# cat testpod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: testpod1
  annotations:
    k8s.ovn.org/routing-namespaces: test
    k8s.ovn.org/routing-network: multus-bridge
    k8s.v1.cni.cncf.io/network-status: '[{"name":"foo","interface":"net1","ips":["172.19.0.5"],"mac":"01:23:45:67:89:10"}]'
spec:
  nodeName:
  containers:
  - name: testapp1
    image: quay.io/openshifttest/hello-sdn@sha256:d5785550cf77b7932b090fcd1a2625472912fb3189d5973f177a5a2c347a1f95
    imagePullPolicy: IfNotPresent
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 300000; done;" ]

[root@ocp-edge50 ~]# oc new-project test
[root@ocp-edge50 ~]# oc create -f testpod.yaml
pod/testpod1 created
[root@ocp-edge50 ~]# oc get pod
NAME       READY   STATUS    RESTARTS   AGE
testpod1   1/1     Running   0          113s
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759