Bug 1977330
| Summary: | Single stack external gateway makes the pod not starting with dual stack clusters | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Federico Paolinelli <fpaoline> | |
| Component: | Networking | Assignee: | Federico Paolinelli <fpaoline> | |
| Networking sub component: | ovn-kubernetes | QA Contact: | Weibin Liang <weliang> | |
| Status: | CLOSED ERRATA | Docs Contact: | ||
| Severity: | medium | |||
| Priority: | medium | CC: | anbhat, mcornea, mifiedle, npinaeva, rbrattai, yprokule, zzhao | |
| Version: | 4.9 | Flags: | npinaeva: needinfo+ | |
| Target Milestone: | --- | |||
| Target Release: | 4.9.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1986708 | Environment: | ||
| Last Closed: | 2021-10-18 17:36:56 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1977279, 1986708 | |||
yprokule hi, any chance you can help verify this issue on a dual stack cluster?

@Federico I still see the same error in 4.9.0-0.nightly-2021-08-07-175228:
[root@ocp-edge50 auth]# oc get pod
NAME READY STATUS RESTARTS AGE
runtimeconfig-pod 0/1 ContainerCreating 0 12s
[root@ocp-edge50 auth]# oc describe pod runtimeconfig-pod
Name: runtimeconfig-pod
Namespace: test
Priority: 0
Node: worker-0-0.ocp-edge-cluster-0.qe.lab.redhat.com/192.168.123.146
Start Time: Thu, 12 Aug 2021 21:05:01 +0300
Labels: <none>
Annotations: k8s.ovn.org/pod-networks:
{"default":{"ip_addresses":["10.128.2.13/23","fd01:0:0:5::d/64"],"mac_address":"0a:58:0a:80:02:0d","gateway_ips":["10.128.2.1","fd01:0:0:5...
k8s.ovn.org/routing-namespaces: test
k8s.ovn.org/routing-network: runtimeconfig-def
k8s.v1.cni.cncf.io/networks: [ { "name": "runtimeconfig-def", "ips": [ "192.168.22.2/24" ], "mac": "CA:FE:C0:FF:EE:00" } ]
openshift.io/scc: privileged
Status: Pending
IP:
IPs: <none>
Containers:
runtimeconfig-pod:
Container ID:
Image: quay.io/openshifttest/hello-sdn@sha256:d5785550cf77b7932b090fcd1a2625472912fb3189d5973f177a5a2c347a1f95
Image ID:
Port: <none>
Host Port: <none>
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9p74k (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-9p74k:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
ConfigMapName: openshift-service-ca.crt
ConfigMapOptional: <nil>
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 22s default-scheduler Successfully assigned test/runtimeconfig-pod to worker-0-0.ocp-edge-cluster-0.qe.lab.redhat.com
Warning ErrorAddingLogicalPort 23s (x3 over 23s) controlplane failed to handle external GW check: unable to unmarshall annotation k8s.v1.cni.cncf.io/network-status on pod runtimeconfig-pod: unexpected end of JSON input
Warning FailedCreatePodSandBox 2s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_runtimeconfig-pod_test_91a57da8-8183-4440-b768-c4b0e73fdad0_0(9953452b3b7ee7e5319193fc79c775e9965a96272c863789f18aa9fb06d34e4d): error adding pod test_runtimeconfig-pod to CNI network "multus-cni-network": [test/runtimeconfig-pod:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[test/runtimeconfig-pod 9953452b3b7ee7e5319193fc79c775e9965a96272c863789f18aa9fb06d34e4d] [test/runtimeconfig-pod 9953452b3b7ee7e5319193fc79c775e9965a96272c863789f18aa9fb06d34e4d] failed to configure pod interface: timed out waiting for OVS port binding (ovn-installed) for 0a:58:0a:80:02:0d [10.128.2.13/23 fd01:0:0:5::d/64]
'
[root@ocp-edge50 auth]# oc project
Using project "test" on server "https://api.ocp-edge-cluster-0.qe.lab.redhat.com:6443".
[root@ocp-edge50 auth]# oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.9.0-0.nightly-2021-08-07-175228 True False 129m Cluster version is 4.9.0-0.nightly-2021-08-07-175228
[root@ocp-edge50 auth]#
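
The ErrorAddingLogicalPort event above says the k8s.v1.cni.cncf.io/network-status annotation could not be parsed, and the describe output shows that the pod carries k8s.ovn.org/routing-network but no network-status annotation at all. The following is a minimal Python sketch of that kind of lookup, not the OVN-Kubernetes code: the helper name `external_gateway_ips` and the match-by-network-name behaviour are assumptions made for illustration, and Python's parser words its error differently from the Go message in the event.

```python
import json

ROUTING_NETWORK = "k8s.ovn.org/routing-network"
NETWORK_STATUS = "k8s.v1.cni.cncf.io/network-status"


def external_gateway_ips(annotations):
    """Return the IPs that network-status reports for the routing network.

    Hypothetical helper: matching entries by "name" is an assumption,
    not confirmed OVN-Kubernetes behaviour.
    """
    network = annotations.get(ROUTING_NETWORK)
    if network is None:
        # Pod does not ask to act as an external gateway via a secondary network.
        return []
    # A missing annotation is treated as an empty string, which is not valid JSON.
    raw = annotations.get(NETWORK_STATUS, "")
    try:
        statuses = json.loads(raw)
    except json.JSONDecodeError as err:
        raise ValueError(f"unable to parse {NETWORK_STATUS}: {err}") from err
    return [
        ip
        for status in statuses
        if status.get("name") == network
        for ip in status.get("ips", [])
    ]


# Annotations as on runtimeconfig-pod: routing-network set, network-status absent.
failing = {ROUTING_NETWORK: "runtimeconfig-def"}
try:
    external_gateway_ips(failing)
except ValueError as err:
    print("lookup failed:", err)

# Forged annotation from the reproduction steps in this report.
forged = {
    ROUTING_NETWORK: "foo",
    NETWORK_STATUS: '[{"name":"foo","interface":"net1",'
                    '"ips":["172.19.0.5"],"mac":"01:23:45:67:89:10"}]',
}
print(external_gateway_ips(forged))
```

With only the routing-network annotation present the lookup fails on the empty input; with the forged network-status annotation it yields the single IPv4 gateway address.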

@Federico Moving this to ASSIGNED; could you help check this?

@zzhao I think it's failing because of:
Warning ErrorAddingLogicalPort 23s (x3 over 23s) controlplane failed to handle external GW check: unable to unmarshall annotation k8s.v1.cni.cncf.io/network-status on pod runtimeconfig-pod: unexpected end of JSON input
This is because your pod doesn't have the network-status annotation, so the failure is legitimate. The `k8s.ovn.org/routing-network` annotation must refer to a secondary network. If you don't have SR-IOV, you can either try with macvlan or, even easier, forge the annotation as in https://bugzilla.redhat.com/show_bug.cgi?id=1977330#c2 :
1. Create a gw pod with annotations like:
k8s.ovn.org/routing-namespaces: test
k8s.ovn.org/routing-network: foo
k8s.v1.cni.cncf.io/network-status: '[{"name":"foo","interface":"net1","ips":["172.19.0.5"],"mac":"01:23:45:67:89:10"}]'
OVNK cares only about the routing network and the IP, not whether it binds to a real secondary network.

Followed the steps in comment 7; testing passed on dual stack using 4.9.0-0.nightly-2021-08-31-081832.
[root@ocp-edge50 ~]# cat testpod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: testpod1
  annotations:
    k8s.ovn.org/routing-namespaces: test
    k8s.ovn.org/routing-network: multus-bridge
    k8s.v1.cni.cncf.io/network-status: '[{"name":"foo","interface":"net1","ips":["172.19.0.5"],"mac":"01:23:45:67:89:10"}]'
spec:
  nodeName:
  containers:
  - name: testapp1
    image: quay.io/openshifttest/hello-sdn@sha256:d5785550cf77b7932b090fcd1a2625472912fb3189d5973f177a5a2c347a1f95
    imagePullPolicy: IfNotPresent
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 300000; done;" ]
[root@ocp-edge50 ~]# oc new-project test
[root@ocp-edge50 ~]# oc create -f testpod.yaml
pod/testpod1 created
[root@ocp-edge50 ~]# oc get pod
NAME READY STATUS RESTARTS AGE
testpod1 1/1 Running 0 113s

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2021:3759

Description of problem:
A pod gets stuck in ContainerCreating when a gateway IP (gwip) is single stack (i.e. IPv4 only) in a dual stack cluster.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
On a dual stack cluster:
1. Create a gw pod with annotations like:
k8s.ovn.org/routing-namespaces: test
k8s.ovn.org/routing-network: foo
k8s.v1.cni.cncf.io/network-status: '[{"name":"foo","interface":"net1","ips":["172.19.0.5"],"mac":"01:23:45:67:89:10"}]'
2. Create a pod in the test namespace.

Actual results:
The pod gets stuck in ContainerCreating.

Expected results:
The pod reaches Running.

Additional info:
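
The reproduction pairs a dual stack pod with a gateway that only has an IPv4 address. The sketch below shows how the two sets of addresses line up per IP family. It is illustrative only and is not the fix shipped in the advisory: `gateways_by_family` is a hypothetical helper, and the pod addresses are the ones from the k8s.ovn.org/pod-networks annotation in the describe output earlier in this report.

```python
import ipaddress


def gateways_by_family(pod_cidrs, gateway_ips):
    """Group gateway IPs under each IP family (4 or 6) the pod has an address in."""
    # Collect the families present on the pod, e.g. {4, 6} on dual stack.
    families = {ipaddress.ip_interface(cidr).version for cidr in pod_cidrs}
    result = {family: [] for family in sorted(families)}
    for gw in gateway_ips:
        version = ipaddress.ip_address(gw).version
        # Gateways in a family the pod does not use are ignored.
        if version in result:
            result[version].append(gw)
    return result


pod_cidrs = ["10.128.2.13/23", "fd01:0:0:5::d/64"]  # dual stack pod
gateway_ips = ["172.19.0.5"]                        # single stack (IPv4) gateway

routes = gateways_by_family(pod_cidrs, gateway_ips)
print(routes)  # {4: ['172.19.0.5'], 6: []}
```

The IPv6 family ends up with no gateway. Logic that expects a gateway for every family the pod uses would fail at this point, while logic that skips empty families would let the pod start with an IPv4-only external route.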