Bug 1977330 - Single stack external gateway makes the pod not starting with dual stack clusters
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.9.0
Assignee: Federico Paolinelli
QA Contact: Weibin Liang
URL:
Whiteboard:
Depends On:
Blocks: 1977279 1986708
Reported: 2021-06-29 13:26 UTC by Federico Paolinelli
Modified: 2021-10-18 17:37 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1986708
Environment:
Last Closed: 2021-10-18 17:36:56 UTC
Target Upstream Version:
Embargoed:
npinaeva: needinfo+



Links
System ID | Private | Priority | Status | Summary | Last Updated
Github openshift ovn-kubernetes pull 600 | 0 | None | closed | Bug 1973286: Merge 2021-07-06 | 2021-07-28 07:24:50 UTC
Github ovn-org ovn-kubernetes pull 2293 | 0 | None | open | Add routes for pod: fail only after checking all the gw addresses / ips | 2021-06-29 14:18:39 UTC
Red Hat Product Errata RHSA-2021:3759 | 0 | None | None | None | 2021-10-18 17:37:20 UTC

Description Federico Paolinelli 2021-06-29 13:26:07 UTC
Description of problem:

A pod gets stuck in ContainerCreating when the external gateway IP (gwip) is single stack (i.e. IPv4 only) in a dual stack cluster.

Version-Release number of selected component (if applicable):


How reproducible:

Always

Steps to Reproduce:
On a dual stack cluster:

1. Create a gw pod with annotations like:

      k8s.ovn.org/routing-namespaces: test
      k8s.ovn.org/routing-network: foo
      k8s.v1.cni.cncf.io/network-status: '[{"name":"foo","interface":"net1","ips":["172.19.0.5"],"mac":"01:23:45:67:89:10"}]'

2. Create a pod in the test namespace


Actual results:
The pod gets stuck in ContainerCreating.

Expected results:
The pod reaches the Running state.

Additional info:

Comment 1 zhaozhanqi 2021-07-29 08:08:23 UTC
yprokule, hi, any chance you can help verify this issue on a dual stack cluster?

Comment 3 Weibin Liang 2021-08-12 18:53:21 UTC
@Federico I still see the same error in 4.9.0-0.nightly-2021-08-07-175228:

[root@ocp-edge50 auth]# oc get pod
NAME                READY   STATUS              RESTARTS   AGE
runtimeconfig-pod   0/1     ContainerCreating   0          12s
[root@ocp-edge50 auth]# oc describe pod runtimeconfig-pod
Name:         runtimeconfig-pod
Namespace:    test
Priority:     0
Node:         worker-0-0.ocp-edge-cluster-0.qe.lab.redhat.com/192.168.123.146
Start Time:   Thu, 12 Aug 2021 21:05:01 +0300
Labels:       <none>
Annotations:  k8s.ovn.org/pod-networks:
                {"default":{"ip_addresses":["10.128.2.13/23","fd01:0:0:5::d/64"],"mac_address":"0a:58:0a:80:02:0d","gateway_ips":["10.128.2.1","fd01:0:0:5...
              k8s.ovn.org/routing-namespaces: test
              k8s.ovn.org/routing-network: runtimeconfig-def
              k8s.v1.cni.cncf.io/networks: [ { "name": "runtimeconfig-def", "ips": [ "192.168.22.2/24" ], "mac": "CA:FE:C0:FF:EE:00" } ]
              openshift.io/scc: privileged
Status:       Pending
IP:           
IPs:          <none>
Containers:
  runtimeconfig-pod:
    Container ID:   
    Image:          quay.io/openshifttest/hello-sdn@sha256:d5785550cf77b7932b090fcd1a2625472912fb3189d5973f177a5a2c347a1f95
    Image ID:       
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9p74k (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kube-api-access-9p74k:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                From               Message
  ----     ------                  ----               ----               -------
  Normal   Scheduled               22s                default-scheduler  Successfully assigned test/runtimeconfig-pod to worker-0-0.ocp-edge-cluster-0.qe.lab.redhat.com
  Warning  ErrorAddingLogicalPort  23s (x3 over 23s)  controlplane       failed to handle external GW check: unable to unmarshall annotation k8s.v1.cni.cncf.io/network-status on pod runtimeconfig-pod: unexpected end of JSON input
  Warning  FailedCreatePodSandBox  2s                 kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_runtimeconfig-pod_test_91a57da8-8183-4440-b768-c4b0e73fdad0_0(9953452b3b7ee7e5319193fc79c775e9965a96272c863789f18aa9fb06d34e4d): error adding pod test_runtimeconfig-pod to CNI network "multus-cni-network": [test/runtimeconfig-pod:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[test/runtimeconfig-pod 9953452b3b7ee7e5319193fc79c775e9965a96272c863789f18aa9fb06d34e4d] [test/runtimeconfig-pod 9953452b3b7ee7e5319193fc79c775e9965a96272c863789f18aa9fb06d34e4d] failed to configure pod interface: timed out waiting for OVS port binding (ovn-installed) for 0a:58:0a:80:02:0d [10.128.2.13/23 fd01:0:0:5::d/64]
'
[root@ocp-edge50 auth]# oc project
Using project "test" on server "https://api.ocp-edge-cluster-0.qe.lab.redhat.com:6443".
[root@ocp-edge50 auth]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-08-07-175228   True        False         129m    Cluster version is 4.9.0-0.nightly-2021-08-07-175228
[root@ocp-edge50 auth]#

Comment 4 zhaozhanqi 2021-08-17 01:43:17 UTC
@Federico Moving this back to ASSIGNED; could you help check this?

Comment 7 Federico Paolinelli 2021-08-30 12:38:02 UTC
@zzhao I think it's failing because:

  Warning  ErrorAddingLogicalPort  23s (x3 over 23s)  controlplane       failed to handle external GW check: unable to unmarshall annotation k8s.v1.cni.cncf.io/network-status on pod runtimeconfig-pod: unexpected end of JSON input

This is because your pod doesn't have the network-status annotation, so the failure is legitimate. The `k8s.ovn.org/routing-network` annotation must refer to a secondary network. If you don't have SR-IOV, you can either try with macvlan or, even easier, forge the annotation as in https://bugzilla.redhat.com/show_bug.cgi?id=1977330#c2:

1. Create a gw pod with annotations like:

      k8s.ovn.org/routing-namespaces: test
      k8s.ovn.org/routing-network: foo
      k8s.v1.cni.cncf.io/network-status: '[{"name":"foo","interface":"net1","ips":["172.19.0.5"],"mac":"01:23:45:67:89:10"}]'

OVN-Kubernetes only cares about the routing network and the IP, not whether they are bound to a real secondary network.

Comment 8 Weibin Liang 2021-08-31 19:25:24 UTC
Following the steps in comment 7, testing passed on a dual stack cluster using 4.9.0-0.nightly-2021-08-31-081832:

[root@ocp-edge50 ~]# cat testpod.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: testpod1
  annotations:
    k8s.ovn.org/routing-namespaces: test
    k8s.ovn.org/routing-network: multus-bridge
    k8s.v1.cni.cncf.io/network-status: '[{"name":"foo","interface":"net1","ips":["172.19.0.5"],"mac":"01:23:45:67:89:10"}]'
spec:
  nodeName:
  containers:
  - name: testapp1
    image: quay.io/openshifttest/hello-sdn@sha256:d5785550cf77b7932b090fcd1a2625472912fb3189d5973f177a5a2c347a1f95
    imagePullPolicy: IfNotPresent
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 300000; done;" ]
[root@ocp-edge50 ~]# oc new-project test
[root@ocp-edge50 ~]# oc create -f testpod.yaml 
pod/testpod1 created
[root@ocp-edge50 ~]# oc get pod
NAME                                 READY   STATUS    RESTARTS   AGE
testpod1                             1/1     Running   0          113s

Comment 11 errata-xmlrpc 2021-10-18 17:36:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

