Description of problem:
When creating an egress router, the pod fails to start and the following errors appear in the node log:

Aug 06 14:14:05 ocp311-node.bmeng.local atomic-openshift-node[23785]: E0806 14:14:05.272902 23785 cni.go:260] Error adding network: failed to add route to dst: 10.66.140.77/32 via SDN: file exists
Aug 06 14:14:05 ocp311-node.bmeng.local atomic-openshift-node[23785]: E0806 14:14:05.272939 23785 cni.go:228] Error while adding to cni network: failed to add route to dst: 10.66.140.77/32 via SDN: file exists
Aug 06 14:14:05 ocp311-node.bmeng.local atomic-openshift-node[23785]: E0806 14:14:05.693932 23785 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to set up sandbox container "5ac751d27f6b2376a8ff44c6c3195239af77272a348c194048a1714d06135c69" network for pod "egress-1": NetworkPlugin cni failed to set up pod "egress-1_bmengp1" network: failed to add route to dst: 10.66.140.77/32 via SDN: file exists
Aug 06 14:14:05 ocp311-node.bmeng.local atomic-openshift-node[23785]: E0806 14:14:05.694035 23785 kuberuntime_sandbox.go:56] CreatePodSandbox for pod "egress-1_bmengp1(b664fcf3-993e-11e8-836b-5254005ce8d4)" failed: rpc error: code = Unknown desc = failed to set up sandbox container "5ac751d27f6b2376a8ff44c6c3195239af77272a348c194048a1714d06135c69" network for pod "egress-1": NetworkPlugin cni failed to set up pod "egress-1_bmengp1" network: failed to add route to dst: 10.66.140.77/32 via SDN: file exists
Aug 06 14:14:05 ocp311-node.bmeng.local atomic-openshift-node[23785]: E0806 14:14:05.694061 23785 kuberuntime_manager.go:646] createPodSandbox for pod "egress-1_bmengp1(b664fcf3-993e-11e8-836b-5254005ce8d4)" failed: rpc error: code = Unknown desc = failed to set up sandbox container "5ac751d27f6b2376a8ff44c6c3195239af77272a348c194048a1714d06135c69" network for pod "egress-1": NetworkPlugin cni failed to set up pod "egress-1_bmengp1" network: failed to add route to dst: 10.66.140.77/32 via SDN: file exists
Aug 06 14:14:05 ocp311-node.bmeng.local atomic-openshift-node[23785]: E0806 14:14:05.694105 23785 pod_workers.go:186] Error syncing pod b664fcf3-993e-11e8-836b-5254005ce8d4 ("egress-1_bmengp1(b664fcf3-993e-11e8-836b-5254005ce8d4)"), skipping: failed to "CreatePodSandbox" for "egress-1_bmengp1(b664fcf3-993e-11e8-836b-5254005ce8d4)" with CreatePodSandboxError: "CreatePodSandbox for pod \"egress-1_bmengp1(b664fcf3-993e-11e8-836b-5254005ce8d4)\" failed: rpc error: code = Unknown desc = failed to set up sandbox container \"5ac751d27f6b2376a8ff44c6c3195239af77272a348c194048a1714d06135c69\" network for pod \"egress-1\": NetworkPlugin cni failed to set up pod \"egress-1_bmengp1\" network: failed to add route to dst: 10.66.140.77/32 via SDN: file exists"
Aug 06 14:14:06 ocp311-node.bmeng.local atomic-openshift-node[23785]: W0806 14:14:06.182346 23785 docker_sandbox.go:372] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "egress-1_bmengp1": CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "5ac751d27f6b2376a8ff44c6c3195239af77272a348c194048a1714d06135c69"
Aug 06 14:14:06 ocp311-node.bmeng.local atomic-openshift-node[23785]: I0806 14:14:06.192920 23785 kubelet.go:1869] SyncLoop (PLEG): "egress-1_bmengp1(b664fcf3-993e-11e8-836b-5254005ce8d4)", event: &pleg.PodLifecycleEvent{ID:"b664fcf3-993e-11e8-836b-5254005ce8d4", Type:"ContainerDied", Data:"5ac751d27f6b2376a8ff44c6c3195239af77272a348c194048a1714d06135c69"}
Aug 06 14:14:06 ocp311-node.bmeng.local atomic-openshift-node[23785]: W0806 14:14:06.193004 23785 pod_container_deletor.go:75] Container "5ac751d27f6b2376a8ff44c6c3195239af77272a348c194048a1714d06135c69" not found in pod's containers
Aug 06 14:14:06 ocp311-node.bmeng.local atomic-openshift-node[23785]: I0806 14:14:06.493566 23785 kuberuntime_manager.go:403] No ready sandbox for pod "egress-1_bmengp1(b664fcf3-993e-11e8-836b-5254005ce8d4)" can be found. Need to start a new one

Version-Release number of selected component (if applicable):
v3.11.0-0.11.0

How reproducible:
always

Steps to Reproduce:
1. Create a project as a regular user and grant the privileged SCC to the user's service account
2. Create an egress router pod with the template below
3.

Actual results:
The egress router pod stays in ContainerCreating status and is never created successfully.

Expected results:
The pod is created successfully.

Additional info:
$ cat egressrouter.yaml
apiVersion: v1
kind: Pod
metadata:
  name: egress-1
  labels:
    name: egress-1
  annotations:
    pod.network.openshift.io/assign-macvlan: "true"
spec:
  containers:
  - name: egress-router
    image: $registry/openshift3/ose-egress-router:v3.11
    securityContext:
      privileged: true
    env:
    - name: EGRESS_SOURCE
      value: 10.66.140.200
    - name: EGRESS_GATEWAY
      value: 10.66.141.254
    - name: EGRESS_DESTINATION
      value: 61.135.218.24
    - name: EGRESS_ROUTER_MODE
      value: legacy
@cdc @dcbw The "file exists" error suggests that the route being added is already present. I think we need to either ignore the error in this case or add the route only if it doesn't already exist.
Rajat found what is probably the root cause: https://github.com/openshift/origin/pull/20115

Either way, the answer is to ignore "already exists" errors. Working on a patch now.
Fix is in https://github.com/openshift/origin/pull/20601

Bo, can you test this without that PR merging? I wasn't able to reproduce the fix locally.
(In reply to Casey Callendrello from comment #8)
> Fix is in https://github.com/openshift/origin/pull/20601
>
> Bo, can you test this without that PR merging? I wasn't able to reproduce
> the fix locally.

I tried this on the latest OCP 3.11 build: I rebuilt sdn-cni-plugin with the fix, renamed it to openshift-sdn, and replaced the binary under /opt/cni/bin/. With that change, the egress router pod can be created.
Fix is merged.
Tested on build v3.11.0-0.17.0; the issue has been fixed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2652