Bug 1612702 - Egress router pod creation fails with "failed to add route" when macvlan is enabled
Summary: Egress router pod creation fails with "failed to add route" when macvlan is enabled
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.11.0
Assignee: Casey Callendrello
QA Contact: Meng Bo
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-08-06 06:46 UTC by Meng Bo
Modified: 2018-10-11 07:23 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-10-11 07:23:29 UTC
Target Upstream Version:
Embargoed:


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHBA-2018:2652  0        None      None    None     2018-10-11 07:23:49 UTC

Description Meng Bo 2018-08-06 06:46:14 UTC
Description of problem:
When creating an egress router pod, the pod fails to start and the following errors appear in the node log:
Aug 06 14:14:05 ocp311-node.bmeng.local atomic-openshift-node[23785]: E0806 14:14:05.272902   23785 cni.go:260] Error adding network: failed to add route to dst: 10.66.140.77/32 via SDN: file exists
Aug 06 14:14:05 ocp311-node.bmeng.local atomic-openshift-node[23785]: E0806 14:14:05.272939   23785 cni.go:228] Error while adding to cni network: failed to add route to dst: 10.66.140.77/32 via SDN: file exists
Aug 06 14:14:05 ocp311-node.bmeng.local atomic-openshift-node[23785]: E0806 14:14:05.693932   23785 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to set up sandbox container "5ac751d27f6b2376a8ff44c6c3195239af77272a348c194048a1714d06135c69" network for pod "egress-1": NetworkPlugin cni failed to set up pod "egress-1_bmengp1" network: failed to add route to dst: 10.66.140.77/32 via SDN: file exists
Aug 06 14:14:05 ocp311-node.bmeng.local atomic-openshift-node[23785]: E0806 14:14:05.694035   23785 kuberuntime_sandbox.go:56] CreatePodSandbox for pod "egress-1_bmengp1(b664fcf3-993e-11e8-836b-5254005ce8d4)" failed: rpc error: code = Unknown desc = failed to set up sandbox container "5ac751d27f6b2376a8ff44c6c3195239af77272a348c194048a1714d06135c69" network for pod "egress-1": NetworkPlugin cni failed to set up pod "egress-1_bmengp1" network: failed to add route to dst: 10.66.140.77/32 via SDN: file exists
Aug 06 14:14:05 ocp311-node.bmeng.local atomic-openshift-node[23785]: E0806 14:14:05.694061   23785 kuberuntime_manager.go:646] createPodSandbox for pod "egress-1_bmengp1(b664fcf3-993e-11e8-836b-5254005ce8d4)" failed: rpc error: code = Unknown desc = failed to set up sandbox container "5ac751d27f6b2376a8ff44c6c3195239af77272a348c194048a1714d06135c69" network for pod "egress-1": NetworkPlugin cni failed to set up pod "egress-1_bmengp1" network: failed to add route to dst: 10.66.140.77/32 via SDN: file exists
Aug 06 14:14:05 ocp311-node.bmeng.local atomic-openshift-node[23785]: E0806 14:14:05.694105   23785 pod_workers.go:186] Error syncing pod b664fcf3-993e-11e8-836b-5254005ce8d4 ("egress-1_bmengp1(b664fcf3-993e-11e8-836b-5254005ce8d4)"), skipping: failed to "CreatePodSandbox" for "egress-1_bmengp1(b664fcf3-993e-11e8-836b-5254005ce8d4)" with CreatePodSandboxError: "CreatePodSandbox for pod \"egress-1_bmengp1(b664fcf3-993e-11e8-836b-5254005ce8d4)\" failed: rpc error: code = Unknown desc = failed to set up sandbox container \"5ac751d27f6b2376a8ff44c6c3195239af77272a348c194048a1714d06135c69\" network for pod \"egress-1\": NetworkPlugin cni failed to set up pod \"egress-1_bmengp1\" network: failed to add route to dst: 10.66.140.77/32 via SDN: file exists"
Aug 06 14:14:06 ocp311-node.bmeng.local atomic-openshift-node[23785]: W0806 14:14:06.182346   23785 docker_sandbox.go:372] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "egress-1_bmengp1": CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "5ac751d27f6b2376a8ff44c6c3195239af77272a348c194048a1714d06135c69"
Aug 06 14:14:06 ocp311-node.bmeng.local atomic-openshift-node[23785]: I0806 14:14:06.192920   23785 kubelet.go:1869] SyncLoop (PLEG): "egress-1_bmengp1(b664fcf3-993e-11e8-836b-5254005ce8d4)", event: &pleg.PodLifecycleEvent{ID:"b664fcf3-993e-11e8-836b-5254005ce8d4", Type:"ContainerDied", Data:"5ac751d27f6b2376a8ff44c6c3195239af77272a348c194048a1714d06135c69"}
Aug 06 14:14:06 ocp311-node.bmeng.local atomic-openshift-node[23785]: W0806 14:14:06.193004   23785 pod_container_deletor.go:75] Container "5ac751d27f6b2376a8ff44c6c3195239af77272a348c194048a1714d06135c69" not found in pod's containers
Aug 06 14:14:06 ocp311-node.bmeng.local atomic-openshift-node[23785]: I0806 14:14:06.493566   23785 kuberuntime_manager.go:403] No ready sandbox for pod "egress-1_bmengp1(b664fcf3-993e-11e8-836b-5254005ce8d4)" can be found. Need to start a new one
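
The log lines above repeat one underlying CNI error at several layers of wrapping. As a rough illustration only, the following Python sketch collapses such lines into a count per destination and reason. The function name `summarize_route_errors` and the regular expression are assumptions made for this example and are not part of any OpenShift tooling; the sample lines are shortened copies of the log above.

```python
import re
from collections import Counter

# Matches the innermost error text seen in the node log above.
ROUTE_ERROR = re.compile(
    r"failed to add route to dst: (?P<dst>[0-9./]+) via SDN: (?P<reason>[a-z ]+)"
)


def summarize_route_errors(lines):
    """Count log lines per (destination, reason) pair.

    Only the first match in each line is counted, so a line that quotes
    the same error several times still contributes one entry.
    """
    counts = Counter()
    for line in lines:
        match = ROUTE_ERROR.search(line)
        if match:
            key = (match.group("dst"), match.group("reason").strip())
            counts[key] += 1
    return counts


sample = [
    "E0806 14:14:05.272902   23785 cni.go:260] Error adding network: "
    "failed to add route to dst: 10.66.140.77/32 via SDN: file exists",
    "E0806 14:14:05.272939   23785 cni.go:228] Error while adding to cni network: "
    "failed to add route to dst: 10.66.140.77/32 via SDN: file exists",
    'I0806 14:14:06.493566   23785 kuberuntime_manager.go:403] No ready sandbox for pod '
    '"egress-1_bmengp1(b664fcf3-993e-11e8-836b-5254005ce8d4)" can be found. '
    "Need to start a new one",
]

for (dst, reason), count in summarize_route_errors(sample).items():
    print(f"{dst}: {reason} ({count} lines)")
```

For the node log in this report, every failing line reduces to the same pair: destination 10.66.140.77/32 with reason "file exists".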



Version-Release number of selected component (if applicable):
v3.11.0-0.11.0

How reproducible:
always

Steps to Reproduce:
1. As a regular user, create a project and grant the privileged SCC to the user's service account
2. Create the egress router pod from the egressrouter.yaml shown under "Additional info" below

Actual results:
The egress router pod stays in ContainerCreating status and never starts.

Expected results:
The pod is created successfully.

Additional info:
$ cat egressrouter.yaml
apiVersion: v1
kind: Pod
metadata:
  name: egress-1
  labels:
    name: egress-1
  annotations:
    pod.network.openshift.io/assign-macvlan: "true"
spec:
  containers:
  - name: egress-router
    image: $registry/openshift3/ose-egress-router:v3.11
    securityContext:
      privileged: true
    env:
    - name: EGRESS_SOURCE
      value: 10.66.140.200
    - name: EGRESS_GATEWAY
      value: 10.66.141.254
    - name: EGRESS_DESTINATION
      value: 61.135.218.24
    - name: EGRESS_ROUTER_MODE
      value: legacy

Comment 6 Ravi Sankar 2018-08-10 00:32:35 UTC
@cdc @dcbw
The "file exists" error suggests that the route being added is already present. I think we should either ignore the error in this case or add the route only if it doesn't already exist.

Comment 7 Casey Callendrello 2018-08-10 10:18:43 UTC
Rajat found what is probably the root cause - https://github.com/openshift/origin/pull/20115

Either way, the answer is to ignore "already exists" errors - working on a patch now.
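
The "ignore already exists" approach can be illustrated with a minimal Python sketch. This is not the code from the pull request: the SDN CNI plugin is written in Go, and everything here (`add_route_idempotent`, `FakeRouteTable`, the `"sdn-gw"` gateway placeholder) is hypothetical and exists only for this example. The one grounded detail is that "file exists" is the message for the EEXIST error number, which is what the kernel reports when a route is already present.

```python
import errno


def add_route_idempotent(add_route, dst, gateway):
    """Call add_route(dst, gateway) and treat EEXIST as success.

    add_route is a hypothetical callable standing in for whatever performs
    the real route insertion. Returns True if the route was newly added and
    False if it was already present. Any other OSError is re-raised.
    """
    try:
        add_route(dst, gateway)
    except OSError as err:
        if err.errno == errno.EEXIST:
            return False
        raise
    return True


class FakeRouteTable:
    """In-memory stand-in for a routing table (illustration only)."""

    def __init__(self):
        self.routes = {}

    def add(self, dst, gateway):
        # Mimic the kernel: adding a route that exists fails with EEXIST.
        if dst in self.routes:
            raise OSError(errno.EEXIST, "File exists")
        self.routes[dst] = gateway


table = FakeRouteTable()
# "sdn-gw" is a placeholder; the report does not state the real gateway.
first = add_route_idempotent(table.add, "10.66.140.77/32", "sdn-gw")
second = add_route_idempotent(table.add, "10.66.140.77/32", "sdn-gw")
print(first, second)
```

Catching the error is generally preferable to checking for the route first and then adding it, because a check followed by an add can still race with another process that inserts the same route in between.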

Comment 8 Casey Callendrello 2018-08-10 13:17:41 UTC
Fix is in https://github.com/openshift/origin/pull/20601

Bo, can you test this without that PR merging? I wasn't able to reproduce the fix locally.

Comment 9 Meng Bo 2018-08-13 03:30:16 UTC
(In reply to Casey Callendrello from comment #8)
> Fix is in https://github.com/openshift/origin/pull/20601
> 
> Bo, can you test this without that PR merging? I wasn't able to reproduce
> the fix locally.

I tried this on the latest OCP 3.11 build: I rebuilt the sdn-cni-plugin with the fix, renamed it to openshift-sdn, and replaced the copy under /opt/cni/bin/. With that change, the egress router pod can be created.

Comment 10 Casey Callendrello 2018-08-15 12:41:41 UTC
Fix is merged.

Comment 11 Meng Bo 2018-08-20 06:00:26 UTC
Tested on build v3.11.0-0.17.0; the issue has been fixed.

Comment 13 errata-xmlrpc 2018-10-11 07:23:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652

