Description of problem:
Followed https://docs.openshift.com/container-platform/3.11/admin_guide/managing_networking.html#admin-guide-deploying-an-egress-router-pod to deploy an egress router pod on v4.6/SDN, but the egress router pod is stuck in the Init:0/1 state.

Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-06-30-112422

How reproducible:
Always

Steps to Reproduce:
[weliang@weliang FILE]$ oc get project | grep sdn
openshift-sdn
[weliang@weliang FILE]$ cat egressrouterpod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: egress-redirect-pod
  labels:
    name: egress-redirect-pod
  annotations:
    pod.network.openshift.io/assign-macvlan: "true"
spec:
  initContainers:
  - name: egress-router
    image: registry.redhat.io/openshift4/ose-egress-router
    imagePullPolicy: IfNotPresent
    securityContext:
      privileged: true
    env:
    - name: EGRESS_SOURCE
      value: 139.178.76.12
    - name: EGRESS_GATEWAY
      value: 139.178.76.1
    - name: EGRESS_DESTINATION
      value: 172.217.7.206
    - name: EGRESS_ROUTER_MODE
      value: init
  containers:
  - name: egressrouter-redirect
    image: registry.redhat.io/openshift4/ose-egress-router
    imagePullPolicy: IfNotPresent
[weliang@weliang FILE]$ oc create -f egressrouterpod.yaml
[weliang@weliang FILE]$ oc get pod
NAME                  READY   STATUS     RESTARTS   AGE
egress-redirect-pod   0/1     Init:0/1   0          3s
[weliang@weliang FILE]$ oc describe pod egress-redirect-pod
Name:         egress-redirect-pod
Namespace:    test
Priority:     0
Node:         ip-10-0-187-180.us-east-2.compute.internal/10.0.187.180
Start Time:   Tue, 30 Jun 2020 13:41:43 -0400
Labels:       name=egress-redirect-pod
Annotations:  openshift.io/scc: node-exporter
              pod.network.openshift.io/assign-macvlan: true
Status:       Pending
IP:
IPs:          <none>
Init Containers:
  egress-router:
    Container ID:
    Image:          registry.redhat.io/openshift4/ose-egress-router
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Environment:
      EGRESS_SOURCE:       139.178.76.12
      EGRESS_GATEWAY:      139.178.76.1
      EGRESS_DESTINATION:  172.217.7.206
      EGRESS_ROUTER_MODE:  init
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-m5dmv (ro)
Containers:
  egressrouter-redirect:
    Container ID:
    Image:          registry.redhat.io/openshift4/ose-egress-router
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-m5dmv (ro)
Conditions:
  Type              Status
  Initialized       False
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-m5dmv:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-m5dmv
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age   From                                                 Message
  ----     ------                  ----  ----                                                 -------
  Normal   Scheduled               11s   default-scheduler                                    Successfully assigned test/egress-redirect-pod to ip-10-0-187-180.us-east-2.compute.internal
  Warning  FailedCreatePodSandBox  10s   kubelet, ip-10-0-187-180.us-east-2.compute.internal  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_egress-redirect-pod_test_d22d4544-18ec-43bf-95c4-70f21e5246fe_0(4d0cbd2631875f117df10c0a8501b385919b1162f40382e75b8c87b8bb3cf7ca): Multus: [test/egress-redirect-pod]: error adding container to network "openshift-sdn": delegateAdd: error invoking confAdd - "openshift-sdn": error in getting result from AddNetwork: CNI request failed with status 400: 'could not open netns "/var/run/netns/11cb808b-c141-40e9-86fe-c74643b481aa": unknown FS magic on "/var/run/netns/11cb808b-c141-40e9-86fe-c74643b481aa": 1021994 '
[weliang@weliang ~]$ oc logs -c egress-router egress-redirect-pod
Error from server (BadRequest): container "egress-router" in pod "egress-redirect-pod" is waiting to start: PodInitializing
[weliang@weliang ~]$

Actual results:
egress-redirect-pod   0/1     Init:0/1

Expected results:
egress-redirect-pod   1/1     Running

Additional info:
Deploying a pod that is not an egress router works fine:
[weliang@weliang tmp]$ oc create -f https://raw.githubusercontent.com/weliang1/Openshift_Networking/master/Features/Pod/blue-pod-5.yaml
pod/blue-pod-5 created
[weliang@weliang tmp]$ oc get pod
NAME                  READY   STATUS     RESTARTS   AGE
blue-pod-5            1/1     Running    0          8s
egress-redirect-pod   0/1     Init:0/1   0          99m
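
The number at the end of the FailedCreatePodSandBox event above can be decoded to see what kind of file the CNI plugin found at the netns path. This is a minimal sketch, assuming the value is printed in hexadecimal (an assumption about how the error is formatted; the report does not say). Under that assumption, 1021994 matches the kernel's TMPFS_MAGIC, which would mean the process saw a plain tmpfs file rather than a bind-mounted network namespace, consistent with the later comment about mounting /run/netns HostToContainer. The names KNOWN_FS_MAGIC and describe_fs_magic are hypothetical.

```python
# Filesystem magic numbers as defined in the Linux kernel header
# include/uapi/linux/magic.h.
KNOWN_FS_MAGIC = {
    0x6E736673: "nsfs",   # NSFS_MAGIC: a namespace file such as a bind-mounted netns
    0x9FA0: "proc",       # PROC_SUPER_MAGIC: e.g. /proc/<pid>/ns/net
    0x01021994: "tmpfs",  # TMPFS_MAGIC: an ordinary file on tmpfs
}


def describe_fs_magic(reported: str) -> str:
    """Map the magic value from the error message to a filesystem name.

    Assumption: the value in the message is hexadecimal without a 0x prefix.
    """
    value = int(reported, 16)
    return KNOWN_FS_MAGIC.get(value, f"unknown (0x{value:x})")


if __name__ == "__main__":
    # Value copied from the event message in this report.
    print(describe_fs_magic("1021994"))
```

If the decoded type were nsfs or proc, the path would be a usable namespace reference and this particular failure would point elsewhere.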
Tried the above reproducer on ovn-kubernetes; there the init container goes into CrashLoopBackOff. I can't find a reason in the logs.
Bug 1852593: Add netns mount #696
https://github.com/openshift/cluster-network-operator/pull/696
The fix needs to mount /run/netns with HostToContainer mount propagation.
Following the same steps as above, I still get the "unknown FS magic" error message:

[weliang@weliang FILE]$ oc describe pod egress-redirect-pod
  Warning  FailedCreatePodSandBox  8s  kubelet, ip-10-0-196-223.us-east-2.compute.internal  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_egress-redirect-pod_test_fc087f0a-cfc1-4d27-9630-ed635db013a4_0(b04276353d55ba1c51b26e9d7521204fff09c7d8cd7f86017b398900289666dc): Multus: [test/egress-redirect-pod]: error adding container to network "openshift-sdn": delegateAdd: error invoking confAdd - "openshift-sdn": error in getting result from AddNetwork: CNI request failed with status 400: 'could not open netns "/var/run/netns/1e4fbf1a-dd6e-443d-823a-191ea2007a6f": unknown FS magic on "/var/run/netns/1e4fbf1a-dd6e-443d-823a-191ea2007a6f": 1021994 '
[weliang@weliang FILE]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-07-14-092216   True        False         72m     Cluster version is 4.6.0-0.nightly-2020-07-14-092216
[weliang@weliang FILE]$
Tested and verified in 4.6.0-0.nightly-2020-09-18-071428.
oc describe pod egress-redirect-pod no longer shows the error log mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1852593#c5
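
A verification like this one can be scripted against captured output. This is a minimal sketch, assuming the text of oc describe pod has already been saved to a string; the pattern is derived only from the error messages quoted in this report, and the names NETNS_ERROR and find_netns_errors are hypothetical.

```python
import re
from typing import List, Tuple

# Matches the CNI failure quoted in this report and captures the
# netns path and the reported magic value.
NETNS_ERROR = re.compile(
    r'unknown FS magic on "(?P<path>[^"]+)": (?P<magic>[0-9a-fA-F]+)'
)


def find_netns_errors(describe_output: str) -> List[Tuple[str, str]]:
    """Return (netns path, magic value) for each matching error in the text."""
    return [
        (m.group("path"), m.group("magic"))
        for m in NETNS_ERROR.finditer(describe_output)
    ]


if __name__ == "__main__":
    # Fragment of the event message from the earlier failing run.
    sample = (
        "CNI request failed with status 400: 'could not open netns "
        '"/var/run/netns/1e4fbf1a-dd6e-443d-823a-191ea2007a6f": '
        'unknown FS magic on "/var/run/netns/1e4fbf1a-dd6e-443d-823a-191ea2007a6f": 1021994 \''
    )
    print(find_netns_errors(sample))
```

An empty list means the text contains no error of this form, which is the expected result on a fixed build.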
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196