Description of problem:
Following https://docs.openshift.com/container-platform/3.11/admin_guide/managing_networking.html#admin-guide-deploying-an-egress-dns-proxy-pod to deploy an egress router dns-proxy pod on v4.6, the egress router pod gets stuck in CrashLoopBackOff, and "oc logs" on the pod shows:

[ALERT] 286/194522 (1) : [haproxy.main()] Cannot chroot(/var/lib/haproxy).

Version-Release number of selected component (if applicable):
4.6.0-rc.3

How reproducible:
Always

Steps to Reproduce:
1. oc create -f https://raw.githubusercontent.com/weliang1/Openshift_Networking/master/Features/EgressRouter/egress-router-dns-pod.yaml

[root@hp-dl388g9-03 ~]# oc get pod
NAME                    READY   STATUS             RESTARTS   AGE
egress-router-dns-pod   0/1     CrashLoopBackOff   8          19m

[root@hp-dl388g9-03 ~]# oc logs egress-router-dns-pod
Running haproxy with config:
global
    log         127.0.0.1 local2
    chroot      /var/lib/haproxy
    pidfile     /var/lib/haproxy/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy

defaults
    log                     global
    mode                    tcp
    option                  dontlognull
    option                  tcplog
    option                  redispatch
    retries                 3
    timeout http-request    100s
    timeout queue           1m
    timeout connect         10s
    timeout client          1m
    timeout server          1m
    timeout http-keep-alive 100s
    timeout check           10s

resolvers dns-resolver
    nameserver ns1 172.30.0.10:53
    resolve_retries 3
    timeout retry 1s
    hold valid 10s

frontend fe1
    bind :80
    default_backend be1
backend be1
    server-template dest 6 www.google.com:80 check resolvers dns-resolver resolve-prefer ipv4

frontend fe2
    bind :8000
    default_backend be2
backend be2
    server dest1 209.82.215.211:80 check

frontend fe3
    bind :8001
    default_backend be3
backend be3
    server-template dest 5 www.yahoo.com:80 check resolvers dns-resolver resolve-prefer ipv4

[ALERT] 286/200126 (1) : [haproxy.main()] Cannot chroot(/var/lib/haproxy).

[root@hp-dl388g9-03 ~]# oc describe pod egress-router-dns-pod
Name:         egress-router-dns-pod
Namespace:    test3
Priority:     0
Node:         dell-per740-14.rhts.eng.pek2.redhat.com/10.73.116.62
Start Time:   Tue, 13 Oct 2020 15:45:00 -0400
Labels:       name=egress-router-dns-pod
Annotations:  k8s.v1.cni.cncf.io/network-status:
                [{
                    "name": "",
                    "interface": "eth0",
                    "ips": [
                        "10.131.0.50"
                    ],
                    "default": true,
                    "dns": {}
                }]
              k8s.v1.cni.cncf.io/networks-status:
                [{
                    "name": "",
                    "interface": "eth0",
                    "ips": [
                        "10.131.0.50"
                    ],
                    "default": true,
                    "dns": {}
                }]
              openshift.io/scc: node-exporter
              pod.network.openshift.io/assign-macvlan: true
Status:       Running
IP:           10.131.0.50
IPs:
  IP:  10.131.0.50
Init Containers:
  egress-router-setup:
    Container ID:   cri-o://aae35c2b3b87b32146a388a3b8896ba829824da57624ca6a0d97025dad77d5c0
    Image:          registry.redhat.io/openshift4/ose-egress-router
    Image ID:       registry.redhat.io/openshift4/ose-egress-router@sha256:50c573e5d4d5256d03cb4f6559c1a1e7f621d92c3223f11ec05d07d0510fa938
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 13 Oct 2020 15:45:03 -0400
      Finished:     Tue, 13 Oct 2020 15:45:03 -0400
    Ready:          True
    Restart Count:  0
    Environment:
      EGRESS_SOURCE:       10.73.116.69
      EGRESS_GATEWAY:      10.73.117.254
      EGRESS_ROUTER_MODE:  dns-proxy
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-f2h6x (ro)
Containers:
  egress-router-dns-proxy:
    Container ID:   cri-o://df817637c5c849d28eb02b595cdff299baf085fe40fb17dba816cc9a9a2e77e7
    Image:          registry.redhat.io/openshift4/ose-egress-dns-proxy
    Image ID:       registry.redhat.io/openshift4/ose-egress-dns-proxy@sha256:4eaa3f3ad651de08d0322d445992e8549cadfb8fe823bc94a5b3f820a7bb68fc
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 13 Oct 2020 16:01:24 -0400
      Finished:     Tue, 13 Oct 2020 16:01:26 -0400
    Ready:          False
    Restart Count:  8
    Environment:
      EGRESS_DNS_PROXY_DEBUG:        1
      EGRESS_DNS_PROXY_DESTINATION:  80 www.google.com 8000 209.82.215.211 80 8001 www.yahoo.com 80
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-f2h6x (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-f2h6x:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-f2h6x
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  app=egressrouter-dns
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason          Age                   From                                              Message
  ----     ------          ----                  ----                                              -------
  Normal   Scheduled       20m                                                                     Successfully assigned test3/egress-router-dns-pod to dell-per740-14.rhts.eng.pek2.redhat.com
  Normal   AddedInterface  19m                   multus                                            Add eth0 [10.131.0.50/23]
  Normal   Pulled          19m                   kubelet, dell-per740-14.rhts.eng.pek2.redhat.com  Container image "registry.redhat.io/openshift4/ose-egress-router" already present on machine
  Normal   Created         19m                   kubelet, dell-per740-14.rhts.eng.pek2.redhat.com  Created container egress-router-setup
  Normal   Started         19m                   kubelet, dell-per740-14.rhts.eng.pek2.redhat.com  Started container egress-router-setup
  Normal   Pulled          17m (x5 over 19m)     kubelet, dell-per740-14.rhts.eng.pek2.redhat.com  Container image "registry.redhat.io/openshift4/ose-egress-dns-proxy" already present on machine
  Normal   Created         17m (x5 over 19m)     kubelet, dell-per740-14.rhts.eng.pek2.redhat.com  Created container egress-router-dns-proxy
  Normal   Started         17m (x5 over 19m)     kubelet, dell-per740-14.rhts.eng.pek2.redhat.com  Started container egress-router-dns-proxy
  Warning  BackOff         4m33s (x69 over 19m)  kubelet, dell-per740-14.rhts.eng.pek2.redhat.com  Back-off restarting failed container
[root@hp-dl388g9-03 ~]#

Actual results:
egress-router-dns-pod   0/1   CrashLoopBackOff

Expected results:
egress-router-dns-pod   1/1   Running

Additional info:
In v3.11 testing, we do not see the error "[haproxy.main()] Cannot chroot(/var/lib/haproxy)" in the pod log.
Must-gather logs: http://file.rdu.redhat.com/~weliang/must-gather.zip
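A note for anyone triaging a similar failure: chroot(2) requires CAP_SYS_CHROOT, so the haproxy alert above is effectively a privileges problem, and the describe output shows the pod was admitted under the node-exporter SCC (openshift.io/scc: node-exporter) rather than privileged. A quick, minimal way to check which SCC admitted a pod is to read that annotation; this sketch assumes the pod name and namespace from this report (the backslash escapes the dot inside the annotation key, which is standard oc/kubectl jsonpath syntax):

  # Which SCC admitted the pod? A privileged egress router pod should show "privileged".
  oc -n test3 get pod egress-router-dns-pod \
    -o jsonpath='{.metadata.annotations.openshift\.io/scc}'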
If I copy https://raw.githubusercontent.com/weliang1/Openshift_Networking/master/Features/EgressRouter/egress-router-dns-pod.yaml, modify the copy to specify securityContext.privileged: true in the "egress-router-dns-proxy" container spec, and create the pod, then the pod starts and runs. For completeness, the following is the pod definition that I used:

apiVersion: v1
kind: Pod
metadata:
  name: egress-router-dns-pod
  labels:
    name: egress-router-dns-pod
  annotations:
    pod.network.openshift.io/assign-macvlan: "true"
spec:
  initContainers:
  - name: egress-router-setup
    image: registry.redhat.io/openshift4/ose-egress-router
    imagePullPolicy: IfNotPresent
    securityContext:
      privileged: true
    env:
    - name: EGRESS_SOURCE
      value: 10.73.116.69
    - name: EGRESS_GATEWAY
      value: 10.73.117.254
    - name: EGRESS_ROUTER_MODE
      value: dns-proxy
  containers:
  - name: egress-router-dns-proxy
    image: registry.redhat.io/openshift4/ose-egress-dns-proxy
    imagePullPolicy: IfNotPresent
    securityContext:
      privileged: true
    env:
    - name: EGRESS_DNS_PROXY_DEBUG
      value: "1"
    - name: EGRESS_DNS_PROXY_DESTINATION
      value: |
        80 www.google.com
        8000 209.82.215.211 80
        8001 www.yahoo.com 80
  nodeSelector:
    app: egressrouter-dns

I also had to label an arbitrary node to match the pod's node selector so that the pod could be scheduled:

% oc label $(oc get nodes -o name | grep -e worker | tail -n 1) app=egressrouter-dns

The reported failure seems to be caused by a discrepancy in security constraints between OCP 3 and OCP 4. However, we do not yet have documentation for using ose-egress-dns-proxy in OCP 4, and since you say the instructions for OCP 3 do work on 3.11, we don't really have any documentation related to this issue that needs to be fixed at this time. When we add documentation for egress routers in OCP 4, we will need to make sure to take this discrepancy into account. (I couldn't find a BZ specifically for adding documentation for egress routers, but maybe bug 1870710 encompasses that topic.)

Weibin, would you mind verifying that the modified pod definition above works properly on OCP 4? If it does, I believe we can classify this BZ as a documentation issue.
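One caveat about the pod definition above: privileged: true in a container spec is only honored if the creating user or the pod's service account is allowed to use an SCC that permits privileged containers. That was the case here, but for a pod created by a regular user, the namespace's service account would typically need to be granted the SCC first. A minimal, untested sketch, assuming the pod runs under the default service account in the test3 namespace:

  # Hypothetical companion step: allow the default service account to use the
  # privileged SCC so that the pod's privileged: true request is admitted.
  oc adm policy add-scc-to-user privileged -z default -n test3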
Hi Miciah,

I specified securityContext.privileged: true in the "egress-router-dns-proxy" container spec:

[weliang@weliang verification-tests]$ oc label node ip-10-0-128-133.us-east-2.compute.internal app=egressrouter-dns
node/ip-10-0-128-133.us-east-2.compute.internal labeled
[weliang@weliang verification-tests]$ oc create -f https://raw.githubusercontent.com/weliang1/Openshift_Networking/master/Features/EgressRouter/egress-router-dns-pod.yaml
pod/egress-router-dns-pod created
[weliang@weliang verification-tests]$ oc get pods
NAME                    READY   STATUS                  RESTARTS   AGE
egress-router-dns-pod   0/1     Init:CrashLoopBackOff   2          46s
[weliang@weliang verification-tests]$ oc logs egress-router-dns-pod
Error from server (BadRequest): container "egress-router-dns-proxy" in pod "egress-router-dns-pod" is waiting to start: PodInitializing
[weliang@weliang verification-tests]$ oc describe pod egress-router-dns-pod
Name:         egress-router-dns-pod
Namespace:    egressrouter
Priority:     0
Node:         ip-10-0-128-133.us-east-2.compute.internal/10.0.128.133
Start Time:   Fri, 23 Oct 2020 14:53:25 -0400
Labels:       name=egress-router-dns-pod
Annotations:  k8s.ovn.org/pod-networks:
                {"default":{"ip_addresses":["10.129.2.16/23"],"mac_address":"0a:58:0a:81:02:10","gateway_ips":["10.129.2.1"],"ip_address":"10.129.2.16/23"...
              k8s.v1.cni.cncf.io/network-status:
                [{
                    "name": "",
                    "interface": "eth0",
                    "ips": [
                        "10.129.2.16"
                    ],
                    "mac": "0a:58:0a:81:02:10",
                    "default": true,
                    "dns": {}
                }]
              k8s.v1.cni.cncf.io/networks-status:
                [{
                    "name": "",
                    "interface": "eth0",
                    "ips": [
                        "10.129.2.16"
                    ],
                    "mac": "0a:58:0a:81:02:10",
                    "default": true,
                    "dns": {}
                }]
              openshift.io/scc: privileged
              pod.network.openshift.io/assign-macvlan: true
Status:       Pending
IP:           10.129.2.16
IPs:
  IP:  10.129.2.16
Init Containers:
  egress-router-setup:
    Container ID:   cri-o://dcffe5bcaeb9b314b6c725ccf13460ec5b173c076ba47d5ceea44279931694aa
    Image:          registry.redhat.io/openshift4/ose-egress-router
    Image ID:       registry.redhat.io/openshift4/ose-egress-router@sha256:61770462cfc33f732b8e30eb5b3805123a18dba02375ca7d511981e3855d1a45
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 23 Oct 2020 14:54:19 -0400
      Finished:     Fri, 23 Oct 2020 14:54:19 -0400
    Ready:          False
    Restart Count:  3
    Environment:
      EGRESS_SOURCE:       10.73.116.69
      EGRESS_GATEWAY:      10.73.117.254
      EGRESS_ROUTER_MODE:  dns-proxy
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-hhxbf (ro)
Containers:
  egress-router-dns-proxy:
    Container ID:
    Image:          registry.redhat.io/openshift4/ose-egress-dns-proxy
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Environment:
      EGRESS_DNS_PROXY_DEBUG:        1
      EGRESS_DNS_PROXY_DESTINATION:  80 www.google.com 8000 209.82.215.211 80 8001 www.yahoo.com 80
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-hhxbf (ro)
Conditions:
  Type              Status
  Initialized       False
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-hhxbf:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-hhxbf
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  app=egressrouter-dns
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason          Age                From                                                 Message
  ----     ------          ----               ----                                                 -------
  Normal   Scheduled       <unknown>                                                               Successfully assigned egressrouter/egress-router-dns-pod to ip-10-0-128-133.us-east-2.compute.internal
  Normal   AddedInterface  71s                multus                                               Add eth0 [10.129.2.16/23]
  Normal   Pulling         71s                kubelet, ip-10-0-128-133.us-east-2.compute.internal  Pulling image "registry.redhat.io/openshift4/ose-egress-router"
  Normal   Pulled          60s                kubelet, ip-10-0-128-133.us-east-2.compute.internal  Successfully pulled image "registry.redhat.io/openshift4/ose-egress-router" in 10.799083895s
  Normal   Created         19s (x4 over 60s)  kubelet, ip-10-0-128-133.us-east-2.compute.internal  Created container egress-router-setup
  Normal   Started         19s (x4 over 60s)  kubelet, ip-10-0-128-133.us-east-2.compute.internal  Started container egress-router-setup
  Normal   Pulled          19s (x3 over 59s)  kubelet, ip-10-0-128-133.us-east-2.compute.internal  Container image "registry.redhat.io/openshift4/ose-egress-router" already present on machine
  Warning  BackOff         5s (x6 over 58s)   kubelet, ip-10-0-128-133.us-east-2.compute.internal  Back-off restarting failed container
[weliang@weliang verification-tests]$

The kubeconfig for this cluster: https://mastern-jenkins-csb-openshift-qe.cloud.paas.psi.redhat.com/job/Launch%20Environment%20Flexy/119944/artifact/workdir/install-dir/auth/kubeconfig/*view*/
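One note on the retest output above: this time it is the init container egress-router-setup that is crash-looping (the main container never started), which is why plain "oc logs" only reports PodInitializing. The init container's own logs can be fetched with the -c flag; a minimal sketch using the pod name and namespace from the output above:

  # Fetch logs from the failing init container rather than the main container.
  oc -n egressrouter logs egress-router-dns-pod -c egress-router-setup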
Hi Miciah,

I re-tested, and the egress pod is running now. I will close this bug and either open a new doc bug or add our findings to bug 1870710.

Thanks!
Weibin