Bug 1888024

Summary: [SDN] egress router dns-proxy: [haproxy.main()] Cannot chroot(/var/lib/haproxy).
Product: OpenShift Container Platform
Component: Networking (sub component: router)
Reporter: Weibin Liang <weliang>
Assignee: Miciah Dashiel Butler Masters <mmasters>
QA Contact: Hongan Li <hongli>
Status: CLOSED NOTABUG
Severity: medium
Priority: medium
CC: amcdermo, aos-bugs, bperkins, dmellado, mfisher
Version: 4.6
Target Release: 4.7.0
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2020-10-24 15:32:30 UTC
Bug Blocks: 1888039

Description Weibin Liang 2020-10-13 20:06:36 UTC
Description of problem:
Followed https://docs.openshift.com/container-platform/3.11/admin_guide/managing_networking.html#admin-guide-deploying-an-egress-dns-proxy-pod to deploy an egress router dns-proxy pod on v4.6, but the egress router pod is stuck in CrashLoopBackOff state, and `oc logs egress-router-dns-pod` shows: [ALERT] 286/194522 (1) : [haproxy.main()] Cannot chroot(/var/lib/haproxy).


Version-Release number of selected component (if applicable):
4.6.0-rc.3

How reproducible:
Always

Steps to Reproduce:
1. oc create -f https://raw.githubusercontent.com/weliang1/Openshift_Networking/master/Features/EgressRouter/egress-router-dns-pod.yaml
[root@hp-dl388g9-03 ~]# oc get pod
NAME                    READY   STATUS             RESTARTS   AGE
egress-router-dns-pod   0/1     CrashLoopBackOff   8          19m
[root@hp-dl388g9-03 ~]# oc logs egress-router-dns-pod
Running haproxy with config:
  
  global
      log         127.0.0.1 local2
  
      chroot      /var/lib/haproxy
      pidfile     /var/lib/haproxy/run/haproxy.pid
      maxconn     4000
      user        haproxy
      group       haproxy
  
  defaults
      log                     global
      mode                    tcp
      option                  dontlognull
      option                  tcplog
      option                  redispatch
      retries                 3
      timeout http-request    100s
      timeout queue           1m
      timeout connect         10s
      timeout client          1m
      timeout server          1m
      timeout http-keep-alive 100s
      timeout check           10s
  
  resolvers dns-resolver
      nameserver ns1 172.30.0.10:53
      resolve_retries      3
      timeout retry        1s
      hold valid           10s
  
  
  frontend fe1
      bind :80
      default_backend be1
  
  backend be1
      server-template dest 6 www.google.com:80 check resolvers dns-resolver resolve-prefer ipv4
  
  
  frontend fe2
      bind :8000
      default_backend be2
  
  backend be2
      server dest1 209.82.215.211:80 check 
  
  
  frontend fe3
      bind :8001
      default_backend be3
  
  backend be3
      server-template dest 5 www.yahoo.com:80 check resolvers dns-resolver resolve-prefer ipv4
  


[ALERT] 286/200126 (1) : [haproxy.main()] Cannot chroot(/var/lib/haproxy).
[root@hp-dl388g9-03 ~]# oc describe pod egress-router-dns-pod
Name:         egress-router-dns-pod
Namespace:    test3
Priority:     0
Node:         dell-per740-14.rhts.eng.pek2.redhat.com/10.73.116.62
Start Time:   Tue, 13 Oct 2020 15:45:00 -0400
Labels:       name=egress-router-dns-pod
Annotations:  k8s.v1.cni.cncf.io/network-status:
                [{
                    "name": "",
                    "interface": "eth0",
                    "ips": [
                        "10.131.0.50"
                    ],
                    "default": true,
                    "dns": {}
                }]
              k8s.v1.cni.cncf.io/networks-status:
                [{
                    "name": "",
                    "interface": "eth0",
                    "ips": [
                        "10.131.0.50"
                    ],
                    "default": true,
                    "dns": {}
                }]
              openshift.io/scc: node-exporter
              pod.network.openshift.io/assign-macvlan: true
Status:       Running
IP:           10.131.0.50
IPs:
  IP:  10.131.0.50
Init Containers:
  egress-router-setup:
    Container ID:   cri-o://aae35c2b3b87b32146a388a3b8896ba829824da57624ca6a0d97025dad77d5c0
    Image:          registry.redhat.io/openshift4/ose-egress-router
    Image ID:       registry.redhat.io/openshift4/ose-egress-router@sha256:50c573e5d4d5256d03cb4f6559c1a1e7f621d92c3223f11ec05d07d0510fa938
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 13 Oct 2020 15:45:03 -0400
      Finished:     Tue, 13 Oct 2020 15:45:03 -0400
    Ready:          True
    Restart Count:  0
    Environment:
      EGRESS_SOURCE:       10.73.116.69
      EGRESS_GATEWAY:      10.73.117.254
      EGRESS_ROUTER_MODE:  dns-proxy
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-f2h6x (ro)
Containers:
  egress-router-dns-proxy:
    Container ID:   cri-o://df817637c5c849d28eb02b595cdff299baf085fe40fb17dba816cc9a9a2e77e7
    Image:          registry.redhat.io/openshift4/ose-egress-dns-proxy
    Image ID:       registry.redhat.io/openshift4/ose-egress-dns-proxy@sha256:4eaa3f3ad651de08d0322d445992e8549cadfb8fe823bc94a5b3f820a7bb68fc
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 13 Oct 2020 16:01:24 -0400
      Finished:     Tue, 13 Oct 2020 16:01:26 -0400
    Ready:          False
    Restart Count:  8
    Environment:
      EGRESS_DNS_PROXY_DEBUG:        1
      EGRESS_DNS_PROXY_DESTINATION:  80     www.google.com
                                     8000   209.82.215.211 80
                                     8001   www.yahoo.com  80
                                     
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-f2h6x (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  default-token-f2h6x:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-f2h6x
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  app=egressrouter-dns
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason          Age                   From                                              Message
  ----     ------          ----                  ----                                              -------
  Normal   Scheduled       20m                                                                     Successfully assigned test3/egress-router-dns-pod to dell-per740-14.rhts.eng.pek2.redhat.com
  Normal   AddedInterface  19m                   multus                                            Add eth0 [10.131.0.50/23]
  Normal   Pulled          19m                   kubelet, dell-per740-14.rhts.eng.pek2.redhat.com  Container image "registry.redhat.io/openshift4/ose-egress-router" already present on machine
  Normal   Created         19m                   kubelet, dell-per740-14.rhts.eng.pek2.redhat.com  Created container egress-router-setup
  Normal   Started         19m                   kubelet, dell-per740-14.rhts.eng.pek2.redhat.com  Started container egress-router-setup
  Normal   Pulled          17m (x5 over 19m)     kubelet, dell-per740-14.rhts.eng.pek2.redhat.com  Container image "registry.redhat.io/openshift4/ose-egress-dns-proxy" already present on machine
  Normal   Created         17m (x5 over 19m)     kubelet, dell-per740-14.rhts.eng.pek2.redhat.com  Created container egress-router-dns-proxy
  Normal   Started         17m (x5 over 19m)     kubelet, dell-per740-14.rhts.eng.pek2.redhat.com  Started container egress-router-dns-proxy
  Warning  BackOff         4m33s (x69 over 19m)  kubelet, dell-per740-14.rhts.eng.pek2.redhat.com  Back-off restarting failed container
[root@hp-dl388g9-03 ~]# 

Actual results:
egress-router-dns-pod   0/1     CrashLoopBackOff

Expected results:
egress-router-dns-pod   1/1     Running

Additional info:
In v3.11 testing, we do not see the error "[haproxy.main()] Cannot chroot(/var/lib/haproxy)" in the pod log.

Comment 1 Weibin Liang 2020-10-13 20:19:29 UTC
Must-gather logs: http://file.rdu.redhat.com/~weliang/must-gather.zip

Comment 4 Miciah Dashiel Butler Masters 2020-10-22 18:48:25 UTC
If I copy https://raw.githubusercontent.com/weliang1/Openshift_Networking/master/Features/EgressRouter/egress-router-dns-pod.yaml, modify the copy to specify securityContext.privileged: true in the "egress-router-dns-proxy" container spec, and create the pod, then the pod starts and runs.  For completeness, the following is the pod definition that I used:

    apiVersion: v1
    kind: Pod
    metadata:
      name: egress-router-dns-pod
      labels:
        name: egress-router-dns-pod
      annotations:
        pod.network.openshift.io/assign-macvlan: "true"
    spec:
      initContainers:
      - name: egress-router-setup
        image: registry.redhat.io/openshift4/ose-egress-router
        imagePullPolicy:  IfNotPresent
        securityContext:
          privileged: true
        env:
        - name: EGRESS_SOURCE
          value: 10.73.116.69
        - name: EGRESS_GATEWAY
          value: 10.73.117.254
        - name: EGRESS_ROUTER_MODE
          value: dns-proxy
      containers:
      - name: egress-router-dns-proxy
        image: registry.redhat.io/openshift4/ose-egress-dns-proxy
        imagePullPolicy:  IfNotPresent
        securityContext:
          privileged: true
        env:
        - name: EGRESS_DNS_PROXY_DEBUG
          value: "1"
        - name: EGRESS_DNS_PROXY_DESTINATION
          value: |
            80     www.google.com
            8000   209.82.215.211 80
            8001   www.yahoo.com  80
      nodeSelector:
        app: egressrouter-dns

I also had to label an arbitrary node to match the pod's node selector so that the pod could be scheduled:

    % oc label $(oc get nodes -o name | grep -e worker | tail -n 1) app=egressrouter-dns

The reported failure seems to be caused by a discrepancy in security constraints between OCP 3 and OCP 4.  However, we do not yet have documentation for using ose-egress-dns-proxy in OCP 4, and since you say the instructions for OCP 3 do work on 3.11, we don't really have any documentation related to this issue that needs to be fixed at this time.  When we add documentation for egress routers in OCP 4, we will need to make sure to take this discrepancy into account.  (I couldn't find a BZ specifically for adding documentation for egress routers, but maybe bug 1870710 encompasses that topic.)  

Weibin, would you mind verifying that the modified pod definition above works properly on OCP 4?  If it does, I believe we can classify this BZ as a documentation issue.
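(An aside for anyone reproducing this: for privileged: true to be admitted in OCP 4, the pod's service account generally also needs access to the privileged SCC. Assuming the pod runs under the namespace's default service account, a grant along these lines should work; the namespace name is a placeholder:

    % oc adm policy add-scc-to-user privileged -z default -n <namespace>

Without such a grant, admission rejects the privileged container rather than starting it.)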

Comment 5 Weibin Liang 2020-10-23 19:33:49 UTC
Hi Miciah,

I specified securityContext.privileged: true in the "egress-router-dns-proxy" container spec, but the pod still fails to start:

[weliang@weliang verification-tests]$ oc label node ip-10-0-128-133.us-east-2.compute.internal app=egressrouter-dns
node/ip-10-0-128-133.us-east-2.compute.internal labeled
[weliang@weliang verification-tests]$ oc create -f https://raw.githubusercontent.com/weliang1/Openshift_Networking/master/Features/EgressRouter/egress-router-dns-pod.yaml
pod/egress-router-dns-pod created
[weliang@weliang verification-tests]$ oc get pods
NAME                    READY   STATUS                  RESTARTS   AGE
egress-router-dns-pod   0/1     Init:CrashLoopBackOff   2          46s
[weliang@weliang verification-tests]$ oc logs egress-router-dns-pod
Error from server (BadRequest): container "egress-router-dns-proxy" in pod "egress-router-dns-pod" is waiting to start: PodInitializing
[weliang@weliang verification-tests]$ oc describe pod egress-router-dns-pod
Name:         egress-router-dns-pod
Namespace:    egressrouter
Priority:     0
Node:         ip-10-0-128-133.us-east-2.compute.internal/10.0.128.133
Start Time:   Fri, 23 Oct 2020 14:53:25 -0400
Labels:       name=egress-router-dns-pod
Annotations:  k8s.ovn.org/pod-networks:
                {"default":{"ip_addresses":["10.129.2.16/23"],"mac_address":"0a:58:0a:81:02:10","gateway_ips":["10.129.2.1"],"ip_address":"10.129.2.16/23"...
              k8s.v1.cni.cncf.io/network-status:
                [{
                    "name": "",
                    "interface": "eth0",
                    "ips": [
                        "10.129.2.16"
                    ],
                    "mac": "0a:58:0a:81:02:10",
                    "default": true,
                    "dns": {}
                }]
              k8s.v1.cni.cncf.io/networks-status:
                [{
                    "name": "",
                    "interface": "eth0",
                    "ips": [
                        "10.129.2.16"
                    ],
                    "mac": "0a:58:0a:81:02:10",
                    "default": true,
                    "dns": {}
                }]
              openshift.io/scc: privileged
              pod.network.openshift.io/assign-macvlan: true
Status:       Pending
IP:           10.129.2.16
IPs:
  IP:  10.129.2.16
Init Containers:
  egress-router-setup:
    Container ID:   cri-o://dcffe5bcaeb9b314b6c725ccf13460ec5b173c076ba47d5ceea44279931694aa
    Image:          registry.redhat.io/openshift4/ose-egress-router
    Image ID:       registry.redhat.io/openshift4/ose-egress-router@sha256:61770462cfc33f732b8e30eb5b3805123a18dba02375ca7d511981e3855d1a45
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 23 Oct 2020 14:54:19 -0400
      Finished:     Fri, 23 Oct 2020 14:54:19 -0400
    Ready:          False
    Restart Count:  3
    Environment:
      EGRESS_SOURCE:       10.73.116.69
      EGRESS_GATEWAY:      10.73.117.254
      EGRESS_ROUTER_MODE:  dns-proxy
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-hhxbf (ro)
Containers:
  egress-router-dns-proxy:
    Container ID:   
    Image:          registry.redhat.io/openshift4/ose-egress-dns-proxy
    Image ID:       
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Environment:
      EGRESS_DNS_PROXY_DEBUG:        1
      EGRESS_DNS_PROXY_DESTINATION:  80     www.google.com
                                     8000   209.82.215.211 80
                                     8001   www.yahoo.com  80
                                     
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-hhxbf (ro)
Conditions:
  Type              Status
  Initialized       False 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  default-token-hhxbf:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-hhxbf
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  app=egressrouter-dns
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason          Age                From                                                 Message
  ----     ------          ----               ----                                                 -------
  Normal   Scheduled       <unknown>                                                               Successfully assigned egressrouter/egress-router-dns-pod to ip-10-0-128-133.us-east-2.compute.internal
  Normal   AddedInterface  71s                multus                                               Add eth0 [10.129.2.16/23]
  Normal   Pulling         71s                kubelet, ip-10-0-128-133.us-east-2.compute.internal  Pulling image "registry.redhat.io/openshift4/ose-egress-router"
  Normal   Pulled          60s                kubelet, ip-10-0-128-133.us-east-2.compute.internal  Successfully pulled image "registry.redhat.io/openshift4/ose-egress-router" in 10.799083895s
  Normal   Created         19s (x4 over 60s)  kubelet, ip-10-0-128-133.us-east-2.compute.internal  Created container egress-router-setup
  Normal   Started         19s (x4 over 60s)  kubelet, ip-10-0-128-133.us-east-2.compute.internal  Started container egress-router-setup
  Normal   Pulled          19s (x3 over 59s)  kubelet, ip-10-0-128-133.us-east-2.compute.internal  Container image "registry.redhat.io/openshift4/ose-egress-router" already present on machine
  Warning  BackOff         5s (x6 over 58s)   kubelet, ip-10-0-128-133.us-east-2.compute.internal  Back-off restarting failed container
[weliang@weliang verification-tests]$ 

The kubeconfig for this cluster: https://mastern-jenkins-csb-openshift-qe.cloud.paas.psi.redhat.com/job/Launch%20Environment%20Flexy/119944/artifact/workdir/install-dir/auth/kubeconfig/*view*/

Comment 6 Weibin Liang 2020-10-24 15:32:30 UTC
Hi Miciah,

I re-tested and the egress pod is running now. I will close this bug and either open a new doc bug or add our findings to bug 1870710.

Thanks!
Weibin