Bug 1935691

Summary: ovnkube ExternalIP for services that listen on port 80/443 will break IngressControllers after node reboot or scale in / scale out

Field | Value
---|---
Product | OpenShift Container Platform
Component | Networking
Networking sub component | ovn-kubernetes
Version | 4.6
Target Milestone | ---
Target Release | 4.6.z
Status | CLOSED ERRATA
Severity | high
Priority | high
Reporter | Andreas Karis <akaris>
Assignee | Alexander Constantinescu <aconstan>
QA Contact | Weibin Liang <weliang>
CC | aconstan, astoycos, zzhao
Hardware | Unspecified
OS | Unspecified
Type | Bug
Cloned to | 1937727 (view as bug list)
Bug Depends On | 1937727
Last Closed | 2021-03-30 17:03:16 UTC
Description (Andreas Karis, 2021-03-05 11:51:24 UTC)
Prerequisites: build a python image and push it to the registry:

~~~
IMAGE=registry.example.com:5000/python:latest
mkdir python
cat <<'EOF' > python/Dockerfile
FROM registry.access.redhat.com/ubi8/ubi
RUN yum install iproute iputils tcpdump python38 -y
EOF
cd python
buildah bud -t $IMAGE .
podman push $IMAGE
cd -
~~~

Scale in the ingress controller:

~~~
[root@openshift-jumpserver-0 ~]# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.18    True        False         9m38s   Error while reconciling 4.6.18: the cluster operator monitoring is degraded
[root@openshift-jumpserver-0 ~]# oc scale ingresscontrollers -n openshift-ingress-operator --replicas=1 default
ingresscontroller.operator.openshift.io/default scaled
[root@openshift-jumpserver-0 ~]# oc get pods -A -o wide | grep -i ingress
openshift-ingress-operator   ingress-operator-6cfd945dfb-qc8bd   2/2   Running   0   19m   172.26.0.36       openshift-master-2   <none>   <none>
openshift-ingress            router-default-6d6d869656-sqj4h     1/1   Running   0   23m   192.168.123.221   openshift-worker-1   <none>   <none>
~~~

Now deploy the service:

~~~
oc new-project test
oc project test
oc adm policy add-scc-to-user privileged -z default
cat <<'EOF' > deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fedora-deployment
  labels:
    app: fedora-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fedora-pod
  template:
    metadata:
      labels:
        app: fedora-pod
    spec:
      containers:
      - name: fedora
        image: registry.example.com:5000/python:latest
        command:
        - python3
        args:
        - "-m"
        - "http.server"
        - "80"
        imagePullPolicy: IfNotPresent
        securityContext:
          runAsUser: 0
          capabilities:
            add:
            - "SETFCAP"
---
apiVersion: v1
kind: Service
metadata:
  name: shell-demo
spec:
  selector:
    app: fedora-pod
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: LoadBalancer
EOF
oc apply -f deployment.yaml
~~~

Test the service:

~~~
[root@openshift-jumpserver-0 ~]# ip r a 10.1.1.67 via 192.168.123.220
[root@openshift-jumpserver-0 ~]# ip r | grep 10.1.1.67
10.1.1.67 via 192.168.123.220 dev eth0
[root@openshift-jumpserver-0 ~]# curl 10.1.1.67:80
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Directory listing for /</title>
</head>
<body>
<h1>Directory listing for /</h1>
<hr>
<ul>
<li><a href="bin/">bin@</a></li>
<li><a href="boot/">boot/</a></li>
<li><a href="dev/">dev/</a></li>
<li><a href="etc/">etc/</a></li>
<li><a href="home/">home/</a></li>
<li><a href="lib/">lib@</a></li>
<li><a href="lib64/">lib64@</a></li>
<li><a href="lost%2Bfound/">lost+found/</a></li>
<li><a href="media/">media/</a></li>
<li><a href="mnt/">mnt/</a></li>
<li><a href="opt/">opt/</a></li>
<li><a href="proc/">proc/</a></li>
<li><a href="root/">root/</a></li>
<li><a href="run/">run/</a></li>
<li><a href="sbin/">sbin@</a></li>
<li><a href="srv/">srv/</a></li>
<li><a href="sys/">sys/</a></li>
<li><a href="tmp/">tmp/</a></li>
<li><a href="usr/">usr/</a></li>
<li><a href="var/">var/</a></li>
</ul>
<hr>
</body>
</html>
~~~

Check ovnkube logs for both workers:

~~~
[root@openshift-jumpserver-0 ~]# oc logs -n openshift-ovn-kubernetes ovnkube-node-7n5qd -c ovnkube-node | grep test/shell-demo
W0305 12:47:29.195368    4214 port_claim.go:191] PortClaim for svc: test/shell-demo on port: 80, err: listen tcp :80: bind: address already in use
E0305 12:47:29.195384    4214 port_claim.go:60] Error updating port claim for service: test/shell-demo: listen tcp :80: bind: address already in use
I0305 12:47:29.195445    4214 event.go:278] Event(v1.ObjectReference{Kind:"Service", Namespace:"test", Name:"shell-demo", UID:"", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'PortClaim' Service: test/shell-demo requires port: 80 to be opened on node, but port cannot be opened, err: listen tcp :80: bind: address already in use
[root@openshift-jumpserver-0 ~]# oc logs -n openshift-ovn-kubernetes ovnkube-node-jf8zs -c ovnkube-node | grep test/shell-demo
[root@openshift-jumpserver-0 ~]#
~~~

The problem here is that ovnkube binds the port on all addresses (*:80), not just the external IP:

~~~
[root@openshift-jumpserver-0 ~]# oc exec -it -n openshift-ovn-kubernetes ovnkube-node-jf8zs -- ss -lntp | grep :80
Defaulting container name to ovn-controller.
Use 'oc describe pod/ovnkube-node-jf8zs -n openshift-ovn-kubernetes' to see all of the containers in this pod.
LISTEN   0   128   *:80   *:*   users:(("ovnkube",pid=4231,fd=8))
~~~

Now, scale out the ingress controller:

~~~
oc scale ingresscontrollers -n openshift-ingress-operator --replicas=2 default
~~~

The IngressController will not be able to spawn:

~~~
[root@openshift-jumpserver-0 ~]# oc get pods -A -o wide | grep ingress
openshift-ingress-operator   ingress-operator-6cfd945dfb-qc8bd   2/2   Running   0   26m    172.26.0.36       openshift-master-2   <none>   <none>
openshift-ingress            router-default-6d6d869656-bd7zl     0/1   Running   1   103s   192.168.123.220   openshift-worker-0   <none>   <none>
openshift-ingress            router-default-6d6d869656-sqj4h     1/1   Running   0   31m    192.168.123.221   openshift-worker-1   <none>   <none>
~~~

~~~
[root@openshift-jumpserver-0 ~]# oc logs -n openshift-ingress router-default-6d6d869656-bd7zl | tail -n 20
I0305 12:53:52.058358       1 template.go:403] router "msg"="starting router"  "version"="majorFromGit: \nminorFromGit: \ncommitFromGit: 0ced824c9667a259b75e963a16f3dda4b5d781f6\nversionFromGit: 4.0.0-232-g0ced824\ngitTreeState: clean\nbuildDate: 2021-02-13T02:16:38Z\n"
I0305 12:53:52.059973       1 metrics.go:154] metrics "msg"="router health and metrics port listening on HTTP and HTTPS"  "address"="0.0.0.0:1936"
I0305 12:53:52.065260       1 router.go:185] template "msg"="creating a new template router"  "writeDir"="/var/lib/haproxy"
I0305 12:53:52.065329       1 router.go:263] template "msg"="router will coalesce reloads within an interval of each other"  "interval"="5s"
I0305 12:53:52.065720       1 router.go:325] template "msg"="watching for changes"  "path"="/etc/pki/tls/private"
I0305 12:53:52.065779       1 router.go:262] router "msg"="router is including routes in all namespaces"
E0305 12:53:52.173533       1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: no such file or directory
E0305 12:53:52.193445       1 limiter.go:165] error reloading router: exit status 1
[ALERT] 063/125352 (36) : Starting frontend public: cannot bind socket [0.0.0.0:80]
E0305 12:53:57.173618       1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: connection refused
E0305 12:53:57.191593       1 limiter.go:165] error reloading router: exit status 1
[ALERT] 063/125357 (39) : Starting frontend public: cannot bind socket [0.0.0.0:80]
I0305 12:54:25.803756       1 template.go:657] router "msg"="Shutdown requested, waiting 45s for new connections to cease"
[root@openshift-jumpserver-0 ~]#
~~~

This can also be reproduced by ACPI shutting down the entire cluster and bringing it back up. Or, in a cluster with 2 workers and 2 ingress routers, you can simply reboot a worker node. When the node comes up, ovnkube will open the port before the ingress haproxy can.
The problem is that if we treat this as a configuration issue, then this configuration issue can go unnoticed for weeks or months. At some point, when a customer has to restart the Ingress router and it tries to come up on the same node, it will fail.

Why is it required that ovnkube bind to 0.0.0.0:80 for an ExternalIP service, anyway? It seems that all of this is implemented as an OVN load balancer internally, so binding to :80 seems to be unnecessary?

I also tested with OCP 4.7.0 and the behavior is exactly the same.

Hi Andreas,

Could you provide me with a reproducer for this? I've tried reproducing on AWS, but I don't see the same behavior. OVN-Kubernetes will perform port claims for NodePort and LoadBalancer type services, but will bind the port to the nodePort defined for those services, so it should not bind to port 80 in your case. OVN-Kubernetes will also bind ports for ExternalIP type services, but for those services it will specifically bind to $EXTERNAL_IP:$PORT, not 0.0.0.0:$PORT. See my reproducer below:

~~~
$ oc project
Using project "test" on server "https://api.ci-ln-hb93yw2-d5d6b.origin-ci-int-aws.dev.rhcloud.com:6443".
aconstan@localhost ~ $ oc get svc
NAME                 TYPE           CLUSTER-IP      EXTERNAL-IP                                                               PORT(S)        AGE
cluster-ip-service   LoadBalancer   172.30.82.243   aa255dab2e6e54cbdb633287010e861c-835744214.us-west-2.elb.amazonaws.com   80:30132/TCP   9m43s
aconstan@localhost ~ $ oc get ep
NAME                 ENDPOINTS           AGE
cluster-ip-service   10.128.10.14:8080   9m48s
aconstan@localhost ~ $ oc get pod
NAME                        READY   STATUS    RESTARTS   AGE
netserver-66c995678-fdpkn   1/1     Running   0          12m
aconstan@localhost ~ $ oc get svc -o yaml
apiVersion: v1
items:
- apiVersion: v1
  kind: Service
  metadata:
    creationTimestamp: "2021-03-10T08:57:00Z"
    name: cluster-ip-service
    namespace: test
    resourceVersion: "30486"
    uid: a255dab2-e6e5-4cbd-b633-287010e861c9
  spec:
    clusterIP: 172.30.82.243
    clusterIPs:
    - 172.30.82.243
    externalTrafficPolicy: Cluster
    ports:
    - name: tcp
      nodePort: 30132
      port: 80
      protocol: TCP
      targetPort: 8080
    selector:
      deployment: "true"
    sessionAffinity: None
    type: LoadBalancer
  status:
    loadBalancer:
      ingress:
      - hostname: aa255dab2e6e54cbdb633287010e861c-835744214.us-west-2.elb.amazonaws.com
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
aconstan@localhost ~ $ oc exec -tic ovnkube-node ovnkube-node-n557f -n openshift-ovn-kubernetes -- bash
[root@ip-10-0-193-68 ~]# ss -lntp | grep :30132
LISTEN   0   128   *:30132   *:*   users:(("ovnkube",pid=2549,fd=9))
[root@ip-10-0-193-68 ~]# ss -lntp | grep :80
[root@ip-10-0-193-68 ~]#
~~~

If you could provide me with a kubeconfig to your setup, I'll have a look at it. I might have overlooked something when trying to reproduce.

/Alex
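To make the distinction above concrete, below is a rough, self-contained Go sketch of the expected port-claim decision (an approximation for illustration only, not the actual port_claim.go code; the `service`/`servicePort` types and `claimAddrs` function are simplified stand-ins for the Kubernetes types): NodePort/LoadBalancer services claim the nodePort on all addresses, while external IPs are claimed as $EXTERNAL_IP:$PORT, which leaves 0.0.0.0:80 free for the router's haproxy.

~~~
// claimaddrs.go: hypothetical sketch of which host addresses a node agent
// would claim for a given service shape.
package main

import "fmt"

// servicePort and service are simplified stand-ins for the Kubernetes types.
type servicePort struct {
	Port     int32
	NodePort int32
}

type service struct {
	Type        string // "ClusterIP", "NodePort", "LoadBalancer"
	ExternalIPs []string
	Ports       []servicePort
}

// claimAddrs returns the host addresses the agent would try to bind for svc.
func claimAddrs(svc service) []string {
	var addrs []string
	for _, p := range svc.Ports {
		// NodePort and LoadBalancer services reserve the nodePort on all addresses.
		if (svc.Type == "NodePort" || svc.Type == "LoadBalancer") && p.NodePort != 0 {
			addrs = append(addrs, fmt.Sprintf(":%d", p.NodePort))
		}
		// External IPs are claimed per IP, leaving 0.0.0.0:<port> free for haproxy.
		for _, ip := range svc.ExternalIPs {
			addrs = append(addrs, fmt.Sprintf("%s:%d", ip, p.Port))
		}
	}
	return addrs
}

func main() {
	svc := service{
		Type:        "LoadBalancer",
		ExternalIPs: []string{"10.1.1.72"}, // auto-assigned from 10.1.1.64/27
		Ports:       []servicePort{{Port: 80, NodePort: 32536}},
	}
	fmt.Println(claimAddrs(svc)) // [:32536 10.1.1.72:80]
}
~~~

With claims shaped like this, a node reboot cannot leave a stray *:80 listener behind, which is consistent with the empty `ss -lntp | grep :80` output in the 4.7.1 reproduction below.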
Hi, I'll work on a reproducer. Did you use the baremetal ExternalIP feature for this? The problem IMO lies there, and not in LoadBalancer type services per se:

* Configure ExternalIP feature: https://docs.openshift.com/container-platform/4.6/networking/configuring_ingress_cluster_traffic/configuring-externalip.html

~~~
oc get networks.config cluster -o jsonpath-as-json='{.spec.externalIP}'
[
    {
        "autoAssignCIDRs": [
            "10.1.1.64/27"
        ]
    }
]
~~~

- Andreas

Hi,

I understand, and I agree that the problem most likely lies in those subtleties. However, OVN-Kubernetes makes no distinction of that for what concerns the service specification, i.e. if there is an external IP defined for a service then it will bind the port to $EXTERNAL_IP:$PORT (at least on 4.7; on 4.6 this is not the case and does not work, see the 4.7 code: https://github.com/openshift/ovn-kubernetes/blob/release-4.7/go-controller/pkg/node/port_claim.go#L189). I am thus wondering if there's another component modifying the service specification on your baremetal cluster with this loadBalancer type service, to something we didn't expect.

/Alex

Fresh install of OCP 4.7.1 with UPI:

~~~
[root@openshift-jumpserver-0 ~]# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.1     True        False         3h49m   Cluster version is 4.7.1
~~~

~~~
[root@openshift-jumpserver-0 ~]# oc edit networks.config cluster
network.config.openshift.io/cluster edited
[root@openshift-jumpserver-0 ~]# oc get networks.config cluster -o jsonpath-as-json='{.spec.externalIP}'
[
    {
        "autoAssignCIDRs": [
            "10.1.1.64/27"
        ]
    }
]
~~~

~~~
[root@openshift-jumpserver-0 ~]# oc scale -n openshift-ingress-operator ingresscontroller default --replicas=1
ingresscontroller.operator.openshift.io/default scaled
~~~

~~~
[root@openshift-jumpserver-0 ~]# oc new-project test
Error from server (AlreadyExists): project.project.openshift.io "test" already exists
[root@openshift-jumpserver-0 ~]# oc project test
Already on project "test" on server "https://api.cluster.example.com:6443".
[root@openshift-jumpserver-0 ~]# oc adm policy add-scc-to-user privileged -z default
clusterrole.rbac.authorization.k8s.io/system:openshift:scc:privileged added: "default"
[root@openshift-jumpserver-0 ~]# cat <<'EOF' > deployment.yaml
> apiVersion: apps/v1
> kind: Deployment
> metadata:
>   name: fedora-deployment
>   labels:
>     app: fedora-deployment
> spec:
>   replicas: 1
>   selector:
>     matchLabels:
>       app: fedora-pod
>   template:
>     metadata:
>       labels:
>         app: fedora-pod
>     spec:
>       containers:
>       - name: fedora
>         image: registry.example.com:5000/python:latest
>         command:
>         - python3
>         args:
>         - "-m"
>         - "http.server"
>         - "80"
>         imagePullPolicy: IfNotPresent
>         securityContext:
>           runAsUser: 0
>           capabilities:
>             add:
>             - "SETFCAP"
> ---
> apiVersion: v1
> kind: Service
> metadata:
>   name: shell-demo
> spec:
>   selector:
>     app: fedora-pod
>   ports:
>   - protocol: TCP
>     port: 80
>     targetPort: 80
>   type: LoadBalancer
> EOF
[root@openshift-jumpserver-0 ~]# oc apply -f deployment.yaml
deployment.apps/fedora-deployment created
service/shell-demo created
[root@openshift-jumpserver-0 ~]# oc get pods
NAME                                 READY   STATUS    RESTARTS   AGE
fedora-deployment-68c46ccfd6-jdb42   1/1     Running   0          4s
[root@openshift-jumpserver-0 ~]# oc get svc
NAME         TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
shell-demo   LoadBalancer   172.30.94.243   10.1.1.72     80:32536/TCP   8s
[root@openshift-jumpserver-0 ~]# ip r a 10.1.1.72 via 192.168.123.220
[root@openshift-jumpserver-0 ~]# curl 10.1.1.72:80
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Directory listing for /</title>
</head>
<body>
<h1>Directory listing for /</h1>
<hr>
<ul>
<li><a href="bin/">bin@</a></li>
<li><a href="boot/">boot/</a></li>
<li><a href="dev/">dev/</a></li>
<li><a href="etc/">etc/</a></li>
<li><a href="home/">home/</a></li>
<li><a href="lib/">lib@</a></li>
<li><a href="lib64/">lib64@</a></li>
<li><a href="lost%2Bfound/">lost+found/</a></li>
<li><a href="media/">media/</a></li>
<li><a href="mnt/">mnt/</a></li>
<li><a href="opt/">opt/</a></li>
<li><a href="proc/">proc/</a></li>
<li><a href="root/">root/</a></li>
<li><a href="run/">run/</a></li>
<li><a href="sbin/">sbin@</a></li>
<li><a href="srv/">srv/</a></li>
<li><a href="sys/">sys/</a></li>
<li><a href="tmp/">tmp/</a></li>
<li><a href="usr/">usr/</a></li>
<li><a href="var/">var/</a></li>
</ul>
<hr>
</body>
</html>
~~~

Wait until the Ingress pods are done terminating and until the new pod is up:

~~~
[root@openshift-jumpserver-0 ~]# oc get pods -A | grep ingress
openshift-ingress-canary     ingress-canary-5t64b                1/1   Running   0   4h5m
openshift-ingress-canary     ingress-canary-qn6vl                1/1   Running   0   4h5m
openshift-ingress-operator   ingress-operator-67f8fdc58f-hj2n2   2/2   Running   3   4h30m
openshift-ingress            router-default-766754d55f-mj6rf     1/1   Running   0   99s
~~~

~~~
[root@openshift-jumpserver-0 ~]# oc get pods -A -o wide | grep ingress
openshift-ingress-canary     ingress-canary-5t64b                1/1   Running   0   4h10m   172.24.2.7        openshift-worker-0   <none>   <none>
openshift-ingress-canary     ingress-canary-qn6vl                1/1   Running   0   4h10m   172.27.0.7        openshift-worker-1   <none>   <none>
openshift-ingress-operator   ingress-operator-67f8fdc58f-hj2n2   2/2   Running   3   4h35m   172.26.0.13       openshift-master-2   <none>   <none>
openshift-ingress            router-default-766754d55f-mj6rf     1/1   Running   0   6m42s   192.168.123.221   openshift-worker-1   <none>   <none>
[root@openshift-jumpserver-0 ~]# oc get pods -n openshift-ovn-kubernetes -o wide | grep ovnkube | grep worker
ovnkube-node-bdz5v   3/3   Running   0   4h11m   192.168.123.220   openshift-worker-0   <none>   <none>
ovnkube-node-rhwzb   3/3   Running   0   4h11m   192.168.123.221   openshift-worker-1   <none>   <none>
~~~

Interesting:

~~~
[root@openshift-jumpserver-0 ~]# oc exec -it -n openshift-ovn-kubernetes ovnkube-node-bdz5v -c ovn-controller -- ss -lntp | grep :80
[root@openshift-jumpserver-0 ~]# oc exec -it -n openshift-ovn-kubernetes ovnkube-node-bdz5v -c ovnkube-node -- ss -lntp | grep :80
[root@openshift-jumpserver-0 ~]#
~~~

~~~
[root@openshift-jumpserver-0 ~]# oc scale -n openshift-ingress-operator ingresscontroller default --replicas=2
ingresscontroller.operator.openshift.io/default scaled
[root@openshift-jumpserver-0 ~]# oc get pods -A -o wide | grep ingress
openshift-ingress-canary     ingress-canary-5t64b                1/1   Running   0   4h14m   172.24.2.7        openshift-worker-0   <none>   <none>
openshift-ingress-canary     ingress-canary-qn6vl                1/1   Running   0   4h14m   172.27.0.7        openshift-worker-1   <none>   <none>
openshift-ingress-operator   ingress-operator-67f8fdc58f-hj2n2   2/2   Running   3   4h40m   172.26.0.13       openshift-master-2   <none>   <none>
openshift-ingress            router-default-766754d55f-mj6rf     1/1   Running   0   11m     192.168.123.221   openshift-worker-1   <none>   <none>
openshift-ingress            router-default-766754d55f-mwzcb     1/1   Running   0   14s     192.168.123.220   openshift-worker-0   <none>   <none>
~~~

So either my earlier test with OCP 4.7.0 was off, or this was fixed with OCP 4.7.1 (?). I could definitely reproduce this with 4.6.18. Did https://github.com/ovn-org/ovn-kubernetes/commit/4efbb59969223c4090c572be2c99d7280a871c8e only recently make it downstream? Is it possible that this only affects OCP 4.6 and not 4.7?

So, given that it's been verified as working on 4.7 (see comment 8), we can safely assume that we're missing the commit mentioned in comment 9 on 4.6. I am thus using this bug as a backport bug for it. Given this somewhat strange situation, I will need to file master bugs against 4.7 and 4.8 and set them directly to CLOSED ERRATA (as this is not a problem on those versions).
Tested and verified in 4.6.0-0.nightly-2021-03-21-131139:

~~~
[weliang@weliang Config]$ oc exec -it ovnkube-node-hffqt -c ovn-controller -- ss -lntp | grep :80
[weliang@weliang Config]$ oc exec -it ovnkube-node-hffqt -c ovnkube-node -- ss -lntp | grep :80
[weliang@weliang Config]$ oc exec -it ovnkube-node-h5bcl -c ovn-controller -- ss -lntp | grep :80
[weliang@weliang Config]$ oc exec -it ovnkube-node-h5bcl -c ovnkube-node -- ss -lntp | grep :80
[weliang@weliang Config]$ oc exec -it ovnkube-node-55ps7 -c ovn-controller -- ss -lntp | grep :80
[weliang@weliang Config]$ oc exec -it ovnkube-node-55ps7 -c ovnkube-node -- ss -lntp | grep :80
[weliang@weliang Config]$ oc get svc
NAME             TYPE           CLUSTER-IP      EXTERNAL-IP                                                                PORT(S)        AGE
externalip-svc   ClusterIP      172.30.94.49    3.139.113.65                                                               80/TCP         6m25s
hello-service1   LoadBalancer   172.30.121.21   aa630cad0f2e64b5da8abe56fb3a0830-1691815349.us-east-2.elb.amazonaws.com   80:31261/TCP   11m
[weliang@weliang Config]$ oc get pod -n openshift-ingress -o wide
NAME                              READY   STATUS    RESTARTS   AGE    IP            NODE                          NOMINATED NODE   READINESS GATES
router-default-576fdf88d6-852cj   1/1     Running   0          24h    10.0.99.130   weliang-224-w5tzb-compute-1   <none>           <none>
router-default-576fdf88d6-jlzwf   0/1     Running   43         130m   10.0.97.15    weliang-224-w5tzb-compute-0   <none>           <none>
router-default-576fdf88d6-qrtnj   1/1     Running   0          130m   10.0.98.203   weliang-224-w5tzb-compute-2   <none>           <none>
[weliang@weliang Config]$
~~~

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.23 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0952