Description of problem:
When creating a non-default IngressController, the dependent deployment never achieves the default (2) available replicas.

Version-Release number of selected component (if applicable):
4.0.0-0.alpha-2019-03-12-024440

How reproducible:
Always

Steps to Reproduce:
1. Install OpenShift.
2. Create an ingresscontroller:

apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: test0
  namespace: openshift-ingress-operator
spec:
  domain: tests0.<YOUR_INGRESS_DOMAIN>

3. Check the ingresscontroller:

$ oc get ingresscontroller/test0 -n openshift-ingress-operator -o yaml | grep availableReplicas

Actual results:
availableReplicas: 1

Expected results:
availableReplicas: 2

Note: The default number of replicas for an ingresscontroller is 2.

Additional info:
$ oc logs deploy/router-test0 -n openshift-ingress
Found 2 pods, using pod/router-test0-566cfb6db8-gvl4z
I0312 15:07:03.471696       1 template.go:299] Starting template router (4.0.0-20-g80b8c3d)
I0312 15:07:03.475628       1 metrics.go:147] Router health and metrics port listening at 0.0.0.0:1936 on HTTP and HTTPS
E0312 15:07:03.491998       1 haproxy.go:392] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: no such file or directory
I0312 15:07:03.515420       1 router.go:482] Router reloaded: - Proxy protocol on, checking http://localhost:80 ... - Health check ok : 0 retry attempt(s).
I0312 15:07:03.515451       1 router.go:255] Router is including routes in all namespaces
E0312 15:07:03.519758       1 reflector.go:205] github.com/openshift/router/pkg/router/controller/factory/factory.go:112: Failed to list *v1.Route: Unauthorized
E0312 15:07:04.532099       1 reflector.go:322] github.com/openshift/router/pkg/router/controller/factory/factory.go:112: Failed to watch *v1.Route: the server has asked for the client to provide credentials (get routes.route.openshift.io)
E0312 15:07:04.720898       1 status.go:171] Unable to write router status for openshift-monitoring/prometheus-k8s: Unauthorized
I0312 15:07:04.750700       1 router.go:482] Router reloaded: - Proxy protocol on, checking http://localhost:80 ... - Health check ok : 0 retry attempt(s).
E0312 15:07:05.544733       1 reflector.go:322] github.com/openshift/router/pkg/router/controller/factory/factory.go:112: Failed to watch *v1.Route: the server has asked for the client to provide credentials (get routes.route.openshift.io)
E0312 15:07:06.548378       1 reflector.go:205] github.com/openshift/router/pkg/router/controller/factory/factory.go:112: Failed to list *v1.Route: Unauthorized
E0312 15:07:07.551659       1 reflector.go:205] github.com/openshift/router/pkg/router/controller/factory/factory.go:112: Failed to list *v1.Route: Unauthorized
E0312 15:07:07.723293       1 status.go:171] Unable to write router status for openshift-monitoring/prometheus-k8s: Unauthorized
E0312 15:07:08.553918       1 reflector.go:205] github.com/openshift/router/pkg/router/controller/factory/factory.go:112: Failed to list *v1.Route: Unauthorized
E0312 15:07:09.556289       1 reflector.go:205] github.com/openshift/router/pkg/router/controller/factory/factory.go:112: Failed to list *v1.Route: Unauthorized
I0312 15:07:09.756840       1 router.go:482] Router reloaded: - Proxy protocol on, checking http://localhost:80 ... - Health check ok : 0 retry attempt(s).
After looking at a 'describe' for the pod in question, it does not get scheduled due to anti-affinity rules:

$ oc describe po/router-test0-566cfb6db8-zjfsf -n openshift-ingress
<SNIP>
Events:
  Type     Reason            Age                From               Message
  ----     ------            ---                ----               -------
  Warning  FailedScheduling  2h (x751 over 3h)  default-scheduler  0/6 nodes are available: 3 node(s) didn't match node selector, 3 node(s) didn't match pod affinity/anti-affinity, 3 node(s) didn't satisfy existing pods anti-affinity rules.

Is it required that router pods from different ingress controllers NOT be scheduled to the same nodes?
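The scheduling failure is consistent with a required anti-affinity term that matches every router pod rather than only the pods of one ingresscontroller. A sketch of such an overly broad rule (the label key and value here are illustrative, not copied from the actual deployment):

```yaml
# Illustrative sketch of a too-broad required anti-affinity term:
# it matches routers from ALL ingresscontrollers, so test0's pods
# cannot land on nodes that already run the default router.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: kubernetes.io/hostname
        labelSelector:
          matchLabels:
            app: router   # assumed label; matches every router pod
```

With three worker nodes already hosting default router pods, a term of this shape leaves no node on which a second ingresscontroller's pods can be scheduled.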
Yeah, the anti-affinity rule is incomplete. It needs an additional selector to ensure anti-affinity is scoped to a particular ingresscontroller.
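Scoping the selector to a single ingresscontroller's pods might look like the following sketch (the per-ingresscontroller label key is an assumption about how the operator labels its deployments; the actual key may differ):

```yaml
# Sketch: anti-affinity scoped to one ingresscontroller's own pods.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: kubernetes.io/hostname
        labelSelector:
          matchLabels:
            # assumed operator-managed label identifying the owning ingresscontroller
            ingresscontroller.operator.openshift.io/deployment-ingresscontroller: test0
```

With this scoping, test0's replicas still spread across distinct nodes, but they no longer conflict with the default router's pods.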
We changed the anti-affinity rule to be preferred rather than required, which should enable horizontal scaling while still allowing surge pods to be scheduled on already-occupied nodes during a rolling deployment.
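A preferred rule of that shape, sketched (the weight and label key are illustrative assumptions, not the operator's actual values):

```yaml
# Sketch: preferred anti-affinity. The scheduler tries to spread the
# ingresscontroller's pods across nodes, but will co-locate them
# (e.g. a surge pod during a rollout) if no other node is available.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100   # assumed weight
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname
          labelSelector:
            matchLabels:
              # assumed operator-managed label identifying the owning ingresscontroller
              ingresscontroller.operator.openshift.io/deployment-ingresscontroller: test0
```

Unlike the required form, a preferred term never makes a pod unschedulable; it only biases placement.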
Will verify with the next nightly build, which contains the fix.
Verified with 4.0.0-0.nightly-2019-03-23-222829; the issue has been fixed.

$ oc get ingresscontrollers.operator.openshift.io test0 -n openshift-ingress-operator -o yaml
---
status:
  availableReplicas: 2
---

$ oc get pod -n openshift-ingress -o wide
NAME                              READY   STATUS    RESTARTS   AGE   IP            NODE                             NOMINATED NODE
router-default-65dc774d97-6hw5z   1/1     Running   0          13m   10.129.2.12   ip-172-31-134-125.ec2.internal   <none>
router-default-65dc774d97-b8wh2   1/1     Running   0          13m   10.131.0.12   ip-172-31-151-75.ec2.internal    <none>
router-default-65dc774d97-mkvmm   1/1     Running   0          12m   10.128.2.10   ip-172-31-162-21.ec2.internal    <none>
router-test0-649fd8d759-rtgj8     1/1     Running   0          98s   10.131.0.13   ip-172-31-151-75.ec2.internal    <none>
router-test0-649fd8d759-zcpqs     1/1     Running   0          98s   10.128.2.11   ip-172-31-162-21.ec2.internal    <none>
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758