Description of problem:

When two router pods are scheduled to run on the same node (with different listen ports set through env variables), the majority of requests to either router fail with error 503.

Version-Release number of selected component (if applicable):

How reproducible:
Start two router pods on a node.

Steps to Reproduce:
1.
2.
3.

Actual results:
Lots of requests fail.

Expected results:
All requests are forwarded to the application.

Additional info:

- I see this behavior when two router pods are running on the same node:

> for i in {1..10}; do curl -sSLko /dev/null -w '%{http_code}\n' https://hello-world-cake.apps.alko.lab:11443/; done | grep 200 | wc -l
2
> for i in {1..10}; do curl -sSLko /dev/null -w '%{http_code}\n' https://hello-world-cake.apps.alko.lab:11443/; done | grep 200 | wc -l
7

- The output below suggests that there could be a race condition while binding to ports 10443 and 10444: only one haproxy process (PID 59318) holds 127.0.0.1:10443 and 127.0.0.1:10444, while the second (PID 2468) is missing its internal binds:

# netstat -nlp4 | grep haproxy | sort
tcp        0      0 0.0.0.0:11080      0.0.0.0:*    LISTEN    2468/haproxy
tcp        0      0 0.0.0.0:11443      0.0.0.0:*    LISTEN    2468/haproxy
tcp        0      0 0.0.0.0:1936       0.0.0.0:*    LISTEN    59318/haproxy
tcp        0      0 0.0.0.0:1938       0.0.0.0:*    LISTEN    2468/haproxy
tcp        0      0 0.0.0.0:443        0.0.0.0:*    LISTEN    59318/haproxy
tcp        0      0 0.0.0.0:80         0.0.0.0:*    LISTEN    59318/haproxy
tcp        0      0 127.0.0.1:10443    0.0.0.0:*    LISTEN    59318/haproxy
tcp        0      0 127.0.0.1:10444    0.0.0.0:*    LISTEN    59318/haproxy

- I found out that the standard template has these ports hardcoded.
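The suspected failure mode can be illustrated outside of OpenShift: whichever haproxy process binds 127.0.0.1:10443/10444 first wins, and the second process's bind fails with EADDRINUSE. A minimal stand-alone sketch (it uses an ephemeral port instead of 10443 so the demo does not depend on any port being free):

```shell
# The second bind to the same 127.0.0.1:<port> fails with EADDRINUSE,
# which is what the second router's haproxy runs into on 10443/10444.
python3 - <<'PYEOF'
import errno, socket

first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
first.bind(("127.0.0.1", 0))      # kernel picks a free port
first.listen(1)
port = first.getsockname()[1]

second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    second.bind(("127.0.0.1", port))
    print("no collision")
except OSError as exc:
    if exc.errno == errno.EADDRINUSE:
        print("EADDRINUSE on second bind")

first.close()
second.close()
PYEOF
```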
So, I altered the configs for one of my routers:

root@master1 # oc get pods
NAME                      READY     STATUS    RESTARTS   AGE
docker-registry-2-qy76m   1/1       Running   0          8d
router-4-2lxqd            1/1       Running   0          8d
router-two-4-v0i7b        1/1       Running   0          9m

root@master1 # oc exec router-two-4-v0i7b -- cat haproxy.config | grep -P "^\s+bind"
  bind :11080
  bind :11443
  bind 127.0.0.1:20444 ssl no-sslv3 crt /var/lib/haproxy/conf/default_pub_keys.pem crt /var/lib/containers/router/certs accept-proxy
  bind 127.0.0.1:20443 ssl no-sslv3 crt /var/lib/haproxy/conf/default_pub_keys.pem accept-proxy

root@master1 # oc exec router-4-2lxqd -- cat haproxy.config | grep -P "^\s+bind"
  bind :80
  bind :443
  bind 127.0.0.1:10444 ssl no-sslv3 crt /var/lib/haproxy/conf/default_pub_keys.pem crt /var/lib/containers/router/certs accept-proxy
  bind 127.0.0.1:10443 ssl no-sslv3 crt /var/lib/haproxy/conf/default_pub_keys.pem accept-proxy

- As a result, I now have:

root@worknode1 # netstat -nlp4 | grep haproxy
tcp        0      0 0.0.0.0:11080      0.0.0.0:*    LISTEN    76799/haproxy
tcp        0      0 127.0.0.1:10443    0.0.0.0:*    LISTEN    76777/haproxy
tcp        0      0 127.0.0.1:10444    0.0.0.0:*    LISTEN    76777/haproxy
tcp        0      0 0.0.0.0:80         0.0.0.0:*    LISTEN    76777/haproxy
tcp        0      0 0.0.0.0:1936       0.0.0.0:*    LISTEN    76777/haproxy
tcp        0      0 0.0.0.0:1938       0.0.0.0:*    LISTEN    76799/haproxy
tcp        0      0 0.0.0.0:11443     0.0.0.0:*    LISTEN    76799/haproxy
tcp        0      0 127.0.0.1:20443    0.0.0.0:*    LISTEN    76799/haproxy
tcp        0      0 0.0.0.0:443        0.0.0.0:*    LISTEN    76777/haproxy
tcp        0      0 127.0.0.1:20444    0.0.0.0:*    LISTEN    76799/haproxy

alko@localhost > for i in {1..10}; do curl -sSLko /dev/null -w '%{http_code}\n' https://hello-world-cake.apps.alko.lab/; done | grep 200 | wc -l
10
Hi.

Florian has fixed it like this:

https://github.com/git001/openshift_custom_haproxy_ext/pull/1

BR Aleks
(In reply to Aleks Lazic from comment #1)
> Hi.
>
> Florian have fixed it like this.
>
> https://github.com/git001/openshift_custom_haproxy_ext/pull/1
>
> BR Aleks

I have added a PR to origin:

https://github.com/openshift/origin/pull/9175

BR Aleks
There is a similar report in bug 1268904: it's for a different pair of ports, but essentially the same thing I believe. Wondering if we should mark this as a duplicate and make 1268904 handle all the hardcoded values.
Well, if it gets fixed faster that way, I'm in.
No, 1268904 has already merged and is slightly different. This PR has been reviewed and should be merged shortly, so let's keep this as a separate bug for now.
Ben / Ram,

I think we can move this to POST, as https://github.com/openshift/origin/commit/5d25a1da3da43bdb74decf641e91ce0245490438 is merged upstream and is designed to fix this?
(In reply to Eric Rich from comment #7)
> Ben / Ram,
>
> I think we can move this to POST? As
> https://github.com/openshift/origin/commit/5d25a1da3da43bdb74decf641e91ce0245490438
> is merged upstream, and is designed to fix this?

That is correct.
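For reference, once that commit is in, the internal SNI pass-through ports should be overridable per router, so two routers on one node no longer collide on the hardcoded 10443/10444. A sketch, assuming the env var names ROUTER_SERVICE_SNI_PORT and ROUTER_SERVICE_NO_SNI_PORT as they appear in the 3.3 router documentation (verify against your release), with the port values taken from the manually patched router above:

```shell
# Give the second router on the node its own internal SNI / no-SNI ports
# so they do not collide with the first router's defaults (10444 / 10443).
# DC name "router-two" matches the setup described in this report.
oc env dc/router-two \
    ROUTER_SERVICE_SNI_PORT=20444 \
    ROUTER_SERVICE_NO_SNI_PORT=20443
```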
Does this mean that we can expect this template in OpenShift Enterprise with the next update? More concretely: what does POST mean for end users like the RH OSE customers out there?
Hello,

Can we have an ETA as to when this is expected to be fixed?

Regards,
Jaspreet
It should be in 3.3.

As a work-around on 3.2, you can replace the template in a router without rebuilding an image: make a ConfigMap that contains the changed template, then change the router DC to use it. So you'd pull the current router image and then apply the change in https://github.com/openshift/origin/commit/5d25a1da3da43bdb74decf641e91ce0245490438 to the new template.

A guide: https://github.com/openshift/openshift-docs/blob/master/install_config/install/deploy_router.adoc#using-configmap-replace-template
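The work-around roughly follows the linked deploy_router guide; a sketch of the commands (the pod name, ConfigMap name "customrouter", and mount path are illustrative — check the guide for your version):

```shell
# Export the current template from a running router pod, then edit it
# to apply the upstream change (pod name here is illustrative).
oc exec router-4-2lxqd cat haproxy-config.template > haproxy-config.template

# Publish the edited template as a ConfigMap and mount it into the router DC.
oc create configmap customrouter --from-file=haproxy-config.template
oc volume dc/router --add --overwrite \
    --name=config-volume \
    --mount-path=/var/lib/haproxy/conf/custom \
    --source='{"configMap": {"name": "customrouter"}}'

# Point the router at the replacement template.
oc env dc/router \
    TEMPLATE_FILE=/var/lib/haproxy/conf/custom/haproxy-config.template
```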
Verified this bug in:

# openshift version
openshift v3.3.0.21
kubernetes v1.3.0+507d3a7
etcd 2.3.0+git

$ for i in {1..10} ; do curl --resolve test-service-default.0816-j34.qe.rhcloud.com:10443:172.18.7.237 https://test-service-default.0816-j34.qe.rhcloud.com:10443 -k ; done
Hello OpenShift!
Hello OpenShift!
Hello OpenShift!
Hello OpenShift!
Hello OpenShift!
Hello OpenShift!
Hello OpenShift!
Hello OpenShift!
Hello OpenShift!
Hello OpenShift!
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1933