Bug 1343083 - not possible to start two router pods on same node
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 3.2.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Assigned To: Ben Bennett
QA Contact: zhaozhanqi
Depends On:
Blocks: 1267746
Reported: 2016-06-06 09:12 EDT by Alexander Koksharov
Modified: 2017-03-08 EST

See Also:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Feature: Added the ability to set the internal SNI port with an environment variable. This allows all ports to be changed so that multiple routers can be run on a single node. Reason: Multiple routers may be needed to support different features (sharding). Result:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-09-27 05:33:43 EDT
Type: Bug
Regression: ---




External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:1933 normal SHIPPED_LIVE Red Hat OpenShift Container Platform 3.3 Release Advisory 2016-09-27 09:24:36 EDT

Description Alexander Koksharov 2016-06-06 09:12:56 EDT
Description of problem:

When two router pods are scheduled to run on the same node (with different listen IPs set through environment variables), the majority of requests to either router fail with error 503.
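For context, the two-routers-on-one-node setup can be created roughly like this. This is a sketch only: the router name, selector-free invocation, and port values are illustrative, and `oadm router` / `oc set env` flag names follow the 3.x client docs (older clients use `oc env`):

```shell
# Deploy a second router, shifting the external ports so they do not
# clash with the default router's 80/443/1936.
oadm router router-two \
    --service-account=router \
    --ports='11080:11080,11443:11443' \
    --stats-port=1938

# Tell the haproxy template to bind the shifted external ports.
oc set env dc/router-two \
    ROUTER_SERVICE_HTTP_PORT=11080 \
    ROUTER_SERVICE_HTTPS_PORT=11443 \
    STATS_PORT=1938
```

As the netstat output below shows, this still leaves the internal SNI/no-SNI ports (10443/10444) hardcoded in the stock template, which is the actual collision.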

Version-Release number of selected component (if applicable):


How reproducible:
Start two router pods on a node


Actual results:
Lots of requests fail with HTTP 503.

Expected results:
All requests are forwarded to the application.

Additional info:

- I see this behavior when two router pods are running on the same node:
> for i in {1..10}; do curl -sSLko /dev/null -w '%{http_code}\n' https://hello-world-cake.apps.alko.lab:11443/; done | grep  200 |  wc -l
2
> for i in {1..10}; do curl -sSLko /dev/null -w '%{http_code}\n' https://hello-world-cake.apps.alko.lab:11443/; done | grep  200 |  wc -l
7

- The output below suggests a race condition when binding to ports 10443 and 10444: both routers try to bind the same hardcoded internal ports, and only one wins.
# netstat -nlp4 | grep haproxy | sort
tcp        0      0 0.0.0.0:11080           0.0.0.0:*               LISTEN      2468/haproxy        
tcp        0      0 0.0.0.0:11443           0.0.0.0:*               LISTEN      2468/haproxy        
tcp        0      0 0.0.0.0:1936            0.0.0.0:*               LISTEN      59318/haproxy       
tcp        0      0 0.0.0.0:1938            0.0.0.0:*               LISTEN      2468/haproxy        
tcp        0      0 0.0.0.0:443             0.0.0.0:*               LISTEN      59318/haproxy       
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      59318/haproxy       
tcp        0      0 127.0.0.1:10443         0.0.0.0:*               LISTEN      59318/haproxy       
tcp        0      0 127.0.0.1:10444         0.0.0.0:*               LISTEN      59318/haproxy       

- I found that the standard template has these ports hardcoded, so I altered the config for one of my routers:
root@master1 # oc get pods
NAME                      READY     STATUS    RESTARTS   AGE
docker-registry-2-qy76m   1/1       Running   0          8d
router-4-2lxqd            1/1       Running   0          8d
router-two-4-v0i7b        1/1       Running   0          9m

root@master1 # oc exec router-two-4-v0i7b  -- cat haproxy.config| grep -P "^\s+bind"
  bind :11080
  bind :11443
  bind 127.0.0.1:20444 ssl no-sslv3 crt /var/lib/haproxy/conf/default_pub_keys.pem crt /var/lib/containers/router/certs accept-proxy
  bind 127.0.0.1:20443 ssl no-sslv3 crt /var/lib/haproxy/conf/default_pub_keys.pem accept-proxy

root@master1 # oc exec router-4-2lxqd  -- cat haproxy.config| grep -P "^\s+bind"
  bind :80
  bind :443
  bind 127.0.0.1:10444 ssl no-sslv3 crt /var/lib/haproxy/conf/default_pub_keys.pem crt /var/lib/containers/router/certs accept-proxy
  bind 127.0.0.1:10443 ssl no-sslv3 crt /var/lib/haproxy/conf/default_pub_keys.pem accept-proxy

- As a result, I now have:
root@worknode1 # netstat -nlp4 | grep haproxy
tcp        0      0 0.0.0.0:11080           0.0.0.0:*               LISTEN      76799/haproxy       
tcp        0      0 127.0.0.1:10443         0.0.0.0:*               LISTEN      76777/haproxy       
tcp        0      0 127.0.0.1:10444         0.0.0.0:*               LISTEN      76777/haproxy       
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      76777/haproxy       
tcp        0      0 0.0.0.0:1936            0.0.0.0:*               LISTEN      76777/haproxy       
tcp        0      0 0.0.0.0:1938            0.0.0.0:*               LISTEN      76799/haproxy       
tcp        0      0 0.0.0.0:11443           0.0.0.0:*               LISTEN      76799/haproxy       
tcp        0      0 127.0.0.1:20443         0.0.0.0:*               LISTEN      76799/haproxy       
tcp        0      0 0.0.0.0:443             0.0.0.0:*               LISTEN      76777/haproxy       
tcp        0      0 127.0.0.1:20444         0.0.0.0:*               LISTEN      76799/haproxy       

alko@localhost > for i in {1..10}; do curl -sSLko /dev/null -w '%{http_code}\n' https://hello-world-cake.apps.alko.lab/; done | grep  200 |  wc -l
10
Comment 1 Aleks Lazic 2016-06-06 09:16:49 EDT
Hi.

Florian has fixed it like this.

https://github.com/git001/openshift_custom_haproxy_ext/pull/1

BR Aleks
Comment 2 Aleks Lazic 2016-06-06 09:43:40 EDT
(In reply to Aleks Lazic from comment #1)
> Hi.
> 
> Florian have fixed it like this.
> 
> https://github.com/git001/openshift_custom_haproxy_ext/pull/1
> 
> BR Aleks

I have added a PR to origin.

https://github.com/openshift/origin/pull/9175

BR Aleks
Comment 3 Josep 'Pep' Turro Mauri 2016-06-07 03:41:05 EDT
There is a similar report in bug 1268904: it's for a different pair of ports, but essentially the same thing I believe. Wondering if we should mark this as a duplicate and make 1268904 handle all the hardcoded values.
Comment 4 Aleks Lazic 2016-06-07 04:04:02 EDT
Well, if it gets fixed faster that way, I'm in.
Comment 5 Ben Bennett 2016-06-08 13:55:01 EDT
No, 1268904 has already merged and is slightly different.  This PR has been reviewed and should be merged shortly, so let's keep this as a separate bug for now.
Comment 7 Eric Rich 2016-07-06 08:50:14 EDT
Ben / Ram, 

I think we can move this to POST, as https://github.com/openshift/origin/commit/5d25a1da3da43bdb74decf641e91ce0245490438 is merged upstream and is designed to fix this?
Comment 9 Ben Bennett 2016-07-08 13:46:48 EDT
(In reply to Eric Rich from comment #7)
> Ben / Ram, 
> 
> I think we can move this to POST? As
> https://github.com/openshift/origin/commit/
> 5d25a1da3da43bdb74decf641e91ce0245490438 is merged upstream, and is deigned
> to fix this?

That is correct.
Comment 10 Aleks Lazic 2016-07-11 15:15:03 EDT
Does this mean that we can expect this template in OpenShift Enterprise with the next update?

A more concrete question: what does POST mean for end-users like the RH OSE customers out there?
Comment 11 Jaspreet Kaur 2016-07-22 06:19:38 EDT
Hello,

Can we have an ETA as to when this is expected to be fixed?

Regards,
Jaspreet
Comment 13 Ben Bennett 2016-07-22 09:59:42 EDT
It should be in 3.3.

As a work-around, on 3.2 you can replace the template in a router without rebuilding an image.  You can do that by making a ConfigMap that contains the changed template and then changing the router DC.  So, you'd pull the current router image and then apply the change in https://github.com/openshift/origin/commit/5d25a1da3da43bdb74decf641e91ce0245490438 to the new template.

A guide:
https://github.com/openshift/openshift-docs/blob/master/install_config/install/deploy_router.adoc#using-configmap-replace-template
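The ConfigMap workaround described above can be sketched as follows. Resource names and the mount path are illustrative; the `TEMPLATE_FILE` variable is the documented hook for custom router templates, but verify flag spellings against your 3.2 client:

```shell
# Extract the current template from a running router pod.
oc exec router-4-2lxqd -- cat haproxy-config.template > haproxy-config.template

# Hand-apply the port changes from the upstream commit to that file,
# then wrap the edited template in a ConfigMap.
oc create configmap customrouter --from-file=haproxy-config.template

# Mount the ConfigMap into the router pod and point the router at it.
oc set volume dc/router --add --overwrite \
    --name=config-volume \
    --mount-path=/var/lib/haproxy/conf/custom \
    --source='{"configMap": {"name": "customrouter"}}'
oc set env dc/router \
    TEMPLATE_FILE=/var/lib/haproxy/conf/custom/haproxy-config.template
```

The router redeploys with the patched template, so no custom image build is needed.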
Comment 14 zhaozhanqi 2016-08-16 05:08:33 EDT
Verified this bug in:

# openshift version
openshift v3.3.0.21
kubernetes v1.3.0+507d3a7
etcd 2.3.0+git


$ for i in {1..10} ; do curl --resolve test-service-default.0816-j34.qe.rhcloud.com:10443:172.18.7.237 https://test-service-default.0816-j34.qe.rhcloud.com:10443 -k ; done
Hello OpenShift!
Hello OpenShift!
Hello OpenShift!
Hello OpenShift!
Hello OpenShift!
Hello OpenShift!
Hello OpenShift!
Hello OpenShift!
Hello OpenShift!
Hello OpenShift!
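The verified behaviour relies on the internal SNI/no-SNI ports now being overridable per router. A minimal sketch, assuming the environment variable names added by the upstream fix (port values are illustrative):

```shell
# Move every port of the second router, including the internal
# SNI (pass-through) and no-SNI loopback ports that were previously
# hardcoded to 10444/10443 in the template.
oc set env dc/router-two \
    ROUTER_SERVICE_HTTP_PORT=11080 \
    ROUTER_SERVICE_HTTPS_PORT=11443 \
    ROUTER_SERVICE_SNI_PORT=11444 \
    ROUTER_SERVICE_NO_SNI_PORT=11445 \
    STATS_PORT=1938
```

With no overlapping binds, both haproxy instances start cleanly and the intermittent 503s disappear.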
Comment 16 errata-xmlrpc 2016-09-27 05:33:43 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1933
