Bug 1390641

Summary: receiving unexpected SSL certificate responses with multiple OpenShift routers running
Product: OpenShift Container Platform
Reporter: Joel Diaz <jdiaz>
Component: Networking
Assignee: Ben Bennett <bbennett>
Networking sub component: router
QA Contact: zhaozhanqi <zzhao>
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
CC: aos-bugs, bmeng, gburges, jgoulding, spurtell, twiest
Version: 3.2.1   
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2016-11-01 19:32:59 UTC
Type: Bug

Description Joel Diaz 2016-11-01 15:09:03 UTC
Description of problem:
After setting up a second OpenShift router that listens on non-default ports and serves a separate SSL certificate, requests to a route through the default (original) router intermittently receive responses with the SSL certificate of the new router.

Scaling the new OpenShift router down to zero makes the issue go away.

Version-Release number of selected component (if applicable):
[root@ip-172-31-48-106 ~]# rpm -qa | grep atomic-openshift
atomic-openshift-clients-3.2.1.17-1.git.0.6d01b60.el7.x86_64
atomic-openshift-master-3.2.1.17-1.git.0.6d01b60.el7.x86_64
tuned-profiles-atomic-openshift-node-3.2.1.17-1.git.0.6d01b60.el7.x86_64
atomic-openshift-3.2.1.17-1.git.0.6d01b60.el7.x86_64
atomic-openshift-node-3.2.1.17-1.git.0.6d01b60.el7.x86_64
atomic-openshift-sdn-ovs-3.2.1.17-1.git.0.6d01b60.el7.x86_64

How reproducible:
Close to 50% of responses carry the wrong SSL certificate.


Steps to Reproduce:
1. Define a second OpenShift router (with a distinct SSL certificate):
oadm router router2 --default-cert=... --ports='8080:8080,8443:8443' --replicas=0 --stats-port=1937
2. Update the router2 deploymentconfig: fix up the 'ROUTER_SERVICE_HTTP_PORT' and 'ROUTER_SERVICE_HTTPS_PORT' values, and add the env var 'ROUTE_LABELS' set to 'route=external'.
3. Scale up router2, then curl services through the original OpenShift router.

Actual results:

[root@ip-172-31-48-106 ~]# while `true` ; do curl -i -v https://nodejs-ssl-sharding.ab49.opstest.openshiftapps.com 2>&1 | grep subject ; sleep 2 ; done         
*       subject: CN=*.d234.opstest.openshiftapps.com,OU=RHC Cloud Operations,O=Red Hat Inc.,L=Raleigh,ST=North Carolina,C=US
*       subject: CN=*.ab49.opstest.openshiftapps.com,O=Red Hat Inc.,L=Raleigh,ST=North Carolina,C=US
*       subject: CN=*.d234.opstest.openshiftapps.com,OU=RHC Cloud Operations,O=Red Hat Inc.,L=Raleigh,ST=North Carolina,C=US
*       subject: CN=*.ab49.opstest.openshiftapps.com,O=Red Hat Inc.,L=Raleigh,ST=North Carolina,C=US
*       subject: CN=*.d234.opstest.openshiftapps.com,OU=RHC Cloud Operations,O=Red Hat Inc.,L=Raleigh,ST=North Carolina,C=US
*       subject: CN=*.ab49.opstest.openshiftapps.com,O=Red Hat Inc.,L=Raleigh,ST=North Carolina,C=US
*       subject: CN=*.ab49.opstest.openshiftapps.com,O=Red Hat Inc.,L=Raleigh,ST=North Carolina,C=US
*       subject: CN=*.d234.opstest.openshiftapps.com,OU=RHC Cloud Operations,O=Red Hat Inc.,L=Raleigh,ST=North Carolina,C=US
^C
[root@ip-172-31-48-106 ~]# oc scale --replicas=0 rc router2-1
replicationcontroller "router2-1" scaled
[root@ip-172-31-48-106 ~]# while `true` ; do curl -i -v https://nodejs-ssl-sharding.ab49.opstest.openshiftapps.com 2>&1 | grep subject ; sleep 2 ; done
*       subject: CN=*.ab49.opstest.openshiftapps.com,O=Red Hat Inc.,L=Raleigh,ST=North Carolina,C=US
*       subject: CN=*.ab49.opstest.openshiftapps.com,O=Red Hat Inc.,L=Raleigh,ST=North Carolina,C=US
*       subject: CN=*.ab49.opstest.openshiftapps.com,O=Red Hat Inc.,L=Raleigh,ST=North Carolina,C=US
*       subject: CN=*.ab49.opstest.openshiftapps.com,O=Red Hat Inc.,L=Raleigh,ST=North Carolina,C=US
*       subject: CN=*.ab49.opstest.openshiftapps.com,O=Red Hat Inc.,L=Raleigh,ST=North Carolina,C=US
*       subject: CN=*.ab49.opstest.openshiftapps.com,O=Red Hat Inc.,L=Raleigh,ST=North Carolina,C=US
*       subject: CN=*.ab49.opstest.openshiftapps.com,O=Red Hat Inc.,L=Raleigh,ST=North Carolina,C=US
*       subject: CN=*.ab49.opstest.openshiftapps.com,O=Red Hat Inc.,L=Raleigh,ST=North Carolina,C=US
*       subject: CN=*.ab49.opstest.openshiftapps.com,O=Red Hat Inc.,L=Raleigh,ST=North Carolina,C=US
*       subject: CN=*.ab49.opstest.openshiftapps.com,O=Red Hat Inc.,L=Raleigh,ST=North Carolina,C=US
*       subject: CN=*.ab49.opstest.openshiftapps.com,O=Red Hat Inc.,L=Raleigh,ST=North Carolina,C=US
*       subject: CN=*.ab49.opstest.openshiftapps.com,O=Red Hat Inc.,L=Raleigh,ST=North Carolina,C=US
*       subject: CN=*.ab49.opstest.openshiftapps.com,O=Red Hat Inc.,L=Raleigh,ST=North Carolina,C=US
*       subject: CN=*.ab49.opstest.openshiftapps.com,O=Red Hat Inc.,L=Raleigh,ST=North Carolina,C=US
^C
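
To put a number on the split seen in the loops above, the subject lines can be tallied instead of read by eye. The following is a minimal sketch, not part of the original report: `tally_subjects` is a hypothetical helper name, and it assumes the input is `curl -v` output in which the certificate line contains `subject:`, as in the transcript.

```shell
# Hypothetical helper: count how often each certificate subject appears in
# `curl -v` output read from stdin. Prints "<count>/<total> <subject>" lines.
tally_subjects() {
    awk '
        /subject:/ {
            sub(/^.*subject:[ \t]*/, "")   # drop the curl prefix, keep the subject
            count[$0]++
            total++
        }
        END {
            for (s in count)
                printf "%d/%d %s\n", count[s], total, s
        }
    ' | sort
}

# Example use (HOST is a placeholder for the route hostname):
#   for i in $(seq 1 20); do curl -sv "https://HOST" 2>&1; sleep 2; done | tally_subjects
```

With both routers running, two subjects would be expected in the output; with router2 scaled to zero, only one.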


Expected results:

With the second OpenShift router running, curl requests to the original services should only ever see the original router's SSL certificate.


Additional info:
[root@ip-172-31-48-106 ~]# oc describe route -n sharding nodejs-ssl
Name:                   nodejs-ssl
Created:                16 minutes ago
Labels:                 template=nodejs-example
Annotations:            openshift.io/host.generated=true
Requested Host:         nodejs-ssl-sharding.ab49.opstest.openshiftapps.com
                          exposed on router router 16 minutes ago
Path:                   <none>
TLS Termination:        edge
Insecure Policy:        <none>
Service:                nodejs-example
Endpoint Port:          web
Endpoints:              10.1.2.2:8080

Comment 1 Ben Bennett 2016-11-01 15:45:54 UTC
You also need to set the ROUTER_SERVICE_SNI_PORT and ROUTER_SERVICE_NO_SNI_PORT variables, otherwise the two routers will get tangled.

Documented in PR https://github.com/openshift/openshift-docs/pull/3090, but it hasn't landed yet.
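
As a rough sketch of what setting those variables could look like, the helper below only prints a command for review and runs nothing. It is an illustration, not taken from the linked PR: `sni_env_command` is a hypothetical name, it assumes the `oc env` subcommand (newer clients spell it `oc set env`), and it assumes a router image that actually honors these variables.

```shell
# Hypothetical helper: print (not run) the command that would give one
# router deploymentconfig its own internal no-SNI / SNI ports.
sni_env_command() {
    dc_name=$1      # deploymentconfig name, e.g. router2
    no_sni_port=$2  # value for ROUTER_SERVICE_NO_SNI_PORT
    sni_port=$3     # value for ROUTER_SERVICE_SNI_PORT

    # The two internal ports of a single router must not be the same.
    if [ "$no_sni_port" = "$sni_port" ]; then
        echo "error: SNI and no-SNI ports must differ" >&2
        return 1
    fi

    echo "oc env dc/$dc_name ROUTER_SERVICE_NO_SNI_PORT=$no_sni_port ROUTER_SERVICE_SNI_PORT=$sni_port"
}

# Example: sni_env_command router2 11443 11444
```

Each router would need a port pair that no other router on the same host uses; the port numbers in the example are illustrative.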

Comment 2 Joel Diaz 2016-11-01 16:31:18 UTC
So I set the environment vars, and I'm still seeing the intermittent wrong SSL cert responses.

[root@ip-172-31-48-106 ~]# oc get dc router -o yaml | grep SNI -A1
        - name: ROUTER_SERVICE_NO_SNI_PORT
          value: "10443"
        - name: ROUTER_SERVICE_SNI_PORT
          value: "10444"
[root@ip-172-31-48-106 ~]# oc get dc router2 -o yaml | grep SNI -A1
        - name: ROUTER_SERVICE_NO_SNI_PORT
          value: "11443"
        - name: ROUTER_SERVICE_SNI_PORT
          value: "11444"
[root@ip-172-31-48-106 ~]# oc get dc router2 -o yaml | grep haproxy-router
        image: registry.ops.openshift.com/openshift3/ose-haproxy-router:v3.2.1.17
[root@ip-172-31-48-106 ~]# curl -i -v https://nodejs-ssl-sharding.ab49.opstest.openshiftapps.com 2>&1 | grep subject
*       subject: CN=*.ab49.opstest.openshiftapps.com,O=Red Hat Inc.,L=Raleigh,ST=North Carolina,C=US
[root@ip-172-31-48-106 ~]# curl -i -v https://nodejs-ssl-sharding.ab49.opstest.openshiftapps.com 2>&1 | grep subject
*       subject: CN=*.d234.opstest.openshiftapps.com,OU=RHC Cloud Operations,O=Red Hat Inc.,L=Raleigh,ST=North Carolina,C=US

Comment 3 Joel Diaz 2016-11-01 18:28:29 UTC
FYI, tried this on a 3.3 cluster, and after adding the environment vars, I haven't seen the wrong SSL cert responses.

[root@ip-172-31-55-141 ~]# rpm -qa | grep atomic-openshift
tuned-profiles-atomic-openshift-node-3.3.1.3-1.git.0.86dc49a.el7.x86_64
atomic-openshift-clients-3.3.1.3-1.git.0.86dc49a.el7.x86_64
atomic-openshift-master-3.3.1.3-1.git.0.86dc49a.el7.x86_64
atomic-openshift-3.3.1.3-1.git.0.86dc49a.el7.x86_64
atomic-openshift-node-3.3.1.3-1.git.0.86dc49a.el7.x86_64
atomic-openshift-sdn-ovs-3.3.1.3-1.git.0.86dc49a.el7.x86_64
[root@ip-172-31-55-141 ~]# oc get dc router -n default -o yaml | grep SNI -A1   
[root@ip-172-31-55-141 ~]# oc get dc router2 -n default -o yaml | grep SNI -A1  
        - name: ROUTER_SERVICE_NO_SNI_PORT
          value: "11443"
        - name: ROUTER_SERVICE_SNI_PORT
          value: "11444"
[root@ip-172-31-55-141 ~]# oc get dc router2 -n default -o yaml | grep haproxy  
        image: registry.ops.openshift.com/openshift3/ose-haproxy-router:v3.3.1.3

Comment 4 Ben Bennett 2016-11-01 19:32:59 UTC
Ah, sorry... we added those in 3.3.

Your best option is to upgrade. Failing that, you could use a 3.3 router image with a 3.2 release.