Bug 1387597

Summary: [ocp-on-osp] OpenShift would not work if the first master is down when selecting External loadbalancer for the stack
Product: OpenShift Container Platform Reporter: Gan Huang <ghuang>
Component: Installer Assignee: Jan Provaznik <jprovazn>
Status: CLOSED CURRENTRELEASE QA Contact: Gan Huang <ghuang>
Severity: high Docs Contact:
Priority: high    
Version: 3.3.0 CC: aos-bugs, jokerman, jprovazn, mmccomas
Target Milestone: --- Keywords: Reopened
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-03-20 08:37:42 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Gan Huang 2016-10-21 10:52:41 UTC
Description of problem:
When a Heat stack is created with an external load balancer, the whole cluster goes down if the first master goes down for any reason. The root cause is that the certificate data is created with the first master by default when openshift_master_cluster_hostname and openshift_master_cluster_public_hostname are not specified in the inventory hosts file. As a result, the cluster stops working whenever the first master (either the whole instance or the atomic-openshift-master-api service) is down.
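A minimal sketch of how an inventory could be checked for the two variables named above. The [OSEv3:vars] fragment and the lb.example.com hostname are placeholders for illustration, not values from this stack, and the helper name missing_cluster_vars is hypothetical.

```python
import re

# Variables named in the description; when they are unset, the
# certificate data is created with the first master.
REQUIRED_VARS = (
    "openshift_master_cluster_hostname",
    "openshift_master_cluster_public_hostname",
)


def missing_cluster_vars(inventory_text):
    """Return the required variables that the inventory text never assigns."""
    missing = []
    for name in REQUIRED_VARS:
        # Match "name=value" at the start of a line; commented-out lines
        # start with "#" and therefore do not match.
        pattern = r"^[ \t]*" + re.escape(name) + r"[ \t]*=[ \t]*\S+"
        if not re.search(pattern, inventory_text, flags=re.MULTILINE):
            missing.append(name)
    return missing


# Hypothetical inventory fragment; lb.example.com is a placeholder.
SAMPLE_INVENTORY = """\
[OSEv3:vars]
openshift_master_cluster_hostname=lb.example.com
# openshift_master_cluster_public_hostname=lb.example.com
"""

print(missing_cluster_vars(SAMPLE_INVENTORY))
```

In this sample the public hostname is only present as a comment, so it is reported as missing.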

Version-Release number of selected component (if applicable):
v0.9.4

How reproducible:
always

Steps to Reproduce:
1. Create a Heat stack with HA masters and an external load balancer
2. Shut down the first master (usually named "*master-0")

Actual results:
# oc get po
Unable to connect to the server: dial tcp 192.168.10.7:8443: i/o timeout

Note: 192.168.10.7 is the IP address of the first master.

On the nodes, the client-certificate-data is created with the first master, and the kubeconfig server entry points at it:
[root@ghuang-test6-ha-ocp-node-42q26tq1 ~]# cat /etc/origin/node/system\:node\:ghuang-test6-ha-ocp-node-42q26tq1.test.com.kubeconfig 
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUM1akNDQWRDZ0F3SUJBZ0lCQVRBTEJna3Foa2lHOXcwQkFRc3dKakVrTUNJR0ExVUVBd3diYjNCbGJuTm8KYVdaMExYTnBaMjVsY2tBeE5EYzNNRFEwTURreE1CNFhEVEUyTVRBeU1URXdNREV6TWxvWERUSXhNVEF5TURFdwpNREV6TTFvd0pqRWtNQ0lHQTFVRUF3d2JiM0JsYm5Ob2FXWjBMWE5wWjI1bGNrQXhORGMzTURRME1Ea3hNSUlCCklqQU5CZ2txaGtpRzl3MEJBUUVGQUFPQ0FROEFNSUlCQ2dLQ0FRRUFzWDlKbVdpWDVMcUVkRXRyZjVVU3R3VzMKWTBEa2hhZGFPbC9FWmRjRTVKbFdveGJCRGJweWQzOC9XNW1EWTl0cGgvTUlpdkxJU054b1ZtVTQwSXBMK0EwQgoyNEcrR0RQK25lMlR3Y1JkSS83RXk1bVhETER3cWlvN1RXMlBXY2lGRWdyQTNheUJhbWh3N1BiazVhZ21ESWE2ClFvaU1rVHlNOHViZy9XSy9TS0crMHkzQklGelE4Rk8rWlVtK25QNmMwVnl0NEVXdW5WS0RKc3RIZlp0eVJjUWgKTFlSWXI3UDl6bFp3NlRtN0tTTHVnOXk5WVJGKzA3L0VaMFU1a254MEhnVFFTK2ZERHk1S2I1eUVnbWtGRit4ZgpwT2tuWFhIb1FVSXlUT1I0T1BSSG5yQ25DQWZmb3I4K05LWElUcWJlQkNOQ2lqc3Z2Q3M5YzNxRGw0MkRTUUlECkFRQUJveU13SVRBT0JnTlZIUThCQWY4RUJBTUNBS1F3RHdZRFZSMFRBUUgvQkFVd0F3RUIvekFMQmdrcWhraUcKOXcwQkFRc0RnZ0VCQUNyZ3RXS3hmR25Da3g0RlVvM2xnc0doRnpwUCtKUzY4aU03cVJOWC83eXcwRUp0RWYrcgpxZXRJVjNRempJTi92dnlLNWhLUTU3OUd0TjYrcWdBMHNMa1J1TGIwSnJpTUh3bGdTN3dJY29IQUhxUHd2eG1xCjRTbWlidzZEY3J3YkZ1QWtqRTdRS0gwRGM1NWxzbWkrWEZnRHkzVlJhbm16NW1HbzJXaWZvblNDcjJCL2JQLysKMTVVcG5HOHlLMS9uYVJyS2tvL0xnTEpXcm9pb0QvbE11eTdPNFJOQlp0eFB6S29rY1MvNGpUdUxkK0Qya1Q2TgpkWFZGM3RidE1HMzBVVkRmZHMwS0ZFRDVScHYycjFGVE1WZ2NMaHJteFJzR0YzRHpZS0IwN2lycHZ4anRDb2VECjhsdmJyR29ROTl6RHNkYTdnSTZQZTlIcEdqUWVpZjBYeWRVPQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
    server: https://ghuang-test6-ha-ocp-master-0.test.com:8443
  name: ghuang-test6-ha-ocp-master-0-test-com:8443
contexts:
- context:
    cluster: ghuang-test6-ha-ocp-master-0-test-com:8443
    namespace: default
    user: system:node:ghuang-test6-ha-ocp-node-42q26tq1.test.com/ghuang-test6-ha-ocp-master-0-test-com:8443
  name: default/ghuang-test6-ha-ocp-master-0-test-com:8443/system:node:ghuang-test6-ha-ocp-node-42q26tq1.test.com
current-context: default/ghuang-test6-ha-ocp-master-0-test-com:8443/system:node:ghuang-test6-ha-ocp-node-42q26tq1.test.com
kind: Config
preferences: {}
users:
- name: system:node:ghuang-test6-ha-ocp-node-42q26tq1.test.com/ghuang-test6-ha-ocp-master-0-test-com:8443
  user:
    client-certificate-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURLakNDQWhTZ0F3SUJBZ0lCRlRBTEJna3Foa2lHOXcwQkFRc3dKakVrTUNJR0ExVUVBd3diYjNCbGJuTm8KYVdaMExYTnBaMjVsY2tBeE5EYzNNRFEwTURreE1CNFhEVEUyTVRBeU1URXdNRFl6TlZvWERURTRNVEF5TVRFdwpNRFl6Tmxvd1dERVZNQk1HQTFVRUNoTU1jM2x6ZEdWdE9tNXZaR1Z6TVQ4d1BRWURWUVFERXpaemVYTjBaVzA2CmJtOWtaVHBuYUhWaGJtY3RkR1Z6ZERZdGFHRXRiMk53TFc1dlpHVXROREp4TWpaMGNURXVkR1Z6ZEM1amIyMHcKZ2dFaU1BMEdDU3FHU0liM0RRRUJBUVVBQTRJQkR3QXdnZ0VLQW9JQkFRQ3JMN3k5eEQ5OHhVT3FrNDZIQlhSMgp3S1VyN25meHAwWXJZellKdVBWV1FYT25jS0lKNnpubWxDTStKSzZ4QkFXMWF2U2lwNWxoN1FtVFcyNXZQK25TCldEanJ5TmpJaUJEbXI4eDE1RUtFSDQ0bVIwYVQ1YkRlSzh0TTNicE1jTjBEWllnM2RockF2d0xXdmJBZU9vakEKTjRWRUcxVmtUWFE3L2JsMHJYQ3kvNnpHT1ZWNHFxRTZ4aE8vcXBweFkwUHZORDM0MEdYQmtGUVM5QW9lSzk2eApPMG1zOUllL3liTjExSjFGVGhha01IS2FGNGNLdHN1b1FhSjNHSWZvZjV4NWVXcWNaN1o3RGVsS2tIYVlkYk92CjhON3RkdS9BNUtwQUpTRVI0YW5nVTZvcHhaSHJMNk4yOC9GUWxZZGRVQmV6YjI0bkhQdmJSWVVUK0dOaUdxdjUKQWdNQkFBR2pOVEF6TUE0R0ExVWREd0VCL3dRRUF3SUFvREFUQmdOVkhTVUVEREFLQmdnckJnRUZCUWNEQWpBTQpCZ05WSFJNQkFmOEVBakFBTUFzR0NTcUdTSWIzRFFFQkN3T0NBUUVBVndQRHg4dWljQUZEeUNpTEhwNG5WMUZnCnBwczFRSTFvSGZiM1lrL1VsYzFESWtGQzVRUStyR0RjWEs1TXJlZDdrMEFDdjJlZ0djM0t1OWdpKytRcC9UTHEKcmd3bTh0dllMZ1lQblNtUjNLMkJhbW9EaHBBS2ZFbmZ4bU1KaUpWTzcraFpZL3R6RElmMTBOOTVmcWpEZzBXZQo0MjJVL1Z5UmJ2cTlkaXVjTGpVNjYzU3dqb3J4YkdwUnU5bXlwTEtQYmdoSHFzQlpoSHlzZVF5Vk42UUF5dmJ0CjdRNkdIYXplUTZObnl0U09VNTFoUVVTaElwZ0tUcmEvMU9ZOVhCRkFyaGtaTEFlQ2hwM0tpRlo5L1FHYlZNRlUKeUpZL3ViYU1IUE53MENTaTB3VG5CZUlCZ0thaFJVREEzZFNwQ1BudG1DZi9XREgwREJHc241YkIxaVRhVVE9PQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
    client-key-data: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFb2dJQkFBS0NBUUVBcXkrOHZjUS9mTVZEcXBPT2h3VjBkc0NsSys1MzhhZEdLMk0yQ2JqMVZrRnpwM0NpCkNlczU1cFFqUGlTdXNRUUZ0V3Iwb3FlWlllMEprMXR1YnovcDBsZzQ2OGpZeUlnUTVxL01kZVJDaEIrT0prZEcKaytXdzNpdkxUTjI2VEhEZEEyV0lOM1lhd0w4QzFyMndIanFJd0RlRlJCdFZaRTEwTy8yNWRLMXdzditzeGpsVgplS3FoT3NZVHY2cWFjV05EN3pROStOQmx3WkJVRXZRS0hpdmVzVHRKclBTSHY4bXpkZFNkUlU0V3BEQnltaGVICkNyYkxxRUdpZHhpSDZIK2NlWGxxbkdlMmV3M3BTcEIybUhXenIvRGU3WGJ2d09TcVFDVWhFZUdwNEZPcUtjV1IKNnkramR2UHhVSldIWFZBWHMyOXVKeHo3MjBXRkUvaGpZaHFyK1FJREFRQUJBb0lCQUNrN09VR1h5QmJjU0gwSQpSMWI4R0Y0VjduS1RZRzVpOU1LMGhhcDMweGV3Y2hQTlRDb0pid3U3ZUhXYVRqMHlrOUZyYm5yUzFWM0J3d0dzCkR3QmFxNDNQVS81dWhOQmYvWG9pczZOZGxDdlFrZU5rWFhwMzQwN1B5NHE3Q1FrcVVnRmtiaGUxcWFIdEg5anIKSFVWYW9kOXlQL1gwZzIvQ1BCSEsvZVU5ZFJ5WGxTeUpFMXJmdndDUSs2RHZOOFNmNmpKcDQ5aHFpNkZOSWJncQpHeGpPN1FVZUFOZ3dMTFZnd1FETDE1eTJLeTg2ZnJMRmdlYXpiSTBPWFVRaW9QWU1pMDhVS1BOeVpvMm5IL1VxCmg5QjROR3FGUUZXWXNWcHljMCtFUjJOa2RwcU9oOUw0dEVmWllnTk9wWlNrblZIbDI4QlpYOUNZZGJRRnN4SksKb3RiQ0xCVUNnWUVBd0wzV1J3eWtNNzVVclpBSkVSZUZVN2JsWnFMOXpHQkpZcURQaHFSSWVad1RmM3hJRjc2RQpKS2FHL1VCcWwyY2ZhUFBNUjRCUjdEeHZtNGNWNVJ1TmZseW1SamVSdWdVckE3RzRWTjcvbkdTTzhPazh5SW5uCmJYeDNpR0MrSVhIcTRnNTdwODNqRlo5UmRtc2E1OW5RNjhvU2VvUllwZzhBSlJrbWtIQ3dRd2NDZ1lFQTQxN1gKTmY2WjRoSmNtc2c2WE9aM0hwMDI2bDdBbk5wbGYxVHJ0dExQa0NnaFNYdEdsekhhWmxwMG4yQ3JZbnBMWC9EYwpVcWg5UG9QcjhDalN5cDd0WkhTenZtZmM2VVlZNkt4NTdBOHdYem0xM0NGdmFlYVJVOE1OWVk1ckt1YjBlZ3FqCi9CdDg4MXI1RmVzN3NyazlkbERwbTIxTzVYTFNsVGt1cTJXMjJQOENnWUEwMzE2Nm10TW9ocHZBQ1BVVHhUb0QKM3ZaTEU0Yy8yMklHTmtyM2luVi9OcnQ2aTJOVGNDWGJ6L3JUMmluallweVJNOS9qOVdXRHdvaHpSN2xQNGlFTQpldW41OVNCNndSUXRyVUQ5dHphemRqcG9CL051cDdYZXFQZzVaeUNCR0Rqd3pqeEpxZ2NUVldNSmN4UXNhZW9QCjVKenhFd0VtZkpMem1sU2o1dVhUWFFLQmdDS1B4aEwxRXBza3cyTGIwTk5TVFFVZ1RMcXZrSVBIUnVwbUZEYUUKTVB6dXZMQ1l4cEF4Q2N2Sk1EVVIwcnR6YjRXejdTbTdadDViMno5MFZTWnJwaFpCRHhtQVhEb3haNVBtczluSQpMVWdzVTVLVW1vVDBnVjdFSllLUXpZV0YrZCtiUW5ZT0Q1NUdVOXFiR1VYL2xuSW50bnJqMEx4Y0NkcVpDSmtSCkt3d3RBb0dBYWZ1dXQra0dYOHE1d09
wMUdJVjIva3RwTXk1ekFlNUZJQUhsSjlMTERqZTl2Z3FDZHJPaHpxTHQKWFhLc3g0Um9mSlViRmovL3E2dTZldUFXWHVmQi9SazUrLzVLVkVTa2lrb3dYTUVOeUJwWlNqcmlmZVM3TjRvNQpIUUVnN2ZGM0F4QUptWDNzT0pJeWJQWDlJMjlEUGxyMHphU1ZmdjRsU25SdmtGaktBdnc9Ci0tLS0tRU5EIFJTQSBQUklWQVRFIEtFWS0tLS0tCg==


Once the first master is recovered, OpenShift resumes working.
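A rough sketch of how the symptom could be detected on a node: extract the server entries from a kubeconfig and compare them with the load balancer hostname. The sample reuses the server line from the kubeconfig above; lb.test.com is an assumed load balancer name, and both function names are hypothetical.

```python
import re
from urllib.parse import urlsplit


def kubeconfig_server_hosts(kubeconfig_text):
    """Return the hostname of every `server:` URL found in a kubeconfig."""
    urls = re.findall(
        r"^[ \t]*server:[ \t]*(\S+)", kubeconfig_text, flags=re.MULTILINE
    )
    return [urlsplit(url).hostname for url in urls]


def pinned_to_other_host(kubeconfig_text, lb_hostname):
    """True if any cluster entry targets a host other than the load balancer."""
    hosts = kubeconfig_server_hosts(kubeconfig_text)
    return any(host != lb_hostname for host in hosts)


# Server line taken from the node kubeconfig in this report.
SAMPLE_KUBECONFIG = """\
clusters:
- cluster:
    server: https://ghuang-test6-ha-ocp-master-0.test.com:8443
  name: ghuang-test6-ha-ocp-master-0-test-com:8443
"""

print(kubeconfig_server_hosts(SAMPLE_KUBECONFIG))
# lb.test.com is an assumed load balancer hostname, not from the report.
print(pinned_to_other_host(SAMPLE_KUBECONFIG, "lb.test.com"))
```

A node whose kubeconfig reports the master-0 hostname here would lose API access when that master is shut down, which matches the timeout shown in the actual results.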

Expected results:


Additional info:

Comment 1 Jan Provaznik 2016-10-25 18:10:00 UTC
Fixed in this PR:
https://github.com/redhat-openstack/openshift-on-openstack/pull/294

Comment 2 Jan Provaznik 2016-10-28 17:55:35 UTC
Fixed in 0.9.5

Comment 3 Gan Huang 2016-11-03 10:37:54 UTC
Verified with v0.9.5.

1. Create a stack that uses an external load balancer (the hostname can be resolved via the DNS nameserver)

2. app can be created successfully, and route can be accessed

3. Shut down the first master after creating the stack

4. app can be created successfully, and route can be accessed

5. Scale up a node

6. app can be created successfully, and route can be accessed

7. Scale down a node

8. app can be created successfully, and route can be accessed

9. Recover the first master

10. app can be created successfully, and route can be accessed