Bug 1387597 - [ocp-on-osp] OpenShift would not work if the first master is down when selecting External loadbalancer for the stack
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Jan Provaznik
QA Contact: Gan Huang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-10-21 10:52 UTC by Gan Huang
Modified: 2017-03-20 08:37 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-20 08:37:42 UTC
Target Upstream Version:



Description Gan Huang 2016-10-21 10:52:41 UTC
Description of problem:
When a heat stack is created with an external loadbalancer, the whole cluster goes down if the first master goes down. The root cause is that the certificate data is created against the first master by default when openshift_master_cluster_hostname and openshift_master_cluster_public_hostname are not specified in the inventory hosts file. As a result, the cluster stops working once the first master (either the whole instance or its atomic-openshift-master-api service) is down.
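
To make the root cause easier to check, here is a minimal sketch that scans inventory text for the two variables named above and reports which ones are missing. The function name, the sample inventory, and the hostname lb.example.com are hypothetical; the sketch only looks for line-leading `name=value` assignments and does not reproduce how the installer itself reads the inventory.

```python
import re

# Inventory variables named in the bug description.
REQUIRED_VARS = (
    "openshift_master_cluster_hostname",
    "openshift_master_cluster_public_hostname",
)


def missing_cluster_vars(inventory_text):
    """Return the required variables that are not assigned in the inventory text."""
    assigned = set()
    for line in inventory_text.splitlines():
        stripped = line.strip()
        # Skip blank lines, comments, and section headers such as [OSEv3:vars].
        if not stripped or stripped.startswith(("#", ";", "[")):
            continue
        # Only count assignments that start the line (group-level variables).
        match = re.match(r"([A-Za-z_][A-Za-z0-9_]*)\s*=", stripped)
        if match:
            assigned.add(match.group(1))
    return [name for name in REQUIRED_VARS if name not in assigned]


# Hypothetical inventory fragment; lb.example.com is a placeholder hostname.
example_inventory = """
[OSEv3:vars]
openshift_master_cluster_method=native
openshift_master_cluster_hostname=lb.example.com
"""

print(missing_cluster_vars(example_inventory))
```

A non-empty result means at least one of the two hostnames would fall back to the default behaviour described in this report.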

Version-Release number of selected component (if applicable):
v0.9.4

How reproducible:
always

Steps to Reproduce:
1. Create a heat stack with HA masters and an external loadbalancer
2. Shut down the first master (usually named "*master-0")

Actual results:
# oc get po
Unable to connect to the server: dial tcp 192.168.10.7:8443: i/o timeout

Note: 192.168.10.7 is the IP of the first master.

On the nodes, the kubeconfig (its server entry and client-certificate-data) is created against the first master:
[root@ghuang-test6-ha-ocp-node-42q26tq1 ~]# cat /etc/origin/node/system\:node\:ghuang-test6-ha-ocp-node-42q26tq1.test.com.kubeconfig 
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUM1akNDQWRDZ0F3SUJBZ0lCQVRBTEJna3Foa2lHOXcwQkFRc3dKakVrTUNJR0ExVUVBd3diYjNCbGJuTm8KYVdaMExYTnBaMjVsY2tBeE5EYzNNRFEwTURreE1CNFhEVEUyTVRBeU1URXdNREV6TWxvWERUSXhNVEF5TURFdwpNREV6TTFvd0pqRWtNQ0lHQTFVRUF3d2JiM0JsYm5Ob2FXWjBMWE5wWjI1bGNrQXhORGMzTURRME1Ea3hNSUlCCklqQU5CZ2txaGtpRzl3MEJBUUVGQUFPQ0FROEFNSUlCQ2dLQ0FRRUFzWDlKbVdpWDVMcUVkRXRyZjVVU3R3VzMKWTBEa2hhZGFPbC9FWmRjRTVKbFdveGJCRGJweWQzOC9XNW1EWTl0cGgvTUlpdkxJU054b1ZtVTQwSXBMK0EwQgoyNEcrR0RQK25lMlR3Y1JkSS83RXk1bVhETER3cWlvN1RXMlBXY2lGRWdyQTNheUJhbWh3N1BiazVhZ21ESWE2ClFvaU1rVHlNOHViZy9XSy9TS0crMHkzQklGelE4Rk8rWlVtK25QNmMwVnl0NEVXdW5WS0RKc3RIZlp0eVJjUWgKTFlSWXI3UDl6bFp3NlRtN0tTTHVnOXk5WVJGKzA3L0VaMFU1a254MEhnVFFTK2ZERHk1S2I1eUVnbWtGRit4ZgpwT2tuWFhIb1FVSXlUT1I0T1BSSG5yQ25DQWZmb3I4K05LWElUcWJlQkNOQ2lqc3Z2Q3M5YzNxRGw0MkRTUUlECkFRQUJveU13SVRBT0JnTlZIUThCQWY4RUJBTUNBS1F3RHdZRFZSMFRBUUgvQkFVd0F3RUIvekFMQmdrcWhraUcKOXcwQkFRc0RnZ0VCQUNyZ3RXS3hmR25Da3g0RlVvM2xnc0doRnpwUCtKUzY4aU03cVJOWC83eXcwRUp0RWYrcgpxZXRJVjNRempJTi92dnlLNWhLUTU3OUd0TjYrcWdBMHNMa1J1TGIwSnJpTUh3bGdTN3dJY29IQUhxUHd2eG1xCjRTbWlidzZEY3J3YkZ1QWtqRTdRS0gwRGM1NWxzbWkrWEZnRHkzVlJhbm16NW1HbzJXaWZvblNDcjJCL2JQLysKMTVVcG5HOHlLMS9uYVJyS2tvL0xnTEpXcm9pb0QvbE11eTdPNFJOQlp0eFB6S29rY1MvNGpUdUxkK0Qya1Q2TgpkWFZGM3RidE1HMzBVVkRmZHMwS0ZFRDVScHYycjFGVE1WZ2NMaHJteFJzR0YzRHpZS0IwN2lycHZ4anRDb2VECjhsdmJyR29ROTl6RHNkYTdnSTZQZTlIcEdqUWVpZjBYeWRVPQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
    server: https://ghuang-test6-ha-ocp-master-0.test.com:8443
  name: ghuang-test6-ha-ocp-master-0-test-com:8443
contexts:
- context:
    cluster: ghuang-test6-ha-ocp-master-0-test-com:8443
    namespace: default
    user: system:node:ghuang-test6-ha-ocp-node-42q26tq1.test.com/ghuang-test6-ha-ocp-master-0-test-com:8443
  name: default/ghuang-test6-ha-ocp-master-0-test-com:8443/system:node:ghuang-test6-ha-ocp-node-42q26tq1.test.com
current-context: default/ghuang-test6-ha-ocp-master-0-test-com:8443/system:node:ghuang-test6-ha-ocp-node-42q26tq1.test.com
kind: Config
preferences: {}
users:
- name: system:node:ghuang-test6-ha-ocp-node-42q26tq1.test.com/ghuang-test6-ha-ocp-master-0-test-com:8443
  user:
    client-certificate-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURLakNDQWhTZ0F3SUJBZ0lCRlRBTEJna3Foa2lHOXcwQkFRc3dKakVrTUNJR0ExVUVBd3diYjNCbGJuTm8KYVdaMExYTnBaMjVsY2tBeE5EYzNNRFEwTURreE1CNFhEVEUyTVRBeU1URXdNRFl6TlZvWERURTRNVEF5TVRFdwpNRFl6Tmxvd1dERVZNQk1HQTFVRUNoTU1jM2x6ZEdWdE9tNXZaR1Z6TVQ4d1BRWURWUVFERXpaemVYTjBaVzA2CmJtOWtaVHBuYUhWaGJtY3RkR1Z6ZERZdGFHRXRiMk53TFc1dlpHVXROREp4TWpaMGNURXVkR1Z6ZEM1amIyMHcKZ2dFaU1BMEdDU3FHU0liM0RRRUJBUVVBQTRJQkR3QXdnZ0VLQW9JQkFRQ3JMN3k5eEQ5OHhVT3FrNDZIQlhSMgp3S1VyN25meHAwWXJZellKdVBWV1FYT25jS0lKNnpubWxDTStKSzZ4QkFXMWF2U2lwNWxoN1FtVFcyNXZQK25TCldEanJ5TmpJaUJEbXI4eDE1RUtFSDQ0bVIwYVQ1YkRlSzh0TTNicE1jTjBEWllnM2RockF2d0xXdmJBZU9vakEKTjRWRUcxVmtUWFE3L2JsMHJYQ3kvNnpHT1ZWNHFxRTZ4aE8vcXBweFkwUHZORDM0MEdYQmtGUVM5QW9lSzk2eApPMG1zOUllL3liTjExSjFGVGhha01IS2FGNGNLdHN1b1FhSjNHSWZvZjV4NWVXcWNaN1o3RGVsS2tIYVlkYk92CjhON3RkdS9BNUtwQUpTRVI0YW5nVTZvcHhaSHJMNk4yOC9GUWxZZGRVQmV6YjI0bkhQdmJSWVVUK0dOaUdxdjUKQWdNQkFBR2pOVEF6TUE0R0ExVWREd0VCL3dRRUF3SUFvREFUQmdOVkhTVUVEREFLQmdnckJnRUZCUWNEQWpBTQpCZ05WSFJNQkFmOEVBakFBTUFzR0NTcUdTSWIzRFFFQkN3T0NBUUVBVndQRHg4dWljQUZEeUNpTEhwNG5WMUZnCnBwczFRSTFvSGZiM1lrL1VsYzFESWtGQzVRUStyR0RjWEs1TXJlZDdrMEFDdjJlZ0djM0t1OWdpKytRcC9UTHEKcmd3bTh0dllMZ1lQblNtUjNLMkJhbW9EaHBBS2ZFbmZ4bU1KaUpWTzcraFpZL3R6RElmMTBOOTVmcWpEZzBXZQo0MjJVL1Z5UmJ2cTlkaXVjTGpVNjYzU3dqb3J4YkdwUnU5bXlwTEtQYmdoSHFzQlpoSHlzZVF5Vk42UUF5dmJ0CjdRNkdIYXplUTZObnl0U09VNTFoUVVTaElwZ0tUcmEvMU9ZOVhCRkFyaGtaTEFlQ2hwM0tpRlo5L1FHYlZNRlUKeUpZL3ViYU1IUE53MENTaTB3VG5CZUlCZ0thaFJVREEzZFNwQ1BudG1DZi9XREgwREJHc241YkIxaVRhVVE9PQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
    client-key-data: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFb2dJQkFBS0NBUUVBcXkrOHZjUS9mTVZEcXBPT2h3VjBkc0NsSys1MzhhZEdLMk0yQ2JqMVZrRnpwM0NpCkNlczU1cFFqUGlTdXNRUUZ0V3Iwb3FlWlllMEprMXR1YnovcDBsZzQ2OGpZeUlnUTVxL01kZVJDaEIrT0prZEcKaytXdzNpdkxUTjI2VEhEZEEyV0lOM1lhd0w4QzFyMndIanFJd0RlRlJCdFZaRTEwTy8yNWRLMXdzditzeGpsVgplS3FoT3NZVHY2cWFjV05EN3pROStOQmx3WkJVRXZRS0hpdmVzVHRKclBTSHY4bXpkZFNkUlU0V3BEQnltaGVICkNyYkxxRUdpZHhpSDZIK2NlWGxxbkdlMmV3M3BTcEIybUhXenIvRGU3WGJ2d09TcVFDVWhFZUdwNEZPcUtjV1IKNnkramR2UHhVSldIWFZBWHMyOXVKeHo3MjBXRkUvaGpZaHFyK1FJREFRQUJBb0lCQUNrN09VR1h5QmJjU0gwSQpSMWI4R0Y0VjduS1RZRzVpOU1LMGhhcDMweGV3Y2hQTlRDb0pid3U3ZUhXYVRqMHlrOUZyYm5yUzFWM0J3d0dzCkR3QmFxNDNQVS81dWhOQmYvWG9pczZOZGxDdlFrZU5rWFhwMzQwN1B5NHE3Q1FrcVVnRmtiaGUxcWFIdEg5anIKSFVWYW9kOXlQL1gwZzIvQ1BCSEsvZVU5ZFJ5WGxTeUpFMXJmdndDUSs2RHZOOFNmNmpKcDQ5aHFpNkZOSWJncQpHeGpPN1FVZUFOZ3dMTFZnd1FETDE1eTJLeTg2ZnJMRmdlYXpiSTBPWFVRaW9QWU1pMDhVS1BOeVpvMm5IL1VxCmg5QjROR3FGUUZXWXNWcHljMCtFUjJOa2RwcU9oOUw0dEVmWllnTk9wWlNrblZIbDI4QlpYOUNZZGJRRnN4SksKb3RiQ0xCVUNnWUVBd0wzV1J3eWtNNzVVclpBSkVSZUZVN2JsWnFMOXpHQkpZcURQaHFSSWVad1RmM3hJRjc2RQpKS2FHL1VCcWwyY2ZhUFBNUjRCUjdEeHZtNGNWNVJ1TmZseW1SamVSdWdVckE3RzRWTjcvbkdTTzhPazh5SW5uCmJYeDNpR0MrSVhIcTRnNTdwODNqRlo5UmRtc2E1OW5RNjhvU2VvUllwZzhBSlJrbWtIQ3dRd2NDZ1lFQTQxN1gKTmY2WjRoSmNtc2c2WE9aM0hwMDI2bDdBbk5wbGYxVHJ0dExQa0NnaFNYdEdsekhhWmxwMG4yQ3JZbnBMWC9EYwpVcWg5UG9QcjhDalN5cDd0WkhTenZtZmM2VVlZNkt4NTdBOHdYem0xM0NGdmFlYVJVOE1OWVk1ckt1YjBlZ3FqCi9CdDg4MXI1RmVzN3NyazlkbERwbTIxTzVYTFNsVGt1cTJXMjJQOENnWUEwMzE2Nm10TW9ocHZBQ1BVVHhUb0QKM3ZaTEU0Yy8yMklHTmtyM2luVi9OcnQ2aTJOVGNDWGJ6L3JUMmluallweVJNOS9qOVdXRHdvaHpSN2xQNGlFTQpldW41OVNCNndSUXRyVUQ5dHphemRqcG9CL051cDdYZXFQZzVaeUNCR0Rqd3pqeEpxZ2NUVldNSmN4UXNhZW9QCjVKenhFd0VtZkpMem1sU2o1dVhUWFFLQmdDS1B4aEwxRXBza3cyTGIwTk5TVFFVZ1RMcXZrSVBIUnVwbUZEYUUKTVB6dXZMQ1l4cEF4Q2N2Sk1EVVIwcnR6YjRXejdTbTdadDViMno5MFZTWnJwaFpCRHhtQVhEb3haNVBtczluSQpMVWdzVTVLVW1vVDBnVjdFSllLUXpZV0YrZCtiUW5ZT0Q1NUdVOXFiR1VYL2xuSW50bnJqMEx4Y0NkcVpDSmtSCkt3d3RBb0dBYWZ1dXQra0dYOHE1d09
wMUdJVjIva3RwTXk1ekFlNUZJQUhsSjlMTERqZTl2Z3FDZHJPaHpxTHQKWFhLc3g0Um9mSlViRmovL3E2dTZldUFXWHVmQi9SazUrLzVLVkVTa2lrb3dYTUVOeUJwWlNqcmlmZVM3TjRvNQpIUUVnN2ZGM0F4QUptWDNzT0pJeWJQWDlJMjlEUGxyMHphU1ZmdjRsU25SdmtGaktBdnc9Ci0tLS0tRU5EIFJTQSBQUklWQVRFIEtFWS0tLS0tCg==


After the first master is recovered, OpenShift works again.
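
The kubeconfig above shows the symptom directly: its server entry names master-0 rather than the loadbalancer. A minimal sketch of how that check could be automated follows. The helper names are hypothetical, and the parsing is a simple regular expression over `server:` lines rather than a full YAML parser.

```python
import re
from urllib.parse import urlparse


def kubeconfig_server_hosts(kubeconfig_text):
    """Return the hostname of every `server:` entry found in kubeconfig text."""
    hosts = []
    pattern = r"^\s*server:\s*(\S+)\s*$"
    for match in re.finditer(pattern, kubeconfig_text, re.MULTILINE):
        hosts.append(urlparse(match.group(1)).hostname)
    return hosts


def points_at_host(kubeconfig_text, hostname):
    """Return True if any server entry in the kubeconfig targets hostname."""
    return hostname in kubeconfig_server_hosts(kubeconfig_text)


# Excerpt of the node kubeconfig quoted in this report.
sample = """\
clusters:
- cluster:
    server: https://ghuang-test6-ha-ocp-master-0.test.com:8443
  name: ghuang-test6-ha-ocp-master-0-test-com:8443
"""

print(kubeconfig_server_hosts(sample))
```

If `points_at_host` returns True for the first master's hostname, the node depends on that single master instead of the loadbalancer.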

Expected results:


Additional info:

Comment 1 Jan Provaznik 2016-10-25 18:10:00 UTC
Fixed in this PR:
https://github.com/redhat-openstack/openshift-on-openstack/pull/294

Comment 2 Jan Provaznik 2016-10-28 17:55:35 UTC
Fixed in 0.9.5

Comment 3 Gan Huang 2016-11-03 10:37:54 UTC
Verified with v0.9.5.

1. Create a stack that uses an external loadbalancer (the hostname can be resolved via the DNS nameserver).
2. An app can be created successfully, and its route can be accessed.
3. Shut down the first master after creating the stack.
4. An app can be created successfully, and its route can be accessed.
5. Scale up a node.
6. An app can be created successfully, and its route can be accessed.
7. Scale down a node.
8. An app can be created successfully, and its route can be accessed.
9. Recover the first master.
10. An app can be created successfully, and its route can be accessed.
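
The original failure was a TCP timeout on port 8443, so a quick reachability probe can accompany the verification steps above. This is only a sketch under assumptions: the function name is hypothetical, and an open TCP port does not prove that the API server is healthy or that apps and routes work.

```python
import socket


def api_reachable(host, port=8443, timeout=3.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `api_reachable("lb.example.com")` (a placeholder hostname) could be run against the loadbalancer address before and after shutting down the first master.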

