Description of problem:
See the following details.

Version-Release number of selected component (if applicable):
openshift-ansible-3.6.132-1.git.0.0d0f54a.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. Prepare an inventory host file: 3 masters + 2 nodes, with the service catalog enabled.
2. Trigger the installation.
3.

Actual results:
The installer failed at the following task:

<--snip-->
TASK [openshift_service_catalog : Create api service] **************************
Tuesday 04 July 2017 09:40:02 +0000 (0:00:00.722) 0:43:14.729 **********
fatal: [qe-jialiu-master-etcd-1.0704-w2m.qe.rhcloud.com]: FAILED! => {
    "changed": false,
    "failed": true
}

MSG:

{u'returncode': 1, u'cmd': u'/usr/bin/oc create -f /tmp/apisvcout-u8ycgn -n kube-service-catalog', u'results': {}, u'stderr': u'error: unable to recognize "/tmp/apisvcout-u8ycgn": no matches for apiregistration.k8s.io/, Kind=APIService\n', u'stdout': u''}
<--snip-->

That happens because, in this multiple-master HA environment, the installer only updates the 1st master and then restarts it. Once the 1st master is restarted, one of the other passive masters takes over as the active master, but the new active master does not have the aggregatorConfig settings:

# diff the-2nd-master-config.yaml the-1st-master-config.yaml
> aggregatorConfig:
>   proxyClientInfo:
>     certFile: aggregator-front-proxy.crt
>     keyFile: aggregator-front-proxy.key
> authConfig:
>   requestHeader:
>     clientCA: front-proxy-ca.crt
>     clientCommonNames:
>     - aggregator-front-proxy
>     extraHeaderPrefixes:
>     - X-Remote-Extra-
>     groupHeaders:
>     - X-Remote-Group
>     usernameHeaders:
>     - X-Remote-User

In playbooks/common/openshift-cluster/service_catalog.yml:

- name: Service Catalog
  hosts: oo_first_master
  roles:
  - openshift_service_catalog
  - ansible_service_broker

Obviously, during the service catalog deployment all the tasks run only on the 1st master, in particular wire_aggregator.yml; that is not enough.

Expected results:
The installer should deploy the service catalog successfully in a multiple-master HA environment.

Additional info:
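For reference, a minimal sketch of the kind of change that would be needed (hypothetical, not necessarily the PR that was merged upstream): wire the aggregator config on every master rather than only oo_first_master, then deploy the catalog from the first master as before. The oo_masters_to_config group name and the wire_aggregator.yml task-file location are assumptions here.

# Hypothetical sketch only -- not necessarily the change merged upstream.
# Idea: apply the aggregatorConfig/authConfig wiring to master-config.yaml on
# *all* masters (serially, so the HA API stays available), then run the service
# catalog roles on the first master as the existing play already does.
- name: Wire API aggregator on all masters
  hosts: oo_masters_to_config        # assumed: standard group of all masters
  serial: 1                          # restart the master API one host at a time
  tasks:
  - include_role:
      name: openshift_service_catalog
      tasks_from: wire_aggregator.yml   # assumed task file location

- name: Service Catalog
  hosts: oo_first_master
  roles:
  - openshift_service_catalog
  - ansible_service_broker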
Currently the latest puddle QE has is AtomicOpenShift/3.6/2017-07-07.2, but its openshift-ansible version, openshift-ansible-3.6.126.14-1.git.0.efd80ab.el7, does not have this PR merged.
Re-tested this bug with openshift-ansible-roles-3.6.138-1.git.0.2c647a9.el7.noarch: FAIL. On the 2nd master, the API service fails to restart with the following error:

TASK [openshift_service_catalog : restart master api] **************************
fatal: [openshift-141.lab.sjc.redhat.com]: FAILED! => {
    "changed": false,
    "failed": true
}

MSG:

Unable to restart service atomic-openshift-master-api: Job for atomic-openshift-master-api.service failed because the control process exited with error code. See "systemctl status atomic-openshift-master-api.service" and "journalctl -xe" for details.

API log:

Jul 10 06:03:42 openshift-141.lab.sjc.redhat.com atomic-openshift-master-api[46245]: F0710 06:03:42.134499 46245 start_api.go:67] Error building front proxy auth config: error reading /etc/origin/master/front-proxy-ca.crt: read /etc/origin/master/front-proxy-ca.crt: is a directory

# pwd
/etc/origin/master
# ll
<--snip-->
drwxr-xr-x. 3 root root 40 Jul 10 05:50 aggregator-front-proxy.crt
drwxr-xr-x. 3 root root 40 Jul 10 05:50 aggregator-front-proxy.key
drwxr-xr-x. 3 root root 47 Jul 10 05:50 aggregator-front-proxy.kubeconfig
drwxr-xr-x. 3 root root 32 Jul 10 05:50 front-proxy-ca.crt
drwxr-xr-x. 3 root root 32 Jul 10 05:50 front-proxy-ca.key
<--snip-->

# tree aggregator-front-proxy.crt
aggregator-front-proxy.crt
└── aggregator-front-proxy.crt
    └── openshift-141.lab.sjc.redhat.com
        └── etc
            └── origin
                └── master
                    └── aggregator-front-proxy.crt

5 directories, 1 file

# tree front-proxy-ca.crt
front-proxy-ca.crt
└── front-proxy-ca.crt
    └── openshift-141.lab.sjc.redhat.com
        └── etc
            └── origin
                └── master
                    └── front-proxy-ca.crt

5 directories, 1 file

The certificate paths on the 2nd master were created as nested directory trees instead of files, which is why the API fails to start with "is a directory".
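The nested <hostname>/etc/origin/master/<file> layout is the default output pattern of Ansible's fetch module when flat: yes is not set (it appends the inventory hostname and the full source path under dest), which suggests the certs were staged or distributed that way. A hypothetical sketch of that failure mode and the corrected form (the real role tasks may differ):

# Hypothetical illustration of the suspected root cause -- not the actual role code.
# Without "flat: yes", fetch saves the file as <dest>/<inventory_hostname>/<src path>,
# so a dest that already ends in the file name turns into a directory tree.
- name: Fetch front proxy cert (buggy form)
  fetch:
    src: /etc/origin/master/front-proxy-ca.crt
    dest: /etc/origin/master/front-proxy-ca.crt      # ends up as a directory

# Corrected form: "flat: yes" writes the fetched file directly to dest.
- name: Fetch front proxy cert (fixed form)
  fetch:
    src: /etc/origin/master/front-proxy-ca.crt
    dest: /etc/origin/master/front-proxy-ca.crt
    flat: yes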
Verified this bug with openshift-ansible-3.6.144-1.git.0.50e12bf.el7.noarch: PASS. The service catalog is deployed successfully on a multiple-master HA cluster.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716