Bug 1467602 - installer only updates master-config.yaml on the 1st master when deploying service catalog
Summary: installer only updates master-config.yaml on the 1st master when deploying service catalog
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: ewolinet
QA Contact: Johnny Liu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-07-04 10:27 UTC by Johnny Liu
Modified: 2017-08-16 19:51 UTC
CC List: 5 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
This bug originates from a new feature (service catalog deployment) in 3.6
Clone Of:
Environment:
Last Closed: 2017-08-10 05:29:50 UTC
Target Upstream Version:
Embargoed:




Links
System ID: Red Hat Product Errata RHEA-2017:1716
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat OpenShift Container Platform 3.6 RPM Release Advisory
Last Updated: 2017-08-10 09:02:50 UTC

Description Johnny Liu 2017-07-04 10:27:56 UTC
Description of problem:
See the following details.

Version-Release number of selected component (if applicable):
openshift-ansible-3.6.132-1.git.0.0d0f54a.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. Prepare an inventory host file: 3 masters + 2 nodes, with service catalog enabled
2. Trigger the installation
3.

Actual results:
The installer failed at the following task:
<--snip-->
TASK [openshift_service_catalog : Create api service] **************************
Tuesday 04 July 2017  09:40:02 +0000 (0:00:00.722)       0:43:14.729 ********** 

fatal: [qe-jialiu-master-etcd-1.0704-w2m.qe.rhcloud.com]: FAILED! => {
    "changed": false, 
    "failed": true
}

MSG:

{u'returncode': 1, u'cmd': u'/usr/bin/oc create -f /tmp/apisvcout-u8ycgn -n kube-service-catalog', u'results': {}, u'stderr': u'error: unable to recognize "/tmp/apisvcout-u8ycgn": no matches for apiregistration.k8s.io/, Kind=APIService\n', u'stdout': u''}
<--snip-->

That is because in this multi-master HA env the installer only updates master-config.yaml on the 1st master and then restarts it. Once the 1st master is restarted, one of the other passive masters takes over as the active master, but that new active master has none of the aggregatorConfig settings.

# diff the-2nd-master-config.yaml the-1st-master-config.yaml
> aggregatorConfig:
>   proxyClientInfo:
>     certFile: aggregator-front-proxy.crt
>     keyFile: aggregator-front-proxy.key


> authConfig:
>   requestHeader:
>     clientCA: front-proxy-ca.crt
>     clientCommonNames:
>     - aggregator-front-proxy
>     extraHeaderPrefixes:
>     - X-Remote-Extra-
>     groupHeaders:
>     - X-Remote-Group
>     usernameHeaders:
>     - X-Remote-User
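
For reference, these are the stanzas (reconstructed from the diff above) that only the 1st master received. Every master in the HA cluster needs the same entries in /etc/origin/master/master-config.yaml, along with the referenced certificate files:

# Reconstructed from the diff above: the aggregator stanzas that must be
# present in every master's master-config.yaml, not only the 1st master's.
aggregatorConfig:
  proxyClientInfo:
    certFile: aggregator-front-proxy.crt
    keyFile: aggregator-front-proxy.key
authConfig:
  requestHeader:
    clientCA: front-proxy-ca.crt
    clientCommonNames:
    - aggregator-front-proxy
    extraHeaderPrefixes:
    - X-Remote-Extra-
    groupHeaders:
    - X-Remote-Group
    usernameHeaders:
    - X-Remote-User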

In playbooks/common/openshift-cluster/service_catalog.yml:
- name: Service Catalog
  hosts: oo_first_master
  roles:
  - openshift_service_catalog
  - ansible_service_broker

Obviously, during service catalog deployment all of these plays (in particular wire_aggregator.yml) run only on the 1st master, and that is not enough.
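
A minimal sketch of the shape of a fix, assuming the wiring tasks (wire_aggregator.yml) can be run from the openshift_service_catalog role against every master; the group names come from the existing playbooks, and the actual merged change may be structured differently:

# Hedged sketch, not the merged PR: wire the aggregator configuration into
# master-config.yaml on every master, restarting them one at a time so the
# HA cluster always keeps an active API server, and only then create the
# service catalog objects from the 1st master.
- name: Wire API aggregator on all masters
  hosts: oo_masters_to_config        # every master, not just oo_first_master
  serial: 1                          # roll the config change and restart one master at a time
  tasks:
  - include_role:
      name: openshift_service_catalog
      tasks_from: wire_aggregator.yml   # assumed entry point for the wiring tasks

- name: Service Catalog
  hosts: oo_first_master
  roles:
  - openshift_service_catalog
  - ansible_service_broker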

Expected results:
The installer should deploy the service catalog successfully in a multi-master HA env.

Additional info:

Comment 3 Johnny Liu 2017-07-07 11:25:32 UTC
Currently, the latest puddle QE got is AtomicOpenShift/3.6/2017-07-07.2, but its openshift-ansible version, openshift-ansible-3.6.126.14-1.git.0.efd80ab.el7, does not have this PR merged.

Comment 5 Johnny Liu 2017-07-10 10:19:07 UTC
Re-tested this bug with openshift-ansible-roles-3.6.138-1.git.0.2c647a9.el7.noarch: FAIL.


On the 2nd master, the api service fails to restart with the following error:
TASK [openshift_service_catalog : restart master api] **************************
fatal: [openshift-141.lab.sjc.redhat.com]: FAILED! => {
    "changed": false, 
    "failed": true
}

MSG:

Unable to restart service atomic-openshift-master-api: Job for atomic-openshift-master-api.service failed because the control process exited with error code. See "systemctl status atomic-openshift-master-api.service" and "journalctl -xe" for details.


api log:
Jul 10 06:03:42 openshift-141.lab.sjc.redhat.com atomic-openshift-master-api[46245]: F0710 06:03:42.134499   46245 start_api.go:67] Error building front proxy auth config: error reading /etc/origin/master/front-proxy-ca.crt: read /etc/origin/master/front-proxy-ca.crt: is a directory


# pwd
/etc/origin/master

# ll
<--snip-->
drwxr-xr-x. 3 root root     40 Jul 10 05:50 aggregator-front-proxy.crt
drwxr-xr-x. 3 root root     40 Jul 10 05:50 aggregator-front-proxy.key
drwxr-xr-x. 3 root root     47 Jul 10 05:50 aggregator-front-proxy.kubeconfig
drwxr-xr-x. 3 root root     32 Jul 10 05:50 front-proxy-ca.crt
drwxr-xr-x. 3 root root     32 Jul 10 05:50 front-proxy-ca.key
<--snip-->


# tree aggregator-front-proxy.crt
aggregator-front-proxy.crt
└── aggregator-front-proxy.crt
    └── openshift-141.lab.sjc.redhat.com
        └── etc
            └── origin
                └── master
                    └── aggregator-front-proxy.crt

5 directories, 1 file

# tree front-proxy-ca.crt
front-proxy-ca.crt
└── front-proxy-ca.crt
    └── openshift-141.lab.sjc.redhat.com
        └── etc
            └── origin
                └── master
                    └── front-proxy-ca.crt

5 directories, 1 file
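
The nested layout above is what a copy step produces when it treats these destinations as directories, for example Ansible's fetch module in its default (non-flat) mode, which writes dest/hostname/remote-path. A minimal sketch of a cert distribution step that writes plain files instead, assuming the certs are generated on the 1st master and then pushed to all masters (play names, staging path, and file modes are illustrative):

# Hedged sketch: pull the front proxy certs from the 1st master as flat
# files, then push them to every master as regular files, not directories.
- name: Fetch front proxy certs from the first master
  hosts: oo_first_master
  tasks:
  - fetch:
      src: "/etc/origin/master/{{ item }}"
      dest: "/tmp/front-proxy/{{ item }}"
      flat: yes                      # avoid fetch's default dest/hostname/path layout
    with_items:
    - front-proxy-ca.crt
    - front-proxy-ca.key
    - aggregator-front-proxy.crt
    - aggregator-front-proxy.key
    - aggregator-front-proxy.kubeconfig

- name: Distribute front proxy certs to all masters
  hosts: oo_masters_to_config
  tasks:
  - copy:
      src: "/tmp/front-proxy/{{ item }}"
      dest: "/etc/origin/master/{{ item }}"
      owner: root
      group: root
      mode: "0600"                   # illustrative; keys at least should not be world-readable
    with_items:
    - front-proxy-ca.crt
    - front-proxy-ca.key
    - aggregator-front-proxy.crt
    - aggregator-front-proxy.key
    - aggregator-front-proxy.kubeconfig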

Comment 10 Johnny Liu 2017-07-14 10:04:14 UTC
Verified this bug with openshift-ansible-3.6.144-1.git.0.50e12bf.el7.noarch: PASS.

The service catalog is deployed successfully on the multiple-master HA cluster.

Comment 12 errata-xmlrpc 2017-08-10 05:29:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716

