Bug 1467602

Summary: installer only updates master-config.yaml on the 1st master when deploying service catalog
Product: OpenShift Container Platform
Component: Installer
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Reporter: Johnny Liu <jialiu>
Assignee: ewolinet
QA Contact: Johnny Liu <jialiu>
Docs Contact:
Status: CLOSED ERRATA
Severity: high
Priority: high
CC: aos-bugs, jokerman, mmccomas, sdodson, xtian
Target Milestone: ---
Target Release: ---
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text: This bug originates from a new feature (service catalog deployment) in 3.6
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-10 05:29:50 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Johnny Liu 2017-07-04 10:27:56 UTC
Description of problem:
See the following details.

Version-Release number of selected component (if applicable):
openshift-ansible-3.6.132-1.git.0.0d0f54a.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. prepare an inventory host file: 3 masters + 2 nodes, with service catalog enabled (see the inventory sketch below)
2. trigger installation
3.
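
For reference, a minimal inventory sketch matching step 1. Hostnames are placeholders, and openshift_enable_service_catalog is assumed to be the variable this openshift-ansible build uses to turn on service catalog deployment:

[OSEv3:children]
masters
etcd
nodes

[OSEv3:vars]
ansible_user=root
openshift_deployment_type=openshift-enterprise
openshift_master_cluster_method=native
# assumption: flag that enables service catalog deployment in 3.6
openshift_enable_service_catalog=true

[masters]
master-[1:3].example.com

[etcd]
master-[1:3].example.com

[nodes]
master-[1:3].example.com
node-[1:2].example.com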

Actual results:
The installer failed at the following task:
<--snip-->
TASK [openshift_service_catalog : Create api service] **************************
Tuesday 04 July 2017  09:40:02 +0000 (0:00:00.722)       0:43:14.729 ********** 

fatal: [qe-jialiu-master-etcd-1.0704-w2m.qe.rhcloud.com]: FAILED! => {
    "changed": false, 
    "failed": true
}

MSG:

{u'returncode': 1, u'cmd': u'/usr/bin/oc create -f /tmp/apisvcout-u8ycgn -n kube-service-catalog', u'results': {}, u'stderr': u'error: unable to recognize "/tmp/apisvcout-u8ycgn": no matches for apiregistration.k8s.io/, Kind=APIService\n', u'stdout': u''}
<--snip-->
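
A quick way to check whether the currently-active master is actually serving the aggregation API this task needs (a diagnostic sketch, not part of the installer output):

# should list apiregistration.k8s.io/v1beta1 once the aggregator is wired up
oc api-versions | grep apiregistration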

That is because, in this multi-master HA environment, the installer only updates master-config.yaml on the 1st master and then restarts it. Once the 1st master is restarted, one of the other (previously passive) masters takes over as the active master, but that new active master does not have any of the aggregatorConfig settings.

# diff the-2nd-master-config.yaml the-1st-master-config.yaml
> aggregatorConfig:
>   proxyClientInfo:
>     certFile: aggregator-front-proxy.crt
>     keyFile: aggregator-front-proxy.key


> authConfig:
>   requestHeader:
>     clientCA: front-proxy-ca.crt
>     clientCommonNames:
>     - aggregator-front-proxy
>     extraHeaderPrefixes:
>     - X-Remote-Extra-
>     groupHeaders:
>     - X-Remote-Group
>     usernameHeaders:
>     - X-Remote-User

In playbooks/common/openshift-cluster/service_catalog.yml:
- name: Service Catalog
  hosts: oo_first_master
  roles:
  - openshift_service_catalog
  - ansible_service_broker

Obviously, in the service catalog deployment all of these plays run only on the 1st master, including wire_aggregator.yml; that is not enough.
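
A minimal sketch of the direction the fix needs to take, not the actual upstream change (the oo_masters_to_config targeting and the wire_aggregator task file split are assumptions): the aggregator wiring has to run on every master, while the catalog API objects still only need to be created once from the 1st master.

- name: Wire API aggregator on all masters
  hosts: oo_masters_to_config
  tasks:
  # assumption: the role's aggregator wiring is split into its own task file
  - include_role:
      name: openshift_service_catalog
      tasks_from: wire_aggregator

- name: Service Catalog
  hosts: oo_first_master
  roles:
  - openshift_service_catalog
  - ansible_service_broker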

Expected results:
The installer should deploy the service catalog successfully in a multi-master HA environment.

Additional info:

Comment 3 Johnny Liu 2017-07-07 11:25:32 UTC
Currently, the latest puddle QE has is AtomicOpenShift/3.6/2017-07-07.2, but its openshift-ansible version (openshift-ansible-3.6.126.14-1.git.0.efd80ab.el7) does not have this PR merged.

Comment 5 Johnny Liu 2017-07-10 10:19:07 UTC
Re-tested this bug with openshift-ansible-roles-3.6.138-1.git.0.2c647a9.el7.noarch; FAIL.


On the 2nd master, the API service failed to restart with the following error:
TASK [openshift_service_catalog : restart master api] **************************
fatal: [openshift-141.lab.sjc.redhat.com]: FAILED! => {
    "changed": false, 
    "failed": true
}

MSG:

Unable to restart service atomic-openshift-master-api: Job for atomic-openshift-master-api.service failed because the control process exited with error code. See "systemctl status atomic-openshift-master-api.service" and "journalctl -xe" for details.


API log:
Jul 10 06:03:42 openshift-141.lab.sjc.redhat.com atomic-openshift-master-api[46245]: F0710 06:03:42.134499   46245 start_api.go:67] Error building front proxy auth config: error reading /etc/origin/master/front-proxy-ca.crt: read /etc/origin/master/front-proxy-ca.crt: is a directory


# pwd
/etc/origin/master

# ll
<--snip-->
drwxr-xr-x. 3 root root     40 Jul 10 05:50 aggregator-front-proxy.crt
drwxr-xr-x. 3 root root     40 Jul 10 05:50 aggregator-front-proxy.key
drwxr-xr-x. 3 root root     47 Jul 10 05:50 aggregator-front-proxy.kubeconfig
drwxr-xr-x. 3 root root     32 Jul 10 05:50 front-proxy-ca.crt
drwxr-xr-x. 3 root root     32 Jul 10 05:50 front-proxy-ca.key
<--snip-->


# tree aggregator-front-proxy.crt
aggregator-front-proxy.crt
└── aggregator-front-proxy.crt
    └── openshift-141.lab.sjc.redhat.com
        └── etc
            └── origin
                └── master
                    └── aggregator-front-proxy.crt

5 directories, 1 file

# tree front-proxy-ca.crt
front-proxy-ca.crt
└── front-proxy-ca.crt
    └── openshift-141.lab.sjc.redhat.com
        └── etc
            └── origin
                └── master
                    └── front-proxy-ca.crt

5 directories, 1 file
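
The nested layout above is what Ansible's fetch module produces when flat is left at its default: dest is treated as a directory and the file is saved under <dest>/<inventory_hostname>/<full remote path>. A plausible mechanism, sketched with illustrative paths rather than the actual role tasks, is that the certs are fetched from the 1st master and then copied out to the other masters, so the fetched directory tree lands where a plain file was expected:

# Illustrative sketch only -- not the actual role tasks.
# Default (flat: no): the file is saved to
#   /tmp/certs/front-proxy-ca.crt/<hostname>/etc/origin/master/front-proxy-ca.crt,
# and a later copy of that path to the other masters writes a directory.
- fetch:
    src: /etc/origin/master/front-proxy-ca.crt
    dest: /tmp/certs/front-proxy-ca.crt

# With flat: yes the file lands exactly at dest, so the copy to the other
# masters creates a regular file as intended.
- fetch:
    src: /etc/origin/master/front-proxy-ca.crt
    dest: /tmp/certs/front-proxy-ca.crt
    flat: yes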

Comment 10 Johnny Liu 2017-07-14 10:04:14 UTC
Verified this bug with openshift-ansible-3.6.144-1.git.0.50e12bf.el7.noarch; PASS.

The service catalog is deployed successfully on a multi-master HA cluster.

Comment 12 errata-xmlrpc 2017-08-10 05:29:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716