Bug 1311038

Summary: Couldn't deploy pods on an HA master env when a master other than first_master is active
Product: OpenShift Container Platform
Reporter: Gaoyun Pei <gpei>
Component: Installer
Assignee: Andrew Butcher <abutcher>
Status: CLOSED ERRATA
QA Contact: Ma xiaoqiang <xiama>
Severity: high
Docs Contact:
Priority: high
Version: 3.2.0
CC: aos-bugs, bleanhar, ghuang, jialiu, jokerman, mmccomas, xtian
Target Milestone: ---
Keywords: Regression, TestBlocker
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openshift-ansible-3.0.47-1.git.59.b3c4104.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-05-12 16:38:11 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Gaoyun Pei 2016-02-23 08:49:19 UTC
Description of problem:
When deploying a pod on a native HA master env, the deployer pod fails to start with the error "the server has asked for the client to provide credentials".


Version-Release number of selected component (if applicable):
https://github.com/openshift/openshift-ansible master
AtomicOpenShift/3.2/2016-02-22.3

How reproducible:
Always

Steps to Reproduce:
1. Set up a native HA master env of ose-3.2 with the Ansible playbook
2. After installation, check the router pod

Actual results:
[root@openshift-126 ~]# oc get pod
NAME                       READY     STATUS             RESTARTS   AGE
router-1-deploy            0/1       Error              0          1h
[root@openshift-126 ~]# oc logs router-1-deploy
F0223 01:54:43.693355       1 deployer.go:69] couldn't get deployment default/router-1: the server has asked for the client to provide credentials (get replicationControllers router-1)

Tried re-deploying the router; still got the same error.


Expected results:
The router should deploy successfully.

Additional info:
It works well on a single-master env.
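
The check in step 2 can be scripted. The following is a minimal sketch, not part of the installer: the function name is hypothetical, and it assumes the default `oc get pod` column layout shown under "Actual results" (NAME, READY, STATUS, ...) and that deployer pods have names ending in "-deploy".

```shell
# Hypothetical helper: read `oc get pod` output on stdin and print the
# names of deployer pods (names ending in "-deploy") whose STATUS
# column is "Error". The header row (line 1) is skipped.
failed_deployer_pods() {
  awk 'NR > 1 && $1 ~ /-deploy$/ && $3 == "Error" { print $1 }'
}

# Usage on a master host:
#   oc get pod | failed_deployer_pods
```

Each name it prints can then be passed to `oc logs` to see the deployer error.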

Comment 5 Gaoyun Pei 2016-02-26 08:34:36 UTC
It seems that only the first master works correctly in an HA master env, which is similar to https://bugzilla.redhat.com/show_bug.cgi?id=1245176, caused by incorrect master certificates on the other master. I noticed that the masters' certificates have been generated differently since commit dc8938e01202db0464e54becf4812c3191ce2d51 was merged.

So when the atomic-openshift-master-controllers service is stopped on the first master and the docker-registry pod is then deployed, the error in Comment 1 occurs.
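
One way to look for the certificate mismatch suspected in this comment is to copy the master config directories from two masters to one machine and compare selected files byte for byte. The following is a minimal sketch under stated assumptions: the function name is hypothetical, the directories are assumed to have been copied locally already, and this report does not say which files are expected to be identical across masters, so the file list is left to the caller.

```shell
# Hypothetical helper: compare_master_files DIR_A DIR_B FILE...
# Prints each listed file name whose contents differ between the two
# directories, or that cannot be read in either one.
compare_master_files() {
  dir_a=$1
  dir_b=$2
  shift 2
  for f in "$@"; do
    # cmp -s prints nothing and exits non-zero when the files differ
    # or when one of them is missing or unreadable.
    cmp -s "$dir_a/$f" "$dir_b/$f" || echo "$f"
  done
}
```

For example, `compare_master_files ./master1 ./master2 ca.crt` prints `ca.crt` only if the two copies differ; the directory and file names here are illustrative.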

Comment 6 Andrew Butcher 2016-02-26 19:38:25 UTC
Proposed fix: https://github.com/openshift/openshift-ansible/pull/1506

Comment 8 Gaoyun Pei 2016-02-29 07:26:21 UTC
Verified this bug with openshift-ansible-3.0.47-1.git.59.b3c4104.el7.noarch

After installing a native HA master env using openshift-ansible, both the docker-registry and router pods could be deployed.

The following test scenarios all passed:
1. Stop the atomic-openshift-master-controllers service on the first master, then re-deploy the pods
2. Change the controllers lease back to the first master, then re-deploy the pods
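
The first scenario can be sketched as a small shell helper. This is a sketch under assumptions, not the exact commands used during verification: the function name, the use of ssh to reach the first master, and the example host name are hypothetical, and `oc deploy <name> --latest` is assumed to be the re-deploy command available in this release.

```shell
# Hypothetical helper: redeploy_after_controller_stop FIRST_MASTER DC...
# Stops the controllers service on the first master, then triggers a new
# deployment for each named deployment config and lists the pods.
# RUN_REMOTE and OC can be overridden (for example with "echo") for a dry run.
redeploy_after_controller_stop() {
  first_master=$1
  shift
  ${RUN_REMOTE:-ssh} "$first_master" systemctl stop atomic-openshift-master-controllers || return 1
  for dc in "$@"; do
    ${OC:-oc} deploy "$dc" --latest || return 1
  done
  ${OC:-oc} get pod
}

# Usage (host name is an example only):
#   redeploy_after_controller_stop master1.example.com docker-registry router
```

Making the remote runner and the `oc` binary overridable allows the sequence to be printed and reviewed before it is run against a real cluster.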

Comment 10 errata-xmlrpc 2016-05-12 16:38:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1065