Bug 1311038 - Couldn't deploy pod on HA master env when not the first_master is working
Summary: Couldn't deploy pod on HA master env when not the first_master is working
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Andrew Butcher
QA Contact: Ma xiaoqiang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-02-23 08:49 UTC by Gaoyun Pei
Modified: 2016-05-12 16:38 UTC

Fixed In Version: openshift-ansible-3.0.47-1.git.59.b3c4104.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-05-12 16:38:11 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:1065 0 normal SHIPPED_LIVE Red Hat OpenShift Enterprise atomic-openshift-utils bug fix update 2016-05-12 20:32:56 UTC

Description Gaoyun Pei 2016-02-23 08:49:19 UTC
Description of problem:
When deploying a pod on a native HA master env, the deployer pod fails to start with the error "the server has asked for the client to provide credentials".


Version-Release number of selected component (if applicable):
https://github.com/openshift/openshift-ansible master
AtomicOpenShift/3.2/2016-02-22.3

How reproducible:
Always

Steps to Reproduce:
1. Set up a native HA master env of ose-3.2 with the ansible playbook
2. After installation, check the router pod

Actual results:
[root@openshift-126 ~]# oc get pod
NAME                       READY     STATUS             RESTARTS   AGE
router-1-deploy            0/1       Error              0          1h
[root@openshift-126 ~]# oc logs router-1-deploy
F0223 01:54:43.693355       1 deployer.go:69] couldn't get deployment default/router-1: the server has asked for the client to provide credentials (get replicationControllers router-1)

Tried re-deploying the router; still got the same error.


Expected results:
The router should deploy successfully.

Additional info:
It works well on a single-master env.

Comment 5 Gaoyun Pei 2016-02-26 08:34:36 UTC
It seems that only the first master works correctly in an HA master env. This is similar to https://bugzilla.redhat.com/show_bug.cgi?id=1245176 and appears to be caused by incorrect master certificates on the other master. I noticed that the masters' certificates have been generated differently since commit dc8938e01202db0464e54becf4812c3191ce2d51 was merged.

So when the atomic-openshift-master-controllers service is stopped on the first master and the docker-registry pod is then deployed, the deployment fails with the error in Comment 1.
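
One way to check the certificate suspicion is to copy a file that all masters are expected to share (for example the CA certificate) from each master and compare digests. The following is a minimal sketch, not part of the original report: `same_digest` is a hypothetical helper, and the path /etc/origin/master/ca.crt is assumed from a default OSE 3.x install.

```shell
#!/bin/sh
# Hypothetical helper (not from the bug report): succeed only when every
# given file has the same SHA-256 digest. Intended for local copies of the
# same file fetched from each master.
same_digest() {
    first=""
    for f in "$@"; do
        # Take only the hex digest from the sha256sum output.
        sum=$(sha256sum "$f" 2>/dev/null | cut -d' ' -f1)
        if [ -z "$sum" ]; then
            echo "ERROR: cannot read $f"
            return 2
        fi
        if [ -z "$first" ]; then
            first="$sum"
        elif [ "$sum" != "$first" ]; then
            echo "MISMATCH: $f differs from $1"
            return 1
        fi
    done
    echo "OK: all $# files match"
}
```

For instance, after copying the file from two masters with scp (hostnames are placeholders) into m1-ca.crt and m2-ca.crt, run `same_digest m1-ca.crt m2-ca.crt`. A mismatch would be consistent with the masters having been given different certificates.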

Comment 6 Andrew Butcher 2016-02-26 19:38:25 UTC
Proposed fix: https://github.com/openshift/openshift-ansible/pull/1506

Comment 8 Gaoyun Pei 2016-02-29 07:26:21 UTC
Verified this bug with openshift-ansible-3.0.47-1.git.59.b3c4104.el7.noarch

After installing a native HA master env using openshift-ansible, both the docker-registry and router pods could be deployed.

The following test scenarios all passed:
1. Stop the atomic-openshift-master-controllers service on the first master, re-deploy the pods
2. Change the controllers lease back to the first master, re-deploy the pods
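
The first scenario above can be sketched as the following script. This is an illustration under stated assumptions, not the exact commands used during verification: the hostname in FIRST_MASTER is a placeholder, the deployment config is assumed to be named "router" in the "default" project (as in the log in the description), and `oc deploy --latest` is assumed as the way to trigger the re-deployment. The script only prints the commands unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch of the first verification scenario.
# Assumptions (not from the bug report): FIRST_MASTER is a placeholder
# hostname, and the re-deployment is triggered with "oc deploy --latest".
FIRST_MASTER="${FIRST_MASTER:-master1.example.com}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

redeploy_after_controller_stop() {
    # Stop the controllers on the first master so that another master
    # has to take over the controllers lease.
    run ssh "root@$FIRST_MASTER" systemctl stop atomic-openshift-master-controllers
    # Trigger a new router deployment, then list the resulting pods.
    run oc deploy router --latest -n default
    run oc get pods -n default
}

redeploy_after_controller_stop
```

With the fix in place, the deployer pod is expected to complete instead of ending in the Error state shown in the description.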

Comment 10 errata-xmlrpc 2016-05-12 16:38:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1065

