Bug 1311038 - Couldn't deploy pod on HA master env when not the first_master is working
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Assigned To: Andrew Butcher
QA Contact: Ma xiaoqiang
Keywords: Regression, TestBlocker
Reported: 2016-02-23 03:49 EST by Gaoyun Pei
Modified: 2016-05-12 12:38 EDT
Fixed In Version: openshift-ansible-3.0.47-1.git.59.b3c4104.el7
Doc Type: Bug Fix
Last Closed: 2016-05-12 12:38:11 EDT
Type: Bug

Description Gaoyun Pei 2016-02-23 03:49:19 EST
Description of problem:
When deploying a pod on a native HA master env, the deployer pod fails to reach the Running state, with the error "the server has asked for the client to provide credentials".


Version-Release number of selected component (if applicable):
https://github.com/openshift/openshift-ansible master
AtomicOpenShift/3.2/2016-02-22.3

How reproducible:
Always

Steps to Reproduce:
1. Set up a native HA master env of ose-3.2 with the ansible playbook
2. After installation, check the router pod

Actual results:
[root@openshift-126 ~]# oc get pod
NAME                       READY     STATUS             RESTARTS   AGE
router-1-deploy            0/1       Error              0          1h
[root@openshift-126 ~]# oc logs router-1-deploy
F0223 01:54:43.693355       1 deployer.go:69] couldn't get deployment default/router-1: the server has asked for the client to provide credentials (get replicationControllers router-1)

Tried re-deploying the router; still got the same error.
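
A minimal sketch for spotting this state from a script, assuming the five-column `oc get pod` layout shown above (NAME, READY, STATUS, RESTARTS, AGE). The helper name `failed_deployers` is hypothetical and not part of `oc`:

```shell
#!/bin/sh
# Hypothetical helper: reads `oc get pod` output on stdin and prints the NAME
# of every deployer pod (name ending in "-deploy") whose STATUS is "Error".
# Assumes the first line is the column header and STATUS is the third column.
failed_deployers() {
    awk 'NR > 1 && $1 ~ /-deploy$/ && $3 == "Error" { print $1 }'
}

# Usage against a live cluster (assumes a logged-in `oc` client):
#   oc get pod | failed_deployers
#   oc logs "$(oc get pod | failed_deployers | head -n 1)"
```

The log of each pod it reports can then be checked for the "provide credentials" message quoted above.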


Expected results:
The router should deploy successfully.

Additional info:
It works well on a single-master env.
Comment 5 Gaoyun Pei 2016-02-26 03:34:36 EST
It seems that only the first master works correctly in an HA master env. This is similar to https://bugzilla.redhat.com/show_bug.cgi?id=1245176 and appears to be caused by incorrect master certificates on the other masters. I noticed that the masters' certificates were generated differently after commit dc8938e01202db0464e54becf4812c3191ce2d51 was merged.

So after stopping the atomic-openshift-master-controllers service on the first master, trying to deploy the docker-registry pod produces the error in Comment 1.
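
The trigger described above can be sketched as a short script to run on the first master. This is a sketch under assumptions: the function name `redeploy_without_first_controllers` is hypothetical, `docker-registry` is assumed to be the deployment config name, and `oc deploy --latest` is assumed to be available in this release's client:

```shell
#!/bin/sh
# Hypothetical helper: run as root on the first master, with `oc` logged in
# as a user allowed to start deployments.
redeploy_without_first_controllers() {
    dc="${1:-docker-registry}"   # deployment config name (assumed default)

    # Stop the controllers on this master so that the controllers process
    # on another master takes over.
    systemctl stop atomic-openshift-master-controllers || return 1

    # Start a new deployment, then list the pods. On an affected install the
    # deployer pod is expected to end up in the Error state.
    oc deploy "$dc" --latest || return 1
    oc get pod
}

# Usage:
#   redeploy_without_first_controllers docker-registry
```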
Comment 6 Andrew Butcher 2016-02-26 14:38:25 EST
Proposed fix: https://github.com/openshift/openshift-ansible/pull/1506
Comment 8 Gaoyun Pei 2016-02-29 02:26:21 EST
Verified this bug with openshift-ansible-3.0.47-1.git.59.b3c4104.el7.noarch

After installing a native HA master env using openshift-ansible, both the docker-registry and router pods could be deployed.

The following test scenarios all passed:
1. Stop the atomic-openshift-master-controllers service on the first master, then re-deploy the pods.
2. Change the controllers lease back to the first master, then re-deploy the pods.
Comment 10 errata-xmlrpc 2016-05-12 12:38:11 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1065
