1311038 – Couldn't deploy pod on HA master env when not the first_master is working

Summary: Couldn't deploy pod on HA master env when not the first_master is working

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Installer
Sub Component:
Version:	3.2.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Andrew Butcher
QA Contact:	Ma xiaoqiang
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2016-02-23 08:49 UTC by Gaoyun Pei
Modified:	2016-05-12 16:38 UTC
CC List:	7 users
Fixed In Version:	openshift-ansible-3.0.47-1.git.59.b3c4104.el7
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-05-12 16:38:11 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2016:1065	0	normal	SHIPPED_LIVE	Red Hat OpenShift Enterprise atomic-openshift-utils bug fix update	2016-05-12 20:32:56 UTC

Description Gaoyun Pei 2016-02-23 08:49:19 UTC

Description of problem:
When deploying a pod on a native HA master environment, the deployer pod fails to start running, with the error "the server has asked for the client to provide credentials".


Version-Release number of selected component (if applicable):
https://github.com/openshift/openshift-ansible master
AtomicOpenShift/3.2/2016-02-22.3

How reproducible:
Always

Steps to Reproduce:
1. Set up a native HA master environment of OSE 3.2 with the Ansible playbook.
2. After installation, check the router pod.
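Step 2 can be scripted. The following is a minimal Python sketch, not part of the original report, that parses the plain-text table printed by `oc get pod` and lists pods whose STATUS is neither `Running` nor `Completed`. The function name `failed_pods` is hypothetical, and it assumes the default column layout (NAME, READY, STATUS, RESTARTS, AGE) with no spaces inside any column value.

```python
from typing import List, Tuple


def failed_pods(oc_get_pod_output: str) -> List[Tuple[str, str]]:
    """Return (name, status) for every pod that is not Running or Completed.

    Assumes the default `oc get pod` table: one header row followed by one
    whitespace-separated row per pod, with no spaces inside column values.
    """
    # Drop blank lines so trailing newlines do not produce empty rows.
    lines = [ln for ln in oc_get_pod_output.splitlines() if ln.strip()]
    if not lines:
        return []
    # Locate columns by header name rather than fixed position.
    header = lines[0].split()
    name_idx = header.index("NAME")
    status_idx = header.index("STATUS")
    failed = []
    for row in lines[1:]:
        cols = row.split()
        if cols[status_idx] not in ("Running", "Completed"):
            failed.append((cols[name_idx], cols[status_idx]))
    return failed


if __name__ == "__main__":
    sample = (
        "NAME                       READY     STATUS             RESTARTS   AGE\n"
        "router-1-deploy            0/1       Error              0          1h\n"
    )
    print(failed_pods(sample))  # [('router-1-deploy', 'Error')]
```

In practice the input would come from running `oc get pod` on a master, for example via `subprocess.run(["oc", "get", "pod"], capture_output=True, text=True).stdout`.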

Actual results:
[root@openshift-126 ~]# oc get pod
NAME                       READY     STATUS             RESTARTS   AGE
router-1-deploy            0/1       Error              0          1h
[root@openshift-126 ~]# oc logs router-1-deploy
F0223 01:54:43.693355       1 deployer.go:69] couldn't get deployment default/router-1: the server has asked for the client to provide credentials (get replicationControllers router-1)

Tried redeploying the router; still got the same error.


Expected results:
The router should deploy successfully.

Additional info:
It works well in a single-master environment.

Comment 5 Gaoyun Pei 2016-02-26 08:34:36 UTC

It seems that only the first master works correctly in an HA master environment. This is similar to https://bugzilla.redhat.com/show_bug.cgi?id=1245176, which was caused by incorrect master certificates on the other master. I noticed that the masters' certificates have been generated differently since commit dc8938e01202db0464e54becf4812c3191ce2d51 was merged.

So when the atomic-openshift-master-controllers service is stopped on the first master (moving the controllers lease to another master) and a docker-registry pod is then deployed, the deployment fails with the error in Comment 1.
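One way to look for the kind of certificate mismatch described above is to compare each master's CA certificate against the first master's copy. The sketch below is a hedged illustration, not the fix from the pull request: it assumes the CA files have already been copied off each master (for example from `/etc/origin/master/ca.crt`, assumed here to be the usual OpenShift 3.x location), and the helper names `ca_fingerprint` and `mismatched_masters` are hypothetical. It compares raw file bytes, so it is only a coarse check; a problem confined to other master certificates would not show up in it.

```python
import hashlib
from typing import Dict, List


def ca_fingerprint(pem_bytes: bytes) -> str:
    """SHA-256 hex digest of the raw bytes of a CA certificate file."""
    return hashlib.sha256(pem_bytes).hexdigest()


def mismatched_masters(ca_files: Dict[str, bytes], first_master: str) -> List[str]:
    """Return the masters whose CA file differs byte-for-byte from the first master's."""
    reference = ca_fingerprint(ca_files[first_master])
    return sorted(
        host
        for host, data in ca_files.items()
        if host != first_master and ca_fingerprint(data) != reference
    )


if __name__ == "__main__":
    # Hypothetical contents; in practice, read each master's copy of ca.crt.
    files = {
        "master1": b"-----BEGIN CERTIFICATE-----\nAAAA\n-----END CERTIFICATE-----\n",
        "master2": b"-----BEGIN CERTIFICATE-----\nAAAA\n-----END CERTIFICATE-----\n",
        "master3": b"-----BEGIN CERTIFICATE-----\nBBBB\n-----END CERTIFICATE-----\n",
    }
    print(mismatched_masters(files, "master1"))  # ['master3']
```

Hashing whole files keeps the check dependency-free; a stricter comparison would parse each certificate and compare issuer and key, which needs a third-party X.509 library.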

Comment 6 Andrew Butcher 2016-02-26 19:38:25 UTC

Proposed fix: https://github.com/openshift/openshift-ansible/pull/1506

Comment 8 Gaoyun Pei 2016-02-29 07:26:21 UTC

Verified this bug with openshift-ansible-3.0.47-1.git.59.b3c4104.el7.noarch

After installing a native HA master environment using openshift-ansible, both the docker-registry and router pods could be deployed.

The following test scenarios all passed:
1. Stop the atomic-openshift-master-controllers service on the first master, then redeploy the pods.
2. Move the controllers lease back to the first master, then redeploy the pods.

Comment 10 errata-xmlrpc 2016-05-12 16:38:11 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1065