Bug 1306011

Summary: Deployer pods incorrectly using the host entry from openshiftLoopbackKubeconfig
Product: OpenShift Container Platform
Component: openshift-controller-manager
Version: 3.1.0
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Reporter: Jason DeTiberus <jdetiber>
Assignee: Dan Mace <dmace>
QA Contact: Johnny Liu <jialiu>
CC: aos-bugs, dmace, dsafford, jdetiber, sdodson, tdawson
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: v3.1.1.902
Doc Type: Bug Fix
Last Closed: 2016-05-12 16:28:35 UTC
Type: Bug

Description Jason DeTiberus 2016-02-09 20:16:15 UTC
Description of problem:
When a multi-master deployment has updated the context of the openshiftLoopbackKubeconfig to avoid traversing the load balancer, deployer pods attempt to contact the master using the updated hostname.

Version-Release number of selected component (if applicable):
3.1.1.6

How reproducible:
Every time

Steps to Reproduce:
1. Block node -> master traffic on port 8443 (an illustrative way to do this is sketched after this list)
2. Perform a native HA install using openshift-ansible
3. Update /etc/origin/master/openshift-master.kubeconfig with a new cluster and context, and set the current-context so the master connects directly to itself rather than through the clustered hostname
4. Initiate a deployment
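
For step 1, a minimal sketch of one way to block node -> master API traffic is an iptables rule on the node; the master host is a placeholder and this specific rule is not taken from the original report:

> iptables -A OUTPUT -p tcp -d <master_host> --dport 8443 -j DROP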

Actual results:
The deployer pod attempts to contact the master directly rather than through the load balancer, causing the request to fail or time out and the deployment to fail.

Expected results:
The deployer pod should contact the master over the internal clustered hostname, and the deployment should succeed.

Additional info:

Comment 1 Jason DeTiberus 2016-02-09 20:19:01 UTC
I discussed this with deads2k and liggitt on #openshift-dev. The deployer pod should be using a service account and DNS for its configuration rather than the ENV variables injected from the openshiftLoopbackKubeconfig, as it currently does. With that change, the deployment should be successful.
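
For illustration only, a sketch of the in-pod pattern being suggested, assuming the standard service account mount path and in-cluster DNS name (not taken from the eventual fix): a pod with a service account can reach the API like this:

> TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
> curl --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
    -H "Authorization: Bearer $TOKEN" https://kubernetes.default.svc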

Comment 2 Dan Mace 2016-02-10 14:21:04 UTC
I agree with David and Jordan's assessment. The cited deployer code predates the use of service accounts and was never updated when SAs were introduced.

Comment 3 Dan Mace 2016-02-10 18:47:52 UTC
https://github.com/openshift/origin/pull/7197

Comment 4 zhou ying 2016-02-15 06:08:47 UTC
Since this is an OSE bug, could you please provide the puddle or package version? Thanks.

Comment 5 Dan Mace 2016-02-15 19:54:50 UTC
Still waiting for https://github.com/openshift/origin/pull/7197 to be incorporated into OSE. Once I have a tag containing the fix, I'll update the bz and clear the needinfo. Thanks!

Comment 7 Jason DeTiberus 2016-02-19 22:01:36 UTC
Steps for updating the context:

> oc config view

Take note of the name of the user listed under users; it will start with 'system:admin/'

> oc config set-cluster --certificate-authority=/etc/origin/master/ca.crt \
  --embed-certs=true --server=https://<openshift_hostname>:<api port> \
  my_loopback_cluster --config=/etc/origin/master/openshift-master.kubeconfig

> oc config set-context --cluster=my_loopback_cluster --namespace=default \
  --user=<system admin name> my_loopback_context \
  --config=/etc/origin/master/openshift-master.kubeconfig

> oc config use-context my_loopback_context \
  --config=/etc/origin/master/openshift-master.kubeconfig
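
As a sanity check (a suggestion, not part of the original steps), the result can be verified with oc config view; the current-context should now point at my_loopback_context:

> oc config view --config=/etc/origin/master/openshift-master.kubeconfig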

Comment 16 errata-xmlrpc 2016-05-12 16:28:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:1064