Bug 1306011

Summary: Deployer pods incorrectly using the host entry from openshiftLoopbackKubeconfig
Product: OpenShift Container Platform
Component: openshift-controller-manager
Version: 3.1.0
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Reporter: Jason DeTiberus <jdetiber>
Assignee: Dan Mace <dmace>
QA Contact: Johnny Liu <jialiu>
CC: aos-bugs, dmace, dsafford, jdetiber, sdodson, tdawson
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: v3.1.1.902
Doc Type: Bug Fix
Last Closed: 2016-05-12 16:28:35 UTC
Type: Bug

Description Jason DeTiberus 2016-02-09 20:16:15 UTC
Description of problem:
When a multi-master deployment has updated the context of the openshiftLoopbackKubeconfig to avoid traversing the load balancer, deployer pods attempt to contact the master using the updated hostname.

Version-Release number of selected component (if applicable):
3.1.1.6

How reproducible:
Every time

Steps to Reproduce:
1. Block node -> master traffic on port 8443 (an illustrative way to do this is sketched after this list)
2. Perform a native HA install using openshift-ansible
3. Update /etc/origin/master/openshift-master.kubeconfig with a new cluster and context, and set the current-context so the master connects directly to itself rather than through the clustered hostname
4. Initiate a deployment
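
For step 1, a minimal sketch of one way to block node -> master API traffic is an iptables rule on the node; the master host is a placeholder and this specific rule is not taken from the original report:

> iptables -A OUTPUT -p tcp -d <master_host> --dport 8443 -j DROP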

Actual results:
The deployer pod attempts to contact the master directly rather than through the load balancer, causing the request to fail or time out and the deployment to fail.

Expected results:
The deployer pod should contact the master over the internal clustered hostname, and the deployment should succeed.

Additional info:

Comment 1 Jason DeTiberus 2016-02-09 20:19:01 UTC
I discussed this with deads2k and liggitt on #openshift-dev. The deployer pod should be using a service account and DNS for its configuration rather than the ENV variables injected from the openshiftLoopbackKubeconfig, as it currently does. With that change, the deployment should be successful.
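
For illustration only, a sketch of the in-pod pattern being suggested, assuming the standard service account mount path and in-cluster DNS name (not taken from the eventual fix): a pod with a service account can reach the API like this:

> TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
> curl --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
    -H "Authorization: Bearer $TOKEN" https://kubernetes.default.svc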

Comment 2 Dan Mace 2016-02-10 14:21:04 UTC
I agree with David and Jordan's assessment. The cited deployer code predates the use of service accounts and was never updated when SAs were introduced.

Comment 3 Dan Mace 2016-02-10 18:47:52 UTC
https://github.com/openshift/origin/pull/7197

Comment 4 zhou ying 2016-02-15 06:08:47 UTC
Since this is an OSE bug, could you please provide the puddle or package version? Thanks.

Comment 5 Dan Mace 2016-02-15 19:54:50 UTC
Still waiting for https://github.com/openshift/origin/pull/7197 to be incorporated into OSE. Once I have a tag containing the fix, I'll update the bz and clear the needinfo. Thanks!

Comment 7 Jason DeTiberus 2016-02-19 22:01:36 UTC
Steps for updating the context:

> oc config view

Take note of the name of the user listed under users; it will start with 'system:admin/'

> oc config set-cluster --certificate-authority=/etc/origin/master/ca.crt \
  --embed-certs=true --server=https://<openshift_hostname>:<api port> \
  my_loopback_cluster --config=/etc/origin/master/openshift-master.kubeconfig

> oc config set-context --cluster=my_loopback_cluster --namespace=default \
  --user=<system admin name> my_loopback_context \
  --config=/etc/origin/master/openshift-master.kubeconfig

> oc config use-context my_loopback_context \
  --config=/etc/origin/master/openshift-master.kubeconfig
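
As a sanity check (a suggestion, not part of the original steps), the result can be verified with oc config view; the current-context should now point at my_loopback_context:

> oc config view --config=/etc/origin/master/openshift-master.kubeconfig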

Comment 16 errata-xmlrpc 2016-05-12 16:28:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:1064