Bug 1631353 - kubeconfig on master node changes context every time playbook is run or after oc login
Summary: kubeconfig on master node changes context every time playbook is run or after oc login
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.7.1
Hardware: All
OS: All
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.9.z
Assignee: Scott Dodson
QA Contact: ge liu
URL:
Whiteboard:
Duplicates: 1638757
Depends On:
Blocks:
 
Reported: 2018-09-20 11:58 UTC by Andre Costa
Modified: 2019-02-20 08:47 UTC
CC List: 18 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Certain tasks did not pass the --config option to the oc client. Consequence: If the ansible_user's kubeconfig had been modified, those tasks could fail. Fix: All tasks now pass the --config option pointing at the proper admin kubeconfig, which should never be modified. Result: Installation and upgrades in scenarios where the ansible_user's kubeconfig has been modified now work as expected.
Clone Of:
Environment:
Last Closed: 2019-02-20 08:46:56 UTC
Target Upstream Version:
Embargoed:


Attachments
kubeconfig.files (21.22 KB, application/zip)
2018-09-20 11:58 UTC, Andre Costa


Links
Red Hat Bugzilla 1467775 (high, CLOSED): Unable to perform initial IP allocation check for container OCP (last updated 2021-12-10 15:08:01 UTC)
Red Hat Bugzilla 1631597 (high, CLOSED): Fail to upgrade ocp when user not authenticated as admin (last updated 2021-02-22 00:41:40 UTC)
Red Hat Knowledge Base (Solution) 3620131 (last updated 2018-09-24 18:37:41 UTC)
Red Hat Product Errata RHBA-2019:0331 (last updated 2019-02-20 08:47:02 UTC)

Internal Links: 1467775 1631597

Description Andre Costa 2018-09-20 11:58:37 UTC
Created attachment 1485112 [details]
kubeconfig.files

Description of problem:
The customer is using one of the masters as a bastion, logging in with their cluster-admin user and running the playbooks from there.
During the upgrade of the Service Catalog I realized they were having an issue with the kubeconfig: some of the tasks that rely on this file to provide the system:admin user were failing as system:anonymous and could not apply or change objects.
The kubeconfig was deleted and recreated several times, but every time the customer ran oc login or a playbook, the kubeconfig was automatically populated with a custom serviceaccount (created by the customer) from the logging project.
I have checked the sosreport and after some tests I can't see what might be causing this, and I'm not able to replicate it.

Version-Release number of selected component (if applicable):
ocp v3.7.57

How reproducible:
Every time oc login or ansible-playbook is used (on the customer side)

Steps to Reproduce:
1. Delete /root/.kube/config
2. Recreate new kubeconfig with cp /etc/origin/master/admin.kubeconfig /root/.kube/config
3. Do oc login -u <user> https://<masterPublicURL>:8443 or run ansible-playbook
4. cat /root/.kube/config now shows the context system:serviceaccount:logging:logicmonitor
5. ansible <master-hosts-group> -m shell -a 'oc get nodes'
master 1 will give an error system:anonymous can not get ...
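
Condensed as a shell sketch (the <user>, <masterPublicURL> and <master-hosts-group> placeholders are the ones used in the steps above; paths are as reported):

# Recreate the kubeconfig from the admin kubeconfig (steps 1-2).
rm -f /root/.kube/config
cp /etc/origin/master/admin.kubeconfig /root/.kube/config

# Logging in as a regular user (or running a playbook) rewrites the
# current-context in /root/.kube/config (steps 3-4).
oc login -u <user> https://<masterPublicURL>:8443
grep current-context /root/.kube/config

# Anything that relies on that file now runs as the new identity (step 5).
ansible <master-hosts-group> -m shell -a 'oc get nodes'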

Actual results:
The context changes to the serviceaccount user, which makes playbooks that depend on this kubeconfig file fail with system:anonymous can not do <some-task>

Expected results:
Even when logging in with normal authenticated users, the kubeconfig shouldn't be picking up serviceaccounts from the cluster.

Additional info:
I'm not uploading the sosreport, but I have it in case you need some info from it.

Comment 2 Maciej Szulik 2018-09-26 07:33:24 UTC
I can't speak to the ansible playbook and what it does; I guess the installer team would have to dive into it. But if you're saying you can reproduce this with a plain oc login, I'd like to see the full output of the oc login with --loglevel=9 that produces the unexpected kubeconfig.
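
For reference, capturing that output might look roughly like this (the placeholders mirror the reproduction steps above, and the log file name is just an example):

oc login -u <user> https://<masterPublicURL>:8443 --loglevel=9 2>&1 | tee oc-login-debug.log
# Then check what ended up in the kubeconfig.
grep current-context /root/.kube/config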

Comment 13 Scott Dodson 2018-10-02 17:35:57 UTC
I had previous work where I'd started addressing this in the master branch already, and given the scope of the changes between 3.9 and 3.10 there's no hope of being able to cherry-pick things between those releases.

So I'm starting with 3.9 for this bug and I'll pick this back into release-3.7 once it's through.

https://github.com/openshift/openshift-ansible/pull/10303

Comment 21 Scott Dodson 2018-10-31 20:07:20 UTC
*** Bug 1638757 has been marked as a duplicate of this bug. ***

Comment 25 ge liu 2018-12-06 06:21:15 UTC
Reproduced with: oc v3.7.57
kubernetes v1.7.6+a08f5eeb62
features: Basic-Auth GSSAPI Kerberos SPNEGO

1). oc login with user: geliu, and create project lgproj
2). create sa logicmagic, # oc create serviceaccount logicmagic
3). get token for sa, #oc sa get-token logicmagic
4). #oc login --token=$TOKEN 
5). # grep  logicmagic /root/.kube/config 
..............
/system:serviceaccount:lgproj:logicmagic
current-context: /ec2-35-175-214-159-compute-1-amazonaws-com:443
.....
6). Run the command below on the slave (copy the inventory file from the master to the slave first):
# ansible -i 37 masters -m shell -a "oc get node"
ec2-35-175-214-159.compute-1.amazonaws.com | FAILED | rc=1 >>
Error from server (Forbidden): User "system:serviceaccount:lgproj:logicmagic" cannot list nodes at the cluster scope: User "system:serviceaccount:lgproj:logicmagic" cannot list all nodes in the cluster (get nodes)
non-zero return code


Failed to verify with: oc v3.9.57
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

# ansible -i 39 masters  -m shell -a 'oc get nodes'
 [WARNING] Ansible is in a world writable directory (/tmp), ignoring it as an ansible.cfg source.
qe-geliu39master-etcd-1.1206-nmz.qe.rhcloud.com | FAILED | rc=1 >>
Error from server (Forbidden): nodes is forbidden: User "system:serviceaccount:lgproj:logicmonitor" cannot list nodes at the cluster scope: User "system:serviceaccount:lgproj:logicmonitor" cannot list all nodes in the cluster
non-zero return code


Tried to recover /root/.kube/config (removed the current-context '/system:serviceaccount:lgproj:logicmagic...'), and then it works well:

]# ansible -i 39 masters  -m shell -a 'oc get nodes'
 [WARNING] Ansible is in a world writable directory (/tmp), ignoring it as an ansible.cfg source.
qe-geliu39master-etcd-1.1206-nmz.qe.rhcloud.com | SUCCESS | rc=0 >>
NAME                               STATUS    ROLES     AGE       VERSION
qe-geliu39master-etcd-1            Ready     master    4h        v1.9.1+a0ce1bc657
qe-geliu39node-registry-router-1   Ready     compute   4h        v1.9.1+a0ce1bc657


So this issue still exists with oc v3.9.57.
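
A quick way to confirm which identity the default kubeconfig currently resolves to, and to restore the admin one, might be (standard oc commands, not taken from the original comment):

# Show the context and user the default kubeconfig points at.
oc config current-context
oc whoami

# Restore the unmodified admin kubeconfig, as in the reproduction steps.
cp /etc/origin/master/admin.kubeconfig /root/.kube/config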

Comment 27 Scott Dodson 2019-01-24 15:16:59 UTC
Ge,

The problem isn't that `oc login` mutates the kubeconfig; that's completely expected. The problem was that certain tasks in openshift-ansible were not properly setting their kubeconfig to the admin kubeconfig. That has been resolved.
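
Roughly sketched, the difference the fix makes is the following (the admin kubeconfig path is the one used throughout this bug; the actual tasks changed are in the PR from comment 13):

# Before: tasks that ran the client without --config inherited whatever
# context oc login had last written to the ansible_user's ~/.kube/config.
oc get nodes

# After: tasks pass --config explicitly, pointing at the admin kubeconfig,
# which the installer never modifies.
oc get nodes --config=/etc/origin/master/admin.kubeconfig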

Comment 28 ge liu 2019-01-29 09:06:01 UTC
Verified. Tried # ansible -i 39 masters -m shell -a 'oc get nodes --config=/etc/origin/master/admin.kubeconfig' and it works well. Then made sure ./kube/config had been changed and ran the playbook ...roles/openshift_prometheus/tasks/install_prometheus.yaml (by running ..../openshift-prometheus/config.yml); Prometheus was installed successfully.
Version: openshift-ansible-3.9.66-1.git.0.358f2aa.el7.noarch
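
Restated as a sketch, the verification flow was roughly (the inventory name 39 and the token login come from earlier comments; the playbook path is left as a placeholder because it is truncated above):

# 1. The ad-hoc command works against the admin kubeconfig.
ansible -i 39 masters -m shell -a 'oc get nodes --config=/etc/origin/master/admin.kubeconfig'

# 2. Deliberately change /root/.kube/config, e.g. by logging in with the
#    serviceaccount token from comment 25.
oc login --token=$TOKEN

# 3. Run the installer playbook and confirm it still completes, since its
#    tasks no longer depend on /root/.kube/config.
ansible-playbook -i 39 <openshift-prometheus config playbook>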

Comment 30 errata-xmlrpc 2019-02-20 08:46:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0331

