Created attachment 1485112 [details]
kubeconfig.files

Description of problem:
The customer is using one of the masters as a bastion host, logging in with his cluster-admin user and running the playbooks. During the upgrade of Service Catalog I noticed an issue with the kubeconfig: some of the tasks that rely on this file having the system:admin user were failing as system:anonymous, unable to apply or change objects. The kubeconfig was deleted and recreated several times, but every time the customer ran oc login or a playbook, the kubeconfig was automatically populated with a custom (customer-created) serviceaccount from the logging project. I have checked the sosreport, and after some tests I can't see what might be causing this and I'm not able to replicate it.

Version-Release number of selected component (if applicable):
ocp v3.7.57

How reproducible:
Every time oc login or ansible-playbook is used (on the customer side)

Steps to Reproduce:
1. Delete /root/.kube/config
2. Recreate a new kubeconfig with: cp /etc/origin/master/admin.kubeconfig /root/.kube/config
3. Run oc login -u <user> https://<masterPublicURL>:8443, or run ansible-playbook
4. cat /root/.kube/config now shows the context system:serviceaccount:logging:logicmonitor
5. ansible <master-hosts-group> -m shell -a 'oc get nodes' fails on master 1 with an error: system:anonymous can not get ...

Actual results:
The context changes to the serviceaccount user, which makes playbooks that depend on this kubeconfig file fail with "system:anonymous can not do <some-task>".

Expected results:
Even when logging in with normal authenticated users, the kubeconfig should not pick up serviceaccounts from the cluster.

Additional info:
I'm not uploading the sosreport, but I have it in case you need some info from it.
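For reference, the reproduction condensed into shell (user and master URL are placeholders):

# rm -f /root/.kube/config
# cp /etc/origin/master/admin.kubeconfig /root/.kube/config
# oc login -u <user> https://<masterPublicURL>:8443
# grep current-context /root/.kube/config    # unexpectedly shows the serviceaccount context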
I can't speak to the ansible playbook and what it does; I guess the installer team would have to dive into it. But if you're saying you can reproduce this with a plain oc login, I'd like to see the full output of the oc login run with --loglevel=9 that produces the unexpected kubeconfig.
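Something like this should capture it (user and URL are placeholders; logging to a file is just a suggestion):

# oc login -u <user> https://<masterPublicURL>:8443 --loglevel=9 2>&1 | tee oc-login-debug.log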
I had previous work where I'd already started addressing this in the master branch, and given the scope of the changes between 3.9 and 3.10 there's no hope of being able to cherry-pick things between those releases. So I'm starting with 3.9 for this bug, and I'll pick this back into release-3.7 once it's through.

https://github.com/openshift/openshift-ansible/pull/10303
*** Bug 1638757 has been marked as a duplicate of this bug. ***
Reproduced with:
oc v3.7.57
kubernetes v1.7.6+a08f5eeb62
features: Basic-Auth GSSAPI Kerberos SPNEGO

1) oc login with user geliu, and create project lgproj
2) Create sa logicmagic:
   # oc create serviceaccount logicmagic
3) Get the token for the sa:
   # oc sa get-token logicmagic
4) # oc login --token=$TOKEN
5) # grep logicmagic /root/.kube/config
   ..............
   /system:serviceaccount:lgproj:logicmagic
   current-context: /ec2-35-175-214-159-compute-1-amazonaws-com:443
   .....
6) Run the command below on the slave (after first copying the inventory file from the master to the slave):
   # ansible -i 37 masters -m shell -a "oc get node"
   ec2-35-175-214-159.compute-1.amazonaws.com | FAILED | rc=1 >>
   Error from server (Forbidden): User "system:serviceaccount:lgproj:logicmagic" cannot list nodes at the cluster scope: User "system:serviceaccount:lgproj:logicmagic" cannot list all nodes in the cluster (get nodes)
   non-zero return code

Failed to verify with:
oc v3.9.57
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

# ansible -i 39 masters -m shell -a 'oc get nodes'
 [WARNING] Ansible is in a world writable directory (/tmp), ignoring it as an ansible.cfg source.
qe-geliu39master-etcd-1.1206-nmz.qe.rhcloud.com | FAILED | rc=1 >>
Error from server (Forbidden): nodes is forbidden: User "system:serviceaccount:lgproj:logicmonitor" cannot list nodes at the cluster scope: User "system:serviceaccount:lgproj:logicmonitor" cannot list all nodes in the cluster
non-zero return code

After recovering /root/.kube/config (removing the current-context '/system:serviceaccount:lgproj:logicmagic...' entry), it works well:

# ansible -i 39 masters -m shell -a 'oc get nodes'
 [WARNING] Ansible is in a world writable directory (/tmp), ignoring it as an ansible.cfg source.
qe-geliu39master-etcd-1.1206-nmz.qe.rhcloud.com | SUCCESS | rc=0 >>
NAME                               STATUS    ROLES     AGE       VERSION
qe-geliu39master-etcd-1            Ready     master    4h        v1.9.1+a0ce1bc657
qe-geliu39node-registry-router-1   Ready     compute   4h        v1.9.1+a0ce1bc657

So this issue still exists in oc v3.9.57.
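For reference, recovery amounts to re-copying the admin kubeconfig, as in the original report's steps; oc config current-context is one way to confirm the active context afterwards:

# cp /etc/origin/master/admin.kubeconfig /root/.kube/config
# oc config current-context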
Ge, the problem isn't that `oc login` mutates the kubeconfig; that's completely expected. The problem was that certain tasks in openshift-ansible were not properly setting their kubeconfig to the admin kubeconfig. That has been resolved.
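To illustrate the distinction (a sketch using the paths already quoted in this bug): a task that relies on the default /root/.kube/config inherits whatever context the last oc login left behind, while a task that pins the admin kubeconfig always runs with the admin credentials:

# oc get nodes                                                # uses /root/.kube/config, i.e. the serviceaccount context
# oc get nodes --config=/etc/origin/master/admin.kubeconfig   # pinned to the admin kubeconfig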
Verified. Tried:

# ansible -i 39 masters -m shell -a 'oc get nodes --config=/etc/origin/master/admin.kubeconfig'

and it works well. Then I made sure /root/.kube/config had been changed, and ran the playbook ...roles/openshift_prometheus/tasks/install_prometheus.yaml (by running ..../openshift-prometheus/config.yml); Prometheus was installed successfully.

Version: openshift-ansible-3.9.66-1.git.0.358f2aa.el7.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0331