Bug 1596818
| Summary: | Upgrade playbook fails on play Drain and upgrade nodes | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Vikas Laad <vlaad> |
| Component: | Cluster Version Operator | Assignee: | Scott Dodson <sdodson> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Vikas Laad <vlaad> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 3.10.0 | CC: | abutcher, aos-bugs, bbennett, ccoleman, gpei, jeder, jiajliu, jokerman, mifiedle, mmccomas, pprakash, sdodson, sjenning, vlaad, wmeng, wsun, xtian |
| Target Milestone: | --- | Keywords: | TestBlocker |
| Target Release: | 3.10.0 | Flags: | jiajliu: needinfo- |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-07-11 13:54:15 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description: Vikas Laad, 2018-06-29 18:27:21 UTC

Created attachment 1455548 [details]
ansible log with -vvv
In this environment the username field on the client-side certificate is `system:node:ip-172-31-8-53.us-west-2.compute.internal`, whereas in all other environments the username field is `system:admin`. We're trying to figure out why this is happening. (A sketch for inspecting this field is at the end of this report.)

*** Bug 1597219 has been marked as a duplicate of this bug. ***

Seth / Clayton,

The comment here says that the client cert should have a username of the service account; however, it seems that during upgrades it has "username": "system:node:ip-172-31-8-53.us-west-2.compute.internal":

https://github.com/openshift/openshift-ansible/commit/19cd3550718ad466e5e0a648d611efe45ac5cafa#diff-f9afe10ac5cdf6294e144f2853bf9d05R1609

Is this expected?

If I delete everything in /etc/origin/node/certificates and restart the kubelet:

```
$ kubectl get csr
NAME                                                   AGE   REQUESTOR                                                  CONDITION
node-csr-SOBked3RehDQ0okkusaCqi0ul-1qt-hbh8XT0jvc3x4   43s   system:serviceaccount:openshift-infra:node-bootstrapper   Pending

[centos@master ~]$ oc adm certificate approve node-csr-SOBked3RehDQ0okkusaCqi0ul-1qt-hbh8XT0jvc3x4
certificatesigningrequest.certificates.k8s.io/node-csr-SOBked3RehDQ0okkusaCqi0ul-1qt-hbh8XT0jvc3x4 approved

[centos@master ~]$ kubectl get csr
NAME                                                   AGE   REQUESTOR                                                  CONDITION
csr-s9btc                                              2s    system:node:node.lab.variantweb.net                        Pending
node-csr-SOBked3RehDQ0okkusaCqi0ul-1qt-hbh8XT0jvc3x4   1m    system:serviceaccount:openshift-infra:node-bootstrapper   Approved,Issued

[centos@master ~]$ oc adm certificate approve csr-s9btc
certificatesigningrequest.certificates.k8s.io/csr-s9btc approved

[centos@master ~]$ kubectl get csr
NAME                                                   AGE   REQUESTOR                                                  CONDITION
csr-s9btc                                              22s   system:node:node.lab.variantweb.net                        Approved,Issued
node-csr-SOBked3RehDQ0okkusaCqi0ul-1qt-hbh8XT0jvc3x4   1m    system:serviceaccount:openshift-infra:node-bootstrapper   Approved,Issued
```

The CSR submitted by the node-bootstrapper is for the kubelet client cert. Once it is approved, the kubelet uses its own SA, identified by its newly signed client cert, to create a CSR for its server cert, which it uses to accept TLS connections from the API server. My understanding is that the kubelet uses its own client cert (i.e. its own service account) to rotate the client cert; it only uses the node-bootstrapper SA to obtain the initial client cert. If you want to force the kubelet to start the bootstrapping process from the beginning, you can remove everything in /etc/origin/node/certificates (a scripted version of this procedure is sketched at the end of this report).

Still not fixed on the latest v3.10.12. This bug blocks all upgrade tests.

The PR https://github.com/openshift/openshift-ansible/pull/9098 has been merged into openshift-ansible-3.10.15-1.

We did not hit the issue on v3.10.15. Moving to VERIFIED per comment 19.
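A quick way to reproduce the check behind the first comment, which compares the username field of the node's client certificate across environments. This is a sketch only: the paths `/etc/origin/node/certificates/kubelet-client-current.pem` and `/etc/origin/node/node.kubeconfig` are the usual OpenShift 3.10 defaults, not paths confirmed anywhere in this report.

```sh
# Print the Subject of the kubelet's current client certificate.
# The failing environment showed CN=system:node:ip-172-31-8-53...,
# while other environments showed system:admin (per the comment above).
openssl x509 -in /etc/origin/node/certificates/kubelet-client-current.pem \
  -noout -subject

# Cross-check which user the node kubeconfig authenticates as.
oc --config=/etc/origin/node/node.kubeconfig whoami
```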
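For context on the failing play itself: "Drain and upgrade nodes" drains each node before upgrading it, and the drain is the step that fails when the client certificate presents an identity without sufficient permissions. A hand-run equivalent for debugging, with a placeholder node name; the exact flags openshift-ansible passes are not quoted in this report.

```sh
# Cordon the node and evict its pods, as the upgrade play does before
# upgrading node packages. DaemonSet pods cannot be evicted, so they
# are skipped; emptyDir data on the node is discarded.
oc adm drain ip-172-31-8-53.us-west-2.compute.internal \
  --ignore-daemonsets --delete-local-data --force
```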
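The re-bootstrap procedure from the discussion above, collected into one sketch. Two assumptions beyond the original comment: the node service name is `atomic-openshift-node` (it is `origin-node` on OKD installs), and the bulk-approve one-liner is a convenience, not something quoted in this bug.

```sh
# On the affected node: stop the kubelet and discard its certificates
# so it restarts the TLS bootstrapping flow from the beginning.
systemctl stop atomic-openshift-node
rm -rf /etc/origin/node/certificates/*
systemctl start atomic-openshift-node

# On a master: approve the pending CSRs. Run this twice, since the
# kubelet only submits its server-cert CSR after the bootstrap
# (client) CSR has been approved and issued.
oc get csr -o name | xargs oc adm certificate approve

# Verify both CSRs end up Approved,Issued.
kubectl get csr
```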