Bug 1978193
| Field | Value |
| --- | --- |
| Summary | htpasswd provider for auth is not working as expected and gives a 401 error when users try to log in |
| Product | OpenShift Container Platform |
| Component | apiserver-auth |
| Version | 4.8 |
| Target Release | 4.9.0 |
| Hardware | Unspecified |
| OS | Unspecified |
| Reporter | Praveen Kumar <prkumar> |
| Assignee | Standa Laznicka <slaznick> |
| QA Contact | liyao |
| Severity | medium |
| Priority | medium |
| Status | CLOSED ERRATA |
| CC | aos-bugs, cfergeau, mfojtik, surbania, xxia |
| Keywords | NeedsTestCase |
| Type | Bug |
| Doc Type | Bug Fix |
| Bug Blocks | 1997906 |
| Last Closed | 2021-10-18 17:37:23 UTC |

Doc Text:

Cause: The CA for API server client certs was rotated early in the lifetime of a cluster, which left the authentication operator's logic unable to create a CSR because a previous CSR with the same name still existed.

Consequence: The kube-apiserver was unable to authenticate itself to the oauth-apiserver when sending TokenReview requests, causing authentication to fail.

Fix: Use generated names for creating CSRs in the authentication operator.

Result: Early rotations of the CA for API server client certificates won't cause authentication failures for OpenShift users.
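The fix described in the Doc Text relies on the Kubernetes `generateName` mechanism: instead of a fixed CSR name, the API server appends a random suffix on create, so a stale CSR left behind by an earlier rotation can never block a new request. Below is a minimal client-go sketch of that idea; the function name and the handling of the CSR contents are illustrative assumptions, not the authentication operator's actual code, while the name prefix and the label are taken from the verification output later in this report.

```go
package csrbootstrap

import (
	"context"

	certificatesv1 "k8s.io/api/certificates/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// requestClientCertificate submits a client-certificate CSR for the
// authenticator using GenerateName: the API server appends a random suffix
// (e.g. system:openshift:openshift-authenticator-cckdw), so the request
// cannot collide with a stale CSR left behind by an earlier CA rotation.
// csrPEM is an already PEM-encoded x509 CertificateRequest; producing it is
// out of scope for this sketch.
func requestClientCertificate(ctx context.Context, client kubernetes.Interface, csrPEM []byte) (*certificatesv1.CertificateSigningRequest, error) {
	csr := &certificatesv1.CertificateSigningRequest{
		ObjectMeta: metav1.ObjectMeta{
			// GenerateName instead of a fixed Name is the essence of the fix.
			GenerateName: "system:openshift:openshift-authenticator-",
			// The label gives the operator a stable way to find its own CSRs
			// now that their names are no longer predictable.
			Labels: map[string]string{
				"authentication.openshift.io/csr": "openshift-authenticator",
			},
		},
		Spec: certificatesv1.CertificateSigningRequestSpec{
			Request:    csrPEM,
			SignerName: certificatesv1.KubeAPIServerClientSignerName,
			Usages: []certificatesv1.KeyUsage{
				certificatesv1.UsageDigitalSignature,
				certificatesv1.UsageKeyEncipherment,
				certificatesv1.UsageClientAuth,
			},
		},
	}
	return client.CertificatesV1().CertificateSigningRequests().Create(ctx, csr, metav1.CreateOptions{})
}
```

The random suffix and the label can both be seen in the CSR that QA inspects during verification below.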
Description (Praveen Kumar, 2021-07-01 09:38:06 UTC)
> Is the default kubeadmin user still present on the cluster?

@Standa Laznicka No, we removed the default kubeadmin user following the https://docs.openshift.com/container-platform/4.7/authentication/remove-kubeadmin.html document, after configuring the htpasswd provider and granting one of the users the cluster-admin role.

After a bit more debugging, this looks like an issue with the authenticator not having a valid certificate. It should be reproducible on any cluster where a cert rotation is forced and all CSRs are then approved. As part of CRC we force a cert rotation so the certificates get 30 days of validity; during this process new CSRs appear for `node:bootstrapper` and `node:<node-name>`, but there is no updated CSR for `openshift-authentication-operator:authentication-operator`, which is why the auth operator pod logs say a CSR exists but is not a valid one.

```
$ oc get csr
NAME                                        AGE    SIGNERNAME                                    REQUESTOR                                                                          CONDITION
csr-5zmq5                                   26h    kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper         Approved,Issued
csr-6kptk                                   2d3h   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper         Approved,Issued
csr-8f94s                                   26h    kubernetes.io/kubelet-serving                 system:node:crc-rb86w-master-0                                                     Approved,Issued
csr-xmmvg                                   2d3h   kubernetes.io/kubelet-serving                 system:node:crc-rb86w-master-0                                                     Approved,Issued
system:openshift:openshift-authenticator    2d3h   kubernetes.io/kube-apiserver-client           system:serviceaccount:openshift-authentication-operator:authentication-operator   Approved,Issued

$ oc delete csr system:openshift:openshift-authenticator
certificatesigningrequest.certificates.k8s.io "system:openshift:openshift-authenticator" deleted

$ oc logs authentication-operator-7d8d5485f9-fp4rr -n openshift-authentication-operator
[...]
E0702 05:38:23.236664 1 base_controller.go:264] "OpenShiftAuthenticatorCertRequester" controller failed to sync "csr-8f94s", err: certificatesigningrequests.certificates.k8s.io "system:openshift:openshift-authenticator" already exists
I0702 05:38:24.657403 1 event.go:282] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-authentication-operator", Name:"authentication-operator", UID:"5b8aa10a-b814-4a64-bbc3-11cfe5e25458", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'NoValidCertificateFound' No valid client certificate for OpenShiftAuthenticatorCertRequester is found. Bootstrap is required

[prkumar@prkumar-test snc]$ oc get csr
NAME                                        AGE   SIGNERNAME                                    REQUESTOR                                                                          CONDITION
csr-76b68                                   28m   kubernetes.io/kubelet-serving                 system:node:crc-n9gwv-master-0                                                     Approved,Issued
csr-9df4b                                   30m   kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper         Approved,Issued
csr-lxpg7                                   2d    kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper         Approved,Issued
csr-m9988                                   2d    kubernetes.io/kubelet-serving                 system:node:crc-n9gwv-master-0                                                     Approved,Issued
system:openshift:openshift-authenticator   4s    kubernetes.io/kube-apiserver-client           system:serviceaccount:openshift-authentication-operator:authentication-operator   Approved,Issued
```

I see, this is indeed going to be an issue with the forced cert rotation combined with a CSR that already exists, because the rotation happened earlier than the old CSR was auto-removed.
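The "already exists" errors above come from the pre-fix pattern of always creating the CSR under one fixed name. A rough client-go sketch of that pattern, purely for illustration; the function name and structure are assumptions, not the operator's actual code:

```go
package csrbootstrap

import (
	"context"
	"fmt"

	certificatesv1 "k8s.io/api/certificates/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// requestWithFixedName mimics the pre-fix behaviour: the CSR name is a
// constant, so an old, already-issued CSR with that name (left over because
// the forced rotation happened before the previous CSR was cleaned up) makes
// every retry fail with AlreadyExists, and the authenticator never obtains a
// fresh client certificate.
func requestWithFixedName(ctx context.Context, client kubernetes.Interface, csrPEM []byte) error {
	csr := &certificatesv1.CertificateSigningRequest{
		ObjectMeta: metav1.ObjectMeta{
			Name: "system:openshift:openshift-authenticator",
		},
		Spec: certificatesv1.CertificateSigningRequestSpec{
			Request:    csrPEM,
			SignerName: certificatesv1.KubeAPIServerClientSignerName,
			Usages: []certificatesv1.KeyUsage{
				certificatesv1.UsageDigitalSignature,
				certificatesv1.UsageKeyEncipherment,
				certificatesv1.UsageClientAuth,
			},
		},
	}
	_, err := client.CertificatesV1().CertificateSigningRequests().Create(ctx, csr, metav1.CreateOptions{})
	if apierrors.IsAlreadyExists(err) {
		// This is the state in the operator logs above: the stale CSR was
		// issued against the previous CA, so it is useless, yet its presence
		// blocks the creation of a new one until it is deleted by hand.
		return fmt.Errorf("stale CSR blocks certificate bootstrap: %w", err)
	}
	return err
}
```

Deleting the stale CSR by hand, as done above, unblocks the next sync, but the proper fix is to avoid the fixed name altogether, as in the GenerateName sketch after the Doc Text.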
sprint review: @QA please submit sprint review status

Tested in fresh clusters 4.9.0-0.nightly-2021-08-18-144658 and 4.8.0-0.nightly-2021-08-18-161850:

1. Delete the secret openshift-authenticator-certs in both the 4.8 env and the 4.9 env:
```
$ oc delete secret -n openshift-oauth-apiserver openshift-authenticator-certs
secret "openshift-authenticator-certs" deleted
```
2. In the 4.9 env, check that the newly created CSR uses generateName with a random suffix rather than the previous fixed name 'system:openshift:openshift-authenticator':
```
$ oc get csr
NAME                                             AGE   SIGNERNAME                            REQUESTOR                                                                          REQUESTEDDURATION   CONDITION
system:openshift:openshift-authenticator-cckdw   16s   kubernetes.io/kube-apiserver-client   system:serviceaccount:openshift-authentication-operator:authentication-operator   <none>              Approved,Issued
```
3. In the 4.9 env, check that the label 'authentication.openshift.io/csr: openshift-authenticator' is added to the new CSR (a client-go sketch of selecting CSRs by this label follows at the end of this report):
```
$ oc get csr system:openshift:openshift-authenticator-cckdw -o yaml | grep -A5 -B5 'labels'
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  creationTimestamp: "2021-08-19T08:17:22Z"
  generateName: system:openshift:openshift-authenticator-
  labels:
    authentication.openshift.io/csr: openshift-authenticator
  name: system:openshift:openshift-authenticator-cckdw
  resourceVersion: "155347"
  uid: 91c634c2-a0b0-4247-99d4-c64141bd2616
spec:
```
4. Delete the secret openshift-authenticator-certs again in both the 4.8 env and the 4.9 env:
```
$ oc delete secret -n openshift-oauth-apiserver openshift-authenticator-certs
secret "openshift-authenticator-certs" deleted
```
5. In the 4.9 env, check that the newly created CSR uses a different random suffix from the previous one:
```
$ oc get csr
NAME                                             AGE   SIGNERNAME                            REQUESTOR                                                                          REQUESTEDDURATION   CONDITION
system:openshift:openshift-authenticator-7c5zp   3s    kubernetes.io/kube-apiserver-client   system:serviceaccount:openshift-authentication-operator:authentication-operator   <none>              Approved,Issued
system:openshift:openshift-authenticator-cckdw   49s   kubernetes.io/kube-apiserver-client   system:serviceaccount:openshift-authentication-operator:authentication-operator   <none>              Approved,Issued
```
6. Check whether there are errors in the authentication-operator pod logs:
```
$ oc logs <pod-name> -n openshift-authentication-operator
# in the 4.9 env, no errors appear, which is expected with the fix
# in the 4.8 env, the errors below are constantly output in the authentication-operator pod logs, as in Comment 3, which means the bug is reproduced
/****snipped****/
E0820 02:06:48.705837 1 base_controller.go:266] OpenShiftAuthenticatorCertRequester reconciliation failed: certificatesigningrequests.certificates.k8s.io "system:openshift:openshift-authenticator" already exists
E0820 02:06:49.768551 1 base_controller.go:264] "OpenShiftAuthenticatorCertRequester" controller failed to sync "csr-j5xjr", err: certificatesigningrequests.certificates.k8s.io "system:openshift:openshift-authenticator" already exists
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759
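Because the generated names now carry a random suffix, the label checked in verification step 3 is what gives the operator, or anyone scripting against the cluster, a stable handle on these CSRs; the same objects can be selected on the command line with `oc get csr -l authentication.openshift.io/csr=openshift-authenticator`. A minimal client-go sketch of that label-based lookup, with an illustrative function name rather than the operator's actual code:

```go
package csrbootstrap

import (
	"context"

	certificatesv1 "k8s.io/api/certificates/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// listAuthenticatorCSRs returns every CSR created by the authentication
// operator, regardless of the random suffix in its generated name, by
// selecting on the label shown in verification step 3.
func listAuthenticatorCSRs(ctx context.Context, client kubernetes.Interface) ([]certificatesv1.CertificateSigningRequest, error) {
	list, err := client.CertificatesV1().CertificateSigningRequests().List(ctx, metav1.ListOptions{
		LabelSelector: "authentication.openshift.io/csr=openshift-authenticator",
	})
	if err != nil {
		return nil, err
	}
	return list.Items, nil
}
```

Selecting by label rather than by name is what keeps the lookup stable once the CSR names become unpredictable.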