Bug 1956810 - Unauthorized error after 24 hours of cluster use
Summary: Unauthorized error after 24 hours of cluster use
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: oauth-proxy
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Standa Laznicka
QA Contact: scheng
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-05-04 13:10 UTC by Israel Pinto
Modified: 2021-05-06 09:47 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-06 09:47:34 UTC
Target Upstream Version:
Embargoed:


Attachments
logs (3.08 MB, application/zip), 2021-05-04 13:11 UTC, Israel Pinto

Description Israel Pinto 2021-05-04 13:10:03 UTC
Description of problem:
We can't use the kubeconfig to log in to the cluster after ~24 hours.
After we re-login as admin, the kubeconfig works again.
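
A quick way to confirm that it is the token (not the kubeconfig file itself) that expired is to inspect the token objects while logged in as cluster-admin. A sketch; the columns map to fields that also appear in the audit logs below:

$ oc whoami -t    # prints the bearer token from the current kubeconfig
$ oc get oauthaccesstokens.oauth.openshift.io -o custom-columns=NAME:.metadata.name,USER:.userName,EXPIRES-IN:.expiresIn,CREATED:.metadata.creationTimestamp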

Version-Release number of selected component (if applicable):
OCP 4.7.1
Red Hat Enterprise Linux CoreOS 47.83.202103041352-0 (Ootpa)   
Kernel: 4.18.0-240.15.1.el8_3.x86_64

How reproducible:
Reproduces at ~24H intervals.

Steps to Reproduce:
Running OCP 4.7.1 with CNV 2.6.2 (after upgrade from 2.6.1)


Additional info:
# oc whoami 
error: You must be logged in to the server (Unauthorized)
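
The re-login step we use, sketched with placeholder credentials and API URL:

# oc login -u kubeadmin -p <password> https://api.<cluster-domain>:6443
# oc whoami
kube:admin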

--------------

The oauth-openshift pods have been recreated:
openshift-authentication-operator                  authentication-operator-7dcd4b9c7b-89jn8                                  1/1     Running     0          2d1h
openshift-authentication                           oauth-openshift-5b4b584ff6-9gcd8                                          1/1     Running     0          103m
openshift-authentication                           oauth-openshift-5b4b584ff6-zbnlb                                          1/1     Running     0          103m
openshift-oauth-apiserver                          apiserver-c7597f7f4-5ms8j                                                 1/1     Running     0          2d1h
openshift-oauth-apiserver                          apiserver-c7597f7f4-6dn9j                                                 1/1     Running     0          2d1h
openshift-oauth-apiserver                          apiserver-c7597f7f4-q7vrg                                                 1/1     Running     0          2d1h

Adding logs

Comment 1 Israel Pinto 2021-05-04 13:11:13 UTC
Created attachment 1779377 [details]
logs

Comment 4 Israel Pinto 2021-05-04 18:20:10 UTC
(In reply to Standa Laznicka from comment #3)
> https://docs.openshift.com/container-platform/4.7/authentication/configuring-internal-oauth.html#oauth-configuring-internal-oauth_configuring-internal-oauth

Hi Standa,
Please review the cluster settings: accessTokenMaxAgeSeconds is set to 604800 => 7 days.
We are not using the default setting.

$ oc describe oauth.config.openshift.io/cluster
Name:         cluster
Namespace:    
Labels:       <none>
Annotations:  include.release.openshift.io/self-managed-high-availability: true
              include.release.openshift.io/single-node-developer: true
              release.openshift.io/create-only: true
API Version:  config.openshift.io/v1
Kind:         OAuth
Metadata:
  Creation Timestamp:  2021-05-02T08:39:33Z
  Generation:          57
  Managed Fields:
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:include.release.openshift.io/self-managed-high-availability:
          f:include.release.openshift.io/single-node-developer:
          f:release.openshift.io/create-only:
      f:spec:
    Manager:      cluster-version-operator
    Operation:    Update
    Time:         2021-05-02T08:39:33Z
    API Version:  config.openshift.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        f:identityProviders:
        f:tokenConfig:
          .:
          f:accessTokenMaxAgeSeconds:
    Manager:         OpenAPI-Generator
    Operation:       Update
    Time:            2021-05-04T11:10:59Z
  Resource Version:  2205553
  Self Link:         /apis/config.openshift.io/v1/oauths/cluster
  UID:               f37aa04c-98c4-4eed-8b2d-f2ad0341662b
Spec:
  Identity Providers:
    Htpasswd:
      File Data:
        Name:        htpass-secret-for-cnv-tests
    Mapping Method:  claim
    Name:            htpasswd_provider
    Type:            HTPasswd
  Token Config:
    Access Token Max Age Seconds:  604800
Events:                            <none>
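
For reference, this field can be set with a one-line merge patch that matches the spec layout shown above (a sketch, not the exact command we ran):

$ oc patch oauth.config.openshift.io/cluster --type=merge -p '{"spec":{"tokenConfig":{"accessTokenMaxAgeSeconds":604800}}}'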

Comment 5 Standa Laznicka 2021-05-05 07:42:51 UTC
OK, that would be a good thing to mention in the bug description next time.

Please provide must-gather.
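
(For completeness, the standard invocation; --dest-dir is optional:)

$ oc adm must-gather --dest-dir=./must-gather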

Comment 7 Standa Laznicka 2021-05-05 09:14:40 UTC
That must-gather is missing most of the operator logs and contains no config objects whatsoever. Please try again and check that these are actually present.

Comment 8 Israel Pinto 2021-05-05 09:32:47 UTC
Which logs do you want me to collect? Can you be more specific? This is what must-gather collected. Is there a different must-gather image I can run?

Comment 9 Standa Laznicka 2021-05-05 09:47:10 UTC
When you look at the must-gather you provided, you'll see that most namespaces have 0 bytes. You will also see that among the cluster-scoped objects, there are no objects of the config.openshift.io and operator.openshift.io API groups. That's not what a successful run of must-gather looks like. For some reason though there seem to be many objects from kubevirt.io that don't usually appear there. I am going to need a must-gather containing the objects that I mentioned in this comment.
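
A quick sanity check before uploading (a sketch; the layout below is the usual must-gather directory structure, and the image-digest directory name differs per run):

$ ls must-gather/*/cluster-scoped-resources/config.openshift.io/
$ du -sh must-gather/*/namespaces/* | sort -h    # namespaces at 0 bytes indicate a failed collection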

Comment 11 Standa Laznicka 2021-05-05 11:01:58 UTC
I checked the audit logs and the configuration on the cluster. All the configuration appears to be properly observed and set in the oauth-server.

I can also see that the last access token that only lasted for a day was created on May 3rd, so two days ago:
"""
{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"RequestResponse","auditID":"d2a40074-5dfc-4806-818a-3b768d1cbf06","stage":"ResponseComplete","requestURI":"/apis/oauth.openshift.io/v1/oauthaccesstokens","verb":"create","user":{"username":"system:serviceaccount:openshift-authentication:oauth-openshift","groups":["system:serviceaccounts","system:serviceaccounts:openshift-authentication","system:authenticated"]},"sourceIPs":"10.1.156.10","10.129.0.1"],"userAgent":"oauth-server/v0.0.0 (linux/amd64) kubernetes/$Format","objectRef":{"resource":"oauthaccesstokens","name":"sha256~9Q-mqgqXgzguwTd7bkCUcQvLe8pp022VjzDIvH3aBRc","apiGroup":"oauth.openshift.io","apiVersion":"v1"},"responseStatus":{"metadata":{},"code":201},"requestObject":{"kind":"OAuthAccescrsToken","apiVersion":"oauth.openshift.io/v1","metadata":{"name":"sha256~9Q-mqgqXgzguwTd7bkCUcQvLe8pp022VjzDIvH3aBRc","creationTimestamp":null},"clientName":"openshift-challenging-client","expiresIn":86400,"scopes":["user:full"],"redirectURI":"https://<redacted>/token/implicit","userName":"kube:admin","userUID":"WevBXRdJC3tKW4H3FNN9oeN40NupijQur7edabvwbYEUJvZ7NTIfKiBxRVsOscq-PkVBGcrlAVvXhqQTokJDUQ","authorizeToken":"sha256~gKadGaRqMqIGWfxUvth410sZF0y5qj22UNedvWLfcKo"},"responseObject":{"kind":"OAuthAccessToken","apiVersion":"oauth.openshift.io/v1","metadata":{"name":"sha256~9Q-mqgqXgzguwTd7bkCUcQvLe8pp022VjzDIvH3aBRc","uid":"eec4ebbb-dbcc-4f71-80d7-043ccc073fa5","resourceVersion":"1238933","creationTimestamp":"2021-05-03T12:19:08Z","managedFields":[{"manager":"oauth-server","operation":"Update","apiVersion":"oauth.openshift.iov1","time":"2021-05-03T12:19:08Z","fieldsType":"FieldsV1","fieldsV1":{"f:authorizeToken":{},"f:clientName":{},"f:expiresIn":{},"f:redirectURI":{},"f:scopes":{},"f:userName":{},"f:userUID":{}}}]},"clientName":"openshift-challenging-client","expiresIn":86400,"scopes":["user:full"],"redirectURI":"https://<redacted>/oauth/token/implicit","userName":"kube:admin","userUID":"WevBXRdJC3tKW4H3FNN9oeN40NupijQur7edabvwbYEUJvZ7NTIfKiBxRVsOscq-PkVBGcrlAVvXhqQTokJDUQ","authorizeToken":"sha256~gKadGaRqMqIGWfxUvth410sZF0y5qj22UNedvWLfcKo"},"requestReceivedTimestamp":"2021-05-03T12:19:08.404907Z","stageTimestamp":"2021-05-03T12:19:08.424597Z","annotations":{"authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":"RBAC: allowed by ClusterRoleBinding \"system:openshift:openshift-authentication\" of ClusterRole \"cluster-admin\" to ServiceAccount \"oauth-openshift/openshift-authentication\""}}

"""

The token I created has its expiration correctly set to a week.

The authentication operator is reporting Progressing AND Degraded because the configuration is invalid. If that was also the case back then, it wouldn't surprise me if the accessTokenMaxAge configuration was disregarded because of user errors; it could have been fixed and broken again in the meantime, and while it was valid the correct max age would have applied.
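
The operator's current conditions can be checked with standard commands, e.g.:

$ oc get clusteroperator authentication
$ oc get clusteroperator authentication -o jsonpath='{.status.conditions[?(@.type=="Degraded")].message}'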

Comment 12 Israel Pinto 2021-05-06 09:47:34 UTC
Monitored the cluster for 24H; we can use both the old and the new kubeconfig:
Last login: Thu May 6 09:14:24 2021 from 10.40.195.52
[cnv-qe-jenkins@cnv-qe-infra-01 ~]$ ls /home/cnv-qe-jenkins/cnvqe2.lab.eng.rdu2.redhat.com/bm02-cnvqe2-rdu2/auth/kubeconfig.05052021               
/home/cnv-qe-jenkins/cnvqe2.lab.eng.rdu2.redhat.com/bm02-cnvqe2-rdu2/auth/kubeconfig.05052021
[cnv-qe-jenkins@cnv-qe-infra-01 ~]$ export KUBECONFIG=/home/cnv-qe-jenkins/cnvqe2.lab.eng.rdu2.redhat.com/bm02-cnvqe2-rdu2/auth/kubeconfig
[cnv-qe-jenkins@cnv-qe-infra-01 ~]$ oc get nodes
NAME                                            STATUS  ROLES   AGE    VERSION
cnv-qe-infra-08.cnvqe2.lab.eng.rdu2.redhat.com  Ready   master  4d     v1.20.0+5fbfd19
cnv-qe-infra-09.cnvqe2.lab.eng.rdu2.redhat.com  Ready   master  4d     v1.20.0+5fbfd19
cnv-qe-infra-10.cnvqe2.lab.eng.rdu2.redhat.com  Ready   master  4d     v1.20.0+5fbfd19
cnv-qe-infra-11.cnvqe2.lab.eng.rdu2.redhat.com  Ready   worker  3d23h  v1.20.0+5fbfd19
cnv-qe-infra-12.cnvqe2.lab.eng.rdu2.redhat.com  Ready   worker  3d23h  v1.20.0+5fbfd19
cnv-qe-infra-13.cnvqe2.lab.eng.rdu2.redhat.com  Ready   worker  3d23h  v1.20.0+5fbfd19
[cnv-qe-jenkins@cnv-qe-infra-01 ~]$ export KUBECONFIG=/home/cnv-qe-jenkins/cnvqe2.lab.eng.rdu2.redhat.com/bm02-cnvqe2-rdu2/auth/kubeconfig.05052021 
[cnv-qe-jenkins@cnv-qe-infra-01 ~]$ oc get nodes
NAME                                            STATUS  ROLES   AGE    VERSION
cnv-qe-infra-08.cnvqe2.lab.eng.rdu2.redhat.com  Ready   master  4d     v1.20.0+5fbfd19
cnv-qe-infra-09.cnvqe2.lab.eng.rdu2.redhat.com  Ready   master  4d     v1.20.0+5fbfd19
cnv-qe-infra-10.cnvqe2.lab.eng.rdu2.redhat.com  Ready   master  4d     v1.20.0+5fbfd19
cnv-qe-infra-11.cnvqe2.lab.eng.rdu2.redhat.com  Ready   worker  3d23h  v1.20.0+5fbfd19
cnv-qe-infra-12.cnvqe2.lab.eng.rdu2.redhat.com  Ready   worker  3d23h  v1.20.0+5fbfd19
cnv-qe-infra-13.cnvqe2.lab.eng.rdu2.redhat.com  Ready   worker  3d23h  v1.20.0+5fbfd19


Based on that, we will close the bug.

