Bug 1668547

Summary: Got "x509: certificate has expired or is not yet valid" after running one day
Product: OpenShift Container Platform
Reporter: Jian Zhang <jiazha>
Component: Master
Assignee: David Eads <deads>
Status: CLOSED ERRATA
QA Contact: Jian Zhang <jiazha>
Severity: high
Priority: high
Version: 4.1.0
CC: aos-bugs, chezhang, deads, dyan, jfan, jokerman, mmccomas, sponnaga, zitang
Target Milestone: ---
Keywords: Reopened
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-06-04 10:42:08 UTC
Type: Bug

Description Jian Zhang 2019-01-23 05:36:24 UTC
Description of problem:
After the cluster has been running for about one day, `oc project openshift-apiserver` fails with the following error:
 "the server is currently unable to handle the request (get projects.project.openshift.io openshift-apiserver)"

Checking the logs of the apiserver pods shows:
E0123 03:04:58.782273       1 memcache.go:147] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0123 03:05:01.903420       1 authentication.go:62] Unable to authenticate the request due to an error: [x509: certificate has expired or is not yet valid, x509: certificate has expired or is not yet valid]
...
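The "x509: certificate has expired or is not yet valid" message means the apiserver rejected a client certificate whose validity window has lapsed. As a self-contained illustration of how such an expiry can be confirmed (using a throwaway self-signed certificate rather than the cluster's actual serving certificate, whose location is not shown in this report), `openssl x509 -checkend` reports whether a certificate expires within a given number of seconds:

```shell
# Generate a throwaway self-signed certificate valid for 1 day (illustration only).
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=test" \
    -keyout /tmp/test.key -out /tmp/test.crt -days 1 2>/dev/null

# Print the certificate's validity window (notBefore / notAfter).
openssl x509 -in /tmp/test.crt -noout -dates

# -checkend N exits non-zero if the certificate expires within N seconds.
# 172800 s = 2 days, so this 1-day certificate fails the check.
if openssl x509 -in /tmp/test.crt -noout -checkend 172800; then
    echo "certificate still valid beyond the window"
else
    echo "certificate expires within the window"
fi
```

On a live cluster, the same `-dates`/`-checkend` checks can be run against certificates extracted from the relevant secrets to see which one has lapsed.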

Version-Release number of selected component (if applicable):
Payload version: registry.svc.ci.openshift.org/openshift/origin-release:4.0.0-0.alpha-2019-01-22-015156
apiserver image:
registry.svc.ci.openshift.org/openshift/origin-v4.0-2019-01-23-022946@sha256:c89ba6c22943ba6793e067ac0287ddd7775020ce51e9ee6da71a738f9216ddfd

How reproducible:
often

Steps to Reproduce:
1. Create a 4.0 cluster using the payload above.
2. Let the cluster run for about one day.
3. Switch namespaces, for example: `oc project openshift-apiserver`

Actual results:
[core@ip-10-0-14-64 ~]$ oc project openshift-cluster-version
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get projects.project.openshift.io openshift-cluster-version)
[core@ip-10-0-14-64 ~]$ oc whoami 
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get users.user.openshift.io ~)

Expected results:
Namespaces can be switched successfully.

Additional info:
The errors persist even after restarting the apiserver pods:
[core@ip-10-0-14-64 ~]$ oc get pods -n  openshift-apiserver 
NAME              READY     STATUS    RESTARTS   AGE
apiserver-2rftt   1/1       Running   0          2h
apiserver-gwrgb   1/1       Running   0          2h
apiserver-zn8c8   1/1       Running   0          2h

Check the logs of the apiserver pods:
[core@ip-10-0-14-64 ~]$ oc logs ds/apiserver -n openshift-apiserver
Found 3 pods, using pod/apiserver-zn8c8
...
E0123 03:04:58.782273       1 memcache.go:147] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0123 03:05:01.903420       1 authentication.go:62] Unable to authenticate the request due to an error: [x509: certificate has expired or is not yet valid, x509: certificate has expired or is not yet valid]
E0123 03:05:01.903713       1 authentication.go:62] Unable to authenticate the request due to an error: [x509: certificate has expired or is not yet valid, x509: certificate has expired or is not yet valid]
...


[core@ip-10-0-14-64 ~]$ oc project openshift-apiserver --loglevel=8
I0123 03:12:48.581686   23706 loader.go:359] Config loaded from file /home/core/.kube/config
I0123 03:12:48.582367   23706 loader.go:359] Config loaded from file /home/core/.kube/config
I0123 03:12:48.583185   23706 round_trippers.go:383] GET https://preserve-jian-api.qe.devcluster.openshift.com:6443/apis/project.openshift.io/v1/projects/openshift-apiserver
I0123 03:12:48.583218   23706 round_trippers.go:390] Request Headers:
I0123 03:12:48.583234   23706 round_trippers.go:393]     User-Agent: oc/v1.11.0+406fc897d8 (linux/amd64) kubernetes/406fc89
I0123 03:12:48.583253   23706 round_trippers.go:393]     Accept: application/json, */*
I0123 03:12:48.606075   23706 round_trippers.go:408] Response Status: 503 Service Unavailable in 22 milliseconds
I0123 03:12:48.606104   23706 round_trippers.go:411] Response Headers:
I0123 03:12:48.606120   23706 round_trippers.go:414]     Audit-Id: 1d8e3a6e-6a81-4f8d-b71d-a6193fc3b933
I0123 03:12:48.606138   23706 round_trippers.go:414]     Cache-Control: no-store
I0123 03:12:48.606152   23706 round_trippers.go:414]     Content-Type: text/plain; charset=utf-8
I0123 03:12:48.606166   23706 round_trippers.go:414]     X-Content-Type-Options: nosniff
I0123 03:12:48.606179   23706 round_trippers.go:414]     Content-Length: 20
I0123 03:12:48.606193   23706 round_trippers.go:414]     Date: Wed, 23 Jan 2019 03:12:48 GMT
I0123 03:12:48.606246   23706 request.go:897] Response Body: service unavailable
I0123 03:12:48.606425   23706 helpers.go:201] server response object: [{
  "metadata": {},
  "status": "Failure",
  "message": "the server is currently unable to handle the request (get projects.project.openshift.io openshift-apiserver)",
  "reason": "ServiceUnavailable",
  "details": {
    "name": "openshift-apiserver",
    "group": "project.openshift.io",
    "kind": "projects",
    "causes": [
      {
        "reason": "UnexpectedServerResponse",
        "message": "service unavailable"
      }
    ]
  },
  "code": 503
}]
F0123 03:12:48.606558   23706 helpers.go:119] Error from server (ServiceUnavailable): the server is currently unable to handle the request (get projects.project.openshift.io openshift-apiserver)

Comment 1 David Eads 2019-01-30 15:57:04 UTC
This is a known problem in the beta that was fixed over the course of several pull requests.

To unstick yourself, try `oc -n openshift-apiserver delete pods --all`.

Hopefully I found the right closure reason.

Comment 2 Jian Zhang 2019-01-31 03:33:19 UTC
David,

No, I don't think so. As described in the "Additional info" section above, the issue still exists after `oc -n openshift-apiserver delete pods --all`.

The errors persist even after restarting the apiserver pods:
[core@ip-10-0-14-64 ~]$ oc get pods -n  openshift-apiserver 
NAME              READY     STATUS    RESTARTS   AGE
apiserver-2rftt   1/1       Running   0          2h
apiserver-gwrgb   1/1       Running   0          2h
apiserver-zn8c8   1/1       Running   0          2h

Also, the "x509: certificate has expired or is not yet valid" errors seem different from those in bug 1665842, which we reported against Beta-1, so I am reopening this bug. Please set the status to ON_QA once it is fixed.

Comment 4 David Eads 2019-02-13 15:45:19 UTC
I don't see this in master.

Comment 5 Jian Zhang 2019-02-15 08:36:05 UTC
I didn't hit this issue again; it now works as expected, so I am marking it verified. Details below:
[jzhang@dhcp-140-18 ocp-14]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE     STATUS
version   4.0.0-0.nightly-2019-02-13-204401   True        False         26h       Cluster version is 4.0.0-0.nightly-2019-02-13-204401
[jzhang@dhcp-140-18 ocp-14]$ oc whoami
system:admin
[jzhang@dhcp-140-18 ocp-14]$ oc project openshift-apiserver 
Now using project "openshift-apiserver" on server "https://jian-14-api.qe.devcluster.openshift.com:6443"

Comment 8 errata-xmlrpc 2019-06-04 10:42:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758