Bug 1665842

Summary: error: You must be logged in to the server (Unauthorized)
Product: OpenShift Container Platform
Reporter: Jian Zhang <jiazha>
Component: Master
Assignee: Michal Fojtik <mfojtik>
Status: CLOSED ERRATA
QA Contact: Xingxing Xia <xxia>
Severity: urgent
Priority: urgent
Version: 4.1.0
CC: aos-bugs, chezhang, chuyu, deads, dyan, erich, hongkliu, jfan, jiazha, jokerman, maszulik, mdame, mfojtik, mmccomas, nagrawal, sponnaga, vlaad, wking, wsun, xxia, zitang
Target Milestone: ---
Keywords: NeedsTestCase
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1665935 (view as bug list)
Environment:
Last Closed: 2019-06-04 10:41:55 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1665935

Description Jian Zhang 2019-01-14 08:56:04 UTC
Description of problem:
After the OCP 4.0 cluster had been running for one day, the following error appeared:
error: You must be logged in to the server (Unauthorized)


Version-Release number of selected component (if applicable):
clusterversion: 4.0.0-0.nightly-2019-01-12-000105

mac:~ jianzhang$ openshift-install version
openshift-install v0.9.1
How reproducible:
always

Steps to Reproduce:
1. Create an OCP 4.0 cluster using the openshift-installer.
2. Let the cluster run for about a day.
3. Run "oc project" or "oc whoami".

Actual results:
[core@ip-10-0-37-160 ~]$ oc project
error: You must be logged in to the server (Unauthorized)
[core@ip-10-0-37-160 ~]$ oc whoami 
error: You must be logged in to the server (Unauthorized)

Expected results:
The commands output the correct user info.

Additional info:
[core@ip-10-0-37-160 ~]$ oc whoami --loglevel=8
I0114 06:37:34.885108    1801 loader.go:359] Config loaded from file /home/core/.kube/config
I0114 06:37:34.886843    1801 loader.go:359] Config loaded from file /home/core/.kube/config
I0114 06:37:34.887525    1801 round_trippers.go:383] GET https://qe-jian3-api.qe.devcluster.openshift.com:6443/apis/user.openshift.io/v1/users/~
I0114 06:37:34.887561    1801 round_trippers.go:390] Request Headers:
I0114 06:37:34.887591    1801 round_trippers.go:393]     Accept: application/json, */*
I0114 06:37:34.887606    1801 round_trippers.go:393]     User-Agent: oc/v1.11.0+406fc897d8 (linux/amd64) kubernetes/406fc89
I0114 06:37:34.911195    1801 round_trippers.go:408] Response Status: 401 Unauthorized in 23 milliseconds
I0114 06:37:34.911231    1801 round_trippers.go:411] Response Headers:
I0114 06:37:34.911247    1801 round_trippers.go:414]     Audit-Id: 729dbf77-cf0a-4e12-a492-22e3ee0ee626
I0114 06:37:34.911282    1801 round_trippers.go:414]     Cache-Control: no-store
I0114 06:37:34.911302    1801 round_trippers.go:414]     Cache-Control: no-store
I0114 06:37:34.911316    1801 round_trippers.go:414]     Content-Type: application/json
I0114 06:37:34.911346    1801 round_trippers.go:414]     Date: Mon, 14 Jan 2019 06:37:34 GMT
I0114 06:37:34.911361    1801 round_trippers.go:414]     Content-Length: 129
I0114 06:37:34.911407    1801 request.go:897] Response Body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}
I0114 06:37:34.911882    1801 helpers.go:201] server response object: [{
  "metadata": {},
  "status": "Failure",
  "message": "Unauthorized",
  "reason": "Unauthorized",
  "code": 401
}]
F0114 06:37:34.911960    1801 helpers.go:119] error: You must be logged in to the server (Unauthorized)

Checking the logs of openshift-kube-apiserver:
[core@ip-10-0-37-160 ~]$ sudo crictl logs e9030e599470f
E0114 06:02:03.403729       1 reflector.go:136] github.com/openshift/origin/pkg/quota/generated/informers/internalversion/factory.go:101: Failed to list *quota.ClusterResourceQuota: Unauthorized
E0114 06:02:03.404807       1 reflector.go:136] github.com/openshift/client-go/security/informers/externalversions/factory.go:101: Failed to list *v1.SecurityContextConstraints: Unauthorized
E0114 06:02:03.406461       1 reflector.go:136] github.com/openshift/client-go/user/informers/externalversions/factory.go:101: Failed to list *v1.Group: Unauthorized
I0114 06:02:03.750760       1 logs.go:49] http: TLS handshake error from 10.0.10.180:19302: EOF
E0114 06:02:03.752768       1 oauth_apiserver.go:326] Unauthorized
I0114 06:02:03.899381       1 trace.go:76] Trace[102609404]: "GuaranteedUpdate etcd3: *core.ConfigMap" (started: 2019-01-14 06:01:53.898823636 +0000 UTC m=+9986.385772930) (total time: 10.00052286s):
$ oc logs apiserver-7cwpn -n openshift-apiserver
E0114 02:25:24.611324       1 authentication.go:62] Unable to authenticate the request due to an error: [x509: certificate signed by unknown authority, x509: certificate signed by unknown authority]

Comment 2 W. Trevor King 2019-01-14 14:37:07 UTC
> [core@ip-10-0-37-160 ~]$ oc whoami --loglevel=8
> I0114 06:37:34.885108 1801 loader.go:359] Config loaded from file /home/core/.kube/config 

Looks like you forgot to export KUBECONFIG [1]?

[1]: https://github.com/openshift/installer/blame/v0.9.1/README.md#L39
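
For reference, a minimal sketch of that README step (the asset-directory path here is a placeholder for wherever the installer was run):

$ export KUBECONFIG=<install-dir>/auth/kubeconfig   # admin kubeconfig written by the installer
$ oc whoami
system:admin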

Comment 9 W. Trevor King 2019-01-15 07:09:54 UTC
Responding to some internal discussion:

> Use case of concern is if user closes terminal after initial interaction with config.

Folks familiar with the shell should recognize that this is going to clear their environment variables.  Folks not familiar with the shell probably have more to learn than we want to bite off in the logs that the installer prints by default.

> 1. Proper actionable error message telling user to export env variable while running oc command.

But in the example above, Jian already had a different kubeconfig in ~/.kube/config.

  F0114 06:37:34.911960    1801 helpers.go:119] error: You must be logged in to the server (Unauthorized)

could be resolved by "point me at a better kubeconfig via one of the usual approaches" (what we want in this case), or it could be "you pointed me at an insufficient kubeconfig, run 'oc login ...' to get me credentials for this cluster".  I dunno how oc would distinguish between those to give more specific advice.
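
For concreteness, the usual approaches amount to something like this sketch (paths are placeholders):

$ export KUBECONFIG=/path/to/auth/kubeconfig        # environment variable
$ oc --kubeconfig /path/to/auth/kubeconfig whoami   # one-off flag
$ cp /path/to/auth/kubeconfig ~/.kube/config        # the default file oc falls back to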

> 2. Log should be a warning, not info that indicates what needs to be done.

The *installer* is logging the export suggestion at the info level, because it's information for folks who want to use oc locally with their new cluster.  We also log info-level entries for folks that want to log into the web console.  Folks who only want the web console shouldn't have to care about the kubeconfig, so a warning-level log about the export seems overly strong.

> mac:jian3 jianzhang$ env |grep KUBE
> KUBECONFIG=/Users/jianzhang/project/aws-ocp/jian3/auth/kubeconfig
> mac:jian3 jianzhang$ oc login -u kubeadmin -p $(cat auth/kubeadmin-password)
> Error from server (InternalError): Internal error occurred: unexpected response: 400

This 400 is different from your original 401s.

> The openshift api pods log, and the pods are in running status.
>
>   E0114 10:36:58.870286       1 memcache.go:147] couldn't get resource list for route.openshift.io/v1: Unauthorized

This seems like something different too, and may be the source of the login 400s.  It would be good to drill into this error more to see if we can find the cause, but it seems distinct from "You must be logged in to the server (Unauthorized)", because it's an internal error and not an oc error.

Comment 10 W. Trevor King 2019-01-15 08:43:23 UTC
>   E0114 10:36:58.870286       1 memcache.go:147] couldn't get resource list for route.openshift.io/v1: Unauthorized

Following up on this in a cluster provided by Shengsheng Cheng:

$ ssh -i libra.pem core@ec2-...
$ export PS1='$ '
$ export KUBECONFIG=/etc/kubernetes/static-pod-resources/kube-controller-manager-pod-1/secrets/controller-manager-kubeconfig/kubeconfig
$ grep admin $KUBECONFIG
    user: admin
  name: admin
current-context: admin
- name: admin
$ oc get clusterversion version
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE     STATUS
version   4.0.0-0.nightly-2019-01-12-000105   False       True          16h       Unable to apply 4.0.0-0.nightly-2019-01-12-000105: could not authenticate to the server
$ oc get clusterversion -o yaml version
apiVersion: config.openshift.io/v1
kind: ClusterVersion
...
status:
  availableUpdates: null
  conditions:
  - lastTransitionTime: 2019-01-14T15:12:06Z
    status: "False"
    type: Available
  - lastTransitionTime: 2019-01-14T15:12:06Z
    message: 'Could not update imagestream "openshift/cli" (image.openshift.io/v1,
      132 of 209): could not authenticate to the server'
    status: "True"
    type: Failing
...
  current:
    payload: registry.svc.ci.openshift.org/ocp/release@sha256:72372c227945f4aedb98bf9bf7df6cda30fed528d24d83d3d5216027569b4394
    version: 4.0.0-0.nightly-2019-01-12-000105
...
$ oc get pods --all-namespaces | grep cluster-version
openshift-cluster-version                                 cluster-version-operator-7788d9dd67-p9njq                                     1/1       Running                      0          1d
$ oc get pod -o 'jsonpath={.spec.containers[].name}{"\n"}' -n openshift-cluster-version cluster-version-operator-7788d9dd67-p9njq
cluster-version-operator
$ oc logs -n openshift-cluster-version cluster-version-operator-7788d9dd67-p9njq -c cluster-version-operator
E0115 07:07:58.878850       1 sync.go:133] error running apply for imagestream "openshift/cli" (image.openshift.io/v1, 132 of 209): Unauthorized
I0115 07:07:58.878968       1 cvo.go:279] Finished syncing cluster version "openshift-cluster-version/version" (27.251211055s)
I0115 07:07:58.878999       1 cvo.go:264] Error syncing operator openshift-cluster-version/version: Could not update imagestream "openshift/cli" (image.openshift.io/v1, 132 of 209): could not authenticate to the server
I0115 07:07:58.879037       1 cvo.go:277] Started syncing cluster version "openshift-cluster-version/version" (2019-01-15 07:07:58.879032196 +0000 UTC m=+101112.059771833)
...
I0115 07:08:03.041858       1 sync.go:92] Done syncing for deployment "openshift-cluster-machine-approver/machine-approver" (apps/v1, 131 of 209)
I0115 07:08:03.041886       1 sync.go:73] Running sync for imagestream "openshift/cli" (image.openshift.io/v1, 132 of 209)
I0115 07:08:03.065229       1 cvo.go:350] Started syncing available updates "openshift-cluster-version/version" (2019-01-15 07:08:03.065221608 +0000 UTC m=+101116.245961247)
I0115 07:08:03.065278       1 availableupdates.go:35] Available updates were recently retrieved, will try later.
I0115 07:08:03.065289       1 cvo.go:352] Finished syncing available updates "openshift-cluster-version/version" (62.461µs)
I0115 07:08:03.172410       1 request.go:485] Throttling request took 130.317878ms, request: GET:https://127.0.0.1:6443/apis/image.openshift.io/v1/namespaces/openshift/imagestreams/cli
E0115 07:08:03.174245       1 sync.go:133] error running apply for imagestream "openshift/cli" (image.openshift.io/v1, 132 of 209): Unauthorized
I0115 07:08:04.550814       1 cvo.go:350] Started syncing available updates "openshift-cluster-version/version" (2019-01-15 07:08:04.55078857 +0000 UTC m=+101117.731528195)
I0115 07:08:04.551049       1 availableupdates.go:35] Available updates were recently retrieved, will try later.
I0115 07:08:04.551108       1 cvo.go:352] Finished syncing available updates "openshift-cluster-version/version" (312.738µs)
I0115 07:08:04.572983       1 cvo.go:350] Started syncing available updates "openshift-cluster-version/version" (2019-01-15 07:08:04.572976174 +0000 UTC m=+101117.753715818)
I0115 07:08:04.574611       1 availableupdates.go:35] Available updates were recently retrieved, will try later.
I0115 07:08:04.574625       1 cvo.go:352] Finished syncing available updates "openshift-cluster-version/version" (1.64406ms)
I0115 07:08:04.668172       1 cvo.go:350] Started syncing available updates "openshift-cluster-version/version" (2019-01-15 07:08:04.668156588 +0000 UTC m=+101117.848896212)
I0115 07:08:04.668339       1 availableupdates.go:35] Available updates were recently retrieved, will try later.
I0115 07:08:04.668456       1 cvo.go:352] Finished syncing available updates "openshift-cluster-version/version" (291.665µs)
I0115 07:08:06.380259       1 cvo.go:350] Started syncing available updates "openshift-cluster-version/version" (2019-01-15 07:08:06.380243141 +0000 UTC m=+101119.560982785)
I0115 07:08:06.380428       1 availableupdates.go:35] Available updates were recently retrieved, will try later.
I0115 07:08:06.380530       1 cvo.go:352] Finished syncing available updates "openshift-cluster-version/version" (280.003µs)
I0115 07:08:06.465172       1 cvo.go:350] Started syncing available updates "openshift-cluster-version/version" (2019-01-15 07:08:06.465163296 +0000 UTC m=+101119.645903012)
I0115 07:08:06.465220       1 availableupdates.go:35] Available updates were recently retrieved, will try later.
I0115 07:08:06.465232       1 cvo.go:352] Finished syncing available updates "openshift-cluster-version/version" (63.615µs)
E0115 07:08:08.030225       1 memcache.go:147] couldn't get resource list for apps.openshift.io/v1: Unauthorized
E0115 07:08:08.031681       1 memcache.go:147] couldn't get resource list for authorization.openshift.io/v1: Unauthorized
E0115 07:08:08.033393       1 memcache.go:147] couldn't get resource list for build.openshift.io/v1: Unauthorized
E0115 07:08:08.034814       1 memcache.go:147] couldn't get resource list for image.openshift.io/v1: Unauthorized
E0115 07:08:08.036194       1 memcache.go:147] couldn't get resource list for oauth.openshift.io/v1: Unauthorized
E0115 07:08:08.037604       1 memcache.go:147] couldn't get resource list for project.openshift.io/v1: Unauthorized
E0115 07:08:08.039073       1 memcache.go:147] couldn't get resource list for quota.openshift.io/v1: Unauthorized
E0115 07:08:08.040694       1 memcache.go:147] couldn't get resource list for route.openshift.io/v1: Unauthorized
E0115 07:08:08.042274       1 memcache.go:147] couldn't get resource list for security.openshift.io/v1: Unauthorized
E0115 07:08:08.043795       1 memcache.go:147] couldn't get resource list for template.openshift.io/v1: Unauthorized
E0115 07:08:08.045182       1 memcache.go:147] couldn't get resource list for user.openshift.io/v1: Unauthorized
I0115 07:08:09.290663       1 cvo.go:350] Started syncing available updates "openshift-cluster-version/version" (2019-01-15 07:08:09.290647678 +0000 UTC m=+101122.471387330)
I0115 07:08:09.290729       1 availableupdates.go:35] Available updates were recently retrieved, will try later.
I0115 07:08:09.290741       1 cvo.go:352] Finished syncing available updates "openshift-cluster-version/version" (88.621µs)
E0115 07:08:13.185634       1 sync.go:133] error running apply for imagestream "openshift/cli" (image.openshift.io/v1, 132 of 209): Unauthorized
I0115 07:08:17.462949       1 leaderelection.go:209] successfully renewed lease openshift-cluster-version/version
I0115 07:08:24.292096       1 cvo.go:350] Started syncing available updates "openshift-cluster-version/version" (2019-01-15 07:08:24.292084671 +0000 UTC m=+101137.472824217)
I0115 07:08:24.292158       1 availableupdates.go:35] Available updates were recently retrieved, will try later.
I0115 07:08:24.292169       1 cvo.go:352] Finished syncing available updates "openshift-cluster-version/version" (80.49µs)
E0115 07:08:26.195013       1 sync.go:133] error running apply for imagestream "openshift/cli" (image.openshift.io/v1, 132 of 209): Unauthorized
I0115 07:08:26.195106       1 cvo.go:279] Finished syncing cluster version "openshift-cluster-version/version" (27.316067289s)
...
$ oc get imagestreams
error: the server doesn't have a resource type "imagestreams"

Back on my dev box:

$ oc adm release extract --from=registry.svc.ci.openshift.org/ocp/release@sha256:72372c227945f4aedb98bf9bf7df6cda30fed528d24d83d3d5216027569b4394 --to=/tmp/manifests
Extracted release payload created at 2019-01-12T00:01:11Z
$ grep -r image.openshift.io/v1 /tmp/manifests/
/tmp/manifests/0000_70_cli_01_imagestream.yaml:apiVersion: image.openshift.io/v1
/tmp/manifests/image-references:  "apiVersion": "image.openshift.io/v1",
$ cat /tmp/manifests/0000_70_cli_01_imagestream.yaml
kind: ImageStream
apiVersion: image.openshift.io/v1
metadata:
  namespace: openshift
  name: cli
spec:
  tags:
  - name: latest
    from:
      kind: DockerImage
      name: registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-12-000105@sha256:6261ebb76cd3bb9ade22be884c0208a6d45c6d0ca93dc54276ed2f46168f1a25
$ grep -ir imagestream /tmp/manifests/
/tmp/manifests/0000_70_cli_01_imagestream.yaml:kind: ImageStream
/tmp/manifests/0000_30_06-rh-operators.configmap.yaml:                - imagestreams
/tmp/manifests/0000_30_06-rh-operators.configmap.yaml:                - imagestreams/status
/tmp/manifests/image-references:  "kind": "ImageStream",

So the issue seems to be the application of the OCP version of [1] on a cluster that for some reason lacks ImageStream resources, which blocks the CVO's update application, which could cause all sorts of knock-on issues.  I'm not clear on who's responsible for setting up ImageStreams; I'd have expected those to be either a CRD (which I can't find) or to be baked into the OpenShift API server (which I can't find either, but openshift/origin is a lot bigger than the update payload's manifest set).  Does anyone know off the top of their head who's in charge of registering the ImageStream type?

I'm also still unclear about why the CVO's attempt to push the ImageStream manifest is returning "could not authenticate to the server".  But I think it may be easier to sort out "why are ImageStreams missing?" first.

[1]: https://github.com/openshift/origin/blob/1f6568ce8ca9ba2c0a771a5cdc42e592311e2f8e/images/cli/manifests/01_imagestream.yaml
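
As a quick check of whether the image.openshift.io group is being served at all, something along these lines should work (assuming an oc new enough to have api-resources; the APIService name follows the usual <version>.<group> convention):

$ oc api-resources --api-group=image.openshift.io
$ oc get apiservice v1.image.openshift.io -o yaml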

Comment 11 W. Trevor King 2019-01-15 09:10:35 UTC
Ah, actually it looks like the unauthorized-ness is the fundamental issue.  Back on the broken master:

$ oc get imagestreams --loglevel=5
I0115 09:03:19.535387   32051 cached_discovery.go:77] skipped caching discovery info due to Unauthorized
...
I0115 09:03:19.563035   32051 cached_discovery.go:77] skipped caching discovery info due to Unauthorized
I0115 09:03:19.563093   32051 shortcut.go:89] Error loading discovery information: unable to retrieve the complete list of server APIs: apps.openshift.io/v1: Unauthorized, authorization.openshift.io/v1: Unauthorized, build.openshift.io/v1: Unauthorized, image.openshift.io/v1: Unauthorized, oauth.openshift.io/v1: Unauthorized, project.openshift.io/v1: Unauthorized, quota.openshift.io/v1: Unauthorized, route.openshift.io/v1: Unauthorized, security.openshift.io/v1: Unauthorized, template.openshift.io/v1: Unauthorized, user.openshift.io/v1: Unauthorized
I0115 09:03:19.568037   32051 cached_discovery.go:77] skipped caching discovery info due to Unauthorized
...
I0115 09:03:19.606258   32051 cached_discovery.go:77] skipped caching discovery info due to Unauthorized
I0115 09:03:19.609855   32051 discovery.go:215] Invalidating discovery information
I0115 09:03:19.671234   32051 cached_discovery.go:77] skipped caching discovery info due to Unauthorized
...
I0115 09:03:19.711881   32051 cached_discovery.go:77] skipped caching discovery info due to Unauthorized
F0115 09:03:19.763705   32051 helpers.go:119] error: the server doesn't have a resource type "imagestreams"

Maybe KUBECONFIG=/etc/kubernetes/static-pod-resources/kube-controller-manager-pod-1/secrets/controller-manager-kubeconfig/kubeconfig wasn't actually giving me admin access?  Checking the other kubeconfigs under /etc/kubernetes:

$ for CONFIG in $(find /etc/kubernetes -name kubeconfig); do echo "${CONFIG}"; KUBECONFIG="${CONFIG}" oc whoami; done
find: ‘/etc/kubernetes/manifests’: Permission denied
/etc/kubernetes/static-pod-resources/kube-controller-manager-pod-1/secrets/controller-manager-kubeconfig/kubeconfig
error: You must be logged in to the server (Unauthorized)
/etc/kubernetes/static-pod-resources/kube-scheduler-pod-1/secrets/scheduler-kubeconfig/kubeconfig
error: You must be logged in to the server (Unauthorized)
/etc/kubernetes/kubeconfig
error: You must be logged in to the server (Unauthorized)

That's not very promising.  It would be good to use the known-good admin kubeconfig the installer drops into the asset directory.
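
Something like this sketch would get that kubeconfig onto the master for testing (the scp destination is illustrative):

$ scp -i libra.pem <install-dir>/auth/kubeconfig core@ec2-...:/tmp/admin-kubeconfig
$ ssh -i libra.pem core@ec2-...
$ KUBECONFIG=/tmp/admin-kubeconfig oc whoami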

Comment 12 W. Trevor King 2019-01-15 09:36:43 UTC
For comparison, here's what this looks like for a real admin kubeconfig (vs. a recent OKD update payload):

$ export KUBECONFIG=wking/auth/kubeconfig
$ oc version
oc v4.0.0-alpha.0+1f6568c-902-dirty
kubernetes v1.11.0+1f6568c
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://wking-api.devcluster.openshift.com:6443
kubernetes v1.11.0+1f6568c
$ oc --loglevel=8 whoami
I0115 01:27:57.190846   21388 loader.go:359] Config loaded from file wking/auth/kubeconfig
I0115 01:27:57.191399   21388 loader.go:359] Config loaded from file wking/auth/kubeconfig
I0115 01:27:57.191859   21388 round_trippers.go:383] GET https://wking-api.devcluster.openshift.com:6443/apis/user.openshift.io/v1/users/~
I0115 01:27:57.191871   21388 round_trippers.go:390] Request Headers:
I0115 01:27:57.191878   21388 round_trippers.go:393]     Accept: application/json, */*
I0115 01:27:57.191884   21388 round_trippers.go:393]     User-Agent: oc/v1.11.0+1f6568c (linux/amd64) kubernetes/1f6568c
I0115 01:27:57.394444   21388 round_trippers.go:408] Response Status: 200 OK in 202 milliseconds
I0115 01:27:57.394527   21388 round_trippers.go:411] Response Headers:
I0115 01:27:57.394570   21388 round_trippers.go:414]     Cache-Control: no-store
I0115 01:27:57.394625   21388 round_trippers.go:414]     Cache-Control: no-store
I0115 01:27:57.394665   21388 round_trippers.go:414]     Content-Type: application/json
I0115 01:27:57.394697   21388 round_trippers.go:414]     Date: Tue, 15 Jan 2019 09:27:57 GMT
I0115 01:27:57.394734   21388 round_trippers.go:414]     Content-Length: 242
I0115 01:27:57.394890   21388 request.go:897] Response Body: {"kind":"User","apiVersion":"user.openshift.io/v1","metadata":{"name":"system:admin","selfLink":"/apis/user.openshift.io/v1/users/system%3Aadmin","creationTimestamp":null},"identities":null,"groups":["system:authenticated","system:masters"]}
system:admin
$ oc get clusterversion version
NAME      VERSION                           AVAILABLE   PROGRESSING   SINCE     STATUS
version   4.0.0-0.alpha-2019-01-15-045324   False       True          8m        Unable to apply 4.0.0-0.alpha-2019-01-15-045324: the update could not be applied
$ oc get -o yaml clusterversion version
apiVersion: config.openshift.io/v1
kind: ClusterVersion
metadata:
  creationTimestamp: 2019-01-15T09:19:39Z
...
status:
  availableUpdates: null
  conditions:
  - lastTransitionTime: 2019-01-15T09:20:39Z
    status: "False"
    type: Available
  - lastTransitionTime: 2019-01-15T09:21:12Z
    message: 'Could not update servicemonitor "openshift-kube-apiserver-operator/kube-apiserver-operator"
      (monitoring.coreos.com/v1, 212 of 218): the server does not recognize this resource,
      check extension API servers'
    reason: UpdatePayloadResourceTypeMissing
    status: "True"
    type: Failing
  - lastTransitionTime: 2019-01-15T09:20:39Z
    message: 'Unable to apply 4.0.0-0.alpha-2019-01-15-045324: a required extension
      is not available to update'
    reason: UpdatePayloadResourceTypeMissing
    status: "True"
    type: Progressing
  - lastTransitionTime: 2019-01-15T09:20:39Z
    message: 'Unable to retrieve available updates: Get http://localhost:8080/graph:
      dial tcp [::1]:8080: connect: connection refused'
    reason: RemoteFailed
    status: "False"
    type: RetrievedUpdates
  desired:
    payload: registry.svc.ci.openshift.org/openshift/origin-release@sha256:8ee1cca11db5341bf43aa18d61634a45a48a0fa632f83d80edbfc2822f0b649c
    version: 4.0.0-0.alpha-2019-01-15-045324
  generation: 1
  history:
  - completionTime: null
    payload: registry.svc.ci.openshift.org/openshift/origin-release@sha256:8ee1cca11db5341bf43aa18d61634a45a48a0fa632f83d80edbfc2822f0b649c
    startedTime: 2019-01-15T09:20:39Z
    state: Partial
    version: 4.0.0-0.alpha-2019-01-15-045324
  versionHash: ""
$ oc get pods --all-namespaces | grep cluster-version
openshift-cluster-version                    cluster-version-operator-75f586759-r49lz                                     1/1       Running       0          10m
$ oc logs -n openshift-cluster-version cluster-version-operator-75f586759-r49lz -c cluster-version-operator
...
E0115 09:29:30.193102       1 sync.go:126] error creating resourcebuilder for servicemonitor "openshift-kube-apiserver-operator/kube-apiserver-operator" (monitoring.coreos.com/v1, 212 of 218): failed to get resource type: no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
...
$ oc get imagestreams
No resources found.

So that cluster isn't healthy either, but at least it can find my admin user and it has image streams.

Comment 13 W. Trevor King 2019-01-15 09:45:20 UTC
Another example (from a recent CI run), showing a temporary lack of image streams that subsequently resolves itself:

$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/1064/pull-ci-openshift-installer-master-e2e-aws/2915/artifacts/e2e-aws/pods/openshift-cluster-version_cluster-version-operator-5cc9bfc9ff-jn792_cluster-version-operator.log.gz | gunzip | grep 'error running apply\|Payload for cluster version'
E0115 03:31:53.320096       1 sync.go:133] error running apply for clusteroperator "openshift-apiserver" (config.openshift.io/v1, 68 of 218): clusteroperators.config.openshift.io "openshift-apiserver" not found
E0115 03:33:03.485454       1 sync.go:133] error running apply for clusteroperator "openshift-apiserver" (config.openshift.io/v1, 68 of 218): timed out waiting for the condition
E0115 03:33:20.834764       1 sync.go:133] error running apply for clusteroperator "openshift-controller-manager-operator" (config.openshift.io/v1, 77 of 218): clusteroperators.config.openshift.io "openshift-controller-manager-operator" not found
E0115 03:33:32.724012       1 sync.go:133] error running apply for clusteroperator "openshift-controller-manager-operator" (config.openshift.io/v1, 77 of 218): clusteroperators.config.openshift.io "openshift-controller-manager-operator" not found
E0115 03:36:44.424586       1 sync.go:133] error running apply for clusteroperator "openshift-kube-apiserver-operator" (config.openshift.io/v1, 46 of 218): Get https://127.0.0.1:6443/apis/config.openshift.io/v1/clusteroperators/openshift-kube-apiserver-operator: dial tcp 127.0.0.1:6443: connect: connection refused
E0115 03:37:25.110368       1 sync.go:133] error running apply for imagestream "openshift/cli" (image.openshift.io/v1, 128 of 218): the server could not find the requested resource
E0115 03:37:35.496992       1 sync.go:133] error running apply for imagestream "openshift/cli" (image.openshift.io/v1, 128 of 218): the server could not find the requested resource
I0115 03:39:49.267497       1 cvo.go:339] Payload for cluster version 0.0.1-2019-01-15-031415 synced
I0115 03:42:40.979861       1 cvo.go:339] Payload for cluster version 0.0.1-2019-01-15-031415 synced
I0115 03:45:41.091330       1 cvo.go:339] Payload for cluster version 0.0.1-2019-01-15-031415 synced
I0115 03:48:57.962396       1 cvo.go:339] Payload for cluster version 0.0.1-2019-01-15-031415 synced
I0115 03:52:46.571608       1 cvo.go:339] Payload for cluster version 0.0.1-2019-01-15-031415 synced
I0115 03:55:42.047818       1 cvo.go:339] Payload for cluster version 0.0.1-2019-01-15-031415 synced
I0115 03:58:42.020924       1 cvo.go:339] Payload for cluster version 0.0.1-2019-01-15-031415 synced
I0115 04:01:58.341061       1 cvo.go:339] Payload for cluster version 0.0.1-2019-01-15-031415 synced
I0115 04:05:00.278205       1 cvo.go:339] Payload for cluster version 0.0.1-2019-01-15-031415 synced
I0115 04:08:03.444130       1 cvo.go:339] Payload for cluster version 0.0.1-2019-01-15-031415 synced

Comment 14 Jian Zhang 2019-01-15 09:54:02 UTC
Trevor,

> This 400 is different from your original 401s.

The 400 comes from `oc login -u kubeadmin -p $(cat auth/kubeadmin-password)`, while the 401 comes from `oc whoami --loglevel=8`:

mac:jian3 jianzhang$ oc whoami --loglevel=8
I0115 17:49:29.640105   95331 loader.go:359] Config loaded from file /Users/jianzhang/project/aws-ocp/jian3/auth/kubeconfig
I0115 17:49:29.641092   95331 loader.go:359] Config loaded from file /Users/jianzhang/project/aws-ocp/jian3/auth/kubeconfig
I0115 17:49:29.641606   95331 round_trippers.go:383] GET https://qe-jian3-api.qe.devcluster.openshift.com:6443/apis/user.openshift.io/v1/users/~
I0115 17:49:29.641618   95331 round_trippers.go:390] Request Headers:
I0115 17:49:29.641624   95331 round_trippers.go:393]     Accept: application/json, */*
I0115 17:49:29.641630   95331 round_trippers.go:393]     User-Agent: oc/v1.11.0+031e5ec2a7 (darwin/amd64) kubernetes/031e5ec
I0115 17:49:30.825584   95331 round_trippers.go:408] Response Status: 401 Unauthorized in 1183 milliseconds
I0115 17:49:30.825625   95331 round_trippers.go:411] Response Headers:
I0115 17:49:30.825640   95331 round_trippers.go:414]     Content-Length: 129
I0115 17:49:30.825651   95331 round_trippers.go:414]     Audit-Id: 1db6c5d1-3ea5-4d69-a4df-c567e6c4d729
I0115 17:49:30.825663   95331 round_trippers.go:414]     Cache-Control: no-store
I0115 17:49:30.825675   95331 round_trippers.go:414]     Cache-Control: no-store
I0115 17:49:30.825686   95331 round_trippers.go:414]     Content-Type: application/json
I0115 17:49:30.825698   95331 round_trippers.go:414]     Date: Tue, 15 Jan 2019 09:49:28 GMT
I0115 17:49:30.825758   95331 request.go:897] Response Body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}
I0115 17:49:30.826537   95331 helpers.go:201] server response object: [{
  "metadata": {},
  "status": "Failure",
  "message": "Unauthorized",
  "reason": "Unauthorized",
  "code": 401
}]
F0115 17:49:30.826569   95331 helpers.go:119] error: You must be logged in to the server (Unauthorized)

Comment 15 Jian Zhang 2019-01-15 10:00:03 UTC
Trevor,

>  a recent OKD update payload

Could you check this against the OCP cluster? I have already attached the kubeconfig, so you can use it to debug and find the root cause.

Comment 16 W. Trevor King 2019-01-15 10:03:33 UTC
Ah, thanks for the follow-up.  And yeah, I can reproduce that with the admin kubeconfig for Shengsheng's cluster already (I expect it reproduces the same way with yours).  But I don't know where to go from here.  Hopefully someone from the OpenShift API server or auth teams has better luck than me ;).

Comment 17 Xiaoli Tian 2019-01-15 13:51:09 UTC
*** Bug 1665935 has been marked as a duplicate of this bug. ***

Comment 18 Erica von Buelow 2019-01-15 16:27:06 UTC
We suspect the issue involves aggregator-client certificate rotation, based on the certificate errors in the logs, the problems reaching the openshift-apiserver APIs, and the fact that certificate rotation after 24h was added around the time things stopped working. We are investigating further and attempting to confirm our suspicions.
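
For anyone poking at an affected cluster, one way to check whether the requestheader (aggregator) client CA the apiservers trust has rolled is to decode the CA bundle published in kube-system; this is only a sketch, and only the first certificate in the bundle is decoded:

$ oc -n kube-system get configmap extension-apiserver-authentication \
    -o jsonpath='{.data.requestheader-client-ca-file}' | openssl x509 -noout -dates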

Comment 19 Neelesh Agrawal 2019-01-15 19:09:27 UTC
Adding the Master team to CC to see if they agree with comment 18.

Comment 20 David Eads 2019-01-15 19:15:09 UTC
We think this problem was resolved by https://github.com/openshift/cluster-openshift-apiserver-operator/pull/107 .

Comment 25 David Eads 2019-01-15 20:56:21 UTC
If this happens again, deleting the pods in openshift-apiserver should re-kick them and clear the condition.  `oc -n openshift-apiserver delete pods --all` should cause new pods to start with the new config.
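
A minimal sketch of applying and verifying that workaround (assumes the cluster admin kubeconfig is in use):

$ oc -n openshift-apiserver delete pods --all
$ oc -n openshift-apiserver get pods   # wait for the replacement pods to go Running
$ oc whoami                            # should no longer return Unauthorized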

Comment 26 W. Trevor King 2019-01-15 22:09:46 UTC
> We think this problem was resolved by https://github.com/openshift/cluster-openshift-apiserver-operator/pull/107 .

This just made it into the 0.10.0 installer release [1]:

$ oc adm release info --pullspecs quay.io/openshift-release-dev/ocp-release:4.0.0-0.1 | grep openshift-apiserver
  cluster-openshift-apiserver-operator          quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ec66e1d488f16e852565e07319b4697222a4a5f7c16851efe0ed6dcb51e29c48
$ oc image info quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ec66e1d488f16e852565e07319b4697222a4a5f7c16851efe0ed6dcb51e29c48 | grep commit-url
             io.openshift.source-commit-url=https://github.com/openshift/cluster-openshift-apiserver-operator/commit/782e56ab163f79bd794f2343c8ab2a84bb825197

The following commits are currently not part of that update payload:

$ git log --first-parent --oneline 782e56ab..origin/master
2f7c281 Merge pull request #109 from deads2k/fixworkload
3245112 Merge pull request #104 from bparees/reporting

[1]: https://github.com/openshift/installer/blob/v0.10.0/hack/build.sh#L5

Comment 27 Chuan Yu 2019-01-16 02:17:58 UTC
The workaround in https://bugzilla.redhat.com/show_bug.cgi?id=1665842#c25 works for the affected cluster; after `oc -n openshift-apiserver delete pods --all`, the cluster is back to normal.

The issue should also be fixed in 4.0.0-0.nightly-2019-01-15-010905; a cluster created with that build has been running for about 23 hours with no such issue found.

Comment 28 Chuan Yu 2019-01-16 03:35:08 UTC
The issue is also fixed in quay.io/openshift-release-dev/ocp-release:4.0.0-0.1; the workaround from https://bugzilla.redhat.com/show_bug.cgi?id=1665842#c25 is not needed there.

Comment 32 errata-xmlrpc 2019-06-04 10:41:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758