Description of problem:
After running the OCP 4.0 cluster for one day, we get the error below:
  error: You must be logged in to the server (Unauthorized)

Version-Release number of selected component (if applicable):
clusterversion: 4.0.0-0.nightly-2019-01-12-000105
mac:~ jianzhang$ openshift-install version
openshift-install v0.9.1

How reproducible:
Always

Steps to Reproduce:
1. Create an OCP 4.0 cluster with the openshift-installer.
2. Let the cluster run for about a day.
3. Run "oc project" or "oc whoami".

Actual results:
[core@ip-10-0-37-160 ~]$ oc project
error: You must be logged in to the server (Unauthorized)
[core@ip-10-0-37-160 ~]$ oc whoami
error: You must be logged in to the server (Unauthorized)

Expected results:
The correct user info is printed.

Additional info:
[core@ip-10-0-37-160 ~]$ oc whoami --loglevel=8
I0114 06:37:34.885108 1801 loader.go:359] Config loaded from file /home/core/.kube/config
I0114 06:37:34.886843 1801 loader.go:359] Config loaded from file /home/core/.kube/config
I0114 06:37:34.887525 1801 round_trippers.go:383] GET https://qe-jian3-api.qe.devcluster.openshift.com:6443/apis/user.openshift.io/v1/users/~
I0114 06:37:34.887561 1801 round_trippers.go:390] Request Headers:
I0114 06:37:34.887591 1801 round_trippers.go:393]     Accept: application/json, */*
I0114 06:37:34.887606 1801 round_trippers.go:393]     User-Agent: oc/v1.11.0+406fc897d8 (linux/amd64) kubernetes/406fc89
I0114 06:37:34.911195 1801 round_trippers.go:408] Response Status: 401 Unauthorized in 23 milliseconds
I0114 06:37:34.911231 1801 round_trippers.go:411] Response Headers:
I0114 06:37:34.911247 1801 round_trippers.go:414]     Audit-Id: 729dbf77-cf0a-4e12-a492-22e3ee0ee626
I0114 06:37:34.911282 1801 round_trippers.go:414]     Cache-Control: no-store
I0114 06:37:34.911302 1801 round_trippers.go:414]     Cache-Control: no-store
I0114 06:37:34.911316 1801 round_trippers.go:414]     Content-Type: application/json
I0114 06:37:34.911346 1801 round_trippers.go:414]     Date: Mon, 14 Jan 2019 06:37:34 GMT
I0114 06:37:34.911361 1801 round_trippers.go:414]     Content-Length: 129
I0114 06:37:34.911407 1801 request.go:897] Response Body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}
I0114 06:37:34.911882 1801 helpers.go:201] server response object: [{
  "metadata": {},
  "status": "Failure",
  "message": "Unauthorized",
  "reason": "Unauthorized",
  "code": 401
}]
F0114 06:37:34.911960 1801 helpers.go:119] error: You must be logged in to the server (Unauthorized)

Check the logs of openshift-kube-apiserver:
[core@ip-10-0-37-160 ~]$ sudo crictl logs e9030e599470f
E0114 06:02:03.403729       1 reflector.go:136] github.com/openshift/origin/pkg/quota/generated/informers/internalversion/factory.go:101: Failed to list *quota.ClusterResourceQuota: Unauthorized
E0114 06:02:03.404807       1 reflector.go:136] github.com/openshift/client-go/security/informers/externalversions/factory.go:101: Failed to list *v1.SecurityContextConstraints: Unauthorized
E0114 06:02:03.406461       1 reflector.go:136] github.com/openshift/client-go/user/informers/externalversions/factory.go:101: Failed to list *v1.Group: Unauthorized
I0114 06:02:03.750760       1 logs.go:49] http: TLS handshake error from 10.0.10.180:19302: EOF
E0114 06:02:03.752768       1 oauth_apiserver.go:326] Unauthorized
I0114 06:02:03.899381       1 trace.go:76] Trace[102609404]: "GuaranteedUpdate etcd3: *core.ConfigMap" (started: 2019-01-14 06:01:53.898823636 +0000 UTC m=+9986.385772930) (total time: 10.00052286s):

$ oc logs apiserver-7cwpn -n openshift-apiserver
E0114 02:25:24.611324       1 authentication.go:62] Unable to authenticate the request due to an error: [x509: certificate signed by unknown authority, x509: certificate signed by unknown authority]
> [core@ip-10-0-37-160 ~]$ oc whoami --loglevel=8
> I0114 06:37:34.885108 1801 loader.go:359] Config loaded from file /home/core/.kube/config

Looks like you forgot to export KUBECONFIG [1]?

[1]: https://github.com/openshift/installer/blame/v0.9.1/README.md#L39
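For completeness, the export the installer README suggests looks like the following; the asset-directory path is a placeholder, substitute whatever directory you ran the installer in:

$ export KUBECONFIG=/path/to/asset-dir/auth/kubeconfig   # placeholder path: the installer's admin kubeconfig
$ oc whoami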
Responding to some internal discussion:

> Use case of concern is if user closes terminal after initial interaction with config.

Folks familiar with the shell should recognize that this is going to clear their environment variables. Folks not familiar with the shell probably have more to learn than we want to bite off in the logs that the installer prints by default.

> 1. Proper actionable error message telling user to export env variable while running oc command.

But in the example above, Jian already had a different kubeconfig in ~/.kube/config.

  F0114 06:37:34.911960 1801 helpers.go:119] error: You must be logged in to the server (Unauthorized)

could be resolved by "point me at a better kubeconfig via one of the usual approaches" (what we want in this case), or it could be "you pointed me at an insufficient kubeconfig, run 'oc login ...' to get me credentials for this cluster". I dunno how oc would distinguish between those to give more specific advice.

> 2. Log should be a warning, not info that indicates what needs to be done.

The *installer* is logging the export suggestion at the info level, because it's information for folks who want to use oc locally with their new cluster. We also log info-level entries for folks that want to log into the web console. Folks who only want the web console shouldn't have to care about the kubeconfig, so a warning-level log about the export seems overly strong.

> mac:jian3 jianzhang$ env |grep KUBE
> KUBECONFIG=/Users/jianzhang/project/aws-ocp/jian3/auth/kubeconfig
> mac:jian3 jianzhang$ oc login -u kubeadmin -p $(cat auth/kubeadmin-password)
> Error from server (InternalError): Internal error occurred: unexpected response: 400

This 400 is different from your original 401s.

> The openshift api pods log, and the pods are in running status.
>
> E0114 10:36:58.870286 1 memcache.go:147] couldn't get resource list for route.openshift.io/v1: Unauthorized

This seems like something different too, and may be the source of the login 400s. It would be good to drill into this error more to see if we can find the cause, but it seems distinct from "You must be logged in to the server (Unauthorized)", because it's an internal error and not an oc error.
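As an aside on the "usual approaches" mentioned above: these standard oc/kubectl config commands show which kubeconfig and context oc is actually resolving, which helps tell the "wrong kubeconfig" case from the "needs oc login" case (the export path below is a placeholder):

$ oc config current-context                              # which context is active
$ oc config view --minify                                # the effective config oc is using
$ export KUBECONFIG=/path/to/asset-dir/auth/kubeconfig   # placeholder: point at the installer's admin kubeconfig instead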
> E0114 10:36:58.870286 1 memcache.go:147] couldn't get resource list for route.openshift.io/v1: Unauthorized

Following up on this in a cluster provided by Shengsheng Cheng:

$ ssh -i libra.pem core@ec2-...
$ export PS1='$ '
$ export KUBECONFIG=/etc/kubernetes/static-pod-resources/kube-controller-manager-pod-1/secrets/controller-manager-kubeconfig/kubeconfig
$ grep admin $KUBECONFIG
    user: admin
  name: admin
current-context: admin
- name: admin
$ oc get clusterversion version
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-01-12-000105    False       True          16h     Unable to apply 4.0.0-0.nightly-2019-01-12-000105: could not authenticate to the server
$ oc get clusterversion -o yaml version
apiVersion: config.openshift.io/v1
kind: ClusterVersion
...
status:
  availableUpdates: null
  conditions:
  - lastTransitionTime: 2019-01-14T15:12:06Z
    status: "False"
    type: Available
  - lastTransitionTime: 2019-01-14T15:12:06Z
    message: 'Could not update imagestream "openshift/cli" (image.openshift.io/v1,
      132 of 209): could not authenticate to the server'
    status: "True"
    type: Failing
  ...
  current:
    payload: registry.svc.ci.openshift.org/ocp/release@sha256:72372c227945f4aedb98bf9bf7df6cda30fed528d24d83d3d5216027569b4394
    version: 4.0.0-0.nightly-2019-01-12-000105
  ...
$ oc get pods --all-namespaces | grep cluster-version
openshift-cluster-version   cluster-version-operator-7788d9dd67-p9njq   1/1   Running   0   1d
$ oc get pod -o 'jsonpath={.spec.containers[].name}{"\n"}' -n openshift-cluster-version cluster-version-operator-7788d9dd67-p9njq
cluster-version-operator
$ oc logs -n openshift-cluster-version cluster-version-operator-7788d9dd67-p9njq -c cluster-version-operator
E0115 07:07:58.878850 1 sync.go:133] error running apply for imagestream "openshift/cli" (image.openshift.io/v1, 132 of 209): Unauthorized
I0115 07:07:58.878968 1 cvo.go:279] Finished syncing cluster version "openshift-cluster-version/version" (27.251211055s)
I0115 07:07:58.878999 1 cvo.go:264] Error syncing operator openshift-cluster-version/version: Could not update imagestream "openshift/cli" (image.openshift.io/v1, 132 of 209): could not authenticate to the server
I0115 07:07:58.879037 1 cvo.go:277] Started syncing cluster version "openshift-cluster-version/version" (2019-01-15 07:07:58.879032196 +0000 UTC m=+101112.059771833)
...
I0115 07:08:03.041858 1 sync.go:92] Done syncing for deployment "openshift-cluster-machine-approver/machine-approver" (apps/v1, 131 of 209)
I0115 07:08:03.041886 1 sync.go:73] Running sync for imagestream "openshift/cli" (image.openshift.io/v1, 132 of 209)
I0115 07:08:03.065229 1 cvo.go:350] Started syncing available updates "openshift-cluster-version/version" (2019-01-15 07:08:03.065221608 +0000 UTC m=+101116.245961247)
I0115 07:08:03.065278 1 availableupdates.go:35] Available updates were recently retrieved, will try later.
I0115 07:08:03.065289 1 cvo.go:352] Finished syncing available updates "openshift-cluster-version/version" (62.461µs)
I0115 07:08:03.172410 1 request.go:485] Throttling request took 130.317878ms, request: GET:https://127.0.0.1:6443/apis/image.openshift.io/v1/namespaces/openshift/imagestreams/cli
E0115 07:08:03.174245 1 sync.go:133] error running apply for imagestream "openshift/cli" (image.openshift.io/v1, 132 of 209): Unauthorized
I0115 07:08:04.550814 1 cvo.go:350] Started syncing available updates "openshift-cluster-version/version" (2019-01-15 07:08:04.55078857 +0000 UTC m=+101117.731528195)
I0115 07:08:04.551049 1 availableupdates.go:35] Available updates were recently retrieved, will try later.
I0115 07:08:04.551108 1 cvo.go:352] Finished syncing available updates "openshift-cluster-version/version" (312.738µs)
I0115 07:08:04.572983 1 cvo.go:350] Started syncing available updates "openshift-cluster-version/version" (2019-01-15 07:08:04.572976174 +0000 UTC m=+101117.753715818)
I0115 07:08:04.574611 1 availableupdates.go:35] Available updates were recently retrieved, will try later.
I0115 07:08:04.574625 1 cvo.go:352] Finished syncing available updates "openshift-cluster-version/version" (1.64406ms)
I0115 07:08:04.668172 1 cvo.go:350] Started syncing available updates "openshift-cluster-version/version" (2019-01-15 07:08:04.668156588 +0000 UTC m=+101117.848896212)
I0115 07:08:04.668339 1 availableupdates.go:35] Available updates were recently retrieved, will try later.
I0115 07:08:04.668456 1 cvo.go:352] Finished syncing available updates "openshift-cluster-version/version" (291.665µs)
I0115 07:08:06.380259 1 cvo.go:350] Started syncing available updates "openshift-cluster-version/version" (2019-01-15 07:08:06.380243141 +0000 UTC m=+101119.560982785)
I0115 07:08:06.380428 1 availableupdates.go:35] Available updates were recently retrieved, will try later.
I0115 07:08:06.380530 1 cvo.go:352] Finished syncing available updates "openshift-cluster-version/version" (280.003µs)
I0115 07:08:06.465172 1 cvo.go:350] Started syncing available updates "openshift-cluster-version/version" (2019-01-15 07:08:06.465163296 +0000 UTC m=+101119.645903012)
I0115 07:08:06.465220 1 availableupdates.go:35] Available updates were recently retrieved, will try later.
I0115 07:08:06.465232 1 cvo.go:352] Finished syncing available updates "openshift-cluster-version/version" (63.615µs)
E0115 07:08:08.030225 1 memcache.go:147] couldn't get resource list for apps.openshift.io/v1: Unauthorized
E0115 07:08:08.031681 1 memcache.go:147] couldn't get resource list for authorization.openshift.io/v1: Unauthorized
E0115 07:08:08.033393 1 memcache.go:147] couldn't get resource list for build.openshift.io/v1: Unauthorized
E0115 07:08:08.034814 1 memcache.go:147] couldn't get resource list for image.openshift.io/v1: Unauthorized
E0115 07:08:08.036194 1 memcache.go:147] couldn't get resource list for oauth.openshift.io/v1: Unauthorized
E0115 07:08:08.037604 1 memcache.go:147] couldn't get resource list for project.openshift.io/v1: Unauthorized
E0115 07:08:08.039073 1 memcache.go:147] couldn't get resource list for quota.openshift.io/v1: Unauthorized
E0115 07:08:08.040694 1 memcache.go:147] couldn't get resource list for route.openshift.io/v1: Unauthorized
E0115 07:08:08.042274 1 memcache.go:147] couldn't get resource list for security.openshift.io/v1: Unauthorized
E0115 07:08:08.043795 1 memcache.go:147] couldn't get resource list for template.openshift.io/v1: Unauthorized
E0115 07:08:08.045182 1 memcache.go:147] couldn't get resource list for user.openshift.io/v1: Unauthorized
I0115 07:08:09.290663 1 cvo.go:350] Started syncing available updates "openshift-cluster-version/version" (2019-01-15 07:08:09.290647678 +0000 UTC m=+101122.471387330)
I0115 07:08:09.290729 1 availableupdates.go:35] Available updates were recently retrieved, will try later.
I0115 07:08:09.290741 1 cvo.go:352] Finished syncing available updates "openshift-cluster-version/version" (88.621µs)
E0115 07:08:13.185634 1 sync.go:133] error running apply for imagestream "openshift/cli" (image.openshift.io/v1, 132 of 209): Unauthorized
I0115 07:08:17.462949 1 leaderelection.go:209] successfully renewed lease openshift-cluster-version/version
I0115 07:08:24.292096 1 cvo.go:350] Started syncing available updates "openshift-cluster-version/version" (2019-01-15 07:08:24.292084671 +0000 UTC m=+101137.472824217)
I0115 07:08:24.292158 1 availableupdates.go:35] Available updates were recently retrieved, will try later.
I0115 07:08:24.292169 1 cvo.go:352] Finished syncing available updates "openshift-cluster-version/version" (80.49µs)
E0115 07:08:26.195013 1 sync.go:133] error running apply for imagestream "openshift/cli" (image.openshift.io/v1, 132 of 209): Unauthorized
I0115 07:08:26.195106 1 cvo.go:279] Finished syncing cluster version "openshift-cluster-version/version" (27.316067289s)
...
$ oc get imagestreams
error: the server doesn't have a resource type "imagestreams"

Back on my dev box:

$ oc adm release extract --from=registry.svc.ci.openshift.org/ocp/release@sha256:72372c227945f4aedb98bf9bf7df6cda30fed528d24d83d3d5216027569b4394 --to=/tmp/manifests
Extracted release payload created at 2019-01-12T00:01:11Z
$ grep -r image.openshift.io/v1 /tmp/manifests/
/tmp/manifests/0000_70_cli_01_imagestream.yaml:apiVersion: image.openshift.io/v1
/tmp/manifests/image-references:    "apiVersion": "image.openshift.io/v1",
$ cat /tmp/manifests/0000_70_cli_01_imagestream.yaml
kind: ImageStream
apiVersion: image.openshift.io/v1
metadata:
  namespace: openshift
  name: cli
spec:
  tags:
  - name: latest
    from:
      kind: DockerImage
      name: registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-12-000105@sha256:6261ebb76cd3bb9ade22be884c0208a6d45c6d0ca93dc54276ed2f46168f1a25
$ grep -ir imagestream /tmp/manifests/
/tmp/manifests/0000_70_cli_01_imagestream.yaml:kind: ImageStream
/tmp/manifests/0000_30_06-rh-operators.configmap.yaml:          - imagestreams
/tmp/manifests/0000_30_06-rh-operators.configmap.yaml:          - imagestreams/status
/tmp/manifests/image-references:    "kind": "ImageStream",

So the issue seems to be the application of the OCP version of [1] on a cluster that for some reason lacks ImageStream resources, which blocks the CVO's update application, which could cause all sorts of knock-on issues. I'm not clear on who's responsible for setting up ImageStreams; I'd have expected those to be either a CRD (which I can't find) or to be baked into the OpenShift API server (which I can't find either, but openshift/origin is a lot bigger than the update payload's manifest set). Does anyone know off the top of their head who's in charge of registering the ImageStream type?

I'm also still unclear about why the CVO's attempt to push the ImageStream manifest is returning "could not authenticate to the server". But I think it may be easier to sort out "why are ImageStreams missing?" first.

[1]: https://github.com/openshift/origin/blob/1f6568ce8ca9ba2c0a771a5cdc42e592311e2f8e/images/cli/manifests/01_imagestream.yaml
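If I understand the aggregation layer right, ImageStreams are served by the aggregated OpenShift API server rather than a CRD, so a quick way to check whether the type is registered and reachable would be something like the following (assumes admin access; the APIService name just follows the usual <version>.<group> convention and is an assumption here):

$ oc api-resources --api-group=image.openshift.io     # does discovery see the ImageStream type at all?
$ oc get apiservice v1.image.openshift.io -o yaml     # is the aggregated API registered, and does its Available condition say why not?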
Ah, actually it looks like the unauthorized-ness is the fundamental issue. Back on the broken master:

$ oc get imagestreams --loglevel=5
I0115 09:03:19.535387 32051 cached_discovery.go:77] skipped caching discovery info due to Unauthorized
...
I0115 09:03:19.563035 32051 cached_discovery.go:77] skipped caching discovery info due to Unauthorized
I0115 09:03:19.563093 32051 shortcut.go:89] Error loading discovery information: unable to retrieve the complete list of server APIs: apps.openshift.io/v1: Unauthorized, authorization.openshift.io/v1: Unauthorized, build.openshift.io/v1: Unauthorized, image.openshift.io/v1: Unauthorized, oauth.openshift.io/v1: Unauthorized, project.openshift.io/v1: Unauthorized, quota.openshift.io/v1: Unauthorized, route.openshift.io/v1: Unauthorized, security.openshift.io/v1: Unauthorized, template.openshift.io/v1: Unauthorized, user.openshift.io/v1: Unauthorized
I0115 09:03:19.568037 32051 cached_discovery.go:77] skipped caching discovery info due to Unauthorized
...
I0115 09:03:19.606258 32051 cached_discovery.go:77] skipped caching discovery info due to Unauthorized
I0115 09:03:19.609855 32051 discovery.go:215] Invalidating discovery information
I0115 09:03:19.671234 32051 cached_discovery.go:77] skipped caching discovery info due to Unauthorized
...
I0115 09:03:19.711881 32051 cached_discovery.go:77] skipped caching discovery info due to Unauthorized
F0115 09:03:19.763705 32051 helpers.go:119] error: the server doesn't have a resource type "imagestreams"

Maybe KUBECONFIG=/etc/kubernetes/static-pod-resources/kube-controller-manager-pod-1/secrets/controller-manager-kubeconfig/kubeconfig wasn't actually giving me admin access? Checking to see about other kubeconfigs from under /etc/kubernetes:

$ for CONFIG in $(find /etc/kubernetes -name kubeconfig); do echo "${CONFIG}"; KUBECONFIG="${CONFIG}" oc whoami; done
find: ‘/etc/kubernetes/manifests’: Permission denied
/etc/kubernetes/static-pod-resources/kube-controller-manager-pod-1/secrets/controller-manager-kubeconfig/kubeconfig
error: You must be logged in to the server (Unauthorized)
/etc/kubernetes/static-pod-resources/kube-scheduler-pod-1/secrets/scheduler-kubeconfig/kubeconfig
error: You must be logged in to the server (Unauthorized)
/etc/kubernetes/kubeconfig
error: You must be logged in to the server (Unauthorized)

That's not very promising. It would be good to use the known-good admin kubeconfig the installer drops into the asset directory.
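For anyone following along, that known-good kubeconfig is the one the installer writes during `create cluster` (the path below is a placeholder; the next comment shows what a healthy session with it looks like):

$ export KUBECONFIG=/path/to/asset-dir/auth/kubeconfig   # placeholder: the installer's asset directory
$ oc whoami                                              # should report system:admin on a healthy cluster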
For comparison, here's what this looks like for a real admin kubeconfig (vs. a recent OKD update payload):

$ export KUBECONFIG=wking/auth/kubeconfig
$ oc version
oc v4.0.0-alpha.0+1f6568c-902-dirty
kubernetes v1.11.0+1f6568c
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://wking-api.devcluster.openshift.com:6443
kubernetes v1.11.0+1f6568c
$ oc --loglevel=8 whoami
I0115 01:27:57.190846 21388 loader.go:359] Config loaded from file wking/auth/kubeconfig
I0115 01:27:57.191399 21388 loader.go:359] Config loaded from file wking/auth/kubeconfig
I0115 01:27:57.191859 21388 round_trippers.go:383] GET https://wking-api.devcluster.openshift.com:6443/apis/user.openshift.io/v1/users/~
I0115 01:27:57.191871 21388 round_trippers.go:390] Request Headers:
I0115 01:27:57.191878 21388 round_trippers.go:393]     Accept: application/json, */*
I0115 01:27:57.191884 21388 round_trippers.go:393]     User-Agent: oc/v1.11.0+1f6568c (linux/amd64) kubernetes/1f6568c
I0115 01:27:57.394444 21388 round_trippers.go:408] Response Status: 200 OK in 202 milliseconds
I0115 01:27:57.394527 21388 round_trippers.go:411] Response Headers:
I0115 01:27:57.394570 21388 round_trippers.go:414]     Cache-Control: no-store
I0115 01:27:57.394625 21388 round_trippers.go:414]     Cache-Control: no-store
I0115 01:27:57.394665 21388 round_trippers.go:414]     Content-Type: application/json
I0115 01:27:57.394697 21388 round_trippers.go:414]     Date: Tue, 15 Jan 2019 09:27:57 GMT
I0115 01:27:57.394734 21388 round_trippers.go:414]     Content-Length: 242
I0115 01:27:57.394890 21388 request.go:897] Response Body: {"kind":"User","apiVersion":"user.openshift.io/v1","metadata":{"name":"system:admin","selfLink":"/apis/user.openshift.io/v1/users/system%3Aadmin","creationTimestamp":null},"identities":null,"groups":["system:authenticated","system:masters"]}
system:admin
$ oc get clusterversion version
NAME      VERSION                           AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.alpha-2019-01-15-045324   False       True          8m      Unable to apply 4.0.0-0.alpha-2019-01-15-045324: the update could not be applied
$ oc get -o yaml clusterversion version
apiVersion: config.openshift.io/v1
kind: ClusterVersion
metadata:
  creationTimestamp: 2019-01-15T09:19:39Z
  ...
status:
  availableUpdates: null
  conditions:
  - lastTransitionTime: 2019-01-15T09:20:39Z
    status: "False"
    type: Available
  - lastTransitionTime: 2019-01-15T09:21:12Z
    message: 'Could not update servicemonitor "openshift-kube-apiserver-operator/kube-apiserver-operator"
      (monitoring.coreos.com/v1, 212 of 218): the server does not recognize this
      resource, check extension API servers'
    reason: UpdatePayloadResourceTypeMissing
    status: "True"
    type: Failing
  - lastTransitionTime: 2019-01-15T09:20:39Z
    message: 'Unable to apply 4.0.0-0.alpha-2019-01-15-045324: a required extension
      is not available to update'
    reason: UpdatePayloadResourceTypeMissing
    status: "True"
    type: Progressing
  - lastTransitionTime: 2019-01-15T09:20:39Z
    message: 'Unable to retrieve available updates: Get http://localhost:8080/graph:
      dial tcp [::1]:8080: connect: connection refused'
    reason: RemoteFailed
    status: "False"
    type: RetrievedUpdates
  desired:
    payload: registry.svc.ci.openshift.org/openshift/origin-release@sha256:8ee1cca11db5341bf43aa18d61634a45a48a0fa632f83d80edbfc2822f0b649c
    version: 4.0.0-0.alpha-2019-01-15-045324
  generation: 1
  history:
  - completionTime: null
    payload: registry.svc.ci.openshift.org/openshift/origin-release@sha256:8ee1cca11db5341bf43aa18d61634a45a48a0fa632f83d80edbfc2822f0b649c
    startedTime: 2019-01-15T09:20:39Z
    state: Partial
    version: 4.0.0-0.alpha-2019-01-15-045324
  versionHash: ""
$ oc get pods --all-namespaces | grep cluster-version
openshift-cluster-version   cluster-version-operator-75f586759-r49lz   1/1   Running   0   10m
$ oc logs -n openshift-cluster-version cluster-version-operator-75f586759-r49lz -c cluster-version-operator
...
E0115 09:29:30.193102 1 sync.go:126] error creating resourcebuilder for servicemonitor "openshift-kube-apiserver-operator/kube-apiserver-operator" (monitoring.coreos.com/v1, 212 of 218): failed to get resource type: no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
...
$ oc get imagestreams
No resources found.

So that cluster isn't healthy either, but at least it can find my admin user and it has image streams.
Another example (from a recent CI run), showing a temporary lack of image streams that subsequently resolves itself:

$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/1064/pull-ci-openshift-installer-master-e2e-aws/2915/artifacts/e2e-aws/pods/openshift-cluster-version_cluster-version-operator-5cc9bfc9ff-jn792_cluster-version-operator.log.gz | gunzip | grep 'error running apply\|Payload for cluster version'
E0115 03:31:53.320096 1 sync.go:133] error running apply for clusteroperator "openshift-apiserver" (config.openshift.io/v1, 68 of 218): clusteroperators.config.openshift.io "openshift-apiserver" not found
E0115 03:33:03.485454 1 sync.go:133] error running apply for clusteroperator "openshift-apiserver" (config.openshift.io/v1, 68 of 218): timed out waiting for the condition
E0115 03:33:20.834764 1 sync.go:133] error running apply for clusteroperator "openshift-controller-manager-operator" (config.openshift.io/v1, 77 of 218): clusteroperators.config.openshift.io "openshift-controller-manager-operator" not found
E0115 03:33:32.724012 1 sync.go:133] error running apply for clusteroperator "openshift-controller-manager-operator" (config.openshift.io/v1, 77 of 218): clusteroperators.config.openshift.io "openshift-controller-manager-operator" not found
E0115 03:36:44.424586 1 sync.go:133] error running apply for clusteroperator "openshift-kube-apiserver-operator" (config.openshift.io/v1, 46 of 218): Get https://127.0.0.1:6443/apis/config.openshift.io/v1/clusteroperators/openshift-kube-apiserver-operator: dial tcp 127.0.0.1:6443: connect: connection refused
E0115 03:37:25.110368 1 sync.go:133] error running apply for imagestream "openshift/cli" (image.openshift.io/v1, 128 of 218): the server could not find the requested resource
E0115 03:37:35.496992 1 sync.go:133] error running apply for imagestream "openshift/cli" (image.openshift.io/v1, 128 of 218): the server could not find the requested resource
I0115 03:39:49.267497 1 cvo.go:339] Payload for cluster version 0.0.1-2019-01-15-031415 synced
I0115 03:42:40.979861 1 cvo.go:339] Payload for cluster version 0.0.1-2019-01-15-031415 synced
I0115 03:45:41.091330 1 cvo.go:339] Payload for cluster version 0.0.1-2019-01-15-031415 synced
I0115 03:48:57.962396 1 cvo.go:339] Payload for cluster version 0.0.1-2019-01-15-031415 synced
I0115 03:52:46.571608 1 cvo.go:339] Payload for cluster version 0.0.1-2019-01-15-031415 synced
I0115 03:55:42.047818 1 cvo.go:339] Payload for cluster version 0.0.1-2019-01-15-031415 synced
I0115 03:58:42.020924 1 cvo.go:339] Payload for cluster version 0.0.1-2019-01-15-031415 synced
I0115 04:01:58.341061 1 cvo.go:339] Payload for cluster version 0.0.1-2019-01-15-031415 synced
I0115 04:05:00.278205 1 cvo.go:339] Payload for cluster version 0.0.1-2019-01-15-031415 synced
I0115 04:08:03.444130 1 cvo.go:339] Payload for cluster version 0.0.1-2019-01-15-031415 synced
Trevor,

> This 400 is different from your original 401s.

The 400 is the result of the `oc login -u kubeadmin -p $(cat auth/kubeadmin-password)`, but the 401 is the result of the `oc whoami --loglevel=8`:

mac:jian3 jianzhang$ oc whoami --loglevel=8
I0115 17:49:29.640105 95331 loader.go:359] Config loaded from file /Users/jianzhang/project/aws-ocp/jian3/auth/kubeconfig
I0115 17:49:29.641092 95331 loader.go:359] Config loaded from file /Users/jianzhang/project/aws-ocp/jian3/auth/kubeconfig
I0115 17:49:29.641606 95331 round_trippers.go:383] GET https://qe-jian3-api.qe.devcluster.openshift.com:6443/apis/user.openshift.io/v1/users/~
I0115 17:49:29.641618 95331 round_trippers.go:390] Request Headers:
I0115 17:49:29.641624 95331 round_trippers.go:393]     Accept: application/json, */*
I0115 17:49:29.641630 95331 round_trippers.go:393]     User-Agent: oc/v1.11.0+031e5ec2a7 (darwin/amd64) kubernetes/031e5ec
I0115 17:49:30.825584 95331 round_trippers.go:408] Response Status: 401 Unauthorized in 1183 milliseconds
I0115 17:49:30.825625 95331 round_trippers.go:411] Response Headers:
I0115 17:49:30.825640 95331 round_trippers.go:414]     Content-Length: 129
I0115 17:49:30.825651 95331 round_trippers.go:414]     Audit-Id: 1db6c5d1-3ea5-4d69-a4df-c567e6c4d729
I0115 17:49:30.825663 95331 round_trippers.go:414]     Cache-Control: no-store
I0115 17:49:30.825675 95331 round_trippers.go:414]     Cache-Control: no-store
I0115 17:49:30.825686 95331 round_trippers.go:414]     Content-Type: application/json
I0115 17:49:30.825698 95331 round_trippers.go:414]     Date: Tue, 15 Jan 2019 09:49:28 GMT
I0115 17:49:30.825758 95331 request.go:897] Response Body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}
I0115 17:49:30.826537 95331 helpers.go:201] server response object: [{
  "metadata": {},
  "status": "Failure",
  "message": "Unauthorized",
  "reason": "Unauthorized",
  "code": 401
}]
F0115 17:49:30.826569 95331 helpers.go:119] error: You must be logged in to the server (Unauthorized)
Trevor,

> a recent OKD update payload

Could you check this against the OCP cluster? I have already attached the kubeconfig, so you can debug on it to find the root cause.
Ah, thanks for the follow-up. And yeah, I can reproduce that with the admin kubeconfig for Shengsheng's cluster already (I expect it reproduces the same way with yours). But I don't know where to go from here. Hopefully someone from the OpenShift API server or auth teams has better luck than me ;).
*** Bug 1665935 has been marked as a duplicate of this bug. ***
We suspect the issue involves aggregator-client certificate rotation, based on the certificate errors in the logs, the problems with the OpenShift API services, and the fact that certificate rotation after 24h was added around the time things stopped working. We are investigating further and attempting to confirm our suspicions.
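A rough way to sanity-check that hypothesis from the admin kubeconfig; note the secret name and namespace below are assumptions about where the aggregator client cert lives, not something confirmed in this bug:

# Is the aggregated OpenShift API registered, and what does its Available condition say?
oc get apiservice v1.user.openshift.io -o yaml

# Inspect the validity window of the (assumed) aggregator client cert and compare
# NotBefore/NotAfter against the cluster's age (~24h when this bug appears).
oc -n openshift-kube-apiserver get secret aggregator-client \
  -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -dates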
Adding the Master team in CC to see if they agree with comment 18.
We think this problem was resolved by https://github.com/openshift/cluster-openshift-apiserver-operator/pull/107 .
If this happens again, deleting the pods in openshift-apiserver should rekick and clear it. `oc -n openshift-apiserver delete pods --all` should cause new pods to start with the new config.
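For anyone hitting this, a minimal version of that workaround plus a quick recovery check (run with the installer's admin kubeconfig; pod names vary per cluster):

oc -n openshift-apiserver delete pods --all   # force new openshift-apiserver pods with the refreshed config
oc -n openshift-apiserver get pods            # wait until the replacement pods are Running
oc whoami                                     # should succeed again instead of "Unauthorized"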
> We think this problem was resolved by https://github.com/openshift/cluster-openshift-apiserver-operator/pull/107 .

This just made it into the 0.10.0 installer release [1]:

$ oc adm release info --pullspecs quay.io/openshift-release-dev/ocp-release:4.0.0-0.1 | grep openshift-apiserver
  cluster-openshift-apiserver-operator   quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ec66e1d488f16e852565e07319b4697222a4a5f7c16851efe0ed6dcb51e29c48
$ oc image info quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ec66e1d488f16e852565e07319b4697222a4a5f7c16851efe0ed6dcb51e29c48 | grep commit-url
             io.openshift.source-commit-url=https://github.com/openshift/cluster-openshift-apiserver-operator/commit/782e56ab163f79bd794f2343c8ab2a84bb825197

The following commits are currently not part of that update payload:

$ git log --first-parent --oneline 782e56ab..origin/master
2f7c281 Merge pull request #109 from deads2k/fixworkload
3245112 Merge pull request #104 from bparees/reporting

[1]: https://github.com/openshift/installer/blob/v0.10.0/hack/build.sh#L5
The workaround in https://bugzilla.redhat.com/show_bug.cgi?id=1665842#c25 works for the affected cluster: after `oc -n openshift-apiserver delete pods --all`, the cluster came back to normal. The issue also appears to be fixed in 4.0.0-0.nightly-2019-01-15-010905; a cluster created with that build has been running for about 23 hours with no such issue.
The issue is also fixed in quay.io/openshift-release-dev/ocp-release:4.0.0-0.1; the workaround from https://bugzilla.redhat.com/show_bug.cgi?id=1665842#c25 is not needed there.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758