Description of problem:

fail [k8s.io/kubernetes/test/e2e/framework/util.go:2396]: Expected error:
    <*errors.errorString | 0xc421fb9920>: {
        s: "failed to get logs from pod-secrets-d75c0e4f-51d8-11e9-9953-0a58ac101164 for secret-env-test: an error on the server (\"unknown\") has prevented the request from succeeding (get pods pod-secrets-d75c0e4f-51d8-11e9-9953-0a58ac101164)",
    }
    failed to get logs from pod-secrets-d75c0e4f-51d8-11e9-9953-0a58ac101164 for secret-env-test: an error on the server ("unknown") has prevented the request from succeeding (get pods pod-secrets-d75c0e4f-51d8-11e9-9953-0a58ac101164)
not to have occurred

https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/726

Seems like the kube api server had a failure responding to requests during the upgrade.
recurrence: https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/718
Ben, do we have info about how often we see this flake?
The two I linked were the two I saw in the 2 days of history I went through, but you can query all job runs for the last 7 days here: https://search.svc.ci.openshift.org/?search=failed+to+get+logs+from+pod&maxAge=168h&context=2&type=all
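For anyone who wants to rerun that search with a different error string, here is a minimal Python sketch that rebuilds the same URL. The parameter names (`search`, `maxAge`, `context`, `type`) are copied from the link above. The function name and defaults are hypothetical, and it is an assumption that the search service accepts values other than the ones shown in this thread.

```python
from urllib.parse import urlencode

# Base URL taken from the link in the comment above.
SEARCH_BASE = "https://search.svc.ci.openshift.org/"


def build_search_url(text, max_age="168h", context=2, result_type="all"):
    """Build a CI search URL using the query parameters seen in this thread.

    Hypothetical helper: only the parameter names and the example values
    (168h, 2, all) come from the original link.
    """
    query = urlencode(
        {
            "search": text,
            "maxAge": max_age,
            "context": context,
            "type": result_type,
        }
    )
    return f"{SEARCH_BASE}?{query}"


if __name__ == "__main__":
    print(build_search_url("failed to get logs from pod"))
```

With the defaults, this reproduces the 7-day query linked above.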
Seeing consistent failures on this test. The search linked above is not picking up failures in the last 12 hours. https://openshift-gce-devel.appspot.com/builds/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/ (Build Cop)
https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/908

CVO reported a successful upgrade, i.e. Available, at 12:58:01, but the completion time is 13:48:42:

{
  "apiVersion": "v1",
  "items": [
    {
      "apiVersion": "config.openshift.io/v1",
      "kind": "ClusterVersion",
      "metadata": {
        "creationTimestamp": "2019-04-04T12:38:16Z",
        "generation": 2,
        "name": "version",
        "resourceVersion": "46307",
        "selfLink": "/apis/config.openshift.io/v1/clusterversions/version",
        "uid": "850c19ae-56d6-11e9-97a2-122b11cdb986"
      },
      "spec": {
        "channel": "stable-4.0",
        "clusterID": "5250a589-158a-42c9-a86b-e312876f4705",
        "desiredUpdate": {
          "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-04-04-121901",
          "version": ""
        },
        "upstream": "https://api.openshift.com/api/upgrades_info/v1/graph"
      },
      "status": {
        "availableUpdates": null,
        "conditions": [
          {
            "lastTransitionTime": "2019-04-04T12:58:01Z",
            "message": "Done applying 4.0.0-0.ci-2019-04-04-121901",
            "status": "True",
            "type": "Available"
          },
          {
            "lastTransitionTime": "2019-04-04T13:43:27Z",
            "status": "False",
            "type": "Failing"
          },
          {
            "lastTransitionTime": "2019-04-04T13:48:42Z",
            "message": "Cluster version is 4.0.0-0.ci-2019-04-04-121901",
            "status": "False",
            "type": "Progressing"
          },
          {
            "lastTransitionTime": "2019-04-04T12:38:36Z",
            "message": "Unable to retrieve available updates: unknown version 4.0.0-0.ci-2019-04-04-121901",
            "reason": "RemoteFailed",
            "status": "False",
            "type": "RetrievedUpdates"
          }
        ],
        "desired": {
          "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-04-04-121901",
          "version": "4.0.0-0.ci-2019-04-04-121901"
        },
        "history": [
          {
            "completionTime": "2019-04-04T13:48:42Z",
            "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-04-04-121901",
            "startedTime": "2019-04-04T13:00:46Z",
            "state": "Completed",
            "version": "4.0.0-0.ci-2019-04-04-121901"
          },
          {
            "completionTime": "2019-04-04T13:00:46Z",
            "image": "registry.svc.ci.openshift.org/ocp/release@sha256:38615fee13cc324aded26048a26e075cc6d3247f87cea90e49f0685bf798c304",
            "startedTime": "2019-04-04T12:38:36Z",
            "state": "Completed",
            "version": "4.0.0-0.ci-2019-04-04-081851"
          }
        ],
        "observedGeneration": 2,
        "versionHash": "S3imd-IFzHk="
      }
    }
  ],
  "kind": "List",
  "metadata": {
    "resourceVersion": "",
    "selfLink": ""
  }
}

and openshift-apiserver reports Available=False at 13:50:12:

$ curl https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/908/artifacts/e2e-aws-upgrade/clusteroperators.json | jq '.items[] | select(.status.conditions[] | .type == "Available" and .status != "True") | [.metadata.name, .status.conditions]'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 61290  100 61290    0     0   126k      0 --:--:-- --:--:-- --:--:--  125k
[
  "openshift-apiserver",
  [
    {
      "lastTransitionTime": "2019-04-04T13:35:42Z",
      "reason": "AsExpected",
      "status": "False",
      "type": "Failing"
    },
    {
      "lastTransitionTime": "2019-04-04T13:35:48Z",
      "reason": "AsExpected",
      "status": "False",
      "type": "Progressing"
    },
    {
      "lastTransitionTime": "2019-04-04T13:50:12Z",
      "message": "Available: v1.quota.openshift.io is not ready: 503",
      "reason": "Available",
      "status": "False",
      "type": "Available"
    },
    {
      "lastTransitionTime": "2019-04-04T13:35:42Z",
      "reason": "AsExpected",
      "status": "True",
      "type": "Upgradeable"
    }
  ]
]
https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/907/artifacts/e2e-aws-upgrade/ has a similar error, with openshift-apiserver failing.

$ curl https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/907/artifacts/e2e-aws-upgrade/clusteroperators.json | jq '.items[] | select(.status.conditions[] | .type == "Available" and .status != "True") | [.metadata.name, .status.conditions]'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 61530  100 61530    0     0   130k      0 --:--:-- --:--:-- --:--:--  130k
[
  "openshift-apiserver",
  [
    {
      "lastTransitionTime": "2019-04-04T13:05:44Z",
      "reason": "AsExpected",
      "status": "False",
      "type": "Failing"
    },
    {
      "lastTransitionTime": "2019-04-04T13:06:02Z",
      "reason": "AsExpected",
      "status": "False",
      "type": "Progressing"
    },
    {
      "lastTransitionTime": "2019-04-04T13:22:01Z",
      "message": "Available: v1.quota.openshift.io is not ready: 503",
      "reason": "Available",
      "status": "False",
      "type": "Available"
    },
    {
      "lastTransitionTime": "2019-04-04T13:05:44Z",
      "reason": "AsExpected",
      "status": "True",
      "type": "Upgradeable"
    }
  ]
]
https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/904 is failing with a similar error.

$ curl https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/905/artifacts/e2e-aws-upgrade/clusteroperators.json | jq '.items[] | select(.status.conditions[] | .type == "Available" and .status != "True") | [.metadata.name, .status.conditions]'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 61288  100 61288    0     0   130k      0 --:--:-- --:--:-- --:--:--  130k
[
  "openshift-apiserver",
  [
    {
      "lastTransitionTime": "2019-04-04T11:04:25Z",
      "reason": "AsExpected",
      "status": "False",
      "type": "Failing"
    },
    {
      "lastTransitionTime": "2019-04-04T11:04:31Z",
      "reason": "AsExpected",
      "status": "False",
      "type": "Progressing"
    },
    {
      "lastTransitionTime": "2019-04-04T11:20:38Z",
      "message": "Available: v1.quota.openshift.io is not ready: 503",
      "reason": "Available",
      "status": "False",
      "type": "Available"
    },
    {
      "lastTransitionTime": "2019-04-04T11:04:25Z",
      "reason": "AsExpected",
      "status": "True",
      "type": "Upgradeable"
    }
  ]
]
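The jq filter used in the comments above can also be written in Python when jq is not at hand. The following is a minimal sketch, not the tooling anyone in this thread actually used. The function name is hypothetical, and the sample document contains only the openshift-apiserver conditions from run 908 plus one made-up healthy operator ("example-operator") for contrast. One small difference from jq: `select(...)` emits an operator once per matching condition, while this sketch lists each operator at most once.

```python
def unavailable_operators(doc):
    """Return [name, conditions] for operators whose Available condition is not "True".

    Mirrors: .items[] | select(.status.conditions[]
             | .type == "Available" and .status != "True")
             | [.metadata.name, .status.conditions]
    """
    results = []
    for item in doc.get("items", []):
        conditions = item.get("status", {}).get("conditions", [])
        if any(
            c.get("type") == "Available" and c.get("status") != "True"
            for c in conditions
        ):
            results.append([item["metadata"]["name"], conditions])
    return results


# Sample input shaped like clusteroperators.json. The first operator's
# conditions are copied from run 908; "example-operator" is hypothetical.
SAMPLE = {
    "items": [
        {
            "metadata": {"name": "openshift-apiserver"},
            "status": {
                "conditions": [
                    {"lastTransitionTime": "2019-04-04T13:35:42Z",
                     "reason": "AsExpected", "status": "False", "type": "Failing"},
                    {"lastTransitionTime": "2019-04-04T13:35:48Z",
                     "reason": "AsExpected", "status": "False", "type": "Progressing"},
                    {"lastTransitionTime": "2019-04-04T13:50:12Z",
                     "message": "Available: v1.quota.openshift.io is not ready: 503",
                     "reason": "Available", "status": "False", "type": "Available"},
                    {"lastTransitionTime": "2019-04-04T13:35:42Z",
                     "reason": "AsExpected", "status": "True", "type": "Upgradeable"},
                ]
            },
        },
        {
            "metadata": {"name": "example-operator"},  # hypothetical
            "status": {
                "conditions": [
                    {"lastTransitionTime": "2019-04-04T13:00:00Z",
                     "status": "True", "type": "Available"},
                ]
            },
        },
    ]
}

if __name__ == "__main__":
    for name, conditions in unavailable_operators(SAMPLE):
        available = next(c for c in conditions if c["type"] == "Available")
        print(name, available["lastTransitionTime"], available.get("message", ""))
```

To run it against a real artifact, load the downloaded clusteroperators.json with `json.load` and pass the result to the function.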
(In reply to Abhinav Dahiya from comment #8)

This is https://bugzilla.redhat.com/show_bug.cgi?id=1696387

> https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-
> openshift-origin-installer-e2e-aws-upgrade-4.0/904 is failing with similar
> error.
>
> curl
> https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-
> installer-e2e-aws-upgrade-4.0/905/artifacts/e2e-aws-upgrade/clusteroperators.
> json | jq '.items[] | select(.status.conditions[] | .type == "Available" and
> .status != "True") | [.metadata.name, .status.conditions]'
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time
> Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100 61288  100 61288    0     0   130k      0 --:--:-- --:--:-- --:--:--
> 130k
> [
>   "openshift-apiserver",
>   [
>     {
>       "lastTransitionTime": "2019-04-04T11:04:25Z",
>       "reason": "AsExpected",
>       "status": "False",
>       "type": "Failing"
>     },
>     {
>       "lastTransitionTime": "2019-04-04T11:04:31Z",
>       "reason": "AsExpected",
>       "status": "False",
>       "type": "Progressing"
>     },
>     {
>       "lastTransitionTime": "2019-04-04T11:20:38Z",
>       "message": "Available: v1.quota.openshift.io is not ready: 503",
>       "reason": "Available",
>       "status": "False",
>       "type": "Available"
>     },
>     {
>       "lastTransitionTime": "2019-04-04T11:04:25Z",
>       "reason": "AsExpected",
>       "status": "True",
>       "type": "Upgradeable"
>     }
>   ]
> ]
*** Bug 1696387 has been marked as a duplicate of this bug. ***
*** Bug 1698033 has been marked as a duplicate of this bug. ***
https://github.com/openshift/origin/pull/22425 merged, so we should not see "message": "Available: v1.quota.openshift.io is not ready: 503" anymore.
[1] (launched just before origin#22425 landed) hit this. I'll check back in in a few hours to make sure these have gone away.

[1]: https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_installer/1585/pull-ci-openshift-installer-master-e2e-aws/5108
[1] has another, despite starting well after origin#22425 landed. But for some reason it's still running an older origin commit:

$ curl -s https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_cluster-samples-operator/129/pull-ci-openshift-cluster-samples-operator-master-e2e-aws-image-ecosystem/343?log#log | grep 'Available: v1.quota.openshift.io is not ready: 503'
Apr 10 19:41:39.739 W clusteroperator/openshift-apiserver changed Available to False: Available: Available: v1.quota.openshift.io is not ready: 503
Apr 10 19:41:46.944 W clusteroperator/openshift-apiserver changed Available to False: Available: Available: v1.quota.openshift.io is not ready: 503
Apr 10 19:41:56.542 W clusteroperator/openshift-apiserver changed Available to False: Available: Available: v1.quota.openshift.io is not ready: 503
Apr 10 19:42:03.754 W clusteroperator/openshift-apiserver changed Available to False: Available: Available: v1.quota.openshift.io is not ready: 503
Apr 10 19:42:10.944 W clusteroperator/openshift-apiserver changed Available to False: Available: Available: v1.quota.openshift.io is not ready: 503
Apr 10 19:42:18.141 W clusteroperator/openshift-apiserver changed Available to False: Available: Available: v1.quota.openshift.io is not ready: 503
$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-samples-operator/129/pull-ci-openshift-cluster-samples-operator-master-e2e-aws-image-ecosystem/343/artifacts/release-latest/release-payload-latest/image-references | jq -r '.spec.tags[] | select(.name == "hyperkube").annotations'
{
  "io.openshift.build.commit.id": "af45cda5bce85838501f67afade94c6871fd1e4f",
  "io.openshift.build.commit.ref": "master",
  "io.openshift.build.source-location": "https://github.com/openshift/origin",
  "io.openshift.build.versions": "kubernetes=1.13.4"
}
$ git log --first-parent --format='%ad %h %d %s' --date=iso -3 origin/master | cat
2019-04-10 12:59:01 -0700 2108314cd8 (origin/release-4.0, origin/master, origin/HEAD) Merge pull request #22504 from smarterclayton/handle_multiple_target_path
2019-04-10 10:40:39 -0700 d212b13acc Merge pull request #22425 from mfojtik/crq-to-crd
2019-04-10 08:09:03 -0400 af45cda5bc Merge pull request #22521 from deads2k/quota-pick

[1]: https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_cluster-samples-operator/129/pull-ci-openshift-cluster-samples-operator-master-e2e-aws-image-ecosystem/343
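The comparison made by eye above (is the payload's hyperkube commit at or after the merge of #22425 on the first-parent history?) can be sketched in Python. This is only an illustration: the function names are hypothetical, the log text is the three-line `git log --first-parent` output pasted above, and the approach only works for commits that appear in that log. In a real checkout, `git merge-base --is-ancestor <fix> <payload>` is the more robust check.

```python
# First-parent log exactly as pasted above (newest first).
FIRST_PARENT_LOG = """\
2019-04-10 12:59:01 -0700 2108314cd8 (origin/release-4.0, origin/master, origin/HEAD) Merge pull request #22504 from smarterclayton/handle_multiple_target_path
2019-04-10 10:40:39 -0700 d212b13acc Merge pull request #22425 from mfojtik/crq-to-crd
2019-04-10 08:09:03 -0400 af45cda5bc Merge pull request #22521 from deads2k/quota-pick
"""


def first_parent_hashes(log_text):
    """Extract abbreviated hashes from '%ad %h %d %s' lines (--date=iso).

    The date occupies the first three whitespace-separated fields,
    so the hash is the fourth.
    """
    hashes = []
    for line in log_text.strip().splitlines():
        fields = line.split()
        if len(fields) >= 4:
            hashes.append(fields[3])
    return hashes


def _index_of(hashes, commit):
    """Find a commit in the list, matching on a shared hash prefix."""
    for i, h in enumerate(hashes):
        if commit.startswith(h) or h.startswith(commit):
            return i
    raise ValueError(f"commit {commit} not found in the supplied log")


def payload_includes_fix(log_text, payload_commit, fix_commit):
    """True if payload_commit is the fix merge or newer on the first-parent line."""
    hashes = first_parent_hashes(log_text)  # index 0 is the newest commit
    return _index_of(hashes, payload_commit) <= _index_of(hashes, fix_commit)


if __name__ == "__main__":
    payload = "af45cda5bce85838501f67afade94c6871fd1e4f"  # from image-references
    fix = "d212b13acc"  # merge of origin#22425
    print(payload_includes_fix(FIRST_PARENT_LOG, payload, fix))
```

For the payload in [1], the hyperkube commit sits one merge below d212b13acc, so the check reports that the fix is not included, which matches the conclusion in the comment.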
Still reproduced in the latest payload, 4.0.0-0.nightly-2019-04-10-182914, which does not yet include the fix PR above. Will check again when a new payload includes it.
Marking BetaBlocker based on apiserver upgrade failure in duplicate https://bugzilla.redhat.com/show_bug.cgi?id=1696387
Created attachment 1554620
Recent instances of this error in CI

Only instances since the fix are in upgrade tests, so I think we're good :).
Verified in the latest payload, 4.1.0-0.nightly-2019-04-18-210657: the error message is not seen.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758