Bug 1694226
Summary: cluster upgrade should maintain a functioning cluster during upgrade: Available: v1.quota.openshift.io is not ready: 503
Product: OpenShift Container Platform
Reporter: Ben Parees <bparees>
Component: Master
Assignee: Michal Fojtik <mfojtik>
Status: CLOSED ERRATA
QA Contact: Xingxing Xia <xxia>
Severity: medium
Priority: medium
Version: 4.1.0
CC: adahiya, aos-bugs, jokerman, mifiedle, mmccomas, nmoraiti, wking, yapei
Keywords: BetaBlocker
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-06-04 10:46:40 UTC
Type: Bug
Bug Depends On: 1698672, 1698950, 1700504

Description
Ben Parees
2019-03-29 20:03:54 UTC
Ben, do we have info about how often we see this flake?

The two I linked were the two I saw in the 2 days of history I went through, but you can query all job runs for the last 7 days here: https://search.svc.ci.openshift.org/?search=failed+to+get+logs+from+pod&maxAge=168h&context=2&type=all

Seeing consistent failures on this test. The search linked above is not picking up failures from the last 12 hours. https://openshift-gce-devel.appspot.com/builds/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/ (Build Cop)

In https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/908 the CVO reported a successful upgrade (i.e. Available) at 12:58:01, but the upgrade's history entry did not complete until 13:48:42:

{
  "apiVersion": "v1",
  "items": [
    {
      "apiVersion": "config.openshift.io/v1",
      "kind": "ClusterVersion",
      "metadata": {
        "creationTimestamp": "2019-04-04T12:38:16Z",
        "generation": 2,
        "name": "version",
        "resourceVersion": "46307",
        "selfLink": "/apis/config.openshift.io/v1/clusterversions/version",
        "uid": "850c19ae-56d6-11e9-97a2-122b11cdb986"
      },
      "spec": {
        "channel": "stable-4.0",
        "clusterID": "5250a589-158a-42c9-a86b-e312876f4705",
        "desiredUpdate": {
          "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-04-04-121901",
          "version": ""
        },
        "upstream": "https://api.openshift.com/api/upgrades_info/v1/graph"
      },
      "status": {
        "availableUpdates": null,
        "conditions": [
          {
            "lastTransitionTime": "2019-04-04T12:58:01Z",
            "message": "Done applying 4.0.0-0.ci-2019-04-04-121901",
            "status": "True",
            "type": "Available"
          },
          {
            "lastTransitionTime": "2019-04-04T13:43:27Z",
            "status": "False",
            "type": "Failing"
          },
          {
            "lastTransitionTime": "2019-04-04T13:48:42Z",
            "message": "Cluster version is 4.0.0-0.ci-2019-04-04-121901",
            "status": "False",
            "type": "Progressing"
          },
          {
            "lastTransitionTime": "2019-04-04T12:38:36Z",
            "message": "Unable to retrieve available updates: unknown version 4.0.0-0.ci-2019-04-04-121901",
            "reason": "RemoteFailed",
            "status": "False",
            "type": "RetrievedUpdates"
          }
        ],
        "desired": {
          "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-04-04-121901",
          "version": "4.0.0-0.ci-2019-04-04-121901"
        },
        "history": [
          {
            "completionTime": "2019-04-04T13:48:42Z",
            "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-04-04-121901",
            "startedTime": "2019-04-04T13:00:46Z",
            "state": "Completed",
            "version": "4.0.0-0.ci-2019-04-04-121901"
          },
          {
            "completionTime": "2019-04-04T13:00:46Z",
            "image": "registry.svc.ci.openshift.org/ocp/release@sha256:38615fee13cc324aded26048a26e075cc6d3247f87cea90e49f0685bf798c304",
            "startedTime": "2019-04-04T12:38:36Z",
            "state": "Completed",
            "version": "4.0.0-0.ci-2019-04-04-081851"
          }
        ],
        "observedGeneration": 2,
        "versionHash": "S3imd-IFzHk="
      }
    }
  ],
  "kind": "List",
  "metadata": {
    "resourceVersion": "",
    "selfLink": ""
  }
}

and openshift-apiserver reports Available=False at 13:50:12:

curl https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/908/artifacts/e2e-aws-upgrade/clusteroperators.json | jq '.items[] | select(.status.conditions[] | .type == "Available" and .status != "True") | [.metadata.name, .status.conditions]'
[
  "openshift-apiserver",
  [
    {
      "lastTransitionTime": "2019-04-04T13:35:42Z",
      "reason": "AsExpected",
      "status": "False",
      "type": "Failing"
    },
    {
      "lastTransitionTime": "2019-04-04T13:35:48Z",
      "reason": "AsExpected",
      "status": "False",
      "type": "Progressing"
    },
    {
      "lastTransitionTime": "2019-04-04T13:50:12Z",
      "message": "Available: v1.quota.openshift.io is not ready: 503",
      "reason": "Available",
      "status": "False",
      "type": "Available"
    },
    {
      "lastTransitionTime": "2019-04-04T13:35:42Z",
      "reason": "AsExpected",
      "status": "True",
      "type": "Upgradeable"
    }
  ]
]

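The two checks made on run 908 (when ClusterVersion went Available compared with when the upgrade's history entry completed, and which operators report an Available status other than "True") can be reproduced offline. Below is a minimal Python sketch, assuming the ClusterVersion and clusteroperators.json artifacts are already downloaded and parsed with `json.load()`. The function names are hypothetical helpers, not part of any OpenShift tooling, and `unavailable_operators` only approximates the jq filter above: it inspects the first `Available` condition per operator, whereas jq's `select` over the `.status.conditions[]` generator can emit an item once per matching condition.

```python
"""Sketch: re-check the two observations from run 908 in plain Python.

Assumes the ClusterVersion and clusteroperators.json documents have already
been downloaded and parsed with json.load(); only fields that appear in this
bug are used. All function names here are hypothetical.
"""
from __future__ import annotations

from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    # Timestamps in these artifacts look like "2019-04-04T12:58:01Z" (UTC).
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def find_condition(obj: dict, cond_type: str) -> dict | None:
    """Return the first status condition of the given type, or None."""
    for cond in obj.get("status", {}).get("conditions", []):
        if cond.get("type") == cond_type:
            return cond
    return None


def available_vs_completed(cluster_version: dict) -> tuple[datetime, datetime]:
    """Return (Available lastTransitionTime, newest history completionTime)."""
    available = find_condition(cluster_version, "Available")
    if available is None:
        raise ValueError("ClusterVersion has no Available condition")
    newest = cluster_version["status"]["history"][0]  # history is listed newest-first
    return parse_ts(available["lastTransitionTime"]), parse_ts(newest["completionTime"])


def unavailable_operators(operators: dict) -> list[tuple[str, str]]:
    """Roughly what the jq filter selects: operators whose Available status is not "True"."""
    result = []
    for item in operators.get("items", []):
        cond = find_condition(item, "Available")
        if cond is not None and cond.get("status") != "True":
            result.append((item["metadata"]["name"], cond.get("message", "")))
    return result


if __name__ == "__main__":
    # Trimmed copies of the run-908 data quoted above.
    cv = {"status": {
        "conditions": [{"type": "Available", "status": "True",
                        "lastTransitionTime": "2019-04-04T12:58:01Z"}],
        "history": [{"state": "Completed", "completionTime": "2019-04-04T13:48:42Z"}],
    }}
    cos = {"items": [{
        "metadata": {"name": "openshift-apiserver"},
        "status": {"conditions": [{
            "type": "Available", "status": "False",
            "lastTransitionTime": "2019-04-04T13:50:12Z",
            "message": "Available: v1.quota.openshift.io is not ready: 503"}]},
    }]}
    available_at, completed_at = available_vs_completed(cv)
    print("gap between Available and history completion:", completed_at - available_at)
    print(unavailable_operators(cos))
```

Because `history` is newest-first, `history[0]` is the upgrade to 4.0.0-0.ci-2019-04-04-121901 rather than the initial install, which is the entry the 12:58:01 vs 13:48:42 comparison is about.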
https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/907/artifacts/e2e-aws-upgrade/ shows a similar openshift-apiserver failure:

curl https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/907/artifacts/e2e-aws-upgrade/clusteroperators.json | jq '.items[] | select(.status.conditions[] | .type == "Available" and .status != "True") | [.metadata.name, .status.conditions]'
[
  "openshift-apiserver",
  [
    {
      "lastTransitionTime": "2019-04-04T13:05:44Z",
      "reason": "AsExpected",
      "status": "False",
      "type": "Failing"
    },
    {
      "lastTransitionTime": "2019-04-04T13:06:02Z",
      "reason": "AsExpected",
      "status": "False",
      "type": "Progressing"
    },
    {
      "lastTransitionTime": "2019-04-04T13:22:01Z",
      "message": "Available: v1.quota.openshift.io is not ready: 503",
      "reason": "Available",
      "status": "False",
      "type": "Available"
    },
    {
      "lastTransitionTime": "2019-04-04T13:05:44Z",
      "reason": "AsExpected",
      "status": "True",
      "type": "Upgradeable"
    }
  ]
]

https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/904 is failing with a similar error.

curl https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/905/artifacts/e2e-aws-upgrade/clusteroperators.json | jq '.items[] | select(.status.conditions[] | .type == "Available" and .status != "True") | [.metadata.name, .status.conditions]'
[
  "openshift-apiserver",
  [
    {
      "lastTransitionTime": "2019-04-04T11:04:25Z",
      "reason": "AsExpected",
      "status": "False",
      "type": "Failing"
    },
    {
      "lastTransitionTime": "2019-04-04T11:04:31Z",
      "reason": "AsExpected",
      "status": "False",
      "type": "Progressing"
    },
    {
      "lastTransitionTime": "2019-04-04T11:20:38Z",
      "message": "Available: v1.quota.openshift.io is not ready: 503",
      "reason": "Available",
      "status": "False",
      "type": "Available"
    },
    {
      "lastTransitionTime": "2019-04-04T11:04:25Z",
      "reason": "AsExpected",
      "status": "True",
      "type": "Upgradeable"
    }
  ]
]

(In reply to Abhinav Dahiya from comment #8)
> https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/904 is failing with similar error.

This is https://bugzilla.redhat.com/show_bug.cgi?id=1696387

*** Bug 1696387 has been marked as a duplicate of this bug. ***

*** Bug 1698033 has been marked as a duplicate of this bug. ***

https://github.com/openshift/origin/pull/22425 merged; we should not see "message": "Available: v1.quota.openshift.io is not ready: 503" anymore.

[1] (launched just before origin#22425 landed) hit this. I'll check back in in a few hours to make sure these have gone away.

[1]: https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_installer/1585/pull-ci-openshift-installer-master-e2e-aws/5108

[1] has another, despite starting well after origin#22425 landed.
But for some reason it's still running an older origin commit:

$ curl -s https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_cluster-samples-operator/129/pull-ci-openshift-cluster-samples-operator-master-e2e-aws-image-ecosystem/343?log#log | grep 'Available: v1.quota.openshift.io is not ready: 503'
Apr 10 19:41:39.739 W clusteroperator/openshift-apiserver changed Available to False: Available: Available: v1.quota.openshift.io is not ready: 503
Apr 10 19:41:46.944 W clusteroperator/openshift-apiserver changed Available to False: Available: Available: v1.quota.openshift.io is not ready: 503
Apr 10 19:41:56.542 W clusteroperator/openshift-apiserver changed Available to False: Available: Available: v1.quota.openshift.io is not ready: 503
Apr 10 19:42:03.754 W clusteroperator/openshift-apiserver changed Available to False: Available: Available: v1.quota.openshift.io is not ready: 503
Apr 10 19:42:10.944 W clusteroperator/openshift-apiserver changed Available to False: Available: Available: v1.quota.openshift.io is not ready: 503
Apr 10 19:42:18.141 W clusteroperator/openshift-apiserver changed Available to False: Available: Available: v1.quota.openshift.io is not ready: 503
$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-samples-operator/129/pull-ci-openshift-cluster-samples-operator-master-e2e-aws-image-ecosystem/343/artifacts/release-latest/release-payload-latest/image-references | jq -r '.spec.tags[] | select(.name == "hyperkube").annotations'
{
  "io.openshift.build.commit.id": "af45cda5bce85838501f67afade94c6871fd1e4f",
  "io.openshift.build.commit.ref": "master",
  "io.openshift.build.source-location": "https://github.com/openshift/origin",
  "io.openshift.build.versions": "kubernetes=1.13.4"
}
$ git log --first-parent --format='%ad %h %d %s' --date=iso -3 origin/master | cat
2019-04-10 12:59:01 -0700 2108314cd8 (origin/release-4.0, origin/master, origin/HEAD) Merge pull request #22504 from smarterclayton/handle_multiple_target_path
2019-04-10 10:40:39 -0700 d212b13acc Merge pull request #22425 from mfojtik/crq-to-crd
2019-04-10 08:09:03 -0400 af45cda5bc Merge pull request #22521 from deads2k/quota-pick

[1]: https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_cluster-samples-operator/129/pull-ci-openshift-cluster-samples-operator-master-e2e-aws-image-ecosystem/343

Still reproduced in the latest payload, 4.0.0-0.nightly-2019-04-10-182914, which does not yet include the fix PR above. Will check again when a new payload includes it.

Marking BetaBlocker based on the apiserver upgrade failure in duplicate https://bugzilla.redhat.com/show_bug.cgi?id=1696387

Created attachment 1554620
Recent instances of this error in CI
Only instances since the fix are in upgrade tests, so I think we're good :).
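The payload check made earlier in this bug (reading the hyperkube image's `io.openshift.build.commit.id` annotation, then locating that commit relative to the #22425 merge in `git log --first-parent` output for origin/master) can be sketched in Python as below. This is a minimal sketch assuming newest-first log text in the `--format='%ad %h %d %s' --date=iso` layout quoted there, and assuming both commits lie on the first-parent chain; `first_parent_hashes` and `includes_merge` are hypothetical helpers. In a local clone, `git merge-base --is-ancestor <fix-commit> <payload-commit>` (exit status 0 when the first commit is an ancestor of the second) answers the same question without any parsing.

```python
"""Sketch: does a payload's origin commit already include a given merge?

Assumes `log_text` is newest-first output of
    git log --first-parent --format='%ad %h %d %s' --date=iso origin/master
and that both commits are on that first-parent chain. Helper names are
hypothetical.
"""
from __future__ import annotations


def first_parent_hashes(log_text: str) -> list[str]:
    """Extract the abbreviated hash (%h) from each log line, newest first."""
    hashes = []
    for line in log_text.splitlines():
        fields = line.split()
        # --date=iso renders %ad as three fields (date, time, UTC offset),
        # so %h is the fourth whitespace-separated field.
        if len(fields) >= 4:
            hashes.append(fields[3])
    return hashes


def includes_merge(log_text: str, payload_commit: str, fix_merge: str) -> bool:
    """True if fix_merge is the payload commit itself or older on the first-parent chain."""
    hashes = first_parent_hashes(log_text)

    def position(commit: str) -> int:
        for i, short in enumerate(hashes):
            # Accept full or abbreviated hashes on either side.
            if commit.startswith(short) or short.startswith(commit):
                return i
        raise ValueError(f"{commit} not found in the first-parent log")

    # Newest-first order: a larger position means an older commit.
    return position(fix_merge) >= position(payload_commit)


# The three lines of origin/master history quoted in this bug.
LOG = """\
2019-04-10 12:59:01 -0700 2108314cd8 (origin/release-4.0, origin/master, origin/HEAD) Merge pull request #22504 from smarterclayton/handle_multiple_target_path
2019-04-10 10:40:39 -0700 d212b13acc Merge pull request #22425 from mfojtik/crq-to-crd
2019-04-10 08:09:03 -0400 af45cda5bc Merge pull request #22521 from deads2k/quota-pick
"""

if __name__ == "__main__":
    hyperkube_commit = "af45cda5bce85838501f67afade94c6871fd1e4f"  # from image-references
    print(includes_merge(LOG, hyperkube_commit, "d212b13acc"))
```

With the quoted data, `includes_merge` returns False for the hyperkube commit: af45cda5bc sits one first-parent step before the d212b13acc merge of #22425, which matches the observation that the payload did not yet contain the fix. Restricting the walk to first-parent history is a simplification that only holds for merges landed directly on master.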
Verified in the latest payload, 4.1.0-0.nightly-2019-04-18-210657; the error message is not seen.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758