test: [sig-cli] oc debug deployment configs from a build

is failing frequently in CI, see search results:

https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=%5C%5Bsig-cli%5C%5D+oc+debug+deployment+configs+from+a+build

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=24h&type=junit&search=%5C%5Bsig-cli%5C%5D+oc+debug+deployment+configs+from+a+build' | grep 'failures match' | sort
periodic-ci-openshift-release-master-ocp-4.7-e2e-aws-proxy - 3 runs, 100% failed, 67% of failures match
periodic-ci-openshift-release-master-ocp-4.7-e2e-vsphere - 6 runs, 100% failed, 17% of failures match
pull-ci-cri-o-cri-o-master-e2e-aws - 26 runs, 35% failed, 22% of failures match
pull-ci-openshift-cloud-credential-operator-master-e2e-aws - 4 runs, 100% failed, 25% of failures match
pull-ci-openshift-installer-master-e2e-aws-workers-rhel7 - 6 runs, 17% failed, 100% of failures match
pull-ci-openshift-kubernetes-master-e2e-aws-fips - 14 runs, 86% failed, 17% of failures match
pull-ci-openshift-machine-config-operator-master-e2e-aws - 27 runs, 78% failed, 5% of failures match
pull-ci-openshift-machine-config-operator-master-e2e-aws-workers-rhel7 - 11 runs, 64% failed, 14% of failures match
pull-ci-openshift-machine-config-operator-master-e2e-vsphere-upi - 7 runs, 100% failed, 14% of failures match
pull-ci-openshift-machine-config-operator-master-okd-e2e-aws - 11 runs, 45% failed, 20% of failures match
pull-ci-openshift-oc-master-e2e-aws - 7 runs, 57% failed, 25% of failures match
pull-ci-openshift-openshift-controller-manager-master-e2e-aws - 2 runs, 50% failed, 100% of failures match
pull-ci-openshift-ovn-kubernetes-master-e2e-vsphere-ovn - 5 runs, 100% failed, 20% of failures match
pull-ci-openshift-sriov-network-operator-master-e2e-aws - 22 runs, 100% failed, 14% of failures match
release-openshift-ocp-installer-e2e-ovirt-4.7 - 6 runs, 67% failed, 25% of failures match

Example job [1]:

fail [github.com/openshift/origin/test/extended/cli/debug.go:29]: Unexpected error:
    <*errors.errorString | 0xc0002d48a0>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
occurred

with stdout from that test-case including:

Nov 9 20:41:39.574: INFO: Error running /usr/bin/oc --namespace=e2e-test-oc-debug-754r9 --kubeconfig=/tmp/configfile291112964 get imagestreamtags local-busybox:latest:
StdOut>
Error from server (NotFound): imagestreamtags.image.openshift.io "local-busybox:latest" not found
StdErr>
Error from server (NotFound): imagestreamtags.image.openshift.io "local-busybox:latest" not found
...
Nov 9 20:41:39.798: INFO: POD                    NODE                                       PHASE      GRACE  CONDITIONS
Nov 9 20:41:39.798: INFO: busybox1-1-deploy      ip-10-0-61-59.us-west-1.compute.internal   Succeeded         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2020-11-09 20:40:38 +0000 UTC PodCompleted } {Ready False 0001-01-01 00:00:00 +0000 UTC 2020-11-09 20:40:38 +0000 UTC PodCompleted } {ContainersReady False 0001-01-01 00:00:00 +0000 UTC 2020-11-09 20:40:38 +0000 UTC PodCompleted } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2020-11-09 20:40:38 +0000 UTC }]
Nov 9 20:41:39.798: INFO: busybox2-1-deploy      ip-10-0-78-157.us-west-1.compute.internal  Succeeded         [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2020-11-09 20:40:39 +0000 UTC PodCompleted } {Ready False 0001-01-01 00:00:00 +0000 UTC 2020-11-09 20:40:39 +0000 UTC PodCompleted } {ContainersReady False 0001-01-01 00:00:00 +0000 UTC 2020-11-09 20:40:39 +0000 UTC PodCompleted } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2020-11-09 20:40:39 +0000 UTC }]
Nov 9 20:41:39.798: INFO: local-busybox-1-build  ip-10-0-78-157.us-west-1.compute.internal  Running           [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2020-11-09 20:40:41 +0000 UTC } {Ready True 0001-01-01 00:00:00 +0000 UTC 2020-11-09 20:40:42 +0000 UTC } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 2020-11-09 20:40:42 +0000 UTC } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2020-11-09 20:40:38 +0000 UTC }]

Maybe we just need to bump a timeout or something?

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ocp-4.7-e2e-aws-proxy/1325878491849166848
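For reference, "timed out waiting for the condition" is the generic error that wait.Poll from k8s.io/apimachinery returns when a polling loop runs out of time, so the failure just means the imagestreamtag never showed up within the allotted window. A minimal Go sketch of that kind of wait follows; the helper name, interval, and timeout are assumptions for illustration, not the actual origin test code:

package main

import (
	"fmt"
	"os/exec"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// waitForImageStreamTag polls `oc get imagestreamtags <tag>` until the tag
// exists or the timeout expires. On expiry wait.Poll returns
// wait.ErrWaitTimeout, whose message is exactly the
// "timed out waiting for the condition" seen in the failed test-case.
func waitForImageStreamTag(namespace, kubeconfig, tag string, timeout time.Duration) error {
	return wait.Poll(5*time.Second, timeout, func() (bool, error) {
		cmd := exec.Command("oc",
			"--namespace="+namespace,
			"--kubeconfig="+kubeconfig,
			"get", "imagestreamtags", tag)
		// A NotFound error just means the build has not tagged the image
		// yet, so keep polling instead of failing immediately.
		return cmd.Run() == nil, nil
	})
}

func main() {
	err := waitForImageStreamTag("e2e-test-oc-debug-754r9", "/tmp/configfile", "local-busybox:latest", time.Minute)
	if err != nil {
		fmt.Println(err) // prints: timed out waiting for the condition
	}
}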
Somewhat similar, but without the running pod:

Error from server (NotFound): imagestreamtags.image.openshift.io "wildfly:latest" not found

out of:

[sig-cli] oc debug ensure it works with image streams

Maybe related to Docker throttling (bug 1895107), and we are just missing clear ImageStreamTag reporting in some test suite handler?
I'm adding UpcomingSprint, because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.
Both of the PRs attached to this bz are merged (side note: I don't know why 25803 still says open; it was merged 3 days ago).

BUT, I see '[sig-cli] oc debug ensure it works with image streams' still failing a lot, at least in:

https://testgrid.k8s.io/redhat-openshift-ocp-release-4.7-informing#periodic-ci-openshift-release-master-ocp-4.7-e2e-vsphere-upi

When it does fail, it will reject our release payloads, like this:

https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com/releasestream/4.7.0-0.nightly/release/4.7.0-0.nightly-2021-01-22-195722

search: https://search.ci.openshift.org/?search=oc+debug+ensure+it+works+with+image+streams&maxAge=48h&context=1&type=junit&name=4.7&maxMatches=5&maxBytes=20971520&groupBy=job

I was paying attention to the 4.7 release and came across this today. Do we need a new bz, or do we need more work done on this one?
Checked the logs; this case is still failing a lot after all the PRs merged:

https://search.ci.openshift.org/?search=oc+debug+deployment+configs+from+a+build&maxAge=48h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job

The following are with the timeout error:

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-ovn-upgrade-4.6-stable-to-4.7-ci/1352749515207086080
https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_oc/713/pull-ci-openshift-oc-master-e2e-aws/1353149155815460864
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-shared-vpc-4.8/1353457507455995904
https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/15187/rehearse-15187-pull-ci-openshift-cluster-network-operator-master-e2e-aws-sdn-multi/1353315696192262144
https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_oc/693/pull-ci-openshift-oc-master-e2e-aws/1353155175140823040
(In reply to jamo luhrsen from comment #9)
> Both of the PRs attached to this bz are merged (side note: I don't know why
> 25803 still says open; it was merged 3 days ago).
>
> BUT, I see '[sig-cli] oc debug ensure it works with image streams' still
> failing a lot, at least in:

This BZ is not related to that test; please don't mix cases.
(In reply to zhou ying from comment #10)
> Checked the logs; this case is still failing a lot after all the PRs merged:
>
> https://search.ci.openshift.org/?search=oc+debug+deployment+configs+from+a+build&maxAge=48h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job
>
> The following are with the timeout error:
> https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-ovn-upgrade-4.6-stable-to-4.7-ci/1352749515207086080
> https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_oc/713/pull-ci-openshift-oc-master-e2e-aws/1353149155815460864
> https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-shared-vpc-4.8/1353457507455995904
> https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/15187/rehearse-15187-pull-ci-openshift-cluster-network-operator-master-e2e-aws-sdn-multi/1353315696192262144
> https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_oc/693/pull-ci-openshift-oc-master-e2e-aws/1353155175140823040

All of the above failures are caused by the build not completing; let me bump that particular timeout to 10 minutes, like it is in the other tests.
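Purely as an illustration of the shape of that change (the constants and call site below are assumptions, not the actual debug.go diff), bumping the timeout only widens the window the polling loop may run; the readiness check itself stays the same:

package main

import (
	"fmt"
	"os/exec"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

const (
	pollInterval = 5 * time.Second
	// Hypothetical values: the earlier window was on the order of a minute
	// (see the timestamps in comment #0); 10 minutes gives slow CI
	// environments (proxy, vSphere, throttled registries) room for the build
	// to finish and tag local-busybox:latest.
	buildTimeout = 10 * time.Minute
)

func main() {
	err := wait.Poll(pollInterval, buildTimeout, func() (bool, error) {
		// Same check as before: the tag exists once the build has pushed.
		return exec.Command("oc", "get", "imagestreamtags", "local-busybox:latest").Run() == nil, nil
	})
	if err != nil {
		fmt.Printf("build did not produce local-busybox:latest within %v: %v\n", buildTimeout, err)
	}
}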
(In reply to Maciej Szulik from comment #11)
> (In reply to jamo luhrsen from comment #9)
> > Both of the PRs attached to this bz are merged (side note: I don't know
> > why 25803 still says open; it was merged 3 days ago).
> >
> > BUT, I see '[sig-cli] oc debug ensure it works with image streams' still
> > failing a lot, at least in:
>
> This BZ is not related to that test; please don't mix cases.

I found this bz by searching, and it matched what @wking noted in comment #1, so I was thinking it could all be the same root cause. If you think it's different, I can file a new bz.

But, looking at this failed job:

https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ocp-4.7-e2e-vsphere-upi/1353683727850409984

and the stdout from that failed test-case, I see errors similar to what is being discussed in this bz:

Jan 25 13:28:15.963: INFO: Running 'oc --namespace=e2e-test-oc-debug-lvbdc --kubeconfig=/tmp/configfile556966883 get imagestreamtags wildfly:latest'
Jan 25 13:28:16.088: INFO: Error running /usr/bin/oc --namespace=e2e-test-oc-debug-lvbdc --kubeconfig=/tmp/configfile556966883 get imagestreamtags wildfly:latest:
StdOut>
Error from server (NotFound): imagestreamtags.image.openshift.io "wildfly:latest" not found
StdErr>
Error from server (NotFound): imagestreamtags.image.openshift.io "wildfly:latest" not found
Error from server (NotFound): imagestreamtags.image.openshift.io "wildfly:latest" not found
[AfterEach] [sig-cli] oc debug
  github.com/openshift/origin/test/extended/util/client.go:138
STEP: Collecting events from namespace "e2e-test-oc-debug-lvbdc".
STEP: Found 1 events.
Jan 25 13:28:16.096: INFO: At 2021-01-25 13:26:58 +0000 UTC - event for e2e-test-oc-debug-lvbdc: {namespace-security-allocation-controller } CreatedSCCRanges: created SCC ranges
Jan 25 13:28:16.101: INFO: POD  NODE  PHASE  GRACE  CONDITIONS
Jan 25 13:28:16.101: INFO:
Jan 25 13:28:16.110: INFO: skipping dumping cluster info - cluster too large
Jan 25 13:28:16.142: INFO: Deleted {user.openshift.io/v1, Resource=users  e2e-test-oc-debug-lvbdc-user}, err: <nil>
Jan 25 13:28:16.162: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthclients  e2e-client-e2e-test-oc-debug-lvbdc}, err: <nil>
Jan 25 13:28:16.182: INFO: Deleted {oauth.openshift.io/v1, Resource=oauthaccesstokens  POvSR7LhT5WwUW5UGlwwJAAAAAAAAAAA}, err: <nil>
[AfterEach] [sig-cli] oc debug
  github.com/openshift/origin/test/extended/util/client.go:139
Jan 25 13:28:16.182: INFO: Waiting up to 7m0s for all (but 100) nodes to be ready
STEP: Destroying namespace "e2e-test-oc-debug-lvbdc" for this suite.
Jan 25 13:28:16.229: INFO: Running AfterSuite actions on all nodes
Jan 25 13:28:16.229: INFO: Running AfterSuite actions on node 1
fail [github.com/openshift/origin/test/extended/cli/debug.go:188]: Unexpected error:
    <*errors.errorString | 0xc0002969a0>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
occurred

I see that the PR to bump the timeout to 10m was merged 4h ago. I'll check back on this later and see if that helped or not. If you think I need an entirely new bz, let me know and I'll file it asap.
> and the stdout from that failed test-case, I see errors similar to what is
> being discussed in this bz:

The two cases are different: at line 188 in debug.go the test awaits an image import, and that's the problem you're seeing. It is connected with registry limits, probably docker's if that hasn't changed yet. See https://github.com/openshift/origin/pull/25746, where I'm fixing parts of that problem.

This particular case is also waiting for an image stream to appear, which is confusing, but this time the input for the IS is provided by the build that is happening. So even though both cases wait for an image stream to appear, the source of the image is different, and that makes this issue different.
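To make the distinction concrete, here is a purely hypothetical Go sketch (the image names and oc invocations are illustrative, not the actual test fixtures): both tests end up waiting for an ImageStreamTag, but the producers feeding that tag differ, which is why the root causes differ.

package main

import "os/exec"

// In "oc debug ensure it works with image streams" the tag is fed by an image
// *import* from an external registry, so registry rate limits (e.g. docker.io
// throttling) can delay or break it.
func setupViaImport(ns string) error {
	// hypothetical source image, for illustration only
	return exec.Command("oc", "-n", ns, "import-image", "wildfly:latest",
		"--from=quay.io/example/wildfly:latest", "--confirm").Run()
}

// In "oc debug deployment configs from a build" the tag is fed by a *build*
// running inside the cluster, so the question is whether the build pod
// completes within the wait window.
func setupViaBuild(ns string) error {
	return exec.Command("oc", "-n", ns, "new-build", "--binary",
		"--name=local-busybox").Run()
}

func main() {
	// Both paths are followed by the same kind of wait for the ImageStreamTag
	// to appear, which is why the failures look alike in the test output even
	// though the underlying causes are different.
	_ = setupViaImport
	_ = setupViaBuild
}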
(In reply to Maciej Szulik from comment #15)
> > and the stdout from that failed test-case, I see errors similar to what is
> > being discussed in this bz:
>
> The two cases are different: at line 188 in debug.go the test awaits an
> image import, and that's the problem you're seeing. It is connected with
> registry limits, probably docker's if that hasn't changed yet. See
> https://github.com/openshift/origin/pull/25746, where I'm fixing parts of
> that problem.
>
> This particular case is also waiting for an image stream to appear, which
> is confusing, but this time the input for the IS is provided by the build
> that is happening. So even though both cases wait for an image stream to
> appear, the source of the image is different, and that makes this issue
> different.

Thank you for that explanation. It makes sense to me :) I'll check back on the failure I was noting once your PR to help with the docker.io issues is merged. I know we've faced those in many places this entire 4.7 release. Thanks again.
> Thank you for that explanation. It makes sense to me :)

Sure, I'm happy to help :-)

> I'll check back on the failure I was noting once your PR to help with the
> docker.io issues is merged. I know we've faced those in many places this
> entire 4.7 release. Thanks again.

Yeah, feel free to open a new one if you see this happening.
Zhou Ying, I've checked the failures and:

1. some are from 4.6, which doesn't contain the fix
2. others are from okd, which iirc is based on 4.6, so see 1.
3. the remaining failures are coming from the build part of the process

For 3, I'll check the queue in a few more days and create a bug against builds to verify why the builds are not completing even though we are waiting an extended period of time. For now I'd move this to verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633