periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-proxy is permafailing (the 4.9 job doesn't look great, either) and is failing frequently in CI, see: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.10-informing#periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-proxy

Example job failure: https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-proxy/1440227332513075200

They mostly seem to be failing on pulling images:

fail [github.com/openshift/origin/test/extended/tbr_health/check.go:18]: Expected
    <string>: Failed to import expected imagestreams, latest error status: ImageStream Error: &errors.StatusError{ErrStatus:v1.Status{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ListMeta:v1.ListMeta{SelfLink:"", ResourceVersion:"", Continue:"", RemainingItemCount:(*int64)(nil)}, Status:"Failure", Message:"imagestreams.image.openshift.io \"ruby\" not found", Reason:"NotFound", Details:(*v1.StatusDetails)(0xc0019e2ea0), Code:404}}

Looking into the cluster-samples-operator logs I see things like this:

2021-09-21T08:41:03.969380858Z time="2021-09-21T08:41:03Z" level=error msg="unable to sync: config.samples.operator.openshift.io \"cluster\" not found, requeuing"
2021-09-21T08:41:08.426343198Z time="2021-09-21T08:41:08Z" level=info msg="test connection with timeout failed with dial tcp 104.100.22.132:443: i/o timeout"
2021-09-21T08:41:17.534254816Z time="2021-09-21T08:41:17Z" level=info msg="Received watch event imagestream/driver-toolkit but not upserting since deletion of the Config is in progress"
2021-09-21T08:41:28.425834834Z time="2021-09-21T08:41:28Z" level=info msg="test connection with timeout failed with dial tcp 104.119.21.151:443: i/o timeout"
2021-09-21T08:41:48.426345247Z time="2021-09-21T08:41:48Z" level=info msg="test connection with timeout failed with dial tcp 104.119.21.151:443: i/o timeout"
2021-09-21T08:42:08.425404137Z time="2021-09-21T08:42:08Z" level=info msg="test connection with timeout failed with dial tcp 104.119.21.151:443: i/o timeout"
2021-09-21T08:42:22.293899280Z W0921 08:42:22.293835 8 reflector.go:441] github.com/openshift/client-go/template/informers/externalversions/factory.go:101: watch of *v1.Template ended with: an error on the server ("unable to decode an event from the watch stream: stream error: stream ID 25; INTERNAL_ERROR") has prevented the request from succeeding
2021-09-21T08:42:25.890437860Z time="2021-09-21T08:42:25Z" level=error msg="unable to sync: config.samples.operator.openshift.io \"cluster\" not found, requeuing"
2021-09-21T08:42:28.426772151Z time="2021-09-21T08:42:28Z" level=info msg="test connection with timeout failed with dial tcp 104.100.22.132:443: i/o timeout"
2021-09-21T08:42:43.428406867Z time="2021-09-21T08:42:43Z" level=info msg="test connection with timeout failed with dial tcp 104.100.22.132:443: i/o timeout"
2021-09-21T08:42:43.428483787Z time="2021-09-21T08:42:43Z" level=info msg="unable to establish HTTPS connection to registry.redhat.io after 3 minutes, bootstrap to Removed"

I don't see any proxy configuration in the cluster-samples-operator deployment.
Yeah, my https://github.com/openshift/cluster-samples-operator/pull/394 / bz 2002368 broke proxy support. I'll take this one. The fix is pretty straightforward, but I'm also going to want to add e2e-aws-proxy as an option in the samples operator repo to vet the fix.
UPDATE: adding e2e-aws-proxy to the samples repo proved interesting, in that the rehearsal job passes without my fix to samples. I had copied the e2e-aws-proxy job definition from openshift/builder, so all this makes me wonder whether the build API team has been setting up e2e-aws-proxy correctly. I'm going to compare the setup for periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-proxy with our PR tests and see if I can find the difference.

While the fix is pretty straightforward, I'd still rather vet it with the equivalent run of periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-proxy before it merges. I've copied David (now owner of samples) and Adam (build API team lead) for awareness.
I've compared our e2e-aws-proxy job configs with https://raw.githubusercontent.com/openshift/release/master/ci-operator/jobs/openshift/release/openshift-release-master-periodics.yaml and see no relevant diffs. I've asked the DPTP help desk over in #forum-testplatform. In the interim, I'm going to move forward with my openshift/release PR, and create some dummy/test PRs in openshift/builder and openshift/cluster-samples-operator, separate from my fix PR, to cross-reference and vet.
Petr Muller responded .... hopefully we can fix our e2e-aws-proxy definition.
I'll inspect the must-gather of the next few periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-proxy runs later today / tomorrow and confirm the samples operator is not an issue. The key element will be that samples does not bootstrap as Removed (which is what was occurring before) but stays at the default Managed; bootstrapping as Removed led to a bunch of sig-builds tests failing because the language sample imagestreams they depended on did not exist.
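To confirm from a must-gather (or a live cluster) that samples stayed Managed, one can read spec.managementState off the `cluster` Config object. A minimal sketch, assuming a trimmed JSON dump of that object; the sample document and helper name below are hypothetical:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical, trimmed dump of
// `oc get config.samples.operator.openshift.io cluster -o json`.
const sample = `{
  "apiVersion": "samples.operator.openshift.io/v1",
  "kind": "Config",
  "metadata": {"name": "cluster"},
  "spec": {"managementState": "Managed"}
}`

// bootstrapState extracts spec.managementState from a Config JSON dump.
func bootstrapState(doc string) (string, error) {
	var cfg struct {
		Spec struct {
			ManagementState string `json:"managementState"`
		} `json:"spec"`
	}
	if err := json.Unmarshal([]byte(doc), &cfg); err != nil {
		return "", err
	}
	return cfg.Spec.ManagementState, nil
}

func main() {
	state, err := bootstrapState(sample)
	if err != nil {
		panic(err)
	}
	// A healthy proxied cluster should report "Managed" here;
	// the regression caused samples to bootstrap to "Removed".
	fmt.Println(state)
}
```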
Looking at https://prow.ci.openshift.org/job-history/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-proxy, the job https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-proxy/1442546511056474112 *MIGHT* have our fix here. But rather than sorting out the commit levels now, I'll let it finish, inspect it, and we'll go from there.
Yeah, https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-proxy/1442546511056474112 was missing the fix. Waiting for the next invocation.
And we have green e2e's:

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-proxy/1442674651170869248
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-e2e-aws-proxy/1442770312683851776

Verification for this will be handled via the verification for https://bugzilla.redhat.com/show_bug.cgi?id=2002368; this bug addressed a regression introduced with that 4.10-only change. Marking verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056