Bug 1827863
| Summary: | [Feature:Platform][Smoke] Managed cluster should ensure pods use downstream images from our release image with proper ImagePullPolicy [Suite:openshift/conformance/parallel] timing out | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Miciah Dashiel Butler Masters <mmasters> | |
| Component: | Cloud Compute | Assignee: | Alberto <agarcial> | |
| Cloud Compute sub component: | Other Providers | QA Contact: | Jianwei Hou <jhou> | |
| Status: | CLOSED INSUFFICIENT_DATA | Docs Contact: | ||
| Severity: | medium | |||
| Priority: | unspecified | CC: | agarcial, aos-bugs, jokerman, kgarriso, mfojtik, obulatov, ssoto, xtian | |
| Version: | 4.2.z | |||
| Target Milestone: | --- | |||
| Target Release: | 4.6.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1874524 (view as bug list) | Environment: | ||
| Last Closed: | 2020-07-01 10:07:58 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1874524 | |||
Apr 24 12:13:42.904: INFO: Running 'oc --namespace=openshift-console --config=/tmp/admin.kubeconfig exec downloads-6fcb9d8c68-dpjsw -c download-server -- cat /etc/redhat-release'
Apr 24 12:15:58.357: INFO: Image relase info:Red Hat Enterprise Linux Server release 7.8 (Maipo)
Apr 24 12:15:58.357: INFO: Running 'oc --namespace=openshift-controller-manager-operator --config=/tmp/admin.kubeconfig exec openshift-controller-manager-operator-9495b7dbf-8vnvr -c operator -- cat /etc/redhat-release'
Apr 24 12:15:59.049: INFO: Image relase info:Red Hat Enterprise Linux Server release 7.8 (Maipo)
...
Apr 24 12:21:15.331: INFO: Running 'oc --namespace=openshift-etcd --config=/tmp/admin.kubeconfig exec etcd-member-control-plane-2 -c etcd-member -- cat /etc/redhat-release'
Apr 24 12:23:30.999: INFO: Image relase info:Red Hat Enterprise Linux Server release 7.8 (Maipo)

Sometimes `oc exec` takes considerable time (2 minutes for etcd-member-control-plane-2) to run `cat`. I don't know what may cause such delays, so I'm moving this to CLI.

It looks like the problem is not with any component per se, but rather the overall timeout on the test. I'm moving this to the cloud team, since it looks like the lags are caused by the vSphere installation.

*** Bug 1813967 has been marked as a duplicate of this bug. ***

The vSphere CI/dev environment has very limited resources, which might be affecting the timing. I'm assigning this to be tracked by the team who owns this test. I'd suggest increasing the timeout and revisiting it when vSphere CI is migrated to a new environment with more capacity:
https://docs.google.com/document/d/1f26SLA_nYpKopYUJ5_YpAKRtxAXs6x1l-sYz359utgE/edit?ts=5ea70e3a
https://github.com/openshift/origin/blob/7c3ca66a9dfce672a21172425856598e2d1a9916/cmd/openshift-tests/e2e.go#L92
https://github.com/openshift/origin/blob/5c167724f4a2c63064acf19c90e0445ad384f5d8/pkg/test/ginkgo/cmd_runsuite.go#L181

Setting the timeout to 20 minutes: https://github.com/openshift/origin/pull/24968

Hey Oleg, this seemed natural to me based on the issue. Can you think of a better component home? Please feel free to just move it back to the cloud team otherwise.

This is a sig-arch test. It covers the platform as a whole, so I don't know a better component.

I'm closing this as I don't see this timeout happening:
https://search.apps.build01.ci.devcluster.openshift.com/?search=Managed+cluster+should+ensure+pods+use+downstream+images+from+our+release+image+with+proper+ImagePullPolicy&maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job
I'll reopen if this is identified again.
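
The `oc exec` delays in the log excerpt quoted in the comments above can be quantified by subtracting each `Running` timestamp from the `Image relase info` timestamp that follows it. The following is a minimal Python sketch using only the timestamps quoted in this bug. The log lines omit the year, so 2020 is assumed, and the helper names (`parse`, `exec_delay_seconds`) are illustrative only.

```python
from datetime import datetime

# (pod, "Running ..." timestamp, "Image relase info" timestamp), copied from
# the log excerpt in the comments. The year is absent from the log; 2020 is
# an assumption made only so the timestamps can be parsed.
SAMPLES = [
    ("downloads-6fcb9d8c68-dpjsw", "Apr 24 12:13:42.904", "Apr 24 12:15:58.357"),
    ("openshift-controller-manager-operator-9495b7dbf-8vnvr",
     "Apr 24 12:15:58.357", "Apr 24 12:15:59.049"),
    ("etcd-member-control-plane-2", "Apr 24 12:21:15.331", "Apr 24 12:23:30.999"),
]


def parse(ts):
    """Parse a log timestamp such as 'Apr 24 12:13:42.904' (assumed year 2020)."""
    return datetime.strptime("2020 " + ts, "%Y %b %d %H:%M:%S.%f")


def exec_delay_seconds(start, end):
    """Seconds elapsed between starting `oc exec` and logging its output."""
    return (parse(end) - parse(start)).total_seconds()


if __name__ == "__main__":
    for pod, start, end in SAMPLES:
        print(f"{pod}: {exec_delay_seconds(start, end):.3f}s")
```

Two of the three quoted calls take about 2m15s each while the third takes under a second, so seven calls as slow as the former would by themselves exceed a 15-minute test timeout.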
Description of problem:

The "Managed cluster should ensure pods use downstream images from our release image with proper ImagePullPolicy" test is failing frequently with what appears to be a timeout. I am seeing the following:

started: (0/3/2144) "[Feature:Platform][Smoke] Managed cluster should ensure pods use downstream images from our release image with proper ImagePullPolicy [Suite:openshift/conformance/parallel]"

Eventually followed by the following:

---------------------------------------------------------
Received interrupt. Running AfterSuite...
^C again to terminate immediately
Apr 24 12:23:53.994: INFO: Running AfterSuite actions on all nodes
Apr 24 12:23:53.994: INFO: Waiting up to 3m0s for all (but 100) nodes to be ready
Apr 24 12:23:54.089: INFO: Running AfterSuite actions on node 1

failed: (15m0s) 2020-04-24T12:23:54 "[Feature:Platform][Smoke] Managed cluster should ensure pods use downstream images from our release image with proper ImagePullPolicy [Suite:openshift/conformance/parallel]"

It looks like the test is timing out after 15 minutes, and the test runner is sending a signal to terminate the test. I am seeing especially many failures that follow this pattern on e2e-vsphere-upi. For example:

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-vsphere-upi-4.2/623
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-vsphere-upi-4.2/618
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-vsphere-upi-4.2/617

I did see some failures following this pattern for a few other platforms:

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_release/8271/rehearse-8271-release-openshift-ocp-installer-e2e-openstack-ppc64le-4.3/57
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-4.3/1817

Using search.svc, I found many more failures of the same test, but often those failures result from missing image or i/o timeout errors whereas the above failures appear to be the test timing out and being terminated by the test runner.

https://search.svc.ci.openshift.org/?search=Managed%20cluster%20should%20ensure%20pods%20use%20downstream%20images%20from%20our%20release%20image%20with%20proper%20ImagePullPolicy
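
The failure mode described above, where a runner enforces a per-test deadline and interrupts the test once it expires, can be illustrated with a short sketch. This is a hypothetical Python example for a POSIX system, not the openshift-tests implementation; `run_with_timeout`, `grace_s`, and the returned status strings are names invented for illustration.

```python
import signal
import subprocess
import sys


def run_with_timeout(cmd, timeout_s, grace_s=5.0):
    """Run cmd as a child process with a deadline (hypothetical helper).

    If the child is still running after timeout_s seconds, send it SIGINT
    (the equivalent of ^C) so it can run its cleanup, then kill it if it
    has not exited within grace_s seconds.
    """
    proc = subprocess.Popen(cmd)
    try:
        proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.send_signal(signal.SIGINT)  # ask the test to stop and clean up
        try:
            proc.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            proc.kill()  # cleanup took too long; terminate immediately
            proc.wait()
        return "timed out"
    return "passed" if proc.returncode == 0 else "failed"


if __name__ == "__main__":
    # A stand-in "test" that sleeps longer than its one-second deadline.
    slow_test = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_timeout(slow_test, timeout_s=1.0))
```

Under this model, a test that is merely slow is reported the same way as one that hangs, which is why raising the deadline changes the result without any change to the test itself.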