Bug 1707210
| Summary: | no logs gathered for failed upgrade job | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Ben Parees <bparees> |
| Component: | Test Infrastructure | Assignee: | Steve Kuznetsov <skuznets> |
| Status: | CLOSED ERRATA | QA Contact: | |
| Severity: | low | Docs Contact: | |
| Priority: | low | | |
| Version: | 4.1.0 | CC: | mfojtik, nmoraiti, pmuller, sponnaga, vlaad |
| Target Milestone: | --- | | |
| Target Release: | 4.1.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-06-04 10:48:31 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Ben Parees
2019-05-07 04:13:31 UTC
Similar issues in e2e-aws-serial: https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/22774/pull-ci-openshift-origin-master-e2e-aws-serial/5875

Seen here: https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-kube-controller-manager-operator/250/pull-ci-openshift-cluster-kube-controller-manager-operator-master-e2e-aws-upgrade/48/artifacts/

Raising severity to urgent.

Do successful runs of the same job (or at least runs that do not hit the timeout) save any useful artifact? If so, can you give me a pointer to this artifact? I assume this is something that needs to be fixed in the appropriate test template: it looks like artifacts are collected even on timeout, but the useful one is not placed in the collected location...

Yes, successful runs appear to have logs gathered:
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/batch/pull-ci-openshift-origin-master-e2e-aws-serial/5900
https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/pr-logs/pull/batch/pull-ci-openshift-origin-master-e2e-aws-serial/5900/artifacts/e2e-aws-serial/

https://github.com/openshift/ci-operator-prowgen/pull/157
https://github.com/openshift/release/pull/3706

The jobs are being updated with the latest Prow bump. Please merge it ASAP tonight before we make the final build.

I am lowering the priority so it will not block code freeze.

The issue should be fixed by now. We will continue monitoring the jobs to make sure that we do not hit that error again.

What was the actual issue?

At every step, Prow gives jobs a (configurable) grace period when they are asked to terminate. It is up to the job to trap that and do something. The issue was an internal default overriding the configured default for timeout and grace period.

Not sure this one makes sense to send to QA.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2019:0758
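
For readers unfamiliar with the "trap that and do something" pattern mentioned in the thread, the following is a minimal sketch of a job that gathers its logs when asked to terminate. It is not the actual test template or the fix from the linked pull requests. It assumes the termination request arrives as SIGTERM on a POSIX system (the thread does not name the signal), and `run_job`, `gather_artifacts`, `simulate_termination`, and the `*.log` layout are hypothetical names chosen for illustration.

```python
import os
import shutil
import signal
import time
from pathlib import Path


class TerminationRequested(Exception):
    """Raised in the main thread when the termination signal arrives."""


def _on_sigterm(signum, frame):
    # Turn the signal into an exception so the running test unwinds.
    raise TerminationRequested()


def gather_artifacts(work_dir, artifact_dir):
    """Copy every *.log file from work_dir into artifact_dir (hypothetical layout)."""
    artifact_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for log in sorted(work_dir.glob("*.log")):
        shutil.copy2(log, artifact_dir / log.name)
        copied.append(log.name)
    return copied


def run_job(test_fn, work_dir, artifact_dir):
    """Run test_fn; gather logs whether it finishes or is asked to terminate."""
    previous = signal.signal(signal.SIGTERM, _on_sigterm)
    outcome = "finished"
    try:
        test_fn()
    except TerminationRequested:
        outcome = "terminated"
    finally:
        # Ignore repeat signals so gathering is not interrupted. Gathering
        # still has to complete before the grace period runs out.
        signal.signal(signal.SIGTERM, signal.SIG_IGN)
        copied = gather_artifacts(work_dir, artifact_dir)
        signal.signal(signal.SIGTERM, previous)
    return outcome, copied


def simulate_termination():
    """Stand-in for a hung test: send SIGTERM to ourselves, then block."""
    os.kill(os.getpid(), signal.SIGTERM)
    time.sleep(30)
```

Gathering happens in the `finally` block so that the timeout path and the success path write to the same collected location. This only helps if the grace period actually granted to the job is long enough for the copy to finish, which is why a default that overrides the configured timeout and grace period can leave a job with no logs.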