Bug 2084361

Summary: ci jobs exit 2 for an unknown reason
Product: OpenShift Container Platform
Reporter: jamo luhrsen <jluhrsen>
Component: Test Framework
Assignee: OpenShift Release Oversight <openshift-release-oversight>
Status: CLOSED WORKSFORME
Severity: medium
Priority: medium
Version: 4.11
CC: dgoodwin, openshift-release-oversight, stbenjam
Target Milestone: ---   
Target Release: 4.11.0   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2022-11-21 19:44:26 UTC
Type: Bug

Description jamo luhrsen 2022-05-12 00:05:30 UTC
Some jobs, like these two [0][1], are reported as failures, but there is no
clear reason for the failure other than that the test container exited with
status 2.

There is a brief Slack conversation about this here [2].

A quick search on search.ci shows that it happened in 15 of the 332 failed
jobs from the last 12 hours:

    curl -s 'https://search.ci.openshift.org/search?maxAge=12h&type=build-log&context=1&search=Step.*openshift-e2e-test+failed' | jq -r 'to_entries[].value | to_entries[].value[].context[]' | grep 'exit status' | sort | uniq -c
    317 error: failed to execute wrapped command: exit status 1
     15 error: failed to execute wrapped command: exit status 2
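
The tally above can also be expressed as shares of the total. This is a minimal sketch, not part of the original report: `exit_status_share` is a hypothetical helper name, and it assumes the matching context lines arrive on stdin, for example from the curl and jq stages of the command above.

```shell
# Hypothetical helper (name is illustrative): reads build-log context lines
# on stdin and prints each distinct "exit status N" with its count and its
# share of all matched lines.
exit_status_share() {
  grep -o 'exit status [0-9][0-9]*' |
    sort |
    uniq -c |
    awk '{ count[$4] = $1; total += $1 }
         END {
           for (code in count)
             printf "exit status %s: %d of %d (%.1f%%)\n", code, count[code], total, 100 * count[code] / total
         }' |
    sort
}
```

For the counts above, 15 of 332 works out to roughly 4.5% of failed jobs.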

The test container failure message does not help:

    {"component":"entrypoint","error":"wrapped process failed: exit status 2","file":"k8s.io/test-infra/prow/entrypoint/run.go:80","func":"k8s.io/test-infra/prow/entrypoint.Options.Run","level":"error","msg":"Error executing test process","severity":"error","time":"2022-05-11T08:16:30Z"}
    error: failed to execute wrapped command: exit status 2


I cannot find any other clues in the job artifacts. All the tests and steps
appear to be OK in these jobs, so the jobs should be marked as passing.

[0] https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-upgrade-from-stable-4.10-e2e-aws-ovn-upgrade/1524255508167397376
[1] https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-upgrade-from-stable-4.10-e2e-aws-ovn-upgrade/1524255509832536064
[2] https://coreos.slack.com/archives/C01CQA76KMX/p1652304143240249

Comment 2 jamo luhrsen 2022-05-18 15:02:54 UTC
Moving back to ASSIGNED, because the PR associated with this BZ was only for
debugging purposes. I haven't come across another example of this since the
debug PR went in. When I do, I'll reply here with what I find.

Comment 4 Devan Goodwin 2022-11-21 19:44:26 UTC
No more exit status 2 in the results using the command above. Closing.