Bug 1815941

Summary: e2e tests use substantially more CPU in CI infrastructure than necessary
Product: OpenShift Container Platform Reporter: Clayton Coleman <ccoleman>
Component: Test Infrastructure Assignee: Clayton Coleman <ccoleman>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.5 CC: skuznets
Target Milestone: ---   
Target Release: 4.5.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1815943 Environment:
Last Closed: 2020-05-15 00:28:30 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1815943, 1815944    

Description Clayton Coleman 2020-03-22 22:28:33 UTC
A performance issue was identified in Ginkgo where every openshift-tests invocation triggered a full stack trace capture for every test, every time a test ran. We added a carry patch to our forked Ginkgo (which is mostly static) to disable capturing this trace, which reduced the total CPU used by a 4.5 e2e run from ~10.5k core seconds to ~4.5k core seconds (roughly a 57% reduction in total CPU). Because e2e is one of our more significant loads, this significantly reduces the CI resources consumed.
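
For illustration only, here is a minimal Go sketch of why an unconditional stack capture is expensive. It assumes the cost comes from calling runtime/debug.Stack() each time a test's code location is recorded. CodeLocation, NewCodeLocation, and the captureStack flag are hypothetical names chosen for this sketch; they are not taken from the carry patch, and the timings it prints will vary by machine.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
	"time"
)

// CodeLocation is a hypothetical stand-in for a per-test location record.
type CodeLocation struct {
	FileName       string
	LineNumber     int
	FullStackTrace string
}

// NewCodeLocation records the caller's file and line. The comparatively
// expensive full stack trace is captured only when captureStack is true.
func NewCodeLocation(skip int, captureStack bool) CodeLocation {
	_, file, line, _ := runtime.Caller(skip + 1)
	loc := CodeLocation{FileName: file, LineNumber: line}
	if captureStack {
		// debug.Stack formats the whole goroutine stack into a byte slice.
		loc.FullStackTrace = string(debug.Stack())
	}
	return loc
}

// timeLocations measures how long n location captures take.
func timeLocations(n int, captureStack bool) time.Duration {
	start := time.Now()
	for i := 0; i < n; i++ {
		_ = NewCodeLocation(0, captureStack)
	}
	return time.Since(start)
}

func main() {
	const n = 10000
	fmt.Println("with stack trace:   ", timeLocations(n, true))
	fmt.Println("without stack trace:", timeLocations(n, false))
}
```

Skipping the capture keeps the file and line available for reporting while avoiding the per-test formatting work, which is the trade-off described above.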

The fix is safe to use within our limited use of Ginkgo and has already been applied to 4.5 and backported to 4.4 with a vendor bump. We should vendor bump in 4.3 as well, since 4.3 accounts for a significant number of installs.

Comment 1 Clayton Coleman 2020-03-22 22:31:48 UTC
Ginkgo patch in https://github.com/openshift/onsi-ginkgo/commit/67da0dd32db383f566d9d2b6559bf1d0ef03b5cb

Vendor bump in 4.5: https://github.com/openshift/origin/pull/24742 
Vendor bump in 4.4: https://github.com/openshift/origin/pull/24743
Vendor bump in 4.3: https://github.com/openshift/origin/pull/24746

Verified in 4.5 and 4.4 builds. Graph of pod core seconds from api.ci (the tall job running at the same time is a 4.3 build):

https://www.dropbox.com/s/jrpw3bsxclqcb64/Screenshot%202020-03-22%2018.30.38.png?raw=1

Comment 3 Steve Kuznetsov 2020-05-15 00:28:30 UTC
This merged but is not something QE verifies.