Bug 1815943

Summary: e2e tests use substantially more CPU in CI infrastructure than necessary
Product: OpenShift Container Platform
Reporter: Clayton Coleman <ccoleman>
Component: Test Infrastructure
Assignee: Clayton Coleman <ccoleman>
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Version: 4.5
CC: skuznets
Target Milestone: ---   
Target Release: 4.4.0   
Hardware: Unspecified   
OS: Unspecified   
Clone Of: 1815941
Last Closed: 2020-05-13 22:01:15 UTC
Bug Depends On: 1815941    
Bug Blocks: 1815944    

Description Clayton Coleman 2020-03-22 22:32:26 UTC
+++ This bug was initially created as a clone of Bug #1815941 +++

A performance issue was identified in Ginkgo: every openshift-tests invocation triggered a full stack trace capture for every test, every time a test ran. We added a carry patch to our forked Ginkgo (which is mostly static) that disables capturing this trace. This reduced the total CPU used by a 4.5 e2e run from ~10.5k core seconds to ~4.5k core seconds (roughly a 60% reduction in total CPU). Because e2e is one of our more significant loads, this significantly reduces the CI resources consumed.

The fix is safe within our limited use of Ginkgo and has already been backported to 4.5 and 4.4 with a vendor bump. We should vendor bump 4.3 as well, since it accounts for a significant number of installs.

--- Additional comment from Clayton Coleman on 2020-03-22 18:31:48 EDT ---

Ginkgo patch in https://github.com/openshift/onsi-ginkgo/commit/67da0dd32db383f566d9d2b6559bf1d0ef03b5cb
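
To illustrate why skipping the trace helps, here is a minimal, self-contained sketch. It is not the code from the commit above and does not use Ginkgo's types: the location struct, newLocation, timeRegistrations and the test count are all hypothetical. It only assumes that the expensive part is formatting a full stack trace (here via the standard library's debug.Stack) once per registered test, and compares that against recording just the caller's file and line.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
	"time"
)

// location is a hypothetical stand-in for the per-test code location a test
// framework might record. It is not Ginkgo's actual type.
type location struct {
	File  string
	Line  int
	Stack string // full stack trace; empty when capture is disabled
}

// newLocation records the caller's file and line. When captureStack is true
// it also formats the full goroutine stack, which is the costly step.
func newLocation(captureStack bool) location {
	_, file, line, _ := runtime.Caller(1)
	loc := location{File: file, Line: line}
	if captureStack {
		loc.Stack = string(debug.Stack())
	}
	return loc
}

// timeRegistrations measures how long it takes to build n location records.
func timeRegistrations(n int, captureStack bool) time.Duration {
	start := time.Now()
	for i := 0; i < n; i++ {
		_ = newLocation(captureStack)
	}
	return time.Since(start)
}

func main() {
	const n = 10000 // hypothetical number of registered tests
	fmt.Println("with stack trace:   ", timeRegistrations(n, true))
	fmt.Println("without stack trace:", timeRegistrations(n, false))
}
```

The absolute timings depend on the machine and on stack depth, so treat the output as a rough comparison only. The point is that the per-test cost is paid on every invocation, so it multiplies across every test process in an e2e run.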

Vendor bump in 4.5: https://github.com/openshift/origin/pull/24742 
Vendor bump in 4.4: https://github.com/openshift/origin/pull/24743
Vendor bump in 4.3: https://github.com/openshift/origin/pull/24746

Verified in 4.5 and 4.4 builds. The graph below shows pod core seconds from api.ci (the tall job at the same time is a 4.3 build):

https://www.dropbox.com/s/jrpw3bsxclqcb64/Screenshot%202020-03-22%2018.30.38.png?raw=1

Comment 3 errata-xmlrpc 2020-05-13 22:01:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581