Bug 1940057

Summary: OpenShift builds should use a watch instead of polling when checking for pod status
Product: OpenShift Container Platform
Reporter: aaleman
Component: Build
Assignee: Otávio Fernandes <olemefer>
Status: CLOSED ERRATA
QA Contact: XiuJuan Wang <xiuwang>
Severity: high
Docs Contact:
Priority: high
Version: 4.7
CC: adam.kaplan, aos-bugs, mharri, pmuller, skuznets, sttts, vrutkovs, wking
Target Milestone: ---
Target Release: 4.8.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-07-27 22:53:56 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description aaleman 2021-03-17 14:16:02 UTC
Description of problem:

OpenShift builds appear to poll pods for their status, which puts heavy load on the API server when a large number of builds are running. This was a contributing factor to the build farm outage on March 11th and 12th. Instead of polling, builds should use a watch, similar to what is done here: https://github.com/openshift/ci-tools/pull/1784/files#diff-9f3405a81a6f10db7c3377444f624e921a620cbfaae2df4340abe708aa9cff2cR247
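
To illustrate why the difference matters for API server load, here is a minimal, self-contained sketch in Python. It is not the OpenShift build code or any real client library: `FakePodAPI`, `wait_by_polling`, `wait_by_watching`, and the `TIMELINE` of phase changes are all hypothetical names invented for this simulation. It only counts how many requests each strategy sends while waiting for one pod to reach a terminal phase.

```python
from collections import deque

TERMINAL_PHASES = {"Succeeded", "Failed"}

# Hypothetical timeline: simulated tick -> phase the pod moves to at that tick.
TIMELINE = {3: "Running", 8: "Succeeded"}


class FakePodAPI:
    """In-memory stand-in for an API server (an assumption for this sketch).

    It tracks a single pod's phase and counts every request it receives.
    """

    def __init__(self):
        self.requests = 0
        self._phase = "Pending"
        self._watchers = []

    def get_phase(self):
        # Each status check is a separate request.
        self.requests += 1
        return self._phase

    def watch(self):
        # One request opens the watch; later changes are pushed to the caller.
        self.requests += 1
        events = deque([self._phase])  # start with the current state
        self._watchers.append(events)
        return events

    def set_phase(self, phase):
        # Server-side state change; notifies every open watch.
        self._phase = phase
        for events in self._watchers:
            events.append(phase)


def wait_by_polling(api, max_ticks=20):
    """Ask for the pod phase once per tick until it is terminal."""
    for tick in range(max_ticks):
        if tick in TIMELINE:
            api.set_phase(TIMELINE[tick])
        phase = api.get_phase()
        if phase in TERMINAL_PHASES:
            return phase
    return None


def wait_by_watching(api, max_ticks=20):
    """Open one watch and consume pushed events until a terminal phase."""
    events = api.watch()
    for tick in range(max_ticks):
        if tick in TIMELINE:
            api.set_phase(TIMELINE[tick])
        while events:
            phase = events.popleft()
            if phase in TERMINAL_PHASES:
                return phase
    return None
```

With this timeline, the polling client sends one request per tick until the pod finishes (nine in total), while the watching client sends a single request. Multiplied across many concurrent builds, that gap is the load concern described above.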

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

Actual results:

Expected results:

Additional info:

Comment 1 Adam Kaplan 2021-03-17 14:27:51 UTC
Note that the polling issue here is limited to the openshift-apiserver; openshift-controller-manager does not seem to have this problem.

Comment 3 Adam Kaplan 2021-03-18 12:30:49 UTC
*** Bug 1939730 has been marked as a duplicate of this bug. ***

Comment 5 Otávio Fernandes 2021-06-07 15:17:34 UTC
Pull request (#206) is under review; it should need one more round of LGTM and will most probably be approved. I'll update this BZ ASAP.

Comment 6 Otávio Fernandes 2021-06-08 14:26:58 UTC
We have reviewed the pull request; however, we are still working on the CI run failures, which are unrelated to the changes. To be continued.

Comment 8 XiuJuan Wang 2021-06-10 10:09:38 UTC
Verified this bug before the PR merged and with nightly build 4.8.0-0.nightly-2021-06-09-214128.

Comment 11 errata-xmlrpc 2021-07-27 22:53:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.