Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1609862

Summary:

Master CPU saturated by listing CRs too frequently

Product:

OpenShift Container Platform

Reporter:

Clayton Coleman <ccoleman>

Component:

kube-apiserver

Assignee:

Stefan Schimanski <sttts>

Status:

CLOSED WONTFIX

QA Contact:

Xingxing Xia <xxia>

Severity:

medium

Docs Contact:

Priority:

low

Version:

3.11.0

CC:

aos-bugs, jokerman, mfojtik, mmccomas, sttts

Target Milestone:

---

Target Release:

3.11.z

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

Fixed In Version:

Doc Type:

If docs needed, set a value

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2020-05-26 11:03:55 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
logs and kill -ABRT output	none
30s CPU profile captured during run	none

Description Clayton Coleman 2018-07-30 16:51:38 UTC

Created attachment 1471591
logs and kill -ABRT output

The api.ci control plane was upgraded to 3.11. For about 10-20 minutes the master API was maxed out on CPU and very slow.

I captured an ABRT dump and a CPU profile from that period and saw abnormally high CPU usage from fetching custom resources / definitions.

Comment 1 Clayton Coleman 2018-07-30 16:52:10 UTC

Created attachment 1471592
30s CPU profile captured during run

Comment 2 Clayton Coleman 2018-07-30 17:14:16 UTC

I think listing CRDs is slow.  The cluster had two things going on:

1. VPA was listing a CR in individual namespaces 1-2 times a second (which is not a high rate)
2. Prow lists prow jobs in all namespaces every 5 or so seconds - there are 7.5k namespaces

Steve recently bumped the retention of Prow resources. It looks like listing CRs is just slow enough for that to hurt. We should verify that our decode path is not overly broken.

I think this shows how easy it is to get into a bad spot with resources: Prow was using a naive pattern (repeated full lists, where it should be using an informer), and at some point that crossed the line into being a problem.
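To illustrate the difference between the two access patterns, here is a minimal toy model. It is not client-go and not the Prow code: `ToyAPIServer`, `PollingClient`, `InformerLikeClient` and `simulate` are hypothetical names invented for this sketch, and "objects served" is only a rough stand-in for the decode/encode work the real API server does. It assumes a polling client re-lists everything on every tick, while an informer-style client lists once and then applies only the changes it is sent.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAPIServer:
    """Hypothetical stand-in for an API server; counts objects it has to serve."""
    objects: dict = field(default_factory=dict)  # name -> state
    events: list = field(default_factory=list)   # append-only change log
    objects_served: int = 0

    def apply(self, name, state):
        # Create or update an object and record the change.
        self.objects[name] = state
        self.events.append((name, state))

    def list_all(self):
        # A full list costs work proportional to everything stored.
        self.objects_served += len(self.objects)
        return dict(self.objects), len(self.events)

    def watch_since(self, version):
        # A watch only delivers changes recorded after `version`.
        new = self.events[version:]
        self.objects_served += len(new)
        return new, len(self.events)


class PollingClient:
    """Naive pattern: fetch the complete list on every sync."""
    def __init__(self, server):
        self.server = server

    def sync(self):
        snapshot, _ = self.server.list_all()
        return snapshot


class InformerLikeClient:
    """Informer-style pattern: one initial list, then incremental updates."""
    def __init__(self, server):
        self.server = server
        self.cache, self.version = server.list_all()

    def sync(self):
        events, self.version = self.server.watch_since(self.version)
        for name, state in events:
            self.cache[name] = state
        return self.cache


def simulate(client_cls, stored=1000, ticks=100):
    """Run `ticks` syncs with one object update per tick.

    Returns (objects served by the server, whether the client view is correct).
    """
    server = ToyAPIServer()
    for i in range(stored):
        server.apply(f"job-{i}", "pending")
    client = client_cls(server)
    view = {}
    for tick in range(ticks):
        server.apply(f"job-{tick % stored}", f"state-{tick}")
        view = client.sync()
    return server.objects_served, view == server.objects


if __name__ == "__main__":
    print("polling :", simulate(PollingClient))
    print("informer:", simulate(InformerLikeClient))
```

With the default parameters (1000 stored objects, 100 syncs, one change per sync), the polling client makes the toy server serve 100000 objects, while the informer-style client makes it serve 1100 (the initial list plus one event per sync). Both end up with the same view. The real numbers on a cluster depend on object size, decode cost and change rate, which this sketch does not model.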

Dropping severity.

Comment 3 Clayton Coleman 2018-07-30 17:15:02 UTC

Assigning to Steve to close when he switches prow to an informer.

Comment 4 Michal Fojtik 2020-05-19 13:18:09 UTC

This bug hasn't had any engineering activity in the last ~30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet.

As such, we're marking this bug as "LifecycleStale".

If you have further information on the current state of the bug, please update it and remove the "LifecycleStale" keyword, otherwise this bug will be automatically closed in 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.

Comment 5 Michal Fojtik 2020-05-26 11:03:55 UTC

This bug hasn't had any activity 7 days after it was marked as LifecycleStale, so we are closing this bug as WONTFIX. If you consider this bug still valuable, please reopen it or create a new bug.

Comment 6 Red Hat Bugzilla 2023-09-14 04:32:21 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.