Bug 2035046

Summary: SNO: Recover Platform CPU by Reducing the Kubelet Service Monitor Scrape Interval
Product: OpenShift Container Platform
Reporter: Ken Young <keyoung>
Component: Telco Edge
Assignee: Nahian <npathan>
Telco Edge Sub Component: RAN
QA Contact: yliu1
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: unspecified
CC: bwensley
Version: 4.10
Hardware: Unspecified   
OS: Unspecified   
Fixed In Version: OCP 4.11
Last Closed: 2022-12-13 20:29:02 UTC
Type: Bug

Description Ken Young 2021-12-22 19:13:22 UTC
Description of problem:

An SNO (Single Node OpenShift) cluster deployed at the Telco Far Edge allocates limited CPU to the platform, reserving as many CPU cores as possible for the revenue-generating workload. Every reduction in platform overhead frees more capacity for that workload.

A low-hanging opportunity to recover a significant amount of platform CPU is to reduce how often the kubelet ServiceMonitor is scraped (that is, to lengthen its scrape interval). The interval is currently hard-coded; the goal is to make it configurable while leaving the default behaviour unchanged. The proposed mechanism would expose this setting through annotations, leveraging a new Prometheus feature.
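As a rough illustration of why a longer interval recovers CPU, the Python sketch below computes how scrape volume changes with the interval. It assumes each scrape of the kubelet endpoints costs roughly the same CPU, which is a simplification (real cost also depends on series count and payload size), and the 30s/60s/120s values are illustrative only, not OpenShift defaults.

```python
# Hypothetical sketch: scrape volume scales inversely with the scrape interval.
# Assumption: each scrape costs roughly the same CPU; real cost also depends
# on the number of series and the payload size of each scrape.

SECONDS_PER_DAY = 24 * 60 * 60


def scrapes_per_day(interval_seconds: float) -> float:
    """Number of scrapes a single target receives per day at a fixed interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return SECONDS_PER_DAY / interval_seconds


def relative_scrape_load(current_interval: float, new_interval: float) -> float:
    """Fraction of the current scrape load that remains after changing the interval."""
    return scrapes_per_day(new_interval) / scrapes_per_day(current_interval)


if __name__ == "__main__":
    # Illustrative intervals only; these are not OpenShift defaults.
    for current, new in [(30, 60), (30, 120)]:
        remaining = relative_scrape_load(current, new)
        print(f"{current}s -> {new}s: {remaining:.0%} of the scrape load remains")
```

Under this simplification, doubling the interval halves the scrape work; the actual CPU saving has to be confirmed by measurement.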

Version-Release number of selected component (if applicable):

4.10

How reproducible:

100%

Steps to Reproduce:
1.  Bring up an SNO cluster monitored by Prometheus.
2.  Measure the platform CPU usage.
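One way to carry out step 2 is to query Prometheus through its `/api/v1/query` instant-query endpoint. The Python sketch below is hypothetical: the URL (`http://localhost:9090`, for example reached through `oc port-forward`), the PromQL expression, and treating the `openshift-.*` namespaces as "platform" are all assumptions, and authentication to the cluster's Prometheus is not handled.

```python
import json
import urllib.parse
import urllib.request

# Assumptions: adjust the URL and the query for the cluster under test.
PROMETHEUS_URL = "http://localhost:9090"  # hypothetical, e.g. via a port-forward
PLATFORM_CPU_QUERY = (
    'sum(rate(container_cpu_usage_seconds_total{namespace=~"openshift-.*",container!=""}[5m]))'
)


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API (/api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_scalar_result(body: dict) -> float:
    """Extract the first sample's value from an instant-vector query response."""
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body.get('error', 'unknown error')}")
    result = body["data"]["result"]
    if not result:
        raise RuntimeError("query returned no samples")
    # Each sample looks like {"metric": {...}, "value": [<unix_ts>, "<value as string>"]}.
    return float(result[0]["value"][1])


def measure_platform_cpu(base_url: str = PROMETHEUS_URL) -> float:
    """Return CPU usage in cores for the assumed platform namespaces (5m rate window)."""
    url = build_query_url(base_url, PLATFORM_CPU_QUERY)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_scalar_result(json.load(resp))
```

Taking one measurement before and one after changing the interval, the relative reduction is `(before - after) / before`. Because the query uses a 5-minute rate window, wait at least that long after the change before sampling again.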

Actual results:

Platform CPU usage stays at its current level.

Expected results:

A non-trivial, measurable reduction in platform CPU usage.

Additional info: