Bug 1919964

Summary: [RFE] Reduce stalld CPU usage: do not parse almost idle CPUs
Product: Red Hat Enterprise Linux 8 Reporter: Daniel Bristot de Oliveira <daolivei>
Component: stalld Assignee: Clark Williams <williams>
Status: CLOSED ERRATA QA Contact: Mark Simmons <msimmons>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 8.4 CC: bhu, dhellmann, jlelli, kcarcia, msimmons, mstowell, williams
Target Milestone: rc Keywords: FutureFeature, Triaged
Target Release: 8.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: stalld-1.9-2.el8 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-05-18 16:06:47 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1898189    

Description Daniel Bristot de Oliveira 2021-01-25 13:32:23 UTC
Description of problem:

On large systems, stalld is consuming a noticeable amount of CPU time, even when the system is idle.

This happens because stalld parses /proc/sched_debug for all CPUs, even when the system is almost idle. This can be improved by skipping the parsing for CPUs that had some idle time since the last check.

Version-Release number of selected component (if applicable):
1.4

How reproducible:
Always

Steps to Reproduce:
1. Start the system and watch stalld's CPU usage in the top tool

Actual results:
Considerable CPU usage


Expected results:
Low CPU usage when the system is not at risk of facing starvation.

Additional info:

Comment 2 Daniel Bristot de Oliveira 2021-02-25 10:47:13 UTC
The 1.9 version of stalld implements idle detection by parsing the (small) /proc/stat.

It works by skipping:
	- Reading the huge /proc/sched_debug if all CPUs had idle time.
	- Reading the huge /proc/sched_debug if a CPU with its own stalld thread had idle time.
	- Parsing the sched_debug buffer for CPUs with idle time.
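
The skipping logic above can be sketched roughly as follows. This is a minimal illustration in Python, not stalld's actual C code. It assumes that "had idle time" means the per-CPU idle counter (the fourth numeric field of a cpuN line in /proc/stat) advanced between two samples. The function names (parse_idle_ticks, busy_cpus, plan_check) and the sample data are hypothetical.

```python
def parse_idle_ticks(stat_text):
    """Return {cpu_id: idle_ticks} from the text of /proc/stat."""
    idle = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Per-CPU lines look like "cpu0 user nice system idle ...".
        # Skip the aggregate "cpu" line and every non-CPU line.
        if len(fields) < 5 or not fields[0].startswith("cpu"):
            continue
        cpu_id = fields[0][3:]
        if not cpu_id.isdigit():
            continue
        idle[int(cpu_id)] = int(fields[4])
    return idle


def busy_cpus(prev_idle, curr_idle):
    """CPUs whose idle counter did not advance since the previous sample.

    A CPU with no previous sample is treated as busy (assumption: be
    conservative on the first check).
    """
    return sorted(cpu for cpu, ticks in curr_idle.items()
                  if ticks <= prev_idle.get(cpu, ticks))


def plan_check(prev_idle, curr_idle, monitored=None):
    """Decide whether to read sched_debug and which CPUs to parse.

    monitored: optional set of CPUs this thread is responsible for
    (e.g. a single CPU for a per-CPU thread); None means all CPUs.
    """
    busy = busy_cpus(prev_idle, curr_idle)
    if monitored is not None:
        busy = [cpu for cpu in busy if cpu in monitored]
    # If no monitored CPU was busy, the large file need not be read at all.
    return bool(busy), busy


# Hypothetical samples: cpu0 gained idle ticks, cpu1 did not.
SAMPLE_T0 = """cpu  100 0 50 1000 0 0 0 0 0 0
cpu0 50 0 25 500 0 0 0 0 0 0
cpu1 50 0 25 500 0 0 0 0 0 0
intr 12345
"""
SAMPLE_T1 = """cpu  210 0 50 1040 0 0 0 0 0 0
cpu0 60 0 25 540 0 0 0 0 0 0
cpu1 150 0 25 500 0 0 0 0 0 0
intr 12400
"""

if __name__ == "__main__":
    prev = parse_idle_ticks(SAMPLE_T0)
    curr = parse_idle_ticks(SAMPLE_T1)
    print(plan_check(prev, curr))                  # (True, [1])
    print(plan_check(prev, curr, monitored={0}))   # (False, [])
```

In this sketch only cpu1 would be parsed from the sched_debug buffer, and a thread watching only cpu0 would skip reading the file entirely.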

This feature reduces stalld's CPU usage to the point that it becomes almost invisible in the top tool when the system is idle.

More information can be found in this commit.
https://gitlab.com/rt-linux-tools/stalld/-/commit/b2866594aaea3369abfbed8da3af16d66cdb4d99

Clark has more info about the inclusion of this version in RHEL.

Comment 10 errata-xmlrpc 2021-05-18 16:06:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (stalld bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2021:1918