Bug 2231437

Summary: disk plug-in scalability issue
Product: Red Hat Enterprise Linux 9
Reporter: Jiří Mencák <jmencak>
Component: tuned
Assignee: Jaroslav Škarvada <jskarvad>
Status: NEW ---
QA Contact: Robin Hack <rhack>
Severity: high
Docs Contact:
Priority: high
Version: 9.2
CC: jeder, jorton, jskarvad
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Jiří Mencák 2023-08-11 14:36:41 UTC
Description of problem:
TuneD has a scalability issue when using the [disk] plug-in, which causes profile application to take a very long time.  With 2000 block devices, it can take 1-2 minutes.  This causes problems especially in OpenShift, where the Node Tuning Operator (NTO) waits up to 1 minute for a profile to be applied, then kills TuneD and starts an exponential backoff, giving TuneD more and more time to apply the profile.  Customers can then see TuneD restarts and profile application taking up to 5 minutes.

One of the problems is that TuneD unnecessarily checks for APM support using the hdparm command even if APM is not used by any profile.  Executing hdparm on each block device takes quite a while, and for hundreds (let alone thousands) of block devices it simply takes too long.
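To get a rough feel for this per-device cost, the sketch below times one probe per block device. It is not TuneD's code. It assumes the APM check amounts to roughly one `hdparm -C /dev/<device>` call per device, and the SYSFS_BLOCK and PROBE variables are hypothetical knobs introduced here so the loop can be pointed at another directory or command.

```shell
# Rough timing of a per-device probe, to show how cost grows with device count.
# Assumption: the APM check is comparable to one `hdparm -C` call per device.
# SYSFS_BLOCK and PROBE are hypothetical variables added for this sketch.
SYSFS_BLOCK=${SYSFS_BLOCK:-/sys/block}
PROBE=${PROBE:-hdparm -C}

# Print one block device name per line, e.g. "sda" or "dm-0".
list_block_devices() {
  for entry in "$SYSFS_BLOCK"/*; do
    if [ -e "$entry" ]; then
      basename "$entry"
    fi
  done
}

# Run the probe once per device and print the device count and elapsed seconds.
# Probe failures are ignored; only the elapsed time matters here.
time_probe() {
  count=0
  start=$(date +%s)
  for dev in $(list_block_devices); do
    $PROBE "/dev/$dev" >/dev/null 2>&1 </dev/null || true
    count=$((count + 1))
  done
  end=$(date +%s)
  echo "$count devices probed in $((end - start))s"
}
```

Calling time_probe as root on the affected system gives an approximate lower bound for the time spent in such a check, with whole-second resolution only.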

Version-Release number of selected component (if applicable):
All

How reproducible:
Always

Steps to Reproduce:
1. Use a system with 2000 block devices.  If you don't have one handy, use the following script to create them:

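# Create 2000 dummy device-mapper block devices (run as root).
for d in $(seq 1 2000)
do
  dmsetup create "dummy$d" --table '0 4092 zero'
  # To clean up afterwards, run this line instead of the one above:
  # dmsetup remove "dummy$d"
done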

2. Apply the throughput-performance profile, which uses the [disk] plug-in.
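One way to measure step 2 is sketched below. The timed_apply helper is hypothetical and not part of TuneD: it runs any command, reports the elapsed wall-clock time in whole seconds, and fails if a given limit is exceeded.

```shell
# Time a command and check that it finished within a limit (whole seconds).
# timed_apply is a hypothetical helper written for this report.
# Usage: timed_apply LIMIT COMMAND [ARGS...]
timed_apply() {
  limit=$1
  shift
  start=$(date +%s)
  status=0
  "$@" >/dev/null 2>&1 || status=$?
  elapsed=$(( $(date +%s) - start ))
  echo "exit=$status elapsed=${elapsed}s limit=${limit}s"
  # Succeed only if the command finished within the limit.
  [ "$elapsed" -le "$limit" ]
}
```

For example, `timed_apply 60 tuned-adm profile throughput-performance` checks the profile switch against the 1-minute wait described above. The resolution is one second, so it cannot verify a sub-second target precisely.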

Actual results:
TuneD takes 1-2 minutes to apply the profile.

Expected results:
TuneD takes at most 1 second to apply the profile.

Additional info:
Problematic code: https://github.com/redhat-performance/tuned/blob/f4c976f2f5b0ddd541922ce54ed3ae7f4dbc0f84/tuned/plugins/plugin_disk.py#L107

Associated bugs: https://issues.redhat.com/browse/OCPBUGS-17531
Associated customer case: https://access.redhat.com/support/cases/#/case/03570928