Bug 2080227

Summary: FD leak limiting the ability to switch the profile
Product: Red Hat Enterprise Linux 8 Reporter: Christophe Besson <cbesson>
Component: tuned Assignee: Jaroslav Škarvada <jskarvad>
Status: CLOSED ERRATA QA Contact: Robin Hack <rhack>
Severity: low Docs Contact:
Priority: low    
Version: 8.5 CC: jeder, jskarvad, sbalasub
Target Milestone: rc Keywords: Patch, Reproducer, Triaged, Upstream
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: tuned-2.20.0-0.1.rc1.el8 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-05-16 09:12:54 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Christophe Besson 2022-04-29 09:50:51 UTC
Description of problem:
A customer reported that tuned stops working after a large number of profile switches.

Version-Release number of selected component (if applicable):
tuned-2.16.0-1.el8

How reproducible:
Always

Steps to Reproduce:
1. Run this loop:
while true; do tuned-adm profile virtual-guest; done
2. Wait a few minutes for the issue to reproduce. You can watch the number of open files of the tuned PID with lsof to see when it reaches the nofile limit and profile switching starts to fail.
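
The monitoring in step 2 can also be done without lsof by counting the entries under /proc/<pid>/fd. The following is only a sketch: count_fds is a hypothetical helper name, not something shipped with tuned, and the commented usage assumes tuned runs as the systemd unit "tuned".

```shell
# count_fds PID: print the number of open file descriptors of a process.
# Hypothetical helper, not part of tuned. Prints 0 if the PID does not
# exist or its fd directory is not readable.
count_fds() {
    ls -1 "/proc/$1/fd" 2>/dev/null | wc -l
}

# Example use while the reproducer loop runs in another terminal
# (assumes tuned is managed by systemd as the unit "tuned"):
#   pid=$(systemctl show -p MainPID --value tuned)
#   while sleep 5; do echo "$(date +%T) $(count_fds "$pid")"; done
```

If the count keeps growing toward the value reported by `ulimit -n` and never drops, the daemon is leaking descriptors.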

Actual results:
# tuned-adm profile virtual-guest
Requested profile 'virtual-guest' doesn't exist.

Additional info:
- the profile cannot be read because the nofile limit is reached.
- restarting the daemon fixes the issue.
- disabling the CPU plugin works around the issue.
- some output from my reproducer:

tuned.log after running the loop:
~~~
2022-04-28 09:50:53,005 INFO     tuned.plugins.plugin_cpu: We are running on an x86 GenuineIntel platform
2022-04-28 09:50:53,005 ERROR    tuned.utils.commands: Executing x86_energy_perf_policy error: [Errno 24] Too many open files
2022-04-28 09:50:53,005 WARNING  tuned.plugins.plugin_cpu: unable to run x86_energy_perf_policy tool, ignoring CPU energy performance bias, is the tool installed?
2022-04-28 09:50:53,008 INFO     tuned.plugins.base: instance disk: assigning devices vdb, vda, dm-1, dm-0, loop0
2022-04-28 09:54:19,298 ERROR    tuned.daemon.controller: Failed to apply profile 'virtual-guest'
2022-04-28 09:54:22,154 ERROR    tuned.daemon.controller: Failed to apply profile 'virtual-guest'
2022-04-28 09:54:40,129 ERROR    tuned.daemon.controller: Failed to apply profile 'virtual-guest'
2022-04-28 09:57:12,285 ERROR    tuned.daemon.controller: Failed to apply profile 'virtual-guest'
~~~

tuned (PID 983) remains active but does not work anymore (needs a restart).

# ulimit -n
1024

# ls -1 /proc/983/fd/* | wc -l
1024

# ls -l /proc/983/fd/99*
lrwx------. 1 root root 64 Apr 28 09:51 /proc/983/fd/99 -> 'anon_inode:[perf_event]'
lrwx------. 1 root root 64 Apr 28 09:51 /proc/983/fd/990 -> 'anon_inode:[perf_event]'
lrwx------. 1 root root 64 Apr 28 09:51 /proc/983/fd/991 -> 'anon_inode:[perf_event]'
lrwx------. 1 root root 64 Apr 28 09:51 /proc/983/fd/992 -> 'anon_inode:[perf_event]'
lrwx------. 1 root root 64 Apr 28 09:51 /proc/983/fd/993 -> 'anon_inode:[perf_event]'
lrwx------. 1 root root 64 Apr 28 09:51 /proc/983/fd/994 -> 'anon_inode:[perf_event]'
lrwx------. 1 root root 64 Apr 28 09:51 /proc/983/fd/995 -> 'anon_inode:[perf_event]'
lrwx------. 1 root root 64 Apr 28 09:51 /proc/983/fd/996 -> 'anon_inode:[perf_event]'
lrwx------. 1 root root 64 Apr 28 09:51 /proc/983/fd/997 -> 'anon_inode:[perf_event]'
lrwx------. 1 root root 64 Apr 28 09:51 /proc/983/fd/998 -> 'anon_inode:[perf_event]'
lrwx------. 1 root root 64 Apr 28 09:51 /proc/983/fd/999 -> 'anon_inode:[perf_event]'

# lsof -a -p 983 | grep -c a_inode
1013

tuned   983 root 1009u  a_inode               0,14        0    10334 [perf_event]
tuned   983 root 1010u  a_inode               0,14        0    10334 [perf_event]
tuned   983 root 1011u  a_inode               0,14        0    10334 [perf_event]
tuned   983 root 1012u  a_inode               0,14        0    10334 [perf_event]
tuned   983 root 1013u  a_inode               0,14        0    10334 [perf_event]
tuned   983 root 1014u  a_inode               0,14        0    10334 [perf_event]
tuned   983 root 1015u  a_inode               0,14        0    10334 [perf_event]
tuned   983 root 1016u  a_inode               0,14        0    10334 [perf_event]
tuned   983 root 1017u  a_inode               0,14        0    10334 [perf_event]
tuned   983 root 1018u  a_inode               0,14        0    10334 [perf_event]
tuned   983 root 1019u  a_inode               0,14        0    10334 [perf_event]
tuned   983 root 1020u  a_inode               0,14        0    10334 [perf_event]
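
To see at a glance which kind of descriptor dominates, the /proc/<pid>/fd symlinks can be grouped by target. This is a rough sketch, not a tuned tool: fd_targets is a hypothetical helper name, and it needs the same privileges as the ls commands above.

```shell
# fd_targets PID: count open file descriptors grouped by link target.
# Hypothetical helper. Numeric inode suffixes such as "socket:[12345]"
# are stripped so that all sockets (or pipes) fall into one group;
# names like "anon_inode:[perf_event]" are left untouched.
fd_targets() {
    for fd in "/proc/$1/fd"/*; do
        readlink "$fd"
    done 2>/dev/null | sed 's/:\[[0-9]*\]$//' | sort | uniq -c | sort -rn
}

# Example use for the daemon from this report:
#   fd_targets 983
```

For the process shown above, the first line of the result would be expected to be the anon_inode:[perf_event] group, since nearly all descriptors point there.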

Comment 1 Jaroslav Škarvada 2022-05-06 09:12:11 UTC
This is probably a leak from the tools TuneD executes; maybe we could improve it with CLOEXEC.
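
For background on what CLOEXEC addresses: a descriptor that is not marked close-on-exec is inherited by every program the process executes. The sketch below only illustrates that inheritance from a shell and is not taken from the TuneD code. It assumes Linux and a shell such as bash or dash that does not set close-on-exec on descriptors opened with exec; descriptor number 9 is an arbitrary choice.

```shell
# Open descriptor 9 in this shell (not marked close-on-exec in bash/dash).
exec 9</dev/null

# The child shell inherits fd 9 and can see it in its own /proc entry.
inherited=$(sh -c 'if [ -e /proc/self/fd/9 ]; then echo yes; else echo no; fi')

# Closing fd 9 for this one child ("9<&-") gives the child the same view
# that a close-on-exec flag on the descriptor would produce.
closed=$(sh -c 'if [ -e /proc/self/fd/9 ]; then echo yes; else echo no; fi' 9<&-)

echo "inherited=$inherited closed=$closed"

# Clean up the descriptor in the current shell.
exec 9<&-
```

Under those assumptions the first child is expected to see the descriptor and the second is not. Whether this mechanism is actually the cause of the perf_event descriptors accumulating in the daemon is not established by this comment.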

Comment 8 Jaroslav Škarvada 2023-02-08 15:19:56 UTC
Upstream PR:
https://github.com/redhat-performance/tuned/pull/439

Comment 19 errata-xmlrpc 2023-05-16 09:12:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (tuned bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:3062