Description of problem:
A tuned restart unloads the profile, reverts all values to their defaults, and then configures the profile again. This causes issues on SR-IOV enabled systems where the queue count is configured via a tuned profile. The restart reverts the queue count to the number of CPUs and then reduces it again to the configured value. This change causes an SR-IOV device reset, and applications using the device get confused and lose packets. The same happens when the profile changes, but that is not as disruptive, because the administrator initiated it and should know the consequences.

Version-Release number of selected component (if applicable):
RHEL 9.2, but all of the versions really

How reproducible:
Always. Configure sysctls, CFS values, CPU affinity or NIC queue counts and restart tuned.

Actual results:
Workloads are disrupted.

Expected results:
No disruption of values that have not changed.

Additional info:
This restart can happen during an OCP upgrade of the master nodes (possibly during RHEL's yum update too?), and the effect on the worker nodes is unexpected. OCP seldom upgrades profiles without a reboot, so tuned pretty much always starts with a clean system. A manual or upgrade-related tuned restart therefore does not need the rollback, because after a reboot tuned will be configuring a clean system again. In other words: we can configure tuned to skip the rollback just for this use case if this is made configurable.
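To make the failure mode concrete, here is a minimal toy model of the restart sequence described above. It is only a sketch: `ToyNic`, `restart` and the `rollback` flag are hypothetical names for illustration and are not TuneD code or a TuneD option, and the queue counts (64 and 8) are made-up values. The only assumption carried over from the report is that an actual change of the queue count resets the device.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNic:
    """Hypothetical stand-in for an SR-IOV NIC; not a TuneD or kernel API."""
    default_queues: int               # e.g. one queue per CPU
    queues: int = field(init=False)   # current queue count
    resets: int = 0                   # number of times the queue count really changed

    def __post_init__(self) -> None:
        self.queues = self.default_queues

    def set_queues(self, count: int) -> None:
        # Only a real change resets the device; rewriting the same value is a no-op.
        if count != self.queues:
            self.queues = count
            self.resets += 1


def restart(nic: ToyNic, profile_queues: int, rollback: bool) -> None:
    """Model a daemon restart: optionally roll back to defaults, then re-apply."""
    if rollback:
        nic.set_queues(nic.default_queues)
    nic.set_queues(profile_queues)


def resets_caused_by_restart(rollback: bool) -> int:
    nic = ToyNic(default_queues=64)
    nic.set_queues(8)                 # initial profile application after boot
    before = nic.resets
    restart(nic, profile_queues=8, rollback=rollback)
    return nic.resets - before


if __name__ == "__main__":
    print("resets with rollback:   ", resets_caused_by_restart(rollback=True))
    print("resets without rollback:", resets_caused_by_restart(rollback=False))
```

In this model a restart with rollback changes the queue count twice even though the configured value is the same before and after, while a restart that skips the rollback leaves the device untouched.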
Actually, a configuration option to explicitly disable the rollback on TuneD shutdown might be enough. Still a WiP (needs more testing on RHOCP), but feel free to test and review. Should already work: https://github.com/redhat-performance/tuned/pull/533
Hi,

Adding to Juri's bump of severity and priority, I would like to add some context: this fix is needed as part of a larger set of fixes to address latency performance for our Telco RAN solution on OCP 4.13. This is currently blocking our key partners from being able to use 4.13, and we are under pressure to give them a target OCP 4.13.z release in which these will be fixed.

I've gone ahead and requested this be backported to 9.2 as soon as possible. Any priority you could give to get this verified in 9.3 and backported would be greatly appreciated by the Telco program.
Hi Bryan, thank you for the additional context.

(In reply to Bryan Litton from comment #6)
> I've gone ahead and requested this be backported to 9.2 as soon as possible.
> Any priority you could give to get this verified in 9.3 and backported would
> be greatly appreciated by the Telco program.

If the request is to ship this feature in OCP only, then I believe we do not need to backport to RHEL 9.2 as OCP uses TuneD via FDP (Fast Data Path).