Description of problem:
Restarting TuneD unloads the active profile, reverts all values to their defaults, and then applies the profile again.
This causes problems on SR-IOV enabled systems where the queue count is configured via a TuneD profile. A restart reverts the queue count to the number of CPUs and then reduces it again to the configured value. This change resets the SR-IOV device, and applications using it get confused and lose packets.
The same happens when the profile changes, but that is less disruptive because it is an explicit action by the administrator, who should know the consequences.
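The sequence described above can be modelled with a toy sketch. It is not TuneD code: `FakeNic`, `restart_with_rollback`, and the queue counts (16 CPUs, 4 configured queues) are hypothetical, and the model assumes that every queue-count change triggers one device reset.

```python
class FakeNic:
    """Hypothetical stand-in for an SR-IOV NIC.

    Assumption: every change of the queue count resets the device once.
    """

    def __init__(self, queues):
        self.queues = queues
        self.resets = 0

    def set_queues(self, count):
        # Writing the value that is already set is a no-op.
        if count != self.queues:
            self.queues = count
            self.resets += 1


def restart_with_rollback(nic, default_queues, profile_queues):
    """Model a restart: unload the profile (rollback), then apply it again."""
    nic.set_queues(default_queues)  # rollback to the default (number of CPUs)
    nic.set_queues(profile_queues)  # profile applied again


# The NIC starts with the profile already applied (4 queues on a 16-CPU box).
nic = FakeNic(queues=4)
restart_with_rollback(nic, default_queues=16, profile_queues=4)
print(nic.queues, nic.resets)  # prints: 4 2
```

The final queue count is the same as before the restart, yet the device was reset twice in this model, which is the disruption reported here.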
Version-Release number of selected component (if applicable):
RHEL 9.2, but effectively all versions
How reproducible:
Always. Configure sysctls, CFS values, CPU affinity, or NIC queue counts, then restart tuned.
Actual results:
Workloads are disrupted.
Expected results:
No disruption of values that have not changed.
Additional info:
More information:
This restart can happen during an OCP upgrade (possibly during RHEL's yum update too?) of the master nodes, and the effect on the worker nodes is unexpected.
OCP seldom upgrades profiles without a reboot, so tuned pretty much always starts with a clean system. A manual or upgrade-related tuned restart therefore does not need the rollback, because after a reboot it will be configuring a clean system again.
In other words: we can configure tuned to ignore the rollback just for this use case if this is made configurable.
Actually, a configuration option to explicitly disable the rollback on TuneD shutdown might be enough.
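A minimal sketch of the effect such an option would have, under stated assumptions: `stop`, `start`, `restart_writes`, the `rollback_on_exit` flag, and the tunable values are all hypothetical names chosen for illustration and are not taken from TuneD or from the pull request.

```python
def stop(system, defaults, rollback_on_exit):
    """Shut down; return the (key, value) writes performed.

    With rollback_on_exit=False nothing is reverted on shutdown.
    """
    if not rollback_on_exit:
        return []
    writes = [(k, v) for k, v in defaults.items() if system.get(k) != v]
    system.update(defaults)
    return writes


def start(system, profile):
    """Apply the profile; only values that differ are written."""
    writes = [(k, v) for k, v in profile.items() if system.get(k) != v]
    system.update(profile)
    return writes


def restart_writes(rollback_on_exit):
    """Return every write caused by one restart of an already tuned system."""
    defaults = {"nic_queues": 16, "vm.swappiness": 60}  # illustrative values
    profile = {"nic_queues": 4, "vm.swappiness": 10}    # illustrative values
    system = dict(profile)  # the profile is already applied before the restart
    return stop(system, defaults, rollback_on_exit) + start(system, profile)


print(len(restart_writes(True)))   # prints: 4
print(len(restart_writes(False)))  # prints: 0
```

With the rollback skipped, a restart that reapplies the same profile touches nothing in this model, which matches the expected result of no disruption for values that have not changed.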
Still a WiP (needs more testing on RHOCP), but feel free to test and review. Should already work:
https://github.com/redhat-performance/tuned/pull/533
Hi,
Adding to Juri's bump of severity and priority, I would like to add some context: this fix is needed as part of a larger set of fixes to address latency performance for our Telco RAN solution on OCP 4.13. This is currently blocking our key partners from using 4.13, and we are under pressure to give them a target OCP 4.13.z release in which these will be fixed.
I've gone ahead and requested this be backported to 9.2 as soon as possible. Any priority you could give to get this verified in 9.3 and backported would be greatly appreciated by the Telco program.
Hi Bryan,
thank you for the additional context.
(In reply to Bryan Litton from comment #6)
> I've gone ahead and requested this be backported to 9.2 as soon as possible.
> Any priority you could give to get this verified in 9.3 and backported would
> be greatly appreciated by the Telco program.
If the request is to ship this feature in OCP only, then I believe we do not need to backport
to RHEL 9.2 as OCP uses TuneD via FDP (Fast Data Path).
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (tuned bug fix and enhancement update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2023:6703