Bug 2203142 - RFE: Allow skipping rollback when changing profile or restarting tuned
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: tuned
Version: 9.2
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Jaroslav Škarvada
QA Contact: Robin Hack
URL:
Whiteboard:
Depends On:
Blocks: 2188812 2188934
Reported: 2023-05-11 10:13 UTC by Martin Sivák
Modified: 2023-11-07 11:40 UTC

Fixed In Version: tuned-2.21.0-0.1.rc1.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-11-07 08:56:19 UTC
Type: Story
Target Upstream Version:
Embargoed:


Links
System                   ID               Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker    OCPBUGS-13065    0        None      None    None     2023-05-11 10:16:13 UTC
Red Hat Issue Tracker    RHELPLAN-156938  0        None      None    None     2023-05-11 10:18:35 UTC
Red Hat Product Errata   RHBA-2023:6703   0        None      None    None     2023-11-07 08:56:30 UTC

Description Martin Sivák 2023-05-11 10:13:53 UTC
Description of problem:

A tuned restart unloads the active profile, reverts all values to their defaults, and then applies the profile again.

This causes issues on SR-IOV enabled systems where the queue count is configured via a tuned profile. A tuned restart reverts the queue count to the number of CPUs and then reduces it again to the configured value. This change causes an SR-IOV device reset, and applications using the device get confused and lose packets.


The same happens when the profile changes, but that is less disruptive, because the administrator initiated the change and should know the consequences.

Version-Release number of selected component (if applicable):

RHEL 9.2, but effectively all versions

How reproducible:

Always. Configure sysctls, CFS values, CPU affinity, or NIC queue counts, then restart tuned.

Actual results:

Workloads are disrupted.

Expected results:

No disruption of values that have not changed.
Additional info:


More information:

This restart can happen during an OCP upgrade of the master nodes (possibly during RHEL's yum update too?), and the effect on the worker nodes is unexpected.

OCP seldom upgrades profiles without a reboot, so tuned pretty much always starts with a clean system. A manual or upgrade-related tuned restart therefore does not need the rollback, because after a reboot tuned will be configuring a clean system again.

In other words: we can configure tuned to ignore the rollback just for this use case if this is made configurable.
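
To illustrate the request, here is a minimal toy model in Python. It is not TuneD's code or API: the class `ToyTuner`, the option name `rollback_on_stop`, the setting name `nic_queues`, and the values 4 and 64 are all assumptions made up for this sketch. It only shows why a restart with rollback produces two writes to an otherwise unchanged setting, and why skipping the rollback produces none.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTuner:
    """Toy stand-in for a tuning daemon; NOT TuneD's real code or API."""
    profile: dict                    # setting name -> value the profile wants
    defaults: dict                   # setting name -> value restored on rollback
    rollback_on_stop: bool = True    # hypothetical option name for this sketch
    system: dict = field(default_factory=dict)   # simulated live values
    writes: list = field(default_factory=list)   # every change that reached the "system"

    def _set(self, name, value):
        # Only a real change counts as a write (on SR-IOV, a possible device reset).
        if self.system.get(name) != value:
            self.system[name] = value
            self.writes.append((name, value))

    def start(self):
        for name, value in self.profile.items():
            self._set(name, value)

    def stop(self):
        if not self.rollback_on_stop:
            return  # leave the tuned values in place
        for name in self.profile:
            self._set(name, self.defaults[name])

    def restart(self):
        self.stop()
        self.start()


def writes_during_restart(rollback_on_stop):
    """Return the writes caused by one restart of an already-applied profile."""
    tuner = ToyTuner(
        profile={"nic_queues": 4},       # assumed configured queue count
        defaults={"nic_queues": 64},     # assumed default, e.g. one queue per CPU
        rollback_on_stop=rollback_on_stop,
    )
    tuner.start()
    tuner.writes.clear()                 # ignore the initial apply
    tuner.restart()
    return tuner.writes


print(writes_during_restart(True))    # rollback, then re-apply: two writes
print(writes_during_restart(False))   # rollback skipped: no writes
```

In this model the restart with rollback touches `nic_queues` twice even though the final value is identical to the starting one, which corresponds to the disruption described above.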

Comment 4 Jiří Mencák 2023-05-22 14:51:54 UTC
Actually, a configuration option to explicitly disable the rollback on TuneD shutdown might be enough.

Still a WiP (needs more testing on RHOCP), but feel free to test and review.  Should already work:
https://github.com/redhat-performance/tuned/pull/533

Comment 6 Bryan Litton 2023-05-25 12:31:34 UTC
Hi,

In addition to Juri's bump of severity and priority, I would like to add some context: this fix is needed as part of a larger set of fixes to address latency performance for our Telco RAN solution on OCP 4.13. This is currently blocking our key partners from being able to use 4.13, and we are under pressure to give them a target OCP 4.13.z release in which these will be fixed.

I've gone ahead and requested this be backported to 9.2 as soon as possible. Any priority you could give to get this verified in 9.3 and backported would be greatly appreciated by the Telco program.

Comment 8 Jiří Mencák 2023-05-26 11:13:18 UTC
Hi Bryan,
thank you for the additional context.

(In reply to Bryan Litton from comment #6)
> I've gone ahead and requested this be backported to 9.2 as soon as possible.
> Any priority you could give to get this verified in 9.3 and backported would
> be greatly appreciated by the Telco program.

If the request is to ship this feature in OCP only, then I believe we do not need to backport
to RHEL 9.2 as OCP uses TuneD via FDP (Fast Data Path).

Comment 28 errata-xmlrpc 2023-11-07 08:56:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (tuned bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:6703

