Bug 2148990 - Stopping or starting tuned makes the network interface unreachable for a few seconds
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: tuned
Version: 9.1
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Jaroslav Škarvada
QA Contact: Robin Hack
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-11-28 12:42 UTC by Renaud Métrich
Modified: 2023-05-24 05:59 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-140646 0 None None None 2022-11-28 12:45:19 UTC

Description Renaud Métrich 2022-11-28 12:42:36 UTC
Description of problem:

When using the "realtime" profile, the following directive is used in the configuration:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
[net]
channels=combined ${f:check_net_queue_count:${netdev_queue_count}}
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

This makes *tuned* reconfigure all network interfaces to only have 1 RX queue, e.g.:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
Initially:

# ls /sys/class/net/eno1/queues/
rx-0  rx-1  rx-2  rx-3  tx-0

After tuned started:

# ls /sys/class/net/eno1/queues/
rx-0  tx-0
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
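The before/after listings above can also be compared programmatically. The following is a minimal Python sketch, assuming the usual sysfs layout (/sys/class/net/<dev>/queues/ containing rx-N and tx-N entries); the helper names are hypothetical and not part of TuneD.

```python
import os


def count_queues(names):
    """Count RX and TX queues from the entry names of a 'queues' directory."""
    rx = sum(1 for name in names if name.startswith("rx-"))
    tx = sum(1 for name in names if name.startswith("tx-"))
    return rx, tx


def read_queue_names(dev, sysfs_root="/sys/class/net"):
    """List the queue entries of a network device (assumes the sysfs layout shown above)."""
    return sorted(os.listdir(os.path.join(sysfs_root, dev, "queues")))


if __name__ == "__main__":
    # Listing taken from the "Initially" output above.
    print(count_queues(["rx-0", "rx-1", "rx-2", "rx-3", "tx-0"]))
    # Listing taken from the "After tuned started" output above.
    print(count_queues(["rx-0", "tx-0"]))
```

Running count_queues(read_queue_names("eno1")) before and after starting tuned should show the RX count dropping on an affected system.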

The original settings are restored when the service is stopped or when the system shuts down.

The reconfiguration makes the devices become unavailable for a few seconds, which breaks applications requiring the network, e.g.:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
# systemd-run -u ping_quick.service -- ping  -i 0.1 -I eno1 -c 1000 8.8.8.8

# systemctl stop tuned

# journalctl -u ping_quick.service
[...]
Nov 28 07:38:20 dell-per6515-01.khw2.lab.eng.bos.redhat.com ping[2609]: 64 bytes from 8.8.8.8: icmp_seq=126 ttl=50 time=26.0 ms
Nov 28 07:38:20 dell-per6515-01.khw2.lab.eng.bos.redhat.com ping[2609]: 64 bytes from 8.8.8.8: icmp_seq=127 ttl=50 time=25.9 ms
Nov 28 07:38:20 dell-per6515-01.khw2.lab.eng.bos.redhat.com ping[2609]: 64 bytes from 8.8.8.8: icmp_seq=128 ttl=50 time=25.9 ms
Nov 28 07:38:24 dell-per6515-01.khw2.lab.eng.bos.redhat.com ping[2609]: From 10.16.211.59 icmp_seq=131 Destination Host Unreachable
Nov 28 07:38:24 dell-per6515-01.khw2.lab.eng.bos.redhat.com ping[2609]: From 10.16.211.59 icmp_seq=132 Destination Host Unreachable
Nov 28 07:38:24 dell-per6515-01.khw2.lab.eng.bos.redhat.com ping[2609]: From 10.16.211.59 icmp_seq=133 Destination Host Unreachable
Nov 28 07:38:24 dell-per6515-01.khw2.lab.eng.bos.redhat.com ping[2609]: From 10.16.211.59 icmp_seq=134 Destination Host Unreachable
Nov 28 07:38:24 dell-per6515-01.khw2.lab.eng.bos.redhat.com ping[2609]: From 10.16.211.59 icmp_seq=135 Destination Host Unreachable
Nov 28 07:38:24 dell-per6515-01.khw2.lab.eng.bos.redhat.com ping[2609]: From 10.16.211.59 icmp_seq=136 Destination Host Unreachable
Nov 28 07:38:24 dell-per6515-01.khw2.lab.eng.bos.redhat.com ping[2609]: From 10.16.211.59 icmp_seq=137 Destination Host Unreachable
Nov 28 07:38:24 dell-per6515-01.khw2.lab.eng.bos.redhat.com ping[2609]: From 10.16.211.59 icmp_seq=138 Destination Host Unreachable
Nov 28 07:38:24 dell-per6515-01.khw2.lab.eng.bos.redhat.com ping[2609]: From 10.16.211.59 icmp_seq=139 Destination Host Unreachable
Nov 28 07:38:24 dell-per6515-01.khw2.lab.eng.bos.redhat.com ping[2609]: From 10.16.211.59 icmp_seq=140 Destination Host Unreachable
Nov 28 07:38:24 dell-per6515-01.khw2.lab.eng.bos.redhat.com ping[2609]: From 10.16.211.59 icmp_seq=141 Destination Host Unreachable
Nov 28 07:38:24 dell-per6515-01.khw2.lab.eng.bos.redhat.com ping[2609]: From 10.16.211.59 icmp_seq=142 Destination Host Unreachable
Nov 28 07:38:24 dell-per6515-01.khw2.lab.eng.bos.redhat.com ping[2609]: 64 bytes from 8.8.8.8: icmp_seq=143 ttl=50 time=661 ms
Nov 28 07:38:24 dell-per6515-01.khw2.lab.eng.bos.redhat.com ping[2609]: 64 bytes from 8.8.8.8: icmp_seq=144 ttl=50 time=553 ms
Nov 28 07:38:24 dell-per6515-01.khw2.lab.eng.bos.redhat.com ping[2609]: 64 bytes from 8.8.8.8: icmp_seq=145 ttl=50 time=449 ms
[...]
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
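To quantify the gap in a log like the one above, the unanswered sequence numbers can be extracted from the ping output. This is a minimal Python sketch, assuming the iputils ping line format shown in the journal excerpt; the function name is hypothetical.

```python
import re

SEQ_RE = re.compile(r"icmp_seq=(\d+)")


def lost_sequences(lines):
    """Return the icmp_seq numbers between the first and last echo reply
    for which no echo reply was logged."""
    replied = set()
    for line in lines:
        match = SEQ_RE.search(line)
        # Only "64 bytes from ..." lines are real replies; "Destination Host
        # Unreachable" lines also carry an icmp_seq but count as lost.
        if match and "bytes from" in line:
            replied.add(int(match.group(1)))
    if not replied:
        return []
    return [seq for seq in range(min(replied), max(replied) + 1)
            if seq not in replied]


if __name__ == "__main__":
    sample = [
        "64 bytes from 8.8.8.8: icmp_seq=128 ttl=50 time=25.9 ms",
        "From 10.16.211.59 icmp_seq=131 Destination Host Unreachable",
        "From 10.16.211.59 icmp_seq=142 Destination Host Unreachable",
        "64 bytes from 8.8.8.8: icmp_seq=143 ttl=50 time=661 ms",
    ]
    lost = lost_sequences(sample)
    print(len(lost), lost[0], lost[-1])
```

The journal prefix (timestamp, host name, "ping[PID]:") does not need to be stripped, because the regular expression searches anywhere in the line.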

This is a real issue when the system is brought down, because *tuned* stops in parallel with other services that require the network, which makes those services fail instead of stopping properly.

Version-Release number of selected component (if applicable):

tuned-2.19.0-1.el9.noarch
tuned-profiles-realtime-2.19.0-1.el9.noarch

How reproducible:

Always

Steps to Reproduce:
1. Use a system which has a "tg3" interface (e.g. PowerEdge R6515 or R530)
2. Start the "realtime" profile
3. Stop tuned

Actual results:

Network dead for some time

Expected results:

Network functional or blocking

Comment 1 Jaroslav Škarvada 2023-01-04 00:28:43 UTC
Is there a way to prevent it from happening?

If there is no way, we could work around it by unconfiguring the channels only on the so-called "full rollback", which happens when the TuneD service is uninstalled or manually stopped, or when the TuneD profile is changed. With this workaround, it would no longer happen on machine shutdown. It would probably still happen on TuneD start.
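The full-rollback idea can be sketched as a simple decision function. This is an illustrative Python sketch only: the Rollback enum and should_restore_channels are hypothetical names, not the actual TuneD plugin API.

```python
from enum import Enum


class Rollback(Enum):
    """Hypothetical rollback kinds, following the description above."""
    SOFT = "soft"  # service stopped as part of a shutdown or reboot
    FULL = "full"  # service uninstalled or manually stopped, or profile changed


def should_restore_channels(rollback):
    """Restore the original channel counts only on a full rollback, so that
    the interface is not reconfigured while the machine is shutting down."""
    return rollback is Rollback.FULL


if __name__ == "__main__":
    for kind in Rollback:
        print(kind.name, should_restore_channels(kind))
```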

Other than that we could probably only drop the tuning or make it conditional (e.g. controllable by some configuration variable).

I will add Nitesh who implemented this feature.

Comment 2 Jaroslav Škarvada 2023-01-05 13:14:31 UTC
From discussion with Nitesh:

> - Do we know what tuned was executing exactly when the network packets were dropped?
It seems ethtool.

> - What happens if instead of configuring the queue count equivalent to HK we provide a custom queue count value that is more than the initial value?
I don't know; I need to set up a testing environment.

> - We can also manually change the queue count with ethtool and execute the script in parallel.
I am afraid that this will not help, but I am going to test.

> - Wrt the workaround that you suggested, isn't the tuning already conditional and can be removed from tuned.conf?
IMHO, currently, if the user specifies an invalid value of 'netdev_queue_count',
it should fail with an error and not call ethtool, but I meant a
cleaner solution to skip the tuning (just by changing
realtime-variables.conf and without customizing the whole
profile).

Comment 3 Jaroslav Škarvada 2023-01-05 13:16:21 UTC
I think we can currently use the full-rollback option (i.e. not doing the unconfiguration on reboot) and add a configuration parameter to cleanly disable this tuning.

Other workarounds are currently unknown to me.

Comment 4 Jan Žerdík 2023-01-24 10:21:10 UTC
Hi. Yes, this behavior is caused by calling ethtool. You can easily simulate it by running something like "ethtool -L eno1 rx 4 tx 4" (this is an example of what we call from TuneD; you can see it in the log if you run TuneD with debug logging). I didn't find an option to run ethtool without restarting the network interface. We can implement the workarounds mentioned above.
And what did you mean by "Network blocking"?
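For reproducing this outside of TuneD, the ethtool call quoted above can be assembled with a small helper. This is a minimal Python sketch; build_set_channels_cmd is a hypothetical helper, and it only builds the argument list for "ethtool -L" without executing it (running it requires root and will interrupt the interface on affected hardware).

```python
def build_set_channels_cmd(dev, **channels):
    """Build the argv for 'ethtool -L <dev> ...' from keyword arguments.

    Accepted channel kinds are the ones ethtool -L understands:
    rx, tx, other and combined.
    """
    allowed = ("rx", "tx", "other", "combined")
    unknown = set(channels) - set(allowed)
    if unknown:
        raise ValueError("unknown channel kind(s): %s" % ", ".join(sorted(unknown)))
    cmd = ["ethtool", "-L", dev]
    for kind in allowed:
        if kind in channels:
            cmd += [kind, str(channels[kind])]
    return cmd


if __name__ == "__main__":
    # Same command as the example in the comment above.
    print(" ".join(build_set_channels_cmd("eno1", rx=4, tx=4)))
```

The resulting list can be passed to subprocess.run() while a ping like the one in the description is running in parallel.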

Comment 5 Renaud Métrich 2023-01-24 12:34:10 UTC
By "Network blocking" I mean that packets are not answered with "Destination Host Unreachable", but are enqueued until the network is re-established.

Comment 6 Jan Žerdík 2023-02-20 16:46:10 UTC
Hi. I implemented a workaround that allows skipping the change of the channels parameter.

https://github.com/redhat-performance/tuned/pull/511

Is it enough for you? From my point of view, the workaround mentioned above without full rollback would unnecessarily complicate the plugin code.

Comment 8 Renaud Métrich 2023-05-24 05:59:44 UTC
Hi Jan,

Sorry for the delay; the customer closed the case after we provided a solution suiting their needs.
I'm wondering whether not restoring the system defaults when stopping would be the solution in the end.

This would solve the case on shutdown, and it would not harm anything on tuned restart either: if a new profile is selected and tuned is restarted to apply it, the changes will be made when starting the new profile, instead of the current workflow, which restores the default settings on tuned stop and then applies the new settings on tuned start.
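The difference between the two workflows can be illustrated with a small model that lists the channel changes applied during a stop followed by a start. This is a hypothetical Python sketch of the reasoning, not TuneD code; it assumes that every applied change corresponds to one ethtool call and therefore one network interruption.

```python
def channel_changes_on_restart(restore_on_stop, old, new, default):
    """Return the channel counts applied during 'tuned stop' + 'tuned start'.

    old:     channel count set by the profile that was active
    new:     channel count wanted by the profile being started
    default: channel count the device had before any tuning
    """
    applied = []
    current = old
    if restore_on_stop and current != default:
        current = default
        applied.append(current)
    if current != new:
        current = new
        applied.append(current)
    return applied


if __name__ == "__main__":
    # Current workflow: restore the default on stop, then apply on start.
    print(channel_changes_on_restart(True, old=1, new=1, default=4))
    # Proposed workflow: leave the setting alone on stop.
    print(channel_changes_on_restart(False, old=1, new=1, default=4))
```

In this model, restarting tuned with the same profile causes two changes in the current workflow and none in the proposed one.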

What do you think?

