Bug 1395899 - cpu-partitioning: set workqueue affinity early
Summary: cpu-partitioning: set workqueue affinity early
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: tuned
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Jaroslav Škarvada
QA Contact: Tereza Cerna
Docs Contact: Lenka Kimlickova
URL:
Whiteboard:
Depends On: 1395860 1414098
Blocks: 1394932 1400961
Reported: 2016-11-16 21:35 UTC by Luiz Capitulino
Modified: 2017-08-01 12:32 UTC (History)

Fixed In Version: tuned-2.8.0-1.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-01 12:32:51 UTC
Target Upstream Version:
Embargoed:




Links
System: Red Hat Product Errata
ID: RHBA-2017:2102
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: tuned bug fix and enhancement update
Last Updated: 2017-08-01 16:07:33 UTC

Description Luiz Capitulino 2016-11-16 21:35:38 UTC
The cpu-partitioning profile should set the workqueue affinity mask (/sys/devices/virtual/workqueue/cpumask) very early in boot, right when the initrd takes control. This prevents module initialization code from running on CPUs isolated with CPUAffinity, where it can create long-lived timers that fire later on those CPUs.

The following procedure creates a dracut module to do that (which should be adopted by the cpu-partitioning profile):

1. # mkdir /usr/lib/dracut/modules.d/01cpumask

2. Create module-setup.sh with the following contents:

#!/bin/bash

check() {
        return 0
}

depends() {
        return 0
}

install() {
        inst_hook pre-udev 00 "$moddir/workqueue-cpumask.sh"
}

3. Create workqueue-cpumask.sh with the following contents:

#!/bin/bash

cpumask=MASK
file=/sys/devices/virtual/workqueue/cpumask

log()
{
        echo "$1" >> /dev/kmsg
}

if [ ! -f "$file" ]; then
        log "ERROR: $file does not exist"
        return
fi

if ! echo "$cpumask" > "$file"; then
        log "ERROR: could not write cpumask"
fi

4. Make both files executable and regenerate the initrd with dracut so that they are included in the initrd image.
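
For illustration only, step 4 could be scripted roughly as follows. The function name install_cpumask_module and its optional "root" argument are hypothetical and not part of dracut or tuned; the argument exists only so the chmod part can be tried against a scratch directory without rebuilding the real initrd.

```bash
#!/bin/bash
# Sketch only: make the two module files executable and, when run against
# the live system (no argument), rebuild the initrd.
# "root" is a hypothetical prefix used for trying this in a scratch directory.
install_cpumask_module() {
        local root=${1:-} moddir
        moddir="$root/usr/lib/dracut/modules.d/01cpumask"

        chmod +x "$moddir/module-setup.sh" "$moddir/workqueue-cpumask.sh" || return 1

        # Only touch the real initrd when no scratch prefix was given.
        if [ -z "$root" ]; then
                # -f overwrites the existing image for the running kernel
                dracut -f
        fi
}
```

Calling install_cpumask_module with no argument as root would apply the change to the running system. Afterwards, lsinitrd can be used to confirm that workqueue-cpumask.sh ended up in the image.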

A few important notes:

1. "MASK" should be the hexadecimal CPU mask of the housekeeping CPUs
2. The log() function should be improved with better error messages and maybe a different log location
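
Since the workqueue cpumask file takes a hexadecimal bitmask, a housekeeping CPU list has to be converted before it can replace MASK. The helper below is a minimal sketch of that conversion. The name cpulist_to_mask is made up for this example, it assumes bash and fewer than 64 CPUs (it relies on 64-bit shell arithmetic), and it is meant to be run when the image is built rather than inside the initrd.

```bash
#!/bin/bash
# Hypothetical helper (not part of tuned or dracut): convert a CPU list
# such as "0-3,8" into a hexadecimal bitmask.
# Assumption: all CPU numbers are below 64.
cpulist_to_mask() {
        local list=$1 mask=0 part first last cpu
        local IFS=,

        for part in $list; do
                # "0-3" gives first=0, last=3; a single CPU "8" gives first=last=8
                first=${part%-*}
                last=${part#*-}
                for ((cpu = first; cpu <= last; cpu++)); do
                        mask=$((mask | (1 << cpu)))
                done
        done

        printf '%x\n' "$mask"
}
```

For example, cpulist_to_mask 0-3 prints f, and that value could then be substituted for MASK in workqueue-cpumask.sh.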

Comment 1 Jeremy Eder 2016-11-17 12:48:39 UTC
Hmm, are you proposing this be part of tuned itself? I can see other profiles requiring the same things, so I'd hope so.

Comment 2 Luiz Capitulino 2016-11-17 13:50:58 UTC
My initial idea was to make it part of the profile. That is, script.sh could implement the steps from the description in start() and then stop() could just remove 01cpumask and re-generate the initrd image.

But having it as part of tuned looks like a good idea too. I think the only problem would be to come up with a good abstraction in tuned.conf for setting this up given that dracut modules have several different options as to when the script is run in initrd.

Comment 3 Jeremy Eder 2016-11-17 13:56:51 UTC
Understood.  To me this sounds like "add dracut support to tuned" and then the first user of said feature would be this bugzilla.

Comment 4 Jaroslav Škarvada 2017-03-01 15:51:50 UTC
I think this could be neatly resolved with a static initrd overlay (with the help of RFE bug 1414098) together with the cpuaffinity kernel/systemd boot option (which was proposed in https://github.com/systemd/systemd/issues/5368). Even though that boot option is not yet supported by systemd, we can handle it in our own scripts, e.g.:

[bootloader]
cmdline=cpuaffinity=DESIRED_AFFINITY
initrd_add_dir=tuned-overlay.img

$ cat tuned-overlay.img/usr/lib/dracut/hooks/pre-udev/00-tuned-pre-udev.sh
#!/bin/sh

type getarg >/dev/null 2>&1 || . /lib/dracut-lib.sh

cpuaffinity="$(getargs cpuaffinity)"

echo "tuned setting CPUAffinity to $cpuaffinity" > /dev/kmsg

----
Well, you need the CPU mask, so we could either calculate it from the CPUAffinity in the script, or just pass the CPU mask on the cmdline instead of the CPUAffinity, e.g.:

[bootloader]
cmdline=cpumask=DESIRED_CPUMASK
initrd_add_dir=tuned-overlay.img

$ cat tuned-overlay.img/usr/lib/dracut/hooks/pre-udev/00-tuned-pre-udev.sh
#!/bin/sh

type getarg >/dev/null 2>&1 || . /lib/dracut-lib.sh

cpumask="$(getargs cpumask)"

file=/sys/devices/virtual/workqueue/cpumask

log()
{
        echo "$1" >> /dev/kmsg
}

if ! echo "$cpumask" > "$file" 2>/dev/null; then
        log "ERROR: could not write cpumask"
fi

To avoid conflicts, we can name the cmdline parameter e.g. tuned.cpumask or tuned-cpumask or whatever. All of this should now work out of the box.

Also you can pre-generate initrd.img fully statically and use the following:

[bootloader]
cmdline=cpumask=DESIRED_CPUMASK
initrd_add_img=tuned-overlay.img

I.e., Tuned will not regenerate tuned-overlay.img, because it is not needed.

Comment 5 Jaroslav Škarvada 2017-03-03 18:06:21 UTC
Upstream commit introducing functionality mentioned in comment 4:
https://github.com/redhat-performance/tuned/commit/64decfe7568007cb2fde3a679ef05e366014214f

Available in:
tuned-2.7.1-1.20170303git64decfe7.el7

from:
https://jskarvad.fedorapeople.org/tuned/devel/repo/

Comment 6 Luiz Capitulino 2017-03-06 19:19:15 UTC
This is working as expected. I can see in dmesg that the workqueue cpumask is being set. Also, the initial zero-loss test case passed. I'm now running a long-duration test.

Comment 7 Luiz Capitulino 2017-03-07 14:02:10 UTC
My tests passed, very well done!

The only detail is that I think we should change the kernel parameter "tuned.cpumask" to something more meaningful. This parameter will be visible for the entire lifetime of the system, and "cpumask" is too generic.

Here are some suggestions (in order of preference):

tuned.housekeeping_mask
tuned.workqueue_mask
tuned.early_workqueue_mask
tuned.non_isolcpus

Comment 8 Jaroslav Škarvada 2017-03-07 14:17:51 UTC
(In reply to Luiz Capitulino from comment #7)
> My tests passed, very well done!
> 
> The only detail is that I think we should change the kernel parameter
> "tuned.cpumask" to something more meaningful. This parameter will be
> visible for the entire lifetime of the system, and "cpumask" is too
> generic.
> 
> Here are some suggestions (in order of preference):
> 
> tuned.housekeeping_mask
> tuned.workqueue_mask
> tuned.early_workqueue_mask
> tuned.non_isolcpus

I like this one the most:
tuned.non_isolcpus

because in theory we can later reuse it for other things as well, not only for the workqueue mask.

Comment 9 Luiz Capitulino 2017-03-07 14:22:03 UTC
tuned.non_isolcpus it is :)

Comment 10 Jaroslav Škarvada 2017-03-07 14:40:49 UTC
Upstream commit changing the kernel boot command line parameter to tuned.non_isolcpus:
https://github.com/redhat-performance/tuned/commit/6906738603b92f34f042486a8e9e924ac8531a2a

Available in:
tuned-2.7.1-1.20170307git69067386.el7

Comment 12 Tereza Cerna 2017-04-26 10:21:23 UTC
Verified in:
    tuned-2.8.0-2.el7.noarch
PASS


::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [   LOG    ] :: Check setting of workqueue affinity [BZ#1395899]
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

:: [   PASS   ] :: Check message in dmesg (Expected 0, got 0)
:: [   PASS   ] :: Check isolates cores in cpumask file (Expected 0, got 0)
:: [   LOG    ] :: Duration: 1s
:: [   LOG    ] :: Assertions: 2 good, 0 bad
:: [   PASS   ] :: RESULT: Check setting of workqueue affinity [BZ#1395899]
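
A check similar to the second assertion above might look roughly like the sketch below. This is not the actual QA test. The helper names are hypothetical, bash 4 is assumed, and it assumes that tuned.non_isolcpus carries a hexadecimal mask that matches the sysfs file apart from commas, a "0x" prefix, letter case and leading zeros.

```bash
#!/bin/bash
# Hypothetical helpers for comparing the boot parameter with the applied mask.

# Print the value of NAME=value found in a kernel command line string.
get_cmdline_arg() {
        local name=$1 cmdline=$2 word
        local -a words

        # Split on whitespace without triggering pathname expansion.
        read -r -a words <<< "$cmdline"
        for word in "${words[@]}"; do
                case $word in
                        "$name"=*) printf '%s\n' "${word#*=}"; return 0 ;;
                esac
        done
        return 1
}

# Normalize a hex CPU mask: lowercase it, drop commas, an optional 0x prefix
# and leading zeros.
normalize_mask() {
        local mask=${1//,/}

        mask=${mask,,}
        mask=${mask#0x}
        while [ "${#mask}" -gt 1 ] && [ "${mask#0}" != "$mask" ]; do
                mask=${mask#0}
        done
        printf '%s\n' "$mask"
}

# Succeeds when the workqueue cpumask matches the tuned.non_isolcpus parameter.
check_workqueue_mask() {
        local want have

        want=$(get_cmdline_arg tuned.non_isolcpus "$(cat /proc/cmdline)") || return 1
        have=$(cat /sys/devices/virtual/workqueue/cpumask) || return 1
        [ "$(normalize_mask "$want")" = "$(normalize_mask "$have")" ]
}
```

On a system booted with the profile active, check_workqueue_mask is expected to return 0 as long as nothing has changed the workqueue cpumask since boot.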

Comment 16 errata-xmlrpc 2017-08-01 12:32:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2102

