Bug 1875275

Summary: Failure to enter full_nohz due to needless SCHED softirqs
Product: Red Hat Enterprise Linux 8 Reporter: Juri Lelli <jlelli>
Component: kernel-rt Assignee: Juri Lelli <jlelli>
kernel-rt sub component: Scheduler QA Contact: Qiao Zhao <qzhao>
Status: CLOSED ERRATA Docs Contact:
Severity: unspecified    
Priority: unspecified CC: atheurer, bhu, jmario, kcarcia, keyoung, lcapitulino, lgoncalv, mstowell, mtosatti, pauld, qzhao, williams
Version: 8.4 Keywords: Reopened, Triaged, ZStream
Target Milestone: rc   
Target Release: 8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: kernel-rt-4.18.0-326.rt7.107.el8 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1990272 1990273 Environment:
Last Closed: 2021-11-09 17:28:30 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1990272, 1990273    

Description Juri Lelli 2020-09-03 07:58:17 UTC
Description of problem:
Running the sysjitter workload on a system that uses the tuned
cpu-partitioning profile results in an excessive number of interrupts.

Debugging performed in bz1833196 resulted in a set of patches to reduce
perturbation from timers; however, it also highlighted that we still need
to fix an additional case related to the scheduler's use of the
SD_LOAD_BALANCE flag (bz1833196#c38).

This bug tracks resolution of the remaining issue, either by backporting
the upstream patches that remove the SD_LOAD_BALANCE flag or by applying
the RHEL-only fix proposed in the bz above (bz1833196#c48).

Steps to Reproduce:
1. Install perf and sysjitter (https://github.com/alexeiz/sysjitter)
2. Start the cpu-partitioning profile with isolated and no-balance CPUs
3. On an isolated/no-balance CPU (CPU 4 here), run:
   perf stat -C 4 -e irq_vectors:local_timer_entry taskset --cpu-list 4 ./sysjitter/sysjitter --runtime 10 200
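
Before running step 3, it can help to confirm that the target CPU really is isolated. The sketch below is not part of the original report. It assumes the kernel exposes the standard sysfs CPU lists /sys/devices/system/cpu/isolated and /sys/devices/system/cpu/nohz_full, and the helper names (parse_cpu_list, cpu_in_sysfs_list) are hypothetical.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '2-5,8' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty list (no CPUs configured)
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def cpu_in_sysfs_list(cpu, name):
    """Return True/False if `cpu` is in the named sysfs list, None if unreadable."""
    path = Path("/sys/devices/system/cpu") / name
    try:
        return cpu in parse_cpu_list(path.read_text())
    except (OSError, ValueError):
        # File missing, or content that is not a plain CPU list.
        return None


if __name__ == "__main__":
    # CPU 4 is the CPU used in the reproduction steps above.
    for name in ("isolated", "nohz_full"):
        print(name, cpu_in_sysfs_list(4, name))
```

If either line prints False or None, the perf stat numbers from step 3 say little about this bug, since the CPU was never expected to stop its tick.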

Actual results:
Very high number (thousands) of irq_vectors:local_timer_entry events, e.g.

[root@rt-qe-04 ~]# perf stat -C 4 -e irq_vectors:local_timer_entry taskset --cpu-list 4 ./sysjitter/sysjitter --runtime 10 200
core_i: 4
threshold(ns): 200
cpu_mhz: 2398
runtime(ns): 9995739030
runtime(s): 9.996
int_n: 10005
int_n_per_sec: 1000.927
int_min(ns): 2760
int_median(ns): 6025
int_mean(ns): 6056
int_90(ns): 6086
int_99(ns): 6685
int_999(ns): 6775
int_9999(ns): 8123
int_99999(ns): 12657
int_max(ns): 12657
int_total(ns): 60598000
int_total(%): 0.606

 Performance counter stats for 'CPU(s) 4':

            11,007      irq_vectors:local_timer_entry

Expected results:
Very low number (double digits) of irq_vectors:local_timer_entry events, e.g.

[root@rt-qe-04 ~]# perf stat -C 4 -e irq_vectors:local_timer_entry taskset --cpu-list 4 ./sysjitter/sysjitter --runtime 10 200
core_i: 4
threshold(ns): 200
cpu_mhz: 2398
runtime(ns): 9995639269
runtime(s): 9.996
int_n: 4
int_n_per_sec: 0.400
int_min(ns): 2807
int_median(ns): 6078
int_mean(ns): 5721
int_90(ns): 10602
int_99(ns): 10602
int_999(ns): 10602
int_9999(ns): 10602
int_99999(ns): 10602
int_max(ns): 10602
int_total(ns): 22887
int_total(%): 0.000

 Performance counter stats for 'CPU(s) 4':

                42      irq_vectors:local_timer_entry
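
To compare the two runs above, the sysjitter summary lines can be parsed and the interruption rate recomputed. This is only an illustrative sketch: the helper names are hypothetical, and the sample strings are a subset of the output shown above. In the failing run the rate works out to roughly 1000 interruptions per second, which would be consistent with a periodic tick that never stops, although that interpretation is an assumption rather than something the output itself states.

```python
def parse_sysjitter(text):
    """Collect 'key: number' lines from sysjitter output into a dict of floats."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'key: value' line
        try:
            stats[key.strip()] = float(value)
        except ValueError:
            pass  # value is not numeric (e.g. a shell prompt line)
    return stats


def interruptions_per_second(stats):
    """Interruptions divided by the measured runtime in seconds."""
    return stats["int_n"] / (stats["runtime(ns)"] / 1e9)


def interrupted_percent(stats):
    """Share of the runtime spent interrupted, in percent."""
    return 100.0 * stats["int_total(ns)"] / stats["runtime(ns)"]


# Values copied from the "Actual results" run.
ACTUAL = """\
runtime(ns): 9995739030
int_n: 10005
int_total(ns): 60598000
"""

# Values copied from the "Expected results" run.
EXPECTED = """\
runtime(ns): 9995639269
int_n: 4
int_total(ns): 22887
"""

if __name__ == "__main__":
    for label, text in (("actual", ACTUAL), ("expected", EXPECTED)):
        stats = parse_sysjitter(text)
        print(label, interruptions_per_second(stats), interrupted_percent(stats))
```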

Comment 26 errata-xmlrpc 2021-11-09 17:28:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: kernel-rt security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4140