Note: This bug is displayed in read-only format because
the product is no longer active in Red Hat Bugzilla.
RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
DescriptionLuiz Capitulino
2016-06-20 20:54:53 UTC
Description of problem:
When executing sysjitter on the RT kernel with the nfv tuned profile applied, the maximum latency is 5x higher than the non-RT kernel (which also have the nfv tuned profile applied):
3.10.0-430.rt56.316.el7.x86_64
------------------------------
int_max(ns): 972369 45512 13058424 46733 6313047 46932 4834804 45903 46667 47638 48078 46666 46197 48170 47511 48357
3.10.0-430.el7.x86_64
---------------------
int_max(ns): 6173124 17081 11391642 9792 10461025 9041 9409563 8890 9389 9372 8991 9341 9591 9296 9015 9872
Each value in the in_max(ns) line represents a CPU from 0 to 15. CPUs with values greater than 5 digits are for housekeeping, so we can ignore those. If we compare CPU15 on both kernels we get:
RT kernel = 48us
non-RT kernel = 9.8us
The latency is almost 5x higher in the RT kernel.
Version-Release number of selected component (if applicable): 3.10.0-430.rt56.316.el7.x86_64
How reproducible:
Steps to Reproduce:
1. Setup system for real-time and apply the nfv tuned profile
2. Download sysjitter from http://www.openonload.org/download/sysjitter/
3. Build and run it with:
# ./sysjitter --runtime 60 300
NOTE: I was debugging this issue and will post the information that I have as a comment. I'm creating this BZ because I'll go out on vacation for 3 weeks. It's not a huge deal to wait for me to come back to keep debugging this (I'd actually love to do this)
According to my debugging, the problem is that the ktimersoftd kernel thread preempts the sysjitter thread and executes for +40us. The ktimersoftd thread spends most of this 40us in the outer loop of __run_timers() where it runs almost 10K iterations:
sysjitter-15844 [015] [015] 26121.947647: sched_switch: prev_comm=sysjitter prev_pid=15844 prev_prio=120 prev_state=R+ ==> next_comm=kworker/15:1 next_pid=11873 next_prio=120
kworker/15:1-11873 [015] [015] 26121.947653: sched_switch: prev_comm=kworker/15:1 prev_pid=11873 prev_prio=120 prev_state=S ==> next_comm=sysjitter next_pid=15844 next_prio=120
sysjitter-15844 [015] [015] 26131.931832: sched_switch: prev_comm=sysjitter prev_pid=15844 prev_prio=120 prev_state=R+ ==> next_comm=ktimersoftd/15 next_pid=165 next_prio=98
ktimersoftd/15-165 [015] [015] 26131.931833: funcgraph_entry: | run_timer_softirq() {
ktimersoftd/15-165 [015] [015] 26131.931833: bputs: run_timer_softirq: calling irq_work
ktimersoftd/15-165 [015] [015] 26131.931833: bputs: run_timer_softirq: done irq_work
ktimersoftd/15-165 [015] [015] 26131.931834: bputs: run_timer_softirq: starting main loop
ktimersoftd/15-165 [015] [015] 26131.931876: timer_expire_entry: timer=0xffff88086d3d3da0 function=delayed_work_timer_fn now=4320798976
ktimersoftd/15-165 [015] [015] 26131.931877: timer_expire_exit: timer=0xffff88086d3d3da0
ktimersoftd/15-165 [015] [015] 26131.931878: bprint: run_timer_softirq: exiting main loop: inner 1 outer: 9986
See how it takes 42us between the printing of "starting main loop" (which is just before the first while() loop in __run_timers()) and the calling of delayed_work_timer_fn(). Also, before exiting we see that the first while loop was iterated 9986 times.
I can't reproduce this issue with kernel-rt-3.10.0-505.rt56.409. Latencies are max=16us. Closing as WORKSFORME (which is strange, since I opened the BZ myself).