RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1348337 - sysjitter gives higher latency on the RT kernel
Summary: sysjitter gives higher latency on the RT kernel
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel-rt
Version: 7.3
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: rc
: ---
Assignee: Luiz Capitulino
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks: 1175461
TreeView+ depends on / blocked
 
Reported: 2016-06-20 20:54 UTC by Luiz Capitulino
Modified: 2016-09-09 17:33 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-09-09 17:33:12 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Luiz Capitulino 2016-06-20 20:54:53 UTC
Description of problem:

When executing sysjitter on the RT kernel with the nfv tuned profile applied, the maximum latency is 5x higher than the non-RT kernel (which also have the nfv tuned profile applied):

3.10.0-430.rt56.316.el7.x86_64
------------------------------
int_max(ns): 972369 45512 13058424 46733 6313047 46932 4834804 45903 46667 47638 48078 46666 46197 48170 47511 48357

3.10.0-430.el7.x86_64
---------------------
int_max(ns): 6173124 17081 11391642 9792 10461025 9041 9409563 8890 9389 9372 8991 9341 9591 9296 9015 9872

Each value in the in_max(ns) line represents a CPU from 0 to 15. CPUs with values greater than 5 digits are for housekeeping, so we can ignore those. If we compare CPU15 on both kernels we get:

RT kernel = 48us
non-RT kernel = 9.8us

The latency is almost 5x higher in the RT kernel.

Version-Release number of selected component (if applicable): 3.10.0-430.rt56.316.el7.x86_64


How reproducible:


Steps to Reproduce:
1. Setup system for real-time and apply the nfv tuned profile
2. Download sysjitter from http://www.openonload.org/download/sysjitter/
3. Build and run it with:

# ./sysjitter --runtime 60 300

NOTE: I was debugging this issue and will post the information that I have as a comment. I'm creating this BZ because I'll go out on vacation for 3 weeks. It's not a huge deal to wait for me to come back to keep debugging this (I'd actually love to do this)

Comment 1 Luiz Capitulino 2016-06-20 21:02:47 UTC
According to my debugging, the problem is that the ktimersoftd kernel thread preempts the sysjitter thread and executes for +40us. The ktimersoftd thread spends most of this 40us in the outer loop of __run_timers() where it runs almost 10K iterations:

       sysjitter-15844 [015] [015] 26121.947647: sched_switch:         prev_comm=sysjitter prev_pid=15844 prev_prio=120 prev_state=R+ ==> next_comm=kworker/15:1 next_pid=11873 next_prio=120
    kworker/15:1-11873 [015] [015] 26121.947653: sched_switch:         prev_comm=kworker/15:1 prev_pid=11873 prev_prio=120 prev_state=S ==> next_comm=sysjitter next_pid=15844 next_prio=120
       sysjitter-15844 [015] [015] 26131.931832: sched_switch:         prev_comm=sysjitter prev_pid=15844 prev_prio=120 prev_state=R+ ==> next_comm=ktimersoftd/15 next_pid=165 next_prio=98
  ktimersoftd/15-165   [015] [015] 26131.931833: funcgraph_entry:                   |  run_timer_softirq() {
  ktimersoftd/15-165   [015] [015] 26131.931833: bputs:                run_timer_softirq: calling irq_work
  ktimersoftd/15-165   [015] [015] 26131.931833: bputs:                run_timer_softirq: done irq_work
  ktimersoftd/15-165   [015] [015] 26131.931834: bputs:                run_timer_softirq: starting main loop
  ktimersoftd/15-165   [015] [015] 26131.931876: timer_expire_entry:   timer=0xffff88086d3d3da0 function=delayed_work_timer_fn now=4320798976
  ktimersoftd/15-165   [015] [015] 26131.931877: timer_expire_exit:    timer=0xffff88086d3d3da0
  ktimersoftd/15-165   [015] [015] 26131.931878: bprint:               run_timer_softirq: exiting main loop: inner 1 outer: 9986

See how it takes 42us between the printing of "starting main loop" (which is just before the first while() loop in __run_timers()) and the calling of delayed_work_timer_fn(). Also, before exiting we see that the first while loop was iterated 9986 times.

Comment 2 Luiz Capitulino 2016-08-30 16:07:39 UTC
Moving to 7.4. This looks like a real issue for real-time, but we have no known reports of this issue and it doesn't affect our cyclictest test-cases.

Comment 3 Luiz Capitulino 2016-09-09 17:33:12 UTC
I can't reproduce this issue with kernel-rt-3.10.0-505.rt56.409. Latencies are max=16us. Closing as WORKSFORME (which is strange, since I opened the BZ myself).


Note You need to log in before you can comment on or make changes to this bug.