Bug 1304001

Summary: Timer interrupt handling can take up to 20 usec on RT kernel
Product: Red Hat Enterprise Linux 7 Reporter: Andrew Theurer <atheurer>
Component: kernel-rt Assignee: Clark Williams <williams>
kernel-rt sub component: Memory Management QA Contact: Jiri Kastner <jkastner>
Status: CLOSED INSUFFICIENT_DATA Docs Contact:
Severity: medium    
Priority: unspecified CC: atheurer, bhu, jshortt, krister, lgoncalv, srostedt, williams
Version: 7.2   
Target Milestone: rc   
Target Release: 7.3   
Hardware: x86_64   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-05-10 21:53:27 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1274397    

Description Andrew Theurer 2016-02-02 16:32:12 UTC
Description of problem:
Timer interrupt handling uses up to 20 usec

Version-Release number of selected component (if applicable):
3.10.0-327.rt56.204.el7.x86_64

How reproducible:


Steps to Reproduce:
1. Run a CPU-bound process on an isolated CPU (isolcpus)
2. Trace task switches on that CPU
3. Record the time between the task switches where ksoftirqd is switched in and out
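
A minimal sketch of step 3, assuming the trace was captured as ftrace-style sched_switch text output. The regular expression, the function name ksoftirqd_run_times, and the two sample lines are illustrative assumptions, not data from this bug; the exact field layout can differ between kernel versions.

```python
import re

# Assumed ftrace sched_switch text layout; adjust if your kernel prints it differently.
SWITCH_RE = re.compile(
    r"\[(?P<cpu>\d+)\].*?\s(?P<ts>\d+\.\d+): sched_switch: "
    r"prev_comm=(?P<prev>.+?) prev_pid=\d+.*?==> "
    r"next_comm=(?P<next>.+?) next_pid=\d+"
)


def ksoftirqd_run_times(lines):
    """Return (cpu, duration_usec) for each ksoftirqd switch-in/switch-out pair."""
    switched_in = {}  # cpu -> timestamp at which ksoftirqd was switched in
    runs = []
    for line in lines:
        m = SWITCH_RE.search(line)
        if m is None:
            continue  # not a sched_switch event
        cpu = int(m.group("cpu"))
        ts = float(m.group("ts"))
        if m.group("next").startswith("ksoftirqd/"):
            switched_in[cpu] = ts
        elif m.group("prev").startswith("ksoftirqd/") and cpu in switched_in:
            runs.append((cpu, (ts - switched_in.pop(cpu)) * 1e6))
    return runs


# Made-up sample lines, for illustration only (not taken from a real trace).
SAMPLE = [
    "  dpdk-app-2001  [003] d...2  1000.000100: sched_switch: "
    "prev_comm=dpdk-app prev_pid=2001 prev_prio=4 prev_state=R ==> "
    "next_comm=ksoftirqd/3 next_pid=31 next_prio=98",
    "  ksoftirqd/3-31  [003] d...2  1000.000118: sched_switch: "
    "prev_comm=ksoftirqd/3 prev_pid=31 prev_prio=98 prev_state=S ==> "
    "next_comm=dpdk-app next_pid=2001 next_prio=4",
]

if __name__ == "__main__":
    for cpu, usec in ksoftirqd_run_times(SAMPLE):
        print(f"cpu {cpu}: ksoftirqd ran for {usec:.1f} usec")
```

Feeding it the lines of a saved trace file instead of SAMPLE gives one duration per ksoftirqd run, which can then be compared against the 20 usec figure reported here.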

Actual results:
We see that the cpu bound process is preempted for ksoftirqd, which does timer interrupt work, and then switches back to the cpu bound process.  This work can take up to 20 usec

Expected results:
Timer interrupt work should not switch thread context and should complete much faster (perhaps 2-3 usec, not sure of the exact amount)

Additional info:
This is currently observed with a DPDK program running at fifo:95

Comment 2 Steven Rostedt 2016-02-02 16:51:51 UTC
What exactly is the bound task doing? ksoftirqd does timer work, and a timer interrupt can run. And things like task usage accounting can take 20 us.

Comment 3 Steven Rostedt 2016-02-02 16:55:31 UTC
Also, have you added nohz_full and rcu_nocbs to that isolated CPU as well? That could help too.
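
One way to check which of these boot parameters cover a given CPU is to parse the kernel command line. The sketch below is an assumption-laden helper, not an existing tool: the function names parse_cpu_list and isolation_report are hypothetical, and it only understands simple CPU lists such as "1-3,5" (no stride syntax; non-numeric tokens are ignored). The kernel parameter for RCU callback offloading is spelled rcu_nocbs.

```python
def parse_cpu_list(spec):
    """Expand a simple CPU list such as '1-3,5' into a set of ints."""
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        elif part.isdigit():
            cpus.add(int(part))
        # anything else (empty string, flag words) is skipped
    return cpus


def isolation_report(cmdline, cpu):
    """Report whether `cpu` is listed in isolcpus, nohz_full and rcu_nocbs."""
    params = dict(tok.split("=", 1) for tok in cmdline.split() if "=" in tok)
    return {
        name: cpu in parse_cpu_list(params.get(name, ""))
        for name in ("isolcpus", "nohz_full", "rcu_nocbs")
    }


if __name__ == "__main__":
    # /proc/cmdline holds the boot parameters of the running kernel.
    with open("/proc/cmdline") as f:
        print(isolation_report(f.read(), cpu=3))
```

A False entry for a parameter means the CPU is not covered by it on the running kernel, so that parameter would still need to be added at boot.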

Comment 4 Andrew Theurer 2016-02-02 17:45:43 UTC
We have nohz_full, but not rcu_nocbs yet, so we will add that.  I don't believe the user thread is doing anything other than some futex calls once in a while.  We'll trace to see if anything else is going on with the user thread.

Comment 5 Beth Uptagrafft 2016-03-08 17:56:33 UTC
Andrew, were you able to do a trace to see if anything else was going on?

Thanks,
Beth

Comment 6 Andrew Theurer 2016-03-08 19:13:38 UTC
Karl, would it be possible to do another trace to see what is happening during the timer interrupt?

Comment 7 Beth Uptagrafft 2016-03-28 14:25:57 UTC
Andrew/Karl, any updates?

Comment 8 Beth Uptagrafft 2016-05-10 19:51:23 UTC
Andrew/Karl, any updates? If we can't get any additional updates, we will need to close this BZ.

Comment 9 Andrew Theurer 2016-05-10 21:41:51 UTC
We have not done a trace, and I am not sure when we will have an opportunity to do so.  We can close this and reopen when we have a trace.

Comment 10 Beth Uptagrafft 2016-05-10 21:53:27 UTC
Thank you, Andrew. As you suggest, we will close this issue. Please reopen the BZ if you do get a trace and we will take a look at it.