Bug 1837816
| Summary: | NMI backtrace for cpu 0 when run oslat | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Qiao Zhao <qzhao> |
| Component: | kernel-rt | Assignee: | Juri Lelli <jlelli> |
| kernel-rt sub component: | Core-Kernel | QA Contact: | Qiao Zhao <qzhao> |
| Status: | CLOSED NOTABUG | Docs Contact: | |
| Severity: | unspecified | ||
| Priority: | unspecified | CC: | bhu, lgoncalv, peterx, qzhao, rt-qe, williams |
| Version: | 8.2 | Flags: | pm-rhel: mirror+ |
| Target Milestone: | rc | ||
| Target Release: | 8.3 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-05-28 01:09:24 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Qiao Zhao
2020-05-20 04:03:23 UTC
> Steps to Reproduce:
> 1. Clone oslat https://github.com/xzpeter/oslat
> 2. Compile and run oslat with "./oslat --cpu-list 0,1 --rtprio 1 --runtime
> 10", or with any cpu-list that includes CPU0.
> 3. Check the console log/dmesg, where a call trace can be seen.
Oh, so this is Peter's new sysjitter-like tool, isn't it?
I've got the impression that running it on non-isolated CPUs
(is this the case?) might simply starve system threads (e.g., RCU
callback threads).
Reading the GitHub README, it seems that the tool is intended to be
run on fully isolated CPUs. Peter, is my impression correct?
Would you expect something like what's described in this BZ to
happen if oslat threads are run on housekeeping/non-isolated
CPUs?
Thanks!
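Whether the CPUs passed to --cpu-list are isolated can be checked before a run. The following is a minimal sketch, assuming the kernel exposes the isolated set in /sys/devices/system/cpu/isolated in the usual range-list form (for example "2-7,10"). That file reflects CPUs isolated with the isolcpus= boot parameter; other isolation mechanisms may not show up there. The helper names are hypothetical and are not part of oslat.

```python
from pathlib import Path

# sysfs file listing CPUs isolated via isolcpus= (may be empty)
ISOLATED_PATH = Path("/sys/devices/system/cpu/isolated")


def parse_cpu_ranges(text):
    """Parse a kernel-style CPU list such as "0,2-4" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty string means no CPUs are listed
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def non_isolated(requested, isolated):
    """Return the requested CPUs that are missing from the isolated set."""
    return sorted(parse_cpu_ranges(requested) - parse_cpu_ranges(isolated))


if __name__ == "__main__":
    isolated = ISOLATED_PATH.read_text() if ISOLATED_PATH.exists() else ""
    leftover = non_isolated("0,1", isolated)
    if leftover:
        print("requested CPUs that are not isolated:", leftover)
```

If the list printed for "0,1" is non-empty, the run from the steps above would place measurement threads on housekeeping CPUs.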
(In reply to Juri Lelli from comment #4)
> > Steps to Reproduce:
> > 1. Clone oslat https://github.com/xzpeter/oslat
> > 2. Compile and run oslat with "./oslat --cpu-list 0,1 --rtprio 1 --runtime
> > 10", or with any cpu-list that includes CPU0.
> > 3. Check the console log/dmesg, where a call trace can be seen.
>
> Oh, so this is Peter's new sysjitter-like tool, isn't it?

Yes, it's Peter's new tool oslat (old name sysjitter2).

> I've got the impression that running it on non-isolated CPUs
> (is this the case?) might simply starve system threads (e.g., RCU
> callback threads).
>
> Reading the GitHub README, it seems that the tool is intended to be
> run on fully isolated CPUs. Peter, is my impression correct?
> Would you expect something like what's described in this BZ to
> happen if oslat threads are run on housekeeping/non-isolated
> CPUs?
>
> Thanks!

Qiao,

Firstly, I don't think we can run oslat (or any program) using FIFO:99. For example, the ksoftirqd threads use FIFO:2 on most kernels, so FIFO:99 means that none of the ksoftirqd threads (or other kernel threads that we still want) will be scheduled properly. So when running these tests, I always use FIFO:1.

Meanwhile, there is indeed a tricky point in oslat: we can't run with a FIFO priority (even FIFO:1) on cpu0, because cpu0 is used by default by the main thread of oslat to receive stop signals. So "--rtprio N --cpu-list 0,..." used together means it will probably hang forever, because the stop signal will not be received correctly. I've also added a warning to the oslat tool (v0.1.1 pushed) to show that this is expected. Let's just run without cpu0 for simplicity, or you can lift the main thread priority too to make it work again:

sudo chrt -f 2 ./oslat --rtprio 1 --cpu-list 0 --runtime 1

Then the main thread will be lifted to FIFO:2, so SIGALRM will still get through.
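The workaround in comment #6 can be sketched as a small wrapper that builds the command line. This is only an illustration under stated assumptions: the function name and its parameters are hypothetical, oslat itself does not do this (v0.1.1 only prints a warning), and picking "worker priority plus one" for the main thread simply generalizes the FIFO:2 over FIFO:1 example from the comment.

```python
def oslat_command(cpus, rtprio, runtime, binary="./oslat"):
    """Build an argv list for oslat.

    When CPU 0 is measured with a FIFO priority, prefix the command with
    chrt so the main thread (on CPU 0 by default) outranks the workers.
    """
    argv = [
        binary,
        "--rtprio", str(rtprio),
        "--cpu-list", ",".join(str(cpu) for cpu in cpus),
        "--runtime", str(runtime),
    ]
    if rtprio > 0 and 0 in cpus:
        if rtprio >= 99:
            # FIFO priorities stop at 99, so there is no room above the workers.
            raise ValueError("rtprio must be below 99 when CPU 0 is in the list")
        # Assumption: one level above the workers is enough for the stop
        # signal to be handled, as with FIFO:2 over FIFO:1 in comment #6.
        argv = ["chrt", "-f", str(rtprio + 1)] + argv
    return argv
```

For example, oslat_command([0], 1, 1) yields the same arguments as the chrt command in comment #6 (without sudo), while a list that leaves out CPU 0 is passed through unchanged.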
(In reply to Peter Xu from comment #6)

Hi Peter,

Thanks for your explanation and for the fix/update to the code. The results/format look good now (v0.1.3). :)

- Qiao

(In reply to Qiao Zhao from comment #7)
> Thanks for your explanation and for the fix/update to the code. The
> results/format look good now (v0.1.3). :)

Good. Should we close this then?

Thanks!

(In reply to Juri Lelli from comment #8)
> Good. Should we close this then?

Sure, we can close this bug as not a bug.

- Qiao