Bug 1886109 - BUG: using smp_processor_id() in preemptible [00000000] code: handler106/3082 [rhel-rt-8.4.0]
Summary: BUG: using smp_processor_id() in preemptible [00000000] code: handler106/3082...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: kernel-rt
Version: 8.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: 8.0
Assignee: Juri Lelli
QA Contact: Rona Gugliemino
URL:
Whiteboard:
Duplicates: 1887805 (view as bug list)
Depends On: 1888237
Blocks: 1883636 1898189
Reported: 2020-10-07 16:47 UTC by Juri Lelli
Modified: 2024-06-13 23:11 UTC (History)

Fixed In Version: kernel-rt-4.18.0-255.rt7.20.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1888237 (view as bug list)
Environment:
Last Closed: 2021-05-18 15:12:49 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1885850 1 None None None 2023-09-28 08:04:42 UTC

Description Juri Lelli 2020-10-07 16:47:10 UTC
Description of problem:
When running oslat tests on RHCOS 4.6 (openshift) lots of dmesg BUG splats
like the following are printed:

[100848.465045] BUG: using smp_processor_id() in preemptible [00000000] code: handler106/3082
[100848.465047] caller is flow_lookup.isra.15+0x2c/0xf0 [openvswitch]
[100848.465049] CPU: 46 PID: 3082 Comm: handler106 Not tainted 4.18.0-193.24.1.rt13.74.el8_2.dt1.x86_64 #1
[100848.465049] Hardware name: Dell Inc. PowerEdge R640/06DKY5, BIOS 2.5.4 01/13/2020
[100848.465049] Call Trace:
[100848.465051]  dump_stack+0x5c/0x80
[100848.465052]  check_preemption_disabled+0xc4/0xd0
[100848.465054]  flow_lookup.isra.15+0x2c/0xf0 [openvswitch]
[100848.465057]  ovs_flow_tbl_lookup+0x3b/0x60 [openvswitch]
[100848.465060]  ovs_flow_cmd_new+0x2d8/0x430 [openvswitch]
[100848.465061]  ? __switch_to_asm+0x35/0x70
[100848.465062]  ? __switch_to_asm+0x41/0x70
[100848.465063]  ? __switch_to_asm+0x35/0x70
[100848.465067]  genl_family_rcv_msg+0x1d7/0x410
[100848.465069]  ? migrate_enable+0x123/0x3a0
[100848.465071]  genl_rcv_msg+0x47/0x8c
[100848.465072]  ? __kmalloc_node_track_caller+0xff/0x2e0
[100848.465074]  ? genl_family_rcv_msg+0x410/0x410
[100848.465075]  netlink_rcv_skb+0x4c/0x120
[100848.465077]  genl_rcv+0x24/0x40
[100848.465078]  netlink_unicast+0x197/0x230
[100848.465080]  netlink_sendmsg+0x204/0x3d0
[100848.465081]  sock_sendmsg+0x4c/0x50
[100848.465082]  ___sys_sendmsg+0x29f/0x300
[100848.465084]  ? migrate_enable+0x123/0x3a0
[100848.465085]  ? ep_send_events_proc+0x8a/0x1f0
[100848.465087]  ? ep_scan_ready_list.constprop.23+0x237/0x260
[100848.465087]  ? rt_spin_unlock+0x23/0x40
[100848.465089]  ? ep_poll+0x1b3/0x390
[100848.465090]  ? __fget+0x72/0xa0
[100848.465092]  __sys_sendmsg+0x57/0xa0
[100848.465093]  do_syscall_64+0x87/0x1a0
[100848.465094]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[100848.465095] RIP: 0033:0x7f1ed72ccb07
[100848.465096] Code: c3 66 90 41 54 41 89 d4 55 48 89 f5 53 89 fb 48 83 ec 10 e8 eb ec ff ff 44 89 e2 48 89 ee 89 df 41 89 c0 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 48 89 44 24 08 e8 24 ed ff ff 48
[100848.465097] RSP: 002b:00007f1ecbd9ba80 EFLAGS: 00003293 ORIG_RAX: 000000000000002e
[100848.465098] RAX: ffffffffffffffda RBX: 000000000000007b RCX: 00007f1ed72ccb07
[100848.465098] RDX: 0000000000000000 RSI: 00007f1ecbd9bb10 RDI: 000000000000007b
[100848.465098] RBP: 00007f1ecbd9bb10 R08: 0000000000000000 R09: 00007f1ecbd9d390
[100848.465099] R10: 0000000019616156 R11: 0000000000003293 R12: 0000000000000000
[100848.465099] R13: 00007f1ecbd9d338 R14: 00007f1ecbd9bfb0 R15: 00007f1ecbd9bb10

This seems to be caused by net/openvswitch/flow_table::flow_lookup accessing per-cpu
data in preemptible (and migratable) sections.

How reproducible:
Always

Steps to Reproduce:
1. Configure OCP 4.6 worker node with RT kernel
2. Enable stalld
3. Run oslat tests with stalld enabled

Additional info:
This was seen while working on BZ1885850

Potentially introduced by upstream commit eac87c413bf97 ("net: openvswitch:
reorder masks array based on usage")
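
Splats in the format shown in the description can be picked out of a saved dmesg log mechanically, which helps when checking whether a test kernel still produces them. The following is a minimal sketch, assuming the two-line "BUG: ..." / "caller is ..." layout seen above; the function name and the returned dictionary keys are illustrative and not part of any existing tool.

```python
import re

# Headline of the splat, as it appears in the report above.
BUG_RE = re.compile(
    r"BUG: using smp_processor_id\(\) in preemptible \[[0-9a-f]+\] "
    r"code: (?P<comm>[^/\s]+)/(?P<pid>\d+)"
)
# Follow-up line naming the calling symbol and, optionally, its module.
CALLER_RE = re.compile(
    r"caller is (?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)


def parse_splats(lines):
    """Return one dict per splat: task name, pid, calling symbol and module."""
    splats = []
    for line in lines:
        m = BUG_RE.search(line)
        if m:
            splats.append({"comm": m["comm"], "pid": int(m["pid"]),
                           "caller": None, "module": None})
            continue
        m = CALLER_RE.search(line)
        # Attach the caller to the most recent splat that does not have one yet.
        if m and splats and splats[-1]["caller"] is None:
            splats[-1]["caller"] = m["symbol"]
            splats[-1]["module"] = m["module"]
    return splats


SAMPLE = [
    "[100848.465045] BUG: using smp_processor_id() in preemptible [00000000] code: handler106/3082",
    "[100848.465047] caller is flow_lookup.isra.15+0x2c/0xf0 [openvswitch]",
]
```

Calling parse_splats() on the lines of a captured log returns an empty list when no such splat is present.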

Comment 1 Martin Sivák 2020-10-08 14:03:30 UTC
This is currently a blocker for the OCP Telco effort. It causes extreme latency spikes on our 4.6 test cluster.

Comment 2 Andrew Theurer 2020-10-08 14:17:23 UTC
> Potentially introduced by upstream commit eac87c413bf97 ("net: openvswitch:
> reorder masks array based on usage")

This is a very important performance enhancement for OVS.  If this is a problem, we need to find a way to keep this optimization if we can.

Comment 3 Juri Lelli 2020-10-08 14:21:18 UTC
(In reply to Andrew Theurer from comment #2)
> > Potentially introduced by upstream commit eac87c413bf97 ("net: openvswitch:
> > reorder masks array based on usage")
> 
> This is a very important performance enhancement for OVS.  If this is a
> problem, we need to find a way to keep this optimization if we can.

I'm thinking this below might be acceptable

--->8---
    net: openvswitch: Fix using smp_processor_id() in preemptible code

    Signed-off-by: Juri Lelli <juri.lelli>
---
 net/openvswitch/flow_table.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index aff660fe3a141..88b03cfdec9ec 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -706,11 +706,14 @@ static struct sw_flow *flow_lookup(struct flow_table *tbl,
                                   u32 *n_mask_hit,
                                   u32 *index)
 {
-       u64 *usage_counters = this_cpu_ptr(ma->masks_usage_cntr);
+       u64 *usage_counters;
        struct sw_flow *flow;
        struct sw_flow_mask *mask;
        int i;

+       get_cpu_light();
+       usage_counters = this_cpu_ptr(ma->masks_usage_cntr);
+
        if (likely(*index < ma->max)) {
                mask = rcu_dereference_ovsl(ma->masks[*index]);
                if (mask) {
@@ -719,6 +722,7 @@ static struct sw_flow *flow_lookup(struct flow_table *tbl,
                                u64_stats_update_begin(&ma->syncp);
                                usage_counters[*index]++;
                                u64_stats_update_end(&ma->syncp);
+                               put_cpu_light();
                                return flow;
                        }
                }
@@ -739,10 +743,12 @@ static struct sw_flow *flow_lookup(struct flow_table *tbl,
                        u64_stats_update_begin(&ma->syncp);
                        usage_counters[*index]++;
                        u64_stats_update_end(&ma->syncp);
+                       put_cpu_light();
                        return flow;
                }
        }

+       put_cpu_light();
        return NULL;
 }
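
The hazard the patch above addresses can be illustrated with a toy model. This is not kernel code: it is a Python sketch in which a task looks up "its" per-CPU slot, is moved to another CPU, and then updates through the earlier lookup. The sketch assumes that get_cpu_light()/put_cpu_light() keep the task on its current CPU for the bracketed region; the class and function names are invented for illustration.

```python
class ToyCpus:
    """Toy stand-in for an array of per-CPU counters."""

    def __init__(self, n_cpus):
        self.counters = [0] * n_cpus


class ToyTask:
    def __init__(self, cpu):
        self.cpu = cpu
        self.pinned = False

    def migrate(self, new_cpu):
        # A pinned task stays where it is (the assumed effect of get_cpu_light()).
        if not self.pinned:
            self.cpu = new_cpu


def count_hit(cpus, task, pin, migrate_to):
    """Look up the current CPU's slot, attempt a migration, then bump the slot.

    Returns (cpu the task runs on at increment time, slot index incremented).
    """
    task.pinned = pin
    slot = task.cpu              # analogous to taking this_cpu_ptr()
    task.migrate(migrate_to)     # the scheduler moves the task, unless pinned
    cpus.counters[slot] += 1     # update through the earlier lookup
    task.pinned = False
    return task.cpu, slot
```

Without pinning, count_hit() reports a task running on one CPU while incrementing another CPU's slot; with pinning, the two always agree.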

Comment 4 Eelco Chaudron 2020-10-09 12:02:47 UTC
(In reply to Juri Lelli from comment #3)
> I'm thinking this below might be acceptable
> 
> [...]

I have no experience with the RT kernel, but reading a bit into it, I guess the suggested change is ok.

Comment 5 Juri Lelli 2020-10-09 12:50:57 UTC
Thanks for having a look.

Patch posted upstream:

https://marc.info/?l=linux-rt-users&m=160224770202585&w=2

Comment 6 Juri Lelli 2020-10-12 06:13:40 UTC
(In reply to Eelco Chaudron from comment #4)

[...]

> 
> I have no experience with the RT kernel, but reading a bit into it, I guess
> the suggested change is ok.

Interesting. Sebastian (on the upstream discussion) identified additional
potential problems with the commit identified in comment#0:

https://lore.kernel.org/lkml/20201009154116.a4fcrrm7flxonidd@linutronix.de/

Eelco, would you be up for following up on this? I will have a look as well,
but I guess you'll be quicker to respond, as most probably more familiar with
the code and its associated locking scheme.

Thanks,
Juri

Comment 7 Eelco Chaudron 2020-10-12 08:17:20 UTC
(In reply to Juri Lelli from comment #6)

Replied upstream: https://lore.kernel.org/lkml/65BBD0B4-2A74-421A-BF81-357CD5F84747@redhat.com/

Comment 8 Eelco Chaudron 2020-10-13 12:48:21 UTC
Looks like we could fix this with a general fix, see the following post:

https://lore.kernel.org/netdev/160259304349.181017.7492443293310262978.stgit@ebuild/T/#u

Juri can you build an RT kernel with this patch to make sure no more splats happen?

Comment 9 Juri Lelli 2020-10-13 16:21:56 UTC
RT 8.2 kernel build started:

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=31946895

I had to slightly modify your change as 8.2 doesn't seem to have n_cache_hit variable.

Also, I noticed that Sebastian had additional questions upstream (but I started the
build anyway to try to buy us some time).

BTW, as the eventual fix should apply to kernel (and reach RHEL-RT via RHEL), should
I be changing the component to networking and assign the BZ to you? I will of course
continue monitoring and helping with builds and testing.

Comment 10 Marcelo Ricardo Leitner 2020-10-13 21:15:56 UTC
*** Bug 1887805 has been marked as a duplicate of this bug. ***

Comment 11 Eelco Chaudron 2020-10-14 09:52:41 UTC
(In reply to Juri Lelli from comment #9)
> Also, I noticed that Sebastian had additional questions upstream (but I
> started the
> build anyway to try to buy us some time).

Need some additional fix to solve the u64 sync issue, which is only related to 32-bit architectures. I will work on this today and send an updated patch upstream.

> BTW, as the eventual fix should apply to kernel (and reach RHEL-RT via
> RHEL), should
> I be changing the component to networking and assign the BZ to you? I will
> of course
> continue monitoring and helping with builds and testing.

Not sure how the RHEL-RT procedure goes. I think the easiest way is to create a clone of this BZ for RHEL where I will fix the issue and that this BZ depends on it. So you get notified when RT changes are needed? If this is all not necessary then assign the BZ to me.
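
For context on the u64 sync issue mentioned in this comment: on a 32-bit architecture a 64-bit counter is stored as two 32-bit words, so a reader can observe one word from before an update and one from after. The u64_stats_update_begin()/u64_stats_update_end() calls visible in the patch from comment #3 guard against this with a sequence counter that lets readers detect an overlapping update and retry. The following is a toy Python model of that idea, not the kernel implementation; all names are illustrative.

```python
MASK32 = 0xFFFFFFFF


class SplitCounter:
    """Toy 64-bit counter stored as two 32-bit halves plus a sequence number."""

    def __init__(self, value=0):
        self.lo = value & MASK32
        self.hi = (value >> 32) & MASK32
        self.seq = 0  # even: stable, odd: update in progress

    def update_begin(self):
        self.seq += 1

    def update_end(self):
        self.seq += 1

    def raw_read(self):
        # What a reader sees if it simply combines the two halves.
        return (self.hi << 32) | self.lo

    def try_read(self):
        """Return (value, ok); ok is False when the read overlapped an update."""
        start = self.seq
        value = self.raw_read()
        ok = start % 2 == 0 and start == self.seq
        return value, ok


def increment_with_observer(counter, observe):
    """Increment by one, calling observe() between the two half-word stores."""
    new = counter.raw_read() + 1
    counter.update_begin()
    counter.lo = new & MASK32
    mid = observe()              # a reader running in the middle of the update
    counter.hi = (new >> 32) & MASK32
    counter.update_end()
    return mid
```

Incrementing a counter that holds 0xFFFFFFFF makes the problem visible: the mid-update reader sees a torn value, but the odd sequence number tells it to retry. A real reader would loop until try_read() reports ok.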

Comment 12 Juri Lelli 2020-10-14 13:05:43 UTC
(In reply to Eelco Chaudron from comment #11)
> (In reply to Juri Lelli from comment #9)
> > Also, I noticed that Sebastian had additional questions upstream (but I
> > started the
> > build anyway to try to buy us some time).
> 
> Need some additional fix to solve the u64 sync issue, which is only related
> to 32-bit architectures. I will work on this today and send an updated patch
> upstream.
> 
> > BTW, as the eventual fix should apply to kernel (and reach RHEL-RT via
> > RHEL), should
> > I be changing the component to networking and assign the BZ to you? I will
> > of course
> > continue monitoring and helping with builds and testing.
> 
> Not sure how the RHEL-RT procedure goes. I think the easiest way is to
> create a clone of this BZ for RHEL where I will fix the issue and that this
> BZ depends on it. So you get notified when RT changes are needed? If this is
> all not necessary then assign the BZ to me.

OK, thanks for driving this!

I cloned the bug:

https://bugzilla.redhat.com/show_bug.cgi?id=1888237

Comment 13 Andrew Theurer 2020-10-14 23:10:00 UTC
Unfortunately we can't use the kernel packages from brew in comment #9 because it has install deps which are not possible in RHCOS.  The only packages installed for RHCOS for RT are:

kernel-rt-core
kernel-rt-modules
kernel-rt-modules-extra
kernel-rt-kvm

These normally require other RPMs like:

Resolving dependencies... done
error: Could not depsolve transaction; 7 problems detected:
 Problem 1: conflicting requests
 - nothing provides hdparm needed by tuned-2.14.0-4.el8.noarch
 - nothing provides python3-configobj needed by tuned-2.14.0-4.el8.noarch
 - nothing provides python3-dbus needed by tuned-2.14.0-4.el8.noarch
 - nothing provides python3-decorator needed by tuned-2.14.0-4.el8.noarch
 - nothing provides python3-gobject-base needed by tuned-2.14.0-4.el8.noarch
 - nothing provides python3-linux-procfs needed by tuned-2.14.0-4.el8.noarch
 - nothing provides python3-perf needed by tuned-2.14.0-4.el8.noarch
 - nothing provides python3-pyudev needed by tuned-2.14.0-4.el8.noarch
 - nothing provides python3-schedutils needed by tuned-2.14.0-4.el8.noarch
 - nothing provides python3-syspurpose needed by tuned-2.14.0-4.el8.noarch
 - nothing provides virt-what needed by tuned-2.14.0-4.el8.noarch
 Problem 2: conflicting requests
 - nothing provides squashfs-tools needed by dracut-squash-049-53.git20191001.el8.x86_64
 Problem 3: conflicting requests
 - nothing provides python3-ethtool needed by tuna-0.14-4.el8.noarch
 - nothing provides python3-linux-procfs >= 0.6 needed by tuna-0.14-4.el8.noarch
 - nothing provides python3-schedutils >= 0.6 needed by tuna-0.14-4.el8.noarch
 Problem 4: package tuned-profiles-realtime-2.14.0-4.el8.noarch requires tuna, but none of the providers can be installed
 - conflicting requests
 - nothing provides python3-ethtool needed by tuna-0.14-4.el8.noarch
 - nothing provides python3-linux-procfs >= 0.6 needed by tuna-0.14-4.el8.noarch
 - nothing provides python3-schedutils >= 0.6 needed by tuna-0.14-4.el8.noarch
 Problem 5: package rt-setup-2.1-2.el8.x86_64 requires tuna, but none of the providers can be installed
 - conflicting requests
 - nothing provides python3-ethtool needed by tuna-0.14-4.el8.noarch
 - nothing provides python3-linux-procfs >= 0.6 needed by tuna-0.14-4.el8.noarch
 - nothing provides python3-schedutils >= 0.6 needed by tuna-0.14-4.el8.noarch
 Problem 6: package kexec-tools-2.0.20-20.el8.x86_64 requires dracut-squash >= 049, but none of the providers can be installed
 - conflicting requests
 - nothing provides squashfs-tools needed by dracut-squash-049-53.git20191001.el8.x86_64
 Problem 7: package kernel-rt-4.18.0-193.24.1.rt13.74.el8_2.1886109.test.cki.kt0.x86_64 requires rt-setup, but none of the providers can be installed
 - package rt-setup-2.1-2.el8.x86_64 requires tuna, but none of the providers can be installed
 - conflicting requests
 - nothing provides python3-ethtool needed by tuna-0.14-4.el8.noarch
 - nothing provides python3-linux-procfs >= 0.6 needed by tuna-0.14-4.el8.noarch
 - nothing provides python3-schedutils >= 0.6 needed by tuna-0.14-4.el8.noarch

I suspect there is a slightly different build process for kernel-rt for RHCOS, which ensures packages like tuned, tuna, tuned-profiles-realtime, etc are not deps.  Is anyone aware of this?

Comment 14 Juri Lelli 2020-10-15 05:18:55 UTC
(In reply to Andrew Theurer from comment #13)
> Unfortunately we can't use the kernel packages from brew in comment #9
> because it has install deps which are not possible in RHCOS.  The only
> packages installed for RHCOS for RT are:
> 
> kernel-rt-core
> kernel-rt-modules
> kernel-rt-modules-extra
> kernel-rt-kvm
> 
> These normally require other RPMs like:
> 
> [...]
> 
> I suspect there is a slightly different build process for kernel-rt for
> RHCOS, which ensures packages like tuned, tuna, tuned-profiles-realtime, etc
> are not deps.  Is anyone aware of this?

I believe I successfully installed a scratch kernel-rt by doing something like the
following in a worker node:

$ sudo rpm-ostree reset
$ sudo rpm-ostree override remove kernel{,-core,-modules,-modules-extra} --install http://brew-task-repos.usersys.redhat.com/repos/scratch/jlelli/kernel-rt/4.18.0/193.24.1.rt13.74.el8_2.1886109.test.cki.kt0/x86_64/kernel-rt-core-4.18.0-193.24.1.rt13.74.el8_2.1886109.test.cki.kt0.x86_64.rpm --install ...
$ sudo systemctl reboot

Not sure if the other packages were installed already, though.

Also, I now see

> kernel-rt-4.18.0-193.24.1.rt13.74.el8_2.1886109.test.cki.kt0.x86_64 requires ...

Are you trying to install kernel-rt-4.18.0-193.24.1.rt13.74.el8_2.1886109.test.cki.kt0.x86_64
as well (the "main" package)? We don't need that to test a new kernel and that one
usually brings in the other deps.
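
The advice above (install only the sub-packages, not the "main" kernel-rt package) can be sketched as a small helper that builds the rpm-ostree argument list from a set of RPM URLs. This is an illustrative sketch: the helper names are invented, the sub-package list is taken from comment #13, and the only rpm-ostree options used are the ones already shown in the command above.

```python
import posixpath

# Sub-packages that comment #13 lists as installed on RHCOS for the RT kernel.
RHCOS_RT_PACKAGES = (
    "kernel-rt-core",
    "kernel-rt-modules",
    "kernel-rt-modules-extra",
    "kernel-rt-kvm",
)


def package_name(rpm_url):
    """Derive the package name from a name-version-release.arch.rpm file name."""
    filename = posixpath.basename(rpm_url)
    # The name is everything before the last two dashes.
    return filename.rsplit("-", 2)[0]


def override_command(rpm_urls):
    """Build the argument list, skipping anything that is not a listed sub-package."""
    argv = ["rpm-ostree", "override", "remove",
            "kernel", "kernel-core", "kernel-modules", "kernel-modules-extra"]
    for url in rpm_urls:
        if package_name(url) in RHCOS_RT_PACKAGES:
            argv += ["--install", url]
    return argv
```

Passing the full list of RPM URLs from a scratch build yields a command that leaves out the "main" kernel-rt package, which is the one that requires rt-setup and, through it, tuna and tuned.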

Comment 15 Troy Wilson 2020-10-15 18:07:14 UTC
Using this kernel on OCP 4.6.0-rc.2 with downstream PAO, and stalld running with an interval of 30, oslat still has unacceptable latency (~47 us max) in even a short 10-minute test, but that is a huge improvement from the 17k us (stock kernel from PAO, stalld enabled) and 3k us (stock kernel from PAO, stalld disabled) latencies we had been seeing.



uid: 0 ****
allowed cpu list: 4-22
oslat 4.18.0-193.24.1.rt13.74.el8_2.1886109.test.cki.kt0.x86_64
Cloning into 'oslat'...
cc -O2 -Wall -c -o main.o main.c
cc -O2 -Wall -c -o rt-utils.o rt-utils.c
cc -O2 -Wall -c -o error.o error.c
cc -O2 -Wall -c -o trace.o trace.c
cc -o oslat -lpthread -lnuma -lm main.o rt-utils.o error.o trace.o
new cpu list: 5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22
cmd to run: oslat --runtime 600 --rtprio 1 --cpu-list 5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22 --cpu-main-thread 4
Version: v0.1.7
Total runtime: 600 seconds
Thread priority: SCHED_FIFO:1
CPU list: 5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22
CPU for main thread: 4
Workload: no
Workload mem: 0 (KiB)
Preheat cores: 18
Pre-heat for 1 seconds...
Test starts...
Test completed.
Core:	5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
CPU Freq:	2590 2590 2590 2590 2590 2597 2590 2590 2590 2597 2590 2590 2590 2597 2590 2590 2590 2597 (Mhz)
001 (us):	23035708776 23032836891 23038885569 23040713165 23043529386 23052157548 23043740675 23041749321 23035458992 23052772087 23036229957 23055929040 23033220514 22754539032 23042830347 23040469655 23041587872 23040229132
002 (us):	557607 592739 590590 597392 594439 596796 594056 597734 594503 595716 595112 596917 592570 597233 594496 595650 595140 592852
003 (us):	43752 8379 10528 4120 6640 4314 6883 3392 7187 5551 5827 4617 8680 3843 7009 6077 6158 8480
004 (us):	518 287 252 347 299 352 314 327 341 350 349 335 351 389 91 349 104 154
005 (us):	150 89 92 75 80 86 93 92 76 90 76 87 94 80 49 74 38 62
006 (us):	41 30 32 32 38 24 40 23 35 28 39 26 27 22 23 37 18 29
007 (us):	10 6 11 5 18 7 13 3 17 5 16 7 5 4 7 13 7 5
008 (us):	1 0 8 1 1 2 3 4 6 1 1 2 2 1 3 1 1 2
009 (us):	1 0 1 1 3 5 2 1 0 1 0 0 0 0 0 1 0 1
010 (us):	0 1 4 4 5 1 1 0 0 0 0 0 0 0 1 0 0 0
011 (us):	3 1 1 4 0 0 0 1 0 0 0 0 0 0 0 0 0 2
012 (us):	1 4 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
013 (us):	3 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
014 (us):	0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
015 (us):	0 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0
016 (us):	0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
017 (us):	0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
018 (us):	0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2
019 (us):	0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3
020 (us):	0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3
021 (us):	0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
022 (us):	0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2
023 (us):	0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1
024 (us):	0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
025 (us):	0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0
026 (us):	0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0
027 (us):	0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 1 1
028 (us):	0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
029 (us):	0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
030 (us):	0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
031 (us):	0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
032 (us):	0 0 0 0 0 0 0 0 0 0 0 1 0 0 10 0 10 1 (including overflows)
Minimum:	1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 (us)
Average:	1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 (us)
Maximum:	13 21 28 11 23 10 10 11 8 9 13 34 8 8 47 9 45 34 (us)
Max-Min:	12 20 27 10 22 9 9 10 7 8 12 33 7 7 46 8 44 33 (us)
Duration:	600.902 600.902 600.902 600.902 600.902 599.283 600.902 600.902 600.902 599.283 600.902 600.902 600.902 599.283 600.902 600.902 600.902 599.283 (sec)
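
The per-core maxima in output like the above can be reduced to a single worst-case figure with a few lines of Python. This is a minimal sketch that assumes the "Core:" and "Maximum:" line layout shown in this comment; the function name is illustrative and the sample reuses the two relevant lines from the run above.

```python
def worst_core(oslat_output):
    """Return (core, max_latency_us) for the core with the highest Maximum."""
    cores, maxima = None, None
    for line in oslat_output.splitlines():
        label, _, rest = line.partition(":")
        label = label.strip()
        if label == "Core":
            cores = [int(tok) for tok in rest.split()]
        elif label == "Maximum":
            # Skip the trailing "(us)" unit and keep only the numbers.
            maxima = [int(tok) for tok in rest.split() if tok.isdigit()]
    if not cores or not maxima or len(cores) != len(maxima):
        raise ValueError("could not find matching Core and Maximum lines")
    return max(zip(cores, maxima), key=lambda pair: pair[1])


SAMPLE = (
    "Core:\t5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22\n"
    "Maximum:\t13 21 28 11 23 10 10 11 8 9 13 34 8 8 47 9 45 34 (us)\n"
)
```

For the run above, this points at core 19 as the source of the roughly 47 us maximum quoted at the top of the comment.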

Comment 40 Rona Gugliemino 2020-12-17 23:37:33 UTC
of_rules test passed after re-running 7 failed tests.

First run:
https://beaker.engineering.redhat.com/jobs/4891591

Re-run:
https://beaker.engineering.redhat.com/jobs/4893133


"nightly" vsperf gating ci test passed
job:
https://beaker.engineering.redhat.com/jobs/4891593

results:
https://docs.google.com/spreadsheets/d/1J0xiaF9YxQTuJxh6z9FEQmDaIQrVZPmZgPD92sgxroU/edit#gid=984973794

A few issues that are being investigated:
1. no results seen for sr-iov part of test
2. rt-kernel is not installed on guest-vm. We are working on solving that problem currently.

Running "gating" vsperf ci test over the weekend. 
Marking the BZ Verified because it is not certain that the "gating" vsperf test, rather than the "nightly" test alone, is needed to verify the fix.

Comment 42 Matthew Secaur 2021-03-26 13:57:52 UTC
Hi,

Is this fix going to be backported to RHEL 8.2? I have a customer who is requesting it. If there is no plan for that, I will make the request.

Thanks!

Comment 43 Beth Uptagrafft 2021-03-26 14:16:12 UTC
(In reply to Matthew Secaur from comment #42)
> Hi,
> 
> Is this fix going to be backported to RHEL 8.2? I have a customer who is
> requesting it. If there is no plan for that, I will make the request.
> 
> Thanks!

Hi Matthew,
Thank you for asking, because I don't think it was on our radar. I set the ZTR to 8.2.0, so we can evaluate it. 

Thanks,
Beth

Comment 44 Juri Lelli 2021-03-26 14:30:24 UTC
Hi,

Hasn't this already been backported to 8.2.z via bz1893282 and thus fixed in
kernel-4.18.0-193.36.1.el8_2 and the corresponding kernel-rt-4.18.0-193.36.1.rt13.86.el8_2?

Comment 53 errata-xmlrpc 2021-05-18 15:12:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: kernel-rt security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:1739

