Bug 2218682 - bpf_jit_limit hit again - copy_seccomp() fix
Summary: bpf_jit_limit hit again - copy_seccomp() fix
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: kernel
Version: 9.2
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Viktor Malik
QA Contact: Ziqian SUN (Zamir)
URL:
Whiteboard:
Depends On:
Blocks: 2226945
Reported: 2023-06-29 20:02 UTC by Peter Hunt
Modified: 2023-11-07 11:07 UTC (History)
CC List: 13 users

Fixed In Version: kernel-5.14.0-342.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2219567 2226945
Environment:
Last Closed: 2023-11-07 08:48:41 UTC
Type: Bug
Target Upstream Version:
Embargoed:
pehunt: needinfo-




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Gitlab | redhat/centos-stream/src/kernel centos-stream-9 merge_requests 2801 | 0 | None | opened | seccomp: Move copy_seccomp() to no failure path. | 2023-07-12 08:40:45 UTC
Red Hat Issue Tracker | OCPBUGS-15043 | 0 | None | None | None | 2023-06-29 20:02:18 UTC
Red Hat Issue Tracker | RHELPLAN-161218 | 0 | None | None | None | 2023-06-29 20:02:54 UTC
Red Hat Product Errata | RHSA-2023:6583 | 0 | None | None | None | 2023-11-07 08:49:34 UTC

Description Peter Hunt 2023-06-29 20:02:18 UTC
Description of problem:
Sibling to https://issues.redhat.com/browse/OCPBUGS-15043

Similar to https://bugzilla.redhat.com/show_bug.cgi?id=2140163, except that the issue is hit again in OpenShift 4.13.3 (which uses kernel 5.14.0-284.16.1.el9_2) as well as OpenShift 4.12.15 (which uses kernel 4.18.0-372.52.1.el8_6).

I can't find a bug for the 9.2 version, so I'm not sure whether it's fixed already, but I'd like one for tracking regardless. I also wonder whether a comparatively newer patch (https://lore.kernel.org/bpf/20230321170925.74358-1-kuniyu@amazon.com/) is relevant, and what the status of its backport is.

Version-Release number of selected component (if applicable):
5.14.0-284.16.1.el9_2
4.18.0-372.52.1.el8_6

How reproducible:
From the issue above:

```
Currently, they are unable to spin up the pods on this specific worker.
Although they do not see any resource overcommitment in the node describe.
```


Steps to Reproduce:
1. This worker node has 228 running pods

Actual results:

```
When customers tried to start a deployment, and that deployment tried to run pods on worker number 6, those pods entered "CreateContainerError". Upon checking the events of these pods, all presented the error:

"runc create failed: unable to start container process: unable to init seccomp: error loading seccomp filter into kernel: error loading seccomp filter: errno 524".
```
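For triage, the numeric code at the end of such an event message can be pulled out and named. The following is a minimal sketch, not part of the original report: the helper names `extract_errno` and `describe` are hypothetical. It relies on 524 being the kernel-internal `ENOTSUPP` value from `include/linux/errno.h`, which has no entry in Python's `errno` module and therefore needs its own lookup.

```python
import errno
import re

# ENOTSUPP is defined only inside the kernel, so Python's errno module
# has no name for 524; keep a small table for such values.
KERNEL_INTERNAL_ERRNOS = {524: "ENOTSUPP"}

# Event text copied from the report above.
SAMPLE = (
    "runc create failed: unable to start container process: "
    "unable to init seccomp: error loading seccomp filter into kernel: "
    "error loading seccomp filter: errno 524"
)


def extract_errno(message):
    """Return the integer following 'errno' in the message, or None."""
    match = re.search(r"\berrno (\d+)\b", message)
    return int(match.group(1)) if match else None


def describe(message):
    """Return 'code (NAME)' for the errno found in the message."""
    code = extract_errno(message)
    if code is None:
        return "no errno found"
    name = KERNEL_INTERNAL_ERRNOS.get(code)
    if name is None:
        # Fall back to the userspace errno table.
        name = errno.errorcode.get(code, "unknown")
    return f"{code} ({name})"


if __name__ == "__main__":
    print(describe(SAMPLE))
```

A script like this could be run over the pod events of an affected node to separate this failure from other "CreateContainerError" causes.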


Expected results:

From the Jira issue:
```
Container should not be stuck on this error since this issue was addressed in 4.12.2 errata earlier 

https://issues.redhat.com/browse/OCPBUGS-2637
https://issues.redhat.com/browse/RUN-1668
https://access.redhat.com/errata/RHBA-2023:0568
-> OCPBUGS-6981 - error 524 from seccomp(2) when trying to load filter [rhel-8.6.0.z]
```

Interestingly, the kernel version listed in the bug above is newer than the one in 4.12.15, though the bug apparently went through errata. It may be an RHCOS packaging problem, but I wanted to open this bug to track the el9 version as well.
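Comparing kernel builds by eye is error-prone, so here is a rough sketch of an ordering check against the "Fixed In Version" of this bug (kernel-5.14.0-342.el9). The helpers `numeric_key` and `is_older` are hypothetical names, and the comparison is a simplification rather than RPM's real version comparison. It only orders builds within one major stream, and it cannot tell whether a z-stream build with a lower number carries the fix as a backport.

```python
import re

# "Fixed In Version" from this bug, without the "kernel-" prefix.
FIXED_IN = "5.14.0-342.el9"


def numeric_key(version_release):
    """Return the numeric fields that precede the '.elN' dist suffix."""
    base = re.split(r"\.el\d", version_release, maxsplit=1)[0]
    return tuple(int(part) for part in re.findall(r"\d+", base))


def is_older(candidate, reference):
    """True if candidate sorts before reference (naive tuple comparison)."""
    return numeric_key(candidate) < numeric_key(reference)


if __name__ == "__main__":
    # Kernel from the OpenShift 4.13.3 report above.
    print(is_older("5.14.0-284.16.1.el9_2", FIXED_IN))
```

A result of "older" here is a hint to look at the build's changelog, not proof that the fix is missing.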


Additional info:

Comment 1 Viktor Malik 2023-07-03 10:51:45 UTC
Hi Peter,

the above patch [1] that you mention looks like it could resolve the issue. It has been recently backported to CentOS Stream 9 as a part of our regular BPF subsystem rebase and will appear in RHEL 9.3. So, in case we confirm that it is the necessary fix, we will need to backport it to 8.6 and 9.2 z-streams.

I crafted a Brew build for 9.2z with [1] included, so that we can test it:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=53691694

@Peter if you or someone could use this to check if it resolves the problem, that would be great. I'll post it to the original OCP Jira issue, too.

FWIW, the 5.14.0-284.16.1.el9_2 kernel also suffers from a memleak introduced by upstream commit [2]. There is a fix for it already [3], so we should backport that one to 9.2 z-stream, too.
But since this issue appears on 4.18.0-372.52.1.el8_6, too, which doesn't have [2], I'm fairly confident that we'll need to backport [1] anyways.

[1] https://github.com/torvalds/linux/commit/10ec8ca8ec1a2f04c4ed90897225231c58c124a7
[2] https://github.com/torvalds/linux/commit/3a15fb6ed92cb32b0a83f406aa4a96f28c9adbc3
[3] https://github.com/torvalds/linux/commit/a1140cb215fa13dcec06d12ba0c3ee105633b7c4

Comment 2 Viktor Malik 2023-07-03 20:44:05 UTC
I also crafted a Brew build for 8.6z with the mentioned fix included, in case it helps with testing:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=53692906

Comment 5 Viktor Malik 2023-07-04 11:21:45 UTC
Since the same issue likely affects RHEL 8, too, I created a copy of this bug for it: bz#2219567.

Comment 27 Viktor Malik 2023-07-11 11:37:02 UTC
Since this bug is for RHEL9, I'm going to use it to backport the memleak fix a1140cb215fa ("seccomp: Move copy_seccomp() to no failure path.") into 9.3.
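To illustrate why moving the copy to a point with no later failure path matters, here is a toy model of the reference counting. It is not kernel code: the `Filter` class and both `fork_*` functions are hypothetical, and the model assumes, based on the upstream commit description, that the leak came from a filter reference taken for a child task whose creation was then aborted.

```python
class Filter:
    """Toy stand-in for a refcounted seccomp filter (not kernel code)."""

    def __init__(self):
        self.refs = 1  # reference held by the parent task


def fork_before_fix(filt, interrupted):
    """Take the child's reference early, before the last failure check."""
    filt.refs += 1
    if interrupted:
        # The child is discarded here and its reference is never dropped.
        return False
    return True


def fork_after_fix(filt, interrupted):
    """Take the child's reference only once the fork can no longer fail."""
    if interrupted:
        return False
    filt.refs += 1
    return True


def leaked_refs(fork, attempts):
    """Count references left behind by a number of aborted forks."""
    filt = Filter()
    for _ in range(attempts):
        fork(filt, interrupted=True)
    return filt.refs - 1  # subtract the parent's own reference


if __name__ == "__main__":
    print(leaked_refs(fork_before_fix, 5), leaked_refs(fork_after_fix, 5))
```

In this model every aborted fork pins the filter once more under the old ordering, which is consistent with memory for loaded filters never being returned and the JIT limit eventually being hit.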

Comment 42 errata-xmlrpc 2023-11-07 08:48:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: kernel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6583

