Bug 1656432

Summary: [cgroup bpf devices] BPF program is not properly freed
Product: Red Hat Enterprise Linux 8 Reporter: Pavel Hrdina <phrdina>
Component: kernel    Assignee: Jiri Olsa <jolsa>
kernel sub component: BPF QA Contact: Ziqian SUN (Zamir) <zsun>
Status: CLOSED ERRATA Docs Contact:
Severity: unspecified    
Priority: high CC: bhu, ctrautma, jbenc, jbrouer, jhsiao, jolsa, knoel, kzhang, rvr, skozina, zsun
Version: 8.0   
Target Milestone: rc   
Target Release: 8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: kernel-4.18.0-111.el8 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-11-05 21:35:43 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1513930, 1689297, 1696304, 1717394, 1717396    
Attachments:
Description Flags
reproduce test program none

Description Pavel Hrdina 2018-12-05 13:58:02 UTC
Created attachment 1511682 [details]
reproduce test program

Description of problem:

In libvirt we are using cgroups to limit the resources available to QEMU processes, and one of those resources is access to devices.  With the new cgroup v2, access to devices is controlled by BPF programs.  In libvirt we need to create a new program for every VM and attach it to the appropriate cgroup.


Version-Release number of selected component (if applicable):
kernel-4.18.0-47.el8.src.rpm


How reproducible:
Not always, which indicates that it is most likely a race condition.


Steps to Reproduce:
1. boot OS with 'systemd.unified_cgroup_hierarchy=1' on the kernel command line
2. install qemu-kvm package (for that you need to enable AppStream repository)
3. download attached test program
4. compile using gcc/clang
5. set 'ulimit -l unlimited' in order to successfully run the test program
6. run the attached program multiple times
7. use 'bpftool prog list' to list all BPF programs; you will see "cgroup_device" programs that are no longer attached to any existing cgroup
8. in the root cgroup you can check 

Actual results:
there are BPF programs left in the system that are not freed/removed

Expected results:
all programs should be freed

Comment 3 Jiri Olsa 2019-04-01 16:00:28 UTC
(In reply to Pavel Hrdina from comment #0)

correct, looks like rhel8 does not release the program once the
cgroup is removed.. upstream seems to work, checking on the fix

jirka

Comment 4 Jiri Olsa 2019-04-01 16:03:49 UTC
(In reply to Jiri Olsa from comment #3)
> (In reply to Pavel Hrdina from comment #0)
> > Created attachment 1511682 [details]
> > reproduce test program
> > 
> > Description of problem:
> > 
> > In libvirt we are using cgroups to limit resources available to QEMU
> > processes and on of the resources is access to devices.  With the new
> > cgroupv2 the access to devices is controller by BPF programs.  In libvirt we
> > need to create new program for every VM and attach it to appropriate cgroup.
> > 
> > 
> > Version-Release number of selected component (if applicable):
> > kernel-4.18.0-47.el8.src.rpm
> > 
> > 
> > How reproducible:
> > Not always which indicates that it's most likely some race-condition.
> > 
> > 
> > Steps to Reproduce:
> > 1. boot OS with 'systemd.unified_cgroup_hierarchy=1' on the kernel command
> > line
> > 2. install qemu-kvm package (for that you need to enable AppStream
> > repository)
> > 3. download attached test program
> > 4. compile using gcc/clang
> > 5. set 'ulimit -l unlimited' in order to successfully run test program
> > 5. run attached program multiple times
> > 6. use 'bpftool prog list' to list all BPF programs, you will see that there
> > are existing "cgroup_device" programs that are no longer assigned to any
> > existing cgroup
> > 7. in the root cgroup you can check 
> > 
> > Actual results:
> > there are BPF programs left in the system that are not freed/removed
> > 
> > Expected results:
> > all programs should be freed
> 
> correct, looks like rhel8 does not release the program once the
> cgroup is removed.. upstream seems to work, checking on the fix
> 

looks like we're missing this one:
  d7bf2c10af05 bpf: allocate cgroup storage entries on attaching bpf programs

will provide build for testing

jirka

Comment 5 Jiri Olsa 2019-04-01 22:01:25 UTC
build with the fix:
  https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=20826762

works for my test, could you please try?

thanks,
jirka

Comment 6 Pavel Hrdina 2019-04-02 11:55:52 UTC
I've installed that kernel and after running this command:

    for i in {1..100}; do ./test-bpf ; done

where test-bpf is the compiled attachment

# date && bpftool prog list
Tue Apr  2 10:01:27 CEST 2019
4: cgroup_device  tag aeb9784193c239a2  gpl
	loaded_at 2019-04-02T09:45:17+0200  uid 0
	xlated 608B  jited 375B  memlock 4096B  map_ids 4
12: cgroup_device  tag aeb9784193c239a2  gpl
	loaded_at 2019-04-02T09:46:55+0200  uid 0
	xlated 608B  jited 375B  memlock 4096B  map_ids 12


And there are still some programs left in the kernel even after some time
has passed since test-bpf was executed.

# uname -a
Linux rhel8 4.18.0-80.6.el8bpf_cgroup.x86_64 #1 SMP Mon Apr 1 20:10:51 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux


After starting and stopping a VM 100 times the result is worse:
all 100 programs are still there.

Unfortunately that commit did not fix the issue.

I've tried the newer kernel shipped in Fedora 29, version 5.0.5:
with test-bpf all BPF programs are correctly freed, but if
I start a VM 100 times there is always one BPF program left, which is
weird.  In addition, some of the BPF programs are not freed immediately;
it takes N seconds for them to be freed.

Comment 7 Jiri Olsa 2019-04-03 09:03:19 UTC
right.. wrong direction, I can now reproduce in upstream as well, checking on the fix

jirka

Comment 8 Jiri Olsa 2019-06-20 21:06:32 UTC
fixed by upstream:
4bfc0bb2c60e bpf: decouple the lifetime of cgroup_bpf from cgroup itself

will post backport shortly
jirka

Comment 9 Jiri Olsa 2019-06-21 11:14:37 UTC
I have the backported build in here, could you please test?
  https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=22302147

thanks,
jirka

Comment 15 Pavel Hrdina 2019-06-24 14:46:27 UTC
(In reply to Jiri Olsa from comment #9)
> I have the backported build in here, could you please test?
>   https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=22302147
> 
> thanks,
> jirka

Hi, thanks for the backport.  Tested with libvirt and everything looks good,
there were no programs leaked after starting and destroying 100 VMs.

Comment 17 Herton R. Krzesinski 2019-07-04 14:17:30 UTC
Patch(es) available on kernel-4.18.0-111.el8

Comment 22 errata-xmlrpc 2019-11-05 21:35:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:3517