Bug 2004037
| Field | Value |
|---|---|
| Summary | Percpu counter usage is gradually increasing during podman container recreation. |
| Product | Red Hat Enterprise Linux 8 |
| Reporter | rcheerla |
| Component | kernel |
| kernel sub component | Memory Management |
| Assignee | Waiman Long <llong> |
| QA Contact | Chao Ye <cye> |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | urgent |
| CC | akanekar, akesarka, aquini, atomlin, bhenders, bjohri, cchen, chris.bowles, chrzhang, cldavey, cye, David.Taylor, ddutile, dornelas, fboboc, fperalta, hfukumot, jaeshin, jarod, kjavier, kwwong, llong, mharri, michele, mmilgram, mschibli, ngirard, nsu, nyelle, palonsor, pauwebst, pehunt, pescorza, pifang, psingour, rgertzbe, rmanes, rnoma, roarora, ruud, saniyer, skamboj, skanniha, skrenger, snishika, tkimura, vagrawal, vbendel, vumrao, wwurzbac |
| Version | 8.4 |
| Keywords | Triaged, ZStream |
| Target Milestone | rc |
| Target Release | --- |
| Flags | nyelle: needinfo-, pehunt: needinfo- |
| Hardware | Unspecified |
| OS | Unspecified |
| Fixed In Version | kernel-4.18.0-404.el8 |
| Doc Type | If docs needed, set a value |
| Story Points | --- |
| Clones | 2054076, 2110039, 2110040 (view as bug list) |
| Last Closed | 2022-11-08 10:14:55 UTC |
| Type | Bug |
| Regression | --- |
| Bug Blocks | 2037529, 2054076, 2110039, 2110040 |
Description by rcheerla, 2021-09-14 12:02:02 UTC
Hello Team,

1] Executed the command below multiple times on the latest RHEL 8.4 kernel:

while :; do podman run --name=test1 --replace centos /bin/echo 'running'; done
..
while :; do podman run --name=test20 --replace centos /bin/echo 'running'; done

2] Monitored the system for about 12 hours and saw around 2 GB of growth in the Percpu counter value:

grep Per meminfo-a.out
Percpu:        2077248 kB

o We do see growth in the directory entries for the memory controller.
o However, I don't see many entries under the /sys/fs/cgroup/memory directory; I could see hardly 200 entries.

cat /proc/cgroups | column -t
#subsys_name  hierarchy  num_cgroups  enabled
cpuset        11         16           1
cpu           10         20           1
cpuacct       10         20           1
blkio         5          20           1
memory        4          31055        1
devices       6          67           1
freezer       8          16           1
net_cls       2          16           1
perf_event    3          16           1
net_prio      2          16           1
hugetlb       9          16           1
pids          12         100          1
rdma          7          1            1

o meminfo output:

MemTotal:       10015428 kB
MemFree:         1280632 kB
MemAvailable:    2362640 kB
Buffers:            3240 kB
Cached:          1925576 kB
Slab:            1641464 kB
SReclaimable:     943816 kB
SUnreclaim:       697648 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
Percpu:          2077248 kB

o vmallocinfo output:

pcpu_get_vm_areas+0x0/0x1140               2139095040 Bytes
pcpu_create_chunk+0x16c/0x1c0               174743552 Bytes
_stp_map_new_is.constprop.65+0x148/0x220    144994304 Bytes
pci_mmcfg_arch_map+0x31/0x70                134221824 Bytes
relay_open_buf.part.11+0x1af/0x2e0           42024960 Bytes
_do_fork+0x8f/0x350                          35246080 Bytes
alloc_large_system_hash+0x19e/0x261          29462528 Bytes
vmw_fb_init+0x1bd/0x3c0                       8413184 Bytes
layout_and_allocate+0x9c9/0xd40               6782976 Bytes
pcpu_create_chunk+0x77/0x1c0                  4153344 Bytes

o I have collected the stack traces leading to pcpu_alloc(). They show something like the following; I will attach the nohup.out file as well.

0x36714500bab0
0xffffffff964e6949 : kmem_cache_open+0x3b9/0x420 [kernel]
0xffffffff964e7262 : __kmem_cache_create+0x12/0x50 [kernel]
0xffffffff9648c7f9 : kmem_cache_create_usercopy+0x169/0x2d0 [kernel]
0xffffffff9648c972 : kmem_cache_create+0x12/0x20 [kernel]
0xffffffffc0889088 : 0xffffffffc0889088
0xffffffffc0889088 : 0xffffffffc0889088
0xffffffffc088903b : 0xffffffffc088903b
0xffffffff962027f6 : do_one_initcall+0x46/0x1c3 [kernel]
0xffffffff9638a3fa : do_init_module+0x5a/0x220 [kernel]
0xffffffff9638c835 : load_module+0x14c5/0x17f0 [kernel]
0xffffffff9638cc9b : __do_sys_init_module+0x13b/0x180 [kernel]
0xffffffff9620420b : do_syscall_64+0x5b/0x1a0 [kernel]
0xffffffff96c000ad : entry_SYSCALL_64_after_hwframe+0x65/0xca [kernel]

0x36714500bad0
0xffffffff9653ab2e : alloc_vfsmnt+0x7e/0x1e0 [kernel]
0xffffffff9653bfb3 : clone_mnt+0x33/0x330 [kernel]
0xffffffff9653d6dc : copy_tree+0x6c/0x300 [kernel]
0xffffffff9653d9d8 : __do_loopback.isra.61+0x68/0xd0 [kernel]
0xffffffff9653fe79 : do_mount+0x7c9/0x950 [kernel]
0xffffffff965403a6 : ksys_mount+0xb6/0xd0 [kernel]
0xffffffff965403e1 : __x64_sys_mount+0x21/0x30 [kernel]
0xffffffff9620420b : do_syscall_64+0x5b/0x1a0 [kernel]
0xffffffff96c000ad : entry_SYSCALL_64_after_hwframe+0x65/0xca [kernel]
0xffffffff96c000ad : entry_SYSCALL_64_after_hwframe+0x65/0xca [kernel]

0x367143219b20
0xffffffff9669fcf4 : __percpu_counter_init+0x24/0xa0 [kernel]
0xffffffff96b2788e : fprop_global_init+0x1e/0x30 [kernel]
0xffffffff96b42b64 : mem_cgroup_css_alloc+0x1f4/0x860 [kernel]
0xffffffff96399720 : cgroup_apply_control_enable+0x130/0x350 [kernel]
0xffffffff9639bc86 : cgroup_mkdir+0x216/0x4c0 [kernel]
0xffffffff965ada7a : kernfs_iop_mkdir+0x5a/0x90 [kernel]
0xffffffff96527572 : vfs_mkdir+0x102/0x1b0 [kernel]
0xffffffff9652b0ad : do_mkdirat+0x7d/0xf0 [kernel]
0xffffffff9620420b : do_syscall_64+0x5b/0x1a0 [kernel]
0xffffffff96c000ad : entry_SYSCALL_64_after_hwframe+0x65/0xca [kernel]
0xffffffff96c000ad : entry_SYSCALL_64_after_hwframe+0x65/0xca [kernel]

0x367145011ab0
0xffffffff96b42a14 : mem_cgroup_css_alloc+0xa4/0x860 [kernel]
0xffffffff96399720 : cgroup_apply_control_enable+0x130/0x350 [kernel]
0xffffffff9639bc86 : cgroup_mkdir+0x216/0x4c0 [kernel]
0xffffffff965ada7a : kernfs_iop_mkdir+0x5a/0x90 [kernel]
0xffffffff96527572 : vfs_mkdir+0x102/0x1b0 [kernel]
0xffffffff9652b0ad : do_mkdirat+0x7d/0xf0 [kernel]
0xffffffff9620420b : do_syscall_64+0x5b/0x1a0 [kernel]
0xffffffff96c000ad : entry_SYSCALL_64_after_hwframe+0x65/0xca [kernel]
0xffffffff96c000ad : entry_SYSCALL_64_after_hwframe+0x65/0xca [kernel]

o I have tried to drop caches using 'echo 2' and 'echo 3'; it did not help. I am not sure how to find the used memory in "Percpu" or how to reclaim it, and need further suggestions. Or this could be a bug in the kernel. Also, please confirm whether the patch set below is backported into any RHEL 8 kernel so that I can test and confirm the result.

https://yhbt.net/lore/all/20210407182618.2728388-4-guro@fb.com/T/

Regards,
Raju
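For long runs like the 12-hour test above, a minimal monitoring sketch along the lines of the reporter's loop can correlate Percpu growth with memory-cgroup churn. The container name, log path, and sample interval below are arbitrary placeholders, not values from this report:

```sh
#!/bin/bash
# Recreate a container in a loop and periodically sample the Percpu value in
# /proc/meminfo together with the memory controller's num_cgroups, so the
# growth rate can be correlated with cgroup churn afterwards.
# Assumes podman and a locally available 'centos' image, as in the report.
LOG=/tmp/percpu-growth.log    # placeholder path
while :; do
    podman run --name=leaktest --replace centos /bin/echo 'running' >/dev/null
    percpu_kb=$(awk '/^Percpu:/ {print $2}' /proc/meminfo)
    memcg_num=$(awk '$1 == "memory" {print $3}' /proc/cgroups)
    echo "$(date +%s) percpu_kb=$percpu_kb memory_num_cgroups=$memcg_num" >> "$LOG"
    sleep 5                   # arbitrary sample interval
done
```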
Thanks for the report. An increase in percpu memory consumption over time is inevitable due to memory fragmentation. However, we will backport some of the upstream percpu-related commits to reduce the rate of increase.

What I have found out is that the increase in percpu memory consumption is likely due to the percpu vmstat data in dying mem_cgroup structures being held in place by references stored in the page cache. Running "echo 1 > /proc/sys/vm/drop_caches" will allow those dying mem_cgroup structures to finally get freed. Could you try that to see if it helps to reduce the percpu memory consumption back to a more normal level?
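One way to see the gap described here, between memory cgroups the kernel still tracks (including dying, offlined ones) and cgroup directories actually visible, is a comparison like the following sketch. It assumes the cgroup v1 layout shown earlier in this report and must be run as root:

```sh
# Memory cgroups the kernel still accounts for (live plus dying/offlined).
awk '$1 == "memory" {print "memory num_cgroups:", $3}' /proc/cgroups

# Memory cgroup directories actually present in the v1 hierarchy.
echo -n "visible memory cgroup dirs: "
find /sys/fs/cgroup/memory -type d | wc -l

# Sample Percpu before and after dropping the page cache, which is what is
# expected to release dying mem_cgroups pinned by page-cache references.
grep Percpu /proc/meminfo
echo 1 > /proc/sys/vm/drop_caches
grep Percpu /proc/meminfo
```

A large and growing gap between the two counts (such as the 31055 vs. roughly 200 reported in the description) is consistent with dying mem_cgroup structures accumulating.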
Hi Waiman, I have tried "echo 1 > /proc/sys/vm/drop_caches" but it did not help.

o Before initiating the "podman run" command:

$ grep Per /proc/meminfo
Percpu:            20160 kB
$ cat /proc/cgroups | column -t
#subsys_name  hierarchy  num_cgroups  enabled
memory        3          163          1
$ date
Thu Sep 16 08:56:05 CDT 2021
$ grep Per /proc/meminfo
Percpu:            31872 kB

o After the loop has completed:

$ ps aux | grep -i "podman run"
root  3968422  0.0  0.0  12136  1196 pts/0  S+  09:32  0:00 grep --color=auto -i podman run
$ date
Thu Sep 16 09:36:26 CDT 2021
$ grep Per /proc/meminfo
Percpu:           111456 kB   <<---
$ cat /proc/cgroups | column -t
#subsys_name  hierarchy  num_cgroups  enabled
memory        3          1327         1
$ echo 1 > /proc/sys/vm/drop_caches
$ echo 2 > /proc/sys/vm/drop_caches
$ echo 3 > /proc/sys/vm/drop_caches
$ sync

o It still did not reduce:

$ grep Per /proc/meminfo
Percpu:           111456 kB   <<<---

(In reply to Waiman Long from comment #8)
> Well, it is what I expected. But is the memory increase slowing down or
> is it at the same rate?
>
> -Longman

Yes, the growth is at almost the same rate. However, I will run the loop for a longer time and will confirm the result.

FWIW, it seems like the previous versions of the podman packages exhibit the same problem at roughly the same rate.

test command: # podman run --name=test --replace centos /bin/echo 'running'

[root@rhel8 ~]# rpm -qa | grep podman
podman-catatonit-3.0.1-7.module+el8.4.0+11311+9da8acfb.x86_64
cockpit-podman-29-2.module+el8.4.0+11311+9da8acfb.noarch
podman-3.0.1-7.module+el8.4.0+11311+9da8acfb.x86_64
[root@rhel8 ~]# grep Percpu /proc/meminfo
Percpu:             1080 kB

* run the test 3k times *

[root@rhel8 ~]# grep Percpu /proc/meminfo
Percpu:             2832 kB

1752 KiB growth over 3000 runs

~~~~~~~~~~

After updating the podman packages and rebooting:

[root@rhel8 ~]# rpm -qa | grep podman
podman-catatonit-3.2.3-0.11.module+el8.4.0+12050+ef972f71.x86_64
podman-3.2.3-0.11.module+el8.4.0+12050+ef972f71.x86_64
cockpit-podman-32-2.module+el8.4.0+11990+22932769.noarch
[root@rhel8 ~]# grep Percpu /proc/meminfo
Percpu:             1064 kB

* run the test 3k times *

[root@rhel8 ~]# grep Percpu /proc/meminfo
Percpu:             2760 kB

1696 KiB growth over 3000 runs

Not sure if 3k runs is sufficient to highlight the problem.

(In reply to John Siddle from comment #12)
> FWIW, it seems like the previous versions of the podman packages exhibit the
> same problem at roughly the same rate.

Thanks for running the test. It does look like downgrading to the previous version of podman and/or the kernel will not help.

*** Bug 2004453 has been marked as a duplicate of this bug. ***

*** Bug 2044626 has been marked as a duplicate of this bug. ***

My ticket 2044626 was closed as a duplicate of this one; sadly, I did not have enough time to debug it fully and supply all the information needed. I did, however, find a possible workaround: after "swapoff -a", running the machine without any swap made my memory-leak issue go away. While this workaround works for me, on certain machines I would still prefer to have a small swap.

*** Bug 2037529 has been marked as a duplicate of this bug. ***

An upstream patch has been posted:
https://lore.kernel.org/lkml/20220421145845.1044652-1-longman@redhat.com/

*** Bug 2014136 has been marked as a duplicate of this bug. ***

MM tier tests passed with kernel-4.18.0-397.g2c67.el8.mr2872_220603_1814 from comment #72:
https://beaker.engineering.redhat.com/jobs/6725731
Set Verified:Tested

Hi, one of my customers hit this issue on their OCP 4.8. May I know whether there is any ETA for this fix to be ported to OCP?

Hi team, it seems this bug is affecting Elasticsearch in OCP 4.7 as well. Can you please look into a fix ported to OCP? Case 03269526 is attached to this bug, thanks.

*** Bug 2111139 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: kernel security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7683
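As a closing note, a quick sketch for checking whether a host is already running a kernel that contains the fix; the version string is taken from the "Fixed In Version" field in the header above, and the comparison itself is only a simple version check, not an official verification procedure:

```sh
# Compare the running kernel against the Fixed In Version from this bug.
FIXED=4.18.0-404.el8                 # from "Fixed In Version" above
echo "running kernel : $(uname -r)"
echo "fixed in       : kernel-$FIXED"

# List installed kernel packages to see whether an updated one is present.
rpm -q kernel
```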