Description of problem:
As per the investigation in bug https://bugzilla.redhat.com/show_bug.cgi?id=2004037, from the kernel side the user-level application identified is podman: it is not able to free percpu memory, which leaves the node out of resources.

Version-Release number of selected component (if applicable):
OCP version 4.6.25
podman-1.9.3-3.rhaos4.6.el8.x86_64
kernel-4.18.0-193.47.1.el8_2.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Install RHEL 8.2
2. Install container tools:
   $ dnf install -y @container-tools
3. Run the podman command below in a loop; you may run multiple loops with different container names to get a quicker spike in the Percpu counter in /proc/meminfo:
   $ while :; do podman run --name=test --replace centos /bin/echo 'running'; done

Actual results:
Percpu usage increases gradually.

Expected results:
Memory should be released from Percpu usage.

Additional info:
This bug has been opened to verify whether user-level applications like podman can change the way some of the interprocess communication works, and whether it is possible to work around this percpu memory increase.
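Not part of the original report, but a minimal sketch of how the growth can be observed while the reproducer loop above runs in another shell (assumes a standard /proc/meminfo and /proc/cgroups; the num_cgroups column of /proc/cgroups is what should keep climbing alongside Percpu):

$ # watch the Percpu counter and the memory cgroup count every 10 seconds
$ while :; do grep Percpu /proc/meminfo; grep memory /proc/cgroups; sleep 10; done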
Giuseppe or Brent, does this ring any bells?
*** Bug 2049288 has been marked as a duplicate of this bug. ***
The file is going to be recreated. It is not about the file size; I think its memory pages are referencing different memory cgroups, causing them not to be freed. It would be interesting to see how much deleting the file helps to free these cgroups in your case. That is why I'd like to know what /proc/cgroups looks like before and after you delete the events.log file. Since the file is recreated, you only lose what was present in it before, and you don't have to restart any service.
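A minimal sketch of how to capture that data, assuming the default libpod events path /run/libpod/events/events.log mentioned below (adjust the path if your setup differs):

$ cat /proc/cgroups > /tmp/cgroups.before
$ rm /run/libpod/events/events.log
$ cat /proc/cgroups > /tmp/cgroups.after
$ diff /tmp/cgroups.before /tmp/cgroups.after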
Based on a patched upstream kernel with podman version 3.4.5-dev on the latest RHEL8 system, I got the following page_owner output for a page pinned by one of the offline memcgs:

Page allocated via order 0, mask 0x1100cca(GFP_HIGHUSER_MOVABLE), pid 366110 (podman), ts 565417059747 ns, free_ts 565413281650 ns
PFN 1142538 type Movable Block 2231 type Movable Flags 0x17ffffc00c001c(uptodate|dirty|lru|reclaim|swapbacked|node=0|zone=2|lastcpupid=0x1fffff)
 prep_new_page+0x8e/0xb0
 get_page_from_freelist+0xc4d/0xe50
 __alloc_pages+0x172/0x320
 alloc_pages_vma+0x84/0x230
 shmem_alloc_page+0x3f/0x90
 shmem_alloc_and_acct_page+0x76/0x1c0
 shmem_getpage_gfp+0x48d/0x890
 shmem_write_begin+0x36/0xc0
 generic_perform_write+0xed/0x1d0
 __generic_file_write_iter+0xdc/0x1b0
 generic_file_write_iter+0x5d/0xb0
 new_sync_write+0x11f/0x1b0
 vfs_write+0x1ba/0x2a0
 ksys_write+0x59/0xd0
 do_syscall_64+0x37/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
Charged to offline memcg libpod-conmon-027816ee7641fce83a044ccce3e99b2a33525b6958d9363d9c497db01ee2050a.

"org.label-schema.build-date":"20201204","org.label-schema.license":"GPLv2","org.label-schema.name":"CentOS Base Image","org.label-schema.schema-version":"1.0","org.label-schema.vendor":"CentOS"}}\n{"ID":"027816ee7641fce83a044ccce3e99b2a33525b6958d9363d9c49

The first 256 bytes of the shmem page were printed, and it does look like some kind of log file.

After deleting the event log file (/run/libpod/events/events.log), the number of cgroups dropped from 267 to 164. After another 1000 invocations of podman, percpu memory consumption (as reported in meminfo) increased from 101376 kB to 117504 kB. After deleting the event log file again, the percpu memory consumption dropped back to 100992 kB. So deleting the event log file is one possible workaround.

-Longman
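For reference, a rough sketch of how page_owner data like the above can be collected on a kernel built with CONFIG_PAGE_OWNER. Note that the "Charged to offline memcg" annotation comes from the patched kernel mentioned above and is not in stock builds; page_owner=on also has to be added to the kernel command line and the box rebooted before the data is recorded:

$ grep CONFIG_PAGE_OWNER /boot/config-$(uname -r)
$ # add page_owner=on to the kernel command line and reboot, then:
$ cat /sys/kernel/debug/page_owner > /tmp/page_owner.txt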
The customer has provided the /proc/cgroups output and it is the same before and after deleting the event log file. Both files are attached.
Assigning to Giuseppe to continue the debugging.
From a Podman PoV, I think the workaround is to set events_logger="journald" in /etc/containers/containers.conf so the events.log file is not used. The issue you are seeing with CRI-O is a different leak, so let's track it separately. Can you clone this bug or file a new one for the Node component?
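For reference, a minimal containers.conf sketch for that workaround; the [engine] table and the events_logger key are standard containers.conf settings, and only the relevant lines are shown:

# /etc/containers/containers.conf
[engine]
# use the journald backend instead of the events.log file
events_logger = "journald"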
gscrivan: I'm not sure either. AFAIU, the only way to determine what is allocating the percpu memory is to get the page_owner information so the bug can be moved to the right component. This bug was initially under CRI-O review.
Could you try the equivalent test with CRI-O? What happens if you delete /run/crio? Note that this is a destructive action; you may need to reboot the node and restart all the containers there.
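A rough, hypothetical sketch of that check, only for a node you can afford to disrupt (exact service names and recovery steps depend on the node setup):

$ grep Percpu /proc/meminfo
$ rm -rf /run/crio            # destructive: CRI-O runtime state is lost
$ grep Percpu /proc/meminfo
$ systemctl reboot            # likely needed to bring the node back to a clean state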
I've got no feedback on my suggestion from February:

> Please copy the file from /usr/share/containers. You can `cp /usr/share/containers/containers.conf /etc/containers/`, then edit `/etc/containers/containers.conf` to change the log backend to journald.

After that, the events log driver is switched to journald.
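In case it helps, the suggested steps as commands; the grep at the end is just one way to confirm the active backend, and the field name may vary slightly between podman versions:

$ cp /usr/share/containers/containers.conf /etc/containers/
$ # edit /etc/containers/containers.conf and set events_logger = "journald" under [engine]
$ podman info | grep -i eventlogger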
Based on the discussion above, I'm closing this as fixed per Giuseppe's last comment. If this does not fix your CRI-O percpu issues @roarora, please open a new BZ.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days