Description of problem:
As per the investigation in bug https://bugzilla.redhat.com/show_bug.cgi?id=2004037, from the kernel side the user-level application identified is podman: it is not able to free percpu memory, which leaves the node out of resources.

Version-Release number of selected component (if applicable):
OCP version 4.6.25
podman-1.9.3-3.rhaos4.6.el8.x86_64
kernel-4.18.0-193.47.1.el8_2.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Install RHEL 8.2
2. Install container tools:
   $ dnf install -y @container-tools
3. Run the podman command below in a loop; you may run multiple loops with different container names to get a quicker spike in the Percpu counter in /proc/meminfo:
   $ while :; do podman run --name=test --replace centos /bin/echo 'running'; done

Actual results:
Percpu usage increases gradually.

Expected results:
Memory should be released from Percpu usage.

Additional info:
This bug has been opened to verify whether user-level applications like podman can change the way some of the interprocess communication works, and whether it is possible to work around this percpu memory increase.
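Not part of the original report, but a minimal sketch of how the growth can be observed while the reproducer loop above runs in another shell (assumes a standard /proc/meminfo and /proc/cgroups; the num_cgroups column of /proc/cgroups is what should keep climbing alongside Percpu):

$ # watch the Percpu counter and the memory cgroup count every 10 seconds
$ while :; do grep Percpu /proc/meminfo; grep memory /proc/cgroups; sleep 10; done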
Giuseppe or Brent, does this ring any bells?
*** Bug 2049288 has been marked as a duplicate of this bug. ***
The file is going to be recreated. It is not about the file size; I think its memory pages are referencing different memory cgroups, causing them not to be freed. It would be interesting to see how much deleting the file helps to free these cgroups in your case. That is why I'd like to know what /proc/cgroups looks like before and after you delete the events.log file. Since the file is recreated, you only lose what was present in it before, and you don't have to restart any service.
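A minimal sketch of how to capture that data, assuming the default libpod events path /run/libpod/events/events.log mentioned below (adjust the path if your setup differs):

$ cat /proc/cgroups > /tmp/cgroups.before
$ rm /run/libpod/events/events.log
$ cat /proc/cgroups > /tmp/cgroups.after
$ diff /tmp/cgroups.before /tmp/cgroups.after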
Based on a patched upstream kernel with podman version 3.4.5-dev on the latest RHEL8 system, I got the following page_owner output for a page pinned by one of the offline memcgs:

Page allocated via order 0, mask 0x1100cca(GFP_HIGHUSER_MOVABLE), pid 366110 (podman), ts 565417059747 ns, free_ts 565413281650 ns
PFN 1142538 type Movable Block 2231 type Movable Flags 0x17ffffc00c001c(uptodate|dirty|lru|reclaim|swapbacked|node=0|zone=2|lastcpupid=0x1fffff)
 prep_new_page+0x8e/0xb0
 get_page_from_freelist+0xc4d/0xe50
 __alloc_pages+0x172/0x320
 alloc_pages_vma+0x84/0x230
 shmem_alloc_page+0x3f/0x90
 shmem_alloc_and_acct_page+0x76/0x1c0
 shmem_getpage_gfp+0x48d/0x890
 shmem_write_begin+0x36/0xc0
 generic_perform_write+0xed/0x1d0
 __generic_file_write_iter+0xdc/0x1b0
 generic_file_write_iter+0x5d/0xb0
 new_sync_write+0x11f/0x1b0
 vfs_write+0x1ba/0x2a0
 ksys_write+0x59/0xd0
 do_syscall_64+0x37/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
Charged to offline memcg libpod-conmon-027816ee7641fce83a044ccce3e99b2a33525b6958d9363d9c497db01ee2050a.

"org.label-schema.build-date":"20201204","org.label-schema.license":"GPLv2","org.label-schema.name":"CentOS Base Image","org.label-schema.schema-version":"1.0","org.label-schema.vendor":"CentOS"}}\n{"ID":"027816ee7641fce83a044ccce3e99b2a33525b6958d9363d9c49

The first 256 bytes of the shmem page were printed, and it does look like some kind of log file.

After deleting the event log file (/run/libpod/events/events.log), the number of cgroups dropped from 267 to 164. After another 1000 invocations of podman, percpu memory consumption (as reported in meminfo) increased from 101376 kB to 117504 kB. After deleting the event log file again, the percpu memory consumption dropped back to 100992 kB. So deleting the event log file is one possible workaround.

-Longman
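For reference, a rough sketch of how page_owner data like the above can be collected on a kernel built with CONFIG_PAGE_OWNER. Note that the "Charged to offline memcg" annotation comes from the patched kernel mentioned above and is not in stock builds; page_owner=on also has to be added to the kernel command line and the box rebooted before the data is recorded:

$ grep CONFIG_PAGE_OWNER /boot/config-$(uname -r)
$ # add page_owner=on to the kernel command line and reboot, then:
$ cat /sys/kernel/debug/page_owner > /tmp/page_owner.txt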
The customer has provided the /proc/cgroups output and it is the same before and after deleting the event log file. Both files are attached.
Assigning to Giuseppe to continue the debugging.
From a Podman PoV, I think the workaround is to set events_logger="journald" in /etc/containers/containers.conf so the events.log file is not used. The issue you are seeing with CRI-O is a different leak, so let's track it separately. Can you clone this bug or file a new one for the Node component?
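For reference, a minimal containers.conf sketch for that workaround; the [engine] table and the events_logger key are standard containers.conf settings, and only the relevant lines are shown:

# /etc/containers/containers.conf
[engine]
# use the journald backend instead of the events.log file
events_logger = "journald"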
gscrivan: I'm not sure either. AFAIU, the only way to determine what is allocating the percpu memory is to get the page_owner information so the bug can be moved to the right component. This bug was initially under CRI-O review.
Could you try the equivalent test with CRI-O? What happens if you delete /run/crio? Note that this is a destructive action; you may need to reboot the node and restart all the containers there.
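A rough, hypothetical sketch of that check, only for a node you can afford to disrupt (exact service names and recovery steps depend on the node setup):

$ grep Percpu /proc/meminfo
$ rm -rf /run/crio            # destructive: CRI-O runtime state is lost
$ grep Percpu /proc/meminfo
$ systemctl reboot            # likely needed to bring the node back to a clean state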
I've got no feedback on my suggestion from February:

> Please copy the file from /usr/share/containers. You can `cp /usr/share/containers/containers.conf /etc/containers/`, then edit `/etc/containers/containers.conf` to change the log backend to journald.

After that, the events log driver is switched to journald.
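In case it helps, the suggested steps as commands; the grep at the end is just one way to confirm the active backend, and the field name may vary slightly between podman versions:

$ cp /usr/share/containers/containers.conf /etc/containers/
$ # edit /etc/containers/containers.conf and set events_logger = "journald" under [engine]
$ podman info | grep -i eventlogger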
Based on the discussion above, I'm closing this as fixed per Giuseppe's last comment. If this does not fix your CRI-O percpu issues @roarora, please open a new BZ.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days