Bug 2270717 (CVE-2024-3056)
Summary: | CVE-2024-3056 podman: kernel: containers in shared IPC namespace are vulnerable to denial of service attack | ||||||
---|---|---|---|---|---|---|---|
Product: | [Other] Security Response | Reporter: | Robb Gatica <rgatica> | ||||
Component: | vulnerability | Assignee: | Product Security <prodsec-ir-bot> | ||||
Status: | NEW --- | QA Contact: | |||||
Severity: | medium | Docs Contact: | |||||
Priority: | medium | ||||||
Version: | unspecified | CC: | 906096237, acaringi, allarkin, aquini, askrabec, bhu, chwhite, cye, cyin, dan.cermak, dbohanno, debarbos, dfreiber, drow, dvlasenk, esandeen, ezulian, gscrivan, hkrzesin, jarod, jburrell, jdenham, jfaracco, jlelli, joe.lawrence, jshortt, jstancek, jwyatt, kcarcia, ldoskova, lgoncalv, llong, lzampier, mheon, mleitner, mmilgram, mstowell, nmurray, ptalbert, rparrazo, rrobaina, rvrbovsk, rysulliv, scweaver, security-response-team, sidakwo, sukulkar, tglozar, tsweeney, tyberry, vkumar, wcosta, williams, wmealing, ycote, ykopkova, zhijwang | ||||
Target Milestone: | --- | Keywords: | Security | ||||
Target Release: | --- | ||||||
Hardware: | All | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
Doc Text: |
A flaw was found in Podman. An attacker can create a specially crafted container that, when configured to share an IPC namespace with at least one other container, creates a large number of IPC resources in /dev/shm. The malicious container keeps exhausting resources until it is out-of-memory (OOM) killed. Although the malicious container's cgroup is then removed, the IPC resources it created are not: they are tied to the IPC namespace, which is not removed until all containers using it have stopped, and a single non-malicious container is enough to hold the namespace open. When the malicious container is restarted, either automatically or under attacker control, the process repeats and the amount of memory consumed grows. With a container configured to always restart, such as with `podman run --restart=always`, this can result in a memory-based denial of service of the system.
|
Bug Depends On: | 2302003 | ||||||
Bug Blocks: | 2270713 | ||||||
Attachments: |
|
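The restart loop described in the Doc Text can be put into rough numbers. The following is only a worst-case sketch, not a measurement: it assumes every run of the malicious container fills its entire cgroup memory limit with shared memory that survives the restart, and that nothing else (for example a size cap on the /dev/shm tmpfs) bounds the growth. The function names are hypothetical.

```python
def leaked_upper_bound(cgroup_limit_bytes: int, restarts: int) -> int:
    """Upper bound on shared memory left behind after `restarts` runs.

    Assumption: each run leaks at most its full cgroup memory limit.
    """
    if cgroup_limit_bytes < 0 or restarts < 0:
        raise ValueError("limit and restart count must be non-negative")
    return cgroup_limit_bytes * restarts


def restarts_until_exhausted(host_available_bytes: int, cgroup_limit_bytes: int) -> int:
    """Smallest number of restarts whose worst-case leak reaches host memory."""
    if cgroup_limit_bytes <= 0:
        raise ValueError("cgroup limit must be positive")
    # Ceiling division without floating point.
    return -(-host_available_bytes // cgroup_limit_bytes)


if __name__ == "__main__":
    MiB, GiB = 2**20, 2**30
    print(leaked_upper_bound(512 * MiB, 4) // MiB, "MiB leaked after 4 restarts")
    print(restarts_until_exhausted(16 * GiB, 512 * MiB), "restarts to reach 16 GiB")
```

Under these assumptions a per-container limit does not bound the total, because the limit is applied again to each new cgroup while the old allocations stay behind.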
Description
Robb Gatica
2024-03-21 14:56:42 UTC
The issue is that the allocated memory is assigned to the container cgroup, which has a memory limit; if that limit is hit, the kernel refuses to allocate more.

But if the container exits, the shared memory is not freed and is still allocated to the first cgroup, which is no longer accessible from user space (but seems to be still referenced internally in the kernel).

When the container exits, we create a new one that uses the same cgroup name and also has a limit set.

In this way the container can be restarted multiple times, and each restart can leak memory since the cgroup seems not to be freed internally.

I've not looked into the code, so this analysis is based only on my observations from user space; it might be completely wrong :-)

If I am not wrong, then I think the sensible thing to do in this case would be to migrate the memory allocated to the destroyed cgroup to another cgroup using that same IPC namespace. That way, from user space we can avoid the issue by making sure each cgroup has a limit set, and there are no leaks once a cgroup is deleted.

I think we should add "Waiman Long <llong>" to the conversation. It seems I am not allowed to do it.

Added llong to CC list

(In reply to Giuseppe Scrivano from comment #11)

There is actually an upstream discussion about this specific problem. In the case of shared memory, memory ownership is assigned to the memory cgroup of the first process that uses it. References to that shared memory can be present in other memory cgroups. When all processes in the owning cgroup have exited and the cgroup is about to be destroyed, it remains in a zombie state because of the additional references to the shared memory. AFAIK, there was no consensus on the best way forward the last time I checked. I need to check again to see if there is progress on this issue.

Created podman tracking bugs for this issue:

Affects: fedora-all [bug 2302003]

Created attachment 2076557 [details]
report email
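For anyone trying to observe the accounting discussed above from user space, here is a minimal sketch that reads the `shmem` counter from a cgroup v2 `memory.stat` file. It assumes a cgroup v2 host; the helper names and the example path in the comment are hypothetical and will differ per system. As noted in the discussion, a cgroup that has already been removed is no longer visible from user space, so this only shows what live cgroups are charged with.

```python
from pathlib import Path


def parse_memory_stat(text: str) -> dict[str, int]:
    """Parse the flat 'key value' lines of a cgroup v2 memory.stat file."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def shmem_bytes(cgroup_dir: str) -> int:
    """Return the shared-memory bytes currently charged to one live cgroup."""
    text = Path(cgroup_dir, "memory.stat").read_text()
    return parse_memory_stat(text).get("shmem", 0)


if __name__ == "__main__":
    # Hypothetical path; substitute the container's real cgroup directory.
    # print(shmem_bytes("/sys/fs/cgroup/machine.slice/<container-scope>"))
    sample = "anon 4096\nfile 8192\nshmem 1048576\n"
    print(parse_memory_stat(sample)["shmem"])
```

Comparing the sum over live cgroups with the `Shmem:` line in /proc/meminfo may hint at memory still held by cgroups that are no longer visible, though that comparison is only a heuristic.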
We are a security team from multiple universities. The earlier report received by the Podman Security Team was from us. We would like to provide supplementary information on additional attack vectors not mentioned in the original report.
- Network namespace sharing: If two malicious containers share a network namespace and are granted the 'net_admin' privilege, they can coordinate to bypass cgroup restrictions by following the same DoS attack steps as with a shared IPC namespace. The difference is that, with a shared network namespace, the malicious containers consume memory by creating network devices.
- PID namespace sharing: By sharing a PID namespace, two malicious containers can bypass the cgroup limit on the number of processes and launch a DoS attack on the host system. Specifically, an attacker can create such containers with a non-functional 'init' process (specified using 'podman run --init-path=init_process') that cannot properly handle orphaned processes. This allows the container to generate large numbers of zombie processes. If the attack steps used for IPC namespace sharing are repeated, one container can continuously generate zombie processes and restart each time it hits the cgroup PID limit. As a result, zombie processes accumulate within the shared namespace, eventually exhausting the host system's PID resources.
Zhen Xu, Huazhong University of Science and Technology
Zhi Li, Huazhong University of Science and Technology
Weijie Liu, Nankai University
XiaoFeng Wang, Indiana University Bloomington
|
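To check a host for the zombie accumulation described in the PID namespace item above, a simple monitoring sketch is to count processes whose state in /proc/<pid>/stat is `Z`. This is an illustration for diagnosis only, not part of the report; the function names are hypothetical, and it assumes a Linux /proc layout.

```python
import os


def proc_state(stat_line: str) -> str:
    """Extract the state field from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so the state is the first field after the last ')'.
    """
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def count_zombies(proc_root: str = "/proc") -> int:
    """Count processes currently in the zombie ('Z') state."""
    zombies = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as 'self' or 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                if proc_state(f.read()) == "Z":
                    zombies += 1
        except (OSError, ValueError, IndexError):
            continue  # process vanished or stat was unreadable
    return zombies


if __name__ == "__main__":
    print("zombie processes:", count_zombies())
```

A steadily rising count across container restarts would be consistent with the behaviour described, although zombies can also appear briefly during normal operation.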