Bug 1855748
| Summary: | crictl stats shows values of the whole pod and not of the container | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Roman Mohr <rmohr> |
| Component: | Node | Assignee: | Peter Hunt <pehunt> |
| Status: | CLOSED ERRATA | QA Contact: | Weinan Liu <weinliu> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4.4 | CC: | aos-bugs, jokerman, pehunt, rphillips |
| Target Milestone: | --- | | |
| Target Release: | 4.6.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-10-27 16:13:50 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
fixed upstream by attached pr

This will be resolved next sprint

still waiting on fix to go in upstream

upstream PR has been merged

Weinan, I believe you've verified based on a slight misunderstanding of the source of the problem. For clarification on the original report:

> One can see that this container has a memory limit of 40MB applied and how much memory is actually consumed.

Yes, this is a difference between CRI stats and docker stats.

> When I start the same pod on openshift and I run
> ```
> # crictl stats fb8662e85892e
> CONTAINER           CPU %   MEM       DISK      INODES
> fb8662e85892e       1.62    210.2MB   12.72MB   20
> ```

This is the real bug. If you run a pod with two different containers, they should have different stats reports. Before the fix, they had the same stats report.

> I get the memory consumption of the whole pod. I confirmed that `fb8662e85892e` indeed points to the right pid `351019`.
> ```
> # crictl inspect fb8662e85892e
> {
>   "pid": 351019,
>   "sandboxId": "052f3e0fedaf1b797f500da998b71b9709c6934d9985916312d7d6f8ada3e0e1"
> }
> ```
> However this sandbox actually belongs to /usr/bin/pod (349553):
> ```
> # runc state 052f3e0fedaf1b797f500da998b71b9709c6934d9985916312d7d6f8ada3e0e1
> {
>   "ociVersion": "1.0.1-dev",
>   "id": "052f3e0fedaf1b797f500da998b71b9709c6934d9985916312d7d6f8ada3e0e1",
>   "pid": 349553,
> ```

This is as designed. The "sandbox" really is a collection of containers. We represent it with the pause container (which runs the process /usr/bin/pod). The pause container is a running container, so it has its own pid, and "actually belongs to /usr/bin/pod" is expected. The real bug here is that, when reporting through CRI, we accounted for the pause container's memory in the container's stats.

It is much easier to verify the fix with two containers in the pod (plus the pause container): crictl stats will output a row for each, giving a clear picture of whether they differ, which should now be the case.

As per comment 10, the issue is fixed on:

```
$ oc version
Client Version: 4.5.2
Server Version: 4.6.0-0.nightly-2020-08-06-131904
Kubernetes Version: v4.6.0-202008061010.p0-dirty
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196
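To make the two-container verification path from the comments concrete, here is a minimal sketch. The pod name, image, and limit below are illustrative and not taken from the original report; only `crictl pods`, `crictl ps`, and `crictl stats` usage reflects the commands discussed in this bug. Before the fix, both rows showed the same pod-level MEM value; after the fix, each container reports its own usage.

```
# Hypothetical two-container pod used only to exercise per-container stats.
cat <<'EOF' > stats-test-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: stats-test
spec:
  containers:
  - name: limited
    image: registry.access.redhat.com/ubi8/ubi-minimal
    command: ["sleep", "infinity"]
    resources:
      limits:
        memory: 40Mi
  - name: idle
    image: registry.access.redhat.com/ubi8/ubi-minimal
    command: ["sleep", "infinity"]
EOF
oc apply -f stats-test-pod.yaml

# On the node: find the pod sandbox, then print stats for each of its containers.
# The two MEM values should differ; identical values indicate the old behavior.
POD_ID=$(crictl pods --name stats-test -q)
crictl ps --pod "$POD_ID" -q | xargs -n1 crictl stats
```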
Description of problem:

I am currently diagnosing an OOM problem on a Pod with a container with limits. When I run

```
$ docker stats 1349ed010bee
CONTAINER ID    NAME                                                                                                        CPU %   MEM USAGE / LIMIT     MEM %   NET I/O   BLOCK I/O     PIDS
1349ed010bee    k8s_volumecontainerdisk_virt-launcher-vmi-nocloud-zm5ds_default_88e6e011-33c7-4173-a84c-a0fe83012e02_0      0.39%   1.711MiB / 38.14MiB   4.49%   0B / 0B   1.73MB / 0B   1
```

one can see that this container has a memory limit of 40MB applied and how much memory is actually consumed.

When I start the same pod on openshift and I run

```
# crictl stats fb8662e85892e
CONTAINER           CPU %   MEM       DISK      INODES
fb8662e85892e       1.62    210.2MB   12.72MB   20
```

I get the memory consumption of the whole pod. I confirmed that `fb8662e85892e` indeed points to the right pid `351019`.

```
# crictl inspect fb8662e85892e
{
  "pid": 351019,
  "sandboxId": "052f3e0fedaf1b797f500da998b71b9709c6934d9985916312d7d6f8ada3e0e1"
}
```

However this sandbox actually belongs to /usr/bin/pod (349553):

```
# runc state 052f3e0fedaf1b797f500da998b71b9709c6934d9985916312d7d6f8ada3e0e1
{
  "ociVersion": "1.0.1-dev",
  "id": "052f3e0fedaf1b797f500da998b71b9709c6934d9985916312d7d6f8ada3e0e1",
  "pid": 349553,
```

Maybe that is the source of the confusion, and stats takes the wrong entry point (e.g. the parent sandbox).

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:
I would expect to see the memory consumption of the container, not of the pod, to make diagnosing issues easier.

Additional info:
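As a cross-check not included in the original report, the per-container figure can be compared against the container's own memory cgroup. This is a minimal sketch assuming cgroup v1 with the memory controller mounted at /sys/fs/cgroup/memory (under cgroup v2 the equivalent file is memory.current); the pid is the one returned by `crictl inspect` above.

```
# Illustrative cross-check: read the memory cgroup of the container's own pid.
PID=351019                                                 # pid from `crictl inspect fb8662e85892e`
CGPATH=$(awk -F: '/memory/ {print $3; exit}' /proc/${PID}/cgroup)
cat /sys/fs/cgroup/memory${CGPATH}/memory.usage_in_bytes   # container-level usage, not pod-level
cat /sys/fs/cgroup/memory${CGPATH}/memory.limit_in_bytes   # should reflect the ~40MB container limit
```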