Bug 1506651 - [CRI-O] CRI-O increasing memory usage over time
Summary: [CRI-O] CRI-O increasing memory usage over time
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Containers
Version: 3.7.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.9.0
Assignee: Antonio Murdaca
QA Contact: Walid A.
URL:
Whiteboard: aos-scalability-37
Depends On:
Blocks:
 
Reported: 2017-10-26 13:47 UTC by Sebastian Jug
Modified: 2018-03-28 14:09 UTC
CC List: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-03-28 14:08:55 UTC
Target Upstream Version:
Embargoed:


Attachments
Graph of CRI-O memory usage over 7 runs (31.35 KB, image/png), 2017-10-26 13:47 UTC, Sebastian Jug
CRI-O /metrics output on initial process start (before) (6.54 KB, text/plain), 2017-10-26 13:48 UTC, Sebastian Jug
CRI-O /metrics output steady state after many scale runs (after) (6.82 KB, text/plain), 2017-10-26 13:49 UTC, Sebastian Jug
Tar of 5 run heap dumps & metrics (1.10 MB, application/x-xz), 2017-10-27 12:13 UTC, Sebastian Jug
Graph of CRI-O memory usage over several runs (after fix) (27.09 KB, image/png), 2017-10-30 18:23 UTC, Sebastian Jug
cri-o pidstat memory usage on 4th run deploying 500 pausepods (552.81 KB, image/png), 2018-02-14 17:41 UTC, Walid A.


Links
Red Hat Product Errata RHBA-2018:0489, last updated 2018-03-28 14:09:21 UTC

Description Sebastian Jug 2017-10-26 13:47:35 UTC
Created attachment 1343750 [details]
Graph of CRI-O memory usage over 7 runs

Description of problem:
There may be a memory/goroutine leak, as we see memory usage consistently increase over time.

Version-Release number of selected component (if applicable):
atomic-openshift-3.7.0-0.143.7.git.0.ee33a8f.el7.x86_64
crio version 1.8.0-dev, commit: "9ec09fa3ae89a5cda511a95bf51e3a8856a66632"
runc version 1.0.0-rc4+dev, commit: 74a17296470088de3805e138d3d87c62e613dfc4

How reproducible:
Every time


Steps to Reproduce:
1. Run scale test involving large number of pods
2. Record memory usage over time.
3. Repeat Steps 1-2.


Actual results:
CRI-O memory usage continues to grow over time and does not appear to stabilize.

Expected results:
CRI-O memory usage should remain the same or return to low usage after test completion.

Comment 1 Sebastian Jug 2017-10-26 13:48:39 UTC
Created attachment 1343751 [details]
CRI-O /metrics output on initial process start (before)

Comment 2 Sebastian Jug 2017-10-26 13:49:09 UTC
Created attachment 1343752 [details]
CRI-O /metrics output steady state after many scale runs (after)

Comment 3 Antonio Murdaca 2017-10-26 15:18:20 UTC
It would be awesome to get a goroutine stack dump for this (if you enabled --profile, http://localhost:6060/debug/pprof/goroutine?debug=3 should give you a goroutine dump). I'll try to reproduce this myself on my host by starting/stopping a bunch of pods, but so far it's not reproducing.
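
For reference, the --profile endpoint presumably comes from Go's standard net/http/pprof package; a minimal sketch of that pattern (not CRI-O's actual wiring) is:

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on http.DefaultServeMux
)

func main() {
	// With the handlers registered, /debug/pprof/goroutine with debug >= 2
	// returns a full stack dump of every live goroutine.
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}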

Comment 4 Sebastian Jug 2017-10-27 12:13:32 UTC
Created attachment 1344276 [details]
Tar of 5 run heap dumps & metrics

We can see progressively more IO-wait goroutines in the heap dumps, potentially coming from the vendored cmux package.

Additionally, the running time of the goroutines progressively increases.

In the metrics, all memory-related metrics increase run after run, and the open file descriptor count seems to be closely tied to the goroutine count.
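
A quick way to watch that correlation between runs is to scrape the /metrics endpoint and pull out the goroutine and open-fd gauges. A rough sketch, assuming the default Prometheus Go client collectors (go_goroutines and process_open_fds) and a stand-in address:

package main

import (
	"bufio"
	"fmt"
	"log"
	"net/http"
	"strings"
)

// Scrapes a Prometheus-style /metrics endpoint and prints the goroutine and
// open-fd gauges exported by the default Go client collectors.
func main() {
	resp, err := http.Get("http://localhost:9090/metrics") // stand-in address, not CRI-O's actual one
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	s := bufio.NewScanner(resp.Body)
	for s.Scan() {
		line := s.Text()
		if strings.HasPrefix(line, "go_goroutines") || strings.HasPrefix(line, "process_open_fds") {
			fmt.Println(line)
		}
	}
}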

Comment 5 Antonio Murdaca 2017-10-27 12:18:49 UTC
Is this version of OpenShift using cAdvisor? That could explain the cmux overhead.

As for the open file count, I opened a PR to fix the systemd unit as well.

Comment 6 Antonio Murdaca 2017-10-27 12:21:22 UTC
I'll look at the attached files in the following hours.

Comment 7 Sebastian Jug 2017-10-27 12:25:10 UTC
Yes, we are using the version of OpenShift with the CRI-O-compatible cAdvisor.

Comment 8 Antonio Murdaca 2017-10-27 12:37:15 UTC
Alright, I think cmux isn't really performing well then; we might need a patch. I'll debug this further. Do you have access to a cluster that currently exhibits this issue?

Comment 9 Sebastian Jug 2017-10-27 13:20:21 UTC
Yes, I'll reach out to you privately and give you the details.

Comment 10 Antonio Murdaca 2017-10-27 22:01:45 UTC
Reproduced on my host with Kubernetes. We're leaking goroutines in our multiplexed gRPC/HTTP server. I already have a fix that I'm testing.

Comment 11 Antonio Murdaca 2017-10-27 22:14:40 UTC
The fix is here https://github.com/kubernetes-incubator/cri-o/pull/1082

Needs a cadvisor patch as well to change where to find the info endpoint:

diff --git a/container/crio/client.go b/container/crio/client.go
index b588552..9dcafaa 100644
--- a/container/crio/client.go
+++ b/container/crio/client.go
@@ -24,7 +24,7 @@ import (
 )

 const (
-       CrioSocket            = "/var/run/crio.sock"
+       CrioSocket            = "/var/run/crio/info.sock"
        maxUnixSocketPathSize = len(syscall.RawSockaddrUnix{}.Path)
 )
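
The new socket path suggests the info endpoint moves off the multiplexed listener onto a dedicated unix socket. A minimal sketch of that shape (illustrative only, not the actual PR) in Go:

package main

import (
	"log"
	"net"
	"net/http"
	"os"
)

func main() {
	// Illustrative path only; the real daemon listens on /var/run/crio/info.sock.
	const sock = "/tmp/info.sock"
	_ = os.Remove(sock)
	l, err := net.Listen("unix", sock)
	if err != nil {
		log.Fatal(err)
	}
	mux := http.NewServeMux()
	mux.HandleFunc("/info", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte(`{"storage_driver":"overlay"}`)) // placeholder payload
	})
	// Serving the info endpoint on its own socket keeps plain HTTP off the
	// gRPC listener, so there is no cmux-style connection multiplexing to leak.
	log.Fatal(http.Serve(l, mux))
}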

Comment 13 Antonio Murdaca 2017-10-30 16:48:08 UTC
This is the PR that should actually fix this issue: https://github.com/kubernetes-incubator/cri-o/pull/1090

Sebastian is testing it out.

Comment 14 Sebastian Jug 2017-10-30 18:23:27 UTC
Created attachment 1345564 [details]
Graph of CRI-O memory usage over several runs (after fix)

Looks like this issue has been resolved (see the attached graph for verification).
Thanks!

Comment 16 Antonio Murdaca 2017-11-03 17:45:17 UTC
cri-o v1.0.2 has been released with the fix for this BZ.

Comment 18 DeShuai Ma 2018-01-04 08:55:53 UTC
@Sebastian Jug, please help verify the bug. Thanks.

Comment 19 DeShuai Ma 2018-02-07 06:10:57 UTC
Per email communication, Sebastian Jug is working on reproducing and verifying the bug.

Comment 20 Walid A. 2018-02-14 17:32:49 UTC
Verified on a 4-node OCP 3.9 cluster with m4.xlarge instance types in AWS EC2: 1 master/etcd, 2 compute nodes, 1 infra node.

Executed 4 back-to-back Node Vertical tests deploying 500 pause-pods on each compute node, monitoring cri-o process memory usage with pidstat from pbench. cri-o RSS memory usage started around 74 MB and did not exceed 130-150 MB after 500 pods were deployed on the compute node. When the pause-pods were deleted, cri-o process memory returned to the 74 MB range.

# runc --version
runc version spec: 1.0.0

# oc describe node | grep Runtime
 Container Runtime Version:  cri-o://1.9.2

# openshift version
openshift v3.9.0-0.41.0
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.8

pidstat memory usage graph attached.

pbench data in next private comment.
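
For anyone repeating this without pbench, a rough equivalent of the pidstat RSS sampling is to read VmRSS straight from /proc; a hypothetical helper (not part of the test tooling) could look like:

package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strings"
)

// Prints the VmRSS line from /proc/<pid>/status; pass the crio PID as the
// first argument. Run it periodically to approximate the pidstat sampling above.
func main() {
	if len(os.Args) < 2 {
		log.Fatal("usage: rss <pid>")
	}
	f, err := os.Open("/proc/" + os.Args[1] + "/status")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	s := bufio.NewScanner(f)
	for s.Scan() {
		if strings.HasPrefix(s.Text(), "VmRSS:") {
			fmt.Println(s.Text())
		}
	}
}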

Comment 22 Walid A. 2018-02-14 17:41:17 UTC
Created attachment 1396074 [details]
cri-o pidstat memory usage on 4th run deploying 500 pausepods

Comment 26 errata-xmlrpc 2018-03-28 14:08:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489

