Bug 1706625

Summary: etcd-quorum-guard reporting extremely high memory usage
Product: OpenShift Container Platform
Reporter: Samuel Padgett <spadgett>
Component: Etcd
Assignee: Robert Krawitz <rkrawitz>
Status: CLOSED ERRATA
QA Contact: ge liu <geliu>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.1.0
CC: ccoleman, gblomqui, rkrawitz, sjenning
Target Milestone: ---
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-06-04 10:48:23 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description                                 Flags
Prometheus 3 day view of one of the pods    none
container_memory_rss                        none
container_memory_working_set_bytes          none

Description Samuel Padgett 2019-05-05 18:47:35 UTC
The etcd-quorum-guard pods are all reporting ~10Gi memory usage. The console is showing this using the query:

pod_name:container_memory_usage_bytes:sum{pod_name='etcd-quorum-guard-7b55ddf465-bsr42',namespace='openshift-machine-config-operator'}

`oc adm top` agrees:

❯ oc adm top pod etcd-quorum-guard-7b55ddf465-stgns -n openshift-machine-config-operator
NAME                                 CPU(cores)   MEMORY(bytes)
etcd-quorum-guard-7b55ddf465-stgns   5m           10166Mi

Version 4.1.0-0.ci-2019-05-02-194100
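
For triage, a check along the following lines could flag pods whose reported memory is out of proportion. This is only a sketch: the helper names `parse_mem_mi` and `pods_over` and the 1024 Mi threshold are hypothetical, and the sample text is the `oc adm top` output quoted above.

```python
# Hypothetical helper names and threshold; SAMPLE is the
# `oc adm top` output quoted in this report.
SAMPLE = """\
NAME                                 CPU(cores)   MEMORY(bytes)
etcd-quorum-guard-7b55ddf465-stgns   5m           10166Mi
"""

# Binary suffixes only; any other quantity format is rejected.
UNITS_TO_MI = {"Ki": 1 / 1024, "Mi": 1.0, "Gi": 1024.0}


def parse_mem_mi(value):
    """Convert a quantity such as '10166Mi' to MiB."""
    for suffix, factor in UNITS_TO_MI.items():
        if value.endswith(suffix):
            return float(value[: -len(suffix)]) * factor
    raise ValueError("unsupported memory quantity: %r" % value)


def pods_over(top_output, limit_mi):
    """Return (pod name, MiB) for each row above limit_mi."""
    flagged = []
    for line in top_output.strip().splitlines()[1:]:  # skip the header row
        name, _cpu, mem = line.split()
        mem_mi = parse_mem_mi(mem)
        if mem_mi > limit_mi:
            flagged.append((name, mem_mi))
    return flagged


if __name__ == "__main__":
    for name, mem_mi in pods_over(SAMPLE, limit_mi=1024):
        print("%s uses %.0f Mi" % (name, mem_mi))
```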

Comment 1 Samuel Padgett 2019-05-05 18:48:30 UTC
Created attachment 1564066
Prometheus 3 day view of one of the pods

Comment 2 Samuel Padgett 2019-05-05 18:49:21 UTC
Created attachment 1564067
container_memory_rss

Comment 3 Samuel Padgett 2019-05-05 18:49:48 UTC
Created attachment 1564068
container_memory_working_set_bytes

Comment 6 Samuel Padgett 2019-05-05 21:09:52 UTC
`ps aux` and `free` from inside the container:

sh-4.2# ps aux
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root          1  0.0  0.0   4372   688 ?        Ss   May02   0:00 /bin/sleep infinity
root      16144  0.0  0.0  11828  2904 pts/0    Ss   20:59   0:00 sh
root      16446  0.0  0.0  51752  3492 pts/0    R+   21:04   0:00 ps aux

sh-4.2# free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        3.1G        171M        8.8M         12G         11G
Swap:            0B          0B          0B
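
The `ps aux` listing can be compared with the 10166Mi figure from `oc adm top` by summing the RSS column, which `ps` reports in KiB. The sketch below does that arithmetic; the function name `total_rss_kib` is hypothetical, and the sum includes the transient `sh` and `ps` processes from the debug session.

```python
# PS_OUTPUT is the `ps aux` listing from this comment; the function
# name is a hypothetical helper for illustration.
PS_OUTPUT = """\
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root          1  0.0  0.0   4372   688 ?        Ss   May02   0:00 /bin/sleep infinity
root      16144  0.0  0.0  11828  2904 pts/0    Ss   20:59   0:00 sh
root      16446  0.0  0.0  51752  3492 pts/0    R+   21:04   0:00 ps aux
"""

REPORTED_MI = 10166  # MEMORY(bytes) value reported by `oc adm top`


def total_rss_kib(ps_output):
    """Sum the RSS column (KiB) over every process row."""
    lines = ps_output.strip().splitlines()
    rss_col = lines[0].split().index("RSS")  # locate RSS from the header
    return sum(int(line.split()[rss_col]) for line in lines[1:])


if __name__ == "__main__":
    rss_mi = total_rss_kib(PS_OUTPUT) / 1024
    print("process RSS total: %.1f MiB" % rss_mi)
    print("reported usage:    %d MiB" % REPORTED_MI)
```

The processes in the container hold about 6.9 MiB resident in total, so process RSS alone does not account for the reported usage.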

Comment 9 Greg Blomquist 2019-05-06 15:39:01 UTC
*** Bug 1706635 has been marked as a duplicate of this bug. ***

Comment 15 ge liu 2019-05-08 04:29:31 UTC
Checked the latest payload (4.1.0-0.nightly-2019-05-08-012425); the PR has not been pushed in yet.

Comment 16 ge liu 2019-05-10 03:08:07 UTC
Recreated and verified with the Beta5 final build, 4.1.0-rc.2: memory usage is now only 3Mi.
# oc adm top pods etcd-quorum-guard-9cdb6f6c4-l822f -n openshift-machine-config-operator
NAME                                CPU(cores)   MEMORY(bytes)   
etcd-quorum-guard-9cdb6f6c4-l822f   6m           3Mi

Comment 18 errata-xmlrpc 2019-06-04 10:48:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758