Bug 1757314 - [OCPv3.11] kibana-proxy container OOM killed
Summary: [OCPv3.11] kibana-proxy container OOM killed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: apiserver-auth
Version: 3.11.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.11.z
Assignee: Stefan Schimanski
QA Contact: Anping Li
URL:
Whiteboard:
Depends On: 1759169
Blocks:
 
Reported: 2019-10-01 08:06 UTC by Angelo Gabrieli
Modified: 2023-09-07 20:42 UTC
CC List: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1759169
Environment:
Last Closed: 2020-02-19 19:53:43 UTC
Target Upstream Version:
Embargoed:


Attachments
grafana-proxy mem usage (26.63 KB, image/png), 2019-10-07 12:18 UTC, Standa Laznicka


Links
Red Hat Product Errata RHBA-2020:0402 (last updated 2020-02-19 19:53:57 UTC)

Description Angelo Gabrieli 2019-10-01 08:06:08 UTC
Description of problem:
The kibana-proxy container in the Kibana pods is OOM killed by nothing more than simple HTTP GET requests to the Kibana route or the Kibana service:

Sep 30 11:38:17 node-3 kernel: oauth-proxy invoked oom-killer: gfp_mask=0x50, order=0, oom_score_adj=968
Sep 30 11:38:17 node-3 kernel: oauth-proxy cpuset=docker-a65ce7857b306c3eef1d020828f8705adf9ef84c256d011ad0ea9725c881473c.scope mems_allowed=0
Sep 30 11:38:17 node-3 kernel: CPU: 3 PID: 43442 Comm: oauth-proxy Kdump: loaded Tainted: G        W      ------------ T 3.10.0-957.el7.x86_64 #1
Sep 30 11:38:17 node-3 kernel: Hardware name: Red Hat OpenStack Compute, BIOS 1.11.0-2.el7 04/01/2014
Sep 30 11:38:17 node-3 kernel: Call Trace:
Sep 30 11:38:17 node-3 kernel: [<ffffffff99d61dc1>] dump_stack+0x19/0x1b
Sep 30 11:38:17 node-3 kernel: [<ffffffff99d5c7ea>] dump_header+0x90/0x229
Sep 30 11:38:17 node-3 kernel: [<ffffffff997b9dc6>] ? find_lock_task_mm+0x56/0xc0
Sep 30 11:38:17 node-3 kernel: [<ffffffff997ba274>] oom_kill_process+0x254/0x3d0
Sep 30 11:38:17 node-3 kernel: [<ffffffff99900a2c>] ? selinux_capable+0x1c/0x40
Sep 30 11:38:17 node-3 kernel: [<ffffffff99834f16>] mem_cgroup_oom_synchronize+0x546/0x570
Sep 30 11:38:17 node-3 kernel: [<ffffffff99834390>] ? mem_cgroup_charge_common+0xc0/0xc0
Sep 30 11:38:17 node-3 kernel: [<ffffffff997bab04>] pagefault_out_of_memory+0x14/0x90
Sep 30 11:38:17 node-3 kernel: [<ffffffff99d5acf2>] mm_fault_error+0x6a/0x157
Sep 30 11:38:17 node-3 kernel: [<ffffffff99d6f7a8>] __do_page_fault+0x3c8/0x500
Sep 30 11:38:17 node-3 kernel: [<ffffffff99d6f9c6>] trace_do_page_fault+0x56/0x150
Sep 30 11:38:17 node-3 kernel: [<ffffffff99d6ef42>] do_async_page_fault+0x22/0xf0
Sep 30 11:38:17 node-3 kernel: [<ffffffff99d6b788>] async_page_fault+0x28/0x30
Sep 30 11:38:17 node-3 kernel: Task in /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poddc5d931a_dfa3_11e9_aa40_fa163ea64b17.slice/docker-a65ce7857b306c3eef1d020828f8705adf9ef84c256d011ad0ea9725c881473c.scope killed as a result of limit of /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poddc5d931a_dfa3_11e9_aa40_fa163ea64b17.slice/docker-a65ce7857b306c3eef1d020828f8705adf9ef84c256d011ad0ea9725c881473c.scope
Sep 30 11:38:17 node-3 kernel: memory: usage 524288kB, limit 524288kB, failcnt 1834
Sep 30 11:38:17 node-3 kernel: memory+swap: usage 524288kB, limit 524288kB, failcnt 0
Sep 30 11:38:17 node-3 kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Sep 30 11:38:17 node-3 kernel: Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poddc5d931a_dfa3_11e9_aa40_fa163ea64b17.slice/docker-a65ce7857b306c3eef1d020828f8705adf9ef84c256d011ad0ea9725c881473c.scope: cache:1584KB rss:522704KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:522704KB inactive_file:844KB active_file:740KB unevictable:0KB
Sep 30 11:38:17 node-3 kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
Sep 30 11:38:17 node-3 kernel: [43367] 1000080000 43367   246541   132083     299        0           968 oauth-proxy
Sep 30 11:38:17 node-3 kernel: Memory cgroup out of memory: Kill process 13784 (oauth-proxy) score 1977 or sacrifice child
Sep 30 11:38:17 node-3 kernel: Killed process 43367 (oauth-proxy) total-vm:986164kB, anon-rss:521632kB, file-rss:6700kB, shmem-rss:0kB
Sep 30 11:38:18 node-3 atomic-openshift-node: I0930 11:38:18.444147     691 kubelet.go:1865] SyncLoop (PLEG): "logging-kibana-6-bx8qz_openshift-logging(dc5d931a-dfa3-11e9-aa40-fa163ea64b17)", event: &pleg.PodLifecycleEvent{ID:"dc5d931a-dfa3-11e9-aa40-fa163ea64b17", Type:"ContainerDied", Data:"a65ce7857b306c3eef1d020828f8705adf9ef84c256d011ad0ea9725c881473c"}
Sep 30 11:38:18 node-3 atomic-openshift-node: I0930 11:38:18.853633     691 kuberuntime_manager.go:513] Container {Name:kibana-proxy Image:registry.redhat.io/openshift3/oauth-proxy:v3.11.117 Command:[] Args:[--upstream-ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt --https-address=:3000 -provider=openshift -client-id=kibana-proxy -client-secret-file=/secret/oauth-secret -cookie-secret-file=/secret/session-secret -upstream=http://localhost:5601 -scope=user:info user:check-access user:list-projects --tls-cert=/secret/server-cert --tls-key=/secret/server-key -pass-access-token -skip-provider-button] WorkingDir: Ports:[{Name:oaproxy HostPort:0 ContainerPort:3000 Protocol:TCP HostIP:}] EnvFrom:[] Env:[{Name:OAP_DEBUG Value:true ValueFrom:nil} {Name:OCP_AUTH_PROXY_MEMORY_LIMIT Value: ValueFrom:&EnvVarSource{FieldRef:nil,ResourceFieldRef:&ResourceFieldSelector{ContainerName:kibana-proxy,Resource:limits.memory,Divisor:0,},ConfigMapKeyRef:nil,SecretKeyRef:nil,}}] Resources:{Limits:map[memory:{i:{value:536870912 scale:0} d:{Dec:<nil>} s: Format:BinarySI}] Requests:map[cpu:{i:{value:100 scale:-3} d:{Dec:<nil>} s:100m Format:DecimalSI} memory:{i:{value:268435456 scale:0} d:{Dec:<nil>} s: Format:BinarySI}]} VolumeMounts:[{Name:kibana-proxy ReadOnly:true MountPath:/secret SubPath: MountPropagation:<nil>} {Name:aggregated-logging-kibana-token-pc9zf ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:&SecurityContext{Capabilities:&Capabilities{Add:[],Drop:[KILL MKNOD SETGID SETUID],},Privileged:nil,SELinuxOptions:nil,RunAsUser:*1000080000,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,RunAsGroup:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Sep 30 11:38:18 node-3 atomic-openshift-node: I0930 11:38:18.854051     691 kuberuntime_manager.go:757] checking backoff for container "kibana-proxy" in pod "logging-kibana-6-bx8qz_openshift-logging(dc5d931a-dfa3-11e9-aa40-fa163ea64b17)"




Version-Release number of selected component (if applicable):
OCP v3.11.117; Kibana oauth-proxy image: registry.redhat.io/openshift3/oauth-proxy:v3.11.117


How reproducible:
always: while true; do curl <kibana route or kibana svc>; sleep 1; done
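For reference, a minimal Go equivalent of the curl loop above (an illustrative sketch only; the URL placeholder is kept from the original steps, and TLS verification is skipped on the assumption that the logging route serves the cluster's self-signed certificate):

    package main

    import (
        "crypto/tls"
        "fmt"
        "io"
        "net/http"
        "time"
    )

    func main() {
        // Placeholder kept from the original steps; substitute the real route or service URL.
        kibanaURL := "https://<kibana route or kibana svc>"

        client := &http.Client{
            Transport: &http.Transport{TLSClientConfig: &tls.Config{InsecureSkipVerify: true}},
            Timeout:   10 * time.Second,
        }

        for {
            resp, err := client.Get(kibanaURL)
            if err != nil {
                fmt.Println("request error:", err)
            } else {
                io.Copy(io.Discard, resp.Body) // drain the body so the connection can be reused
                resp.Body.Close()
                fmt.Println("status:", resp.StatusCode)
            }
            time.Sleep(time.Second)
        }
    }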


Steps to Reproduce:
1. while true; do curl <kibana route or kibana svc>; sleep 1; done
2.
3.


Actual results:
The kibana-proxy container is OOM killed after a few minutes; increasing the requests/limits only means it takes longer for the memory to be exhausted.


Expected results:
The kibana-proxy container is not OOM killed, and memory is freed after the requests are served.

Additional info:

Comment 2 Jeff Cantrill 2019-10-02 19:47:04 UTC
Mo,

Is there anyone from your team who can comment? We have not seen memory issues with this component in the past.

Comment 3 Mo 2019-10-02 20:04:55 UTC
Standa can look into it.

We rebuild the TLS cert pool on every login now, and that can easily lead to high memory usage during a burst of logins. Otherwise I am not aware of anything that would cause an OOM.
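To illustrate the pattern described above, here is a minimal Go sketch (not the oauth-proxy source): parsing the upstream CA bundle into a fresh x509.CertPool for every handled login re-reads and re-allocates every certificate per request, which keeps heap usage high inside a memory-limited container. The CA path is taken from the --upstream-ca argument in the kubelet log above; the loop and counters are purely illustrative.

    package main

    import (
        "crypto/x509"
        "fmt"
        "os"
        "runtime"
    )

    // loadCAPoolPerRequest mimics the costly per-login pattern: the CA bundle is
    // read from disk and every certificate is parsed again for each request.
    func loadCAPoolPerRequest(caFile string) (*x509.CertPool, error) {
        pemBytes, err := os.ReadFile(caFile)
        if err != nil {
            return nil, err
        }
        pool := x509.NewCertPool()
        if !pool.AppendCertsFromPEM(pemBytes) {
            return nil, fmt.Errorf("no certificates parsed from %s", caFile)
        }
        return pool, nil
    }

    func main() {
        // Path taken from the --upstream-ca argument in the kubelet log above.
        caFile := "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"

        var m runtime.MemStats
        for i := 0; i < 1000; i++ {
            if _, err := loadCAPoolPerRequest(caFile); err != nil {
                fmt.Println("error:", err)
                return
            }
            if i%100 == 0 {
                runtime.ReadMemStats(&m)
                fmt.Printf("iteration %d: heap in use %d KiB\n", i, m.HeapInuse/1024)
            }
        }
    }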

Comment 4 Standa Laznicka 2019-10-07 12:18:59 UTC
Created attachment 1623124 [details]
grafana-proxy mem usage

Comment 11 Standa Laznicka 2020-01-08 14:50:55 UTC
https://github.com/openshift/oauth-proxy/pull/135 merged today; it includes the needed fixes.
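For context, the general shape of a caching fix, sketched under the assumption that the pool is built once and reused (see the linked PR for the actual change):

    package main

    import (
        "crypto/x509"
        "fmt"
        "os"
        "sync"
    )

    var (
        caPoolOnce sync.Once
        caPool     *x509.CertPool
        caPoolErr  error
    )

    // upstreamCAPool builds the CA pool exactly once and returns the cached copy
    // on every subsequent call, so per-login requests no longer re-parse the bundle.
    func upstreamCAPool(caFile string) (*x509.CertPool, error) {
        caPoolOnce.Do(func() {
            pemBytes, err := os.ReadFile(caFile)
            if err != nil {
                caPoolErr = err
                return
            }
            pool := x509.NewCertPool()
            if !pool.AppendCertsFromPEM(pemBytes) {
                caPoolErr = fmt.Errorf("no certificates parsed from %s", caFile)
                return
            }
            caPool = pool
        })
        return caPool, caPoolErr
    }

    func main() {
        pool, err := upstreamCAPool("/var/run/secrets/kubernetes.io/serviceaccount/ca.crt")
        fmt.Println(pool != nil, err)
    }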

Comment 13 Anping Li 2020-01-20 11:35:36 UTC
Verified using oauth-proxy:v3.11.165

Comment 15 errata-xmlrpc 2020-02-19 19:53:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0402

