Description of problem:

The kibana-proxy container in the kibana pods is OOM-killed by nothing more than simple HTTP GETs against the kibana route or the kibana service:

Sep 30 11:38:17 node-3 kernel: oauth-proxy invoked oom-killer: gfp_mask=0x50, order=0, oom_score_adj=968
Sep 30 11:38:17 node-3 kernel: oauth-proxy cpuset=docker-a65ce7857b306c3eef1d020828f8705adf9ef84c256d011ad0ea9725c881473c.scope mems_allowed=0
Sep 30 11:38:17 node-3 kernel: CPU: 3 PID: 43442 Comm: oauth-proxy Kdump: loaded Tainted: G W ------------ T 3.10.0-957.el7.x86_64 #1
Sep 30 11:38:17 node-3 kernel: Hardware name: Red Hat OpenStack Compute, BIOS 1.11.0-2.el7 04/01/2014
Sep 30 11:38:17 node-3 kernel: Call Trace:
Sep 30 11:38:17 node-3 kernel: [<ffffffff99d61dc1>] dump_stack+0x19/0x1b
Sep 30 11:38:17 node-3 kernel: [<ffffffff99d5c7ea>] dump_header+0x90/0x229
Sep 30 11:38:17 node-3 kernel: [<ffffffff997b9dc6>] ? find_lock_task_mm+0x56/0xc0
Sep 30 11:38:17 node-3 kernel: [<ffffffff997ba274>] oom_kill_process+0x254/0x3d0
Sep 30 11:38:17 node-3 kernel: [<ffffffff99900a2c>] ? selinux_capable+0x1c/0x40
Sep 30 11:38:17 node-3 kernel: [<ffffffff99834f16>] mem_cgroup_oom_synchronize+0x546/0x570
Sep 30 11:38:17 node-3 kernel: [<ffffffff99834390>] ? mem_cgroup_charge_common+0xc0/0xc0
Sep 30 11:38:17 node-3 kernel: [<ffffffff997bab04>] pagefault_out_of_memory+0x14/0x90
Sep 30 11:38:17 node-3 kernel: [<ffffffff99d5acf2>] mm_fault_error+0x6a/0x157
Sep 30 11:38:17 node-3 kernel: [<ffffffff99d6f7a8>] __do_page_fault+0x3c8/0x500
Sep 30 11:38:17 node-3 kernel: [<ffffffff99d6f9c6>] trace_do_page_fault+0x56/0x150
Sep 30 11:38:17 node-3 kernel: [<ffffffff99d6ef42>] do_async_page_fault+0x22/0xf0
Sep 30 11:38:17 node-3 kernel: [<ffffffff99d6b788>] async_page_fault+0x28/0x30
Sep 30 11:38:17 node-3 kernel: Task in /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poddc5d931a_dfa3_11e9_aa40_fa163ea64b17.slice/docker-a65ce7857b306c3eef1d020828f8705adf9ef84c256d011ad0ea9725c881473c.scope killed as a result of limit of /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poddc5d931a_dfa3_11e9_aa40_fa163ea64b17.slice/docker-a65ce7857b306c3eef1d020828f8705adf9ef84c256d011ad0ea9725c881473c.scope
Sep 30 11:38:17 node-3 kernel: memory: usage 524288kB, limit 524288kB, failcnt 1834
Sep 30 11:38:17 node-3 kernel: memory+swap: usage 524288kB, limit 524288kB, failcnt 0
Sep 30 11:38:17 node-3 kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Sep 30 11:38:17 node-3 kernel: Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poddc5d931a_dfa3_11e9_aa40_fa163ea64b17.slice/docker-a65ce7857b306c3eef1d020828f8705adf9ef84c256d011ad0ea9725c881473c.scope: cache:1584KB rss:522704KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:522704KB inactive_file:844KB active_file:740KB unevictable:0KB
Sep 30 11:38:17 node-3 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Sep 30 11:38:17 node-3 kernel: [43367] 1000080000 43367 246541 132083 299 0 968 oauth-proxy
Sep 30 11:38:17 node-3 kernel: Memory cgroup out of memory: Kill process 13784 (oauth-proxy) score 1977 or sacrifice child
Sep 30 11:38:17 node-3 kernel: Killed process 43367 (oauth-proxy) total-vm:986164kB, anon-rss:521632kB, file-rss:6700kB, shmem-rss:0kB
Sep 30 11:38:18 node-3 atomic-openshift-node: I0930 11:38:18.444147 691 kubelet.go:1865] SyncLoop (PLEG): "logging-kibana-6-bx8qz_openshift-logging(dc5d931a-dfa3-11e9-aa40-fa163ea64b17)", event: &pleg.PodLifecycleEvent{ID:"dc5d931a-dfa3-11e9-aa40-fa163ea64b17", Type:"ContainerDied", Data:"a65ce7857b306c3eef1d020828f8705adf9ef84c256d011ad0ea9725c881473c"}
Sep 30 11:38:18 node-3 atomic-openshift-node: I0930 11:38:18.853633 691 kuberuntime_manager.go:513] Container {Name:kibana-proxy Image:registry.redhat.io/openshift3/oauth-proxy:v3.11.117 Command:[] Args:[--upstream-ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt --https-address=:3000 -provider=openshift -client-id=kibana-proxy -client-secret-file=/secret/oauth-secret -cookie-secret-file=/secret/session-secret -upstream=http://localhost:5601 -scope=user:info user:check-access user:list-projects --tls-cert=/secret/server-cert --tls-key=/secret/server-key -pass-access-token -skip-provider-button] WorkingDir: Ports:[{Name:oaproxy HostPort:0 ContainerPort:3000 Protocol:TCP HostIP:}] EnvFrom:[] Env:[{Name:OAP_DEBUG Value:true ValueFrom:nil} {Name:OCP_AUTH_PROXY_MEMORY_LIMIT Value: ValueFrom:&EnvVarSource{FieldRef:nil,ResourceFieldRef:&ResourceFieldSelector{ContainerName:kibana-proxy,Resource:limits.memory,Divisor:0,},ConfigMapKeyRef:nil,SecretKeyRef:nil,}}] Resources:{Limits:map[memory:{i:{value:536870912 scale:0} d:{Dec:<nil>} s: Format:BinarySI}] Requests:map[cpu:{i:{value:100 scale:-3} d:{Dec:<nil>} s:100m Format:DecimalSI} memory:{i:{value:268435456 scale:0} d:{Dec:<nil>} s: Format:BinarySI}]} VolumeMounts:[{Name:kibana-proxy ReadOnly:true MountPath:/secret SubPath: MountPropagation:<nil>} {Name:aggregated-logging-kibana-token-pc9zf ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:&SecurityContext{Capabilities:&Capabilities{Add:[],Drop:[KILL MKNOD SETGID SETUID],},Privileged:nil,SELinuxOptions:nil,RunAsUser:*1000080000,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,RunAsGroup:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Sep 30 11:38:18 node-3 atomic-openshift-node: I0930 11:38:18.854051 691 kuberuntime_manager.go:757] checking backoff for container "kibana-proxy" in pod "logging-kibana-6-bx8qz_openshift-logging(dc5d931a-dfa3-11e9-aa40-fa163ea64b17)"

Version-Release number of selected component (if applicable):
OCP v3.11.117; kibana oauth-proxy image: registry.redhat.io/openshift3/oauth-proxy:v3.11.117

How reproducible:
Always:
while true; do curl <kibana route or kibana svc>; sleep 1; done

Steps to Reproduce:
1. while true; do curl <kibana route or kibana svc>; sleep 1; done

Actual results:
The kibana-proxy container is OOM-killed after a few minutes; increasing the memory requests/limits only delays how long it takes for the RAM to be exhausted.

Expected results:
kibana-proxy is not OOM-killed and the memory is released again.

Additional info:
Mo, is there anyone from your team who can comment? We have not seen memory issues with this component in the past.
Standa can look into it. We now rebuild the TLS cert pool on every login, and that can easily lead to high memory usage during a burst of logins. Otherwise I am not aware of anything that would cause an OOM.
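To make that mechanism concrete, here is a minimal hypothetical Go sketch of the pattern described above. It is not the actual oauth-proxy code; the handler name, CA path, and upstream URL are placeholders. The point is that the CA bundle is re-read and a fresh x509.CertPool and http.Transport are allocated on every login request, so a burst of requests accumulates allocations and idle connections instead of reusing one shared pool:

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"io/ioutil"
	"log"
	"net/http"
)

// handleLogin is a hypothetical illustration of the per-login pattern: every
// request re-reads the CA bundle and builds a new cert pool and transport, so
// nothing is shared or reused across requests.
func handleLogin(w http.ResponseWriter, r *http.Request) {
	caPEM, err := ioutil.ReadFile("/var/run/secrets/kubernetes.io/serviceaccount/ca.crt")
	if err != nil {
		http.Error(w, "cannot read CA bundle", http.StatusInternalServerError)
		return
	}

	pool := x509.NewCertPool()     // new pool allocated per request
	pool.AppendCertsFromPEM(caPEM) // certificates re-parsed per request

	client := &http.Client{
		Transport: &http.Transport{ // new transport (and connection cache) per request
			TLSClientConfig: &tls.Config{RootCAs: pool},
		},
	}

	// Placeholder upstream call; the real proxy talks to the OpenShift API.
	resp, err := client.Get("https://kubernetes.default.svc/apis/user.openshift.io/v1/users/~")
	if err != nil {
		http.Error(w, "upstream request failed", http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()
	w.WriteHeader(resp.StatusCode)
}

func main() {
	http.HandleFunc("/oauth/start", handleLogin)
	log.Fatal(http.ListenAndServe(":3000", nil))
}
```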
Created attachment 1623124 [details] grafana-proxy mem usage
https://github.com/openshift/oauth-proxy/pull/135 merged today; it includes the needed fixes.
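For contrast with the sketch above, and only as an illustration of the general remedy for this class of leak (not a claim about what the PR actually changed), the pool and transport can be parsed and built once at startup and then shared by all handlers:

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"io/ioutil"
	"log"
	"net/http"
)

// newUpstreamClient parses the CA bundle once and returns a client whose
// transport (and its cert pool) is reused by every subsequent request.
func newUpstreamClient(caFile string) (*http.Client, error) {
	caPEM, err := ioutil.ReadFile(caFile)
	if err != nil {
		return nil, err
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		log.Printf("warning: no certificates parsed from %s", caFile)
	}
	return &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{RootCAs: pool},
		},
	}, nil
}

func main() {
	// Build the client once at startup; login handlers capture and reuse it
	// instead of re-reading and re-parsing the CA bundle per request.
	client, err := newUpstreamClient("/var/run/secrets/kubernetes.io/serviceaccount/ca.crt")
	if err != nil {
		log.Fatal(err)
	}
	_ = client // handlers would share this single client
}
```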
Verified using oauth-proxy:v3.11.165
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0402