Description of problem:

The kibana-proxy container in the kibana pods is OOM-killed by nothing more than simple HTTP GETs against the kibana route or the kibana service:

Sep 30 11:38:17 node-3 kernel: oauth-proxy invoked oom-killer: gfp_mask=0x50, order=0, oom_score_adj=968
Sep 30 11:38:17 node-3 kernel: oauth-proxy cpuset=docker-a65ce7857b306c3eef1d020828f8705adf9ef84c256d011ad0ea9725c881473c.scope mems_allowed=0
Sep 30 11:38:17 node-3 kernel: CPU: 3 PID: 43442 Comm: oauth-proxy Kdump: loaded Tainted: G W ------------ T 3.10.0-957.el7.x86_64 #1
Sep 30 11:38:17 node-3 kernel: Hardware name: Red Hat OpenStack Compute, BIOS 1.11.0-2.el7 04/01/2014
Sep 30 11:38:17 node-3 kernel: Call Trace:
Sep 30 11:38:17 node-3 kernel: [<ffffffff99d61dc1>] dump_stack+0x19/0x1b
Sep 30 11:38:17 node-3 kernel: [<ffffffff99d5c7ea>] dump_header+0x90/0x229
Sep 30 11:38:17 node-3 kernel: [<ffffffff997b9dc6>] ? find_lock_task_mm+0x56/0xc0
Sep 30 11:38:17 node-3 kernel: [<ffffffff997ba274>] oom_kill_process+0x254/0x3d0
Sep 30 11:38:17 node-3 kernel: [<ffffffff99900a2c>] ? selinux_capable+0x1c/0x40
Sep 30 11:38:17 node-3 kernel: [<ffffffff99834f16>] mem_cgroup_oom_synchronize+0x546/0x570
Sep 30 11:38:17 node-3 kernel: [<ffffffff99834390>] ? mem_cgroup_charge_common+0xc0/0xc0
Sep 30 11:38:17 node-3 kernel: [<ffffffff997bab04>] pagefault_out_of_memory+0x14/0x90
Sep 30 11:38:17 node-3 kernel: [<ffffffff99d5acf2>] mm_fault_error+0x6a/0x157
Sep 30 11:38:17 node-3 kernel: [<ffffffff99d6f7a8>] __do_page_fault+0x3c8/0x500
Sep 30 11:38:17 node-3 kernel: [<ffffffff99d6f9c6>] trace_do_page_fault+0x56/0x150
Sep 30 11:38:17 node-3 kernel: [<ffffffff99d6ef42>] do_async_page_fault+0x22/0xf0
Sep 30 11:38:17 node-3 kernel: [<ffffffff99d6b788>] async_page_fault+0x28/0x30
Sep 30 11:38:17 node-3 kernel: Task in /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poddc5d931a_dfa3_11e9_aa40_fa163ea64b17.slice/docker-a65ce7857b306c3eef1d020828f8705adf9ef84c256d011ad0ea9725c881473c.scope killed as a result of limit of /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poddc5d931a_dfa3_11e9_aa40_fa163ea64b17.slice/docker-a65ce7857b306c3eef1d020828f8705adf9ef84c256d011ad0ea9725c881473c.scope
Sep 30 11:38:17 node-3 kernel: memory: usage 524288kB, limit 524288kB, failcnt 1834
Sep 30 11:38:17 node-3 kernel: memory+swap: usage 524288kB, limit 524288kB, failcnt 0
Sep 30 11:38:17 node-3 kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Sep 30 11:38:17 node-3 kernel: Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poddc5d931a_dfa3_11e9_aa40_fa163ea64b17.slice/docker-a65ce7857b306c3eef1d020828f8705adf9ef84c256d011ad0ea9725c881473c.scope: cache:1584KB rss:522704KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:522704KB inactive_file:844KB active_file:740KB unevictable:0KB
Sep 30 11:38:17 node-3 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Sep 30 11:38:17 node-3 kernel: [43367] 1000080000 43367 246541 132083 299 0 968 oauth-proxy
Sep 30 11:38:17 node-3 kernel: Memory cgroup out of memory: Kill process 13784 (oauth-proxy) score 1977 or sacrifice child
Sep 30 11:38:17 node-3 kernel: Killed process 43367 (oauth-proxy) total-vm:986164kB, anon-rss:521632kB, file-rss:6700kB, shmem-rss:0kB
Sep 30 11:38:18 node-3 atomic-openshift-node: I0930 11:38:18.444147 691 kubelet.go:1865] SyncLoop (PLEG): "logging-kibana-6-bx8qz_openshift-logging(dc5d931a-dfa3-11e9-aa40-fa163ea64b17)", event: &pleg.PodLifecycleEvent{ID:"dc5d931a-dfa3-11e9-aa40-fa163ea64b17", Type:"ContainerDied", Data:"a65ce7857b306c3eef1d020828f8705adf9ef84c256d011ad0ea9725c881473c"}
Sep 30 11:38:18 node-3 atomic-openshift-node: I0930 11:38:18.853633 691 kuberuntime_manager.go:513] Container {Name:kibana-proxy Image:registry.redhat.io/openshift3/oauth-proxy:v3.11.117 Command:[] Args:[--upstream-ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt --https-address=:3000 -provider=openshift -client-id=kibana-proxy -client-secret-file=/secret/oauth-secret -cookie-secret-file=/secret/session-secret -upstream=http://localhost:5601 -scope=user:info user:check-access user:list-projects --tls-cert=/secret/server-cert --tls-key=/secret/server-key -pass-access-token -skip-provider-button] WorkingDir: Ports:[{Name:oaproxy HostPort:0 ContainerPort:3000 Protocol:TCP HostIP:}] EnvFrom:[] Env:[{Name:OAP_DEBUG Value:true ValueFrom:nil} {Name:OCP_AUTH_PROXY_MEMORY_LIMIT Value: ValueFrom:&EnvVarSource{FieldRef:nil,ResourceFieldRef:&ResourceFieldSelector{ContainerName:kibana-proxy,Resource:limits.memory,Divisor:0,},ConfigMapKeyRef:nil,SecretKeyRef:nil,}}] Resources:{Limits:map[memory:{i:{value:536870912 scale:0} d:{Dec:<nil>} s: Format:BinarySI}] Requests:map[cpu:{i:{value:100 scale:-3} d:{Dec:<nil>} s:100m Format:DecimalSI} memory:{i:{value:268435456 scale:0} d:{Dec:<nil>} s: Format:BinarySI}]} VolumeMounts:[{Name:kibana-proxy ReadOnly:true MountPath:/secret SubPath: MountPropagation:<nil>} {Name:aggregated-logging-kibana-token-pc9zf ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:&SecurityContext{Capabilities:&Capabilities{Add:[],Drop:[KILL MKNOD SETGID SETUID],},Privileged:nil,SELinuxOptions:nil,RunAsUser:*1000080000,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,RunAsGroup:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Sep 30 11:38:18 node-3 atomic-openshift-node: I0930 11:38:18.854051 691 kuberuntime_manager.go:757] checking backoff for container "kibana-proxy" in pod "logging-kibana-6-bx8qz_openshift-logging(dc5d931a-dfa3-11e9-aa40-fa163ea64b17)"

Version-Release number of selected component (if applicable):
OCP v3.11.117; kibana oauth-proxy image: registry.redhat.io/openshift3/oauth-proxy:v3.11.117

How reproducible:
Always:
while true; do curl <kibana route or kibana svc>; sleep 1; done

Steps to Reproduce:
1. while true; do curl <kibana route or kibana svc>; sleep 1; done

Actual results:
The kibana-proxy container is OOM-killed after a few minutes; increasing the memory requests/limits only delays how long it takes for the RAM to be exhausted.

Expected results:
kibana-proxy is not OOM-killed and the memory is released again.

Additional info:
Mo, is there anyone from your team who can comment? We have not seen memory issues with this component in the past.
Standa can look into it. We now rebuild the TLS cert pool on every login, and that can easily lead to high memory usage during a burst of logins. Otherwise I am not aware of anything that would cause an OOM.
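To make that mechanism concrete, here is a minimal hypothetical Go sketch of the pattern described above. It is not the actual oauth-proxy code; the handler name, CA path, and upstream URL are placeholders. The point is that the CA bundle is re-read and a fresh x509.CertPool and http.Transport are allocated on every login request, so a burst of requests accumulates allocations and idle connections instead of reusing one shared pool:

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"io/ioutil"
	"log"
	"net/http"
)

// handleLogin is a hypothetical illustration of the per-login pattern: every
// request re-reads the CA bundle and builds a new cert pool and transport, so
// nothing is shared or reused across requests.
func handleLogin(w http.ResponseWriter, r *http.Request) {
	caPEM, err := ioutil.ReadFile("/var/run/secrets/kubernetes.io/serviceaccount/ca.crt")
	if err != nil {
		http.Error(w, "cannot read CA bundle", http.StatusInternalServerError)
		return
	}

	pool := x509.NewCertPool()     // new pool allocated per request
	pool.AppendCertsFromPEM(caPEM) // certificates re-parsed per request

	client := &http.Client{
		Transport: &http.Transport{ // new transport (and connection cache) per request
			TLSClientConfig: &tls.Config{RootCAs: pool},
		},
	}

	// Placeholder upstream call; the real proxy talks to the OpenShift API.
	resp, err := client.Get("https://kubernetes.default.svc/apis/user.openshift.io/v1/users/~")
	if err != nil {
		http.Error(w, "upstream request failed", http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()
	w.WriteHeader(resp.StatusCode)
}

func main() {
	http.HandleFunc("/oauth/start", handleLogin)
	log.Fatal(http.ListenAndServe(":3000", nil))
}
```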
Created attachment 1623124 [details] grafana-proxy mem usage
https://github.com/openshift/oauth-proxy/pull/135 merged today; it includes the needed fixes.
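For contrast with the sketch above, and only as an illustration of the general remedy for this class of leak (not a claim about what the PR actually changed), the pool and transport can be parsed and built once at startup and then shared by all handlers:

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"io/ioutil"
	"log"
	"net/http"
)

// newUpstreamClient parses the CA bundle once and returns a client whose
// transport (and its cert pool) is reused by every subsequent request.
func newUpstreamClient(caFile string) (*http.Client, error) {
	caPEM, err := ioutil.ReadFile(caFile)
	if err != nil {
		return nil, err
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		log.Printf("warning: no certificates parsed from %s", caFile)
	}
	return &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{RootCAs: pool},
		},
	}, nil
}

func main() {
	// Build the client once at startup; login handlers capture and reuse it
	// instead of re-reading and re-parsing the CA bundle per request.
	client, err := newUpstreamClient("/var/run/secrets/kubernetes.io/serviceaccount/ca.crt")
	if err != nil {
		log.Fatal(err)
	}
	_ = client // handlers would share this single client
}
```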
Verified using oauth-proxy:v3.11.165
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0402