Bug 1762750

Summary: [4.1] oauth-proxy container OOM killed
Product: OpenShift Container Platform
Component: apiserver-auth
Version: 3.11.0
Target Release: 4.1.z
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Reporter: Standa Laznicka <slaznick>
Assignee: Stefan Schimanski <sttts>
QA Contact: Anping Li <anli>
CC: aos-bugs, jcantril, mfojtik, rmeggins, sttts
Clone Of: 1762748
Bug Depends On: 1762748
Bug Blocks: 1749765
Last Closed: 2020-01-09 09:16:20 UTC

Description Standa Laznicka 2019-10-17 11:47:32 UTC
This bug was initially created as a copy of Bug #1762748


Description of problem:
The kibana-proxy container in the kibana pods gets OOM-killed by nothing more than simple HTTP GETs to the kibana route or the kibana svc:

Sep 30 11:38:17 node-3 kernel: oauth-proxy invoked oom-killer: gfp_mask=0x50, order=0, oom_score_adj=968
Sep 30 11:38:17 node-3 kernel: oauth-proxy cpuset=docker-a65ce7857b306c3eef1d020828f8705adf9ef84c256d011ad0ea9725c881473c.scope mems_allowed=0
Sep 30 11:38:17 node-3 kernel: CPU: 3 PID: 43442 Comm: oauth-proxy Kdump: loaded Tainted: G        W      ------------ T 3.10.0-957.el7.x86_64 #1
Sep 30 11:38:17 node-3 kernel: Hardware name: Red Hat OpenStack Compute, BIOS 1.11.0-2.el7 04/01/2014
Sep 30 11:38:17 node-3 kernel: Call Trace:
Sep 30 11:38:17 node-3 kernel: [<ffffffff99d61dc1>] dump_stack+0x19/0x1b
Sep 30 11:38:17 node-3 kernel: [<ffffffff99d5c7ea>] dump_header+0x90/0x229
Sep 30 11:38:17 node-3 kernel: [<ffffffff997b9dc6>] ? find_lock_task_mm+0x56/0xc0
Sep 30 11:38:17 node-3 kernel: [<ffffffff997ba274>] oom_kill_process+0x254/0x3d0
Sep 30 11:38:17 node-3 kernel: [<ffffffff99900a2c>] ? selinux_capable+0x1c/0x40
Sep 30 11:38:17 node-3 kernel: [<ffffffff99834f16>] mem_cgroup_oom_synchronize+0x546/0x570
Sep 30 11:38:17 node-3 kernel: [<ffffffff99834390>] ? mem_cgroup_charge_common+0xc0/0xc0
Sep 30 11:38:17 node-3 kernel: [<ffffffff997bab04>] pagefault_out_of_memory+0x14/0x90
Sep 30 11:38:17 node-3 kernel: [<ffffffff99d5acf2>] mm_fault_error+0x6a/0x157
Sep 30 11:38:17 node-3 kernel: [<ffffffff99d6f7a8>] __do_page_fault+0x3c8/0x500
Sep 30 11:38:17 node-3 kernel: [<ffffffff99d6f9c6>] trace_do_page_fault+0x56/0x150
Sep 30 11:38:17 node-3 kernel: [<ffffffff99d6ef42>] do_async_page_fault+0x22/0xf0
Sep 30 11:38:17 node-3 kernel: [<ffffffff99d6b788>] async_page_fault+0x28/0x30
Sep 30 11:38:17 node-3 kernel: Task in /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poddc5d931a_dfa3_11e9_aa40_fa163ea64b17.slice/docker-a65ce7857b306c3eef1d020828f8705adf9ef84c256d011ad0ea9725c881473c.scope killed as a result of limit of /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poddc5d931a_dfa3_11e9_aa40_fa163ea64b17.slice/docker-a65ce7857b306c3eef1d020828f8705adf9ef84c256d011ad0ea9725c881473c.scope
Sep 30 11:38:17 node-3 kernel: memory: usage 524288kB, limit 524288kB, failcnt 1834
Sep 30 11:38:17 node-3 kernel: memory+swap: usage 524288kB, limit 524288kB, failcnt 0
Sep 30 11:38:17 node-3 kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Sep 30 11:38:17 node-3 kernel: Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poddc5d931a_dfa3_11e9_aa40_fa163ea64b17.slice/docker-a65ce7857b306c3eef1d020828f8705adf9ef84c256d011ad0ea9725c881473c.scope: cache:1584KB rss:522704KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:522704KB inactive_file:844KB active_file:740KB unevictable:0KB
Sep 30 11:38:17 node-3 kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
Sep 30 11:38:17 node-3 kernel: [43367] 1000080000 43367   246541   132083     299        0           968 oauth-proxy
Sep 30 11:38:17 node-3 kernel: Memory cgroup out of memory: Kill process 13784 (oauth-proxy) score 1977 or sacrifice child
Sep 30 11:38:17 node-3 kernel: Killed process 43367 (oauth-proxy) total-vm:986164kB, anon-rss:521632kB, file-rss:6700kB, shmem-rss:0kB
Sep 30 11:38:18 node-3 atomic-openshift-node: I0930 11:38:18.444147     691 kubelet.go:1865] SyncLoop (PLEG): "logging-kibana-6-bx8qz_openshift-logging(dc5d931a-dfa3-11e9-aa40-fa163ea64b17)", event: &pleg.PodLifecycleEvent{ID:"dc5d931a-dfa3-11e9-aa40-fa163ea64b17", Type:"ContainerDied", Data:"a65ce7857b306c3eef1d020828f8705adf9ef84c256d011ad0ea9725c881473c"}
Sep 30 11:38:18 node-3 atomic-openshift-node: I0930 11:38:18.853633     691 kuberuntime_manager.go:513] Container {Name:kibana-proxy Image:registry.redhat.io/openshift3/oauth-proxy:v3.11.117 Command:[] Args:[--upstream-ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt --https-address=:3000 -provider=openshift -client-id=kibana-proxy -client-secret-file=/secret/oauth-secret -cookie-secret-file=/secret/session-secret -upstream=http://localhost:5601 -scope=user:info user:check-access user:list-projects --tls-cert=/secret/server-cert --tls-key=/secret/server-key -pass-access-token -skip-provider-button] WorkingDir: Ports:[{Name:oaproxy HostPort:0 ContainerPort:3000 Protocol:TCP HostIP:}] EnvFrom:[] Env:[{Name:OAP_DEBUG Value:true ValueFrom:nil} {Name:OCP_AUTH_PROXY_MEMORY_LIMIT Value: ValueFrom:&EnvVarSource{FieldRef:nil,ResourceFieldRef:&ResourceFieldSelector{ContainerName:kibana-proxy,Resource:limits.memory,Divisor:0,},ConfigMapKeyRef:nil,SecretKeyRef:nil,}}] Resources:{Limits:map[memory:{i:{value:536870912 scale:0} d:{Dec:<nil>} s: Format:BinarySI}] Requests:map[cpu:{i:{value:100 scale:-3} d:{Dec:<nil>} s:100m Format:DecimalSI} memory:{i:{value:268435456 scale:0} d:{Dec:<nil>} s: Format:BinarySI}]} VolumeMounts:[{Name:kibana-proxy ReadOnly:true MountPath:/secret SubPath: MountPropagation:<nil>} {Name:aggregated-logging-kibana-token-pc9zf ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:&SecurityContext{Capabilities:&Capabilities{Add:[],Drop:[KILL MKNOD SETGID SETUID],},Privileged:nil,SELinuxOptions:nil,RunAsUser:*1000080000,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,RunAsGroup:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Sep 30 11:38:18 node-3 atomic-openshift-node: I0930 11:38:18.854051     691 kuberuntime_manager.go:757] checking backoff for container "kibana-proxy" in pod "logging-kibana-6-bx8qz_openshift-logging(dc5d931a-dfa3-11e9-aa40-fa163ea64b17)"
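For reference, the kubelet dump above shows kibana-proxy running with a 512Mi memory limit and a 256Mi request, which matches the 524288kB cgroup limit in the kernel log. A minimal sketch of how to read those values back from the live pod spec, assuming the openshift-logging namespace and a component=kibana label (both taken as assumptions from the pod name above):

# Print the resources stanza of the kibana-proxy container (label selector is an assumption)
oc -n openshift-logging get pod -l component=kibana \
  -o jsonpath='{.items[*].spec.containers[?(@.name=="kibana-proxy")].resources}'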




Version-Release number of selected component (if applicable):
OCP v3.11.117; kibana oauth-proxy image: registry.redhat.io/openshift3/oauth-proxy:v3.11.117
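The deployed proxy image can be double-checked on the cluster; a sketch, with the namespace and label selector assumed from the pod name in the logs above:

# List the container images of the kibana pod(s)
oc -n openshift-logging get pod -l component=kibana \
  -o jsonpath='{range .items[*].spec.containers[*]}{.name}{"\t"}{.image}{"\n"}{end}'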


How reproducible:
always: while true; do curl <kibana route or kibana svc>; sleep 1; done


Steps to Reproduce:
1. while true; do curl <kibana route or kibana svc>; sleep 1; done
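A slightly fuller reproducer sketch that drives the route while watching the container's memory climb; the route name, namespace, and label selector are assumptions and may differ per install:

# Resolve the kibana route host (route name is an assumption)
KIBANA_HOST=$(oc -n openshift-logging get route kibana -o jsonpath='{.spec.host}')

# Step 1 from above: hit the route once per second in the background
while true; do curl -sk -o /dev/null "https://${KIBANA_HOST}"; sleep 1; done &

# Watch kibana-proxy memory grow until the OOM kill
watch -n 10 'oc -n openshift-logging adm top pod -l component=kibana --containers'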


Actual results:
The kibana-proxy container is OOM-killed after a few minutes; increasing the requests/limits only delays the point at which the RAM is exhausted.


Expected results:
kibana-proxy is not OOM-killed and its RAM is freed

Additional info:

Comment 1 Standa Laznicka 2019-12-03 11:59:39 UTC
Target release was changed to 3.11.z for some reason, but this will be fixed as part of https://bugzilla.redhat.com/show_bug.cgi?id=1749765, so I am changing it back to 4.1.z.

Comment 3 Anping Li 2019-12-18 10:40:56 UTC
Passed in 4.1.28-201912131227
Wed Dec 18 05:17:41 EST 2019
POD                       NAME           CPU(cores)   MEMORY(bytes)
kibana-7c67899dbc-4zbzn   kibana         6m           97Mi
kibana-7c67899dbc-4zbzn   kibana-proxy   13m          10Mi
Wed Dec 18 05:17:43 EST 2019
POD                       NAME           CPU(cores)   MEMORY(bytes)
kibana-7c67899dbc-4zbzn   kibana-proxy   13m          10Mi
kibana-7c67899dbc-4zbzn   kibana         6m           97Mi
[anli@preserve-docker-slave 74916]$ head top_kibana.log
POD                       NAME           CPU(cores)   MEMORY(bytes)
kibana-7c67899dbc-4zbzn   kibana         9m           91Mi
kibana-7c67899dbc-4zbzn   kibana-proxy   0m           6Mi
Wed Dec 18 04:55:11 EST 2019
POD                       NAME           CPU(cores)   MEMORY(bytes)
kibana-7c67899dbc-4zbzn   kibana         9m           91Mi
kibana-7c67899dbc-4zbzn   kibana-proxy   0m           6Mi
Wed Dec 18 05:39:04 EST 2019
POD                       NAME           CPU(cores)   MEMORY(bytes)   
kibana-7c67899dbc-4zbzn   kibana         7m           99Mi            
kibana-7c67899dbc-4zbzn   kibana-proxy   0m           10Mi
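For context, output in the shape above can be collected with a small sampling loop along these lines (namespace, label selector, and interval are assumptions; the top_kibana.log name matches the file read above):

# Sample per-container CPU/memory for the kibana pod while the curl loop runs
while true; do
  date
  oc -n openshift-logging adm top pod -l component=kibana --containers
  sleep 2
done | tee top_kibana.log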

Comment 5 errata-xmlrpc 2020-01-09 09:16:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0010