Bug 1759169
| Summary: | [4.3] oauth-proxy container OOM killed |
|---|---|
| Product: | OpenShift Container Platform |
| Reporter: | Standa Laznicka <slaznick> |
| Component: | apiserver-auth |
| Assignee: | Standa Laznicka <slaznick> |
| Status: | CLOSED ERRATA |
| QA Contact: | Anping Li <anli> |
| Severity: | unspecified |
| Docs Contact: | |
| Priority: | unspecified |
| Version: | 3.11.0 |
| CC: | anli, aos-bugs, gparente, jcantril, mfojtik, scheng, wsun |
| Target Milestone: | --- |
| Target Release: | 4.3.0 |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Whiteboard: | |
| Fixed In Version: | |
| Doc Type: | Bug Fix |
| Doc Text: | Cause: oauth-proxy reloaded the CA certificates for each request and stored them in memory. Consequence: the resulting high memory consumption caused the oauth-proxy container to be OOM killed. Fix: cache the CA certificates and reload them only when they change. Result: memory consumption of the oauth-proxy process drops significantly when multiple requests are issued against it. (A caching sketch follows this table.) |
| Story Points: | --- |
| Clone Of: | 1757314 |
| | 1762748 (view as bug list) |
| Environment: | |
| Last Closed: | 2020-05-13 21:27:06 UTC |
| Type: | --- |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| CRM: | |
| Verified Versions: | |
| Category: | --- |
| oVirt Team: | --- |
| RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- |
| Target Upstream Version: | |
| Embargoed: | |
| Bug Depends On: | |
| Bug Blocks: | 1757314, 1762748 |
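The Doc Text above describes the fix as caching the parsed CA certificates instead of re-reading them for every request. Below is a minimal sketch of that caching pattern in Go; the file path, mtime-based invalidation, and the `caCache` type and method names are illustrative assumptions, not the actual oauth-proxy code.

```go
package main

import (
	"crypto/x509"
	"fmt"
	"os"
	"sync"
	"time"
)

// caCache keeps a parsed CA pool in memory and only re-reads the PEM file
// when its modification time changes, instead of loading it per request.
type caCache struct {
	mu      sync.Mutex
	path    string
	modTime time.Time
	pool    *x509.CertPool
}

// Pool returns the cached *x509.CertPool, reloading the file from disk
// only if it has changed since the last load.
func (c *caCache) Pool() (*x509.CertPool, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	info, err := os.Stat(c.path)
	if err != nil {
		return nil, err
	}
	if c.pool != nil && info.ModTime().Equal(c.modTime) {
		return c.pool, nil // unchanged: reuse the already parsed pool
	}

	pem, err := os.ReadFile(c.path)
	if err != nil {
		return nil, err
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(pem) {
		return nil, fmt.Errorf("no certificates found in %s", c.path)
	}
	c.pool = pool
	c.modTime = info.ModTime()
	return pool, nil
}

func main() {
	cache := &caCache{path: "/etc/pki/ca-trust/ca-bundle.pem"} // illustrative path
	for i := 0; i < 3; i++ {
		if _, err := cache.Pool(); err != nil {
			fmt.Println("load error:", err)
			return
		}
		fmt.Println("CA pool served from cache unless the file changed")
	}
}
```

The actual change in oauth-proxy may differ in detail; the point is that the parsed pool is reused across requests, so per-request certificate parsing and the memory growth it caused go away.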
Description
Standa Laznicka
2019-10-07 14:20:22 UTC
Anping Li: The memory increased, could you check it? The pod kibana-598df8b964-gq5lm stayed 2/2 Running with 0 restarts (age ~10h) throughout; the `oc adm top pod` samples were:

| Time (Fri Nov 8 2019, CST) | CPU (cores) | Memory (bytes) |
|---|---|---|
| 18:48:23 | 140m | 492Mi |
| 18:48:49 | 140m | 540Mi |
| 18:49:15 | 136m | 576Mi |
| 18:49:24 | 136m | 576Mi |
| 18:49:51 | 137m | 628Mi |
| 18:50:17 | 137m | 673Mi |
| 18:50:45 | 137m | 717Mi |
| 18:50:55 | 137m | 717Mi (a second sample at this time read 129m / 717Mi) |
| 18:51:21 | 141m | 148Mi |
| 18:51:30 | 141m | 148Mi |
| 18:51:39 | 131m | 148Mi |

Standa Laznicka (comment #3): Could you please answer these questions so that I can understand what happened: When did the memory increase? When you ran the reproducer? Is the first report already from when the reproducer was running? Which cluster version did you use? The bug was about a permanent memory increase with each request, but there is an obvious memory usage drop between the reports from "Fri Nov 8 18:50:55 CST 2019" and "Fri Nov 8 18:51:21 CST 2019", which I assume is when you stopped the reproducer, which would therefore mean that the bug is fixed?

Anping Li: Yes, the memory increased when I ran the reproducer.

Anping Li: (In reply to Standa Laznicka from comment #3)
> The memory increased when? When you ran the reproducer?

Yes, when I ran the reproducer.

> Is the first report already from when the reproducer ran? Which version of cluster did you use?

It is v4.3.

> The bug was about permanent memory increase with each request but there is an obvious memory usage drop in the reports from "Fri Nov 8 18:50:55 CST 2019" and "Fri Nov 8 18:51:21 CST 2019", which I assume is when you stopped the reproducer, which would therefore mean that the bug was fixed?

When the memory usage dropped, the reproducer was still running, so there may be some improvement. But it does not seem to be enough, as 717Mi is close to the memory limit of 760Mi.
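For context, the reproducer discussed above amounts to issuing a burst of requests against the oauth-proxy endpoint while watching the container's memory. A minimal sketch of such a load generator in Go follows; the route URL, request count, and concurrency are illustrative assumptions, not the exact reproducer used in this bug.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"net/http"
	"sync"
)

func main() {
	// Illustrative values: point this at the oauth-proxy protected route.
	const target = "https://kibana-openshift-logging.apps.example.com/" // hypothetical route
	const total = 2000                                                  // burst of ~2k requests
	const workers = 50

	// Test clusters commonly use self-signed router certs, so skip
	// verification in this throwaway load generator only.
	client := &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
		},
	}

	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range jobs {
				resp, err := client.Get(target)
				if err != nil {
					continue
				}
				io.Copy(io.Discard, resp.Body) // drain so connections can be reused
				resp.Body.Close()
			}
		}()
	}
	for i := 0; i < total; i++ {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	fmt.Println("done: issued", total, "requests")
}
```

While the burst runs, container memory can be sampled in a loop with `oc adm top pod` and `oc get pod`, which is how the readings above were collected.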
Anping Li: I had observed the pod being restarted.

Standa Laznicka: Anping, can you also specify the version of the oauth-proxy image you tested against? I tested the oauth-proxy that gets distributed with the 4.3 cluster and has the fix. I hit the container with some 2k requests in 10 seconds and did not see any unreasonable memory consumption increase, if any at all. If you are indeed testing the fixed oauth-proxy version, can you add the --containers flag to `oc adm top` to see that it is actually the proxy which consumes the resources?

Anping Li: Verified in v4.3.0-201911080552. The kibana pod wasn't restarted in 4 hours.

```
[43]$ head check.logs
Already on project "openshift-logging" on server "https://api.qe-anlia3.qe.gcp.devcluster.openshift.com:6443".
Tue Nov 12 10:01:46 EST 2019
NAME                      READY   STATUS    RESTARTS   AGE
kibana-766bf67db5-rcqj6   2/2     Running   0          8m8s
POD                       NAME           CPU(cores)   MEMORY(bytes)
kibana-766bf67db5-rcqj6   kibana         8m           101Mi
kibana-766bf67db5-rcqj6   kibana-proxy   0m           11Mi
Tue Nov 12 10:01:52 EST 2019
NAME                      READY   STATUS    RESTARTS   AGE
kibana-766bf67db5-rcqj6   2/2     Running   0          8m14s

[anli@preserve-docker-slave 43]$ tail check.logs
kibana-766bf67db5-rcqj6   2/2     Running   0          16h
POD                       NAME           CPU(cores)   MEMORY(bytes)
kibana-766bf67db5-rcqj6   kibana         10m          95Mi
kibana-766bf67db5-rcqj6   kibana-proxy   24m          12Mi
Wed Nov 13 02:07:52 EST 2019
NAME                      READY   STATUS    RESTARTS   AGE
kibana-766bf67db5-rcqj6   2/2     Running   0          16h
POD                       NAME           CPU(cores)   MEMORY(bytes)
kibana-766bf67db5-rcqj6   kibana-proxy   24m          12Mi
kibana-766bf67db5-rcqj6   kibana         10m          95Mi
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062