Description of problem:

We have a build defined in our CI process. On a 4.2 OCP cluster it consumes much more memory than it did on the 3.11 cluster.

Version-Release number of selected component (if applicable):

$ oc get clusterversion --kubeconfig ./build01.config
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-10-11-230724   True        False         61d     Cluster version is 4.2.0-0.nightly-2019-10-11-230724

How reproducible:
Always.

Steps to Reproduce:
1. Run a CI rehearsal of the master build on the OCP 4.2 cluster. See an example: https://github.com/openshift/release/pull/6331
2. Tune the memory limit for the artifacts job, starting from 1G (inherited from the 3.11 cluster) up to 16G.

Actual results:
Still failing even with 16G as the memory limit.

Expected results:
The build succeeds with limits comparable to the 3.11 cluster.

Additional info:
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_release/6331/rehearse-6331-pull-ci-openshift-origin-master-images/16
Will upload more logs later.
Some output while tuning the limits:

1G ...
$ oc --kubeconfig build01.config get pod -n ci-op-hkw72spt artifacts-build
NAME              READY   STATUS       RESTARTS   AGE
artifacts-build   0/1     Init:Error   0          58m

4G ...
$ oc --kubeconfig build01.config -n ci-op-hxnb16tj get pod
NAME              READY   STATUS           RESTARTS   AGE
artifacts-build   0/1     Init:OOMKilled   0          73s

8G ...
$ oc --kubeconfig build01.config -n ci-op-nbir7n6t get pod
NAME              READY   STATUS   RESTARTS   AGE
artifacts-build   0/1     Error    0          8m22s
The most interesting failure mode to me is this one, which happens at 8G+ limits:

error: build error: Image used by e53c8abb5114ef4ad8fd485b...3f144c0c9529f5a70f0dcbc7c94: image is in use by a container

We should get the buildah team's opinion on what could cause that. We will probably also need to engage them to profile the memory usage, since it's almost certainly all occurring within the buildah execution (to understand why the builds are consuming upwards of 8 GB of memory or more in 4.x). My guess is some sort of runaway recursive activity or bad file processing.
full buildah output for the "image in use" error: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_release/6331/rehearse-6331-pull-ci-openshift-origin-master-images/13/build-log.txt
Moving to active development branch (4.4) for investigation.
If this happens in 4.2 but not 4.3, then I think that points to needing https://github.com/containers/storage/pull/435, which was first included in storage v1.13.4. The 4.2 branch currently uses v1.13.2, which predates that fix. The 4.3 branch uses a version between v1.15.2 and v1.15.3 which includes that fix.
After applying the workaround (foo=bar):

2G
$ oc --kubeconfig ~/.kube/build01.config -n ci-op-571cm788 get pod artifacts-build
NAME              READY   STATUS           RESTARTS   AGE
artifacts-build   0/1     Init:OOMKilled   0          113s

4G
Working.

It seems that memory usage on 4.2 is still worse than on 3.11; see the Prometheus screenshot: https://user-images.githubusercontent.com/4013349/71142154-8bcdc580-21e4-11ea-8f69-12aa237a7cc4.png But the workaround definitely unblocks me.
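For reference, this is roughly what the workaround amounts to at the BuildConfig level. A minimal sketch, assuming a docker-strategy build; the actual change is carried in ci-tools, and the name/value pair is arbitrary:

spec:
  strategy:
    dockerStrategy:
      env:
      - name: foo    # arbitrary variable; only its presence matters for the workaround
        value: bar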
@Nalin assigning this to you. Based on the state from last month:

1. Adding an arbitrary environment variable to the BuildConfig's build strategy eliminated "image in use" errors and reduced the memory consumed. Is this a separate bug we need to investigate?
2. This may have been fixed in 4.3 - you suggested we need to upgrade the version of containers/storage to 1.13.4. Is this correct?
$ oc get clusterversions.config.openshift.io --context build01
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.0     True        False         4d7h    Cluster version is 4.3.0

https://user-images.githubusercontent.com/4013349/73690596-b42f5780-469e-11ea-8505-ebdb72113370.png
https://user-images.githubusercontent.com/4013349/73690613-c0b3b000-469e-11ea-9603-2fe874f0ad8a.png

With this change, it works well now: https://github.com/openshift/release/commit/e3a07db74c9a0bd03302e3eaaa4d87ad1ed552ca

Note that we still need to change the values in `request` compared to the 3.11 cluster, but the numbers are much better than on 4.2.0.
Comparison:

On 3.11: https://prometheus-k8s-openshift-monitoring.svc.ci.openshift.org/graph?g0.range_input=12h&g0.expr=sum(max_over_time(container_memory_rss%7Bpod_name%3D%22artifacts-build%22%7D%5B3h%5D))%20by%20(namespace%2C%20pod)&g0.tab=0
Usual usage is at ~80M.

On 4.3: https://prometheus-k8s-openshift-monitoring.apps.build01.ci.devcluster.openshift.com/graph?g0.range_input=6h&g0.expr=sum(max_over_time(container_memory_rss%7Bpod%3D%22artifacts-build%22%7D%5B3h%5D))%20by%20(namespace%2C%20pod)&g0.tab=0
Usual usage is ~500M.

This is a critical blocker for using OpenShift 4.x for the CI build farm. Could we get the priority of this to reflect that, please?
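Decoded, the queries behind those links are (note the label rename from pod_name on 3.11 to pod on 4.3):

# 3.11
sum(max_over_time(container_memory_rss{pod_name="artifacts-build"}[3h])) by (namespace, pod)

# 4.3
sum(max_over_time(container_memory_rss{pod="artifacts-build"}[3h])) by (namespace, pod)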
Created attachment 1657633 [details] 3.11 memory usage
Created attachment 1657634 [details] 4.3 memory usage
Also, to be clear, we are seeing this WITH the env-var workaround: https://github.com/openshift/ci-tools/pull/337/files
Split out the env-var workaround into https://bugzilla.redhat.com/show_bug.cgi?id=1798192 - this shouldn't be necessary.
Thanks @Steve for providing the memory graphs - these provided a key insight into how builds work in OCP 4 vs. 3.11.

In OCP 3.x, builds run in a Docker container that is completely unmanaged by Kubernetes. Build pods mount the Docker socket from the host and essentially run a "remote Docker" build. As a result these workloads are effectively invisible to the cluster - I don't think we had any means of reporting "total build resource consumption" in real time. The observed memory usage represents the build container sending commands to the Docker socket and awaiting responses. I find it hard to believe real OCP image builds would consume less than 80 MB of memory.

In OCP 4.x, builds run completely within the build container via buildah. The resources consumed represent the total consumed by buildah and the "sub-containers" that are created during the build process. Building OCP 4 projects could easily consume 400+ MB, and the metrics reported in Kubernetes are a better representation of what is consumed.

TL;DR: build containers in OCP 4.x will report more memory and CPU usage - by 1-2 orders of magnitude depending on the workload. However, this will represent the true usage of the build workloads.
I think everything Adam said is accurate, but I do want to note that even on 3.x we ran the docker build container under the same cgroup as the build pod, so the build was constrained by whatever resource limits were applied to the pod (and thus would have run into OOM kills if it exceeded whatever limit was configured on the build in 3.x). So if 3.x builds were able to run with a 100 MB limit (were they? I haven't seen what the build resource limit was configured to in 3.x), then indeed the builds were running within that memory limit. The only thing that wouldn't be accounted for in 3.x is the memory usage of the docker daemon itself (the logic needed to manage the container and commit/push the image).
Actually, I take that back. Since the pod cgroup was used in 3.x, the 3.x Prometheus data should reflect the full usage just like 4.x (though the query required may not be the query Steve used). So I think further investigation of docker build vs. buildah memory usage is warranted here.
Hi Ben,

This is the current limits/requests setup of openshift/origin master for 3.11: https://github.com/openshift/release/blob/ce3af828b5b6c1057accd6aafa8944fb5962065a/ci-operator/config/openshift/origin/openshift-origin-master.yaml#L111-L116

We hit the problem when we tried to run the same build with the same limits/requests, and then started tuning the numbers to make the build work.
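For context, the resources stanza in a ci-operator config has roughly this shape; the key names and values below are illustrative placeholders, not the actual numbers from the linked file:

resources:
  '*':
    requests:
      cpu: 100m
      memory: 200Mi
  artifacts:
    requests:
      cpu: 100m
      memory: 1Gi
    limits:
      memory: 1Gi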
> In OCP 3.x, builds run in a Docker container that is completely unmanaged by Kubernetes. Build pods mount the Docker socket from the host and essentially run a "remote Docker" build. As a result these workloads are effectively invisible to the cluster - I don't think we had any means of reporting "total build resource consumption" in real time. The observed memory usage represents the build container sending commands to the Docker socket and awaiting responses. I find it hard to believe real OCP image builds would consume less than 80 MB of memory.

This is incorrect: 3.11 clusters correctly attributed the child build to the pod cgroup. However, the naive Prometheus queries won't account for that under the container; you have to check the pod. You have to look at memory where {container_name=""}, not sum({container_name!=""}).
Here is the correct query for 3.11 to tell you max memory usage: https://prometheus-k8s-openshift-monitoring.svc.ci.openshift.org/graph?g0.range_input=1h&g0.expr=container_memory_max_usage_bytes%7Bcontainer%3D%22%22%2Cpod_name%3D~%22.*-build%22%7D&g0.tab=0

The console build uses 11G.
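Decoded, that expression is:

container_memory_max_usage_bytes{container="",pod_name=~".*-build"}

i.e. the pod-level cgroup entry (empty container label) for all *-build pods, which is what captures the child build's usage on 3.11.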
Looks like there's a 2-3 GB spike at the end of the 4.3 build (probably during the push) that is extremely suspicious. Just looking at the spikes, I would say it looks like buildah is reading the entire image into memory or something during a push, which would obviously not be good. That is likely to trigger an OOM right at the end, and should not be happening.
@Hongkai can you please attach the build YAML that is generated for the `artifacts-build` build? We need a clear apples-to-apples comparison to determine if memory consumption is in fact increasing with buildah, and if so, where in the build process it is happening.
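Something like this should capture it, assuming the Build object is named artifacts-build in the rehearsal's ci-op-* namespace:

$ oc --kubeconfig build01.config -n <ci-op-namespace> get build artifacts-build -o yaml > artifacts-build.yaml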
Created attachment 1658048 [details] 4.3 prometheus screen shot
Adam, I will bring the YAML tomorrow.
Buildah improvements landed in https://github.com/openshift/builder/pull/126 - we need to verify whether these move us closer to 3.11 parity with respect to memory consumption. Running a build of the openshift-tests image in origin is a good benchmark test case.
Clayton mentioned that the DPTP team saw a major regression in memory usage from 4.2 to 4.3 (they upgraded their cluster from 4.2 to 4.3 and it started using more memory). So while this bug was originally filed against 4.2, it sounds like there may be additional issues in 4.3.

Secondly, when we resolve this, please ensure that an e2e test is added that exercises our memory usage during builds so we don't regress this in the future. A simple test would be to create a build with a memory limit that's just enough to allow the build to succeed (see the sketch below); if that test ever starts failing, we know we've increased our memory usage unexpectedly.
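A minimal sketch of such a guard-rail build, assuming a docker-strategy build; the name, source repo, and the 1Gi limit are placeholders that would need to be tuned to just above today's known-good peak:

apiVersion: build.openshift.io/v1
kind: BuildConfig
metadata:
  name: build-memory-regression-check   # hypothetical name
spec:
  source:
    git:
      uri: https://github.com/openshift/cluster-samples-operator   # illustrative repo
  strategy:
    dockerStrategy: {}
  resources:
    limits:
      memory: 1Gi   # just above the current peak; an OOMKilled build flags a memory regression
  output:
    to:
      kind: ImageStreamTag
      name: build-memory-regression-check:latest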
Created attachment 1662268 [details]
samples-operator-builds

Interesting results using the Samples Operator as a test build repo [1]. When run without resource limits, the build pod consumed ~5.5Gi of RAM. With 4Gi resource limits (identical to what is in openshift/release [2]), the memory limit is respected and build duration was nearly identical.

[1] https://github.com/openshift/cluster-samples-operator
[2] https://github.com/openshift/release/blob/590adedec8ea113740918590a6c295fb3a0c5a8c/ci-operator/config/openshift/cluster-samples-operator/openshift-cluster-samples-operator-master.yaml
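One simple way to watch the build pod's memory while reproducing such a run (assuming the cluster monitoring/metrics stack is available; the namespace and pod name are placeholders):

$ oc -n <build-namespace> adm top pod <build-pod-name>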
Reassigning to Nalin, as he has been looking at this; the move to the Containers component auto-assigned Mrunal.
so we think at least some fixes have been delivered (in 4.3?) for this issue, right?
I'm going to add Nalin as a needinfo for Ben's query: https://bugzilla.redhat.com/show_bug.cgi?id=1784163#c36. I'm not sure Mrunal will have the answer.
(In reply to Ben Parees from comment #36)
> so we think at least some fixes have been delivered (in 4.3?) for this
> issue, right?

Compared to 4.2, 4.3 includes https://github.com/containers/storage/pull/435, which definitely affected memory usage, so yes.
So can we close this out? It's not clear whether there are still issues being seen in 4.3+.
If it's enough to fix the out-of-memory part in 4.3 and later, I think we can close this. If we need to backport the change to 4.2, then we can attach this one to https://github.com/openshift/builder/pull/125.
I don't think it's worth backporting to 4.2 at this point.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days