Description of problem:
Xmx and Xms are not part of MAVEN_OPTS when a memory limit is included in the build configuration. This is a regression against previous releases.

Version-Release number of selected component (if applicable):
4.3.0-0.nightly-2019-11-11-115927 (https://projects.engineering.redhat.com/browse/LPINTEROP-680)

How reproducible:
Always

Steps to Reproduce:

$ oc new-app --name=sample-java quay.io/wildfly/wildfly-centos7:18.0~https://github.com/openshiftdemos/os-sample-java-web.git

# wait until build 1 is completed
$ oc get pods/sample-java-1-build -w
NAME                  READY   STATUS
sample-java-1-build   0/1     Completed

$ oc logs pod/sample-java-1-build | grep MAVEN_OPTS
INFO Using MAVEN_OPTS -XX:+UseParallelOldGC -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=20 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -XX:MaxMetaspaceSize=100m -XX:+ExitOnOutOfMemoryError

# update the memory limits for the build config
$ oc edit buildconfig.build.openshift.io/sample-java
< resources: {}
---
> resources:
>   limits:
>     memory: 1Gi

$ oc start-build buildconfig.build.openshift.io/sample-java
build.build.openshift.io/sample-java-2 started

# wait until build 2 is completed
$ oc get pods/sample-java-2-build -w
NAME                  READY   STATUS
sample-java-2-build   0/1     Completed

Actual results:

# heap size setup is missing
$ oc logs pod/sample-java-2-build | grep MAVEN_OPTS
INFO Using MAVEN_OPTS -XX:+UseParallelOldGC -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=20 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -XX:MaxMetaspaceSize=100m -XX:+ExitOnOutOfMemoryError

Expected results:

# initial and maximum heap size are part of the options for maven
$ oc logs pod/sample-java-2-build | grep MAVEN_OPTS
INFO Using MAVEN_OPTS -Xms128m -Xmx512m -XX:+UseParallelOldGC -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=20 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -XX:MaxMetaspaceSize=100m -XX:+ExitOnOutOfMemoryError
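For reference, the expected heap flags are consistent with deriving the heap from the build memory limit: with a 1Gi limit, the output shows -Xmx at 50% of the limit and -Xms at a quarter of -Xmx. A minimal sketch of that arithmetic (the function name and both ratios are illustrative assumptions inferred from the expected output, not the actual wildfly-s2i script):

```python
# Hypothetical sketch: derive -Xms/-Xmx from a container memory limit using
# the ratios implied by the expected output above (1Gi -> -Xms128m -Xmx512m).
# heap_opts_from_limit and both ratios are assumptions for illustration only.

def heap_opts_from_limit(limit_bytes):
    limit_mib = limit_bytes // (1024 * 1024)
    xmx = limit_mib // 2   # assumed max heap = 50% of the memory limit
    xms = xmx // 4         # assumed initial heap = 1/4 of the max heap
    return "-Xms{}m -Xmx{}m".format(xms, xmx)

print(heap_opts_from_limit(1 * 1024 ** 3))  # -> -Xms128m -Xmx512m
```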
@Petr can you please provide an example of this working with memory limits in 4.2, or another prior release? To the best of my knowledge, we do not have any specific logic which translates pod resource limits into MAVEN_OPTS passed into the build.
Hi Adam, just follow the steps to reproduce. "Actual results" is the output from OCP 4.3 run, "Expected results" is the output from OCP 4.2 and prior releases.
I can say with certainty that openshift builds do not translate resource limits to MAVEN_OPTS ... this is a function of the s2i scripts in the wildfly builder image. It is very akin to what we do with the openshift jenkins image in setting the JVM heap based on what is in /sys/fs/cgroup/memory/memory.limit_in_bytes (in fact we collaborated with jboss a few years ago to establish such practices). And I do in fact see the use of that file in the very complicated looking jboss container-limits file and java-default-options file in the wildfly image.

I tried Petr's oc new-app example on 4.3, and then added the memory limit as he articulated. When I look at the build pod, I see the resources set in all the containers, i.e.

resources:
  limits:
    memory: 1Gi
  requests:
    memory: 1Gi

Note the requests entry there is not something the build controller adds. I confirmed that via debug. My guess is it is added by either the kubelet or the api server.

That said, I see the same thing when running this on 4.2:

resources:
  limits:
    memory: 1Gi
  requests:
    memory: 1Gi

Though I see the expected -Xmx/-Xms with the 4.2 run even with that requests entry.

But at this point, the relevant pieces are either beneath us in k8s or in the wildfly/jboss s2i code. This bug does not belong in openshift build. What is needed is a version of the wildfly image with debug that explains what they are seeing in /sys/fs/cgroup/memory/memory.limit_in_bytes and why they are or are not producing heap params. Then use that in the 4.2 and 4.3 runs. Perhaps opening an issue against https://github.com/wildfly/wildfly-s2i will help with that. Based on what is discovered, either a tweak to wildfly to work on 4.3 is needed, or a lower level bug against the openshift node or coreos components.
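The cgroup-driven pattern described above can be sketched roughly as follows. This is an illustrative approximation, not the actual container-limits script; the "unlimited" threshold and the function name are assumptions:

```python
import os

# Illustrative sketch of the pattern described above: read the cgroup v1
# memory limit, and treat values far beyond any plausible physical limit
# as "no limit set". The exact threshold here is an assumption.

CGROUP_MEM_FILE = "/sys/fs/cgroup/memory/memory.limit_in_bytes"
NO_LIMIT_THRESHOLD = 1 << 40  # anything above ~1 TiB: effectively unlimited

def container_memory_limit(path=CGROUP_MEM_FILE):
    """Return the container memory limit in bytes, or None if unlimited."""
    if not os.path.exists(path):
        return None  # not running under a cgroup v1 memory controller
    with open(path) as f:
        limit = int(f.read().strip())
    return None if limit > NO_LIMIT_THRESHOLD else limit
```

A launch script following this pattern would only emit -Xms/-Xmx when a real limit is found, which is why whatever value lands in that file determines the MAVEN_OPTS behavior.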
Among other things, RHEL CoreOS got updated between 4.2 and 4.3 ... so there very well may be differences wrt cgroups that wildfly will need to adjust to if you want the same heap parameter behavior.
RHEL CoreOS got updated to 8.1. We believe another thing to consider is that with those Linux-level updates, the JVM's auto-tuning in container environments took over. Again, questions for the wildfly folks.
Hi Gabe, thanks for looking into it. I'll file an issue with the WildFly team as a heads-up.
Sounds good Petr.
We learned today that some buildah-related changes quite possibly have resulted in the change of behavior between 4.2 and 4.3. I've cc:ed Nalin D. from the containers team.

Nalin - per our slack discussion, please update the JBoss issue tracker noted here (https://issues.jboss.org/browse/CLOUD-3434) as needed with whatever reactions wildfly (and most likely, by extension, the downstream EAP images) need wrt their current analysis of /sys/fs/cgroup/memory/memory.limit_in_bytes.
Just in case the Jira updates are not being received by the current CC user list; copy&paste from there: I haven't found issues with the current limits calculation implemented by the launch scripts. It looks like there is a problem in the underlying system that prepares the cgroups limits. Maybe RHEL coreos? We can see the problem in the build-config logs if we increase the log level to 5 and execute a build with the resource limits applied and without them. These are the two traces I have captured in OCP 4.3: 1. With resource limits applied: I1127 15:39:05.733161 1 builder.go:329] openshift-builder v4.3.0-201911220712+69cbcb2-dirty I1127 15:39:05.736924 1 builder.go:330] redacted build: {"kind":"Build","apiVersion":"build.openshift.io/v1","metadata":{"name":"sample-java-2","namespace":"wildfly-s2i-demo","selfLink":"/apis/build.openshift.io/v1/namespaces/wildfly-s2i-demo/builds/sample-java-2","uid":"7b76d305-231e-44a1-9e9d-7356ea5502f3","resourceVersion":"99681","creationTimestamp":"2019-11-27T15:38:45Z","labels":{"app":"sample-java","buildconfig":"sample-java","openshift.io/build-config.name":"sample-java","openshift.io/build.start-policy":"Serial"},"annotations":{"openshift.io/build-config.name":"sample-java","openshift.io/build.number":"2"},"ownerReferences":[{"apiVersion":"build.openshift.io/v1","kind":"BuildConfig","name":"sample-java","uid":"e8772513-2a6e-4033-a366-8f946b84d204","controller":true}]},"spec":{"serviceAccount":"builder","source":{"type":"Git","git":{"uri":"https://github.com/openshiftdemos/os-sample-java-web.git"}},"strategy":{"type":"Source","sourceStrategy":{"from":{"kind":"DockerImage","name":"image-registry.openshift-image-registry.svc:5000/wildfly-s2i-demo/wildfly-centos7-dev@sha256:942c35a3c4dff09fb4a027b07512bb4c5ec13e4317a946a3670001ac8d18a42a"},"pullSecret":{"name":"builder-dockercfg-5kqx9"},"env":[{"name":"SCRIPT_DEBUG","value":"true"},{"name":"BUILD_LOGLEVEL","value":"5"}]}},"output":{"to":{"kind":"DockerImage","name":"image-registry.o
penshift-image-registry.svc:5000/wildfly-s2i-demo/sample-java:latest"},"pushSecret":{"name":"builder-dockercfg-5kqx9"}},"resources":{"limits":{"memory":"1Gi"},"requests":{"memory":"1Gi"}},"postCommit":{},"nodeSelector":null,"triggeredBy":[{"message":"Manually triggered"}]},"status":{"phase":"New","outputDockerImageReference":"image-registry.openshift-image-registry.svc:5000/wildfly-s2i-demo/sample-java:latest","config":{"kind":"BuildConfig","namespace":"wildfly-s2i-demo","name":"sample-java"},"output":{}}} Caching blobs under "/var/cache/blobs". I1127 15:39:05.948209 1 util_linux.go:56] found cgroup parent kubepods-burstable-pod934e5159_6c54_4da9_8b4d_f14101c64446.slice I1127 15:39:05.948246 1 builder.go:337] Running build with cgroup limits: api.CGroupLimits{MemoryLimitBytes:92233720368547, CPUShares:0, CPUPeriod:0, CPUQuota:0, MemorySwap:92233720368547, Parent:"kubepods-burstable-pod934e5159_6c54_4da9_8b4d_f14101c64446.slice"} 2. Without resource limits: I1127 16:06:42.370930 1 builder.go:329] openshift-builder v4.3.0-201911220712+69cbcb2-dirty I1127 16:06:42.379421 1 builder.go:330] redacted build: 
{"kind":"Build","apiVersion":"build.openshift.io/v1","metadata":{"name":"sample-java-3","namespace":"wildfly-s2i-demo","selfLink":"/apis/build.openshift.io/v1/namespaces/wildfly-s2i-demo/builds/sample-java-3","uid":"bd18265f-85ce-4c5d-a6df-d3cbf003424d","resourceVersion":"107218","creationTimestamp":"2019-11-27T16:06:31Z","labels":{"app":"sample-java","buildconfig":"sample-java","openshift.io/build-config.name":"sample-java","openshift.io/build.start-policy":"Serial"},"annotations":{"openshift.io/build-config.name":"sample-java","openshift.io/build.number":"3"},"ownerReferences":[{"apiVersion":"build.openshift.io/v1","kind":"BuildConfig","name":"sample-java","uid":"e8772513-2a6e-4033-a366-8f946b84d204","controller":true}]},"spec":{"serviceAccount":"builder","source":{"type":"Git","git":{"uri":"https://github.com/openshiftdemos/os-sample-java-web.git"}},"strategy":{"type":"Source","sourceStrategy":{"from":{"kind":"DockerImage","name":"image-registry.openshift-image-registry.svc:5000/wildfly-s2i-demo/wildfly-centos7-dev@sha256:942c35a3c4dff09fb4a027b07512bb4c5ec13e4317a946a3670001ac8d18a42a"},"pullSecret":{"name":"builder-dockercfg-5kqx9"},"env":[{"name":"SCRIPT_DEBUG","value":"true"},{"name":"BUILD_LOGLEVEL","value":"5"}]}},"output":{"to":{"kind":"DockerImage","name":"image-registry.openshift-image-registry.svc:5000/wildfly-s2i-demo/sample-java:latest"},"pushSecret":{"name":"builder-dockercfg-5kqx9"}},"resources":{},"postCommit":{},"nodeSelector":null,"triggeredBy":[{"message":"Manually triggered"}]},"status":{"phase":"New","outputDockerImageReference":"image-registry.openshift-image-registry.svc:5000/wildfly-s2i-demo/sample-java:latest","config":{"kind":"BuildConfig","namespace":"wildfly-s2i-demo","name":"sample-java"},"output":{}}} Caching blobs under "/var/cache/blobs". 
I1127 16:06:42.566612 1 util_linux.go:56] found cgroup parent kubepods-besteffort-pod7163698b_a8ab_4991_beb2_83053fc20ce0.slice
I1127 16:06:42.566673 1 builder.go:337] Running build with cgroup limits: api.CGroupLimits{MemoryLimitBytes:92233720368547, CPUShares:0, CPUPeriod:0, CPUQuota:0, MemorySwap:92233720368547, Parent:"kubepods-besteffort-pod7163698b_a8ab_4991_beb2_83053fc20ce0.slice"}

Notice that on both executions the memory in bytes passed to the cgroups is the same:

builder.go:337] Running build with cgroup limits: api.CGroupLimits{MemoryLimitBytes:92233720368547

I would expect different values; indeed, this is how it works in OCP 4.2. I executed the same test using CodeReady Containers, and there the memory limit passed to cgroups changes according to the resource limits. The launch scripts read the same value from /sys/fs/cgroup/memory/memory.limit_in_bytes, but in OCP 4.3 this value never changes.
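A quick sanity check on the captured value (a throwaway snippet, not part of any product code): 92233720368547 bytes is identical in both traces and is vastly larger than the 1Gi limit, i.e. the builder effectively saw no limit either way.

```python
# Throwaway check on the traces above: the reported MemoryLimitBytes is the
# same with and without the 1Gi resource limit, and works out to ~85899 GiB,
# far beyond anything a 1Gi limit could produce -- so the configured limit
# never reached the build's cgroup.

REPORTED_LIMIT = 92233720368547   # from both 4.3 traces above
ONE_GI = 1 * 1024 ** 3            # the limit set on the BuildConfig

assert REPORTED_LIMIT != ONE_GI
print(REPORTED_LIMIT // ONE_GI)   # -> 85899 (GiB), effectively unlimited
```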
Yes, we've confirmed there was a change in behavior in CRI-O. We are tracking a bug to revert the change in behavior. Sorry, I don't have a link handy; Nalin might.
I'm going to re-open this one, Nalin please close as duplicate of "bug to revert the change in behavior".
Nalin is currently driving investigation with https://github.com/cri-o/cri-o/pull/2992
Peter Hunt, does your CRI-O PR https://github.com/cri-o/cri-o/pull/3381 address this BZ? If not, can you please take a look at this one, as Nalin has many 4.5 BZs in hand.
I believe we fixed this with https://github.com/cri-o/cri-o/pull/2997, in CRI-O 1.16.2 for 4.3, and with https://github.com/cri-o/cri-o/pull/2998 in 1.17.0 and later, for 4.4 and later.
Agreed, this should be fixed. Moving to modified
Assigning to Jindrich to handle any packaging needs (I'm not sure there are any for this one) and setting to Post.
Tom, cri-o is maintained by Lokesh.
Checked with cri-o://1.18.0-17.dev.rhaos4.5.gitdea34b9.el8 under 4.5.0-0.nightly-2020-05-19-041951; it's already fixed.

$ oc get nodes -o wide
NAME                         STATUS   ROLES    AGE   VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                CONTAINER-RUNTIME
wj45uos520a-vzjzr-master-0   Ready    master   50m   v1.18.2   192.168.0.19   <none>        Red Hat Enterprise Linux CoreOS 45.81.202005182312-0 (Ootpa)   4.18.0-147.8.1.el8_1.x86_64   cri-o://1.18.0-17.dev.rhaos4.5.gitdea34b9.el8
wj45uos520a-vzjzr-master-1   Ready    master   47m   v1.18.2   192.168.0.37   <none>        Red Hat Enterprise Linux CoreOS 45.81.202005182312-0 (Ootpa)   4.18.0-147.8.1.el8_1.x86_64   cri-o://1.18.0-17.dev.rhaos4.5.gitdea34b9.el8
wj45uos520a-vzjzr-master-2   Ready    master   47m   v1.18.2   192.168.0.28   <none>        Red Hat Enterprise Linux CoreOS 45.81.202005182312-0 (Ootpa)   4.18.0-147.8.1.el8_1.x86_64   cri-o://1.18.0-17.dev.rhaos4.5.gitdea34b9.el8
wj45uos520a-vzjzr-rhel-0     Ready    worker   17m   v1.18.2   192.168.0.17   10.0.98.71    Red Hat Enterprise Linux Server 7.8 (Maipo)                    3.10.0-1127.8.2.el7.x86_64    cri-o://1.18.0-17.dev.rhaos4.5.gitdea34b9.el7
wj45uos520a-vzjzr-worker-0   Ready    worker   35m   v1.18.2   192.168.0.14   <none>        Red Hat Enterprise Linux CoreOS 45.81.202005182312-0 (Ootpa)   4.18.0-147.8.1.el8_1.x86_64   cri-o://1.18.0-17.dev.rhaos4.5.gitdea34b9.el8
wj45uos520a-vzjzr-worker-1   Ready    worker   35m   v1.18.2   192.168.0.18   <none>        Red Hat Enterprise Linux CoreOS 45.81.202005182312-0 (Ootpa)   4.18.0-147.8.1.el8_1.x86_64   cri-o://1.18.0-17.dev.rhaos4.5.gitdea34b9.el8

$ oc new-app --name=sample-java quay.io/wildfly/wildfly-centos7:18.0~https://github.com/openshiftdemos/os-sample-java-web.git
--> Found container image 38b29f9 (7 months old) from quay.io for "quay.io/wildfly/wildfly-centos7:18.0"

    WildFly 18.0.0.Final
    --------------------
    Platform for building and running JEE applications on WildFly 18.0.0.Final

    Tags: builder, wildfly, wildfly18

    * An image stream tag will be created as "wildfly-centos7:18.0" that will track the source image
    * A source build using source code from https://github.com/openshiftdemos/os-sample-java-web.git will be created
    * The resulting image will be pushed to image stream tag "sample-java:latest"
    * Every time "wildfly-centos7:18.0" changes a new build will be triggered
    * This image will be deployed in deployment config "sample-java"
    * Ports 8080/tcp, 8778/tcp will be load balanced by service "sample-java"
    * Other containers can access this service through the hostname "sample-java"

--> Creating resources ...
    imagestream.image.openshift.io "wildfly-centos7" created
    imagestream.image.openshift.io "sample-java" created
    buildconfig.build.openshift.io "sample-java" created
    deploymentconfig.apps.openshift.io "sample-java" created
    service "sample-java" created
--> Success
    Build scheduled, use 'oc logs -f bc/sample-java' to track its progress.
    Application is not exposed. You can expose services to the outside world by executing one or more of the commands below:
     'oc expose svc/sample-java'
    Run 'oc status' to view your app.

$ oc get pods sample-java-1-build -w
NAME                  READY   STATUS      RESTARTS   AGE
sample-java-1-build   1/1     Running     0          66s
sample-java-1-build   0/1     Completed   0          2m4s

$ oc logs pod/sample-java-1-build | grep MAVEN_OPTS
INFO Using MAVEN_OPTS -XX:+UseParallelOldGC -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=20 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -XX:MaxMetaspaceSize=100m -XX:+ExitOnOutOfMemoryError

$ oc edit bc sample-java
buildconfig.build.openshift.io/sample-java edited

$ oc start-build buildconfig.build.openshift.io/sample-java
build.build.openshift.io/sample-java-2 started

$ oc get pods/sample-java-2-build -w
NAME                  READY   STATUS            RESTARTS   AGE
sample-java-2-build   0/1     PodInitializing   0          8s
sample-java-2-build   0/1     Completed         0          96s

$ oc logs pod/sample-java-2-build | grep MAVEN_OPTS
INFO Using MAVEN_OPTS -Xms128m -Xmx512m -XX:+UseParallelOldGC -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=20 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -XX:MaxMetaspaceSize=100m -XX:+ExitOnOutOfMemoryError
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409