Bug 1772858 - JAVA S2I Xmx and Xms are not part of MAVEN_OPTS if memory limit is included
Summary: JAVA S2I Xmx and Xms are not part of MAVEN_OPTS if memory limit is included
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Containers
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.5.0
Assignee: Lokesh Mandvekar
QA Contact: weiwei jiang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-11-15 10:49 UTC by Petr Kremensky
Modified: 2020-07-13 17:12 UTC
CC: 13 users

Fixed In Version: cri-o-1.18.1-1.dev.rhaos4.5.git60ac541.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-13 17:12:14 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker CLOUD-3434 0 Major Closed OCP 4.3 JAVA S2I Xmx and Xms are not part of MAVEN_OPTS if memory limit is included 2020-06-23 08:14:57 UTC
Red Hat Product Errata RHBA-2020:2409 0 None None None 2020-07-13 17:12:31 UTC

Description Petr Kremensky 2019-11-15 10:49:56 UTC
Description of problem:
Xmx and Xms are not part of MAVEN_OPTS when a memory limit is included in the build configuration.

The issue is a regression against previous releases.

Version-Release number of selected component (if applicable):
4.3.0-0.nightly-2019-11-11-115927 (https://projects.engineering.redhat.com/browse/LPINTEROP-680)

How reproducible:
Always

Steps to Reproduce:
$ oc new-app --name=sample-java quay.io/wildfly/wildfly-centos7:18.0~https://github.com/openshiftdemos/os-sample-java-web.git
# wait until the build 1 is completed
$ oc get pods/sample-java-1-build -w
NAME                  READY   STATUS   
sample-java-1-build   0/1     Completed
$ oc logs pod/sample-java-1-build | grep MAVEN_OPTS
INFO Using MAVEN_OPTS -XX:+UseParallelOldGC -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=20 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -XX:MaxMetaspaceSize=100m -XX:+ExitOnOutOfMemoryError

# update the memory limits for build config
$ oc edit buildconfig.build.openshift.io/sample-java
<   resources: {}
---
>   resources:
>     limits:
>       memory: 1Gi

$ oc start-build buildconfig.build.openshift.io/sample-java
build.build.openshift.io/sample-java-2 started

# wait until the build 2 is completed
$ oc get pods/sample-java-2-build -w
NAME                  READY   STATUS   
sample-java-2-build   0/1     Completed

Actual results:
# heap size setup is missing
$ oc logs pod/sample-java-2-build | grep MAVEN_OPTS
INFO Using MAVEN_OPTS -XX:+UseParallelOldGC -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=20 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -XX:MaxMetaspaceSize=100m -XX:+ExitOnOutOfMemoryError

Expected results:
# initial and maximum heap size are part of options for maven
$ oc logs pod/sample-java-2-build | grep MAVEN_OPTS
INFO Using MAVEN_OPTS -Xms128m -Xmx512m -XX:+UseParallelOldGC -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=20 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -XX:MaxMetaspaceSize=100m -XX:+ExitOnOutOfMemoryError
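For reference, the expected -Xms128m -Xmx512m line up with simple fractions of the configured 1Gi limit. The arithmetic below is an illustrative sketch; the 50% and 25% fractions are inferred from the logged values, not taken from the wildfly s2i launch scripts themselves.

```shell
#!/bin/sh
# Illustrative only: fractions inferred from the logged MAVEN_OPTS values,
# not from the wildfly s2i scripts.
limit_bytes=$((1024 * 1024 * 1024))    # 1Gi memory limit from the BuildConfig

# -Xmx appears to be 50% of the container memory limit ...
xmx_mb=$((limit_bytes / 2 / 1024 / 1024))
# ... and -Xms 25% of -Xmx.
xms_mb=$((xmx_mb / 4))

echo "-Xms${xms_mb}m -Xmx${xmx_mb}m"   # prints: -Xms128m -Xmx512m
```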

Comment 1 Adam Kaplan 2019-11-18 14:06:02 UTC
@Petr can you please provide an example of this working with memory limits in 4.2, or another prior release? To the best of my knowledge we do not have any specific logic which translates pod resource limits into the MAVEN_OPTS passed into the build.

Comment 2 Petr Kremensky 2019-11-18 14:25:16 UTC
Hi Adam, just follow the steps to reproduce. "Actual results" is the output from OCP 4.3 run, "Expected results" is the output from OCP 4.2 and prior releases.

Comment 3 Gabe Montero 2019-11-19 21:20:59 UTC
I can say with certainty that openshift builds do not translate resource limits to MAVEN_OPTS ... this is a function of the s2i scripts in the wildfly builder image.

It is very akin to what we do with the openshift jenkins image in setting the JVM heap based on what is in /sys/fs/cgroup/memory/memory.limit_in_bytes (in fact we collaborated with jboss a few years ago to establish such practices).

And in fact I see the use of that file in the very complicated looking jboss container-limits and java-default-options files in the wildfly image.

I tried Petr's oc new-app example on 4.3, and then added the memory limit as he articulated.  When I look at the build pod, I see the resources set in
all the containers

i.e.

    resources:
      limits:
        memory: 1Gi
      requests:
        memory: 1Gi

Note the requests entry there is not something the build controller adds. I confirmed that via debug. My guess is it is added by either the kubelet or the api server.

That said, I see the same thing when running this on 4.2.

    resources:
      limits:
        memory: 1Gi
      requests:
        memory: 1Gi


Though I see the expected -Xmx/-Xms with the 4.2 run even with that requests entry.

But at this point, the relevant pieces are either beneath us in k8s or in the wildfly/jboss s2i code.  This bug does not belong in openshift build.

A version of the wildfly image with debug that explains what they are seeing in /sys/fs/cgroup/memory/memory.limit_in_bytes and why they are producing or not producing heap params is needed.  

Then use that in the 4.2 and 4.3 runs.

Perhaps opening an issue against https://github.com/wildfly/wildfly-s2i will help with that.

Based on what is discovered, then either a tweak to wildfly to work on 4.3 is needed, or a lower level bug against the openshift node or coreos components.
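A minimal debug sketch along these lines (not the actual wildfly launch script; the wrapper function and the fallback message are illustrative) would simply report what the builder container sees:

```shell
#!/bin/sh
# Print the cgroup v1 memory limit as the wildfly launch scripts would see it.
# Sketch only: the wrapper function and fallback message are illustrative.
LIMIT_FILE=/sys/fs/cgroup/memory/memory.limit_in_bytes

read_memory_limit() {
    if [ -r "$LIMIT_FILE" ]; then
        cat "$LIMIT_FILE"
    else
        # e.g. a cgroup v2 host, where this v1 file does not exist
        echo "unavailable"
    fi
}

echo "memory.limit_in_bytes: $(read_memory_limit)"
```

Running this in the build pods on 4.2 and 4.3 would show directly whether the value the scripts read differs between the releases.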

Comment 4 Gabe Montero 2019-11-19 21:29:03 UTC
Among other things, RHEL coreos got updated between 4.2 and 4.3 ... so there very well may be differences wrt cgroups that wildfly will need to adjust to if you want the same heap parameter behavior

Comment 5 Gabe Montero 2019-11-19 21:37:13 UTC
RHEL coreos got updated to 8.1, we believe.

Another thing to consider is that, with those linux level updates, the JVM's auto-tuning in container environments took over.

Again, questions for the wildfly folks.

Comment 6 Petr Kremensky 2019-11-20 14:49:45 UTC
Hi Gabe, thanks for looking into it, I'll file an issue to Wildfly guys for heads up.

Comment 7 Gabe Montero 2019-11-20 15:01:25 UTC
Sounds good Petr.

Comment 8 Gabe Montero 2019-11-21 21:52:14 UTC
We learned today that some buildah related changes quite possibly have resulted in the change of behavior between 4.2 and 4.3

I've cc:ed Nalin D. from the containers team

Nalin - per our slack discussion, please update the JBoss issue tracker noted here (https://issues.jboss.org/browse/CLOUD-3434) as needed with whatever changes wildfly (and most likely, by extension, the downstream EAP images) need wrt their current analysis of /sys/fs/cgroup/memory/memory.limit_in_bytes

Comment 9 Yeray Borges 2019-11-28 09:02:32 UTC
Just in case the Jira updates are not being received by the current CC list, here is a copy & paste from there:

I haven't found issues with the current limits calculation implemented by the launch scripts. It looks like there is a problem in the underlying system that prepares the cgroup limits. Maybe RHEL coreos?

We can see the problem in the build logs if we increase the log level to 5 and execute a build with the resource limits applied and without them.

These are the two traces I have captured in OCP 4.3:


1. With resource limits applied:

I1127 15:39:05.733161       1 builder.go:329] openshift-builder v4.3.0-201911220712+69cbcb2-dirty
I1127 15:39:05.736924       1 builder.go:330] redacted build: {"kind":"Build","apiVersion":"build.openshift.io/v1","metadata":{"name":"sample-java-2","namespace":"wildfly-s2i-demo","selfLink":"/apis/build.openshift.io/v1/namespaces/wildfly-s2i-demo/builds/sample-java-2","uid":"7b76d305-231e-44a1-9e9d-7356ea5502f3","resourceVersion":"99681","creationTimestamp":"2019-11-27T15:38:45Z","labels":{"app":"sample-java","buildconfig":"sample-java","openshift.io/build-config.name":"sample-java","openshift.io/build.start-policy":"Serial"},"annotations":{"openshift.io/build-config.name":"sample-java","openshift.io/build.number":"2"},"ownerReferences":[{"apiVersion":"build.openshift.io/v1","kind":"BuildConfig","name":"sample-java","uid":"e8772513-2a6e-4033-a366-8f946b84d204","controller":true}]},"spec":{"serviceAccount":"builder","source":{"type":"Git","git":{"uri":"https://github.com/openshiftdemos/os-sample-java-web.git"}},"strategy":{"type":"Source","sourceStrategy":{"from":{"kind":"DockerImage","name":"image-registry.openshift-image-registry.svc:5000/wildfly-s2i-demo/wildfly-centos7-dev@sha256:942c35a3c4dff09fb4a027b07512bb4c5ec13e4317a946a3670001ac8d18a42a"},"pullSecret":{"name":"builder-dockercfg-5kqx9"},"env":[{"name":"SCRIPT_DEBUG","value":"true"},{"name":"BUILD_LOGLEVEL","value":"5"}]}},"output":{"to":{"kind":"DockerImage","name":"image-registry.openshift-image-registry.svc:5000/wildfly-s2i-demo/sample-java:latest"},"pushSecret":{"name":"builder-dockercfg-5kqx9"}},"resources":{"limits":{"memory":"1Gi"},"requests":{"memory":"1Gi"}},"postCommit":{},"nodeSelector":null,"triggeredBy":[{"message":"Manually triggered"}]},"status":{"phase":"New","outputDockerImageReference":"image-registry.openshift-image-registry.svc:5000/wildfly-s2i-demo/sample-java:latest","config":{"kind":"BuildConfig","namespace":"wildfly-s2i-demo","name":"sample-java"},"output":{}}}
Caching blobs under "/var/cache/blobs".
I1127 15:39:05.948209       1 util_linux.go:56] found cgroup parent kubepods-burstable-pod934e5159_6c54_4da9_8b4d_f14101c64446.slice
I1127 15:39:05.948246       1 builder.go:337] Running build with cgroup limits: api.CGroupLimits{MemoryLimitBytes:92233720368547, CPUShares:0, CPUPeriod:0, CPUQuota:0, MemorySwap:92233720368547, Parent:"kubepods-burstable-pod934e5159_6c54_4da9_8b4d_f14101c64446.slice"}


2. Without resource limits:

I1127 16:06:42.370930       1 builder.go:329] openshift-builder v4.3.0-201911220712+69cbcb2-dirty
I1127 16:06:42.379421       1 builder.go:330] redacted build: {"kind":"Build","apiVersion":"build.openshift.io/v1","metadata":{"name":"sample-java-3","namespace":"wildfly-s2i-demo","selfLink":"/apis/build.openshift.io/v1/namespaces/wildfly-s2i-demo/builds/sample-java-3","uid":"bd18265f-85ce-4c5d-a6df-d3cbf003424d","resourceVersion":"107218","creationTimestamp":"2019-11-27T16:06:31Z","labels":{"app":"sample-java","buildconfig":"sample-java","openshift.io/build-config.name":"sample-java","openshift.io/build.start-policy":"Serial"},"annotations":{"openshift.io/build-config.name":"sample-java","openshift.io/build.number":"3"},"ownerReferences":[{"apiVersion":"build.openshift.io/v1","kind":"BuildConfig","name":"sample-java","uid":"e8772513-2a6e-4033-a366-8f946b84d204","controller":true}]},"spec":{"serviceAccount":"builder","source":{"type":"Git","git":{"uri":"https://github.com/openshiftdemos/os-sample-java-web.git"}},"strategy":{"type":"Source","sourceStrategy":{"from":{"kind":"DockerImage","name":"image-registry.openshift-image-registry.svc:5000/wildfly-s2i-demo/wildfly-centos7-dev@sha256:942c35a3c4dff09fb4a027b07512bb4c5ec13e4317a946a3670001ac8d18a42a"},"pullSecret":{"name":"builder-dockercfg-5kqx9"},"env":[{"name":"SCRIPT_DEBUG","value":"true"},{"name":"BUILD_LOGLEVEL","value":"5"}]}},"output":{"to":{"kind":"DockerImage","name":"image-registry.openshift-image-registry.svc:5000/wildfly-s2i-demo/sample-java:latest"},"pushSecret":{"name":"builder-dockercfg-5kqx9"}},"resources":{},"postCommit":{},"nodeSelector":null,"triggeredBy":[{"message":"Manually triggered"}]},"status":{"phase":"New","outputDockerImageReference":"image-registry.openshift-image-registry.svc:5000/wildfly-s2i-demo/sample-java:latest","config":{"kind":"BuildConfig","namespace":"wildfly-s2i-demo","name":"sample-java"},"output":{}}}
Caching blobs under "/var/cache/blobs".
I1127 16:06:42.566612       1 util_linux.go:56] found cgroup parent kubepods-besteffort-pod7163698b_a8ab_4991_beb2_83053fc20ce0.slice
I1127 16:06:42.566673       1 builder.go:337] Running build with cgroup limits: api.CGroupLimits{MemoryLimitBytes:92233720368547, CPUShares:0, CPUPeriod:0, CPUQuota:0, MemorySwap:92233720368547, Parent:"kubepods-besteffort-pod7163698b_a8ab_4991_beb2_83053fc20ce0.slice"}


Notice that on both executions the memory in bytes passed to the cgroups is the same:

builder.go:337] Running build with cgroup limits: api.CGroupLimits{MemoryLimitBytes:92233720368547



I would expect different values; indeed, this is how it works in OCP 4.2. I executed the same test using CodeReady Containers and the memory limits passed to cgroups change according to the resource limits.
The launch scripts read the same value from /sys/fs/cgroup/memory/memory.limit_in_bytes, but in OCP 4.3 this value never changes.
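Any launch script deriving heap flags from this file has to treat such an enormous value as "no limit set" and skip -Xms/-Xmx entirely, which matches the symptom above. The check below is a sketch; the 1 TiB threshold and the function name are illustrative, not what the wildfly scripts actually use.

```shell
#!/bin/sh
# Sketch: treat an absurdly large cgroup limit as "unlimited" (illustrative
# threshold; the real wildfly scripts may use a different cutoff).
NO_LIMIT_THRESHOLD=1099511627776    # 1 TiB

container_max_memory() {
    limit=$1
    if [ "$limit" -gt "$NO_LIMIT_THRESHOLD" ]; then
        echo "unlimited"    # no -Xms/-Xmx should be derived
    else
        echo "$limit"
    fi
}

container_max_memory 92233720368547            # value from the 4.3 logs -> unlimited
container_max_memory $((1024 * 1024 * 1024))   # a real 1Gi limit -> 1073741824
```

With the value from the 4.3 traces above, such a script would derive no heap flags at all, consistent with the missing -Xms/-Xmx in MAVEN_OPTS.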

Comment 10 Ben Parees 2019-11-28 14:51:14 UTC
Yes, we've confirmed there was a change in behavior in crio. We are tracking a bug to revert the change in behavior.  Sorry I don't have a link handy, Nalin might.

Comment 11 Petr Kremensky 2019-12-05 07:33:51 UTC
I'm going to re-open this one, Nalin please close as duplicate of "bug to revert the change in behavior".

Comment 12 Gabe Montero 2019-12-05 18:27:12 UTC
Nalin is currently driving investigation with https://github.com/cri-o/cri-o/pull/2992

Comment 17 Tom Sweeney 2020-05-14 18:34:00 UTC
Peter Hunt, does your CRIO PR https://github.com/cri-o/cri-o/pull/3381 address this BZ?  

If not, can you please take a look at this one, as Nalin has many 4.5 BZs in hand.

Comment 18 Nalin Dahyabhai 2020-05-14 18:59:34 UTC
I believe we fixed this with https://github.com/cri-o/cri-o/pull/2997, in CRI-O 1.16.2 for 4.3, and with https://github.com/cri-o/cri-o/pull/2998 in 1.17.0 and later, for 4.4 and later.

Comment 19 Peter Hunt 2020-05-14 19:11:29 UTC
Agreed, this should be fixed. Moving to modified

Comment 20 Tom Sweeney 2020-05-14 21:24:18 UTC
Assigning to Jindrich to handle any packaging needs (I'm not sure there are any for this one) and setting to Post.

Comment 21 Jindrich Novy 2020-05-15 03:12:23 UTC
Tom, cri-o is maintained by Lokesh.

Comment 25 weiwei jiang 2020-05-20 03:00:36 UTC
Checked with cri-o://1.18.0-17.dev.rhaos4.5.gitdea34b9.el8 under 4.5.0-0.nightly-2020-05-19-041951, it's already fixed.


$ oc get nodes -o wide 
NAME                         STATUS   ROLES    AGE   VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                CONTAINER-RUNTIME
wj45uos520a-vzjzr-master-0   Ready    master   50m   v1.18.2   192.168.0.19   <none>        Red Hat Enterprise Linux CoreOS 45.81.202005182312-0 (Ootpa)   4.18.0-147.8.1.el8_1.x86_64   cri-o://1.18.0-17.dev.rhaos4.5.gitdea34b9.el8
wj45uos520a-vzjzr-master-1   Ready    master   47m   v1.18.2   192.168.0.37   <none>        Red Hat Enterprise Linux CoreOS 45.81.202005182312-0 (Ootpa)   4.18.0-147.8.1.el8_1.x86_64   cri-o://1.18.0-17.dev.rhaos4.5.gitdea34b9.el8
wj45uos520a-vzjzr-master-2   Ready    master   47m   v1.18.2   192.168.0.28   <none>        Red Hat Enterprise Linux CoreOS 45.81.202005182312-0 (Ootpa)   4.18.0-147.8.1.el8_1.x86_64   cri-o://1.18.0-17.dev.rhaos4.5.gitdea34b9.el8
wj45uos520a-vzjzr-rhel-0     Ready    worker   17m   v1.18.2   192.168.0.17   10.0.98.71    Red Hat Enterprise Linux Server 7.8 (Maipo)                    3.10.0-1127.8.2.el7.x86_64    cri-o://1.18.0-17.dev.rhaos4.5.gitdea34b9.el7
wj45uos520a-vzjzr-worker-0   Ready    worker   35m   v1.18.2   192.168.0.14   <none>        Red Hat Enterprise Linux CoreOS 45.81.202005182312-0 (Ootpa)   4.18.0-147.8.1.el8_1.x86_64   cri-o://1.18.0-17.dev.rhaos4.5.gitdea34b9.el8
wj45uos520a-vzjzr-worker-1   Ready    worker   35m   v1.18.2   192.168.0.18   <none>        Red Hat Enterprise Linux CoreOS 45.81.202005182312-0 (Ootpa)   4.18.0-147.8.1.el8_1.x86_64   cri-o://1.18.0-17.dev.rhaos4.5.gitdea34b9.el8

$ oc new-app --name=sample-java quay.io/wildfly/wildfly-centos7:18.0~https://github.com/openshiftdemos/os-sample-java-web.git
--> Found container image 38b29f9 (7 months old) from quay.io for "quay.io/wildfly/wildfly-centos7:18.0"
                                                    
    WildFly 18.0.0.Final 
    -------------------- 
    Platform for building and running JEE applications on WildFly 18.0.0.Final

    Tags: builder, wildfly, wildfly18

    * An image stream tag will be created as "wildfly-centos7:18.0" that will track the source image
    * A source build using source code from https://github.com/openshiftdemos/os-sample-java-web.git will be created
      * The resulting image will be pushed to image stream tag "sample-java:latest"
      * Every time "wildfly-centos7:18.0" changes a new build will be triggered
    * This image will be deployed in deployment config "sample-java"
    * Ports 8080/tcp, 8778/tcp will be load balanced by service "sample-java"
      * Other containers can access this service through the hostname "sample-java"

--> Creating resources ...
    imagestream.image.openshift.io "wildfly-centos7" created
    imagestream.image.openshift.io "sample-java" created
    buildconfig.build.openshift.io "sample-java" created
    deploymentconfig.apps.openshift.io "sample-java" created
    service "sample-java" created
--> Success
    Build scheduled, use 'oc logs -f bc/sample-java' to track its progress.
    Application is not exposed. You can expose services to the outside world by executing one or more of the commands below:
     'oc expose svc/sample-java' 
    Run 'oc status' to view your app.

$ oc get pods sample-java-1-build -w 
NAME                  READY   STATUS    RESTARTS   AGE
sample-java-1-build   1/1     Running   0          66s
sample-java-1-build   0/1     Completed   0          2m4s

$ oc logs pod/sample-java-1-build | grep MAVEN_OPTS
INFO Using MAVEN_OPTS -XX:+UseParallelOldGC -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=20 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -XX:MaxMetaspaceSize=100m -XX:+ExitOnOutOfMemoryError

$ oc edit bc 
buildconfig.build.openshift.io/sample-java edited

$ oc start-build buildconfig.build.openshift.io/sample-java
build.build.openshift.io/sample-java-2 started

$ oc get pods/sample-java-2-build -w
NAME                  READY   STATUS            RESTARTS   AGE
sample-java-2-build   0/1     PodInitializing   0          8s
sample-java-2-build   0/1     Completed         0          96s

$ oc logs pod/sample-java-2-build | grep MAVEN_OPTS
INFO Using MAVEN_OPTS -Xms128m -Xmx512m -XX:+UseParallelOldGC -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=20 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -XX:MaxMetaspaceSize=100m -XX:+ExitOnOutOfMemoryError

Comment 27 errata-xmlrpc 2020-07-13 17:12:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

