Description of problem:
On a fresh 3.7.23 deployment, hawkular metrics fails to start with error: "Initial heap size set to a larger value than the maximum heap size" (Tries to create a pod w/ 3G and largest pod allowed on OSD is 2G)

Version-Release number of selected component (if applicable):
3.7

How reproducible:
Every

Steps to Reproduce:
1. deploy metrics

Actual results:
# oc logs hawkular-metrics-gzj2x
2018-03-21 19:54:43 Starting Hawkular Metrics
The service account has read permissions for its project. Proceeding
The service account has permission to watch namespaces. Proceeding
Creating the Hawkular Metrics keystore from the Secret's cert data
Converting the PKCS12 keystore into a Java Keystore
Importing keystore /opt/hawkular/auth/hawkular-metrics.pkcs12 to /opt/hawkular/auth/hawkular-metrics.keystore...
Entry for alias hawkular-metrics successfully imported.
Import command completed: 1 entries successfully imported, 0 entries failed or cancelled
[Storing /opt/hawkular/auth/hawkular-metrics.keystore]

Warning:
The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /opt/hawkular/auth/hawkular-metrics.keystore -destkeystore /opt/hawkular/auth/hawkular-metrics.keystore -deststoretype pkcs12".
Building the trust store
Certificate was added to keystore
Certificate was added to keystore
Splitting up the Kubernetes CA into individual certificates
Adding the Kubernetes CAs into the trust store
Certificate was added to keystore
Retrieving the Logging's CA and adding to the trust store, if Logging is available
Could not get the logging secret! Status code: 403. The Hawkular Alerts integration with Logging might not work properly.
-Xms1536m -Xmx1536m -XX:+UseParallelGC -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=40 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -XX:MaxMetaspaceSize=100m -XX:+ExitOnOutOfMemoryError
/opt/eap/bin/standalone.conf: line 105: max_mem: command not found
=========================================================================

  JBoss Bootstrap Environment

  JBOSS_HOME: /opt/eap

  JAVA: /usr/lib/jvm/java-1.8.0/bin/java

  JAVA_OPTS: -server -verbose:gc -Xloggc:"/opt/eap/standalone/log/gc.log" -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=3M -XX:-TraceClassUnloading -Xms1536m -Xmx1303m -XX:MetaspaceSize=96M -XX:MaxMetaspaceSize=256m -Djava.net.preferIPv4Stack=true -Djboss.modules.system.pkgs=org.jboss.logmanager,jdk.nashorn.api -Djava.awt.headless=true -Xbootclasspath/p:/opt/eap/jboss-modules.jar:/opt/eap/modules/system/layers/base/.overlays/layer-base-jboss-eap-7.0.9.CP/org/jboss/logmanager/main/jboss-logmanager-2.0.7.Final-redhat-1.jar:/opt/eap/modules/system/layers/base/org/jboss/logmanager/ext/main/jboss-logmanager-ext-1.0.0.Alpha2-redhat-1.jar -Djava.util.logging.manager=org.jboss.logmanager.LogManager -javaagent:/opt/eap/jolokia.jar=port=8778,protocol=https,caCert=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt,clientPrincipal=cn=system:master-proxy,useSslClientAuthentication=true,extraClientCheck=true,host=0.0.0.0,discoveryEnabled=false -Djava.security.egd=file:/dev/./urandom -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/heapdump

Expected results:
Starts

Additional info:
# oc describe rc/hawkular-metrics
Name:         hawkular-metrics
Namespace:    openshift-infra
Selector:     name=hawkular-metrics
Labels:       metrics-infra=hawkular-metrics
              name=hawkular-metrics
Annotations:  kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"ReplicationController","metadata":{"annotations":{},"creationTimestamp":"2018-03-21T17:58:21Z","generation":2,"labels":{"met...
Replicas:     1 current / 1 desired
Pods Status:  1 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           metrics-infra=hawkular-metrics
                    name=hawkular-metrics
  Service Account:  hawkular
  Containers:
   hawkular-metrics:
    Image:  registry.reg-aws.openshift.com:443/openshift3/metrics-hawkular-metrics:v3.7
    Ports:  8080/TCP, 8443/TCP, 8888/TCP
    Command:
      /opt/hawkular/scripts/hawkular-metrics-wrapper.sh
      -b
      0.0.0.0
      -Dhawkular.metrics.cassandra.nodes=hawkular-cassandra
      -Dhawkular.metrics.cassandra.use-ssl
      -Dhawkular.metrics.openshift.auth-methods=openshift-oauth,htpasswd
      -Dhawkular.metrics.openshift.htpasswd-file=/hawkular-account/hawkular-metrics.htpasswd
      -Dhawkular.metrics.allowed-cors-access-control-allow-headers=authorization
      -Dhawkular.metrics.default-ttl=7
      -Dhawkular.metrics.admin-tenant=_hawkular_admin
      -Dhawkular-alerts.cassandra-nodes=hawkular-cassandra
      -Dhawkular-alerts.cassandra-use-ssl
      -Dhawkular.alerts.openshift.auth-methods=openshift-oauth,htpasswd
      -Dhawkular.alerts.openshift.htpasswd-file=/hawkular-account/hawkular-metrics.htpasswd
      -Dhawkular.alerts.allowed-cors-access-control-allow-headers=authorization
      -Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true
      -Dorg.apache.catalina.connector.CoyoteAdapter.ALLOW_BACKSLASH=true
      -Dcom.datastax.driver.FORCE_NIO=true
      -DKUBERNETES_MASTER_URL=https://kubernetes.default.svc
      -DUSER_WRITE_ACCESS=False
      -Dhawkular.metrics.jmx-reporting-enabled
    Limits:
      memory:  3Gi
    Requests:
      cpu:     100m
      memory:  3Gi
    Liveness:   exec [/opt/hawkular/scripts/hawkular-metrics-liveness.py] delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:  exec [/opt/hawkular/scripts/hawkular-metrics-readiness.py] delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAMESPACE:                  (v1:metadata.namespace)
      MASTER_URL:                     https://kubernetes.default.svc
      JGROUPS_PASSWORD:               XrZ1lKOPdJdtgWJLh
      TRUSTSTORE_AUTHORITIES:         /hawkular-metrics-certs/tls.truststore.crt
      ENABLE_PROMETHEUS_ENDPOINT:     True
      OPENSHIFT_KUBE_PING_NAMESPACE:  (v1:metadata.namespace)
      OPENSHIFT_KUBE_PING_LABELS:     metrics-infra=hawkular-metrics,name=hawkular-metrics
      STARTUP_TIMEOUT:                500
    Mounts:
      /hawkular-account from hawkular-metrics-account (rw)
      /hawkular-metrics-certs from hawkular-metrics-certs (rw)
  Volumes:
   hawkular-metrics-certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  hawkular-metrics-certs
    Optional:    false
   hawkular-metrics-account:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  hawkular-metrics-account
    Optional:    false
Events:
  FirstSeen  LastSeen  Count  From                    SubObjectPath  Type      Reason            Message
  ---------  --------  -----  ----                    -------------  --------  ------            -------
  1h         1h        1      replication-controller                 Normal    SuccessfulCreate  Created pod: hawkular-metrics-6lhcs
  1h         1h        1      replication-controller                 Normal    SuccessfulDelete  Deleted pod: hawkular-metrics-6lhcs
  1h         1h        1      replication-controller                 Normal    SuccessfulCreate  Created pod: hawkular-metrics-9kwzq
  9m         9m        1      replication-controller                 Normal    SuccessfulDelete  Deleted pod: hawkular-metrics-9kwzq
  8m         8m        1      replication-controller                 Normal    SuccessfulCreate  Created pod: hawkular-metrics-gzj2x
My apologies - my initial assessment was wrong. It's not that the pod size was too large, it's that the Xms is set larger than the Xmx:

JAVA_OPTS: -server -verbose:gc -Xloggc:"/opt/eap/standalone/log/gc.log" -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=3M -XX:-TraceClassUnloading -Xms1536m -Xmx1303m -XX:MetaspaceSize=96M -XX:MaxMetaspaceSize=256m -Djava.net.preferIPv4Stack=true -Djboss.modules.system.pkgs=org.jboss.logmanager,jdk.nashorn.api -Djava.awt.headless=true -Xbootclasspath/p:/opt/eap/jboss-modules.jar:/opt/eap/modules/system/layers/base/.overlays/layer-base-jboss-eap-7.0.9.CP/org/jboss/logmanager/main/jboss-logmanager-2.0.7.Final-redhat-1.jar:/opt/eap/modules/system/layers/base/org/jboss/logmanager/ext/main/jboss-logmanager-ext-1.0.0.Alpha2-redhat-1.jar -Djava.util.logging.manager=org.jboss.logmanager.LogManager -javaagent:/opt/eap/jolokia.jar=port=8778,protocol=https,caCert=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt,clientPrincipal=cn=system:master-proxy,useSslClientAuthentication=true,extraClientCheck=true,host=0.0.0.0,discoveryEnabled=false -Djava.security.egd=file:/dev/./urandom -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/heapdump

There is no container memory size limit on the openshift-infra project.
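For anyone triaging a similar startup failure, the -Xms/-Xmx mismatch can be checked mechanically. What follows is only a minimal Python sketch, not something shipped in the Hawkular image: the helper names `heap_bytes` and `heap_is_consistent` are hypothetical, the option strings are shortened from the JAVA_OPTS above, and the sketch assumes that when a flag is repeated the last occurrence is the one that counts.

```python
import re

# Multipliers for the size suffixes accepted by -Xms/-Xmx (k, m, g; no suffix = bytes).
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def heap_bytes(java_opts, flag):
    """Return the size in bytes given by `flag` ("-Xms" or "-Xmx").

    Assumption: if the flag appears more than once, the last one is used.
    """
    pattern = rf"(?<!\S){re.escape(flag)}(\d+)([kKmMgG]?)(?!\S)"
    matches = re.findall(pattern, java_opts)
    if not matches:
        raise ValueError(f"{flag} not found in JAVA_OPTS")
    number, unit = matches[-1]
    return int(number) * _UNITS[unit.lower()]


def heap_is_consistent(java_opts):
    """True when the initial heap (-Xms) does not exceed the maximum heap (-Xmx)."""
    return heap_bytes(java_opts, "-Xms") <= heap_bytes(java_opts, "-Xmx")


# Shortened versions of the option strings discussed in this bug.
failing = "-server -Xms1536m -Xmx1303m -XX:MetaspaceSize=96M"
fixed = "-server -Xms1536m -Xmx1536m -XX:MetaspaceSize=96M"

print(heap_is_consistent(failing))  # False
print(heap_is_consistent(fixed))    # True
```

The lookarounds keep the pattern from matching inside other options, so `-XX:MetaspaceSize=96M` is ignored.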
(In reply to Dan Yocum from comment #1)
> My apologies - my initial assessment was wrong. It's not that the pod size
> was too large, it's that the Xms is set larger than the Xmx:
>
> JAVA_OPTS: -server -verbose:gc -Xloggc:"/opt/eap/standalone/log/gc.log"
> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=3M -XX:-TraceClassUnloading
> -Xms1536m -Xmx1303m -XX:MetaspaceSize=96M -XX:MaxMetaspaceSize=256m
> -Djava.net.preferIPv4Stack=true
> -Djboss.modules.system.pkgs=org.jboss.logmanager,jdk.nashorn.api
> -Djava.awt.headless=true
> -Xbootclasspath/p:/opt/eap/jboss-modules.jar:/opt/eap/modules/system/layers/
> base/.overlays/layer-base-jboss-eap-7.0.9.CP/org/jboss/logmanager/main/jboss-
> logmanager-2.0.7.Final-redhat-1.jar:/opt/eap/modules/system/layers/base/org/
> jboss/logmanager/ext/main/jboss-logmanager-ext-1.0.0.Alpha2-redhat-1.jar
> -Djava.util.logging.manager=org.jboss.logmanager.LogManager
> -javaagent:/opt/eap/jolokia.jar=port=8778,protocol=https,caCert=/var/run/
> secrets/kubernetes.io/serviceaccount/ca.crt,clientPrincipal=cn=system:master-
> proxy,useSslClientAuthentication=true,extraClientCheck=true,host=0.0.0.0,
> discoveryEnabled=false -Djava.security.egd=file:/dev/./urandom
> -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/heapdump
>
> There is no container memory size limit on the openshift-infra project.

Thanks for the info. We are able to reproduce with brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/metrics-hawkular-metrics:v3.7.39.
There is a work-around. It's ugly, but it overrides the JAVA_OPTS env var that is passed in the hawkular-metrics-wrapper.sh script.

oc scale --replicas=0 rc/hawkular-metrics
oc env rc/hawkular-metrics JAVA_OPTS=/'-server -verbose:gc -Xloggc:"/opt/eap/standalone/log/gc.log" -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=3M -XX:-TraceClassUnloading -Xms1536m -Xms1536m -XX:MetaspaceSize=96M -XX:MaxMetaspaceSize=256m -Djava.net.preferIPv4Stack=true -Djboss.modules.system.pkgs=org.jboss.logmanager,jdk.nashorn.api -Djava.awt.headless=true -Xbootclasspath/p:/opt/eap/jboss-modules.jar:/opt/eap/modules/system/layers/base/.overlays/layer-base-jboss-eap-7.0.9.CP/org/jboss/logmanager/main/jboss-logmanager-2.0.7.Final-redhat-1.jar:/opt/eap/modules/system/layers/base/org/jboss/logmanager/ext/main/jboss-logmanager-ext-1.0.0.Alpha2-redhat-1.jar -Djava.util.logging.manager=org.jboss.logmanager.LogManager -javaagent:/opt/eap/jolokia.jar=port=8778,protocol=https,caCert=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt,clientPrincipal=cn=system:master-proxy,useSslClientAuthentication=true,extraClientCheck=true,host=0.0.0.0,discoveryEnabled=false -Djava.security.egd=file:/dev/./urandom -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/heapdump'
oc scale --replicas=1 rc/hawkular-metrics
Sorry - there's a typo in the above command (an extra '/' that is in the old 3.0 oc cli docs). This is the right command:

oc scale --replicas=0 rc/hawkular-metrics
oc env rc/hawkular-metrics JAVA_OPTS='-server -verbose:gc -Xloggc:"/opt/eap/standalone/log/gc.log" -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=3M -XX:-TraceClassUnloading -Xms1536m -Xms1536m -XX:MetaspaceSize=96M -XX:MaxMetaspaceSize=256m -Djava.net.preferIPv4Stack=true -Djboss.modules.system.pkgs=org.jboss.logmanager,jdk.nashorn.api -Djava.awt.headless=true -Xbootclasspath/p:/opt/eap/jboss-modules.jar:/opt/eap/modules/system/layers/base/.overlays/layer-base-jboss-eap-7.0.9.CP/org/jboss/logmanager/main/jboss-logmanager-2.0.7.Final-redhat-1.jar:/opt/eap/modules/system/layers/base/org/jboss/logmanager/ext/main/jboss-logmanager-ext-1.0.0.Alpha2-redhat-1.jar -Djava.util.logging.manager=org.jboss.logmanager.LogManager -javaagent:/opt/eap/jolokia.jar=port=8778,protocol=https,caCert=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt,clientPrincipal=cn=system:master-proxy,useSslClientAuthentication=true,extraClientCheck=true,host=0.0.0.0,discoveryEnabled=false -Djava.security.egd=file:/dev/./urandom -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/heapdump'
oc scale --replicas=1 rc/hawkular-metrics
John just saw another typo (the second -Xms should have been -Xmx):

oc scale --replicas=0 rc/hawkular-metrics
oc env rc/hawkular-metrics JAVA_OPTS='-server -verbose:gc -Xloggc:"/opt/eap/standalone/log/gc.log" -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=3M -XX:-TraceClassUnloading -Xms1536m -Xmx1536m -XX:MetaspaceSize=96M -XX:MaxMetaspaceSize=256m -Djava.net.preferIPv4Stack=true -Djboss.modules.system.pkgs=org.jboss.logmanager,jdk.nashorn.api -Djava.awt.headless=true -Xbootclasspath/p:/opt/eap/jboss-modules.jar:/opt/eap/modules/system/layers/base/.overlays/layer-base-jboss-eap-7.0.9.CP/org/jboss/logmanager/main/jboss-logmanager-2.0.7.Final-redhat-1.jar:/opt/eap/modules/system/layers/base/org/jboss/logmanager/ext/main/jboss-logmanager-ext-1.0.0.Alpha2-redhat-1.jar -Djava.util.logging.manager=org.jboss.logmanager.LogManager -javaagent:/opt/eap/jolokia.jar=port=8778,protocol=https,caCert=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt,clientPrincipal=cn=system:master-proxy,useSslClientAuthentication=true,extraClientCheck=true,host=0.0.0.0,discoveryEnabled=false -Djava.security.egd=file:/dev/./urandom -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/heapdump'
oc scale --replicas=1 rc/hawkular-metrics
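Since the override string is long and easy to mistype, a quick sanity check before passing it to `oc env` may help. This is a rough Python sketch only: `heap_flag_problems` is a hypothetical helper, it looks at nothing except the -Xms and -Xmx tokens, and the strings below are shortened stand-ins for the full JAVA_OPTS value.

```python
import shlex
from collections import Counter


def heap_flag_problems(java_opts):
    """Return a list of problems found with the -Xms/-Xmx flags (empty if none)."""
    # shlex.split copes with the quoted -Xloggc:"..." path in the real string.
    tokens = shlex.split(java_opts)
    counts = Counter(t[:4] for t in tokens if t[:4] in ("-Xms", "-Xmx"))

    problems = []
    for flag in ("-Xms", "-Xmx"):
        if counts[flag] == 0:
            problems.append(f"{flag} is missing")
        elif counts[flag] > 1:
            problems.append(f"{flag} appears {counts[flag]} times")
    return problems


# Shortened form of the override with -Xms repeated and no -Xmx.
with_typo = '-server -Xloggc:"/opt/eap/standalone/log/gc.log" -Xms1536m -Xms1536m'
# Shortened form of the corrected override.
corrected = '-server -Xloggc:"/opt/eap/standalone/log/gc.log" -Xms1536m -Xmx1536m'

print(heap_flag_problems(with_typo))  # ['-Xms appears 2 times', '-Xmx is missing']
print(heap_flag_problems(corrected))  # []
```

The check does not compare the two sizes; it only confirms that each flag is present exactly once.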
I just deployed a new cluster and apparently the rc now sets JAVA_OPTS to this: ... -Xms1303m -Xmx1303m ... Is this right??
(In reply to Dan Yocum from comment #8)
> I just deployed a new cluster and apparently the rc now set the JAVA_OPTS to
> this:
>
> ... -Xms1303m -Xmx1303m ...
>
> Is this right??

Sorry for the late response. Yes, that looks right. What is the status with this issue?
I don't think it's right - we talked about it in https://bugzilla.redhat.com/show_bug.cgi?id=1559477#c15. The heap should be 50% of the container limit, which is 3GB, so these should be 1536m. 1303m caused the INTERNAL_SERVER_ERROR seen in https://bugzilla.redhat.com/show_bug.cgi?id=1559477#c10
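The arithmetic behind the 1536m figure can be spelled out as follows. This is just an illustrative Python sketch: `expected_heap_mb` is a hypothetical name, the 50% rule is the one stated in this comment, and the limit is assumed to be the 3Gi memory limit shown in the rc description.

```python
MIB = 1024 * 1024


def expected_heap_mb(limit_bytes, percent=50):
    """Heap size in MiB when the heap is `percent` of the container memory limit."""
    return limit_bytes * percent // 100 // MIB


limit = 3 * 1024**3  # assumed: the 3Gi limit from `oc describe rc/hawkular-metrics`

print(expected_heap_mb(limit))  # 1536

# For comparison, the share of the limit that 1303m represents.
print(round(1303 * MIB / limit * 100, 1))  # 42.4
```

So 1536m matches the 50% rule exactly, while 1303m comes out at roughly 42% of the limit.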
Tested with metrics-hawkular-metrics-v3.7.46-1; the hawkular-metrics pod runs well.

# oc get po
NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-l5jjt   1/1       Running   0          14m
hawkular-metrics-pdszs       1/1       Running   0          14m
heapster-nk2kw               1/1       Running   0          14m

Xmx and Xms are set to the same value, which is 50% of the hawkular-metrics container memory limit.

*****************************************************************************

  JBoss Bootstrap Environment

  JBOSS_HOME: /opt/eap

  JAVA: /usr/lib/jvm/java-1.8.0/bin/java

  JAVA_OPTS: -server -verbose:gc -Xloggc:"/opt/eap/standalone/log/gc.log" -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=3M -XX:-TraceClassUnloading -Xms1536m -Xmx1536m -XX:MetaspaceSize=96M -XX:MaxMetaspaceSize=256m -Djava.net.preferIPv4Stack=true -Djboss.modules.system.pkgs=org.jboss.logmanager,jdk.nashorn.api -Djava.awt.headless=true -Xbootclasspath/p:/opt/eap/jboss-modules.jar:/opt/eap/modules/system/layers/base/org/jboss/logmanager/main/jboss-logmanager-2.0.3.Final-redhat-1.jar:/opt/eap/modules/system/layers/base/org/jboss/logmanager/ext/main/jboss-logmanager-ext-1.0.0.Alpha2-redhat-1.jar -Djava.util.logging.manager=org.jboss.logmanager.LogManager -javaagent:/opt/eap/jolokia.jar=port=8778,protocol=https,caCert=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt,clientPrincipal=cn=system:master-proxy,useSslClientAuthentication=true,extraClientCheck=true,host=0.0.0.0,discoveryEnabled=false -Djava.security.egd=file:/dev/./urandom -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/heapdump

*****************************************************************************
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:1576