Bug 1351166

Summary: hawkular pod in metrics deployment OOMs but POD stays in status 'running'
Product: OpenShift Container Platform
Reporter: Miheer Salunke <misalunk>
Component: Hawkular
Assignee: Matt Wringe <mwringe>
Status: CLOSED ERRATA
QA Contact: chunchen <chunchen>
Severity: high
Priority: high
Version: 3.2.0
CC: aos-bugs, mwringe, tdawson, wsun
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2016-09-27 09:38:59 UTC
Type: Bug

Comment 7 Matt Wringe 2016-08-05 13:47:50 UTC
This should be fixed for 3.3. Setting ON_QA

Comment 8 chunchen 2016-08-08 07:58:41 UTC
It's fixed; checked with the latest logging 3.3 images. I simulated this situation by killing the java process [1] in the hawkular container. Afterwards, although the hawkular pod status is still *Running*, the container in the hawkular pod is marked as not *Ready*. See the results below:

The test results:
1. Check the hawkular pod and container status before stopping the java process
[chunchen@F17-CCY daily]$ oc get pod
NAME                         READY     STATUS      RESTARTS   AGE
hawkular-cassandra-1-iq1e5   1/1       Running     0          11m
hawkular-metrics-4e62i       1/1       Running     0          11m
heapster-36bvw               1/1       Running     3          11m
metrics-deployer-1b1lb       0/1       Completed   0          13m

2. Stop the java process [1] in the hawkular container
sh-4.2$ kill -9 JAVA-PROCESS-PID

3. Check the hawkular pod and container status again
[chunchen@F17-CCY daily]$ oc get pod
NAME                         READY     STATUS      RESTARTS   AGE
hawkular-cassandra-1-iq1e5   1/1       Running     0          25m
hawkular-metrics-4e62i       0/1       Running     2          25m
heapster-36bvw               1/1       Running     3          25m
metrics-deployer-1b1lb       0/1       Completed   0          27m
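
The check in step 3 can also be scripted. What follows is a minimal sketch and was not part of the original verification: a hypothetical shell function, not_ready_running_pods, that reads `oc get pod` output on stdin and prints the name of every pod whose STATUS is Running while its READY count is short (for example 0/1). It assumes the default five-column layout shown above.

```shell
# Hypothetical helper (not from the original report): list pods that are
# Running but not fully Ready, based on default `oc get pod` output on stdin.
not_ready_running_pods() {
  awk '
    # Skip the header row; only consider pods whose STATUS column is Running.
    NR > 1 && $3 == "Running" {
      # READY has the form ready/total, e.g. 0/1.
      split($2, ready, "/")
      if (ready[1] != ready[2]) print $1
    }
  '
}
```

It would be used as `oc get pod | not_ready_running_pods`; against the step 3 output this should report only hawkular-metrics-4e62i, since the Completed deployer pod is filtered out by its status.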


[1] java process:
/usr/lib/jvm/java-1.8.0/bin/java -D[Standalone] -server -verbose:gc -Xloggc:/opt/eap/standalone/log/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=3M -XX:-TraceClassUnloading -Xms1303m -Xmx1303m -XX:MetaspaceSize=96M -XX:MaxMetaspaceSize=256m -Djava.net.preferIPv4Stack=true -Djboss.modules.system.pkgs=org.jboss.logmanager,org.jboss.byteman -Djava.awt.headless=true -Xbootclasspath/p:/opt/eap/jboss-modules.jar:/opt/eap/modules/system/layers/base/org/jboss/logmanager/main/jboss-logmanager-2.0.3.Final-redhat-1.jar:/opt/eap/modules/system/layers/base/org/jboss/logmanager/ext/main/javax.json-1.0.4.jar:/opt/eap/modules/system/layers/base/org/jboss/logmanager/ext/main/jboss-logmanager-ext-1.0.0.Alpha2-redhat-1.jar -Djava.util.logging.manager=org.jboss.logmanager.LogManager -javaagent:/opt/eap/jolokia.jar=port=8778,protocol=https,caCert=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt,clientPrincipal=cn=system:master-proxy,useSslClientAuthentication=true,extraClientCheck=true,host=0.0.0.0,discoveryEnabled=false -Djava.security.egd=file:/dev/./urandom -Dorg.jboss.boot.log.file=/opt/eap/standalone/log/server.log -Dlogging.configuration=file:/opt/eap/standalone/configuration/logging.properties -jar /opt/eap/jboss-modules.jar -mp /opt/eap/modules org.jboss.as.standalone -Djboss.home.dir=/opt/eap -Djboss.server.base.dir=/opt/eap/standalone -Djavax.net.ssl.keyStore=/opt/hawkular/auth/hawkular-metrics.keystore -Djavax.net.ssl.keyStorePassword=5-Vr9QReUImUocS -Djavax.net.ssl.trustStore=/opt/hawkular/auth/hawkular-metrics.truststore -Djavax.net.ssl.trustStorePassword=5Vj4Q9h1580X6HF -b 0.0.0.0 -Dhawkular-metrics.cassandra-nodes=hawkular-cassandra -Dhawkular-metrics.cassandra-use-ssl -Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true -Dorg.apache.catalina.connector.CoyoteAdapter.ALLOW_BACKSLASH=true -Dhawkular-metrics.openshift.auth-methods=openshift-oauth,htpasswd 
-Dhawkular-metrics.openshift.htpasswd-file=/secrets/hawkular-metrics.htpasswd.file -Dhawkular.metrics.allowed-cors-access-control-allow-headers=authorization -Dhawkular.metrics.default-ttl=7 -DKUBERNETES_MASTER_URL=https://openshift-123.lab.sjc.redhat.com:8443 -DUSER_WRITE_ACCESS=true

Comment 10 errata-xmlrpc 2016-09-27 09:38:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1933