Bug 1473409

Summary: hawkular cassandra in CrashLoopBackOff when using openshift_hosted_metrics_deploy=true
Product: OpenShift Container Platform
Reporter: Javier Ramirez <javier.ramirez>
Component: Hawkular
Assignee: Matt Wringe <mwringe>
Status: CLOSED WORKSFORME
QA Contact: Junqi Zhao <juzhao>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.5.0
CC: aos-bugs
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-07-21 12:11:25 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Javier Ramirez 2017-07-20 18:03:40 UTC
Description of problem:
I tried deploying a 3.5 cluster with openshift_hosted_metrics_deploy=true
and the hawkular-cassandra pod fails with:

The MAX_HEAP_SIZE envar is not set. Basing the MAX_HEAP_SIZE on the available memory limit for the pod (2000000000).
The memory limit is less than 2GB. Using 1/2 of available memory for the max_heap_size.
The MAX_HEAP_SIZE has been set to 953M
THE HEAP_NEWSIZE envar is not set. Setting to 200M based on the CPU_LIMIT of 2000. [100M per CPU core]
About to generate seeds
Trying to access the Seed list [try #1]
Trying to access the Seed list [try #2]
Trying to access the Seed list [try #3]
Setting seeds to be hawkular-cassandra-1-bzx1x
The previous version of Cassandra was 3.0.12.redhat-1. The current version is 3.0.12.redhat-1
cat: /etc/ld.so.conf.d/*.conf: No such file or directory
Picked up JAVA_TOOL_OPTIONS: -Duser.home=/home/jboss -Duser.name=jboss
OpenJDK 64-Bit Server VM warning: Cannot open file /opt/apache-cassandra/logs/gc.log due to No such file or directory
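The first four log lines describe how the startup script derives the JVM heap settings from the pod limits. A minimal sketch of that arithmetic follows, assuming the "2GB" threshold means 2 GiB and that the "M" suffix means MiB (as it does for -Xmx). The function names are hypothetical; this is not the actual cassandra-docker.sh code.

```python
# Hypothetical reconstruction of the heap-sizing rules printed in the log above.
# Assumptions: "2GB" means 2 GiB and the "M" suffix means MiB (as for -Xmx).
MIB = 1024 * 1024
GIB = 1024 * MIB


def max_heap_mib(memory_limit_bytes: int) -> int:
    """MAX_HEAP_SIZE in MiB when the envar is unset."""
    if memory_limit_bytes < 2 * GIB:
        # "The memory limit is less than 2GB. Using 1/2 of available memory"
        return memory_limit_bytes // 2 // MIB
    # The rule for larger limits is not shown in this log, so it is not modelled.
    raise NotImplementedError("only the < 2GB branch is covered")


def heap_newsize_mib(cpu_limit_millicores: int) -> int:
    """HEAP_NEWSIZE in MiB when the envar is unset: 100M per CPU core."""
    return cpu_limit_millicores // 1000 * 100


print(f"MAX_HEAP_SIZE={max_heap_mib(2000000000)}M")  # MAX_HEAP_SIZE=953M
print(f"HEAP_NEWSIZE={heap_newsize_mib(2000)}M")     # HEAP_NEWSIZE=200M
```

Halving 2000000000 bytes gives 1000000000 bytes, about 953.7 MiB, which matches the reported 953M once truncated.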


Version-Release number of selected component (if applicable):
registry.access.redhat.com/openshift3/metrics-cassandra:3.5.0

How reproducible:
Always

Steps to Reproduce:
1. Set openshift_hosted_metrics_deploy=true in the Ansible inventory (hosts file)
2. Run the advanced installation

Actual results:
hawkular-cassandra is in CrashLoopBackOff, with the same log as shown in the description above.


Expected results:
hawkular-cassandra is deployed successfully.

Additional info:

Comment 1 Javier Ramirez 2017-07-20 18:07:58 UTC
Inventory:
[OSEv3:children]
masters
nodes

[OSEv3:vars]
ansible_ssh_user=root
deployment_type=openshift-enterprise
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/master/htpasswd'}]
openshift_hosted_metrics_deploy=true
openshift_master_default_subdomain=apps.test.example.com

[masters]
master1.example.com

[nodes]
master1.example.com openshift_node_labels="{'region': 'infra'}" openshift_schedulable=true
node1.example.com openshift_node_labels="{'region': 'primary', 'zone': 'east'}"
node2.example.com openshift_node_labels="{'region': 'primary', 'zone': 'west'}"

hawkular-cassandra rc:
apiVersion: v1
kind: ReplicationController
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: '{"apiVersion":"v1","kind":"ReplicationController","metadata":{"annotations":{},"creationTimestamp":"2017-07-20T17:12:22Z","generation":1,"labels":{"metrics-infra":"hawkular-cassandra","name":"hawkular-cassandra","type":"hawkular-cassandra"},"name":"hawkular-cassandra-1","namespace":"openshift-infra","selfLink":"/api/v1/namespaces/openshift-infra/replicationcontrollers/hawkular-cassandra-1","uid":"98418217-6d6e-11e7-a22e-525400f84619"},"spec":{"replicas":1,"selector":{"name":"hawkular-cassandra-1"},"template":{"metadata":{"creationTimestamp":null,"labels":{"metrics-infra":"hawkular-cassandra","name":"hawkular-cassandra-1","type":"hawkular-cassandra"}},"spec":{"containers":[{"command":["/opt/apache-cassandra/bin/cassandra-docker.sh","--cluster_name=hawkular-metrics","--data_volume=/cassandra_data","--internode_encryption=all","--require_node_auth=true","--enable_client_encryption=true","--require_client_auth=true","--keystore_file=/secret/cassandra.keystore","--keystore_password_file=/secret/cassandra.keystore.password","--truststore_file=/secret/cassandra.truststore","--truststore_password_file=/secret/cassandra.truststore.password","--cassandra_pem_file=/secret/cassandra.pem"],"env":[{"name":"CASSANDRA_MASTER","value":"true"},{"name":"CASSANDRA_DATA_VOLUME","value":"/cassandra_data"},{"name":"JVM_OPTS","value":"-Dcassandra.commitlog.ignorereplayerrors=true"},{"name":"POD_NAMESPACE","valueFrom":{"fieldRef":{"apiVersion":"v1","fieldPath":"metadata.namespace"}}},{"name":"MEMORY_LIMIT","valueFrom":{"resourceFieldRef":{"divisor":"0","resource":"limits.memory"}}},{"name":"CPU_LIMIT","valueFrom":{"resourceFieldRef":{"divisor":"1m","resource":"limits.cpu"}}}],"image":"registry.access.redhat.com/openshift3/metrics-cassandra:3.5.0","imagePullPolicy":"IfNotPresent","lifecycle":{"postStart":{"exec":{"command":["/opt/apache-cassandra/bin/cassandra-poststart.sh"]}},"preStop":{"exec":{"command":["/opt/apache-cassandra/bin/cassandra-prestop.sh"]}}},"name":"hawkular-cassandra-1","ports":[{"containerPort":9042,"name":"cql-port","protocol":"TCP"},{"containerPort":9160,"name":"thift-port","protocol":"TCP"},{"containerPort":7000,"name":"tcp-port","protocol":"TCP"},{"containerPort":7001,"name":"ssl-port","protocol":"TCP"}],"readinessProbe":{"exec":{"command":["/opt/apache-cassandra/bin/cassandra-docker-ready.sh"]},"failureThreshold":3,"periodSeconds":10,"successThreshold":1,"timeoutSeconds":1},"resources":{"limits":{"memory":"2G"},"requests":{"memory":"1G"}},"terminationMessagePath":"/dev/termination-log","volumeMounts":[{"mountPath":"/cassandra_data","name":"cassandra-data"},{"mountPath":"/secret","name":"hawkular-cassandra-secrets"}]}],"dnsPolicy":"ClusterFirst","restartPolicy":"Always","securityContext":{"supplementalGroups":[65534]},"serviceAccount":"cassandra","serviceAccountName":"cassandra","terminationGracePeriodSeconds":30,"volumes":[{"emptyDir":{},"name":"cassandra-data"},{"name":"hawkular-cassandra-secrets","secret":{"defaultMode":420,"secretName":"hawkular-cassandra-secrets"}}]}}},"status":{"observedGeneration":1,"replicas":0}}'
  creationTimestamp: null
  generation: 1
  labels:
    metrics-infra: hawkular-cassandra
    name: hawkular-cassandra
    type: hawkular-cassandra
  name: hawkular-cassandra-1
spec:
  replicas: 1
  selector:
    name: hawkular-cassandra-1
  template:
    metadata:
      creationTimestamp: null
      labels:
        metrics-infra: hawkular-cassandra
        name: hawkular-cassandra-1
        type: hawkular-cassandra
    spec:
      containers:
      - command:
        - /opt/apache-cassandra/bin/cassandra-docker.sh
        - --cluster_name=hawkular-metrics
        - --data_volume=/cassandra_data
        - --internode_encryption=all
        - --require_node_auth=true
        - --enable_client_encryption=true
        - --require_client_auth=true
        - --keystore_file=/secret/cassandra.keystore
        - --keystore_password_file=/secret/cassandra.keystore.password
        - --truststore_file=/secret/cassandra.truststore
        - --truststore_password_file=/secret/cassandra.truststore.password
        - --cassandra_pem_file=/secret/cassandra.pem
        env:
        - name: CASSANDRA_MASTER
          value: "true"
        - name: CASSANDRA_DATA_VOLUME
          value: /cassandra_data
        - name: JVM_OPTS
          value: -Dcassandra.commitlog.ignorereplayerrors=true
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: MEMORY_LIMIT
          valueFrom:
            resourceFieldRef:
              divisor: "0"
              resource: limits.memory
        - name: CPU_LIMIT
          valueFrom:
            resourceFieldRef:
              divisor: 1m
              resource: limits.cpu
        image: registry.access.redhat.com/openshift3/metrics-cassandra:3.5.0
        imagePullPolicy: IfNotPresent
        lifecycle:
          postStart:
            exec:
              command:
              - /opt/apache-cassandra/bin/cassandra-poststart.sh
          preStop:
            exec:
              command:
              - /opt/apache-cassandra/bin/cassandra-prestop.sh
        name: hawkular-cassandra-1
        ports:
        - containerPort: 9042
          name: cql-port
          protocol: TCP
        - containerPort: 9160
          name: thift-port
          protocol: TCP
        - containerPort: 7000
          name: tcp-port
          protocol: TCP
        - containerPort: 7001
          name: ssl-port
          protocol: TCP
        readinessProbe:
          exec:
            command:
            - /opt/apache-cassandra/bin/cassandra-docker-ready.sh
          failureThreshold: 3
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            memory: 2G
          requests:
            memory: 1G
        terminationMessagePath: /dev/termination-log
        volumeMounts:
        - mountPath: /cassandra_data
          name: cassandra-data
        - mountPath: /secret
          name: hawkular-cassandra-secrets
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      securityContext:
        supplementalGroups:
        - 65534
      serviceAccount: cassandra
      serviceAccountName: cassandra
      terminationGracePeriodSeconds: 30
      volumes:
      - emptyDir: {}
        name: cassandra-data
      - name: hawkular-cassandra-secrets
        secret:
          defaultMode: 420
          secretName: hawkular-cassandra-secrets
status:
  replicas: 0
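The MEMORY_LIMIT of 2000000000 in the log follows from the `memory: 2G` limit above: Kubernetes quantity suffixes such as `G` are decimal (10^9), whereas `Gi` is binary (2^30). The RC sets no CPU limit; when a container has none, the downward API reports the node's allocatable CPU instead, so CPU_LIMIT=2000 (divisor `1m`) would correspond to a node with 2 allocatable cores (an assumption, the node size is not stated here). The sketch below handles only a few suffixes and assumes the value is rounded up after dividing by the divisor; `parse_quantity` and `downward_value` are hypothetical helpers, not Kubernetes APIs.

```python
# Simplified sketch: turn the RC's resource quantities into the integers that
# resourceFieldRef exposes as MEMORY_LIMIT / CPU_LIMIT. Only a few Kubernetes
# quantity suffixes are handled; parse_quantity/downward_value are hypothetical.
from decimal import ROUND_CEILING, Decimal

SUFFIXES = {
    "Ki": Decimal(2) ** 10,
    "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30,  # binary: 2Gi = 2147483648 bytes
    "m": Decimal("0.001"),   # milli, used for CPU (e.g. 2000m = 2 cores)
    "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9,   # decimal SI: 2G = 2000000000 bytes
}


def parse_quantity(q: str) -> Decimal:
    """Parse a quantity such as '2G', '1Gi' or '2000m' (subset of suffixes only)."""
    for suffix, factor in SUFFIXES.items():
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * factor
    return Decimal(q)  # plain number, no suffix


def downward_value(quantity: str, divisor: str) -> int:
    """Value exposed via resourceFieldRef: quantity / divisor, rounded up."""
    d = parse_quantity(divisor)
    if d == 0:
        d = Decimal(1)  # a divisor of "0" is treated as the default divisor of 1
    return int((parse_quantity(quantity) / d).to_integral_value(rounding=ROUND_CEILING))


# MEMORY_LIMIT from `limits: memory: 2G` with divisor "0"
print(downward_value("2G", "0"))   # 2000000000
# CPU_LIMIT with divisor "1m", assuming the node reports 2 allocatable cores
print(downward_value("2", "1m"))   # 2000
```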

Comment 2 Javier Ramirez 2017-07-21 12:11:25 UTC
I can't reproduce this bug with the latest versions:

atomic-openshift-node-3.5.5.31-1.git.0.b6f55a2.el7.x86_64
atomic-openshift-clients-3.5.5.31-1.git.0.b6f55a2.el7.x86_64
atomic-openshift-master-3.5.5.31-1.git.0.b6f55a2.el7.x86_64
tuned-profiles-atomic-openshift-node-3.5.5.31-1.git.0.b6f55a2.el7.x86_64
atomic-openshift-utils-3.5.91-1.git.0.28b3ddb.el7.noarch
atomic-openshift-docker-excluder-3.5.5.31-1.git.0.b6f55a2.el7.noarch
atomic-openshift-excluder-3.5.5.31-1.git.0.b6f55a2.el7.noarch
atomic-openshift-3.5.5.31-1.git.0.b6f55a2.el7.x86_64
atomic-openshift-sdn-ovs-3.5.5.31-1.git.0.b6f55a2.el7.x86_64


image: registry.access.redhat.com/openshift3/metrics-cassandra:3.5.0
image: registry.access.redhat.com/openshift3/metrics-hawkular-metrics:3.5.0
image: registry.access.redhat.com/openshift3/metrics-heapster:3.5.0

# cat /etc/ansible/hosts 
[OSEv3:children]
masters
nodes

[OSEv3:vars]
ansible_ssh_user=root
deployment_type=openshift-enterprise
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/master/htpasswd'}]
openshift_hosted_metrics_deploy=true
openshift_master_default_subdomain=apps.test.example.com

[masters]
master35.example.com

[nodes]
master35.example.com 
nodeinfra.example.com openshift_node_labels="{'region': 'infra'}"
nodeprimary.example.com openshift_node_labels="{'region': 'primary'}"
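One visible difference between the failing inventory (Comment 1) and the working one above is where the `region: infra` label lands: on the master, which is also marked schedulable, in the first case, and on a dedicated node in the second. Whether this matters for the Cassandra crash is not established in this report. The sketch below is a hand-rolled parser written for this comparison (not Ansible's own inventory loader) that lists the hosts carrying `region: infra` in each `[nodes]` section.

```python
# Sketch: extract node labels from the [nodes] section of an INI-style Ansible
# inventory so the failing and working layouts can be compared side by side.
# Hand-rolled for this example; it ignores every section except [nodes].
import ast
import shlex


def node_labels(inventory: str) -> dict[str, dict]:
    """Map each host in [nodes] to its parsed openshift_node_labels (or {})."""
    hosts: dict[str, dict] = {}
    section = None
    for line in inventory.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section != "nodes":
            continue
        # shlex keeps the double-quoted label dict together as one token
        host, *host_vars = shlex.split(line)
        labels: dict = {}
        for var in host_vars:
            key, _, value = var.partition("=")
            if key == "openshift_node_labels":
                labels = ast.literal_eval(value)  # e.g. {'region': 'infra'}
        hosts[host] = labels
    return hosts


FAILING = """
[nodes]
master1.example.com openshift_node_labels="{'region': 'infra'}" openshift_schedulable=true
node1.example.com openshift_node_labels="{'region': 'primary', 'zone': 'east'}"
node2.example.com openshift_node_labels="{'region': 'primary', 'zone': 'west'}"
"""

WORKING = """
[nodes]
master35.example.com
nodeinfra.example.com openshift_node_labels="{'region': 'infra'}"
nodeprimary.example.com openshift_node_labels="{'region': 'primary'}"
"""

for name, inv in (("failing", FAILING), ("working", WORKING)):
    infra = [h for h, lbl in node_labels(inv).items() if lbl.get("region") == "infra"]
    print(f"{name}: infra-region hosts = {infra}")
```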

So closing it now.