Bug 1439551

Summary: Cassandra pod is CrashLoopBackOff after metrics 3.6.0 was deployed
Product: OpenShift Container Platform
Reporter: Xia Zhao <xiazhao>
Component: Hawkular
Assignee: Ruben Vargas Palma <rvargasp>
Status: CLOSED CURRENTRELEASE
QA Contact: Junqi Zhao <juzhao>
Severity: high
Docs Contact:
Priority: high
Version: 3.6.0
CC: aos-bugs, lxia, mwringe, wsun, xxia
Target Milestone: ---
Keywords: Regression
Target Release: 3.6.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-11-21 18:38:42 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description: inventory file for metrics 3.6.0 stacks' deployment; Flags: none
Description: the events inside openshift-infra project; Flags: none
Description: cassandra_log; Flags: none
Description: hawkular_log; Flags: none
Description: heapster_log; Flags: none

Description Xia Zhao 2017-04-06 08:44:43 UTC
Created attachment 1269225 [details]
inventory file for metrics 3.6.0 stacks' deployment

Description of problem:
The Cassandra pod is in CrashLoopBackOff after metrics 3.6.0 was deployed, even though the ansible script execution finished successfully:
    # oc get po
    NAME                         READY     STATUS             RESTARTS   AGE
    hawkular-cassandra-1-gknch   0/1       CrashLoopBackOff   15         58m
    hawkular-metrics-hrqn6       0/1       Running            6          58m
    heapster-3gn76               0/1       Running            6          58m
     
    # oc logs -f hawkular-cassandra-1-gknch
    ...
    INFO  [main] 2017-04-06 08:06:06,806 StorageService.java:756 - Starting up server gossip
    Unable to create ssl socket
    Fatal configuration error; unable to start server.  See log for stacktrace.
    ERROR [main] 2017-04-06 08:06:06,886 CassandraDaemon.java:709 - Fatal configuration error
    org.apache.cassandra.exceptions.ConfigurationException: Unable to create ssl socket
            at org.apache.cassandra.net.MessagingService.getServerSockets(MessagingService.java:518) ~[apache-cassandra-3.0.9.redhat-2.jar:3.0.9.redhat-2]
            at org.apache.cassandra.net.MessagingService.listen(MessagingService.java:498) ~[apache-cassandra-3.0.9.redhat-2.jar:3.0.9.redhat-2]

Version-Release number of selected component (if applicable):
openshift-ansible-3.6.15-1.git.0.d2b88f8.el7.noarch
openshift-ansible-playbooks-3.6.15-1.git.0.d2b88f8.el7.noarch
openshift-ansible-roles-3.6.15-1.git.0.d2b88f8.el7.noarch

metrics images on ops registry:
openshift3/metrics-cassandra                               3.6.0               8c8f72b8d25d

# openshift version
openshift v3.6.16
kubernetes v1.5.2+43a9be4
etcd 3.1.0

How reproducible:
always

Steps to Reproduce:
1. Deploy the metrics 3.6.0 stacks on OCP 3.6.0 by running the ansible scripts
2. Check the status of the HCH pods
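
Step 2 can be scripted. The following is only a minimal sketch, assuming the default `oc get po` column layout shown in the description (NAME, READY, STATUS, ...); the helper name `not_ready_pods` is hypothetical and not part of any OpenShift tooling.

```shell
#!/bin/sh
# Hypothetical helper: read `oc get po` style output on stdin and print
# the name and status of every pod whose READY column (e.g. "0/1") shows
# fewer ready containers than expected.
not_ready_pods() {
    awk 'NR > 1 && NF >= 3 {
        # Skip the header row (NR == 1); split READY into ready/total.
        split($2, r, "/")
        if (r[1] != r[2]) print $1, $3
    }'
}

# Usage against a live cluster (assumes the metrics stack runs in the
# openshift-infra project, as in this report):
#   oc get po -n openshift-infra | not_ready_pods
```

An empty result would mean every listed pod reports all of its containers as ready.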

Actual results:
The Cassandra pod is in CrashLoopBackOff

Expected results:
HCH pods are in Running status and metrics work correctly

Additional info:
ansible inventory file attached
HCH logs & events attached

Comment 1 Xia Zhao 2017-04-06 08:46:33 UTC
Created attachment 1269227 [details]
the events inside openshift-infra project

Comment 2 Xia Zhao 2017-04-06 08:51:27 UTC
Created attachment 1269231 [details]
cassandra_log

Comment 3 Xia Zhao 2017-04-06 08:51:56 UTC
Created attachment 1269232 [details]
hawkular_log

Comment 4 Xia Zhao 2017-04-06 08:52:21 UTC
Created attachment 1269233 [details]
heapster_log

Comment 6 Xia Zhao 2017-04-06 08:54:51 UTC
Blocks metrics 3.6.0 tests.

Comment 7 Matt Wringe 2017-04-06 12:52:16 UTC
There are no 3.6.0 tagged metrics components, so I don't know exactly what you are running.

If you are using something like the latest tagged images, you will need the latest openshift-ansible to install them (e.g. check it out from master).

Can you please let us know what the tags on the metrics components are and what version of openshift-ansible you are using?
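
The requested details could be gathered roughly as follows. This is a sketch, assuming the metrics pods run in the openshift-infra project and that openshift-ansible was installed from RPMs (as the package list in the description suggests); the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print each pod name followed by the image
# reference(s), including the tag, of its containers.
# Assumes the metrics stack lives in the openshift-infra project.
metrics_pod_images() {
    oc get pods -n openshift-infra \
        -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'
}

# Hypothetical helper: print the installed openshift-ansible package
# versions (assumes an RPM-based install).
ansible_versions() {
    rpm -q openshift-ansible openshift-ansible-playbooks openshift-ansible-roles
}

# Usage:
#   metrics_pod_images
#   ansible_versions
```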