Created attachment 1326542
hawkular logs and configuration

Description of problem:

Hawkular is not becoming ready. After reading the logs (attached) I have spotted two errors:

* The first error seems to come after an ALTER TABLE:

2017-09-13 11:46:29,930 INFO [org.cassalog.core.CassalogImpl] (ServerService Thread Pool -- 71) Applying ChangeSet
-- version: 4.0
ALTER TABLE conditions ADD activeRules set<text>
--
2017-09-13 11:46:29,976 ERROR [org.jboss.as.ejb3.invocation] (ServerService Thread Pool -- 71) WFLYEJB0034: EJB Invocation failed on component CassCluster for method public com.datastax.driver.core.Session org.hawkular.alerts.engine.impl.CassCluster.getSession(): javax.ejb.EJBException: java.lang.IllegalStateException: WFLYEE0042: Failed to construct component instance
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.handleExceptionInOurTx(CMTTxInterceptor.java:187)
        ...
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Invalid column name activerules because it conflicts with an existing column

* The second one is about a missing class that should be included (confirmed myself) in the netty-all-4.0.35.Final-redhat-1.jar dependency:

2017-09-13 11:47:00,496 WARN [io.netty.channel.DefaultChannelPipeline] (cluster3-nio-worker-0) An exception was thrown by a user handler's exceptionCaught() method while handling the following exception:: java.lang.NoClassDefFoundError: Could not initialize class io.netty.handler.timeout.IdleStateEvent

Version-Release number of selected component (if applicable):

How reproducible:

Only on customer environment

Additional info:

- nodetool shows all three nodes up/normal
- service-IP of hawkular-cassandra is resolved and requests seem to be forwarded:

> sh-4.2$ curl http://hawkular-cassandra:9042
> curl: (56) Recv failure: Connection reset by peer
> sh-4.2$ curl http://hawkular-cassandra:7000
> curl: (7) Failed connect to hawkular-cassandra:7000; Connection refused

- cassandra cluster removed and recreated from scratch --> still same error in Hawkular
- Hawkular: scale down/scale up -> still same error
- deleted and new replica started by rc -> still same error
I have requested the output of:

$ cqlsh --ssl -e "describe table hawkular_alerts.conditions"

and confirmation that the netty-all library is included in the hawkular pod.
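For the netty-all check, a minimal sketch of one way to verify that a class file is present inside a jar, assuming Python is available wherever the jar has been copied. The function name jar_contains_class and the command-line arguments are hypothetical, and the jar path inside the pod is not known here, so it must be supplied. Note that "Could not initialize class" in a NoClassDefFoundError usually means the class was located but its static initialization failed earlier, so the class being present in the jar does not by itself rule the jar out.

```python
import sys
import zipfile


def jar_contains_class(jar_path, class_name):
    """Return True if the jar (a zip archive) has an entry for class_name.

    class_name is a fully qualified name such as
    io.netty.handler.timeout.IdleStateEvent.
    """
    # Class files are stored in the jar as package/path/ClassName.class
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


if __name__ == "__main__":
    # Hypothetical usage: python check_jar.py <path-to-jar> <fully.qualified.Class>
    if len(sys.argv) == 3:
        found = jar_contains_class(sys.argv[1], sys.argv[2])
        print("present" if found else "missing")
    else:
        print("usage: check_jar.py <path-to-jar> <fully.qualified.Class>")
```

Run against netty-all-4.0.35.Final-redhat-1.jar with io.netty.handler.timeout.IdleStateEvent as the class name, this only confirms whether the entry exists; it does not show which copy of the class the application server actually loads.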
The versions of the images:

- metrics-hawkular-metrics-3.5.0-37
- metrics-heapster-3.5.0-27
- metrics-cassandra-3.5.0-34 (3 instances)
*** Bug 1482099 has been marked as a duplicate of this bug. ***