Red Hat Bugzilla – Bug 1309192
The latest cassandra image encounters a fatal exception during initialization
Last modified: 2016-09-29 22:17:15 EDT
@Brenton, could this be a problem with the build of the image?
What is the output of 'oc get service hawkular-cassandra-nodes'?
This also looks like the same issue reported here: https://bugzilla.redhat.com/show_bug.cgi?id=1307170
The logs from the first time Cassandra fails would be very useful here. The logs attached are from the failures that occur when the Cassandra instance restarts with leftover files in the pod storage.
The issue usually occurs when Cassandra tries to connect to an invalid endpoint and ends up in an error state.
This can be caused by the 'hawkular-cassandra-nodes' service not being a headless service (it tries to connect to the service endpoint instead of the individual components behind the service).
It can also happen if the 'hawkular-cassandra-nodes' hostname resolves to something other than the Cassandra instances.
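For reference, a headless service is simply one whose clusterIP is set to None. A minimal sketch of what 'hawkular-cassandra-nodes' could look like in that form (the selector label and the CQL port are assumptions for illustration, not taken from this deployment):

apiVersion: v1
kind: Service
metadata:
  name: hawkular-cassandra-nodes
spec:
  # None makes this headless: DNS returns the individual pod IPs
  # instead of a single virtual service IP.
  clusterIP: None
  selector:
    name: hawkular-cassandra      # assumed pod label
  ports:
  - name: cql
    port: 9042                    # example: Cassandra's CQL port
    targetPort: 9042

With clusterIP: None, 'oc get svc' shows None in the CLUSTER_IP column and the service name resolves to the Cassandra pods themselves, which is what the node discovery needs.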
OK, understood. Thanks for the confirmation.
Hit similar errors in the logging-deployer pod, like "Invalid value: 9300: must be equal to targetPort when clusterIP = None".
Tested with the EFK images below:
[chunchen@F17-CCY daily]$ oc get svc
NAME                 CLUSTER_IP      EXTERNAL_IP   PORT(S)    SELECTOR                                  AGE
logging-es           172.31.13.232   <none>        9200/TCP   component=es,provider=openshift           11m
logging-es-ops       172.31.187.78   <none>        9200/TCP   component=es-ops,provider=openshift       11m
logging-kibana       172.31.59.142   <none>        443/TCP    component=kibana,provider=openshift       11m
logging-kibana-ops   172.31.83.99    <none>        443/TCP    component=kibana-ops,provider=openshift   11m
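For context, a sketch of the kind of definition that trips the "must be equal to targetPort when clusterIP = None" check; the service name, selector, and named port are hypothetical, but the pattern is a headless service whose targetPort differs from its port:

apiVersion: v1
kind: Service
metadata:
  name: logging-es-cluster        # hypothetical, not in the listing above
spec:
  clusterIP: None                 # headless
  selector:
    component: es                 # assumed selector
  ports:
  - port: 9300
    targetPort: es-transport      # named port, not equal to 9300, so the 3.2 validation rejects it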
Yes, this issue is also expected when running the 3.1 images on 3.2 and is due to a backwards compatibility issue.
Images meant for 3.1 will not necessarily work with 3.2 since OpenShift is not fully backwards compatible between releases.
We should hopefully have the images meant for 3.2 built soon.
Can you explain the backwards incompatibility you are referring to? It's a pretty serious problem at upgrade time if 3.1 logging images don't work with 3.2. We have to be able to upgrade in a rolling manner. I'm CC'ing Jordan on this bug.
Do you have the service yaml that is considered invalid?
Nevermind, I see it
Validation change in https://github.com/openshift/origin/blob/49f17578b3ab07b9975eed0d8ed0de2122ae1d63/Godeps/_workspace/src/k8s.io/kubernetes/pkg/api/validation/validation.go#L1739-L1743
Change between (newly) invalid service definition and current valid definition:
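Roughly, with illustrative field values (the 9300/es-transport pair is an assumption carried over from the logging case above):

Newly invalid:

spec:
  clusterIP: None
  ports:
  - port: 9300
    targetPort: es-transport    # differs from port

Valid:

spec:
  clusterIP: None
  ports:
  - port: 9300
    targetPort: 9300            # equal to port (omitting targetPort also works; it defaults to port)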
Upstream change that tightened the validation:
Possible fix in https://github.com/openshift/origin/pull/7495
Our options are:
1. Disable the validation, continuing to allow invalid targetPort values for headless services. This will allow bad values in the system, and if anything dealing with headless services starts using the targetPort (not sure why they would, but still...) it would be invalid.
2. Override targetPort values for headless services to match the port (ignoring what is specified). This means users won't be told they're setting an invalid targetPort, and will have no feedback as to why their specified value is getting thrown out (see the sketch after this list).
3. Break compatibility for invalid values that were tolerated prior to 3.2.
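As a sketch of what option 2 would mean in practice (values assumed from the logging case), the user submits:

  ports:
  - port: 9300
    targetPort: es-transport

and the system would silently store:

  ports:
  - port: 9300
    targetPort: 9300

with no error returned and no indication that the submitted targetPort was discarded.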
Option 4: push for either 1 or 2 upstream. It's a compatibility issue for them too.
#1 upstream sounds correct to me, because this breaks backwards compatibility for any Kube deployment.
The breaking validation change for headless service targetPort fields is being reverted in https://github.com/openshift/origin/pull/7495
The case-sensitivity change will remain, since that is part of the parser now in use for performance reasons, and because documented API fields still work correctly.
Tested on OSE 3.2 with the latest metrics images; the metrics pods run well, and this bug is fixed:
openshift3/metrics-hawkular-metrics   latest   0939fae5e762
openshift3/metrics-deployer           latest   5b12fd896d9d
openshift3/metrics-heapster           latest   91e9f7156877
openshift3/metrics-cassandra          latest   6798b0f4381a
Please change the bug status to ON_QA; I will then close it. Thanks!
Set to VERIFIED based on my comment #22.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.