Bug 1471251

Summary: 3.4.1 White spaces in the cert prevent Origin Metrics from starting
Product: OpenShift Container Platform
Reporter: Eric Jones <erjones>
Component: Hawkular
Assignee: Juraci Paixão Kröhling <jcosta>
Status: CLOSED ERRATA
QA Contact: Junqi Zhao <juzhao>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 3.4.1
CC: aos-bugs, cbucur, erich, erjones, hgomes, jcantril, jcosta, juzhao, mwringe, pweil, snegrea, stwalter
Target Milestone: ---
Target Release: 3.4.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
When either a certificate within the chain at `serviceaccount/ca.crt` or any of the certificates within the provided truststore file contains whitespace after the `-----BEGIN CERTIFICATE-----` line, the Java keytool rejects the certificate with an error, and Origin Metrics fails to start. As a workaround, Origin Metrics now attempts to remove these spaces before feeding the certificate to keytool, but administrators should still make sure their certificates do not contain such spaces.
Story Points: ---
Clone Of:
Clones: 1500464 1500471 1503450
Environment:
Last Closed: 2017-12-07 07:10:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1500464, 1500471, 1503450

Description Eric Jones 2017-07-14 20:08:04 UTC
Description of problem:
A brand-new Cassandra pod fails to stay up and restarts constantly.

The events for the project indicate:
Error syncing pod, skipping: failed to "StartContainer" for "hawkular-cassandra-1" with RunContainerError: "PostStart handler: Error executing in Docker Container: 126"


Version-Release number of selected component (if applicable):
OCP 3.4.1
heapster image 3.4.1-21
hawkular-metrics image 3.4.1-27
cassandra image 3.4.1-25

Additional info:
I will attach the Cassandra pod logs, the openshift-infra events, the RCs, and the pod YAMLs shortly.

Comment 2 Juraci Paixão Kröhling 2017-07-17 11:39:56 UTC
Would it be possible to enter a debug container and check the script `/opt/apache-cassandra/bin/cassandra-poststart.sh`? Please check the following:

1) The permissions of the script `/opt/apache-cassandra/bin/cassandra-poststart.sh`. Please provide the output of `ls -l /opt/apache-cassandra/bin/`.

2) In one terminal session connected to the debug container, start Cassandra as OpenShift would start it. In a second session, run the post-start script. Alternatively, run the post-start script in an existing container: edit the RC `hawkular-cassandra-1` to remove the `postStart` command from its lifecycle so that the pod starts normally, then enter the container (`oc exec -it POD_NAME -- bash`) and run `/opt/apache-cassandra/bin/cassandra-poststart.sh` manually.

3) Please confirm that the script is identical to the version that ships with the official image: https://github.com/openshift/origin-metrics/blob/v1.4.1/cassandra/cassandra-poststart.sh
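Step 1 can be partly scripted. The sketch below is a hypothetical helper (`check_hook_executable` is not part of origin-metrics) that reports whether a script exists and has an execute bit for the current user. A shell exit status of 126 conventionally means a command was found but could not be executed, which is why permissions are worth ruling out first. It assumes a POSIX `sh` inside the debug container.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with origin-metrics): check whether a
# lifecycle hook script such as cassandra-poststart.sh can be executed.
# Returns 0 if executable, 1 if present but not executable, 2 if missing.
check_hook_executable() {
  script="$1"
  if [ ! -e "$script" ]; then
    echo "missing: $script"
    return 2
  fi
  if [ ! -x "$script" ]; then
    # Show the mode bits so they can be pasted into the bug report.
    ls -l "$script"
    echo "not executable: $script"
    return 1
  fi
  echo "ok: $script"
  return 0
}
```

For example, inside the debug container: `check_hook_executable /opt/apache-cassandra/bin/cassandra-poststart.sh`. For step 3, a `diff -u` between a downloaded copy of the upstream script and the local file shows any differences.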

Comment 3 Juraci Paixão Kröhling 2017-07-17 11:48:21 UTC
This looks like a duplicate of bug 1447066. Should this be closed in favor of the other one?

Comment 6 Matt Wringe 2017-07-18 13:30:20 UTC
*** Bug 1447066 has been marked as a duplicate of this bug. ***

Comment 8 Matt Wringe 2017-07-18 20:52:04 UTC
We also don't have an image pull policy of `Always` on those RCs. They may want to update the RC to set `imagePullPolicy: Always`, then scale their Cassandra pod down and back up.

Comment 48 Codrin Bucur 2017-10-09 11:54:30 UTC
What is the status of this? It seems to be occurring on OCP 3.5 as well.

Comment 49 Juraci Paixão Kröhling 2017-10-09 12:02:22 UTC
Codrin, this specific report is about certificates that contain whitespace on the `-----BEGIN CERTIFICATE-----` line: such certificates, while accepted by OpenSSL and other tools, break the Java keytool.

The real solution is to *not* have whitespace within the certificate data (i.e., after the `-----BEGIN CERTIFICATE-----` marker). The workaround provided for this issue is a regex that removes such spaces before the certificate data is fed to the Java keytool. It is implemented here:

https://github.com/openshift/origin-metrics/commit/d61da9fe461d4c1a13ff375e6c6af885fed0d2b6
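The commit linked above is the authoritative fix. As a hedged illustration only, the same idea can be expressed as a single `sed` substitution. The function name `strip_pem_marker_spaces` is hypothetical and the exact regex used in origin-metrics may differ; treating the `-----END CERTIFICATE-----` line the same way as the BEGIN line is an assumption. The base64 body is left untouched.

```shell
#!/bin/sh
# Hypothetical sketch of the workaround: strip trailing whitespace from PEM
# boundary lines before the data is handed to the Java keytool.
# Reads the files given as arguments (or stdin if there are none) and
# writes the cleaned PEM data to stdout; the input files are not modified.
strip_pem_marker_spaces() {
  # [[:space:]]+$ matches the trailing spaces/tabs after the marker;
  # \1 keeps the marker itself. Other lines pass through unchanged.
  sed -E 's/^(-----(BEGIN|END) CERTIFICATE-----)[[:space:]]+$/\1/' "$@"
}
```

For example, `strip_pem_marker_spaces /var/run/secrets/kubernetes.io/serviceaccount/ca.crt > /tmp/ca-clean.crt` writes a cleaned copy that can then be given to keytool, leaving the mounted secret as it is.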

Comment 58 Junqi Zhao 2017-10-16 13:04:15 UTC
Tested with metrics-hawkular-metrics:3.4.1-39; for the verification steps, see Comment 54.

Comment 60 Junqi Zhao 2017-10-19 02:31:44 UTC
Tested with metrics-hawkular-metrics:3.4.1-40.
Steps:
1. Add trailing spaces after "-----BEGIN CERTIFICATE-----" in /etc/origin/master/ca-bundle.crt.
2. Restart the server and deploy metrics 3.4 using the image metrics-hawkular-metrics:3.4.1-40.
3. Check the service account CA certificate in each metrics pod:
   # oc rsh ${HAWKULAR_METRICS_PODS}
   sh-4.2$ cat /var/run/secrets/kubernetes.io/serviceaccount/ca.crt

   # oc rsh ${HAWKULAR_CASSANDRA_PODS}
   sh-4.2$ cat /var/run/secrets/kubernetes.io/serviceaccount/ca.crt

   # oc rsh ${HEAPSTER_PODS}
   sh-4.2$ cat /var/run/secrets/kubernetes.io/serviceaccount/ca.crt

   In every pod, /var/run/secrets/kubernetes.io/serviceaccount/ca.crt is identical to /etc/origin/master/ca-bundle.crt, and both have one trailing space: "-----BEGIN CERTIFICATE----- "

4. Run a sanity test of Metrics: it works well.

env:
# openshift version
openshift v3.4.1.44.29
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0
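To check whether a bundle such as /etc/origin/master/ca-bundle.crt carries the trailing whitespace introduced in step 1 (or whether a production bundle is affected at all), a `grep` is enough. The helper name `find_pem_marker_spaces` is hypothetical; it is a sketch, not part of the verification performed above. It prints the line number and content of each offending marker line and follows grep's exit-status convention.

```shell
#!/bin/sh
# Hypothetical helper: report PEM boundary lines that end in whitespace,
# i.e. certificates affected by this bug. Prints "LINE:CONTENT" for each
# match (grep prefixes the file name when several files are given).
# Exit status: 0 if at least one such line was found, 1 if none, >1 on error.
find_pem_marker_spaces() {
  grep -nE '^-----(BEGIN|END) CERTIFICATE-----[[:space:]]+$' "$@"
}
```

For example, `find_pem_marker_spaces /etc/origin/master/ca-bundle.crt` on the master prints `1:-----BEGIN CERTIFICATE----- ` if the first certificate in the bundle has the trailing space, and prints nothing (exit status 1) for a clean bundle.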

Comment 63 errata-xmlrpc 2017-12-07 07:10:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3389