Bug 1419962 - [IntService_public_295] After clean then install, Cassandra show keystore/password error
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.5.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Jeff Cantrill
QA Contact: Peng Li
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-02-07 14:09 UTC by Peng Li
Modified: 2017-07-24 14:11 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-04-12 18:49:53 UTC
Target Upstream Version:
Embargoed:


Attachments
pods.yaml of successful install (16.07 KB, text/plain)
2017-02-07 19:16 UTC, Jeff Cantrill


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:0903 0 normal SHIPPED_LIVE OpenShift Container Platform atomic-openshift-utils bug fix and enhancement 2017-04-12 22:45:42 UTC

Description Peng Li 2017-02-07 14:09:59 UTC
Description of problem:
After uninstalling Metrics with Ansible by setting openshift_metrics_install_metrics=false and then installing it again, the Cassandra pod stays in CrashLoopBackOff.
Its log shows a Java stack trace such as:
Caused by: java.io.IOException: Keystore was tampered with, or password was incorrect
	at sun.security.provider.JavaKeyStore.engineLoad(JavaKeyStore.java:780) ~[na:1.8.0_111]
	at sun.security.provider.JavaKeyStore$JKS.engineLoad(JavaKeyStore.java:56) ~[na:1.8.0_111]
	at sun.security.provider.KeyStoreDelegator.engineLoad(KeyStoreDelegator.java:224) ~[na:1.8.0_111]
	at sun.security.provider.JavaKeyStore$DualFormatJKS.engineLoad(JavaKeyStore.java:70) ~[na:1.8.0_111]

Version-Release number of selected component (if applicable):
OCP 3.5
openshift-ansible master branch

How reproducible:
always

Steps to Reproduce:
1. Install Metrics with Ansible: set openshift_metrics_install_metrics=true and run the playbook.
2. Uninstall Metrics with Ansible: set the same variable to false and run the playbook.
3. Install Metrics again.

inventory file

[oo_first_master]
$MASTER ansible_user=root ansible_ssh_user=root ansible_ssh_private_key_file="/root/.ssh/libra.pem" openshift_public_hostname=$MASTER

[oo_first_master:vars]
deployment_type=openshift-enterprise
openshift_release=v3.5.0

openshift_metrics_install_metrics=true

openshift_metrics_hawkular_hostname=hawkular-metrics.$SUBDOMAIN
openshift_metrics_project=openshift-infra


openshift_metrics_image_prefix=registry.ops.openshift.com/openshift3/
openshift_metrics_image_version=3.5.0

Actual results:
# oc get pod
NAME                         READY     STATUS             RESTARTS   AGE
hawkular-cassandra-1-v3sbs   0/1       CrashLoopBackOff   6          11m
hawkular-metrics-v907x       0/1       Running            1          12m
heapster-sm0zq               0/1       Running            1          12m
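
The failing pod can be picked out of a listing like the one above programmatically. The following is a minimal Python sketch that is not part of the original report: the function name crashlooping_pods is hypothetical, and it assumes the whitespace-separated column layout that `oc get pod` printed here.

```python
def crashlooping_pods(table: str) -> list[str]:
    """Return the names of pods whose STATUS column is CrashLoopBackOff.

    Assumes `oc get pod` style output: a header row followed by
    whitespace-separated columns. Blank lines and lines starting
    with '#' (a pasted shell prompt) are ignored.
    """
    lines = [
        line for line in table.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    if not lines:
        return []
    header = lines[0].split()
    name_col = header.index("NAME")
    status_col = header.index("STATUS")
    pods = []
    for row in lines[1:]:
        cells = row.split()
        if cells[status_col] == "CrashLoopBackOff":
            pods.append(cells[name_col])
    return pods


# The listing from this report, pasted verbatim.
SAMPLE = """\
# oc get pod
NAME                         READY     STATUS             RESTARTS   AGE
hawkular-cassandra-1-v3sbs   0/1       CrashLoopBackOff   6          11m
hawkular-metrics-v907x       0/1       Running            1          12m
heapster-sm0zq               0/1       Running            1          12m
"""

print(crashlooping_pods(SAMPLE))
```

For the listing in this report, only hawkular-cassandra-1-v3sbs is returned; the other two pods are Running but not yet ready.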


Expected results:
Users should be able to install and uninstall Metrics repeatedly.

Additional info:
ansible log is attached.

Comment 3 Jeff Cantrill 2017-02-07 19:15:20 UTC
Working with your machine I could see the issue but I am unable to reproduce on a fresh AMI using:

[oo_first_master]
127.0.0.1 ansible_connection=local

[oo_first_master:vars]
ansible_user=root 
ansible_ssh_user=vagrant
ansible_ssh_private_key_file=/home/jeff.cantrill/.ssh/id_rsa 
ansible_become=true
containerized=true
docker_protect_installed_version=true
deployment_type=openshift-enterprise
openshift_release=v3.5
openshift_logging_install_logging=true
required_packages=[]


openshift_metrics_image_prefix=registry.ops.openshift.com/openshift3/
openshift_metrics_image_version=3.5.0
openshift_metrics_hawkular_hostname=hawkular-metrics.54.210.49.58.xip.io

Running:
Server https://172.18.3.23:8443
openshift v3.5.0.17+c55cf2b
kubernetes v1.5.2+43a9be4

I'm attaching the pod specs so you can see the image sha's.

An additional question: how are you getting 'openshift-ansible'? I cloned the repo and ran from HEAD.

Comment 4 Jeff Cantrill 2017-02-07 19:16:09 UTC
Created attachment 1248492 [details]
pods.yaml of successful install

Comment 5 Jeff Cantrill 2017-02-07 19:16:53 UTC
Additionally, I am running the playbook like this:

ansible-playbook -i ../inventory.enterprise playbooks/common/openshift-cluster/openshift_metrics.yml -e openshift_metrics_install_metrics=true
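
The install, uninstall, install cycle from the reproduction steps can be scripted around this command line. The sketch below is an illustration only and just builds and prints the three commands. The helper names metrics_playbook_cmd and reinstall_cycle are hypothetical, and it assumes the variable is toggled with `-e` as in the command above, whereas the reporter edited the inventory file instead. Whether the role treats the string `false` passed via `-e` the same as the inventory setting is not confirmed here.

```python
import shlex

# Playbook path as used in this comment, relative to an openshift-ansible checkout.
PLAYBOOK = "playbooks/common/openshift-cluster/openshift_metrics.yml"


def metrics_playbook_cmd(inventory: str, install: bool) -> list[str]:
    """Build the ansible-playbook argument list for one install or uninstall run."""
    flag = "true" if install else "false"
    return [
        "ansible-playbook",
        "-i", inventory,
        PLAYBOOK,
        "-e", f"openshift_metrics_install_metrics={flag}",
    ]


def reinstall_cycle(inventory: str) -> list[list[str]]:
    """Commands for the cycle from the report: install, uninstall, install."""
    return [metrics_playbook_cmd(inventory, install) for install in (True, False, True)]


# Print the commands rather than running them; each list can be handed to
# subprocess.run(cmd, check=True) from the repository root to execute it.
for cmd in reinstall_cycle("../inventory.enterprise"):
    print(shlex.join(cmd))
```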

Comment 6 Matt Wringe 2017-02-07 19:57:42 UTC
The last "Caused by" line makes it clear that the keystore password is wrong:

"Caused by: java.security.UnrecoverableKeyException: Password verification failed"

Grabbing the keystore and its password from the secret confirms that the password is incorrect.

I think this is a bug in how Ansible creates the secrets.
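
A check of this kind can be sketched in Python. This is not the procedure used in this comment, only an illustration under stated assumptions. It expects the secret exported as JSON (for example with `oc get secret <name> -o json`), where the values under `data` are base64-encoded. The secret name and key names are not given in this report, so they stay as parameters. The password test relies on the JKS layout as implemented in OpenJDK's sun.security.provider.JavaKeyStore, as far as I understand it: the file ends with a SHA-1 digest over the password (two bytes per character, big-endian), the fixed string "Mighty Aphrodite", and the keystore body. `keytool -list` remains the authoritative check.

```python
import base64
import hashlib
import hmac
import json

JKS_MAGIC = bytes.fromhex("feedfeed")  # first four bytes of a JKS file
JKS_SALT = b"Mighty Aphrodite"         # fixed string mixed into the integrity digest
SHA1_LEN = 20                          # length of the trailing digest in bytes


def decode_secret_data(secret_json: str) -> dict[str, bytes]:
    """Base64-decode every value in the `data` map of a Secret in JSON form."""
    data = json.loads(secret_json).get("data") or {}
    return {key: base64.b64decode(value) for key, value in data.items()}


def jks_password_matches(keystore: bytes, password: str) -> bool:
    """Recompute the JKS integrity digest and compare it with the stored one.

    A mismatch is the condition that Java reports as
    "Keystore was tampered with, or password was incorrect".
    """
    # Smallest valid file: magic + version + entry count (12 bytes) + digest.
    if len(keystore) < 12 + SHA1_LEN or keystore[:4] != JKS_MAGIC:
        raise ValueError("not a JKS keystore")
    body, stored = keystore[:-SHA1_LEN], keystore[-SHA1_LEN:]
    computed = hashlib.sha1(password.encode("utf-16-be") + JKS_SALT + body).digest()
    return hmac.compare_digest(computed, stored)


def check_secret(secret_json: str, keystore_key: str, password_key: str) -> bool:
    """True if the password stored in the secret opens the keystore stored next to it.

    keystore_key and password_key are placeholders for the real key names,
    which this report does not state. The password is used exactly as stored.
    """
    data = decode_secret_data(secret_json)
    password = data[password_key].decode("utf-8")
    return jks_password_matches(data[keystore_key], password)
```

A False result from check_secret would correspond to the failure described here: the secret carries a password that does not belong to the keystore it was shipped with.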

Comment 7 Jeff Cantrill 2017-02-08 00:57:41 UTC
This may be related to the certs directory that remains on the master node.  I'll investigate first thing tomorrow.

Comment 8 Peng Li 2017-02-08 12:50:08 UTC
(In reply to Jeff Cantrill from comment #3)
> Working with your machine I could see the issue but I am unable to reproduce
> on a fresh AMI using:
> 
> [oo_first_master]
> 127.0.0.1 ansible_connection=local
> 
> [oo_first_master:vars]
> ansible_user=root 
> ansible_ssh_user=vagrant
> ansible_ssh_private_key_file=/home/jeff.cantrill/.ssh/id_rsa 
> ansible_become=true
> containerized=true
> docker_protect_installed_version=true
> deployment_type=openshift-enterprise
> openshift_release=v3.5
> openshift_logging_install_logging=true
> required_packages=[]
> 
> 
> openshift_metrics_image_prefix=registry.ops.openshift.com/openshift3/
> openshift_metrics_image_version=3.5.0
> openshift_metrics_hawkular_hostname=hawkular-metrics.54.210.49.58.xip.io
> 
> Running:
> Server https://172.18.3.23:8443
> openshift v3.5.0.17+c55cf2b
> kubernetes v1.5.2+43a9be4
> 
> I'm attaching the pod specs so you can see the image sha's.
> 
> An additional question: how are you getting 'openshift-ansible'? I cloned the
> repo and ran from HEAD.

Yes, I also cloned the repo and ran from the master branch.

Comment 9 Jeff Cantrill 2017-02-08 17:04:13 UTC
Confirmed on the referenced server, though I am not sure why I am unable to reproduce it locally. The issue is that certs and passwords are stored under the master config directory, and we are probably regenerating the keystores using new passwords but not writing them back out to the secrets.

Comment 10 Jeff Cantrill 2017-02-08 20:49:45 UTC
Fixed in https://github.com/openshift/openshift-ansible/pull/3297

Comment 11 openshift-github-bot 2017-02-10 14:13:20 UTC
Commits pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/7d081c4b321971cc499a4fc499ad1bbaceea823f
bug 1419962. fix openshift_metrics pwd issue after reinstall where cassandra has incorrect pwd exception

https://github.com/openshift/openshift-ansible/commit/f4d7caa7f0a24037adc2f56b2020e3aaec79d938
Merge pull request #3297 from jcantrill/bz_1419962_cassandra_pwd_failure

bug 1419962. fix openshift_metrics pwd issue after reinstall where ca…

Comment 13 Peng Li 2017-02-13 03:34:41 UTC
Verified with the master branch: I ran the install/uninstall cycle several times, and Metrics installed and uninstalled correctly each time.

Comment 15 errata-xmlrpc 2017-04-12 18:49:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0903

