Description of problem:
After uninstalling Metrics with ansible by setting openshift_metrics_install_metrics=false and then installing it again, Cassandra stays in CrashLoopBackOff.
Its log shows a Java stack trace like:
Caused by: java.io.IOException: Keystore was tampered with, or password was incorrect
at sun.security.provider.JavaKeyStore.engineLoad(JavaKeyStore.java:780) ~[na:1.8.0_111]
at sun.security.provider.JavaKeyStore$JKS.engineLoad(JavaKeyStore.java:56) ~[na:1.8.0_111]
at sun.security.provider.KeyStoreDelegator.engineLoad(KeyStoreDelegator.java:224) ~[na:1.8.0_111]
at sun.security.provider.JavaKeyStore$DualFormatJKS.engineLoad(JavaKeyStore.java:70) ~[na:1.8.0_111]
Version-Release number of selected component (if applicable):
openshift-ansible master branch
Steps to Reproduce:
1. Install Metrics using ansible by setting openshift_metrics_install_metrics=true and running the playbook.
2. Uninstall Metrics by setting the same variable to false and running the playbook again.
3. Install it again.
$MASTER ansible_user=root ansible_ssh_user=root ansible_ssh_private_key_file="/root/.ssh/libra.pem" openshift_public_hostname=$MASTER
# oc get pod
NAME                         READY     STATUS             RESTARTS   AGE
hawkular-cassandra-1-v3sbs   0/1       CrashLoopBackOff   6          11m
hawkular-metrics-v907x       0/1       Running            1          12m
heapster-sm0zq               0/1       Running            1          12m
Expected: the user should be able to install and uninstall Metrics repeatedly.
The ansible log is attached.
Working with your machine I could see the issue, but I am unable to reproduce it on a fresh AMI using:
127.0.0.1 ansible_connection=local
Server https://172.18.3.23:8443
openshift v188.8.131.52+c55cf2b
kubernetes v1.5.2+43a9be4
I'm attaching the pod specs so you can see the image SHAs.
An additional question: how are you getting 'openshift-ansible'? I cloned the repo and ran from HEAD.
Created attachment 1248492
pods.yaml of successful install
Additionally, I am running the playbook like:
ansible-playbook -i ../inventory.enterprise playbooks/common/openshift-cluster/openshift_metrics.yml -e openshift_metrics_install_metrics=true
The last "Caused by" entry in the trace makes it clear that the issue is a wrong keystore password:
"Caused by: java.security.UnrecoverableKeyException: Password verification failed"
Grabbing the keystore and the keystore password from the secret confirms that the password is incorrect.
I think this is a bug with ansible creating the secrets.
This may be related to the certs directory that remains on the master node. I'll investigate first thing tomorrow.
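For anyone wanting to repeat the password check above without keytool, here is a minimal sketch. It assumes the keystore is in JKS format (the JavaKeyStore frames in the trace suggest so) and that the keystore bytes and the password have already been pulled out of the secret and base64-decoded. The function name jks_password_matches is my own and not part of openshift-ansible. It recomputes the SHA-1 integrity digest stored at the end of a JKS file; a mismatch on that digest is what the loader reports as "Keystore was tampered with, or password was incorrect".

```python
import hashlib
import struct

JKS_MAGIC = 0xFEEDFEED
JKS_WHITENER = b"Mighty Aphrodite"  # fixed string mixed into the JKS digest
DIGEST_LEN = 20                     # SHA-1 digest appended to the file


def jks_password_matches(keystore_bytes: bytes, password: str) -> bool:
    """Return True if `password` passes the JKS store integrity check.

    Hypothetical helper for illustration: it only verifies the store
    password, it does not parse or decrypt any entries.
    """
    if len(keystore_bytes) < 12 + DIGEST_LEN:
        raise ValueError("too short to be a JKS keystore")
    (magic,) = struct.unpack(">I", keystore_bytes[:4])
    if magic != JKS_MAGIC:
        raise ValueError("not a JKS keystore (bad magic number)")

    body = keystore_bytes[:-DIGEST_LEN]
    stored_digest = keystore_bytes[-DIGEST_LEN:]
    # The digest covers: password as 2-byte big-endian chars, the whitener
    # string, then every keystore byte that precedes the digest.
    computed = hashlib.sha1(
        password.encode("utf-16-be") + JKS_WHITENER + body
    ).digest()
    return computed == stored_digest


if __name__ == "__main__":
    # Build an empty JKS (magic, version 2, zero entries) protected with
    # "secret", purely so the sketch can be run without a real keystore.
    header = struct.pack(">III", JKS_MAGIC, 2, 0)
    empty_jks = header + hashlib.sha1(
        "secret".encode("utf-16-be") + JKS_WHITENER + header
    ).digest()
    print(jks_password_matches(empty_jks, "secret"))  # True
    print(jks_password_matches(empty_jks, "stale"))   # False
```

Running this against the keystore and password taken from the same secret should return True on a healthy install; False would be consistent with the mismatch described above.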
(In reply to Jeff Cantrill from comment #3)
> Working with your machine I could see the issue but I am unable to reproduce
> on a fresh AMI using:
> 127.0.0.1 ansible_connection=local
> Server https://172.18.3.23:8443
> openshift v188.8.131.52+c55cf2b
> kubernetes v1.5.2+43a9be4
> I'm attaching the pod specs so you can see the image sha's.
> Additional question is how you are getting 'openshift-ansible'? I clones the
> repo and ran from HEAD.
Yes, I also cloned the repo and ran from the master branch.
Confirmed on the referenced server, though I am not sure why I am unable to reproduce it locally. The issue is that the certs and passwords are stored under the master config directory, and we are probably regenerating the keystores with new passwords but not writing them back out to the secrets.
Fixed in https://github.com/openshift/openshift-ansible/pull/3297
Commits pushed to master at https://github.com/openshift/openshift-ansible
bug 1419962. fix openshift_metrics pwd issue after reinstall where cassandra has incorrect pwd exception
Merge pull request #3297 from jcantrill/bz_1419962_cassandra_pwd_failure
bug 1419962. fix openshift_metrics pwd issue after reinstall where ca…
Verified with the master branch: ran the install/uninstall cycle several times, and Metrics was installed and uninstalled correctly each time.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.