Description of problem:
After uninstalling Metrics via ansible by setting openshift_metrics_install_metrics=false, then installing it again, Cassandra stays in CrashLoopBackOff. Its log shows a Java stack trace like:

Caused by: java.io.IOException: Keystore was tampered with, or password was incorrect
    at sun.security.provider.JavaKeyStore.engineLoad(JavaKeyStore.java:780) ~[na:1.8.0_111]
    at sun.security.provider.JavaKeyStore$JKS.engineLoad(JavaKeyStore.java:56) ~[na:1.8.0_111]
    at sun.security.provider.KeyStoreDelegator.engineLoad(KeyStoreDelegator.java:224) ~[na:1.8.0_111]
    at sun.security.provider.JavaKeyStore$DualFormatJKS.engineLoad(JavaKeyStore.java:70) ~[na:1.8.0_111]

Version-Release number of selected component (if applicable):
OCP 3.5, openshift-ansible master branch

How reproducible:
Always

Steps to Reproduce:
1. Install Metrics via ansible: set openshift_metrics_install_metrics=true and run the playbook.
2. Uninstall Metrics: set the variable above to false and run the playbook again.
3. Install Metrics again.

Inventory file:

[oo_first_master]
$MASTER ansible_user=root ansible_ssh_user=root ansible_ssh_private_key_file="/root/.ssh/libra.pem" openshift_public_hostname=$MASTER

[oo_first_master:vars]
deployment_type=openshift-enterprise
openshift_release=v3.5.0
openshift_metrics_install_metrics=true
openshift_metrics_hawkular_hostname=hawkular-metrics.$SUBDOMAIN
openshift_metrics_project=openshift-infra
openshift_metrics_image_prefix=registry.ops.openshift.com/openshift3/
openshift_metrics_image_version=3.5.0

Actual results:
# oc get pod
NAME                         READY     STATUS             RESTARTS   AGE
hawkular-cassandra-1-v3sbs   0/1       CrashLoopBackOff   6          11m
hawkular-metrics-v907x       0/1       Running            1          12m
heapster-sm0zq               0/1       Running            1          12m

Expected results:
The user should be able to install/uninstall Metrics repeatedly.

Additional info:
ansible log is attached.
Working with your machine I could see the issue, but I am unable to reproduce it on a fresh AMI using:

[oo_first_master]
127.0.0.1 ansible_connection=local

[oo_first_master:vars]
ansible_user=root
ansible_ssh_user=vagrant
ansible_ssh_private_key_file=/home/jeff.cantrill/.ssh/id_rsa
ansible_become=true
containerized=true
docker_protect_installed_version=true
deployment_type=openshift-enterprise
openshift_release=v3.5
openshift_logging_install_logging=true
required_packages=[]
openshift_metrics_image_prefix=registry.ops.openshift.com/openshift3/
openshift_metrics_image_version=3.5.0
openshift_metrics_hawkular_hostname=hawkular-metrics.54.210.49.58.xip.io

Running:
Server https://172.18.3.23:8443
openshift v3.5.0.17+c55cf2b
kubernetes v1.5.2+43a9be4

I'm attaching the pod specs so you can see the image SHAs.

An additional question: how are you getting 'openshift-ansible'? I cloned the repo and ran from HEAD.
Created attachment 1248492 [details] pods.yaml of successful install
Additionally, I am running the playbook like:

ansible-playbook -i ../inventory.enterprise playbooks/common/openshift-cluster/openshift_metrics.yml -e openshift_metrics_install_metrics=true
The last "Caused by" makes it clear the issue is that the keystore password is wrong: "Caused by: java.security.UnrecoverableKeyException: Password verification failed". Grabbing the keystore and the keystore password from the secret confirms that the password is incorrect. I think this is a bug in how ansible creates the secrets.
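For background on why a wrong password surfaces as "Keystore was tampered with, or password was incorrect": a JKS file ends with a SHA-1 integrity digest computed over the store password (encoded as UTF-16BE), the literal bytes "Mighty Aphrodite", and the keystore contents; engineLoad recomputes that digest with the supplied password and compares. Below is a minimal Python sketch of that check (an illustration of the JKS format only, not OpenShift or ansible code):

```python
import hashlib
import struct

SALT = b"Mighty Aphrodite"  # literal salt Java mixes into the JKS integrity hash

def jks_digest(password: str, body: bytes) -> bytes:
    """Compute the 20-byte SHA-1 integrity digest appended to a JKS file."""
    md = hashlib.sha1()
    md.update(password.encode("utf-16-be"))  # each char as 2 big-endian bytes
    md.update(SALT)
    md.update(body)
    return md.digest()

def make_empty_jks(password: str) -> bytes:
    """Build a valid JKS keystore with zero entries: header + digest."""
    body = struct.pack(">III", 0xFEEDFEED, 2, 0)  # magic, version 2, 0 entries
    return body + jks_digest(password, body)

def password_ok(data: bytes, password: str) -> bool:
    """Mimic engineLoad's check: recompute the digest and compare."""
    body, stored = data[:-20], data[-20:]
    return jks_digest(password, body) == stored
```

If the ansible run regenerates the password in the secret without regenerating the keystore, `password_ok` fails for the new password, which is exactly the IOException Cassandra logs on startup.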
This may be related to the certs directory that remains on the master node. I'll investigate first thing tomorrow.
(In reply to Jeff Cantrill from comment #3)
> Working with your machine I could see the issue but I am unable to reproduce
> on a fresh AMI using:
>
> [oo_first_master]
> 127.0.0.1 ansible_connection=local
>
> [oo_first_master:vars]
> ansible_user=root
> ansible_ssh_user=vagrant
> ansible_ssh_private_key_file=/home/jeff.cantrill/.ssh/id_rsa
> ansible_become=true
> containerized=true
> docker_protect_installed_version=true
> deployment_type=openshift-enterprise
> openshift_release=v3.5
> openshift_logging_install_logging=true
> required_packages=[]
>
> openshift_metrics_image_prefix=registry.ops.openshift.com/openshift3/
> openshift_metrics_image_version=3.5.0
> openshift_metrics_hawkular_hostname=hawkular-metrics.54.210.49.58.xip.io
>
> Running:
> Server https://172.18.3.23:8443
> openshift v3.5.0.17+c55cf2b
> kubernetes v1.5.2+43a9be4
>
> I'm attaching the pod specs so you can see the image sha's.
>
> Additional question is how you are getting 'openshift-ansible'? I cloned the
> repo and ran from HEAD.

Yes, I also cloned the repo and ran from the master branch.
Confirmed on the referenced server, though I am not sure why I am unable to reproduce locally. The issue is that certs and passwords are stored under the master config directory; we are probably regenerating the keystores with new passwords but not writing them back out to the secrets.
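The fix direction implied here is to make password generation idempotent with respect to the files left under the master config directory: reuse what is already on disk instead of generating a fresh password that no longer matches the previously built keystore. A hypothetical sketch of that reuse logic (the path and function name are illustrative, not the actual openshift-ansible role code):

```python
import os
import secrets

def ensure_keystore_password(path: str) -> str:
    """Return the keystore password, generating it only on first install.

    Reusing the password file left under the master config directory keeps
    the password consistent with any keystore previously generated from it;
    regenerating it on every run is what desynchronizes the secret from
    the keystore and causes the CrashLoopBackOff described above.
    """
    if os.path.exists(path):
        with open(path) as f:
            return f.read().strip()
    pwd = secrets.token_urlsafe(16)
    with open(path, "w") as f:
        f.write(pwd)
    return pwd
```

The equivalent fix could also go the other way: remove the stale cert/password directory on uninstall so a reinstall starts clean; either approach keeps the secret and the keystore in agreement.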
Fixed in https://github.com/openshift/openshift-ansible/pull/3297
Commits pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/7d081c4b321971cc499a4fc499ad1bbaceea823f
bug 1419962. fix openshift_metrics pwd issue after reinstall where cassandra has incorrect pwd exception

https://github.com/openshift/openshift-ansible/commit/f4d7caa7f0a24037adc2f56b2020e3aaec79d938
Merge pull request #3297 from jcantrill/bz_1419962_cassandra_pwd_failure
bug 1419962. fix openshift_metrics pwd issue after reinstall where ca…
Verified with the master branch: tried the install/uninstall cycle several times, and Metrics installed and uninstalled correctly each time.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:0903