Bug 1419962

Summary: [IntService_public_295] After clean then install, Cassandra show keystore/password error
Product: OpenShift Container Platform
Reporter: Peng Li <penli>
Component: Installer
Assignee: Jeff Cantrill <jcantril>
Status: CLOSED ERRATA
QA Contact: Peng Li <penli>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.5.0
CC: aos-bugs, jokerman, mmccomas, penli
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-04-12 18:49:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
pods.yaml of successful install (flags: none)

Description Peng Li 2017-02-07 14:09:59 UTC
Description of problem:
After uninstalling Metrics with Ansible by setting openshift_metrics_install_metrics=false and then installing it again, the Cassandra pod stays in CrashLoopBackOff.
Its log shows a Java stack trace like:
Caused by: java.io.IOException: Keystore was tampered with, or password was incorrect
	at sun.security.provider.JavaKeyStore.engineLoad(JavaKeyStore.java:780) ~[na:1.8.0_111]
	at sun.security.provider.JavaKeyStore$JKS.engineLoad(JavaKeyStore.java:56) ~[na:1.8.0_111]
	at sun.security.provider.KeyStoreDelegator.engineLoad(KeyStoreDelegator.java:224) ~[na:1.8.0_111]
	at sun.security.provider.JavaKeyStore$DualFormatJKS.engineLoad(JavaKeyStore.java:70) ~[na:1.8.0_111]

Version-Release number of selected component (if applicable):
OCP 3.5
openshift-ansible master branch

How reproducible:
always

Steps to Reproduce:
1. Install Metrics with Ansible by setting openshift_metrics_install_metrics=true and running the playbook.
2. Uninstall Metrics by setting the same variable to false and running the playbook again.
3. Install it again.

Inventory file:

[oo_first_master]
$MASTER ansible_user=root ansible_ssh_user=root ansible_ssh_private_key_file="/root/.ssh/libra.pem" openshift_public_hostname=$MASTER

[oo_first_master:vars]
deployment_type=openshift-enterprise
openshift_release=v3.5.0

openshift_metrics_install_metrics=true

openshift_metrics_hawkular_hostname=hawkular-metrics.$SUBDOMAIN
openshift_metrics_project=openshift-infra


openshift_metrics_image_prefix=registry.ops.openshift.com/openshift3/
openshift_metrics_image_version=3.5.0

Actual results:
# oc get pod
NAME                         READY     STATUS             RESTARTS   AGE
hawkular-cassandra-1-v3sbs   0/1       CrashLoopBackOff   6          11m
hawkular-metrics-v907x       0/1       Running            1          12m
heapster-sm0zq               0/1       Running            1          12m


Expected results:
Users should be able to install and uninstall Metrics repeatedly.

Additional info:
The Ansible log is attached.

Comment 3 Jeff Cantrill 2017-02-07 19:15:20 UTC
On your machine I can see the issue, but I am unable to reproduce it on a fresh AMI using:

[oo_first_master]
127.0.0.1 ansible_connection=local

[oo_first_master:vars]
ansible_user=root 
ansible_ssh_user=vagrant
ansible_ssh_private_key_file=/home/jeff.cantrill/.ssh/id_rsa 
ansible_become=true
containerized=true
docker_protect_installed_version=true
deployment_type=openshift-enterprise
openshift_release=v3.5
openshift_logging_install_logging=true
required_packages=[]


openshift_metrics_image_prefix=registry.ops.openshift.com/openshift3/
openshift_metrics_image_version=3.5.0
openshift_metrics_hawkular_hostname=hawkular-metrics.54.210.49.58.xip.io

Running:
Server https://172.18.3.23:8443
openshift v3.5.0.17+c55cf2b
kubernetes v1.5.2+43a9be4

I'm attaching the pod specs so you can see the image SHAs.

An additional question: how are you getting 'openshift-ansible'? I cloned the repo and ran from HEAD.

Comment 4 Jeff Cantrill 2017-02-07 19:16:09 UTC
Created attachment 1248492 [details]
pods.yaml of successful install

Comment 5 Jeff Cantrill 2017-02-07 19:16:53 UTC
Additionally, I am running the playbook like: 

ansible-playbook -i ../inventory.enterprise playbooks/common/openshift-cluster/openshift_metrics.yml -e openshift_metrics_install_metrics=true
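
The install, uninstall, install cycle from the reproduction steps can be scripted around this command. The following is a minimal sketch, not part of openshift-ansible: the helper names are hypothetical, the inventory and playbook paths are copied from the command above, and it assumes that toggling the variable with -e behaves the same as editing it in the inventory (Ansible extra vars take precedence over inventory vars).

```python
import shlex
from typing import List

# Paths copied from the command in this comment; adjust them for your checkout.
INVENTORY = "../inventory.enterprise"
PLAYBOOK = "playbooks/common/openshift-cluster/openshift_metrics.yml"


def metrics_command(install: bool, inventory: str = INVENTORY,
                    playbook: str = PLAYBOOK) -> List[str]:
    """Return the ansible-playbook argv that installs or uninstalls Metrics."""
    flag = "true" if install else "false"
    return [
        "ansible-playbook", "-i", inventory, playbook,
        "-e", f"openshift_metrics_install_metrics={flag}",
    ]


def reproduction_cycle() -> List[List[str]]:
    """Install, uninstall, then install again (the three reported steps)."""
    return [metrics_command(step) for step in (True, False, True)]


if __name__ == "__main__":
    for argv in reproduction_cycle():
        # Only print the commands; to execute a step, pass argv to
        # subprocess.run(argv, check=True) from the openshift-ansible checkout.
        print(shlex.join(argv))
```

Printing rather than executing keeps the sketch safe to run anywhere; the commands only make sense from inside a clone of the repository with a valid inventory.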

Comment 6 Matt Wringe 2017-02-07 19:57:42 UTC
The last "Caused by" entry makes it clear that the keystore password is wrong:

"Caused by: java.security.UnrecoverableKeyException: Password verification failed"

Extracting the keystore and the keystore password from the secret confirms that the password is incorrect.

I think this is a bug in how Ansible creates the secrets.
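
For anyone who wants to repeat that check without a JVM, here is a minimal Python sketch. It relies on the JKS integrity scheme as implemented in sun.security.provider.JavaKeyStore: the last 20 bytes of the file are a SHA-1 digest over the password (two bytes per character, big-endian), the fixed string "Mighty Aphrodite", and the rest of the file. The function names are hypothetical, and the sketch only verifies the store password; it does not parse or decrypt entries.

```python
import hashlib
import hmac

JKS_MAGIC = b"\xfe\xed\xfe\xed"   # first four bytes of every JKS file
_WHITENER = b"Mighty Aphrodite"   # fixed string mixed into the integrity hash
_DIGEST_LEN = 20                  # SHA-1 digest appended to the end of the file


def _jks_digest(password: str, body: bytes) -> bytes:
    """SHA-1 over the UTF-16BE password, the fixed string, and the store body."""
    return hashlib.sha1(password.encode("utf-16-be") + _WHITENER + body).digest()


def jks_password_matches(keystore: bytes, password: str) -> bool:
    """Return True if `password` passes the JKS integrity check for `keystore`."""
    if len(keystore) <= _DIGEST_LEN or not keystore.startswith(JKS_MAGIC):
        raise ValueError("not a JKS keystore")
    body, stored = keystore[:-_DIGEST_LEN], keystore[-_DIGEST_LEN:]
    return hmac.compare_digest(_jks_digest(password, body), stored)


def make_empty_jks(password: str) -> bytes:
    """Build an empty version-2 JKS store, used here only for demonstration."""
    body = JKS_MAGIC + (2).to_bytes(4, "big") + (0).to_bytes(4, "big")
    return body + _jks_digest(password, body)


if __name__ == "__main__":
    store = make_empty_jks("right-password")
    print(jks_password_matches(store, "right-password"))
    print(jks_password_matches(store, "stale-password"))
```

To apply this to a real secret, keep in mind that values under `data` in a Kubernetes Secret are base64-encoded, so both the keystore and the password need `base64.b64decode` first. The secret and key names used by the metrics deployment are not stated in this report, so they are left out here.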

Comment 7 Jeff Cantrill 2017-02-08 00:57:41 UTC
This may be related to the certs directory that remains on the master node.  I'll investigate first thing tomorrow.

Comment 8 Peng Li 2017-02-08 12:50:08 UTC
(In reply to Jeff Cantrill from comment #3)
> An additional question: how are you getting 'openshift-ansible'? I cloned the
> repo and ran from HEAD.

Yes, I also cloned the repo and ran from the master branch.

Comment 9 Jeff Cantrill 2017-02-08 17:04:13 UTC
Confirmed on the referenced server, though I am not sure why I cannot reproduce it locally. The certs and passwords are stored under the master config directory, which survives an uninstall; on reinstall we are probably regenerating the keystores with new passwords but not writing those passwords back out to the secrets.
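
The suspected failure mode can be illustrated with a toy model. This is a sketch of the hypothesis above only: it is not the role's actual tasks and not the change that was later merged. Every name in it (`cert_dir`, `password_file`, `install_stale`, and so on) is hypothetical, and a "keystore" is reduced to the password that opens it.

```python
import secrets
from typing import Dict


def new_password() -> str:
    """Generate a fresh random keystore password."""
    return secrets.token_hex(16)


def install_stale(cert_dir: Dict[str, str], secret: Dict[str, str]) -> None:
    """Hypothesized bug: the keystore is always regenerated with a new
    password, but the password published in the secret comes from a file
    that is only written when it does not exist yet."""
    keystore_password = new_password()
    secret["keystore_opens_with"] = keystore_password
    cert_dir.setdefault("password_file", keystore_password)
    secret["password"] = cert_dir["password_file"]


def install_consistent(cert_dir: Dict[str, str], secret: Dict[str, str]) -> None:
    """One possible remedy: pick the password once (reuse it if present) and
    derive both the keystore and the secret from that single value."""
    keystore_password = cert_dir.setdefault("password_file", new_password())
    secret["keystore_opens_with"] = keystore_password
    secret["password"] = keystore_password


def uninstall(secret: Dict[str, str]) -> None:
    """Uninstall removes the secret but leaves the cert directory behind."""
    secret.clear()


def pod_can_open_keystore(secret: Dict[str, str]) -> bool:
    """The pod starts only if the mounted password opens the mounted keystore."""
    return secret["keystore_opens_with"] == secret["password"]


if __name__ == "__main__":
    for install in (install_stale, install_consistent):
        cert_dir: Dict[str, str] = {}
        secret: Dict[str, str] = {}
        install(cert_dir, secret)
        first = pod_can_open_keystore(secret)
        uninstall(secret)
        install(cert_dir, secret)
        second = pod_can_open_keystore(secret)
        print(install.__name__, first, second)
```

In this model the first install always works because nothing is left over, which would also be consistent with the problem not showing up on a fresh AMI.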

Comment 10 Jeff Cantrill 2017-02-08 20:49:45 UTC
Fixed in https://github.com/openshift/openshift-ansible/pull/3297

Comment 11 openshift-github-bot 2017-02-10 14:13:20 UTC
Commits pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/7d081c4b321971cc499a4fc499ad1bbaceea823f
bug 1419962. fix openshift_metrics pwd issue after reinstall where cassandra has incorrect pwd exception

https://github.com/openshift/openshift-ansible/commit/f4d7caa7f0a24037adc2f56b2020e3aaec79d938
Merge pull request #3297 from jcantrill/bz_1419962_cassandra_pwd_failure

bug 1419962. fix openshift_metrics pwd issue after reinstall where ca…

Comment 13 Peng Li 2017-02-13 03:34:41 UTC
Verified with the master branch: I ran the install/uninstall cycle several times, and Metrics was installed and uninstalled correctly each time.

Comment 15 errata-xmlrpc 2017-04-12 18:49:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0903