Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1529532

Summary: [RFE] etcd ca should be available on all nodes after installation
Product: OpenShift Container Platform
Reporter: daniel <dmoessne>
Component: Installer
Assignee: Russell Teague <rteague>
Installer sub component: openshift-ansible
QA Contact: Johnny Liu <jialiu>
Status: CLOSED WONTFIX
Severity: low
Priority: low
CC: aos-bugs, jokerman, jolee, jrosenta, mharri, mmccomas, rteague, sdodson, sreber
Version: 3.5.0
Keywords: Reopened
Target Release: 3.11.z
Hardware: x86_64
OS: Linux
Last Closed: 2019-11-13 15:54:13 UTC
Type: Bug

Description daniel 2017-12-28 14:34:40 UTC
Description of problem:

When OCP is installed, /etc/etcd/ca/ is created and populated only on the first node Ansible picks, and all certs are created from there. However, if one forgets to back up this directory (and the docs are not very clear, see bz 1529522) and this very master fails, the only way to recover is to recreate all etcd certs via the playbook (/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/redeploy-etcd-ca.yml) to get a new CA and then new certs.
This causes a (short) outage, which could be an issue in heavily used clusters. It could be avoided if the CA were also present on all other masters, as one would then 'just' need to create new certs for a recovered or new master.

Version-Release number of the following components:
# rpm -q openshift-ansible
openshift-ansible-3.6.173.0.75-1.git.0.0a44128.el7.noarch

# rpm -q ansible
ansible-2.4.1.0-1.el7.noarch

# ansible --version
ansible 2.4.1.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, May  3 2017, 07:55:04) [GCC 4.8.5 20150623 (Red Hat 4.8.5-14)]


How reproducible:

Steps to Reproduce:
1. Install OpenShift as described in the docs (advanced installation)
2. Check /etc/etcd/ca/ on all masters
3. The CA is present on only one master

Actual results:
n/a (no error is generated), but checking /etc/etcd/ca/ on all masters shows the CA on only the first one:
[root@test150 ~]# for host in test152 test153 test154; do ssh $host ls -la /etc/etcd/ca/;done
total 48
drwx------. 5 root root   212 Dec  1 09:26 .
drwx------. 4 etcd etcd   215 Dec  1 09:20 ..
-rw-r--r--. 7 root root  1895 Dec  1 09:19 ca.crt
-rw-r--r--. 1 root root  3272 Dec  1 09:19 ca.key
drwx------. 2 root root   132 Dec  1 09:26 certs
drwx------. 2 root root     6 Dec  1 09:19 crl
drwx------. 2 root root    51 Dec  1 09:19 fragments
-rw-r--r--. 1 root root   522 Dec  1 09:26 index.txt
-rw-r--r--. 1 root root    20 Dec  1 09:26 index.txt.attr
-rw-r--r--. 1 root root    20 Dec  1 09:26 index.txt.attr.old
-rw-r--r--. 1 root root   464 Dec  1 09:26 index.txt.old
-rw-r--r--. 1 root root 12547 Dec  1 09:19 openssl.cnf
-rw-r--r--. 1 root root     3 Dec  1 09:26 serial
-rw-r--r--. 1 root root     3 Dec  1 09:26 serial.old
total 0
drwxr-xr-x. 2 root root   6 Dec  1 09:20 .
drwx------. 3 etcd etcd 192 Dec  1 09:20 ..
total 0
drwxr-xr-x. 2 root root   6 Dec  1 09:20 .
drwx------. 3 etcd etcd 193 Dec  1 09:20 ..
[root@test150 ~]# 


Expected results:
The CA should be present on all masters, which makes recovery easier if it was not backed up.
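
The check in the steps above could be scripted roughly as follows. This is only a sketch: the function names `etcd_ca_present` and `etcd_ca_status` are made up for illustration, and the only detail taken from the report is the path /etc/etcd/ca and the file names ca.crt and ca.key.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the directory holds non-empty
# ca.crt and ca.key files. The default path is the one from this report.
etcd_ca_present() {
    ca_dir="${1:-/etc/etcd/ca}"
    [ -s "$ca_dir/ca.crt" ] && [ -s "$ca_dir/ca.key" ]
}

# Print a one-word status for the given directory.
etcd_ca_status() {
    if etcd_ca_present "$1"; then
        echo "present"
    else
        echo "missing"
    fi
}
```

On a master this would be called as `etcd_ca_status /etc/etcd/ca`. Running it across hosts would still need an ssh loop like the one shown under "Actual results".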


Additional info:

After running
[root@test150 ~]# ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/redeploy-etcd-ca.yml

[...]

the CA is present on all masters:
[root@test150 ~]# for host in test152 test153 test154; do ssh $host ls -la /etc/etcd/ca/;done
total 28
drwx------. 5 root root   125 Dec 28 14:13 .
drwx------. 4 etcd etcd   283 Dec 28 14:13 ..
-rw-r--r--. 1 root root  3790 Dec 28 14:13 ca.crt
-rw-r--r--. 1 root root  3272 Dec 28 14:13 ca.key
drwx------. 2 root root     6 Dec 28 14:13 certs
drwx------. 2 root root     6 Dec 28 14:13 crl
drwx------. 2 root root    51 Dec 28 14:13 fragments
-rw-r--r--. 1 root root     0 Dec 28 14:13 index.txt
-rw-r--r--. 1 root root 12547 Dec 28 14:13 openssl.cnf
-rw-r--r--. 1 root root     2 Dec 28 14:13 serial
total 28
drwx------. 5 root root   125 Dec 28 14:13 .
drwx------. 3 etcd etcd   241 Dec 28 14:13 ..
-rw-r--r--. 1 root root  3790 Dec 28 14:13 ca.crt
-rw-r--r--. 1 root root  3272 Dec 28 14:13 ca.key
drwx------. 2 root root     6 Dec 28 14:13 certs
drwx------. 2 root root     6 Dec 28 14:13 crl
drwx------. 2 root root    51 Dec 28 14:13 fragments
-rw-r--r--. 1 root root     0 Dec 28 14:13 index.txt
-rw-r--r--. 1 root root 12547 Dec 28 14:13 openssl.cnf
-rw-r--r--. 1 root root     2 Dec 28 14:13 serial
total 28
drwx------. 5 root root   125 Dec 28 14:13 .
drwx------. 3 etcd etcd   242 Dec 28 14:13 ..
-rw-r--r--. 1 root root  3790 Dec 28 14:13 ca.crt
-rw-r--r--. 1 root root  3272 Dec 28 14:13 ca.key
drwx------. 2 root root     6 Dec 28 14:13 certs
drwx------. 2 root root     6 Dec 28 14:13 crl
drwx------. 2 root root    51 Dec 28 14:13 fragments
-rw-r--r--. 1 root root     0 Dec 28 14:13 index.txt
-rw-r--r--. 1 root root 12547 Dec 28 14:13 openssl.cnf
-rw-r--r--. 1 root root     2 Dec 28 14:13 serial
[root@test150 ~]#
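
The listings above show files of equal size on every master, but not that their contents match. A possible way to confirm that, assuming each master's ca.crt has first been copied to a local file (for example with scp, under file names of your choosing), is to compare digests. The function name `same_digest` is hypothetical, and `sha256sum` is assumed to be available from GNU coreutils.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every given file has the same
# SHA-256 digest. Fails if no files are given or a file is unreadable.
same_digest() {
    [ "$#" -ge 1 ] || return 1
    ref=""
    for f in "$@"; do
        [ -r "$f" ] || { echo "cannot read: $f" >&2; return 1; }
        sum=$(sha256sum < "$f" | cut -d' ' -f1)
        if [ -z "$ref" ]; then
            ref="$sum"            # first file sets the reference digest
        elif [ "$sum" != "$ref" ]; then
            return 1              # mismatch: not the same CA certificate
        fi
    done
    return 0
}
```

For example, `same_digest test152-ca.crt test153-ca.crt test154-ca.crt && echo "same CA"` would print the message only when all three copies are identical (the file names here are placeholders).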

Comment 1 Scott Dodson 2019-01-28 19:37:57 UTC
This implementation will not be relevant in 4.0 and, unless this is critical, won't be fixed in 3.x.

Comment 6 Scott Dodson 2019-11-13 15:54:13 UTC
This will not be addressed. DR planning should include backing up and restoring certificate authority data.
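
As a rough illustration of the backup step mentioned in the closing comment, the CA directory could be archived with tar. This is a minimal sketch, not the documented OpenShift procedure: the function name `backup_etcd_ca` and the default destination /root/etcd-ca-backup are assumptions, and only the source path /etc/etcd/ca comes from this report.

```shell
#!/bin/sh
# Hypothetical helper: archive an etcd CA directory into a timestamped
# tarball and print the archive path.
#   $1: CA directory (default /etc/etcd/ca, as in this report)
#   $2: destination directory (default is an assumed location)
backup_etcd_ca() {
    ca_dir="${1:-/etc/etcd/ca}"
    dest_dir="${2:-/root/etcd-ca-backup}"
    [ -d "$ca_dir" ] || { echo "no such directory: $ca_dir" >&2; return 1; }
    mkdir -p "$dest_dir" || return 1
    chmod 700 "$dest_dir"         # the archive contains the CA private key
    archive="$dest_dir/etcd-ca-$(date +%Y%m%d-%H%M%S).tar.gz"
    # umask keeps the archive private; -C stores paths relative to the parent
    ( umask 077 && tar -czf "$archive" \
        -C "$(dirname "$ca_dir")" "$(basename "$ca_dir")" ) || return 1
    echo "$archive"
}
```

The resulting archive should be copied off the master, since a backup that lives only on the host holding the CA does not help when that host fails.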