Bug 1529532 - [RFE] etcd ca should be available on all nodes after installation
Status: NEW
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.5.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: 4.0.0
Assigned To: Scott Dodson
QA Contact: Johnny Liu
Docs Contact:
Depends On:
Blocks:
Reported: 2017-12-28 09:34 EST by daniel
Modified: 2018-10-18 03:01 EDT (History)

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---




External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Bugzilla 1529522 None CLOSED [DOCS] etcd backup procedure missing info to backup CA 2018-11-02 15:43 EDT
Red Hat Knowledge Base (Solution) 3630391 None None None 2018-09-28 09:30 EDT

Description daniel 2017-12-28 09:34:40 EST
Description of problem:

When OCP is installed, /etc/etcd/ca/ is created and populated only on the first node that Ansible picks, and all etcd certificates are generated from there. If this directory is not backed up (the documentation is not very clear on this, see bz 1529522) and that master fails, the only way to recover is to recreate all etcd certificates with the playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/redeploy-etcd-ca.yml, which generates a new CA and then new certificates.
This causes a (short) outage, which can be a problem in heavily used clusters. It could be avoided if the CA were also present on all other masters, because then one would only need to create new certificates for the recovered or new master.
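
Until the CA is available on every master, the directory on the first master can at least be archived. The following is a minimal sketch, assuming the CA lives in /etc/etcd/ca/ as described above. The function name backup_etcd_ca and the destination directory are hypothetical and are not part of openshift-ansible.

```shell
#!/bin/sh
# Hypothetical helper: archive an etcd CA directory into a timestamped tarball.
# Usage: backup_etcd_ca <ca_dir> <dest_dir>
backup_etcd_ca() {
    ca_dir=$1
    dest_dir=$2
    # Refuse to produce an archive without the CA certificate and key.
    if [ ! -f "$ca_dir/ca.crt" ] || [ ! -f "$ca_dir/ca.key" ]; then
        echo "no ca.crt/ca.key in $ca_dir" >&2
        return 1
    fi
    mkdir -p "$dest_dir" || return 1
    archive="$dest_dir/etcd-ca-$(date +%Y%m%d%H%M%S).tar.gz"
    # The archive contains the private key, so create it readable by the owner only.
    ( umask 077 && tar -czf "$archive" -C "$ca_dir" . ) || return 1
    # Print the archive path so the caller can copy it off the host.
    echo "$archive"
}

# Example (as root on the first master; /root/etcd-ca-backup is an assumed path):
# backup_etcd_ca /etc/etcd/ca /root/etcd-ca-backup
```

The archive includes ca.key, so it should be stored somewhere with access restrictions at least as strict as those on the master itself.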

Version-Release number of the following components:
# rpm -q openshift-ansible
openshift-ansible-3.6.173.0.75-1.git.0.0a44128.el7.noarch

# rpm -q ansible
ansible-2.4.1.0-1.el7.noarch

# ansible --version
ansible 2.4.1.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, May  3 2017, 07:55:04) [GCC 4.8.5 20150623 (Red Hat 4.8.5-14)]


How reproducible:

Steps to Reproduce:
1. Install OpenShift as described in the docs (advanced installation).
2. Check /etc/etcd/ca/ on all masters.
3. The CA is present on only one of them.

Actual results:
Please include the entire output from the last TASK line through the end of output if an error is generated
n/a

However, listing /etc/etcd/ca/ on all masters shows that only the first one (test152) holds the CA:
[root@test150 ~]# for host in test152 test153 test154; do ssh $host ls -la /etc/etcd/ca/;done
total 48
drwx------. 5 root root   212 Dec  1 09:26 .
drwx------. 4 etcd etcd   215 Dec  1 09:20 ..
-rw-r--r--. 7 root root  1895 Dec  1 09:19 ca.crt
-rw-r--r--. 1 root root  3272 Dec  1 09:19 ca.key
drwx------. 2 root root   132 Dec  1 09:26 certs
drwx------. 2 root root     6 Dec  1 09:19 crl
drwx------. 2 root root    51 Dec  1 09:19 fragments
-rw-r--r--. 1 root root   522 Dec  1 09:26 index.txt
-rw-r--r--. 1 root root    20 Dec  1 09:26 index.txt.attr
-rw-r--r--. 1 root root    20 Dec  1 09:26 index.txt.attr.old
-rw-r--r--. 1 root root   464 Dec  1 09:26 index.txt.old
-rw-r--r--. 1 root root 12547 Dec  1 09:19 openssl.cnf
-rw-r--r--. 1 root root     3 Dec  1 09:26 serial
-rw-r--r--. 1 root root     3 Dec  1 09:26 serial.old
total 0
drwxr-xr-x. 2 root root   6 Dec  1 09:20 .
drwx------. 3 etcd etcd 192 Dec  1 09:20 ..
total 0
drwxr-xr-x. 2 root root   6 Dec  1 09:20 .
drwx------. 3 etcd etcd 193 Dec  1 09:20 ..
[root@test150 ~]# 
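
The same check can be reduced to a per-directory yes/no answer, which is easier to scan than full listings. This is only a sketch: the helper name ca_status is hypothetical, and it treats a CA as present when both ca.crt and ca.key exist.

```shell
#!/bin/sh
# Hypothetical helper: print "present" or "missing" for an etcd CA directory,
# depending on whether both ca.crt and ca.key exist in it.
ca_status() {
    if [ -f "$1/ca.crt" ] && [ -f "$1/ca.key" ]; then
        echo present
    else
        echo missing
    fi
}

# Example on a single master (as root, since the directory is mode 0700):
# ca_status /etc/etcd/ca
#
# Equivalent remote check for the hosts used in this report:
# for host in test152 test153 test154; do
#     if ssh "$host" 'test -f /etc/etcd/ca/ca.crt && test -f /etc/etcd/ca/ca.key'; then
#         echo "$host: present"
#     else
#         echo "$host: missing"
#     fi
# done
```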


Expected results:
The CA should be present on all masters, which makes recovery easier if it was never backed up.


Additional info:

Running the etcd CA redeploy playbook:
[root@test150 ~]# ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/redeploy-etcd-ca.yml

[...]


Afterwards, the CA is present on all masters:
[root@test150 ~]# for host in test152 test153 test154; do ssh $host ls -la /etc/etcd/ca/;done
total 28
drwx------. 5 root root   125 Dec 28 14:13 .
drwx------. 4 etcd etcd   283 Dec 28 14:13 ..
-rw-r--r--. 1 root root  3790 Dec 28 14:13 ca.crt
-rw-r--r--. 1 root root  3272 Dec 28 14:13 ca.key
drwx------. 2 root root     6 Dec 28 14:13 certs
drwx------. 2 root root     6 Dec 28 14:13 crl
drwx------. 2 root root    51 Dec 28 14:13 fragments
-rw-r--r--. 1 root root     0 Dec 28 14:13 index.txt
-rw-r--r--. 1 root root 12547 Dec 28 14:13 openssl.cnf
-rw-r--r--. 1 root root     2 Dec 28 14:13 serial
total 28
drwx------. 5 root root   125 Dec 28 14:13 .
drwx------. 3 etcd etcd   241 Dec 28 14:13 ..
-rw-r--r--. 1 root root  3790 Dec 28 14:13 ca.crt
-rw-r--r--. 1 root root  3272 Dec 28 14:13 ca.key
drwx------. 2 root root     6 Dec 28 14:13 certs
drwx------. 2 root root     6 Dec 28 14:13 crl
drwx------. 2 root root    51 Dec 28 14:13 fragments
-rw-r--r--. 1 root root     0 Dec 28 14:13 index.txt
-rw-r--r--. 1 root root 12547 Dec 28 14:13 openssl.cnf
-rw-r--r--. 1 root root     2 Dec 28 14:13 serial
total 28
drwx------. 5 root root   125 Dec 28 14:13 .
drwx------. 3 etcd etcd   242 Dec 28 14:13 ..
-rw-r--r--. 1 root root  3790 Dec 28 14:13 ca.crt
-rw-r--r--. 1 root root  3272 Dec 28 14:13 ca.key
drwx------. 2 root root     6 Dec 28 14:13 certs
drwx------. 2 root root     6 Dec 28 14:13 crl
drwx------. 2 root root    51 Dec 28 14:13 fragments
-rw-r--r--. 1 root root     0 Dec 28 14:13 index.txt
-rw-r--r--. 1 root root 12547 Dec 28 14:13 openssl.cnf
-rw-r--r--. 1 root root     2 Dec 28 14:13 serial
[root@test150 ~]#
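
The listings show files of identical size on each master, but matching sizes do not prove that the masters hold the same CA. A sketch of a stricter comparison follows, assuming sha256sum is available on the masters. The helper name all_digests_match is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "digest  filename" lines (sha256sum format) on stdin
# and succeed only if at least one line was read and every digest is identical.
all_digests_match() {
    awk '
        NR == 1 { first = $1 }
        $1 != first { mismatch = 1 }
        END { exit (NR == 0 || mismatch) ? 1 : 0 }
    '
}

# Example: compare the CA certificate across the masters from this report.
# for host in test152 test153 test154; do
#     ssh "$host" sha256sum /etc/etcd/ca/ca.crt
# done | all_digests_match && echo "same CA on all masters"
```

Comparing only ca.crt avoids reading the private key. The same pipeline works for ca.key if that also needs to be confirmed.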
