Bug 1563376

Summary: [3.7] Updating etcd does not update the etcd config with new variables
Product: OpenShift Container Platform
Reporter: Russell Teague <rteague>
Component: Installer
Assignee: Russell Teague <rteague>
Status: CLOSED ERRATA
QA Contact: Gaoyun Pei <gpei>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 3.7.0
CC: aos-bugs, bleanhar, gsapienz, jialiu, jokerman, mmccomas, nnosenzo, openshift-bugs-escalate, rhowe
Target Milestone: ---   
Target Release: 3.7.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Doc Type: Bug Fix
Doc Text:
In certain cases, an existing etcd installation may not have its configuration updated with new variables, causing services to fail. With this fix, the etcd.conf file is verified during upgrades so that all variables are set as expected.
Clone Of: 1529575
Last Closed: 2018-06-27 07:59:12 UTC
Type: Bug
Bug Depends On: 1529575    
Bug Blocks: 1563375    

Description Russell Teague 2018-04-03 18:53:20 UTC
+++ This bug was initially created as a clone of Bug #1529575 +++

Description of problem:

ETCD_CA_FILE is deprecated and replaced by ETCD_TRUSTED_CA_FILE
ETCD_PEER_CA_FILE is deprecated and replaced by ETCD_PEER_TRUSTED_CA_FILE


```
#[cluster]
ETCD_QUOTA_BACKEND_BYTES=4294967296

#[security]
ETCD_TRUSTED_CA_FILE=/etc/etcd/ca.crt
ETCD_PEER_TRUSTED_CA_FILE=/etc/etcd/ca.crt
ETCD_CLIENT_CERT_AUTH="true"
ETCD_PEER_CERT_AUTH="true"
```
https://coreos.com/etcd/docs/latest/v2/configuration.html#security-flags
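For illustration, the rename described above can be sketched in Python. This is a hypothetical helper (`migrate_deprecated` is not part of openshift-ansible, whose fix is written in Ansible). It assumes etcd.conf is a flat file of uncommented `KEY=VALUE` lines plus `#` comments, and it keeps each deprecated key's value while renaming the key.

```python
import re

# Deprecated etcd.conf keys and their replacements, as listed in this report.
DEPRECATED_KEYS = {
    "ETCD_CA_FILE": "ETCD_TRUSTED_CA_FILE",
    "ETCD_PEER_CA_FILE": "ETCD_PEER_TRUSTED_CA_FILE",
}

# Matches an uncommented KEY=VALUE assignment (assumed file format).
ASSIGNMENT = re.compile(r"^([A-Z0-9_]+)=(.*)$")


def migrate_deprecated(text: str) -> str:
    """Rename deprecated keys in place, keeping their values.

    If the replacement key is already set, the deprecated line is dropped
    rather than producing a duplicate assignment.
    """
    lines = text.splitlines()
    present = {m.group(1) for m in map(ASSIGNMENT.match, lines) if m}
    out = []
    for line in lines:
        m = ASSIGNMENT.match(line)
        if m and m.group(1) in DEPRECATED_KEYS:
            new_key = DEPRECATED_KEYS[m.group(1)]
            if new_key in present:
                continue  # replacement already configured; drop the old line
            line = f"{new_key}={m.group(2)}"
            present.add(new_key)
        out.append(line)
    return "\n".join(out) + "\n"
```

Dropping the old line when the new key already exists keeps the helper safe to run on a file that was partially fixed by hand.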



Version-Release number of the following components:
3.6 and 3.7 migrate playbooks. 

Actual results:
/etc/etcd/etcd.conf is not updated


Expected results:

The following values should be added:
```
#[cluster]
ETCD_QUOTA_BACKEND_BYTES=4294967296

#[security]
ETCD_TRUSTED_CA_FILE=/etc/etcd/ca.crt
ETCD_PEER_TRUSTED_CA_FILE=/etc/etcd/ca.crt
ETCD_CLIENT_CERT_AUTH="true"
ETCD_PEER_CERT_AUTH="true"
```

The following should be removed:
```
ETCD_CA_FILE
ETCD_PEER_CA_FILE

```

--- Additional comment from Scott Dodson on 2018-01-02 09:29:19 EST ---

Is there actually any consequence of not updating them in version 3.2 of etcd? ... as in everything works without error as it is currently, correct?

--- Additional comment from Ryan Howe on 2018-02-07 09:45:02 EST ---

(In reply to Scott Dodson from comment #1)
> Is there actually any consequence of not updating them in version 3.2 of
> etcd? ... as in everything works without error as it is currently, correct?

Everything works now, but we need to make sure that these values get set. If we leave them unchanged, are you 100% sure that a future update of etcd will not cause issues because they are not set?

When we do an update or migration, we must make sure all installs are configured with the same required variables. Otherwise, a future update might cause a production outage because this step was missed.

--- Additional comment from Ryan Howe on 2018-02-07 10:18:31 EST ---

Without setting the following to true:

ETCD_CLIENT_CERT_AUTH="true"
ETCD_PEER_CLIENT_CERT_AUTH="true"

the code defaults them to false. I'm not sure what the impact of this is.

https://github.com/coreos/etcd/blob/master/etcdmain/config.go#L180
https://github.com/coreos/etcd/blob/master/etcdmain/config.go#L187


Also a correction to the above. 

 ETCD_PEER_CERT_AUTH in the comments above should be ETCD_PEER_CLIENT_CERT_AUTH.
       *The 3.6 and 3.7 installers set this correctly; it's just my typo.

--- Additional comment from Nicolas Nosenzo on 2018-03-07 05:38:11 EST ---

I just tested this by running a plain "yum update" on a 3.5 cluster running etcd 3.1, and it crashed while logging certificate authority errors (caused by those deprecated variables).

Increasing the priority, as we have also seen this in a customer production environment where the etcd cluster was upgraded from 3.1 to 3.2; as a result, the etcd cluster broke. (Because etcd is not included in the "atomic-excluder" package, I'm re-opening BZ 1493034 for this.)

--- Additional comment from Scott Dodson on 2018-03-07 09:10:34 EST ---

If setting these two variables alleviates the problem, then please set them as a workaround for now.

ETCD_CLIENT_CERT_AUTH="true"
ETCD_PEER_CLIENT_CERT_AUTH="true"

--- Additional comment from Scott Dodson on 2018-03-12 14:53:41 EDT ---

We need to add all of the uncommented variables from https://github.com/openshift/openshift-ansible/commit/7c96c92cc3a71a8d00494b2e177afc3e130a58d4 during upgrades via the lineinfile module. Alternatively, we could re-evaluate the template, but that may be risky because the inputs to openshift-ansible may have changed or we may not evaluate all facts.
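As a rough model of the lineinfile approach, the sketch below mimics the documented behavior of Ansible's `lineinfile` module with `state=present` and `regexp`: the last matching line is replaced; if nothing matches, the line is appended at the end of the file. The helper names (`ensure_line`, `apply_required`) and the regexps, which also match commented-out `#KEY=` defaults, are assumptions for illustration, not the code merged for this bug.

```python
import re

# Hypothetical regexp -> desired line pairs for the settings named in this report.
REQUIRED = {
    r"^#?ETCD_CLIENT_CERT_AUTH=": 'ETCD_CLIENT_CERT_AUTH="true"',
    r"^#?ETCD_PEER_CLIENT_CERT_AUTH=": 'ETCD_PEER_CLIENT_CERT_AUTH="true"',
    r"^#?ETCD_TRUSTED_CA_FILE=": "ETCD_TRUSTED_CA_FILE=/etc/etcd/ca.crt",
    r"^#?ETCD_PEER_TRUSTED_CA_FILE=": "ETCD_PEER_TRUSTED_CA_FILE=/etc/etcd/ca.crt",
}


def ensure_line(text: str, regexp: str, line: str) -> str:
    """Replace the last line matching `regexp`, or append `line` if none match."""
    lines = text.splitlines()
    pattern = re.compile(regexp)
    matches = [i for i, current in enumerate(lines) if pattern.search(current)]
    if matches:
        lines[matches[-1]] = line
    else:
        lines.append(line)
    return "\n".join(lines) + "\n"


def apply_required(text: str) -> str:
    """Apply every required setting; running it twice changes nothing."""
    for regexp, line in REQUIRED.items():
        text = ensure_line(text, regexp, line)
    return text
```

Because each regexp matches both the old state and the desired line, a second run is a no-op, which is the idempotence property the lineinfile approach relies on.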

We should also audit the etcd 3.3 config file changes and get those in. We have no immediate plans to push customers to upgrade to etcd 3.3, but I imagine we'll roll that out in an erratum within the year.

--- Additional comment from Scott Dodson on 2018-03-29 09:43:47 EDT ---

Summarizing:

During 3.6 and later upgrades, we need to assert that the following configuration lines exist in /etc/etcd/etcd.conf:

ETCD_CLIENT_CERT_AUTH="true"
ETCD_PEER_CLIENT_CERT_AUTH="true"
ETCD_TRUSTED_CA_FILE=/etc/etcd/ca.crt
ETCD_PEER_TRUSTED_CA_FILE=/etc/etcd/ca.crt
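A post-upgrade check for the assertion above might look like this minimal Python sketch. `audit_etcd_conf` is a hypothetical name; it assumes exact string matches for the required lines and treats any uncommented `ETCD_CA_FILE=` or `ETCD_PEER_CA_FILE=` line as a leftover deprecated setting.

```python
from typing import List

# Lines that must be present after a 3.6+ upgrade, per the summary above.
REQUIRED_LINES = [
    'ETCD_CLIENT_CERT_AUTH="true"',
    'ETCD_PEER_CLIENT_CERT_AUTH="true"',
    "ETCD_TRUSTED_CA_FILE=/etc/etcd/ca.crt",
    "ETCD_PEER_TRUSTED_CA_FILE=/etc/etcd/ca.crt",
]

# Deprecated keys that should no longer be set.
DEPRECATED_KEYS = ["ETCD_CA_FILE", "ETCD_PEER_CA_FILE"]


def audit_etcd_conf(text: str) -> List[str]:
    """Return a list of problems; an empty list means the file looks correct."""
    lines = [line.strip() for line in text.splitlines()]
    problems = [f"missing: {req}" for req in REQUIRED_LINES if req not in lines]
    for key in DEPRECATED_KEYS:
        # Only uncommented assignments count; "#ETCD_CA_FILE=..." is ignored.
        if any(line.startswith(key + "=") for line in lines):
            problems.append(f"deprecated: {key}")
    return problems
```

Given the contents of /etc/etcd/etcd.conf, it returns an empty list when the file matches the summary above.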

We also need to backport the commit in comment 6 to release-3.6 to ensure that new 3.6 installs get the required configuration items.

--- Additional comment from Russell Teague on 2018-03-29 11:09:13 EDT ---

The commit in comment 6 is already backported in [1], so new 3.6 installs should set the correct values.


[1] https://github.com/openshift/openshift-ansible/pull/5424

--- Additional comment from Russell Teague on 2018-03-29 14:46:40 EDT ---

master: https://github.com/openshift/openshift-ansible/pull/7711

Comment 1 Russell Teague 2018-04-03 18:57:08 UTC
release-3.7: https://github.com/openshift/openshift-ansible/pull/7755

Comment 2 Russell Teague 2018-04-04 18:10:33 UTC
New release-3.7 PR: https://github.com/openshift/openshift-ansible/pull/7780

Comment 3 Russell Teague 2018-04-11 12:29:14 UTC
The commit is in build openshift-ansible-3.7.43-1.git.0.176ff8d.el7.

Comment 4 Gaoyun Pei 2018-04-16 10:48:55 UTC
1. Prepare an OpenShift v3.6.173.0.21 cluster with etcd-3.1.9-2 using openshift-ansible-3.6.173.0.21-2.git.0.44a4038.el7.noarch.rpm. Its etcd conf still contains the deprecated etcd options "ETCD_CA_FILE" and "ETCD_PEER_CA_FILE".


Run the 3.6 -> 3.7 upgrade using openshift-ansible-3.7.44-1.git.0.dbb912c.el7.noarch:

ansible-playbook -i host/host /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_7/upgrade.yml

After the upgrade finishes, check the etcd status.

[root@ip-172-18-3-39 ~]# ETCDCTL_API=3 etcdctl --cert /etc/etcd/peer.crt --key /etc/etcd/peer.key --cacert /etc/etcd/ca.crt --endpoints  https://`hostname`:2379  -w table endpoint status
+------------------------------------------+------------------+---------+---------+-----------+-----------+------------+
|                 ENDPOINT                 |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+------------------------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://ip-172-18-3-39.ec2.internal:2379 | cd8fe9e886d1558e |  3.2.15 |   11 MB |      true |         3 |      12152 |
+------------------------------------------+------------------+---------+---------+-----------+-----------+------------+

[root@ip-172-18-3-39 ~]# rpm -q etcd
etcd-3.2.15-2.el7.x86_64

2. Check the etcd conf file; it is the same as on a fresh 3.7 install.
The following values were added:
ETCD_QUOTA_BACKEND_BYTES=4294967296
ETCD_CLIENT_CERT_AUTH="true"
ETCD_PEER_CLIENT_CERT_AUTH="true"
ETCD_TRUSTED_CA_FILE=/etc/etcd/ca.crt
ETCD_PEER_TRUSTED_CA_FILE=/etc/etcd/ca.crt

The following were removed:
ETCD_CA_FILE
ETCD_PEER_CA_FILE

[root@ip-172-18-3-39 ~]# diff /etc/etcd/etcd.conf /etc/etcd/etcd.conf_back
27a28
> ETCD_CA_FILE=/etc/etcd/ca.crt
29a31
> ETCD_PEER_CA_FILE=/etc/etcd/ca.crt
34,38d35
< ETCD_QUOTA_BACKEND_BYTES=4294967296
< ETCD_CLIENT_CERT_AUTH="true"
< ETCD_PEER_CLIENT_CERT_AUTH="true"
< ETCD_TRUSTED_CA_FILE=/etc/etcd/ca.crt
< ETCD_PEER_TRUSTED_CA_FILE=/etc/etcd/ca.crt
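The diff above is line-oriented; comparing parsed keys gives the added and removed lists from step 2 directly. A minimal Python sketch, assuming both files are flat `KEY=VALUE` files with `#` comments; `parse_conf` and `key_changes` are hypothetical helpers that would be given the contents of /etc/etcd/etcd.conf_back and /etc/etcd/etcd.conf.

```python
from typing import Dict, List, Tuple


def parse_conf(text: str) -> Dict[str, str]:
    """Parse uncommented KEY=VALUE lines from an etcd.conf-style file."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and section markers like #[security]
        key, value = line.split("=", 1)
        settings[key] = value
    return settings


def key_changes(before: str, after: str) -> Tuple[List[str], List[str]]:
    """Return (added_keys, removed_keys) between a backup and the updated file."""
    old, new = parse_conf(before), parse_conf(after)
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    return added, removed
```

Comparing keys rather than lines ignores reordering, so only genuinely added or removed settings are reported.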


3. Create and delete a project.
[root@ip-172-18-12-110 ~]# oc new-project test1
Now using project "test1" on server "https://ip-172-18-12-110.ec2.internal:8443".

You can add applications to this project with the 'new-app' command. For example, try:

    oc new-app centos/ruby-22-centos7~https://github.com/openshift/ruby-ex.git

to build a new example application in Ruby.
[root@ip-172-18-12-110 ~]# oc delete project test1
project "test1" deleted

Comment 6 errata-xmlrpc 2018-06-27 07:59:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2009