Bug 1563376 - [3.7] Updating etcd does not update the etcd config with new variables
Summary: [3.7] Updating etcd does not update the etcd config with new variables
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.7.z
Assignee: Russell Teague
QA Contact: Gaoyun Pei
URL:
Whiteboard:
Depends On: 1529575
Blocks: 1563375
Reported: 2018-04-03 18:53 UTC by Russell Teague
Modified: 2018-06-27 07:59 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
In certain cases, an existing etcd installation may not have updated configuration variables, causing services to fail. This fix verifies the etcd.conf file during upgrades to ensure all variables are set as expected.
Clone Of: 1529575
Environment:
Last Closed: 2018-06-27 07:59:12 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:2009 0 None None None 2018-06-27 07:59:48 UTC

Description Russell Teague 2018-04-03 18:53:20 UTC
+++ This bug was initially created as a clone of Bug #1529575 +++

Description of problem:

ETCD_CA_FILE is deprecated and replaced by ETCD_TRUSTED_CA_FILE
ETCD_PEER_CA_FILE is deprecated and replaced by ETCD_PEER_TRUSTED_CA_FILE


```
#[cluster]
ETCD_QUOTA_BACKEND_BYTES=4294967296

#[security]
ETCD_TRUSTED_CA_FILE=/etc/etcd/ca.crt
ETCD_PEER_TRUSTED_CA_FILE=/etc/etcd/ca.crt
ETCD_CLIENT_CERT_AUTH="true"
ETCD_PEER_CERT_AUTH="true"
```
https://coreos.com/etcd/docs/latest/v2/configuration.html#security-flags



Version-Release number of the following components:
3.6 and 3.7 migrate playbooks. 

Actual results:
/etc/etcd/etcd.conf is not updated


Expected results:

The following values added: 
```
#[cluster]
ETCD_QUOTA_BACKEND_BYTES=4294967296

#[security]
ETCD_TRUSTED_CA_FILE=/etc/etcd/ca.crt
ETCD_PEER_TRUSTED_CA_FILE=/etc/etcd/ca.crt
ETCD_CLIENT_CERT_AUTH="true"
ETCD_PEER_CERT_AUTH="true"
```

The following removed:
```
ETCD_CA_FILE
ETCD_PEER_CA_FILE

```
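
The expected change can be sketched as a small text transformation. This is only an illustration under stated assumptions, not the installer's code: `migrate_deprecated` and `RENAMES` are hypothetical names, and the sketch assumes each deprecated variable's value should carry over unchanged to its replacement.

```python
import re

# Deprecated name -> replacement, as listed in the description above.
RENAMES = {
    "ETCD_CA_FILE": "ETCD_TRUSTED_CA_FILE",
    "ETCD_PEER_CA_FILE": "ETCD_PEER_TRUSTED_CA_FILE",
}

# Matches active (uncommented) KEY=value assignments only.
ASSIGNMENT = re.compile(r"^(?P<key>[A-Z0-9_]+)=(?P<value>.*)$")


def migrate_deprecated(text):
    """Rename deprecated variables in etcd.conf text, keeping their values.

    Hypothetical helper for illustration. Commented-out lines and
    unrelated settings pass through unchanged.
    """
    lines = text.splitlines()
    # First pass: record which variables are already actively set.
    present = set()
    for line in lines:
        m = ASSIGNMENT.match(line)
        if m:
            present.add(m.group("key"))
    # Second pass: rename deprecated variables, or drop them when the
    # replacement is already set so the file never defines both.
    out = []
    for line in lines:
        m = ASSIGNMENT.match(line)
        if m and m.group("key") in RENAMES:
            new_key = RENAMES[m.group("key")]
            if new_key in present:
                continue
            line = f"{new_key}={m.group('value')}"
        out.append(line)
    return "\n".join(out) + "\n"
```

Running the function a second time on its own output leaves the text unchanged, which is the property an upgrade step needs if it may run more than once.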

--- Additional comment from Scott Dodson on 2018-01-02 09:29:19 EST ---

Is there actually any consequence of not updating them in version 3.2 of etcd? ... as in everything works without error as it is currently, correct?

--- Additional comment from Ryan Howe on 2018-02-07 09:45:02 EST ---

(In reply to Scott Dodson from comment #1)
> Is there actually any consequence of not updating them in version 3.2 of
> etcd? ... as in everything works without error as it is currently, correct?

Everything works now, but we need to make sure that these values get set. If we leave these unchanged, are you 100% sure that a future update of etcd will not result in issues due to these not getting set?

When we do an update or migration, we must make sure all installs are configured with the same required variables. If not, a future update might cause a production outage due to missing this step.

--- Additional comment from Ryan Howe on 2018-02-07 10:18:31 EST ---

Without setting the following to true:

ETCD_CLIENT_CERT_AUTH="true"
ETCD_PEER_CLIENT_CERT_AUTH="true"

The code defaults to false. Not sure what the impact of this is. 

https://github.com/coreos/etcd/blob/master/etcdmain/config.go#L180
https://github.com/coreos/etcd/blob/master/etcdmain/config.go#L187


Also a correction to the above. 

 ETCD_PEER_CERT_AUTH in the comments above should be ETCD_PEER_CLIENT_CERT_AUTH
       *The 3.6 and 3.7 installer sets this correctly; it's just my typo.

--- Additional comment from Nicolas Nosenzo on 2018-03-07 05:38:11 EST ---

I just tested this by running a plain "yum update" in a 3.5 cluster running etcd 3.1, and it crashed while throwing messages about certificate authority issues (caused by those deprecated variables).

Increasing the priority, as we have also seen this in a customer production environment where the etcd cluster was upgraded from 3.1 to 3.2 and broke as a result. (As etcd is not included within the "atomic-excluder" package, I'm re-opening BZ 1493034 for this.)

--- Additional comment from Scott Dodson on 2018-03-07 09:10:34 EST ---

If setting these two variables alleviates the problem then please set them as a workaround for now.

ETCD_CLIENT_CERT_AUTH="true"
ETCD_PEER_CLIENT_CERT_AUTH="true"

--- Additional comment from Scott Dodson on 2018-03-12 14:53:41 EDT ---

We need to add all of the uncommented variables from https://github.com/openshift/openshift-ansible/commit/7c96c92cc3a71a8d00494b2e177afc3e130a58d4 during upgrades via the lineinfile module. Alternatively, we could re-evaluate the template, but that may be risky, as the inputs to openshift-ansible may have changed or we may not evaluate all facts.

We should also go ahead and audit 3.3 config file changes and get those in. We have no immediate plans to push customers to upgrade to 3.3 but I imagine we'll roll that out in an errata within the year.
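
For readers unfamiliar with the lineinfile approach, the following is a rough Python stand-in for what such a task does to the file: replace the last line matching a pattern, or append the line if nothing matches. It is a sketch for illustration only, not Ansible's implementation and not the change made in the linked commit; `ensure_line` is a hypothetical name.

```python
import re


def ensure_line(text, key, line):
    """Ensure `line` is present as the assignment of `key` in `text`.

    Loosely modelled on a lineinfile task with a regexp: the last line
    starting with `key=` is replaced, otherwise `line` is appended.
    Hypothetical helper; commented-out lines are never matched.
    """
    pattern = re.compile(r"^" + re.escape(key) + r"=")
    lines = text.splitlines()
    matches = [i for i, existing in enumerate(lines) if pattern.match(existing)]
    if matches:
        lines[matches[-1]] = line
    else:
        lines.append(line)
    return "\n".join(lines) + "\n"
```

The appeal of this approach over re-rendering the whole template is that it touches only the named variables and leaves every other line of an existing etcd.conf as the administrator left it.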

--- Additional comment from Scott Dodson on 2018-03-29 09:43:47 EDT ---

Summarizing:

During 3.6 and later upgrades we need to assert that the following configuration lines exist in /etc/etcd/etcd.conf

ETCD_CLIENT_CERT_AUTH="true"
ETCD_PEER_CLIENT_CERT_AUTH="true"
ETCD_TRUSTED_CA_FILE=/etc/etcd/ca.crt
ETCD_PEER_TRUSTED_CA_FILE=/etc/etcd/ca.crt

Also need to backport the commit in comment 6 to release-3.6 to ensure that new installs of 3.6 get the required configuration items.
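
The assertion summarized above can be expressed as a small check that reports which of the four settings are absent or set to a different value. This is a minimal sketch, assuming etcd.conf uses plain KEY=value lines with optional double quotes; `REQUIRED`, `parse_conf`, and `missing_settings` are hypothetical names, not part of openshift-ansible.

```python
# The four settings listed in the summary above.
REQUIRED = {
    "ETCD_CLIENT_CERT_AUTH": "true",
    "ETCD_PEER_CLIENT_CERT_AUTH": "true",
    "ETCD_TRUSTED_CA_FILE": "/etc/etcd/ca.crt",
    "ETCD_PEER_TRUSTED_CA_FILE": "/etc/etcd/ca.crt",
}


def parse_conf(text):
    """Return active KEY=value settings, ignoring comments and blank lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Surrounding double quotes are dropped so "true" equals true.
        settings[key.strip()] = value.strip().strip('"')
    return settings


def missing_settings(text, required=REQUIRED):
    """List required keys that are absent or hold an unexpected value."""
    settings = parse_conf(text)
    return sorted(k for k, v in required.items() if settings.get(k) != v)
```

An empty result means the file already satisfies the assertion; a config that only carries the deprecated ETCD_CA_FILE and ETCD_PEER_CA_FILE variables reports all four keys as missing.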

--- Additional comment from Russell Teague on 2018-03-29 11:09:13 EDT ---

Commit in comment 6 is already backported in [1] so new installs of 3.6 should set the correct values.


[1] https://github.com/openshift/openshift-ansible/pull/5424

--- Additional comment from Russell Teague on 2018-03-29 14:46:40 EDT ---

master: https://github.com/openshift/openshift-ansible/pull/7711

Comment 1 Russell Teague 2018-04-03 18:57:08 UTC
release-3.7: https://github.com/openshift/openshift-ansible/pull/7755

Comment 2 Russell Teague 2018-04-04 18:10:33 UTC
New release-3.7 PR: https://github.com/openshift/openshift-ansible/pull/7780

Comment 3 Russell Teague 2018-04-11 12:29:14 UTC
Commit is in build openshift-ansible-3.7.43-1.git.0.176ff8d.el7

Comment 4 Gaoyun Pei 2018-04-16 10:48:55 UTC
1. Prepare an OpenShift v3.6.173.0.21 cluster with etcd-3.1.9-2 using openshift-ansible-3.6.173.0.21-2.git.0.44a4038.el7.noarch.rpm, which still has the deprecated etcd options "ETCD_CA_FILE" and "ETCD_PEER_CA_FILE" in etcd.conf.


Run 3.6 -> 3.7 upgrade using openshift-ansible-3.7.44-1.git.0.dbb912c.el7.noarch.

ansible-playbook -i host/host /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_7/upgrade.yml

After the upgrade finishes, check etcd status.

[root@ip-172-18-3-39 ~]# ETCDCTL_API=3 etcdctl --cert /etc/etcd/peer.crt --key /etc/etcd/peer.key --cacert /etc/etcd/ca.crt --endpoints  https://`hostname`:2379  -w table endpoint status
+------------------------------------------+------------------+---------+---------+-----------+-----------+------------+
|                 ENDPOINT                 |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+------------------------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://ip-172-18-3-39.ec2.internal:2379 | cd8fe9e886d1558e |  3.2.15 |   11 MB |      true |         3 |      12152 |
+------------------------------------------+------------------+---------+---------+-----------+-----------+------------+

[root@ip-172-18-3-39 ~]# rpm -q etcd
etcd-3.2.15-2.el7.x86_64

2. Check the etcd conf file; it is the same as on a fresh 3.7 install.
The following values added: 
ETCD_QUOTA_BACKEND_BYTES=4294967296
ETCD_CLIENT_CERT_AUTH="true"
ETCD_PEER_CLIENT_CERT_AUTH="true"
ETCD_TRUSTED_CA_FILE=/etc/etcd/ca.crt
ETCD_PEER_TRUSTED_CA_FILE=/etc/etcd/ca.crt

The following removed:
ETCD_CA_FILE
ETCD_PEER_CA_FILE

[root@ip-172-18-3-39 ~]# diff /etc/etcd/etcd.conf /etc/etcd/etcd.conf_back
27a28
> ETCD_CA_FILE=/etc/etcd/ca.crt
29a31
> ETCD_PEER_CA_FILE=/etc/etcd/ca.crt
34,38d35
< ETCD_QUOTA_BACKEND_BYTES=4294967296
< ETCD_CLIENT_CERT_AUTH="true"
< ETCD_PEER_CLIENT_CERT_AUTH="true"
< ETCD_TRUSTED_CA_FILE=/etc/etcd/ca.crt
< ETCD_PEER_TRUSTED_CA_FILE=/etc/etcd/ca.crt
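
The same comparison can be done by variable name rather than by line, which is easier to read when line positions shift. A minimal sketch, assuming both files are plain KEY=value text; `active_keys` and `key_changes` are hypothetical names used only for this illustration.

```python
def active_keys(text):
    """Return the set of variable names actively set in etcd.conf text."""
    keys = set()
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#") and "=" in line:
            keys.add(line.split("=", 1)[0])
    return keys


def key_changes(before, after):
    """Report variable names added to or removed from the config."""
    old, new = active_keys(before), active_keys(after)
    return {"added": sorted(new - old), "removed": sorted(old - new)}
```

Calling `key_changes` with the contents of /etc/etcd/etcd.conf_back as `before` and /etc/etcd/etcd.conf as `after` would list the five new variables under "added" and the two deprecated ones under "removed", assuming the files match the diff output above.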


3. Create and delete project
[root@ip-172-18-12-110 ~]# oc new-project test1
Now using project "test1" on server "https://ip-172-18-12-110.ec2.internal:8443".

You can add applications to this project with the 'new-app' command. For example, try:

    oc new-app centos/ruby-22-centos7~https://github.com/openshift/ruby-ex.git

to build a new example application in Ruby.
[root@ip-172-18-12-110 ~]# oc delete project test1
project "test1" deleted

Comment 6 errata-xmlrpc 2018-06-27 07:59:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2009

