Bug 1563375 - [3.6] Updating etcd does not update the etcd config with new variables
Summary: [3.6] Updating etcd does not update the etcd config with new variables
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.6.z
Assignee: Russell Teague
QA Contact: Gaoyun Pei
URL:
Whiteboard:
Depends On: 1529575 1563376 1567857
Blocks:
Reported: 2018-04-03 18:51 UTC by Russell Teague
Modified: 2018-06-07 08:40 UTC
CC List: 10 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
In certain cases, an existing etcd installation may not have had its configuration variables updated, causing services to fail. This fix verifies the etcd.conf file during upgrades to ensure that all variables are set as expected.
Clone Of: 1529575
Environment:
Last Closed: 2018-06-07 08:40:35 UTC
Target Upstream Version:



Links
System: Red Hat Product Errata; ID: RHBA-2018:1801; Priority: None; Status: None; Summary: None; Last Updated: 2018-06-07 08:40:49 UTC

Description Russell Teague 2018-04-03 18:51:28 UTC
+++ This bug was initially created as a clone of Bug #1529575 +++

Description of problem:

ETCD_CA_FILE is deprecated and replaced by ETCD_TRUSTED_CA_FILE
ETCD_PEER_CA_FILE is deprecated and replaced by ETCD_PEER_TRUSTED_CA_FILE


```
#[cluster]
ETCD_QUOTA_BACKEND_BYTES=4294967296

#[security]
ETCD_TRUSTED_CA_FILE=/etc/etcd/ca.crt
ETCD_PEER_TRUSTED_CA_FILE=/etc/etcd/ca.crt
ETCD_CLIENT_CERT_AUTH="true"
ETCD_PEER_CERT_AUTH="true"
```
https://coreos.com/etcd/docs/latest/v2/configuration.html#security-flags



Version-Release number of the following components:
3.6 and 3.7 migrate playbooks. 

Actual results:
/etc/etcd/etcd.conf is not updated


Expected results:

The following values should be added:
```
#[cluster]
ETCD_QUOTA_BACKEND_BYTES=4294967296

#[security]
ETCD_TRUSTED_CA_FILE=/etc/etcd/ca.crt
ETCD_PEER_TRUSTED_CA_FILE=/etc/etcd/ca.crt
ETCD_CLIENT_CERT_AUTH="true"
ETCD_PEER_CERT_AUTH="true"
```

The following should be removed:
```
ETCD_CA_FILE
ETCD_PEER_CA_FILE

```
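The deprecated-key half of the expected result above can be sketched in Python as a text transformation on the contents of etcd.conf. This is a minimal, hypothetical illustration, not the logic of the openshift-ansible fix: the function name `migrate_deprecated_keys` is made up, and carrying the old value over to the replacement key (only when the replacement is not already set) is an assumption. It does not add ETCD_QUOTA_BACKEND_BYTES or the cert-auth flags.

```python
import re

# Deprecated etcd.conf keys mapped to their replacements, as listed in this bug.
DEPRECATED_KEYS = {
    "ETCD_CA_FILE": "ETCD_TRUSTED_CA_FILE",
    "ETCD_PEER_CA_FILE": "ETCD_PEER_TRUSTED_CA_FILE",
}

# An uncommented KEY=value line; commented lines ("#...") never match.
ASSIGNMENT = re.compile(r"^\s*([A-Z0-9_]+)=(.*)$")


def migrate_deprecated_keys(conf_text: str) -> str:
    """Hypothetical helper: drop deprecated keys, keeping their values.

    Each deprecated line is removed. Its value is re-emitted under the
    replacement key unless that key is already set somewhere in the file.
    """
    lines = conf_text.splitlines()
    already_set = {m.group(1) for m in map(ASSIGNMENT.match, lines) if m}
    migrated = []
    for line in lines:
        match = ASSIGNMENT.match(line)
        if match and match.group(1) in DEPRECATED_KEYS:
            new_key = DEPRECATED_KEYS[match.group(1)]
            if new_key not in already_set:
                migrated.append(f"{new_key}={match.group(2)}")
                already_set.add(new_key)
            continue  # the deprecated line itself is always dropped
        migrated.append(line)
    return "\n".join(migrated) + "\n"


if __name__ == "__main__":
    before = (
        "ETCD_NAME=master1\n"
        "ETCD_CA_FILE=/etc/etcd/ca.crt\n"
        "ETCD_PEER_CA_FILE=/etc/etcd/ca.crt\n"
    )
    print(migrate_deprecated_keys(before), end="")
```

Because only the two deprecated keys are touched, running the sketch a second time on its own output leaves the text unchanged.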

--- Additional comment from Scott Dodson on 2018-01-02 09:29:19 EST ---

Is there actually any consequence of not updating them in etcd 3.2? That is, everything works without error as it currently stands, correct?

--- Additional comment from Ryan Howe on 2018-02-07 09:45:02 EST ---

(In reply to Scott Dodson from comment #1)
> Is there actually any consequence of not updating them in etcd 3.2?
> That is, everything works without error as it currently stands, correct?

Everything works now, but we need to make sure that these values get set. If we leave them unchanged, are you 100% sure that a future update of etcd will not cause issues because they are not set?

When we do an update or migration, we must make sure all installs are configured with the same required variables. Otherwise, a future update might cause a production outage because this step was missed.

--- Additional comment from Ryan Howe on 2018-02-07 10:18:31 EST ---

Without setting the following to true:

ETCD_CLIENT_CERT_AUTH="true"
ETCD_PEER_CLIENT_CERT_AUTH="true"

the code defaults both to false. I'm not sure what the impact of this is.

https://github.com/coreos/etcd/blob/master/etcdmain/config.go#L180
https://github.com/coreos/etcd/blob/master/etcdmain/config.go#L187


Also a correction to the above. 

ETCD_PEER_CERT_AUTH in the comments above should be ETCD_PEER_CLIENT_CERT_AUTH.
*The 3.6 and 3.7 installers set this correctly; it was just my typo.

--- Additional comment from Nicolas Nosenzo on 2018-03-07 05:38:11 EST ---

I just tested this by running a plain "yum update" on a 3.5 cluster running etcd 3.1, and it crashed while logging messages about certificate authority issues (caused by those deprecated variables).

Increasing the priority, as we have also seen this in a customer production environment where the etcd cluster was upgraded from 3.1 to 3.2 and broke as a result. (As etcd is not included in the "atomic-excluder" package, I'm re-opening BZ 1493034 for this.)

--- Additional comment from Scott Dodson on 2018-03-07 09:10:34 EST ---

If setting these two variables alleviates the problem, please set them as a workaround for now.

ETCD_CLIENT_CERT_AUTH="true"
ETCD_PEER_CLIENT_CERT_AUTH="true"

--- Additional comment from Scott Dodson on 2018-03-12 14:53:41 EDT ---

During upgrades, we need to add all of the uncommented variables from https://github.com/openshift/openshift-ansible/commit/7c96c92cc3a71a8d00494b2e177afc3e130a58d4 using the lineinfile module, or potentially re-evaluate the template instead. The latter may be risky, because the inputs to openshift-ansible may have changed or we may not evaluate all facts.

We should also go ahead and audit the etcd 3.3 config file changes and get those in. We have no immediate plans to push customers to upgrade to etcd 3.3, but I imagine we'll roll that out in an errata within the year.

--- Additional comment from Scott Dodson on 2018-03-29 09:43:47 EDT ---

Summarizing:

During upgrades to 3.6 and later, we need to assert that the following configuration lines exist in /etc/etcd/etcd.conf:

ETCD_CLIENT_CERT_AUTH="true"
ETCD_PEER_CLIENT_CERT_AUTH="true"
ETCD_TRUSTED_CA_FILE=/etc/etcd/ca.crt
ETCD_PEER_TRUSTED_CA_FILE=/etc/etcd/ca.crt

We also need to backport the commit in comment 6 to release-3.6 to ensure that new installs of 3.6 get the required configuration items.
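The assertion described in the summary above is an idempotent "make sure this line is present" step, which is what Ansible's lineinfile module (mentioned earlier in the thread) provides. The Python sketch below imitates that behavior on the file contents; it is an illustration under stated assumptions, not the code merged in the PRs that follow. The helper name `ensure_lines` is hypothetical, and overwriting a commented-out `#KEY=` line is an assumption about how one might handle such lines.

```python
import re
from typing import Dict

# Settings that, per the summary above, must exist in /etc/etcd/etcd.conf.
REQUIRED: Dict[str, str] = {
    "ETCD_CLIENT_CERT_AUTH": '"true"',
    "ETCD_PEER_CLIENT_CERT_AUTH": '"true"',
    "ETCD_TRUSTED_CA_FILE": "/etc/etcd/ca.crt",
    "ETCD_PEER_TRUSTED_CA_FILE": "/etc/etcd/ca.crt",
}


def ensure_lines(conf_text: str, required: Dict[str, str]) -> str:
    """Hypothetical helper: make sure every KEY=value in `required` is set.

    For each key, the last line matching ^#?KEY= (commented out or not)
    is overwritten; if no line matches, KEY=value is appended.
    """
    lines = conf_text.splitlines()
    for key, value in required.items():
        wanted = f"{key}={value}"
        # Anchored at line start, so ETCD_CLIENT_CERT_AUTH does not match
        # ETCD_PEER_CLIENT_CERT_AUTH.
        pattern = re.compile(rf"^#?{re.escape(key)}=")
        matches = [i for i, line in enumerate(lines) if pattern.match(line)]
        if matches:
            lines[matches[-1]] = wanted
        else:
            lines.append(wanted)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    before = 'ETCD_NAME=master1\n#ETCD_CLIENT_CERT_AUTH="false"\n'
    print(ensure_lines(before, REQUIRED), end="")
```

As with lineinfile and a regexp, only the last matching line is rewritten, so applying the sketch twice produces the same file.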

--- Additional comment from Russell Teague on 2018-03-29 11:09:13 EDT ---

The commit in comment 6 was already backported in [1], so new installs of 3.6 should set the correct values.


[1] https://github.com/openshift/openshift-ansible/pull/5424

--- Additional comment from Russell Teague on 2018-03-29 14:46:40 EDT ---

master: https://github.com/openshift/openshift-ansible/pull/7711

Comment 1 Russell Teague 2018-04-03 18:56:25 UTC
release-3.6: https://github.com/openshift/openshift-ansible/pull/7756

Comment 2 Russell Teague 2018-04-04 18:09:45 UTC
New release-3.6 PR: https://github.com/openshift/openshift-ansible/pull/7781

Comment 3 Russell Teague 2018-04-12 12:52:38 UTC
Fixed in openshift-ansible-3.6.173.0.113-1.git.0.8a42ef5.el7

Comment 4 Gaoyun Pei 2018-04-27 07:30:44 UTC
Verified this bug with openshift-ansible-3.6.173.0.113-1.git.1.8eaab14.el7.noarch

Set up an OCP 3.5 cluster with etcd-3.1.9-2.el7.x86_64 installed as an external etcd.
Upgraded it to 3.6; etcd was also updated to etcd-3.2.15-2.el7.x86_64.

[root@ip-172-18-1-165 ~]# etcdctl --cert-file /etc/etcd/peer.crt --key-file /etc/etcd/peer.key --ca-file /etc/etcd/ca.crt --endpoints https://172.18.1.165:2379 cluster-health
member 8e9e05c52164694d is healthy: got healthy result from https://172.18.1.165:2379
cluster is healthy

Checked the etcd config file /etc/etcd/etcd.conf.
The following values were added:
ETCD_QUOTA_BACKEND_BYTES=4294967296
ETCD_CLIENT_CERT_AUTH="true"
ETCD_PEER_CLIENT_CERT_AUTH="true"
ETCD_TRUSTED_CA_FILE=/etc/etcd/ca.crt
ETCD_PEER_TRUSTED_CA_FILE=/etc/etcd/ca.crt

The following were removed:
ETCD_CA_FILE
ETCD_PEER_CA_FILE

Comment 6 errata-xmlrpc 2018-06-07 08:40:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1801

