Bug 1529575 - [3.9] Updating etcd does not update the etcd config with new variables
Summary: [3.9] Updating etcd does not update the etcd config with new variables
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.9.z
Assignee: Russell Teague
QA Contact: liujia
URL:
Whiteboard:
Duplicates: 1559876
Depends On:
Blocks: 1563375 1563376
Reported: 2017-12-28 18:37 UTC by Ryan Howe
Modified: 2018-06-18 18:20 UTC
CC List: 10 users

Fixed In Version: openshift-ansible-3.9.24-1.git.0.d0289ea.el7.noarch
Doc Type: Bug Fix
Doc Text:
In certain cases, an existing etcd installation may not have updated configuration variables, causing services to fail. The etcd.conf file is now verified during upgrades to ensure all variables are set as expected.
Clone Of:
Clones: 1563375 1563376
Environment:
Last Closed: 2018-06-18 18:20:45 UTC
Target Upstream Version:
Embargoed:



Links
System: Red Hat Product Errata | ID: RHSA-2018:2013 | Priority: normal | Status: SHIPPED_LIVE | Summary: Important: OpenShift Container Platform 3.9 security, bug fix, and enhancement update | Last Updated: 2018-06-27 22:01:43 UTC

Description Ryan Howe 2017-12-28 18:37:28 UTC
Description of problem:

ETCD_CA_FILE is deprecated and replaced by ETCD_TRUSTED_CA_FILE
ETCD_PEER_CA_FILE is deprecated and replaced by ETCD_PEER_TRUSTED_CA_FILE


```
#[cluster]
ETCD_QUOTA_BACKEND_BYTES=4294967296

#[security]
ETCD_TRUSTED_CA_FILE=/etc/etcd/ca.crt
ETCD_PEER_TRUSTED_CA_FILE=/etc/etcd/ca.crt
ETCD_CLIENT_CERT_AUTH="true"
ETCD_PEER_CERT_AUTH="true"
```
https://coreos.com/etcd/docs/latest/v2/configuration.html#security-flags
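Below is a minimal Python sketch for spotting the two deprecated keys in an existing etcd.conf. It assumes etcd.conf is a flat list of `KEY=VALUE` lines with `#` comments, as in the excerpt above, and that optional double quotes around a value can be stripped. The names `parse_etcd_conf`, `find_deprecated` and `DEPRECATED_KEYS` are hypothetical. The mapping covers only the two renames listed in this report.

```python
from typing import Dict, List, Tuple

# Deprecated etcd.conf keys and their replacements, as listed in this report.
DEPRECATED_KEYS: Dict[str, str] = {
    "ETCD_CA_FILE": "ETCD_TRUSTED_CA_FILE",
    "ETCD_PEER_CA_FILE": "ETCD_PEER_TRUSTED_CA_FILE",
}


def parse_etcd_conf(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and '#' comments (assumed format)."""
    conf: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop optional surrounding double quotes, e.g. "true" -> true.
        conf[key.strip()] = value.strip().strip('"')
    return conf


def find_deprecated(conf: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (deprecated_key, replacement_key) pairs present in the config."""
    return [(old, new) for old, new in DEPRECATED_KEYS.items() if old in conf]


if __name__ == "__main__":
    sample = """#[security]
ETCD_CA_FILE=/etc/etcd/ca.crt
ETCD_PEER_CA_FILE=/etc/etcd/ca.crt
ETCD_CLIENT_CERT_AUTH="true"
"""
    for old, new in find_deprecated(parse_etcd_conf(sample)):
        print(f"{old} is deprecated; use {new}")
```

This check only reports the keys. It does not edit the file.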



Version-Release number of the following components:
3.6 and 3.7 migrate playbooks. 

Actual results:
/etc/etcd/etcd.conf is not updated


Expected results:

The following values are added:
```
#[cluster]
ETCD_QUOTA_BACKEND_BYTES=4294967296

#[security]
ETCD_TRUSTED_CA_FILE=/etc/etcd/ca.crt
ETCD_PEER_TRUSTED_CA_FILE=/etc/etcd/ca.crt
ETCD_CLIENT_CERT_AUTH="true"
ETCD_PEER_CERT_AUTH="true"
```

The following values are removed:
```
ETCD_CA_FILE
ETCD_PEER_CA_FILE

```

Comment 1 Scott Dodson 2018-01-02 14:29:19 UTC
Is there actually any consequence of not updating them in version 3.2 of etcd? ... as in everything works without error as it is currently, correct?

Comment 2 Ryan Howe 2018-02-07 14:45:02 UTC
(In reply to Scott Dodson from comment #1)
> Is there actually any consequence of not updating them in version 3.2 of
> etcd? ... as in everything works without error as it is currently, correct?

Everything works now, but we need to make sure that these values get set. If we leave them unchanged, are you 100% sure that a future update of etcd will not cause issues because they are not set?

When we do an update or migration, we must make sure all installs are configured with the same required variables. Otherwise, a future update might cause a production outage because this step was missed.

Comment 3 Ryan Howe 2018-02-07 15:18:31 UTC
Without setting the following to true:

ETCD_CLIENT_CERT_AUTH="true"
ETCD_PEER_CLIENT_CERT_AUTH="true"

The code defaults to false. I'm not sure what the impact of this is.

https://github.com/coreos/etcd/blob/master/etcdmain/config.go#L180
https://github.com/coreos/etcd/blob/master/etcdmain/config.go#L187


Also a correction to the above. 

ETCD_PEER_CERT_AUTH in the comments above should be ETCD_PEER_CLIENT_CERT_AUTH.
*The 3.6 and 3.7 installers set this correctly; it's just my typo.

Comment 4 Nicolas Nosenzo 2018-03-07 10:38:11 UTC
I just tested this by running a plain "yum update" on a 3.5 cluster running etcd 3.1, and it crashed while throwing messages about certificate authority issues (caused by those deprecated variables).

Increasing the priority, as we have also seen this in a customer production environment where the etcd cluster was upgraded from 3.1 to 3.2 and broke as a result. (Because etcd is not included in the "atomic-excluder" package, I'm reopening BZ 1493034 for this.)

Comment 5 Scott Dodson 2018-03-07 14:10:34 UTC
If setting these two variables alleviates the problem then please set them as a workaround for now.

ETCD_CLIENT_CERT_AUTH="true"
ETCD_PEER_CLIENT_CERT_AUTH="true"

Comment 6 Scott Dodson 2018-03-12 18:53:41 UTC
We need to add all of the uncommented variables from https://github.com/openshift/openshift-ansible/commit/7c96c92cc3a71a8d00494b2e177afc3e130a58d4 during upgrades via the lineinfile module. Alternatively, we could re-evaluate the template, but that may be risky because the inputs to openshift-ansible may have changed or we may not evaluate all facts.

We should also audit the etcd 3.3 config file changes and get those in. We have no immediate plans to push customers to upgrade to etcd 3.3, but I imagine we'll roll that out in an errata within the year.
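Comment 6 suggests adding the variables with Ansible's lineinfile module. The Python sketch below only approximates that approach, loosely in the spirit of lineinfile's `regexp`/`line` parameters. It uses a hypothetical helper, `ensure_setting`, which rewrites any active line for the key in place or appends a new line if none exists. Running it twice produces the same text, which is the idempotence an upgrade task needs. This is not openshift-ansible's actual implementation.

```python
import re


def ensure_setting(text: str, key: str, value: str) -> str:
    """Set KEY=VALUE in etcd.conf-style text, replacing or appending (hypothetical helper)."""
    # Match an active (uncommented) line for this key, e.g. "ETCD_CLIENT_CERT_AUTH=...".
    pattern = re.compile(rf"^{re.escape(key)}=.*$", re.MULTILINE)
    new_line = f"{key}={value}"
    if pattern.search(text):
        # Rewrite existing line(s) in place; a lambda avoids backslash handling in value.
        return pattern.sub(lambda _m: new_line, text)
    # No existing line: append, making sure the text ends with a newline first.
    if text and not text.endswith("\n"):
        text += "\n"
    return text + new_line + "\n"


if __name__ == "__main__":
    conf = "ETCD_CA_FILE=/etc/etcd/ca.crt\nETCD_CLIENT_CERT_AUTH=false\n"
    conf = ensure_setting(conf, "ETCD_CLIENT_CERT_AUTH", '"true"')
    conf = ensure_setting(conf, "ETCD_TRUSTED_CA_FILE", "/etc/etcd/ca.crt")
    # Applying the same change again leaves the text unchanged.
    assert ensure_setting(conf, "ETCD_TRUSTED_CA_FILE", "/etc/etcd/ca.crt") == conf
    print(conf, end="")
```

Commented-out lines such as `#ETCD_TRUSTED_CA_FILE=` are deliberately left alone, and the active setting is appended after them.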

Comment 7 Scott Dodson 2018-03-29 13:43:47 UTC
Summarizing:

During upgrades to 3.6 and later, we need to assert that the following configuration lines exist in /etc/etcd/etcd.conf:

ETCD_CLIENT_CERT_AUTH="true"
ETCD_PEER_CLIENT_CERT_AUTH="true"
ETCD_TRUSTED_CA_FILE=/etc/etcd/ca.crt
ETCD_PEER_TRUSTED_CA_FILE=/etc/etcd/ca.crt
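Here is a sketch of the assertion described in this comment. It assumes the same flat `KEY=VALUE` format and treats quoted and unquoted values (`"true"` vs `true`) as equivalent, since both forms appear in this report. `REQUIRED` and `missing_settings` are hypothetical names. An absent key is reported as missing, because etcd would then fall back to its default, which comment 3 notes is false for the cert-auth flags.

```python
from typing import Dict, List

# Lines that must exist in /etc/etcd/etcd.conf after a 3.6+ upgrade (per this comment).
REQUIRED: Dict[str, str] = {
    "ETCD_CLIENT_CERT_AUTH": "true",
    "ETCD_PEER_CLIENT_CERT_AUTH": "true",
    "ETCD_TRUSTED_CA_FILE": "/etc/etcd/ca.crt",
    "ETCD_PEER_TRUSTED_CA_FILE": "/etc/etcd/ca.crt",
}


def missing_settings(text: str) -> List[str]:
    """Return required keys that are absent or set to an unexpected value."""
    conf: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            # Treat "true" and true as the same value.
            conf[key.strip()] = value.strip().strip('"')
    # conf.get returns None for an absent key, so it is reported as well.
    return [k for k, v in REQUIRED.items() if conf.get(k) != v]


if __name__ == "__main__":
    old_conf = "ETCD_CA_FILE=/etc/etcd/ca.crt\nETCD_PEER_CA_FILE=/etc/etcd/ca.crt\n"
    print(missing_settings(old_conf))
```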

We also need to backport the commit from comment 6 to release-3.6 to ensure that new installs of 3.6 get the required configuration items.

Comment 8 Russell Teague 2018-03-29 15:09:13 UTC
The commit from comment 6 is already backported in [1], so new installs of 3.6 should set the correct values.


[1] https://github.com/openshift/openshift-ansible/pull/5424

Comment 9 Russell Teague 2018-03-29 18:46:40 UTC
master: https://github.com/openshift/openshift-ansible/pull/7711

Comment 10 Russell Teague 2018-04-03 18:55:23 UTC
release-3.9: https://github.com/openshift/openshift-ansible/pull/7754

Comment 11 Russell Teague 2018-04-11 12:26:33 UTC
Commit is in build e1f1eda4e1e3938a55a5172d89664facd2ca4ca4

Comment 12 Scott Dodson 2018-04-16 20:54:37 UTC
*** Bug 1559876 has been marked as a duplicate of this bug. ***

Comment 13 liujia 2018-04-17 01:44:39 UTC
Verification is blocked by bz1566435.

Comment 14 liujia 2018-04-20 08:14:04 UTC
Version:
openshift-ansible-3.9.24-1.git.0.d0289ea.el7.noarch

Steps:
1. Install OCP v3.7 with etcd-3.1.9.
2. Because the freshly installed etcd.conf already contains the latest configuration, edit etcd.conf to reproduce the old settings from the description:
update:
ETCD_CA_FILE=/etc/etcd/ca.crt
ETCD_PEER_CA_FILE=/etc/etcd/ca.crt
remove:
ETCD_QUOTA_BACKEND_BYTES=4294967296
ETCD_CLIENT_CERT_AUTH=true
ETCD_PEER_CLIENT_CERT_AUTH=true
3. Restart the etcd service to ensure etcd works with the updated etcd.conf.
4. Run the upgrade against the above OCP cluster.

etcd was upgraded successfully, and the etcd config was updated:
ETCD_QUOTA_BACKEND_BYTES=4294967296
ETCD_CLIENT_CERT_AUTH=true
ETCD_PEER_CLIENT_CERT_AUTH=true
ETCD_TRUSTED_CA_FILE=/etc/etcd/ca.crt
ETCD_PEER_TRUSTED_CA_FILE=/etc/etcd/ca.crt

