Bug 1672518 - Add etcd-3.3 to yum.conf exclude line
Summary: Add etcd-3.3 to yum.conf exclude line
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Scott Dodson
QA Contact: Johnny Liu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-02-05 08:28 UTC by Brendan Mchugh
Modified: 2019-02-18 17:06 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-02-07 16:16:16 UTC
Target Upstream Version:
Embargoed:



Description Brendan Mchugh 2019-02-05 08:28:46 UTC
Description of problem:
Since the release of the etcd-3.3.11-2.el7.x86_64 package in the "rhel-7-server-extras-rpms" repo, this version needs to be excluded in yum.conf, as is already done for incompatible docker versions by "atomic-openshift-docker-excluder".

Version-Release number of the following components:
$ rpm -qa | grep -e ansible -e openshift -e etcd | sort
ansible-2.4.2.0-2.el7.noarch
atomic-openshift-3.7.72-1.git.0.925b9cd.el7.x86_64
atomic-openshift-clients-3.7.72-1.git.0.925b9cd.el7.x86_64
atomic-openshift-docker-excluder-3.7.72-1.git.0.925b9cd.el7.noarch
atomic-openshift-excluder-3.7.72-1.git.0.925b9cd.el7.noarch
atomic-openshift-master-3.7.72-1.git.0.925b9cd.el7.x86_64
atomic-openshift-node-3.7.72-1.git.0.925b9cd.el7.x86_64
atomic-openshift-sdn-ovs-3.7.72-1.git.0.925b9cd.el7.x86_64
atomic-openshift-utils-3.7.72-1.git.0.5c45a8a.el7.noarch
etcd-3.3.11-2.el7.x86_64
openshift-ansible-3.7.72-1.git.0.5c45a8a.el7.noarch
openshift-ansible-callback-plugins-3.7.72-1.git.0.5c45a8a.el7.noarch
openshift-ansible-docs-3.7.72-1.git.0.5c45a8a.el7.noarch
openshift-ansible-filter-plugins-3.7.72-1.git.0.5c45a8a.el7.noarch
openshift-ansible-lookup-plugins-3.7.72-1.git.0.5c45a8a.el7.noarch
openshift-ansible-playbooks-3.7.72-1.git.0.5c45a8a.el7.noarch
openshift-ansible-roles-3.7.72-1.git.0.5c45a8a.el7.noarch
tuned-profiles-atomic-openshift-node-3.7.72-1.git.0.925b9cd.el7.x86_64


How reproducible:
Always

Steps to Reproduce:
1. Install a fresh 3.7 cluster

Actual results:
etcd-3.3.11-1 is installed

Expected results:
etcd-3.2.22-1 should be installed as that is the supported version. [1]

[1] https://access.redhat.com/articles/2176281

Additional info:
This came up due to the package being picked up in the image "registry.access.redhat.com/rhel7/etcd/images/3.2.22-24". [2]

[2] https://bugzilla.redhat.com/show_bug.cgi?id=1672344


But as the etcd package is picked up from the "rhel-7-server-extras-rpms" repo, all versions of OpenShift would seem to be affected by this version bump when performing a new install or upgrade.


The suggestion is to create an "atomic-openshift-etcd-excluder" package, analogous to "atomic-openshift-docker-excluder", which performs a similar function to protect the docker version.
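
What such an excluder would amount to can be sketched roughly as follows. This is only an illustration, not the behaviour of any shipped excluder package: the pattern "etcd-3.3*", the helper names, and the use of fnmatch to approximate yum's glob matching are all assumptions made for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical pattern: the report only asks for "etcd-3.3" to be excluded.
ETCD_EXCLUDE = "etcd-3.3*"


def exclude_patterns(conf_text):
    """Collect the patterns listed on any exclude= line of yum.conf text."""
    patterns = []
    for line in conf_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "exclude":
            patterns.extend(value.split())
    return patterns


def add_exclude(conf_text, pattern):
    """Return yum.conf text with `pattern` present on an exclude= line."""
    if pattern in exclude_patterns(conf_text):
        return conf_text  # already excluded, nothing to change
    lines = conf_text.splitlines()
    for i, line in enumerate(lines):
        key, sep, value = line.partition("=")
        if sep and key.strip() == "exclude":
            lines[i] = "exclude=" + " ".join(value.split() + [pattern])
            break
    else:
        # No exclude line yet. Assumes [main] is the last section of the file.
        lines.append("exclude=" + pattern)
    return "\n".join(lines) + "\n"


def is_excluded(package, conf_text):
    """Approximate yum's glob matching with fnmatch (an assumption)."""
    return any(fnmatchcase(package, p) for p in exclude_patterns(conf_text))
```

With this sketch, is_excluded("etcd-3.3.11-2.el7.x86_64", ...) is true once the pattern has been added, while the 3.2.22 package from the same repo still passes.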

Comment 1 Brendan Mchugh 2019-02-05 09:52:52 UTC
Typo:

Actual results:
etcd-3.3.11-1 is installed

Should be:

Actual results:
etcd-3.3.11-2 is installed

Comment 3 Scott Dodson 2019-02-06 18:04:24 UTC
There's never been a need for an excluder for etcd in the past. Without identified problems in 3.3 we won't be adding one at this point in the lifecycle.

If you reopen this, please make sure to link to actual identified problems with 3.3.

Comment 4 Brendan Mchugh 2019-02-06 20:54:56 UTC
These seem to strongly imply it is a bug/unsupported.

https://bugzilla.redhat.com/show_bug.cgi?id=1672344
https://projects.engineering.redhat.com/browse/RCM-50624

We depend on packages from extras, and extras can carry newer packages than a particular version of OCP supports (docker, for example).
The same can now be said of etcd.

Comment 7 Scott Dodson 2019-02-07 16:16:16 UTC
No, that bug is that an image tagged 3.2 contains 3.3, which is a bug in itself. The Master team owns the etcd and API server testing.

Comment 9 dlbewley 2019-02-18 01:50:38 UTC
This is indeed a bug for customers like me who are running OCP 3.9 and trying to prep for the 3.10 upgrade.

In the middle of migrating from dedicated etcd nodes to etcd collocated on masters, I find myself in a situation where 3 stand-alone etcd VMs and 2 masters are in an etcd cluster running 3.3.11 from RPM. I have a 3rd master which can't be scaled up because the 3.3.11 RPM has suddenly gone "missing" from the extras repo, leaving behind only the 3.2.22 RPM.

Even if the RPM were still there, I'm going to have to downgrade and recover from snapshot per https://access.redhat.com/solutions/3885101

Fortunately I had not yum updated my production cluster and gotten bitten by etcd-3.3.11.

I'm rather dismayed by this situation.

Comment 10 Scott Dodson 2019-02-18 15:35:19 UTC
I'm sorry you're in that situation, but I don't think that adding that version to the excluder at this point would improve anything. I believe it would actually cause additional problems: if your etcd cluster had been upgraded to 3.3 and you haven't downgraded or won't downgrade to etcd 3.2, it would be impossible to scale up an additional node running 3.3 during disaster recovery scenarios. In short, I don't think it's good to add additional rigidity to the situation when the packages have already been pulled and images rebuilt.

Remember though, that a cluster is only upgraded to 3.3 once all members have been upgraded to 3.3 and the logs indicate that the agreed upon cluster version is 3.3. If you're in a situation where only 2 of 3 members upgraded to 3.3 then I would expect that your cluster has not been upgraded to 3.3 and it would be safe to downgrade the packages in place without further remediation. Please work with support to explore that if you think that's appropriate for your situation.
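
The rule described above can be sketched as a simplified model. This is not etcd's actual code, and the function names are hypothetical; it only assumes that the agreed cluster version follows the lowest major.minor among the members' binaries.

```python
def major_minor(version):
    """Reduce a version such as "3.3.11" to the tuple (3, 3)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def agreed_cluster_version(member_versions):
    """Lowest major.minor among the members' binaries (simplified model)."""
    return min(major_minor(v) for v in member_versions)
```

Under this model, a cluster with members at 3.3.11, 3.3.11 and 3.2.22 is still at cluster version (3, 2), and only reaches (3, 3) once the last member runs a 3.3 binary. The logs remain the authoritative source for what a real cluster has agreed on.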

Comment 11 dlbewley 2019-02-18 17:06:50 UTC
Sorry, my last comment on this closed bug.

All my members are upgraded to 3.3. I was unaware of the implication and only noticed when I attempted to add another member. The 3.3 RPM had apparently been pulled (without notice?).

I saw this error on the prospective member:
 Feb 17 15:18:50 ose-test-master-03.example.com etcd[9997]: etcd Version: 3.2.22
 ...
 Feb 17 15:18:50 ose-test-master-03.example.com etcd[9997]: the running cluster version(3.3.0) is higher than the maximum cluster version(3.2.0) supported

Active member:
 Feb 17 14:48:00 ose-test-master-02.example.com etcd[35101]: starting server... [version: 3.3.11, cluster version: to_be_decided]
 Feb 17 14:48:01 ose-test-master-02.example.com etcd[35101]: enabled capabilities for version 3.3
 Feb 17 14:48:01 ose-test-master-02.example.com etcd[35101]: set the cluster version to 3.3 from store
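
For anyone checking their own members, here is a rough sketch of reading the agreed cluster version out of journal lines like the ones above and testing whether a given package version could join. The regular expression is based only on the message text quoted here, the function names are hypothetical, and can_join models nothing beyond the "higher than the maximum cluster version" check seen in the error.

```python
import re

# Matches the message quoted above: "set the cluster version to 3.3 from store"
SET_VERSION_RE = re.compile(r"set the cluster version to (\d+)\.(\d+)")


def cluster_version_from_log(lines):
    """Return the last cluster version set in `lines` as (major, minor), or None."""
    found = None
    for line in lines:
        match = SET_VERSION_RE.search(line)
        if match:
            found = (int(match.group(1)), int(match.group(2)))
    return found


def can_join(binary_version, cluster_version):
    """A binary older than the agreed cluster version is refused (simplified)."""
    major, minor = binary_version.split(".")[:2]
    return (int(major), int(minor)) >= cluster_version


active_member_log = [
    "Feb 17 14:48:01 ose-test-master-02.example.com etcd[35101]: "
    "enabled capabilities for version 3.3",
    "Feb 17 14:48:01 ose-test-master-02.example.com etcd[35101]: "
    "set the cluster version to 3.3 from store",
]
agreed = cluster_version_from_log(active_member_log)
```

Here can_join("3.2.22", agreed) is false, which corresponds to the error on the prospective member, while can_join("3.3.11", agreed) is true.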


It's not clear how long 3.9 will be supported, but I know I'm not the only customer on it. Implicit in the close of this bug is the assumption server-extras will never again include a version of etcd greater than what is supported by OpenShift and we will never be bitten again. I'm not sure how to know that, and that does seem limiting to other uses of etcd on RHEL.

I'll work with support to downgrade etcd to 3.2 before continuing.

