Description of problem: Since release of etcd-3.3.11-2.el7.x86_64 package in "rhel-7-server-extras-rpms" repo it will be required to exclude this version in yum.conf as we do for incompatible docker versions with "atomic-openshift-docker-excluder". Version-Release number of the following components: $ rpm -qa | grep -e ansible -e openshift -e etcd | sort ansible-2.4.2.0-2.el7.noarch atomic-openshift-3.7.72-1.git.0.925b9cd.el7.x86_64 atomic-openshift-clients-3.7.72-1.git.0.925b9cd.el7.x86_64 atomic-openshift-docker-excluder-3.7.72-1.git.0.925b9cd.el7.noarch atomic-openshift-excluder-3.7.72-1.git.0.925b9cd.el7.noarch atomic-openshift-master-3.7.72-1.git.0.925b9cd.el7.x86_64 atomic-openshift-node-3.7.72-1.git.0.925b9cd.el7.x86_64 atomic-openshift-sdn-ovs-3.7.72-1.git.0.925b9cd.el7.x86_64 atomic-openshift-utils-3.7.72-1.git.0.5c45a8a.el7.noarch etcd-3.3.11-2.el7.x86_64 openshift-ansible-3.7.72-1.git.0.5c45a8a.el7.noarch openshift-ansible-callback-plugins-3.7.72-1.git.0.5c45a8a.el7.noarch openshift-ansible-docs-3.7.72-1.git.0.5c45a8a.el7.noarch openshift-ansible-filter-plugins-3.7.72-1.git.0.5c45a8a.el7.noarch openshift-ansible-lookup-plugins-3.7.72-1.git.0.5c45a8a.el7.noarch openshift-ansible-playbooks-3.7.72-1.git.0.5c45a8a.el7.noarch openshift-ansible-roles-3.7.72-1.git.0.5c45a8a.el7.noarch tuned-profiles-atomic-openshift-node-3.7.72-1.git.0.925b9cd.el7.x86_64 How reproducible: Always Steps to Reproduce: 1. Install fresh 3.7 cluster 2. 3. Actual results: etcd-3.3.11-1 is installed Expected results: etcd-3.2.22-1 should be installed as that is the supported version. [1] [1] https://access.redhat.com/articles/2176281 Additional info: This came up due to the package being picked up in the image "registry.access.redhat.com/rhel7/etcd/images/3.2.22-24". [2] [2] https://bugzilla.redhat.com/show_bug.cgi?id=1672344 But as the etcd package is picked up from the "rhel-7-server-extras-rpms" repo, all versions of Openshift would seem to be affected by this version bump when performing a new install or upgrade. Suggestion would be to create an "atomic-openshift-etcd-excluder" package as there is the "atomic-openshift-docker-excluder" performing similar function to protect docker version.
Typo: Actual results: etcd-3.3.11-1 is installed Should be: Actual results: etcd-3.3.11-2 is installed
There's never been a need for an excluder for etcd in the past. Without identified problems in 3.3 we won't be adding one at this point in the lifecycle. If you reopen this please make sure to link to actual identified problems with 3.3
These seem to strongly imply it is a bug/unsupported. https://bugzilla.redhat.com/show_bug.cgi?id=1672344 https://projects.engineering.redhat.com/browse/RCM-50624 We depend on packages from extras, extras can have newer packages than a particular version of OCP supports (docker). The same can now be said of etcd.
No, that bug is that an image tagged 3.2 contains 3.3 which is a bug into itself. Master team owns the etcd and API server testing.
https://projects.engineering.redhat.com/browse/RCM-50668
This is indeed a bug for customers like me who are running OCP 3.9 and trying to prep for 3.10 upgrade. During the middle of migrating from dedicated etcd nodes to collocated on masters, I find myself in a situation where 3 etcd stand-alone VMs and 2 masters are in a etcd cluster running 3.3.11 from RPM. I have a 3rd master which can't be scaled up because the 3.3.11 RPM has suddenly gone "missing" in extras repo leaving behind only 3.2.22 RPM. Even if the RPM where still there I'm going to have to downgrade and recover from snapshot per https://access.redhat.com/solutions/3885101 Fortunately I had not yum updated my production cluster and gotten bitten by etcd-3.3.11. I'm rather dismayed by this situation.
I'm sorry you're in that situation but I don't think that adding that version to the excluder at this point would improve anything. I believe it would actually cause additional problems where if your etcd cluster had been upgraded to 3.3 and you haven't/won't downgrade to etcd 3.2 it impossible to scale up an additional node running 3.3 during disaster recovery scenarios. In short, I don't think it's good to add additional rigidity to the situation when the packages have already been pulled and images rebuilt. Remember though, that a cluster is only upgraded to 3.3 once all members have been upgraded to 3.3 and the logs indicate that the agreed upon cluster version is 3.3. If you're in a situation where only 2 of 3 members upgraded to 3.3 then I would expect that your cluster has not been upgraded to 3.3 and it would be safe to downgrade the packages in place without further remediation. Please work with support to explore that if you think that's appropriate for your situation.
Sorry, my last comment on this closed bug. All my members are upgraded to 3.3 and unaware of the implication, I only noticed when I attempted to add another member. The 3.3 RPM had apparently been pulled (without notice?). I saw this error on the prospective member: Feb 17 15:18:50 ose-test-master-03.example.com etcd[9997]: etcd Version: 3.2.22 ... Feb 17 15:18:50 ose-test-master-03.example.com etcd[9997]: the running cluster version(3.3.0) is higher than the maximum cluster version(3.2.0) supported Active member: Feb 17 14:48:00 ose-test-master-02.example.com etcd[35101]: starting server... [version: 3.3.11, cluster version: to_be_decided] Feb 17 14:48:01 ose-test-master-02.example.com etcd[35101]: enabled capabilities for version 3.3 Feb 17 14:48:01 ose-test-master-02.example.com etcd[35101]: set the cluster version to 3.3 from store It's not clear how long 3.9 will be supported, but I know I'm not the only customer on it. Implicit in the close of this bug is the assumption server-extras will never again include a version of etcd greater than what is supported by OpenShift and we will never be bitten again. I'm not sure how to know that, and that does seem limiting to other uses of etcd on RHEL. I'll work with support to downgrade etcd to 3.2 before continuing.