Description of problem:
When migrating etcd from v2 to v3 in a containerized environment, the first etcd container never gets updated to the latest available image while the others do.

Version-Release number of selected component (if applicable):
3.6.173.0.96

How reproducible:
Always (see Additional info).

Steps to Reproduce:
1. Upgrade an existing OCP 3.6 cluster to the latest available 3.6 version (3.6.173.0.96), as well as ansible et al.
2. Run the migrate playbook:
# ansible-playbook -i /etc/ansible/hosts /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-etcd/migrate.yml
3. Check versions, e.g.:
# export ETCD_LISTEN_CLIENT_URLS="https://10.0.0.152:2379,https://10.0.0.153:2379,https://10.0.0.154:2379"
# ETCDCTL_API=3 /usr/bin/etcdctl --cert="/etc/etcd/peer.crt" --key="/etc/etcd/peer.key" --cacert="/etc/etcd/ca.crt" --endpoints=$ETCD_LISTEN_CLIENT_URLS endpoint status -w table

Actual results:
+---------------------------+------------------+---------+---------+-----------+-----------+------------+
|         ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+---------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://10.0.0.152:2379   | ec57d625662aa394 | 3.2.9   | 9.2 MB  | true      | 98        | 4960148    |
| https://10.0.0.153:2379   | 316d70307263b11d | 3.2.11  | 9.2 MB  | false     | 98        | 4960148    |
| https://10.0.0.154:2379   | f1ee01ab7b3d594f | 3.2.11  | 9.2 MB  | false     | 98        | 4960148    |
+---------------------------+------------------+---------+---------+-----------+-----------+------------+

Expected results:
All etcd nodes are at the same version, i.e. 3.2.11.

Additional info:
Tried this several times and always had the same result. I think the first active etcd node (etcd on the masters) is always skipped/ignored.
Going to that node, stopping etcd_container, removing the image from docker, and then starting it again brings all etcds to the same version.
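The manual workaround described above can be sketched as a small helper (hypothetical, not from the report; `etcd_container` is the systemd unit used by containerized installs, and the image name is an assumption based on the default registry):

```shell
# Sketch of the manual workaround: stop the containerized etcd service, remove
# the stale local image so the next start pulls the current one, then restart.
# Run on the affected master. The image name below is an assumption for a
# default containerized install; adjust it to match your configured etcd image.
refresh_etcd_container() {
  systemctl stop etcd_container
  docker rmi registry.access.redhat.com/rhel7/etcd || true  # ignore if absent
  systemctl start etcd_container
}
```

After the restart, re-run the `etcdctl endpoint status -w table` check to confirm all members report the same version.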
Daniel,

We're going to strip out all the extraneous steps that affect the etcd installation. This is surely an interesting side effect: we do the migration on the first host and then currently run the scaleup playbooks on the other two, which is pretty heavy-handed; it replaces certificates and effectively re-installs etcd. Once we update the playbooks to just re-add those hosts and start them back up, the outcome will be that all etcd hosts remain unchanged aside from the data migration.

Vadim,

Not sure whether we should mark these all as dupes or not; they're all different symptoms of the same root cause. For now let's leave them all open, so that QE can test to ensure that each symptom is resolved by our work.
I'll check whether this is still reproducible with https://github.com/openshift/openshift-ansible/pull/7226 - I suspect the etcd upgrade during migrate is actually unwanted.
The fix is available in openshift-ansible-3.6.173.0.104-1-4-g76aa5371e: the etcd migrate playbook no longer includes scaleup, so the container version won't change.
Version: openshift-ansible-3.6.173.0.104-1.git.0.ee43cc5.el7.noarch

Steps:
1. HA containerized install of OCP v3.5 (v3.5.5.31.48); the etcd container version is v3.2.7.
# docker run -it --entrypoint rpm registry.access.redhat.com/rhel7/etcd:3.2.7 -qa etcd
etcd-3.2.7-1.el7.x86_64
2. Upgrade v3.5 to the latest OCP v3.6.173.0.104.
+-----------------------------------------+------------------+---------+---------+-----------+-----------+------------+
|                ENDPOINT                 |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+-----------------------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://aos-138.lab.sjc.redhat.com:2379 | b0c1d3f602268dc8 | 3.2.7   | 25 kB   | true      | 9         | 75544      |
| https://aos-152.lab.sjc.redhat.com:2379 | 53c604bc13ce5de3 | 3.2.7   | 25 kB   | false     | 9         | 75545      |
| https://aos-155.lab.sjc.redhat.com:2379 | 122ce3db037f9bf3 | 3.2.7   | 25 kB   | false     | 9         | 75546      |
+-----------------------------------------+------------------+---------+---------+-----------+-----------+------------+
3. Run the etcd migration; the migration succeeded. Checked the etcd versions in the cluster:
+-----------------------------------------+------------------+---------+---------+-----------+-----------+------------+
|                ENDPOINT                 |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+-----------------------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://aos-138.lab.sjc.redhat.com:2379 | b0c1d3f602268dc8 | 3.2.7   | 7.7 MB  | true      | 132       | 81829      |
| https://aos-152.lab.sjc.redhat.com:2379 | 658edf5257cc6250 | 3.2.11  | 7.7 MB  | false     | 132       | 81829      |
| https://aos-155.lab.sjc.redhat.com:2379 | 18822de21a345140 | 3.2.11  | 7.7 MB  | false     | 132       | 81829      |
+-----------------------------------------+------------------+---------+---------+-----------+-----------+------------+

Then checked: the PR was not merged into the latest v3.6 build.
# rpm -qa|grep openshift-ansible
openshift-ansible-playbooks-3.6.173.0.104-1.git.0.ee43cc5.el7.noarch
openshift-ansible-docs-3.6.173.0.104-1.git.0.ee43cc5.el7.noarch
openshift-ansible-roles-3.6.173.0.104-1.git.0.ee43cc5.el7.noarch
openshift-ansible-lookup-plugins-3.6.173.0.104-1.git.0.ee43cc5.el7.noarch
openshift-ansible-callback-plugins-3.6.173.0.104-1.git.0.ee43cc5.el7.noarch
openshift-ansible-filter-plugins-3.6.173.0.104-1.git.0.ee43cc5.el7.noarch
openshift-ansible-3.6.173.0.104-1.git.0.ee43cc5.el7.noarch
# grep -r "scaleup" /usr/share/ansible/openshift-ansible/playbooks/common/openshift-etcd/migrate.yml
- include: ./scaleup.yml

Changing back to MODIFIED to wait for the PR to be merged.
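The check above can be wrapped in a tiny helper for re-testing future builds (a sketch; the function name is made up, and the typical path is the one from the grep above):

```shell
# Sketch: returns success (0) if the given migrate playbook still pulls in the
# etcd scaleup play, non-zero once the fix is in place. Typical path:
#   /usr/share/ansible/openshift-ansible/playbooks/common/openshift-etcd/migrate.yml
migrate_includes_scaleup() {
  grep -q 'scaleup' "$1"
}
```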
Right, the PR is merged, but the release is not yet prepared
openshift-ansible-3.6.173.0.105-1 no longer calls scaleup
Verification is blocked by bz1554707.
Version: openshift-ansible-3.6.173.0.110-1.git.0.ca81843.el7.noarch

Steps:
1. HA containerized install of OCP v3.5; the etcd container version is v3.2.7.
# docker run -it --entrypoint rpm registry.access.redhat.com/rhel7/etcd:3.2.7 -qa etcd
etcd-3.2.7-1.el7.x86_64
2. Upgrade v3.5 to the latest OCP v3.6.173.0.110.
+-------------------------------------------+------------------+---------+---------+-----------+-----------+------------+
|                 ENDPOINT                  |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+-------------------------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://ip-172-18-2-113.ec2.internal:2379 | 9459974f264be826 | 3.2.7   | 25 kB   | true      | 13        | 48877      |
| https://ip-172-18-2-175.ec2.internal:2379 | c1b23e750866c037 | 3.2.7   | 25 kB   | false     | 13        | 48878      |
| https://ip-172-18-10-56.ec2.internal:2379 | be6ae0df781edce  | 3.2.7   | 25 kB   | false     | 13        | 48878      |
+-------------------------------------------+------------------+---------+---------+-----------+-----------+------------+
3. Run the etcd migration; the migration succeeded. Checked the etcd versions in the cluster:
+-------------------------------------------+------------------+---------+---------+-----------+-----------+------------+
|                 ENDPOINT                  |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+-------------------------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://ip-172-18-2-113.ec2.internal:2379 | 9459974f264be826 | 3.2.7   | 8.3 MB  | true      | 18        | 63557      |
| https://ip-172-18-2-175.ec2.internal:2379 | b7256a130b6be421 | 3.2.7   | 8.3 MB  | false     | 18        | 63557      |
| https://ip-172-18-10-56.ec2.internal:2379 | 289a4e3dbe9b8ac7 | 3.2.7   | 8.3 MB  | false     | 18        | 63557      |
+-------------------------------------------+------------------+---------+---------+-----------+-----------+------------+

I think this is the expected behavior now: etcd migration should not do an etcd upgrade or scaleup.
So all etcd nodes keep the same version they had before the migration, and all nodes in the cluster stay at the same version as each other.
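One way to spot-check version consistency (a sketch, assuming the `-w table` output format shown in the comments above) is to count the distinct VERSION values in the `etcdctl endpoint status` table:

```shell
# Sketch: count distinct etcd versions in `etcdctl endpoint status -w table`
# output. A healthy cluster should show exactly one. The sample rows below are
# the pre-fix mismatch from this report, so they yield two distinct versions.
table='| https://10.0.0.152:2379 | ec57d625662aa394 | 3.2.9  | 9.2 MB | true  | 98 | 4960148 |
| https://10.0.0.153:2379 | 316d70307263b11d | 3.2.11 | 9.2 MB | false | 98 | 4960148 |
| https://10.0.0.154:2379 | f1ee01ab7b3d594f | 3.2.11 | 9.2 MB | false | 98 | 4960148 |'

# Field 4 (splitting on "|") is the VERSION column; strip spaces, dedupe, count.
distinct=$(printf '%s\n' "$table" \
  | awk -F'|' '/https/ {gsub(/ /, "", $4); print $4}' \
  | sort -u | wc -l)
echo "distinct versions: $distinct"
```

In real use, pipe the live `etcdctl ... endpoint status -w table` output through the same awk filter instead of the embedded sample.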
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:1106