Description of problem:
The migration fails when dedicated etcd hosts are used, because the atomic-openshift packages are not installed on those hosts, so 'oadm' cannot be found there. I believe 'oadm migrate etcd-ttl' only needs to run on the first master.

Version-Release number of selected component (if applicable):
openshift/openshift-ansible: Pull Request 4492

How reproducible:
always

Steps to Reproduce:
1. Install OCP v3.5 with dedicated etcd clusters.
2. Upgrade to v3.6.
3. Migrate to etcd v3:
   ansible-playbook openshift-ansible/playbooks/byo/openshift-etcd/migrate.yml

Actual results:
TASK [etcd_migrate : Re-introduce leases (as a replacement for key TTLs)] ******
failed: [qe-auto-etcd-1.0621-ktl.qe.rhcloud.com] (item=/kubernetes.io/events) => {
    "cmd": "oadm migrate etcd-ttl --cert /etc/etcd/peer.crt --key /etc/etcd/peer.key --cacert /etc/etcd/ca.crt --etcd-address https://10.240.0.16:2379 --ttl-keys-prefix /kubernetes.io/events --lease-duration 1h",
    "failed": true,
    "item": "/kubernetes.io/events",
    "rc": 2
}
MSG:
[Errno 2] No such file or directory

failed: [qe-auto-etcd-2.0621-ktl.qe.rhcloud.com] (item=/kubernetes.io/events) => {
    "cmd": "oadm migrate etcd-ttl --cert /etc/etcd/peer.crt --key /etc/etcd/peer.key --cacert /etc/etcd/ca.crt --etcd-address https://10.240.0.17:2379 --ttl-keys-prefix /kubernetes.io/events --lease-duration 1h",
    "failed": true,
    "item": "/kubernetes.io/events",
    "rc": 2
}
MSG:
[Errno 2] No such file or directory

failed: [qe-auto-etcd-3.0621-ktl.qe.rhcloud.com] (item=/kubernetes.io/events) => {
    "cmd": "oadm migrate etcd-ttl --cert /etc/etcd/peer.crt --key /etc/etcd/peer.key --cacert /etc/etcd/ca.crt --etcd-address https://10.240.0.18:2379 --ttl-keys-prefix /kubernetes.io/events --lease-duration 1h",
    "failed": true,
    "item": "/kubernetes.io/events",
    "rc": 2
}
MSG:
[Errno 2] No such file or directory

failed: [qe-auto-etcd-1.0621-ktl.qe.rhcloud.com] (item=/kubernetes.io/masterleases) => {
    "cmd": "oadm migrate etcd-ttl --cert /etc/etcd/peer.crt --key /etc/etcd/peer.key --cacert /etc/etcd/ca.crt --etcd-address https://10.240.0.16:2379 --ttl-keys-prefix /kubernetes.io/masterleases --lease-duration 1h",
    "failed": true,
    "item": "/kubernetes.io/masterleases",
    "rc": 2
}
MSG:
[Errno 2] No such file or directory

failed: [qe-auto-etcd-2.0621-ktl.qe.rhcloud.com] (item=/kubernetes.io/masterleases) => {
    "cmd": "oadm migrate etcd-ttl --cert /etc/etcd/peer.crt --key /etc/etcd/peer.key --cacert /etc/etcd/ca.crt --etcd-address https://10.240.0.17:2379 --ttl-keys-prefix /kubernetes.io/masterleases --lease-duration 1h",
    "failed": true,
    "item": "/kubernetes.io/masterleases",
    "rc": 2
}
MSG:
[Errno 2] No such file or directory

failed: [qe-auto-etcd-3.0621-ktl.qe.rhcloud.com] (item=/kubernetes.io/masterleases) => {
    "cmd": "oadm migrate etcd-ttl --cert /etc/etcd/peer.crt --key /etc/etcd/peer.key --cacert /etc/etcd/ca.crt --etcd-address https://10.240.0.18:2379 --ttl-keys-prefix /kubernetes.io/masterleases --lease-duration 1h",
    "failed": true,
    "item": "/kubernetes.io/masterleases",
    "rc": 2
}
MSG:
[Errno 2] No such file or directory

	to retry, use: --limit @/root/openshift-ansible/playbooks/byo/openshift-etcd/migrate.retry

PLAY RECAP *********************************************************************
localhost : ok=22 changed=0 unreachable=0 failed=0
qe-auto-etcd-1.0621-ktl.qe.rhcloud.com : ok=96 changed=9 unreachable=0 failed=1
qe-auto-etcd-2.0621-ktl.qe.rhcloud.com : ok=93 changed=9 unreachable=0 failed=1
qe-auto-etcd-3.0621-ktl.qe.rhcloud.com : ok=93 changed=9 unreachable=0 failed=1
qe-auto-master-1.0621-ktl.qe.rhcloud.com : ok=61 changed=3 unreachable=0 failed=0
qe-auto-node-registry-router-1.0621-ktl.qe.rhcloud.com : ok=56 changed=2 unreachable=0 failed=0

Expected results:

Additional info:
The example inventory:

[OSEv3:children]
masters
nodes
etcd

[OSEv3:vars]
ansible_ssh_user=root
xxxx
xxxx

[masters]
master.example.com

[nodes]
master.example.com
node.example.com

[etcd]
etcd1.example.com
etcd2.example.com
etcd3.example.com
Created attachment 1290503
Migrate logs

I also hit the 'oadm migrate etcd-ttl' error with clustered etcd installed on the masters. I think it is the same issue.

[masters]
host-8-174-222.host.centralci.eng.rdu2.redhat.com
host-8-174-253.host.centralci.eng.rdu2.redhat.com
host-8-175-112.host.centralci.eng.rdu2.redhat.com

[etcd]
host-8-174-222.host.centralci.eng.rdu2.redhat.com
host-8-174-253.host.centralci.eng.rdu2.redhat.com
host-8-175-112.host.centralci.eng.rdu2.redhat.com

[nodes]
host-8-174-222.host.centralci.eng.rdu2.redhat.com
host-8-174-253.host.centralci.eng.rdu2.redhat.com
host-8-175-112.host.centralci.eng.rdu2.redhat.com
host-8-175-68.host.centralci.eng.rdu2.redhat.com
host-8-175-73.host.centralci.eng.rdu2.redhat.com

[lb]
host-8-175-186.host.centralci.eng.rdu2.redhat.com

[nfs]
host-8-175-186.host.centralci.eng.rdu2.redhat.com
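Use delegate_to: "{{ oo_first_master }}" so that the task runs on the first master. (In YAML the Jinja expression must be quoted, since an unquoted value starting with '{' is parsed as a mapping.)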
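A minimal sketch of that delegation, assuming 'oo_first_master' is an inventory group (so its first host is groups.oo_first_master.0) and using a hypothetical per-host variable 'etcd_ip' for the etcd address. The task name, flags, prefixes and lease duration are copied from the failing runs above; the loop layout is illustrative and is not the actual content of the etcd_migrate role. The TLS flags still assume the certificate files exist on the delegated host.

```yaml
# Hypothetical sketch: the task is still evaluated per etcd host, but the
# command executes on the first master, where 'oadm' is installed.
- name: Re-introduce leases (as a replacement for key TTLs)
  command: >
    oadm migrate etcd-ttl
    --cert /etc/etcd/peer.crt
    --key /etc/etcd/peer.key
    --cacert /etc/etcd/ca.crt
    --etcd-address https://{{ etcd_ip }}:2379
    --ttl-keys-prefix {{ item }}
    --lease-duration 1h
  with_items:
    - /kubernetes.io/events
    - /kubernetes.io/masterleases
  # Assumption: oo_first_master is a group; quote the Jinja expression.
  delegate_to: "{{ groups.oo_first_master.0 }}"
```

Because delegate_to only changes where the command runs, each etcd host still triggers its own invocation against its own address, which matches the "openshift-208 -> openshift-225" pattern seen in later logs.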
Fixed as part of https://github.com/openshift/openshift-ansible/pull/4558
More specific upstream PR: https://github.com/openshift/openshift-ansible/pull/4623. PR #4558 can be ignored for this issue.
Merged upstream
With the master branch, I get the following error. Should I use the errata puddle instead?

[root@anli host2]# cat hosts
[OSEv3:children]
masters
nodes
etcd
nfs

[OSEv3:vars]
deployment_type=openshift-enterprise
ansible_become=true
ansible_user=root
openshift_auth_type=allowall
openshift_master_identity_providers=[{'name': 'allow_all', 'login': 'true', 'challenge': 'true', 'kind': 'AllowAllPasswordIdentityProvider'}]
openshift_image_tag=v3.6.129
containerized=true
enable_excluders=false
openshift_master_cert_expire_days=365
openshift_disable_check=disk_availability,docker_storage,memory_availability

[masters]
openshift-225.lab.eng.nay.redhat.com openshift_public_hostname=openshift-225.lab.eng.nay.redhat.com openshift_hostname=openshift-225.lab.eng.nay.redhat.com

[nodes]
openshift-225.lab.eng.nay.redhat.com openshift_public_hostname=openshift-225.lab.eng.nay.redhat.com openshift_hostname=openshift-225.lab.eng.nay.redhat.com openshift_node_labels="{'region': 'infra'}"

[etcd]
openshift-208.lab.eng.nay.redhat.com openshift_public_hostname=openshift-208.lab.eng.nay.redhat.com openshift_hostname=openshift-208.lab.eng.nay.redhat.com

[nfs]
openshift-208.lab.eng.nay.redhat.com

TASK [etcd_migrate : Re-introduce leases (as a replacement for key TTLs)] ******
failed: [openshift-208.lab.eng.nay.redhat.com -> openshift-225.lab.eng.nay.redhat.com] (item=/kubernetes.io/events) => {
    "changed": true,
    "cmd": ["oadm", "migrate", "etcd-ttl", "--cert", "/etc/etcd/peer.crt", "--key", "/etc/etcd/peer.key", "--cacert", "/etc/etcd/ca.crt", "--etcd-address", "https://192.168.1.186:2379", "--ttl-keys-prefix", "/kubernetes.io/events", "--lease-duration", "1h"],
    "delta": "0:00:00.278998",
    "end": "2017-07-03 22:30:37.631742",
    "failed": true,
    "item": "/kubernetes.io/events",
    "rc": 1,
    "start": "2017-07-03 22:30:37.352744",
    "warnings": []
}
STDERR:
error: open /etc/etcd/peer.crt: no such file or directory

failed: [openshift-208.lab.eng.nay.redhat.com -> openshift-225.lab.eng.nay.redhat.com] (item=/kubernetes.io/masterleases) => {
    "changed": true,
    "cmd": ["oadm", "migrate", "etcd-ttl", "--cert", "/etc/etcd/peer.crt", "--key", "/etc/etcd/peer.key", "--cacert", "/etc/etcd/ca.crt", "--etcd-address", "https://192.168.1.186:2379", "--ttl-keys-prefix", "/kubernetes.io/masterleases", "--lease-duration", "1h"],
    "delta": "0:00:00.271130",
    "end": "2017-07-03 22:30:38.155144",
    "failed": true,
    "item": "/kubernetes.io/masterleases",
    "rc": 1,
    "start": "2017-07-03 22:30:37.884014",
    "warnings": []
}
STDERR:
error: open /etc/etcd/peer.crt: no such file or directory

NO MORE HOSTS LEFT *************************************************************
	to retry, use: --limit @/root/openshift-ansible/playbooks/byo/openshift-etcd/migrate.retry

PLAY RECAP *********************************************************************
localhost : ok=15 changed=0 unreachable=0 failed=0
openshift-208.lab.eng.nay.redhat.com : ok=74 changed=9 unreachable=0 failed=1
openshift-225.lab.eng.nay.redhat.com : ok=19 changed=1 unreachable=0 failed=0
Another question: what should we do to recover when the playbook stops at 'Re-introduce leases'?
PR setting the proper certificates: https://github.com/openshift/openshift-ansible/pull/4671
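The failure above happens because the task now runs on the master while still pointing at /etc/etcd/peer.crt, which exists on the etcd host but not on the master. A sketch of one way to set certificates that exist on the master, assuming the master's etcd client certificate set lives under /etc/origin/master/ as master.etcd-client.crt, master.etcd-client.key and master.etcd-ca.crt (the usual OCP 3.x layout, not confirmed by this report or by the linked PR). 'etcd_ip' is a hypothetical per-host variable, and oo_first_master is assumed to be an inventory group.

```yaml
# Hypothetical variant: same delegation, but the TLS flags reference files
# assumed to exist on the first master rather than on the etcd host.
- name: Re-introduce leases (as a replacement for key TTLs)
  command: >
    oadm migrate etcd-ttl
    --cert /etc/origin/master/master.etcd-client.crt
    --key /etc/origin/master/master.etcd-client.key
    --cacert /etc/origin/master/master.etcd-ca.crt
    --etcd-address https://{{ etcd_ip }}:2379
    --ttl-keys-prefix {{ item }}
    --lease-duration 1h
  with_items:
    - /kubernetes.io/events
    - /kubernetes.io/masterleases
  delegate_to: "{{ groups.oo_first_master.0 }}"
```

The design point is that the certificate paths must be valid on the host that actually executes the command (the delegate), not on the inventory host the task is evaluated for.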
The external etcd cluster can be migrated with openshift-ansible-3.6.136.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1716