Bug 1463494
Summary: | oadm migrate etcd-ttl failed when using dedicated etcd clusters | ||||||
---|---|---|---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Anping Li <anli> | ||||
Component: | Cluster Version Operator | Assignee: | Jan Chaloupka <jchaloup> | ||||
Status: | CLOSED ERRATA | QA Contact: | Anping Li <anli> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | 3.6.0 | CC: | aos-bugs, jchaloup, jokerman, mmccomas, smunilla, xtian | ||||
Target Milestone: | --- | ||||||
Target Release: | --- | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2017-08-10 05:28:56 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
Description
Anping Li
2017-06-21 06:30:41 UTC
The example inventory:
[OSEv3:children]
masters
nodes
etcd
[OSEv3:vars]
ansible_ssh_user=root
xxxx
xxxx
[masters]
master.example.com
[nodes]
master.example.com
node.example.com
[etcd]
etcd1.example.com
etcd2.example.com
etcd3.example.com

Created attachment 1290503
Migrate logs
Hit the 'oadm migrate etcd-ttl' error with clustered etcd (installed on the masters). I think it is the same issue.
[masters]
host-8-174-222.host.centralci.eng.rdu2.redhat.com
host-8-174-253.host.centralci.eng.rdu2.redhat.com
host-8-175-112.host.centralci.eng.rdu2.redhat.com
[etcd]
host-8-174-222.host.centralci.eng.rdu2.redhat.com
host-8-174-253.host.centralci.eng.rdu2.redhat.com
host-8-175-112.host.centralci.eng.rdu2.redhat.com
[nodes]
host-8-174-222.host.centralci.eng.rdu2.redhat.com
host-8-174-253.host.centralci.eng.rdu2.redhat.com
host-8-175-112.host.centralci.eng.rdu2.redhat.com
host-8-175-68.host.centralci.eng.rdu2.redhat.com
host-8-175-73.host.centralci.eng.rdu2.redhat.com
[lb]
host-8-175-186.host.centralci.eng.rdu2.redhat.com
[nfs]
host-8-175-186.host.centralci.eng.rdu2.redhat.com
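The two inventories above differ in the one respect this bug turns on: in the description, the [etcd] hosts are dedicated machines, while in the second they are the masters themselves. A minimal Python sketch for telling the two layouts apart, assuming a plain INI-style inventory. `parse_inventory_groups` and `describe_etcd_topology` are hypothetical helpers, not part of openshift-ansible, and treating the first [masters] host as `oo_first_master` is an assumption about how the playbooks pick it.

```python
from typing import Dict, List, Optional


def parse_inventory_groups(text: str) -> Dict[str, List[str]]:
    """Map each host group of an INI-style Ansible inventory to its host names.

    Simplifications (assumptions for this sketch): only the first token of a
    host line is kept, so inline host variables are ignored, and sections such
    as [OSEv3:vars] or [OSEv3:children] are skipped because they list no hosts.
    """
    groups: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            name = line[1:-1]
            current = None if ":" in name else name  # skip :vars / :children
            if current is not None:
                groups.setdefault(current, [])
            continue
        if current is not None:
            groups[current].append(line.split()[0])  # host name only
    return groups


def describe_etcd_topology(groups: Dict[str, List[str]]) -> Dict[str, object]:
    """Report the assumed first master and any etcd hosts that are not masters."""
    masters = groups.get("masters", [])
    etcd = groups.get("etcd", [])
    return {
        # Assumption: approximates oo_first_master as the first [masters] entry.
        "first_master": masters[0] if masters else None,
        "dedicated_etcd": [h for h in etcd if h not in masters],
    }
```

Run against the description's inventory, every [etcd] host comes back as dedicated; against the second inventory, where the etcd members are the masters, the list is empty.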
delegate_to: {{ oo_first_master }}, so that it is run on the first master.

Fixed as part of https://github.com/openshift/openshift-ansible/pull/4558

More specific upstream PR: https://github.com/openshift/openshift-ansible/pull/4623
#4558 can be ignored for this issue.

Merged upstream.

With the master branch, I get the following error. Should I use the errata puddle?

[root@anli host2]# cat hosts
[OSEv3:children]
masters
nodes
etcd
nfs
[OSEv3:vars]
deployment_type=openshift-enterprise
ansible_become=true
ansible_user=root
openshift_auth_type=allowall
openshift_master_identity_providers=[{'name': 'allow_all', 'login': 'true', 'challenge': 'true', 'kind': 'AllowAllPasswordIdentityProvider'}]
openshift_image_tag=v3.6.129
containerized=true
enable_excluders=false
openshift_master_cert_expire_days=365
openshift_disable_check=disk_availability,docker_storage,memory_availability
[masters]
openshift-225.lab.eng.nay.redhat.com openshift_public_hostname=openshift-225.lab.eng.nay.redhat.com openshift_hostname=openshift-225.lab.eng.nay.redhat.com
[nodes]
openshift-225.lab.eng.nay.redhat.com openshift_public_hostname=openshift-225.lab.eng.nay.redhat.com openshift_hostname=openshift-225.lab.eng.nay.redhat.com openshift_node_labels="{'region': 'infra'}"
[etcd]
openshift-208.lab.eng.nay.redhat.com openshift_public_hostname=openshift-208.lab.eng.nay.redhat.com openshift_hostname=openshift-208.lab.eng.nay.redhat.com
[nfs]
openshift-208.lab.eng.nay.redhat.com

TASK [etcd_migrate : Re-introduce leases (as a replacement for key TTLs)] ******
failed: [openshift-208.lab.eng.nay.redhat.com -> openshift-225.lab.eng.nay.redhat.com] (item=/kubernetes.io/events) => {
    "changed": true,
    "cmd": ["oadm", "migrate", "etcd-ttl", "--cert", "/etc/etcd/peer.crt", "--key", "/etc/etcd/peer.key", "--cacert", "/etc/etcd/ca.crt", "--etcd-address", "https://192.168.1.186:2379", "--ttl-keys-prefix", "/kubernetes.io/events", "--lease-duration", "1h"],
    "delta": "0:00:00.278998",
    "end": "2017-07-03 22:30:37.631742",
    "failed": true,
    "item": "/kubernetes.io/events",
    "rc": 1,
    "start": "2017-07-03 22:30:37.352744",
    "warnings": []
}
STDERR:
error: open /etc/etcd/peer.crt: no such file or directory

failed: [openshift-208.lab.eng.nay.redhat.com -> openshift-225.lab.eng.nay.redhat.com] (item=/kubernetes.io/masterleases) => {
    "changed": true,
    "cmd": ["oadm", "migrate", "etcd-ttl", "--cert", "/etc/etcd/peer.crt", "--key", "/etc/etcd/peer.key", "--cacert", "/etc/etcd/ca.crt", "--etcd-address", "https://192.168.1.186:2379", "--ttl-keys-prefix", "/kubernetes.io/masterleases", "--lease-duration", "1h"],
    "delta": "0:00:00.271130",
    "end": "2017-07-03 22:30:38.155144",
    "failed": true,
    "item": "/kubernetes.io/masterleases",
    "rc": 1,
    "start": "2017-07-03 22:30:37.884014",
    "warnings": []
}
STDERR:
error: open /etc/etcd/peer.crt: no such file or directory

NO MORE HOSTS LEFT *************************************************************
    to retry, use: --limit @/root/openshift-ansible/playbooks/byo/openshift-etcd/migrate.retry

PLAY RECAP *********************************************************************
localhost                            : ok=15   changed=0    unreachable=0    failed=0
openshift-208.lab.eng.nay.redhat.com : ok=74   changed=9    unreachable=0    failed=1
openshift-225.lab.eng.nay.redhat.com : ok=19   changed=1    unreachable=0    failed=0

Another question: what should we do to recover from the failure when the playbook stops at 'Re-introduce leases'?

PR setting the proper certificates: https://github.com/openshift/openshift-ansible/pull/4671

The external etcd cluster can be migrated with openshift-ansible-3.6.136.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716
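In the failing run, the task for the etcd host openshift-208 is delegated to the master openshift-225 (the "openshift-208... -> openshift-225..." prefix), while --cert, --key and --cacert still point at /etc/etcd/ files that, judging by the STDERR, do not exist on the master. A minimal Python sketch of a pre-flight check that would surface this before oadm runs: it rebuilds the same argv as the failing task and verifies the TLS files exist on the machine that will execute it. `build_etcd_ttl_cmd`, `missing_files` and `preflight` are hypothetical helpers, not openshift-ansible code, and the sketch does not assume which certificate paths are correct on the master (that is what PR 4671 addresses).

```python
import os
from typing import Callable, Iterable, List


def build_etcd_ttl_cmd(cert: str, key: str, cacert: str, etcd_address: str,
                       prefix: str, lease_duration: str = "1h") -> List[str]:
    """Rebuild the argv seen in the failing task's "cmd" field."""
    return [
        "oadm", "migrate", "etcd-ttl",
        "--cert", cert, "--key", key, "--cacert", cacert,
        "--etcd-address", etcd_address,
        "--ttl-keys-prefix", prefix,
        "--lease-duration", lease_duration,
    ]


def missing_files(paths: Iterable[str],
                  exists: Callable[[str], bool] = os.path.isfile) -> List[str]:
    """Return the paths that do not exist on the machine running this check."""
    return [p for p in paths if not exists(p)]


def preflight(cert: str, key: str, cacert: str,
              exists: Callable[[str], bool] = os.path.isfile) -> None:
    """Fail early, before oadm runs, if any TLS file is absent on this host."""
    missing = missing_files([cert, key, cacert], exists)
    if missing:
        raise FileNotFoundError(
            "TLS files missing on the host that runs oadm: " + ", ".join(missing))


if __name__ == "__main__":
    # Values taken from the log; whether they exist on this host is what we check.
    tls = ("/etc/etcd/peer.crt", "/etc/etcd/peer.key", "/etc/etcd/ca.crt")
    for prefix in ("/kubernetes.io/events", "/kubernetes.io/masterleases"):
        cmd = build_etcd_ttl_cmd(*tls, etcd_address="https://192.168.1.186:2379",
                                 prefix=prefix)
        try:
            preflight(*tls)
            print("would run:", " ".join(cmd))  # printed only, never executed
        except FileNotFoundError as err:
            print("skipping", prefix, "->", err)
```

The `exists` parameter is there only so the check can be exercised without touching the real filesystem; in an Ansible play, the equivalent would be checking the files on the delegated host rather than on the etcd host the task nominally targets.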