Bug 1463494
| Summary: | oadm migrate etcd-ttl failed when use the dedicated etcd clusters | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Anping Li <anli> |
| Component: | Cluster Version Operator | Assignee: | Jan Chaloupka <jchaloup> |
| Status: | CLOSED ERRATA | QA Contact: | Anping Li <anli> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.6.0 | CC: | aos-bugs, jchaloup, jokerman, mmccomas, smunilla, xtian |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-08-10 05:28:56 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
The example inventory:
[OSEv3:children]
masters
nodes
etcd
[OSEv3:vars]
ansible_ssh_user=root
xxxx
xxxx
[masters]
master.example.com
[nodes]
master.example.com
node.example.com
[etcd]
etcd1.example.com
etcd2.example.com
etcd3.example.com
Created attachment 1290503
Migrate logs
I hit the 'oadm migrate etcd-ttl' error with a clustered etcd (installed on the masters) as well. I think it is the same issue.
[masters]
host-8-174-222.host.centralci.eng.rdu2.redhat.com
host-8-174-253.host.centralci.eng.rdu2.redhat.com
host-8-175-112.host.centralci.eng.rdu2.redhat.com
[etcd]
host-8-174-222.host.centralci.eng.rdu2.redhat.com
host-8-174-253.host.centralci.eng.rdu2.redhat.com
host-8-175-112.host.centralci.eng.rdu2.redhat.com
[nodes]
host-8-174-222.host.centralci.eng.rdu2.redhat.com
host-8-174-253.host.centralci.eng.rdu2.redhat.com
host-8-175-112.host.centralci.eng.rdu2.redhat.com
host-8-175-68.host.centralci.eng.rdu2.redhat.com
host-8-175-73.host.centralci.eng.rdu2.redhat.com
[lb]
host-8-175-186.host.centralci.eng.rdu2.redhat.com
[nfs]
host-8-175-186.host.centralci.eng.rdu2.redhat.com
Use delegate_to: {{ oo_first_master }} so that the command is run on the first master.
Fixed as part of https://github.com/openshift/openshift-ansible/pull/4558
More specific upstream PR: https://github.com/openshift/openshift-ansible/pull/4623. PR #4558 can be ignored for this issue.
Merged upstream.
With the master branch, I get the following error. Should I use the errata puddle?
[root@anli host2]# cat hosts
[OSEv3:children]
masters
nodes
etcd
nfs
[OSEv3:vars]
deployment_type=openshift-enterprise
ansible_become=true
ansible_user=root
openshift_auth_type=allowall
openshift_master_identity_providers=[{'name': 'allow_all', 'login': 'true', 'challenge': 'true', 'kind': 'AllowAllPasswordIdentityProvider'}]
openshift_image_tag=v3.6.129
containerized=true
enable_excluders=false
openshift_master_cert_expire_days=365
openshift_disable_check=disk_availability,docker_storage,memory_availability
[masters]
openshift-225.lab.eng.nay.redhat.com openshift_public_hostname=openshift-225.lab.eng.nay.redhat.com openshift_hostname=openshift-225.lab.eng.nay.redhat.com
[nodes]
openshift-225.lab.eng.nay.redhat.com openshift_public_hostname=openshift-225.lab.eng.nay.redhat.com openshift_hostname=openshift-225.lab.eng.nay.redhat.com openshift_node_labels="{'region': 'infra'}"
[etcd]
openshift-208.lab.eng.nay.redhat.com openshift_public_hostname=openshift-208.lab.eng.nay.redhat.com openshift_hostname=openshift-208.lab.eng.nay.redhat.com
[nfs]
openshift-208.lab.eng.nay.redhat.com
TASK [etcd_migrate : Re-introduce leases (as a replacement for key TTLs)] ******
failed: [openshift-208.lab.eng.nay.redhat.com -> openshift-225.lab.eng.nay.redhat.com] (item=/kubernetes.io/events) => {
"changed": true,
"cmd": [
"oadm",
"migrate",
"etcd-ttl",
"--cert",
"/etc/etcd/peer.crt",
"--key",
"/etc/etcd/peer.key",
"--cacert",
"/etc/etcd/ca.crt",
"--etcd-address",
"https://192.168.1.186:2379",
"--ttl-keys-prefix",
"/kubernetes.io/events",
"--lease-duration",
"1h"
],
"delta": "0:00:00.278998",
"end": "2017-07-03 22:30:37.631742",
"failed": true,
"item": "/kubernetes.io/events",
"rc": 1,
"start": "2017-07-03 22:30:37.352744",
"warnings": []
}
STDERR:
error: open /etc/etcd/peer.crt: no such file or directory
failed: [openshift-208.lab.eng.nay.redhat.com -> openshift-225.lab.eng.nay.redhat.com] (item=/kubernetes.io/masterleases) => {
"changed": true,
"cmd": [
"oadm",
"migrate",
"etcd-ttl",
"--cert",
"/etc/etcd/peer.crt",
"--key",
"/etc/etcd/peer.key",
"--cacert",
"/etc/etcd/ca.crt",
"--etcd-address",
"https://192.168.1.186:2379",
"--ttl-keys-prefix",
"/kubernetes.io/masterleases",
"--lease-duration",
"1h"
],
"delta": "0:00:00.271130",
"end": "2017-07-03 22:30:38.155144",
"failed": true,
"item": "/kubernetes.io/masterleases",
"rc": 1,
"start": "2017-07-03 22:30:37.884014",
"warnings": []
}
STDERR:
error: open /etc/etcd/peer.crt: no such file or directory
NO MORE HOSTS LEFT *************************************************************
to retry, use: --limit @/root/openshift-ansible/playbooks/byo/openshift-etcd/migrate.retry
PLAY RECAP *********************************************************************
localhost : ok=15 changed=0 unreachable=0 failed=0
openshift-208.lab.eng.nay.redhat.com : ok=74 changed=9 unreachable=0 failed=1
openshift-225.lab.eng.nay.redhat.com : ok=19 changed=1 unreachable=0 failed=0
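In the log above the task is delegated to the master (openshift-225), and `oadm` itself reports that `/etc/etcd/peer.crt` cannot be opened there; presumably the etcd peer certificates exist only on the dedicated etcd host. A minimal pre-flight sketch in Python, meant to be run on the host that will execute the command, lists which of the three files are missing. The function name `missing_files` is hypothetical; the paths are the ones from the failing command.

```python
import os

# Paths taken from the failing command in the log above.
ETCD_TLS_FILES = [
    "/etc/etcd/peer.crt",
    "/etc/etcd/peer.key",
    "/etc/etcd/ca.crt",
]


def missing_files(paths):
    """Return the subset of paths that are not regular files on this host."""
    return [p for p in paths if not os.path.isfile(p)]


if __name__ == "__main__":
    # Print one line per file that the oadm command would fail to open.
    for path in missing_files(ETCD_TLS_FILES):
        print("missing: " + path)
```

If any file is reported missing on the delegated host, the command needs certificate paths that are valid on that host, not on the etcd host.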
Another question: what should we do to recover from the failure when the playbook stops at 'Re-introduce leases'?
PR setting the proper certificates: https://github.com/openshift/openshift-ansible/pull/4671
The external clustered etcd can be migrated with openshift-ansible-3.6.136.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHEA-2017:1716
Description of problem:
The migration fails when dedicated etcd clusters are used, because the atomic-openshift packages are not installed on the dedicated etcd hosts. I guess we only need to run 'oadm migrate etcd-ttl' on the first master.
Version-Release number of selected component (if applicable):
openshift/openshift-ansible: Pull Request 4492.
How reproducible:
always
Steps to Reproduce:
1. Install OCP v3.5 with dedicated etcd clusters
2. Upgrade to v3.6
3. Migrate to etcd v3
ansible-playbook openshift-ansible/playbooks/byo/openshift-etcd/migrate.yml
Actual results:
TASK [etcd_migrate : Re-introduce leases (as a replacement for key TTLs)] ******
failed: [qe-auto-etcd-1.0621-ktl.qe.rhcloud.com] (item=/kubernetes.io/events) => {
"cmd": "oadm migrate etcd-ttl --cert /etc/etcd/peer.crt --key /etc/etcd/peer.key --cacert /etc/etcd/ca.crt --etcd-address https://10.240.0.16:2379 --ttl-keys-prefix /kubernetes.io/events --lease-duration 1h",
"failed": true,
"item": "/kubernetes.io/events",
"rc": 2
}
MSG:
[Errno 2] No such file or directory
failed: [qe-auto-etcd-2.0621-ktl.qe.rhcloud.com] (item=/kubernetes.io/events) => {
"cmd": "oadm migrate etcd-ttl --cert /etc/etcd/peer.crt --key /etc/etcd/peer.key --cacert /etc/etcd/ca.crt --etcd-address https://10.240.0.17:2379 --ttl-keys-prefix /kubernetes.io/events --lease-duration 1h",
"failed": true,
"item": "/kubernetes.io/events",
"rc": 2
}
MSG:
[Errno 2] No such file or directory
failed: [qe-auto-etcd-3.0621-ktl.qe.rhcloud.com] (item=/kubernetes.io/events) => {
"cmd": "oadm migrate etcd-ttl --cert /etc/etcd/peer.crt --key /etc/etcd/peer.key --cacert /etc/etcd/ca.crt --etcd-address https://10.240.0.18:2379 --ttl-keys-prefix /kubernetes.io/events --lease-duration 1h",
"failed": true,
"item": "/kubernetes.io/events",
"rc": 2
}
MSG:
[Errno 2] No such file or directory
failed: [qe-auto-etcd-1.0621-ktl.qe.rhcloud.com] (item=/kubernetes.io/masterleases) => {
"cmd": "oadm migrate etcd-ttl --cert /etc/etcd/peer.crt --key /etc/etcd/peer.key --cacert /etc/etcd/ca.crt --etcd-address https://10.240.0.16:2379 --ttl-keys-prefix /kubernetes.io/masterleases --lease-duration 1h",
"failed": true,
"item": "/kubernetes.io/masterleases",
"rc": 2
}
MSG:
[Errno 2] No such file or directory
failed: [qe-auto-etcd-2.0621-ktl.qe.rhcloud.com] (item=/kubernetes.io/masterleases) => {
"cmd": "oadm migrate etcd-ttl --cert /etc/etcd/peer.crt --key /etc/etcd/peer.key --cacert /etc/etcd/ca.crt --etcd-address https://10.240.0.17:2379 --ttl-keys-prefix /kubernetes.io/masterleases --lease-duration 1h",
"failed": true,
"item": "/kubernetes.io/masterleases",
"rc": 2
}
MSG:
[Errno 2] No such file or directory
failed: [qe-auto-etcd-3.0621-ktl.qe.rhcloud.com] (item=/kubernetes.io/masterleases) => {
"cmd": "oadm migrate etcd-ttl --cert /etc/etcd/peer.crt --key /etc/etcd/peer.key --cacert /etc/etcd/ca.crt --etcd-address https://10.240.0.18:2379 --ttl-keys-prefix /kubernetes.io/masterleases --lease-duration 1h",
"failed": true,
"item": "/kubernetes.io/masterleases",
"rc": 2
}
MSG:
[Errno 2] No such file or directory
to retry, use: --limit @/root/openshift-ansible/playbooks/byo/openshift-etcd/migrate.retry
PLAY RECAP *********************************************************************
localhost : ok=22 changed=0 unreachable=0 failed=0
qe-auto-etcd-1.0621-ktl.qe.rhcloud.com : ok=96 changed=9 unreachable=0 failed=1
qe-auto-etcd-2.0621-ktl.qe.rhcloud.com : ok=93 changed=9 unreachable=0 failed=1
qe-auto-etcd-3.0621-ktl.qe.rhcloud.com : ok=93 changed=9 unreachable=0 failed=1
qe-auto-master-1.0621-ktl.qe.rhcloud.com : ok=61 changed=3 unreachable=0 failed=0
qe-auto-node-registry-router-1.0621-ktl.qe.rhcloud.com : ok=56 changed=2 unreachable=0 failed=0
Expected results:
Additional info:
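The `[Errno 2] No such file or directory` message with `rc: 2` is what Ansible typically reports when the executable itself cannot be found, which would match the observation that the atomic-openshift packages are not installed on the dedicated etcd hosts. A minimal Python sketch for confirming this on a given host follows, assuming the client binary is named `oadm` as in the log; the helper name `has_executable` is hypothetical.

```python
import shutil


def has_executable(name, search_path=None):
    """Return True if `name` resolves to an executable file.

    search_path is a PATH-style string; None means use the PATH
    environment variable of the current process.
    """
    return shutil.which(name, path=search_path) is not None


if __name__ == "__main__":
    if not has_executable("oadm"):
        # Assumption: on such a host the command has to run elsewhere,
        # for example on a master, as suggested in the report.
        print("oadm not found on PATH")
```

Running this on an etcd host and on the first master should show whether the binary is available only on the master.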