Bug 1463494 - oadm migrate etcd-ttl failed when use the dedicated etcd clusters
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Upgrade
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Assigned To: Jan Chaloupka
QA Contact: Anping Li
Depends On:
Blocks:
Reported: 2017-06-21 02:30 EDT by Anping Li
Modified: 2017-08-16 15 EDT
CC List: 6 users

Last Closed: 2017-08-10 01:28:56 EDT
Type: Bug

Attachments
Migrade logs (120.00 KB, application/x-tar)
2017-06-22 02:09 EDT, Anping Li

Description Anping Li 2017-06-21 02:30:41 EDT
Description of problem:
The migration fails when dedicated etcd clusters are used, because the atomic-openshift packages are not installed on the dedicated etcd hosts, so the oadm binary is not available there. I guess we only need to run 'oadm migrate etcd-ttl' on the first master.
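
A quick way to confirm this suspicion on a given host is to check whether the binary is on PATH at all. The following is only a minimal sketch: check_cmd is a hypothetical helper name, not part of openshift-ansible, and it assumes a POSIX shell on the host.

```shell
# Hypothetical helper: report whether a command is available on PATH.
# Prints "present" and returns 0, or prints "missing" and returns 1.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "present"
    else
        echo "missing"
        return 1
    fi
}
```

Running `check_cmd oadm` on a dedicated etcd host without the atomic-openshift packages would be expected to print "missing", while the first master should print "present".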

Version-Release number of selected component (if applicable):
openshift/openshift-ansible: Pull Request 4492. 

How reproducible:
always

Steps to Reproduce:
1. install OCP v3.5 with dedicated etcd clusters
2. upgrade to v3.6
3. migrate to etcd v3
   ansible-playbook openshift-ansible/playbooks/byo/openshift-etcd/migrate.yml

Actual results:
TASK [etcd_migrate : Re-introduce leases (as a replacement for key TTLs)] ******
failed: [qe-auto-etcd-1.0621-ktl.qe.rhcloud.com] (item=/kubernetes.io/events) => {
    "cmd": "oadm migrate etcd-ttl --cert /etc/etcd/peer.crt --key /etc/etcd/peer.key --cacert /etc/etcd/ca.crt --etcd-address https://10.240.0.16:2379 --ttl-keys-prefix /kubernetes.io/events --lease-duration 1h", 
    "failed": true, 
    "item": "/kubernetes.io/events", 
    "rc": 2
}

MSG:

[Errno 2] No such file or directory

failed: [qe-auto-etcd-2.0621-ktl.qe.rhcloud.com] (item=/kubernetes.io/events) => {
    "cmd": "oadm migrate etcd-ttl --cert /etc/etcd/peer.crt --key /etc/etcd/peer.key --cacert /etc/etcd/ca.crt --etcd-address https://10.240.0.17:2379 --ttl-keys-prefix /kubernetes.io/events --lease-duration 1h", 
    "failed": true, 
    "item": "/kubernetes.io/events", 
    "rc": 2
}

MSG:

[Errno 2] No such file or directory

failed: [qe-auto-etcd-3.0621-ktl.qe.rhcloud.com] (item=/kubernetes.io/events) => {
    "cmd": "oadm migrate etcd-ttl --cert /etc/etcd/peer.crt --key /etc/etcd/peer.key --cacert /etc/etcd/ca.crt --etcd-address https://10.240.0.18:2379 --ttl-keys-prefix /kubernetes.io/events --lease-duration 1h", 
    "failed": true, 
    "item": "/kubernetes.io/events", 
    "rc": 2
}

MSG:

[Errno 2] No such file or directory

failed: [qe-auto-etcd-1.0621-ktl.qe.rhcloud.com] (item=/kubernetes.io/masterleases) => {
    "cmd": "oadm migrate etcd-ttl --cert /etc/etcd/peer.crt --key /etc/etcd/peer.key --cacert /etc/etcd/ca.crt --etcd-address https://10.240.0.16:2379 --ttl-keys-prefix /kubernetes.io/masterleases --lease-duration 1h", 
    "failed": true, 
    "item": "/kubernetes.io/masterleases", 
    "rc": 2
}

MSG:

[Errno 2] No such file or directory

failed: [qe-auto-etcd-2.0621-ktl.qe.rhcloud.com] (item=/kubernetes.io/masterleases) => {
    "cmd": "oadm migrate etcd-ttl --cert /etc/etcd/peer.crt --key /etc/etcd/peer.key --cacert /etc/etcd/ca.crt --etcd-address https://10.240.0.17:2379 --ttl-keys-prefix /kubernetes.io/masterleases --lease-duration 1h", 
    "failed": true, 
    "item": "/kubernetes.io/masterleases", 
    "rc": 2
}

MSG:

[Errno 2] No such file or directory

failed: [qe-auto-etcd-3.0621-ktl.qe.rhcloud.com] (item=/kubernetes.io/masterleases) => {
    "cmd": "oadm migrate etcd-ttl --cert /etc/etcd/peer.crt --key /etc/etcd/peer.key --cacert /etc/etcd/ca.crt --etcd-address https://10.240.0.18:2379 --ttl-keys-prefix /kubernetes.io/masterleases --lease-duration 1h", 
    "failed": true, 
    "item": "/kubernetes.io/masterleases", 
    "rc": 2
}

MSG:

[Errno 2] No such file or directory

    to retry, use: --limit @/root/openshift-ansible/playbooks/byo/openshift-etcd/migrate.retry

PLAY RECAP *********************************************************************
localhost                  : ok=22   changed=0    unreachable=0    failed=0   
qe-auto-etcd-1.0621-ktl.qe.rhcloud.com : ok=96   changed=9    unreachable=0    failed=1   
qe-auto-etcd-2.0621-ktl.qe.rhcloud.com : ok=93   changed=9    unreachable=0    failed=1   
qe-auto-etcd-3.0621-ktl.qe.rhcloud.com : ok=93   changed=9    unreachable=0    failed=1   
qe-auto-master-1.0621-ktl.qe.rhcloud.com : ok=61   changed=3    unreachable=0    failed=0   
qe-auto-node-registry-router-1.0621-ktl.qe.rhcloud.com : ok=56   changed=2    unreachable=0    failed=0   
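
For longer runs it can help to pull the failing hosts out of the recap mechanically. This is a rough sketch that assumes the recap format shown above (host name first, then key=value counters); failed_hosts is a hypothetical helper name.

```shell
# Hypothetical helper: print the hosts whose PLAY RECAP line reports failed > 0.
# Reads the saved playbook output from the given files, or from stdin.
failed_hosts() {
    awk '/failed=[0-9]+/ {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^failed=/) {
                split($i, kv, "=")
                # Force a numeric comparison on the counter value.
                if (kv[2] + 0 > 0) print $1
            }
        }
    }' "$@"
}
```

Applied to the recap above, only the three qe-auto-etcd hosts should be listed, which matches the hosts where the oadm command was attempted.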

Expected results:


Additional info:
Comment 1 Anping Li 2017-06-21 02:34:53 EDT
The example inventory:
[OSEv3:children]
masters
nodes
etcd
[OSEv3:vars]
ansible_ssh_user=root
xxxx
xxxx
[masters]
master.example.com
[nodes]
master.example.com
node.example.com
[etcd]
etcd1.example.com
etcd2.example.com
etcd3.example.com
Comment 2 Anping Li 2017-06-22 02:09 EDT
Created attachment 1290503 [details]
Migrade logs

I also hit the 'oadm migrate etcd-ttl' error with clustered etcd (installed on the masters). I think it is the same issue.

[masters]
host-8-174-222.host.centralci.eng.rdu2.redhat.com 
host-8-174-253.host.centralci.eng.rdu2.redhat.com 
host-8-175-112.host.centralci.eng.rdu2.redhat.com

[etcd]
host-8-174-222.host.centralci.eng.rdu2.redhat.com
host-8-174-253.host.centralci.eng.rdu2.redhat.com
host-8-175-112.host.centralci.eng.rdu2.redhat.com

[nodes]
host-8-174-222.host.centralci.eng.rdu2.redhat.com
host-8-174-253.host.centralci.eng.rdu2.redhat.com
host-8-175-112.host.centralci.eng.rdu2.redhat.com
host-8-175-68.host.centralci.eng.rdu2.redhat.com 
host-8-175-73.host.centralci.eng.rdu2.redhat.com
[lb]
host-8-175-186.host.centralci.eng.rdu2.redhat.com
[nfs]
host-8-175-186.host.centralci.eng.rdu2.redhat.com
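
To tell the two layouts apart (etcd on dedicated hosts versus etcd co-located with the masters), the [etcd] and [masters] groups of an inventory can be compared. The sketch below assumes a plain INI-style inventory like the ones in this report; group_hosts and dedicated_etcd_hosts are hypothetical helper names.

```shell
# Hypothetical helper: print the host names listed under [group] in an
# INI-style inventory file. Only the first field of each line is printed,
# so per-host variables are dropped.
group_hosts() {
    awk -v section="[$2]" '
        /^\[/ { in_section = ($1 == section); next }
        in_section && NF && $1 !~ /^#/ { print $1 }
    ' "$1"
}

# Hypothetical helper: print etcd hosts that are not also masters,
# i.e. the dedicated etcd hosts.
dedicated_etcd_hosts() {
    group_hosts "$1" etcd | while read -r host; do
        group_hosts "$1" masters | grep -qxF -- "$host" || echo "$host"
    done
}
```

For the inventory in Comment 1 this should list all three etcd hosts; for the inventory in this comment it should print nothing, because every etcd host is also a master.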
Comment 3 Scott Dodson 2017-06-22 09:15:44 EDT
Use delegate_to: {{ oo_first_master }} so that the task runs on the first master.
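
With delegation, the same oadm invocation would run on the first master while still pointing at each etcd member's address. As a dry-run sketch of what that looks like by hand, the function below only prints the commands; the flags and the two key prefixes are copied from the failing task output in the description, and print_ttl_migrations is a hypothetical helper name.

```shell
# Hypothetical helper: print (dry run) the oadm invocation for each TTL key prefix.
# Arguments: etcd address, then the cert, key and CA certificate paths.
print_ttl_migrations() {
    etcd_address=$1 cert=$2 key=$3 cacert=$4
    for prefix in /kubernetes.io/events /kubernetes.io/masterleases; do
        echo "oadm migrate etcd-ttl --cert $cert --key $key --cacert $cacert" \
             "--etcd-address $etcd_address --ttl-keys-prefix $prefix --lease-duration 1h"
    done
}
```

For example, `print_ttl_migrations https://10.240.0.16:2379 /etc/etcd/peer.crt /etc/etcd/peer.key /etc/etcd/ca.crt` prints the two commands for the first etcd member. Note that the certificate paths must exist on the host where the command actually runs.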
Comment 4 Jan Chaloupka 2017-06-23 09:06:37 EDT
Fixed as part of https://github.com/openshift/openshift-ansible/pull/4558
Comment 5 Jan Chaloupka 2017-06-28 08:00:27 EDT
More specific upstream PR: https://github.com/openshift/openshift-ansible/pull/4623

PR #4558 can be ignored for this issue.
Comment 6 Jan Chaloupka 2017-06-28 08:14:40 EDT
Merged upstream
Comment 8 Anping Li 2017-07-03 22:41:22 EDT
With the master branch, I get the following error. Should I use the errata puddle?

[root@anli host2]# cat hosts 
[OSEv3:children]
masters
nodes
etcd
nfs

[OSEv3:vars]
deployment_type=openshift-enterprise
ansible_become=true
ansible_user=root
openshift_auth_type=allowall
openshift_master_identity_providers=[{'name': 'allow_all', 'login': 'true', 'challenge': 'true', 'kind': 'AllowAllPasswordIdentityProvider'}]
openshift_image_tag=v3.6.129
containerized=true
enable_excluders=false
openshift_master_cert_expire_days=365
openshift_disable_check=disk_availability,docker_storage,memory_availability

[masters]
openshift-225.lab.eng.nay.redhat.com openshift_public_hostname=openshift-225.lab.eng.nay.redhat.com openshift_hostname=openshift-225.lab.eng.nay.redhat.com
[nodes]
openshift-225.lab.eng.nay.redhat.com openshift_public_hostname=openshift-225.lab.eng.nay.redhat.com openshift_hostname=openshift-225.lab.eng.nay.redhat.com  openshift_node_labels="{'region': 'infra'}"
[etcd]
openshift-208.lab.eng.nay.redhat.com openshift_public_hostname=openshift-208.lab.eng.nay.redhat.com openshift_hostname=openshift-208.lab.eng.nay.redhat.com
[nfs]
openshift-208.lab.eng.nay.redhat.com



TASK [etcd_migrate : Re-introduce leases (as a replacement for key TTLs)] ******
failed: [openshift-208.lab.eng.nay.redhat.com -> openshift-225.lab.eng.nay.redhat.com] (item=/kubernetes.io/events) => {
    "changed": true, 
    "cmd": [
        "oadm", 
        "migrate", 
        "etcd-ttl", 
        "--cert", 
        "/etc/etcd/peer.crt", 
        "--key", 
        "/etc/etcd/peer.key", 
        "--cacert", 
        "/etc/etcd/ca.crt", 
        "--etcd-address", 
        "https://192.168.1.186:2379", 
        "--ttl-keys-prefix", 
        "/kubernetes.io/events", 
        "--lease-duration", 
        "1h"
    ], 
    "delta": "0:00:00.278998", 
    "end": "2017-07-03 22:30:37.631742", 
    "failed": true, 
    "item": "/kubernetes.io/events", 
    "rc": 1, 
    "start": "2017-07-03 22:30:37.352744", 
    "warnings": []
}

STDERR:

error: open /etc/etcd/peer.crt: no such file or directory

failed: [openshift-208.lab.eng.nay.redhat.com -> openshift-225.lab.eng.nay.redhat.com] (item=/kubernetes.io/masterleases) => {
    "changed": true, 
    "cmd": [
        "oadm", 
        "migrate", 
        "etcd-ttl", 
        "--cert", 
        "/etc/etcd/peer.crt", 
        "--key", 
        "/etc/etcd/peer.key", 
        "--cacert", 
        "/etc/etcd/ca.crt", 
        "--etcd-address", 
        "https://192.168.1.186:2379", 
        "--ttl-keys-prefix", 
        "/kubernetes.io/masterleases", 
        "--lease-duration", 
        "1h"
    ], 
    "delta": "0:00:00.271130", 
    "end": "2017-07-03 22:30:38.155144", 
    "failed": true, 
    "item": "/kubernetes.io/masterleases", 
    "rc": 1, 
    "start": "2017-07-03 22:30:37.884014", 
    "warnings": []
}

STDERR:

error: open /etc/etcd/peer.crt: no such file or directory


NO MORE HOSTS LEFT *************************************************************
	to retry, use: --limit @/root/openshift-ansible/playbooks/byo/openshift-etcd/migrate.retry

PLAY RECAP *********************************************************************
localhost                  : ok=15   changed=0    unreachable=0    failed=0   
openshift-208.lab.eng.nay.redhat.com : ok=74   changed=9    unreachable=0    failed=1   
openshift-225.lab.eng.nay.redhat.com : ok=19   changed=1    unreachable=0    failed=0
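
The task is now delegated to the master (openshift-225), and the error suggests that the /etc/etcd certificate paths passed on the command line do not exist on that host. A minimal pre-flight check, assuming a POSIX shell on the master, could look like this; missing_files is a hypothetical helper name.

```shell
# Hypothetical helper: print every given path that is not a readable file
# on this host. Returns 1 if at least one path is missing, 0 otherwise.
missing_files() {
    status=0
    for path in "$@"; do
        if [ ! -r "$path" ]; then
            echo "$path"
            status=1
        fi
    done
    return "$status"
}
```

Running `missing_files /etc/etcd/peer.crt /etc/etcd/peer.key /etc/etcd/ca.crt` on the delegated host would be expected to list at least /etc/etcd/peer.crt in this environment.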
Comment 9 Anping Li 2017-07-03 23:09:05 EDT
Another question: how should we recover from the failure when the playbook stops at 'Re-introduce leases'?
Comment 10 Jan Chaloupka 2017-07-04 04:40:19 EDT
PR setting the proper certificates: https://github.com/openshift/openshift-ansible/pull/4671
Comment 12 Anping Li 2017-07-06 05:32:15 EDT
The external etcd cluster can be migrated with openshift-ansible-3.6.136.
Comment 14 errata-xmlrpc 2017-08-10 01:28:56 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716
