1504604 – Original ocp does not work after migrate an embedded etcd to a fresh hosts

Summary: Original ocp does not work after migrate an embedded etcd to a fresh hosts

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Cluster Version Operator
Sub Component:
Version:	3.7.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	3.7.0
Assignee:	Jan Chaloupka
QA Contact:	liujia
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2017-10-20 10:09 UTC by liujia
Modified:	2017-11-28 22:18 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-11-28 22:18:47 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2017:3188	0	normal	SHIPPED_LIVE	Moderate: Red Hat OpenShift Container Platform 3.7 security, bug, and enhancement update	2017-11-29 02:34:54 UTC

Description liujia 2017-10-20 10:09:27 UTC

Description of problem:
Ran the embedded etcd migration against an RPM-based, non-HA v3.6 OCP install with embedded etcd.
The migrate playbook runs successfully, but OCP does not work after the migration.
For example:
1) atomic-openshift-master.service restarts in a loop
# systemctl status atomic-openshift-master.service 
● atomic-openshift-master.service - Atomic OpenShift Master
   Loaded: loaded (/etc/systemd/system/atomic-openshift-master.service; enabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: exit-code) since Fri 2017-10-20 05:57:35 EDT; 3s ago
     Docs: https://github.com/openshift/origin
  Process: 49089 ExecStart=/usr/bin/openshift start master --config=${CONFIG_FILE} $OPTIONS (code=exited, status=255)
 Main PID: 49089 (code=exited, status=255)

Oct 20 05:57:35 x-embed-master-nfs-1 systemd[1]: atomic-openshift-master.service: main process exited, code=exited, status=255/n/a
Oct 20 05:57:35 x-embed-master-nfs-1 systemd[1]: Failed to start Atomic OpenShift Master.
Oct 20 05:57:35 x-embed-master-nfs-1 systemd[1]: Unit atomic-openshift-master.service entered failed state.
Oct 20 05:57:35 x-embed-master-nfs-1 systemd[1]: atomic-openshift-master.service failed.

2) "oc get" cannot get any data
# oc get node
The connection to the server x-embed-master-nfs-1:8443 was refused - did you specify the right host or port?

===============================
Checking the master log shows that the master tries to connect to itself (10.240.0.49) rather than to the new etcd host (10.240.0.56):
getsockopt: connection refused"; Reconnecting to {10.240.0.49:2379 <nil>}


# cat /etc/etcd/etcd.conf | grep LISTEN
ETCD_LISTEN_PEER_URLS=https://10.240.0.56:2380
ETCD_LISTEN_CLIENT_URLS=https://10.240.0.56:2379
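
To confirm the mismatch from the master side, one could list the etcd URLs the master is configured to use and compare them with the listen URL above. This is only a sketch: the config path /etc/origin/master/master-config.yaml and the etcdClientInfo/urls keys are assumptions based on a typical OCP 3.x layout and are not quoted in this report, and master_etcd_urls is a hypothetical helper name.

```shell
# Print the etcd client URLs listed under etcdClientInfo.urls in a master config.
# Assumption: the file is YAML with a top-level "etcdClientInfo:" key holding a "urls:" list.
master_etcd_urls() {
    awk '
        /^etcdClientInfo:/ { in_section = 1; next }
        # Any other top-level key ends the etcdClientInfo section.
        in_section && /^[^ \t#]/ { in_section = 0; in_urls = 0 }
        in_section && /^[ \t]*urls:/ { in_urls = 1; next }
        # List items under urls: strip the leading "- " and print the URL.
        in_section && in_urls && /^[ \t]*-[ \t]*/ {
            sub(/^[ \t]*-[ \t]*/, "")
            print
            next
        }
        # Any other line inside the section ends the urls list.
        in_section && in_urls { in_urls = 0 }
    ' "$1"
}
```

On the master, something like `master_etcd_urls /etc/origin/master/master-config.yaml | grep -F 10.240.0.56` would then be expected to print a line only if the master points at the new etcd host.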


Version-Release number of the following components:
openshift-ansible-3.7.0-0.167.0.git.0.0e34535.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. Install OCP v3.6 with embedded etcd
2. Prepare repos on a new host (only docker is installed on it)
3. Edit the hosts file to add an etcd group
[OSEv3:children]
...
etcd
...
[etcd]
hostname...
// Specify a new host for etcd.
4. Run the etcd migration
# ansible-playbook -i hosts /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-etcd/embedded2external.yml
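
Before running the playbook, it may help to double-check which hosts ended up in the [etcd] group of the inventory edited in step 3. A minimal sketch, assuming a plain INI-style inventory; inventory_group_hosts is a hypothetical helper, not part of openshift-ansible:

```shell
# Print the host names listed under a given group of an INI-style inventory.
# $1 = inventory file, $2 = group name (for example: etcd)
inventory_group_hosts() {
    awk -v group="$2" '
        /^[ \t]*(#|;|$)/ { next }                              # skip comments and blank lines
        /^\[/ { in_group = ($0 == ("[" group "]")); next }     # section header switches groups
        in_group { print $1 }                                  # first field is the host name
    ' "$1"
}
```

For the inventory used here, `inventory_group_hosts hosts etcd` should print only the new etcd host.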

Actual results:
OCP does not work after the migration

Expected results:
OCP should work well after the migration

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 4 Jan Chaloupka 2017-10-23 13:07:55 UTC

I am able to reproduce it. I know what is wrong and I have a fix for it. I will open a PR shortly.

Comment 5 Jan Chaloupka 2017-10-23 13:32:15 UTC

Upstream PR: https://github.com/openshift/openshift-ansible/pull/5843

Comment 7 liujia 2017-10-26 09:40:08 UTC

Version:
openshift-ansible-3.7.0-0.179.0.git.0.a2641b6.el7.noarch

Steps:
1. Install OCP v3.6 with embedded etcd
2. Prepare repos on a new host (only docker is installed on it)
3. Edit the hosts file to add an etcd group
[OSEv3:children]
...
etcd
...
[etcd]
hostname...
// Specify a new host for etcd.
4. Run the etcd migration
# ansible-playbook -i hosts /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-etcd/embedded2external.yml

After migrating to external etcd, OCP works well now.

Comment 10 errata-xmlrpc 2017-11-28 22:18:47 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188
