Bug 1892413 - Can't scale down masters [NEEDINFO]
Summary: Can't scale down masters
Keywords:
Status: CLOSED DUPLICATE of bug 1880759
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Etcd
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: ---
Target Release: 4.7.0
Assignee: Sam Batschelet
QA Contact: ge liu
URL:
Whiteboard: LifecycleStale
Depends On:
Blocks:
 
Reported: 2020-10-28 18:04 UTC by Karim Boumedhel
Modified: 2021-01-23 15:57 UTC (History)
1 user (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-01-23 15:57:55 UTC
Target Upstream Version:
Embargoed:
mfojtik: needinfo?


Attachments
etcd logs failure (12.20 KB, text/plain)
2020-10-28 18:05 UTC, Karim Boumedhel

Description Karim Boumedhel 2020-10-28 18:04:29 UTC
Description of problem:
When trying to scale down from a two-node cluster to a single node, the cluster breaks (see Actual results below).


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create a single-node cluster
2. Expand it with an additional node
3. Try to get rid of the initial node, either by deleting it as a member in etcd or by deleting it in OpenShift with `oc delete node` and making it physically unavailable (see the command sketch below)
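
For reference, a rough sketch of the commands step 3 refers to (the member ID below is illustrative, and the etcdctl invocations assume the usual TLS flags are already set in the environment):

  # find the ID of the member that should go away
  etcdctl member list -w table

  # remove it from the etcd cluster (example ID, not taken from this bug)
  etcdctl member remove 8e9e05c52164694d

  # and/or remove the node from OpenShift before powering it off
  oc delete node qctsingle-master-1.karmalabs.com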

Actual results:
Access to the API is lost; etcd hangs and crashes if rebooted.


Expected results:
The cluster continues to operate.


Additional info:

Comment 1 Karim Boumedhel 2020-10-28 18:05:27 UTC
Created attachment 1724877 [details]
etcd logs failure

Comment 2 Sam Batschelet 2020-11-15 17:44:34 UTC
Did you get this figured out? Can we close this bug?

> ETCD_INITIAL_CLUSTER=qctsingle-master-1.karmalabs.com

This is in the wrong format; it needs to be <key>=<value>.
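
For example, assuming the member is named after its hostname and uses the default etcd peer port 2380 over TLS (values here are illustrative), the variable would look something like:

  ETCD_INITIAL_CLUSTER=qctsingle-master-1.karmalabs.com=https://qctsingle-master-1.karmalabs.com:2380

i.e. a comma-separated list of <member-name>=<peer-URL> pairs, one per member.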

> 3. try to get rid of the initial node by deleting him as member in etcd or deleting in Openshift with oc delete node and making him physically unavailable

Can we validate full steps to understand the problem better?

Comment 3 Karim Boumedhel 2020-11-16 23:02:12 UTC
No, I haven't figured it out.

Comment 5 Michal Fojtik 2020-12-25 13:58:21 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Keywords if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.

Comment 6 Sam Batschelet 2021-01-23 15:57:55 UTC
As etcd runs as a static pod, the member will continue to run if you delete the node with `oc delete node`. If the node is still part of the cluster and you scale down etcd via the cluster API (`etcdctl member remove $id`) while you only have 2 members, the net result will be quorum loss. Because the node still exists, we will try to scale etcd back up on that node, but because it was a previous member its data-dir still exists. This causes etcd to fail to start, which again results in quorum loss in a 2-member cluster. This is why scaling is only supported with a 3-member cluster. I am going to mark this a dupe of 1880759.
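
As a rough illustration of the quorum math behind that (commands assume etcdctl v3 with certificates already configured; only the comments carry the reasoning, the member IDs and output shapes are whatever your cluster reports):

  # quorum for an N-member cluster is floor(N/2)+1:
  #   3 members -> quorum 2, so the cluster survives losing or removing 1 member
  #   2 members -> quorum 2, so losing either member (or a failed re-add) stops the cluster
  etcdctl member list -w table
  etcdctl endpoint status -w table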

*** This bug has been marked as a duplicate of bug 1880759 ***

