1490427 – [3.6.z] Stopping etcd leader can cause authentication failures due to OpenShift Master caching and using etcd leader IP address.

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Master
Sub Component:
Version:	3.5.1
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Target Release:	3.6.z
Assignee:	Robert Rati
QA Contact:	ge liu
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2017-09-11 14:11 UTC by Eric Rich
Modified:	2020-12-14 09:59 UTC
CC List:	18 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Cause: In some failure cases, the etcd client used by OpenShift does not rotate through all of the available etcd cluster members. Instead, it repeatedly retries the same server; if that server is down, requests fail for an extended period until the client determines that the server is invalid.
Consequence: If the etcd leader goes down while the client is contacting it for a request such as authentication, that request fails and the etcd client remains stuck trying to reach the unavailable member. User authentication can fail for an extended period of time.
Fix: The etcd client now rotates to another cluster member even when a request fails.
Result: If the etcd leader goes down, at worst a single authentication attempt fails. The next attempt will succeed because a different etcd member is used.
Clone Of:	1475184
Environment:
Last Closed:	2017-10-25 13:06:40 UTC
Target Upstream Version:
Embargoed:
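
The behavior described in the Doc Text (rotate to another member even on failure) can be modeled with a minimal Python sketch. RotatingClient, request, send, and rotate_on_failure are hypothetical names chosen for illustration; this is not the etcd clientv3 balancer code or the change made in the linked etcd pull request, only a model of the before/after behavior.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RotatingClient:
    """Hypothetical model of an etcd client that spreads requests over members."""

    def __init__(self, endpoints: List[str], rotate_on_failure: bool = True):
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        self.endpoints = list(endpoints)
        # True models the fixed client; False models the client pinned to one member.
        self.rotate_on_failure = rotate_on_failure
        self.current = 0  # index of the member used for the next request

    def request(self, send: Callable[[str], T]) -> T:
        """Send one request to the current member.

        A ConnectionError is re-raised, so the request that hit the dead
        member still fails. With rotate_on_failure=True the client first
        advances to the next member, so the following request goes elsewhere;
        with rotate_on_failure=False it keeps retrying the same member.
        """
        endpoint = self.endpoints[self.current]
        try:
            return send(endpoint)
        except ConnectionError:
            if self.rotate_on_failure:
                self.current = (self.current + 1) % len(self.endpoints)
            raise
```

Re-raising instead of retrying inside request mirrors the Result described above: the one attempt that reached the stopped leader still fails, and only the next attempt benefits from the rotation.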

Links
System	ID	Priority	Status	Summary	Last Updated
Github	coreos etcd issues 8515	None	None	None	2017-09-11 14:11:25 UTC
Github	coreos etcd pull 8519	None	None	None	2017-09-11 14:11:25 UTC
Red Hat Product Errata	RHBA-2017:3049	normal	SHIPPED_LIVE	OpenShift Container Platform 3.6, 3.5, and 3.4 bug fix and enhancement update	2017-10-25 15:57:15 UTC

Comment 5 ge liu 2017-09-28 06:55:13 UTC

Verified in OCP version:

openshift v3.6.173.0.37
kubernetes v1.6.1+5115d708d7
etcd 3.2.1


Steps:

1). Set up an HA environment: 3 masters, 4 nodes, 1 load balancer.

2). Run oc login/oc logout in a loop.

3). Stop a randomly chosen master service.

4). Login/logout keeps working without interruption.

5). Stop the remaining master services one by one.

6). Login/logout keeps working as long as at least one master service is running; once all master services are stopped, it reports errors:

error: EOF
error: You must have a token in order to logout.
error: EOF
error: You must have a token in order to logout.
error: EOF
error: You must have a token in order to logout.
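
The login/logout loop from step 2 can be scripted roughly as below. This is a sketch under assumptions: SERVER, USER, and PASSWORD are placeholders, and run_oc and login_logout_loop are hypothetical helpers, not the QA tooling actually used for this verification. The runner is injectable so the loop can be exercised without a cluster.

```python
import subprocess
from typing import Callable, List

# Placeholder values (assumptions): point SERVER at the load balancer.
SERVER = "https://lb.example.com:8443"
USER = "testuser"
PASSWORD = "testpassword"


def run_oc(args: List[str]) -> bool:
    """Run an oc subcommand and return True if it exited with status 0."""
    result = subprocess.run(["oc", *args], capture_output=True, text=True)
    return result.returncode == 0


def login_logout_loop(iterations: int,
                      run: Callable[[List[str]], bool] = run_oc) -> int:
    """Run oc login / oc logout repeatedly; return the number of failed commands."""
    failures = 0
    for _ in range(iterations):
        if not run(["login", SERVER, "-u", USER, "-p", PASSWORD]):
            failures += 1
        # If login failed, logout is expected to fail too (no token).
        if not run(["logout"]):
            failures += 1
    return failures
```

Calling login_logout_loop(100) while stopping master services from another terminal reproduces steps 3 to 6; per the results above, the count should stay at 0 while any master is running. Once all masters are stopped, each iteration adds two failures, matching the paired "error: EOF" and "You must have a token in order to logout" messages.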

Comment 7 errata-xmlrpc 2017-10-25 13:06:40 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3049