Bug 1490427 - [3.6.z] Stopping etcd leader can cause authentication failures due to OpenShift Master caching and using etcd leader IP address.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Master
Version: 3.5.1
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.6.z
Assignee: Robert Rati
QA Contact: ge liu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-09-11 14:11 UTC by Eric Rich
Modified: 2020-12-14 09:59 UTC (History)
CC List: 18 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: In some failure cases, the etcd client used by OpenShift does not rotate through all available etcd cluster members. The client repeatedly tries the same server, and if that server is down, requests fail for an extended time until the client determines that the server is invalid.
Consequence: If the etcd leader goes away while the client is contacting it, for example during authentication, the request fails and the etcd client stays stuck trying to reach the etcd member that no longer exists. User authentication fails for an extended period of time.
Fix: The etcd client now rotates to other cluster members even on failure.
Result: If the etcd leader goes away, the worst that should happen is the failure of that one authentication attempt. The next attempt will succeed because a different etcd member will be used.
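The rotate-on-failure behavior described in the Doc Text can be illustrated with a minimal sketch. This is not the etcd client's actual code: the `RotatingClient` class, its `request` method, and the endpoint strings are hypothetical names chosen for illustration, and the sketch assumes a failed request surfaces as a `ConnectionError`.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RotatingClient:
    """Toy client that tracks which endpoint to try next (hypothetical, not the etcd client)."""

    def __init__(self, endpoints: List[str]) -> None:
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        self.endpoints = list(endpoints)
        self.index = 0  # position of the endpoint the next request will use

    def request(self, call: Callable[[str], T]) -> T:
        """Send one request to the current endpoint.

        On a connection failure the client advances to the next endpoint
        before re-raising, so the failed request itself is lost but the
        following request goes to a different member.
        """
        endpoint = self.endpoints[self.index]
        try:
            return call(endpoint)
        except ConnectionError:
            # Rotate even though the request failed; without this step the
            # client would keep retrying the same unreachable member.
            self.index = (self.index + 1) % len(self.endpoints)
            raise


if __name__ == "__main__":
    stopped = {"https://etcd-1:2379"}  # hypothetical member that was shut down

    def fake_call(endpoint: str) -> str:
        if endpoint in stopped:
            raise ConnectionError(f"{endpoint} unreachable")
        return f"ok from {endpoint}"

    client = RotatingClient(
        ["https://etcd-1:2379", "https://etcd-2:2379", "https://etcd-3:2379"]
    )
    try:
        client.request(fake_call)
    except ConnectionError as exc:
        print("first attempt failed:", exc)
    print(client.request(fake_call))
```

In this sketch the first request fails once and the second is served by the next member, which mirrors the Result stated above: a single failed authentication attempt rather than an extended outage.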
Clone Of: 1475184
Environment:
Last Closed: 2017-10-25 13:06:40 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | coreos etcd issues 8515 | 0 | None | None | None | 2017-09-11 14:11:25 UTC
Github | coreos etcd pull 8519 | 0 | None | None | None | 2017-09-11 14:11:25 UTC
Red Hat Product Errata | RHBA-2017:3049 | 0 | normal | SHIPPED_LIVE | OpenShift Container Platform 3.6, 3.5, and 3.4 bug fix and enhancement update | 2017-10-25 15:57:15 UTC

Comment 5 ge liu 2017-09-28 06:55:13 UTC
Verified in OCP version:

openshift v3.6.173.0.37
kubernetes v1.6.1+5115d708d7
etcd 3.2.1


Steps:

1). set up an HA environment: 3 masters, 4 nodes, 1 load balancer

2). run oc login/logout in a loop

3). turn off the master service on a randomly chosen master

4). login/logout continues to work without interruption

5). turn off the remaining master services one by one

6). login/logout keeps working as long as at least one master service is running, and reports errors once all master services are turned off:

error: EOF
error: You must have a token in order to logout.
error: EOF
error: You must have a token in order to logout.
error: EOF
error: You must have a token in order to logout.
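The login/logout loop from step 2 could be scripted roughly as follows. This is a sketch, not the script used for verification: the function names, the server URL, and the credentials are hypothetical, and it assumes `oc` is on the PATH and exits with a non-zero status when login or logout fails.

```python
import subprocess
import time
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], int]


def run_cli(argv: Sequence[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.run(list(argv)).returncode


def login_logout_loop(
    server: str,
    user: str,
    password: str,
    iterations: int,
    runner: Runner = run_cli,
    delay: float = 0.0,
) -> List[bool]:
    """Repeat `oc login` followed by `oc logout`; record whether each round succeeded."""
    results: List[bool] = []
    for _ in range(iterations):
        login_ok = runner(["oc", "login", server, "-u", user, "-p", password]) == 0
        logout_ok = runner(["oc", "logout"]) == 0
        results.append(login_ok and logout_ok)
        time.sleep(delay)  # optional pause between rounds
    return results


if __name__ == "__main__":
    # Hypothetical load balancer URL and credentials; replace with real values.
    outcome = login_logout_loop(
        "https://lb.example.com:8443", "testuser", "testpass", iterations=5, delay=1.0
    )
    print(f"{sum(outcome)} of {len(outcome)} rounds succeeded")
```

Running the loop while stopping master services one at a time should show successful rounds until the last master is stopped, after which rounds fail as in the output above.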

Comment 7 errata-xmlrpc 2017-10-25 13:06:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3049

