1282718 – Login is failed with Unauthorized error sometimes on ha etcd environment

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	apiserver-auth
Sub Component:
Version:	unspecified
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Jordan Liggitt
QA Contact:	weiwei jiang
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1289603

Reported:	2015-11-17 08:52 UTC by DeShuai Ma
Modified:	2019-03-29 15:47 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	1289603
Environment:
Last Closed:	2016-05-12 17:11:26 UTC
Target Upstream Version:
Embargoed:

Description DeShuai Ma 2015-11-17 08:52:14 UTC

Description of problem:
Logging in to the same OpenShift server (auth: LDAP) with the same username and password sometimes fails.

Version-Release number of selected component (if applicable):
openshift v3.1.0.4-5-gebe80f5
kubernetes v1.1.0-origin-1107-g4c8e6f4
etcd 2.1.2

How reproducible:
50%

Steps to Reproduce:
1. Log in to the OpenShift server repeatedly and count the successful logins:
# sum=0;for i in {1..500}; do oc login https://openshift-162.lab.eng.nay.redhat.com:8443 -u dma -p dma; if [[ $? == 0 ]]; then sum=$(($sum + 1)); fi ; done; echo $sum

Actual results:
1. Login fails intermittently: out of 500 attempts, about 230 failed.

Expected results:
1. All login attempts should succeed.

Additional info:

Comment 1 Jordan Liggitt 2015-11-18 07:05:08 UTC

Can you provide the following information?

What is the HA setup? (How many masters, how many etcd servers?)

How does the https://openshift-162.lab.eng.nay.redhat.com:8443 URL point to multiple masters?

Do you have server logs from the master API?

Can you include the master configurations from all the HA masters?

Comment 5 Jordan Liggitt 2015-11-18 18:06:22 UTC

Pretty sure this is a stale read issue when using an etcd cluster. I see this in the master configs:

etcdClientInfo:
  ca: master.etcd-ca.crt
  certFile: master.etcd-client.crt
  keyFile: master.etcd-client.key
  urls:
    - https://openshift-159.lab.eng.nay.redhat.com:2379
    - https://openshift-138.lab.eng.nay.redhat.com:2379
    - https://openshift-155.lab.eng.nay.redhat.com:2379

So there are at least three etcd servers in place, right?

1. The token is created, written to etcd, and returned to the client.
2. The client then uses the token against the users/~ API.
3. The authentication layer attempts to verify the token exists in etcd. There is no guarantee that the same etcd server is queried for the token, so a member that has not yet applied the write can report the token as missing.

In this case, I think a quorum read may be needed when the token is not found.
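
The sequence above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the code from the fix: the Cluster class, its read and quorum_read methods, and validate_token are hypothetical names, and replication is simplified so that a write reaches a majority of members immediately while the remaining member lags.

```python
class Cluster:
    """Toy model of an etcd-like cluster (hypothetical, for illustration only)."""

    def __init__(self, size=3):
        # Each member holds its own copy of the key space.
        self.members = [{} for _ in range(size)]
        self.majority = size // 2 + 1

    def write(self, key, value):
        # Simplification: the write is applied on a majority of members
        # (member 0 acts as leader); the rest lag until replicate() runs.
        for member in self.members[:self.majority]:
            member[key] = value

    def replicate(self):
        # Bring lagging members up to date with the leader.
        for member in self.members[1:]:
            member.update(self.members[0])

    def read(self, key, member_index):
        # Plain read: answered from the local state of whichever member
        # the client happens to reach, so it can be stale.
        return self.members[member_index].get(key)

    def quorum_read(self, key):
        # Quorum read, modeled here as a read served by the leader,
        # which always reflects committed writes.
        return self.members[0].get(key)


def validate_token(cluster, token, member_index):
    """Look up a token; retry with a quorum read only when it is not found."""
    user = cluster.read(token, member_index)
    if user is None:
        user = cluster.quorum_read(token)
    return user


cluster = Cluster()
cluster.write("token-abc", "dma")
print(cluster.read("token-abc", member_index=2))            # None (stale member)
print(validate_token(cluster, "token-abc", member_index=2))  # dma
```

Falling back to the quorum read only on a miss keeps the common case (token found on the first read) cheap, and a token that truly does not exist is still rejected after the second lookup.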

Comment 6 Jordan Liggitt 2016-01-05 13:45:19 UTC

Fix pending in https://github.com/openshift/origin/pull/6530

Tested with an etcd cluster:

ip=192.168.99.100
count=3

cluster_members=()
for i in $(seq 1 "$count"); do
  cluster_members+=("etcd${i}=http://${ip}:700${i}")
done

IFS=',' eval 'initial_cluster="${cluster_members[*]}"'

for i in $(seq 1 "$count"); do
  docker run -d -p 400${i}:400${i} -p 700${i}:700${i} \
   --name "etcd${i}" quay.io/coreos/etcd:latest \
   -name "etcd${i}" \
   -advertise-client-urls       "http://${ip}:400${i}" \
   -listen-client-urls          "http://0.0.0.0:400${i}" \
   -initial-advertise-peer-urls "http://${ip}:700${i}" \
   -listen-peer-urls            "http://0.0.0.0:700${i}" \
   -initial-cluster-token       "my-etcd-cluster" \
   -initial-cluster             "${initial_cluster}" \
   -initial-cluster-state       "new"
done


Started the master from a master-config file with:

etcdClientInfo:
  urls:
  - http://192.168.99.100:4001
  - http://192.168.99.100:4002
  - http://192.168.99.100:4003

Comment 7 Jordan Liggitt 2016-01-13 05:40:10 UTC

https://github.com/openshift/origin/pull/6530 in the merge queue

Comment 8 DeShuai Ma 2016-01-14 05:58:31 UTC

Verified on the latest origin environment; this bug is fixed.
