Description of problem:
Logging in to the same OpenShift server (auth: LDAP) with the same username and password sometimes fails.

Version-Release number of selected component (if applicable):
openshift v3.1.0.4-5-gebe80f5
kubernetes v1.1.0-origin-1107-g4c8e6f4
etcd 2.1.2

How reproducible:
50%

Steps to Reproduce:
1. Log in to the OpenShift server 500 times and count the successful logins:
# sum=0; for i in {1..500}; do oc login https://openshift-162.lab.eng.nay.redhat.com:8443 -u dma -p dma; if [[ $? == 0 ]]; then sum=$((sum + 1)); fi; done; echo $sum

Actual results:
1. Login sometimes fails. Out of 500 runs, about 230 logins failed.

Expected results:
1. All login attempts should succeed.

Additional info:
Can you provide the following information?
- What is the HA setup (how many masters, how many etcd servers)?
- How does the https://openshift-162.lab.eng.nay.redhat.com:8443 URL point to multiple masters?
- Do you have server logs from the master API?
- Can you include the master configurations from all the HA masters?
I'm fairly sure this is a stale-read issue caused by using an etcd cluster. I see this in the master configs:

etcdClientInfo:
  ca: master.etcd-ca.crt
  certFile: master.etcd-client.crt
  keyFile: master.etcd-client.key
  urls:
  - https://openshift-159.lab.eng.nay.redhat.com:2379
  - https://openshift-138.lab.eng.nay.redhat.com:2379
  - https://openshift-155.lab.eng.nay.redhat.com:2379

So there are at least three etcd servers in place, right? The login sequence is:

1. The token is created, written to etcd, and returned to the client.
2. The client then uses the token against the users/~ API.
3. The authentication layer attempts to verify that the token exists in etcd.

There is no guarantee that the etcd server queried for the token in step 3 has already applied the write from step 1, so it can report the token as missing. In this case, I think a quorum read may be needed when the token is not found.
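The race in steps 1 to 3, and the "quorum read only when the token is not found" fallback suggested above, can be sketched as a small bash simulation. Everything in it is hypothetical and for illustration only: the three etcd members are just key prefixes in one associative array, the helper names (`write_token`, `replicate`, `local_read`, `quorum_read`, `authenticate`) are invented, and a quorum read is modelled simply as reading the leader's copy (etcd's v2 API exposes real quorum reads through the `quorum=true` read option). It is not the code from the fix PR.

```shell
#!/usr/bin/env bash
# Hypothetical, in-memory simulation of the stale-read race; it does not talk to etcd.
# Member 0 plays the leader; members 1 and 2 are followers that may lag behind it.

# seen["MEMBER:TOKEN"]=1 once that member has applied the token write.
declare -A seen=()

# write_token TOKEN: the write is committed and visible on the leader first.
write_token() {
  seen["0:$1"]=1
}

# replicate: followers catch up by applying everything the leader has.
replicate() {
  local key
  for key in "${!seen[@]}"; do
    if [[ $key == 0:* ]]; then
      seen["1:${key#0:}"]=1
      seen["2:${key#0:}"]=1
    fi
  done
}

# local_read MEMBER TOKEN: plain read served by whichever member the client reached.
local_read() {
  local key="$1:$2"
  [[ -n "${seen[$key]:-}" ]]
}

# quorum_read TOKEN: modelled as a read that reflects every committed write,
# here by consulting the leader's copy.
quorum_read() {
  local key="0:$1"
  [[ -n "${seen[$key]:-}" ]]
}

# authenticate MEMBER TOKEN: cheap local read first; only on a miss, fall back
# to the (slower) quorum read before rejecting the token.
authenticate() {
  if local_read "$1" "$2" || quorum_read "$2"; then
    echo ok
  else
    echo unauthorized
  fi
}

# Usage: the token reaches the leader, but member 1 has not caught up yet.
write_token tok1
local_read 1 tok1 || echo "plain read on member 1: not found (stale)"
echo "with fallback on member 1: $(authenticate 1 tok1)"
```

Falling back only on a miss keeps the common path cheap: most lookups hit a member that already has the token, and only the occasional stale miss pays for the quorum round trip.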
Fix pending in https://github.com/openshift/origin/pull/6530

Tested with an etcd cluster:

ip=192.168.99.100
count=3
# Build the comma-separated -initial-cluster value: etcd1=http://IP:7001,etcd2=...,etcd3=...
cluster_members=()
for i in $(seq 1 "$count"); do
  cluster_members+=("etcd${i}=http://${ip}:700${i}")
done
IFS=',' eval 'initial_cluster="${cluster_members[*]}"'
# Start one etcd member per container: client ports 4001-4003, peer ports 7001-7003.
for i in $(seq 1 "$count"); do
  docker run -d -p 400${i}:400${i} -p 700${i}:700${i} \
    --name "etcd${i}" quay.io/coreos/etcd:latest \
    -name "etcd${i}" \
    -advertise-client-urls "http://${ip}:400${i}" \
    -listen-client-urls "http://0.0.0.0:400${i}" \
    -initial-advertise-peer-urls "http://${ip}:700${i}" \
    -listen-peer-urls "http://0.0.0.0:700${i}" \
    -initial-cluster-token "my-etcd-cluster" \
    -initial-cluster "${initial_cluster}" \
    -initial-cluster-state "new"
done

Started the master from a master-config file with:

etcdClientInfo:
  urls:
  - http://192.168.99.100:4001
  - http://192.168.99.100:4002
  - http://192.168.99.100:4003
https://github.com/openshift/origin/pull/6530 is in the merge queue.
Verified on the latest origin environment; this bug is fixed.