Description of problem:
Creating a router, deleting it, and then creating a router with the same name again results in 0 desired replicas about 50% of the time for me. I don't know whether the time between deletion and re-creation matters, but IMO the requested replica count should not be ignored either way. I also don't know whether the issue is related to the errors about the already existing service account and cluster role binding; those are tracked in bug 1381378.

See the two creation commands below. I ran this some 8-10 times and the result alternated: one run ends up with 0 desired replicas, the next with 2, and so on.

> [root@openshift-125 ~]# oadm router tc-518936 --images=registry.access.redhat.com/openshift3/ose-haproxy-router:v3.3.0.34 --replicas=2
> info: password for stats user admin has been set to cUsztwMFRB
> --> Creating router tc-518936 ...
>     error: serviceaccounts "router" already exists
>     error: rolebinding "router-tc-518936-role" already exists
>     deploymentconfig "tc-518936" created
>     service "tc-518936" created
> --> Failed
> [root@openshift-125 ~]# oc get dc
> NAME              REVISION   DESIRED   CURRENT   TRIGGERED BY
> docker-registry   2          1         1         config
> router            4          0         0         config
> tc-518936         1          2         0         config
> [root@openshift-125 ~]# oc delete dc/tc-518936 svc/tc-518936
> deploymentconfig "tc-518936" deleted
> service "tc-518936" deleted
> [root@openshift-125 ~]# oadm router tc-518936 --images=registry.access.redhat.com/openshift3/ose-haproxy-router:v3.3.0.34 --replicas=2
> info: password for stats user admin has been set to EbtHgW5E2d
> --> Creating router tc-518936 ...
>     error: serviceaccounts "router" already exists
>     error: rolebinding "router-tc-518936-role" already exists
>     deploymentconfig "tc-518936" created
>     service "tc-518936" created
> --> Failed
> [root@openshift-125 ~]# oc get dc
> NAME              REVISION   DESIRED   CURRENT   TRIGGERED BY
> docker-registry   2          1         1         config
> router            4          0         0         config
> tc-518936         1          0         0         config

Version-Release number of selected component (if applicable):
> oc v3.3.0.34
> kubernetes v1.3.0+52492b4

How reproducible:
50%

Steps to Reproduce:
1. create a router
2. delete its dc and svc
3. create the router again with the same name
4. oc get dc

Actual results:
desired replicas 0

Expected results:
desired replicas 2
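For what it's worth, a rough loop over the same steps to hit the alternation would look something like this (a sketch only; the sleep interval is an arbitrary guess, since I don't know whether the time between deletion and re-creation matters):

for i in 1 2 3 4; do
  oadm router tc-518936 --images=registry.access.redhat.com/openshift3/ose-haproxy-router:v3.3.0.34 --replicas=2
  # print what the freshly created DC actually asked for
  echo "desired replicas: $(oc get dc tc-518936 -o jsonpath='{.spec.replicas}')"
  oc delete dc/tc-518936 svc/tc-518936
  sleep 5
done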
Can you reproduce with `--loglevel=10`, just so we can double-check the requests that `oadm router` is sending?
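That is, the same creation command as above with verbose client logging added, e.g. something like this (redirecting to a file so the request/response bodies are captured):

oadm router tc-518936 --images=registry.access.redhat.com/openshift3/ose-haproxy-router:v3.3.0.34 --replicas=2 --loglevel=10 &> router-create.log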
Created attachment 1209705: loglevel 10 reproducer on origin

It was not as consistent on origin, but I could reproduce it (see attached) with:
> oc v1.4.0-alpha.0+8f6030a
> kubernetes v1.4.0+776c994
> features: Basic-Auth GSSAPI Kerberos SPNEGO
>
> Server https://172.18.14.117:8443
> openshift v1.4.0-alpha.0+8f6030a
> kubernetes v1.4.0+776c994
The actual requests and responses being sent to the API server look fine. Is there any way we could get the controller manager logs (at log level 4 or higher) from when this was running, to see if anything looks off in the DC controller? Also, can we get dumps of the DC and any deployments as YAML or JSON?
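For the dumps, something along these lines should be enough (assuming the DC and its deployment RCs live in the project the router was created in):

oc get dc tc-518936 -o yaml > dc.yaml
oc get rc -o yaml > deployments.yaml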
If you tell me the steps to obtain that log, I'll give it a go. But it may be more time-efficient to try reproducing this in an environment you have access to.
I can no longer reproduce this on 3.4.
I agree; we are not going to backport this to 3.3. QA: can you verify this on 3.4?
Can't reproduce this issue with the latest OCP 3.4:

openshift version
openshift v3.4.0.37+3b76456-1
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

[root@ip-172-18-11-194 ~]# oc get dc
NAME               REVISION   DESIRED   CURRENT   TRIGGERED BY
docker-registry    2          3         3         config
registry-console   1          1         1         config
tester             1          2         2         config

[root@ip-172-18-11-194 ~]# oc get po
NAME                       READY     STATUS    RESTARTS   AGE
docker-registry-2-axuso    1/1       Running   0          45m
docker-registry-2-sy6yu    1/1       Running   0          45m
docker-registry-2-v0d2q    1/1       Running   0          45m
registry-console-1-cp1i6   1/1       Running   0          44m
tester-1-hni4v             1/1       Running   0          56s
tester-1-zfpzh             1/1       Running   0          56s
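For the record, a minimal sketch of the verification steps, assuming the "tester" router above was created and re-created the same way as in the original report:

oadm router tester --replicas=2
oc delete dc/tester svc/tester
oadm router tester --replicas=2
oc get dc tester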