Bug 1382142 - router startup sometimes ignores scale
Summary: router startup sometimes ignores scale
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OKD
Classification: Red Hat
Component: Deployments
Version: 3.x
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Michal Fojtik
QA Contact: zhou ying
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-10-05 21:09 UTC by Aleksandar Kostadinov
Modified: 2017-05-30 12:50 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-05-30 12:50:09 UTC
Target Upstream Version:


Attachments
loglevel 10 reproducer on origin (29.81 KB, text/plain)
2016-10-12 18:42 UTC, Aleksandar Kostadinov

Description Aleksandar Kostadinov 2016-10-05 21:09:40 UTC
Description of problem:

Creating a router, deleting it, and then creating a router with the same name again ends up with 0 desired replicas about 50% of the time for me. I don't know whether the time between deletion and re-creation matters, but IMO the requested replica count should not be ignored either way.

I also don't know whether the issue is related to the errors about the already-existing service account and cluster role binding; those are tracked in bug 1381378.

See the two creation commands below. I ran them some 8-10 times and the result alternated consistently: one run the desired replicas was 0, the next it was 2, and so on.

> [root@openshift-125 ~]# oadm router tc-518936 --images=registry.access.redhat.com/openshift3/ose-haproxy-router:v3.3.0.34 --replicas=2
> info: password for stats user admin has been set to cUsztwMFRB
> --> Creating router tc-518936 ...
>     error: serviceaccounts "router" already exists
>     error: rolebinding "router-tc-518936-role" already exists
>     deploymentconfig "tc-518936" created
>     service "tc-518936" created
> --> Failed
> [root@openshift-125 ~]# oc get dc
> NAME              REVISION   DESIRED   CURRENT   TRIGGERED BY
> docker-registry   2          1         1         config
> router            4          0         0         config
> tc-518936         1          2         0         config
> [root@openshift-125 ~]# oc delete dc/tc-518936 svc/tc-518936
> deploymentconfig "tc-518936" deleted
> service "tc-518936" deleted
> [root@openshift-125 ~]# oadm router tc-518936 --images=registry.access.redhat.com/openshift3/ose-haproxy-router:v3.3.0.34 --replicas=2
> info: password for stats user admin has been set to EbtHgW5E2d
> --> Creating router tc-518936 ...
>     error: serviceaccounts "router" already exists
>     error: rolebinding "router-tc-518936-role" already exists
>     deploymentconfig "tc-518936" created
>     service "tc-518936" created
> --> Failed
> [root@openshift-125 ~]# oc get dc
> NAME              REVISION   DESIRED   CURRENT   TRIGGERED BY
> docker-registry   2          1         1         config
> router            4          0         0         config
> tc-518936         1          0         0         config
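
For reference, the desired count recorded on the deployment config itself can be read directly; a minimal sketch using jsonpath output (resource name taken from the transcript above):

> # print the spec.replicas field of the affected deployment config
> oc get dc/tc-518936 -o jsonpath='{.spec.replicas}'

If the replica count really is being dropped at creation time, this prints 0 after the second run even though --replicas=2 was passed.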


Version-Release number of selected component (if applicable):
> oc v3.3.0.34
> kubernetes v1.3.0+52492b4

How reproducible:
50%

Steps to Reproduce:
1. create a router
2. delete its dc and svc
3. create the router again with the same name
4. oc get dc

Actual results:
desired replicas 0

Expected results:
desired replicas 2

Comment 1 Solly Ross 2016-10-07 19:53:18 UTC
can you reproduce with `--loglevel=10`, just so we can double-check the requests that `oadm router` is sending?
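
A sketch of one way to capture that, reusing the command from the description (the output file name is only illustrative):

> # re-run the failing creation with verbose client logging and keep the output
> oadm router tc-518936 --images=registry.access.redhat.com/openshift3/ose-haproxy-router:v3.3.0.34 --replicas=2 --loglevel=10 &> oadm-router-loglevel10.txt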

Comment 3 Aleksandar Kostadinov 2016-10-12 18:42:42 UTC
Created attachment 1209705 [details]
loglevel 10 reproducer on origin

It was not as consistent on Origin, but I could reproduce it (see the attachment) with:

> oc v1.4.0-alpha.0+8f6030a
> kubernetes v1.4.0+776c994
> features: Basic-Auth GSSAPI Kerberos SPNEGO
> 
> Server https://172.18.14.117:8443
> openshift v1.4.0-alpha.0+8f6030a
> kubernetes v1.4.0+776c994

Comment 5 Solly Ross 2016-10-25 15:27:46 UTC
The actual requests and responses being sent to the API server look fine.  Is there any way we could get the controller manager logs (at log level at least 4) from when this was running, to see if anything looks off in the DC controller?  Also, can we get dumps of the DC and any deployments as YAML or JSON?
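
A minimal sketch of gathering those dumps (the label selector for DC-owned deployments and the systemd unit name are assumptions and may differ per install):

> # dump the deployment config and the replication controllers it produced
> oc get dc/tc-518936 -o yaml > dc-tc-518936.yaml
> oc get rc -l openshift.io/deployment-config.name=tc-518936 -o yaml > deployments-tc-518936.yaml
> # the DC controller logs are part of the master logs; on a systemd install, something like:
> journalctl -u atomic-openshift-master --no-pager > master-controllers.log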

Comment 6 Aleksandar Kostadinov 2016-10-25 15:39:59 UTC
If you tell me the steps to obtain those logs, I'll give it a go. But perhaps it would be more time-efficient to reproduce this in an environment you have access to.

Comment 11 Aleksandar Kostadinov 2016-11-22 15:23:08 UTC
I can no longer reproduce this on 3.4.

Comment 12 Michal Fojtik 2016-12-15 08:50:28 UTC
I agree; we are not going to backport this to 3.3.

QA: Can you verify this on 3.4?

Comment 13 zhou ying 2016-12-19 07:03:29 UTC
Can't reproduce this issue with the latest OCP 3.4:

# openshift version
openshift v3.4.0.37+3b76456-1
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

[root@ip-172-18-11-194 ~]# oc get dc
NAME               REVISION   DESIRED   CURRENT   TRIGGERED BY
docker-registry    2          3         3         config
registry-console   1          1         1         config
tester             1          2         2         config
[root@ip-172-18-11-194 ~]# oc get po 
NAME                       READY     STATUS    RESTARTS   AGE
docker-registry-2-axuso    1/1       Running   0          45m
docker-registry-2-sy6yu    1/1       Running   0          45m
docker-registry-2-v0d2q    1/1       Running   0          45m
registry-console-1-cp1i6   1/1       Running   0          44m
tester-1-hni4v             1/1       Running   0          56s
tester-1-zfpzh             1/1       Running   0          56s

