Bug 1388319

Summary:	master-controllers panics and crashes when setting maxScheduledImageImportsPerMinute: -1
Product:	OpenShift Container Platform	Reporter:	Takayoshi Kimura <tkimura>
Component:	Master	Assignee:	Paul Weil <pweil>
Status:	CLOSED ERRATA	QA Contact:	Chuan Yu <chuyu>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	3.3.0	CC:	aos-bugs, bvincell, jokerman, mfojtik, mmccomas, mrobson, pweil, tkimura
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	maxScheduledImageImportsPerMinute was previously documented as accepting -1 as a value to allow unlimited imports. When using -1 the cluster would experience a panic. maxScheduledImageImportsPerMinute now correctly accepts -1 as an unlimited value. Administrators who have set maxScheduledImageImportsPerMinute to an extremely high number as a workaround may leave the existing setting or now use -1.	Story Points:	---
Clone Of:
Clones:	1494133		Environment:
Last Closed:	2017-08-10 05:15:47 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1494133

Description Takayoshi Kimura 2016-10-25 06:21:31 UTC

Description of problem:

On startup, master-controllers always panics and crashes with "panic: cannot find suitable quantum"

atomic-openshift-master-controllers[56139]: panic: cannot find suitable quantum for -0.01666666753590107
atomic-openshift-master-controllers[56139]: goroutine 200 [running]:
atomic-openshift-master-controllers[56139]: panic(0x36107a0, 0xc8296a9220)
atomic-openshift-master-controllers[56139]: /usr/lib/golang/src/runtime/panic.go:481 +0x3e6 fp=0xc829691a08 sp=0xc829691988
atomic-openshift-master-controllers[56139]: github.com/openshift/origin/vendor/github.com/juju/ratelimit.NewBucketWithRate(0xbf91111120000000, 0xfffffffffffffffe, 0x40)
atomic-openshift-master-controllers[56139]: /builddir/build/BUILD/atomic-openshift-git-0.d7bd9b6/_output/local/go/src/github.com/openshift/origin/vendor/github.com/juju/ratelimit/ratelimit.go:64 +0x150 fp=0xc829691a70 sp=0xc829691a08
atomic-openshift-master-controllers[56139]: github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/flowcontrol.NewTokenBucketRateLimiter(0xbc888889, 0xfffffffffffffffe, 0x0, 0x0)
atomic-openshift-master-controllers[56139]: /builddir/build/BUILD/atomic-openshift-git-0.d7bd9b6/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/flowcontrol/throttle.go:50 +0x3d fp=0xc829691aa8 sp=0xc829691a70
atomic-openshift-master-controllers[56139]: github.com/openshift/origin/pkg/cmd/server/origin.(*MasterConfig).RunImageImportController(0xc820be4000)
atomic-openshift-master-controllers[56139]: /builddir/build/BUILD/atomic-openshift-git-0.d7bd9b6/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/origin/run_components.go:401 +0xa2 fp=0xc829691b80 sp=0xc829691aa8
atomic-openshift-master-controllers[56139]: github.com/openshift/origin/pkg/cmd/server/start.startControllers(0xc820be4000, 0xc820a8b9e0, 0x0, 0x0)
atomic-openshift-master-controllers[56139]: /builddir/build/BUILD/atomic-openshift-git-0.d7bd9b6/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/start/start_master.go:697 +0x1763 fp=0xc829691ef0 sp=0xc829691b80
atomic-openshift-master-controllers[56139]: github.com/openshift/origin/pkg/cmd/server/start.(*Master).Start.func1(0xc820be4000, 0xc820a8b9e0)
atomic-openshift-master-controllers[56139]: /builddir/build/BUILD/atomic-openshift-git-0.d7bd9b6/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/start/start_master.go:415 +0x3d fp=0xc829691f70 sp=0xc829691ef0

Version-Release number of selected component (if applicable):

atomic-openshift-3.3.0.35-1.git.0.d7bd9b6.el7.x86_64

How reproducible:

Always, in the customer environment

Steps to Reproduce:
1. Start atomic-openshift-master-controllers

Actual results:

master-controllers panics and crashes with "panic: cannot find suitable quantum"

Expected results:

master-controllers starts normally

Additional info:

Upstream code: https://github.com/juju/ratelimit/blob/master/ratelimit.go#L64
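
To illustrate why a negative rate could end in this panic, here is a minimal sketch of a quantum search. It is an assumption about the general shape of such a search, not the juju/ratelimit code itself: findQuantum and nextQuantum are hypothetical names, and the sketch returns an error where the real library panics.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// findQuantum is a hypothetical, simplified stand-in for a token bucket
// constructor. It searches for a (fill interval, quantum) pair that matches
// the requested rate in tokens per second.
func findQuantum(rate float64) (time.Duration, int64, error) {
	for quantum := int64(1); quantum < 1<<50; quantum = nextQuantum(quantum) {
		fillInterval := time.Duration(1e9 * float64(quantum) / rate)
		if fillInterval <= 0 {
			// A negative rate always lands here, so no candidate is accepted.
			continue
		}
		return fillInterval, quantum, nil
	}
	return 0, 0, fmt.Errorf("cannot find suitable quantum for %s",
		strconv.FormatFloat(rate, 'g', -1, 64))
}

// nextQuantum grows the candidate quantum by roughly 10% per step.
func nextQuantum(q int64) int64 {
	next := q * 11 / 10
	if next == q {
		next++
	}
	return next
}

func main() {
	// The rate reported in the panic message: every candidate is skipped.
	_, _, err := findQuantum(-0.01666666753590107)
	fmt.Println("negative rate:", err)

	// A positive rate finds a candidate on the first iteration.
	interval, quantum, err := findQuantum(5000.0 / 60.0)
	fmt.Println("positive rate:", interval, quantum, err)
}
```

With a negative rate every computed fill interval is non-positive, so the loop exhausts its candidates and reports the failure.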

Comment 1 Matthew Robson 2016-10-25 19:30:06 UTC

Per the discussion, this is an issue with the ImportRateLimiter created by flowcontrol.NewTokenBucketRateLimiter [1] when setting maxScheduledImageImportsPerMinute: -1


imagePolicyConfig:
  disableScheduledImport: false
  maxImagesBulkImportedPerRepository: 1
  maxScheduledImageImportsPerMinute: -1
  scheduledImageImportMinimumIntervalSeconds: 60

This is based on the following line from the panic:

atomic-openshift-master-controllers[56139]: /builddir/build/BUILD/atomic-openshift-git-0.d7bd9b6/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/origin/run_components.go:401 +0xa2 fp=0xc829691b80 sp=0xc829691aa8

[1] https://github.com/openshift/origin/blob/master/pkg/cmd/server/origin/run_components.go#L408

Comment 2 Alexey Gladkov 2016-10-31 14:41:14 UTC

It seems this has been broken for a long time. I tested v1.2.1 [1] and it contains the same error.

[1] https://github.com/openshift/origin/releases/tag/v1.2.1

Comment 3 Michal Fojtik 2016-10-31 14:44:02 UTC

Adding UpcomingRelease as this is not a blocker (it is not a regression). We still have to fix this, but I won't block the release on it.

Comment 4 Paul Weil 2016-10-31 14:47:29 UTC

Michal - there is a workaround for this IIRC - set the max to a very high value.  If it won't be fixed (why? too risky?) we should at least get a known issues doc or kb for this.

Comment 5 Michal Fojtik 2016-10-31 14:57:24 UTC

(In reply to Paul Weil from comment #4)
> Michal - there is a workaround for this IIRC - set the max to a very high
> value.  If it won't be fixed (why? too risky?) we should at least get a
> known issues doc or kb for this.

I think you can set disableScheduledImport: true in case you want to disable the scheduled import? I think we can validate this and only allow positive values, but we will have to backport this.

Comment 6 Paul Weil 2016-10-31 15:00:27 UTC

-1 is supposed to mean unlimited (https://docs.openshift.org/latest/install_config/master_node_configuration.html).  Disabling isn't going to help in this situation.  But setting it to something like 5000 would effectively give unlimited and avoid the -1 bug.

Comment 7 Michal Fojtik 2017-02-01 12:18:48 UTC

I think the solution Paul proposed (use a very high number to get "unlimited" behavior) seems correct.

Would that be sufficient to close this bug?

Comment 8 Takayoshi Kimura 2017-02-02 00:24:31 UTC

No. According to the current doc, the -1 value should work, so at least we need to fix the doc if we won't fix the code.

Comment 9 Michal Fojtik 2017-02-02 14:17:00 UTC

(In reply to Takayoshi Kimura from comment #8)
> No. According to the current doc, the -1 value should work, so at least we
> need to fix the doc if we won't fix the code.

I would rather update the documentation. Thanks!

Comment 11 Michal Fojtik 2017-02-02 14:22:00 UTC

Docs PR: https://github.com/openshift/openshift-docs/pull/3638

Comment 12 Michal Fojtik 2017-02-20 14:16:44 UTC

(We should check for -1 in validation and refuse it; only positive values should be allowed.)
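
A minimal sketch of the validation proposed in this comment, assuming a plain integer field. validateImportsPerMinute is a hypothetical name and is not taken from the OpenShift validation code. Note that later comments in this bug show -1 being accepted instead.

```go
package main

import "fmt"

// validateImportsPerMinute is a hypothetical validator that accepts only
// positive values, as proposed in the comment above.
func validateImportsPerMinute(v int) error {
	if v <= 0 {
		return fmt.Errorf("maxScheduledImageImportsPerMinute: invalid value %d: must be greater than 0", v)
	}
	return nil
}

func main() {
	for _, v := range []int{-1, 0, 60, 5000} {
		fmt.Printf("%d -> %v\n", v, validateImportsPerMinute(v))
	}
}
```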

Comment 13 Paul Weil 2017-03-08 21:41:33 UTC

PR: https://github.com/openshift/origin/pull/13315

Comment 14 Chuan Yu 2017-03-20 09:52:34 UTC

The doc change is OK.

To validate the value -1, I used the devenv-rhel7_6073 image. With 'maxScheduledImageImportsPerMinute: -1' set, OpenShift starts successfully and no longer panics. Is this the correct result?

Comment 15 Paul Weil 2017-03-20 12:59:12 UTC

Yes, that value should disable rate limiting.
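
One way to get this behavior is to skip the token bucket entirely when the setting is negative. The sketch below is an illustration under that assumption, not the change made in the PR. The Limiter interface, the unlimited and tokenBucket types, and newImportLimiter are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Limiter is a minimal, hypothetical interface for this sketch.
type Limiter interface {
	TryAccept() bool
}

// unlimited never throttles.
type unlimited struct{}

func (unlimited) TryAccept() bool { return true }

// tokenBucket is a small illustrative token bucket.
type tokenBucket struct {
	mu     sync.Mutex
	tokens float64
	burst  float64
	perSec float64
	last   time.Time
}

func (b *tokenBucket) TryAccept() bool {
	b.mu.Lock()
	defer b.mu.Unlock()

	// Refill according to elapsed time, capped at the burst size.
	now := time.Now()
	b.tokens += now.Sub(b.last).Seconds() * b.perSec
	if b.tokens > b.burst {
		b.tokens = b.burst
	}
	b.last = now

	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// newImportLimiter maps a per-minute setting to a limiter. Treating any
// negative value as "unlimited" is an assumption of this sketch.
func newImportLimiter(perMinute int) Limiter {
	if perMinute < 0 {
		return unlimited{}
	}
	return &tokenBucket{
		tokens: float64(2 * perMinute),
		burst:  float64(2 * perMinute),
		perSec: float64(perMinute) / 60,
		last:   time.Now(),
	}
}

func main() {
	l := newImportLimiter(-1)
	accepted := 0
	for i := 0; i < 1000; i++ {
		if l.TryAccept() {
			accepted++
		}
	}
	fmt.Println("accepted with -1:", accepted)
}
```

Because the negative value never reaches the token bucket, no negative rate or burst is ever computed.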

Comment 16 Chuan Yu 2017-03-23 06:57:10 UTC

Will this bug be fixed in OCP 3.3? If so, I will verify it with a 3.3 puddle.

Comment 17 Paul Weil 2017-03-23 11:55:02 UTC

This is only for 3.6 and will not be backported.  Current workaround is still to set a high number on previous releases.

Comment 18 Chuan Yu 2017-03-24 09:49:34 UTC

Waiting for a new 3.6 build to verify; changing the status to MODIFIED.

Comment 19 Troy Dawson 2017-04-11 21:06:03 UTC

This has been merged into ocp and is in OCP v3.6.27 or newer.

Comment 21 Chuan Yu 2017-04-12 02:49:48 UTC

Verified in OCP 3.6.27.
# openshift version
openshift v3.6.27
kubernetes v1.5.2+43a9be4
etcd 3.1.0

Comment 23 errata-xmlrpc 2017-08-10 05:15:47 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716