Bug 1388319 - master-controllers panics and crashes when setting maxScheduledImageImportsPerMinute: -1
Summary: master-controllers panics and crashes when setting maxScheduledImageImportsPe...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Master
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Paul Weil
QA Contact: Chuan Yu
URL:
Whiteboard:
Depends On:
Blocks: 1494133
Reported: 2016-10-25 06:21 UTC by Takayoshi Kimura
Modified: 2021-09-09 11:58 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
maxScheduledImageImportsPerMinute was previously documented as accepting -1 as a value to allow unlimited imports. When using -1 the cluster would experience a panic. maxScheduledImageImportsPerMinute now correctly accepts -1 as an unlimited value. Administrators who have set maxScheduledImageImportsPerMinute to an extremely high number as a workaround may leave the existing setting or now use -1.
Clone Of:
: 1494133 (view as bug list)
Environment:
Last Closed: 2017-08-10 05:15:47 UTC
Target Upstream Version:
Embargoed:


Links
System ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata RHEA-2017:1716 | 0 | normal | SHIPPED_LIVE | Red Hat OpenShift Container Platform 3.6 RPM Release Advisory | 2017-08-10 09:02:50 UTC

Description Takayoshi Kimura 2016-10-25 06:21:31 UTC
Description of problem:

On startup, master-controllers always panics and crashes with "panic: cannot find suitable quantum"

atomic-openshift-master-controllers[56139]: panic: cannot find suitable quantum for -0.01666666753590107
atomic-openshift-master-controllers[56139]: goroutine 200 [running]:
atomic-openshift-master-controllers[56139]: panic(0x36107a0, 0xc8296a9220)
atomic-openshift-master-controllers[56139]: /usr/lib/golang/src/runtime/panic.go:481 +0x3e6 fp=0xc829691a08 sp=0xc829691988
atomic-openshift-master-controllers[56139]: github.com/openshift/origin/vendor/github.com/juju/ratelimit.NewBucketWithRate(0xbf91111120000000, 0xfffffffffffffffe, 0x40)
atomic-openshift-master-controllers[56139]: /builddir/build/BUILD/atomic-openshift-git-0.d7bd9b6/_output/local/go/src/github.com/openshift/origin/vendor/github.com/juju/ratelimit/ratelimit.go:64 +0x150 fp=0xc829691a70 sp=0xc829691a08
atomic-openshift-master-controllers[56139]: github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/flowcontrol.NewTokenBucketRateLimiter(0xbc888889, 0xfffffffffffffffe, 0x0, 0x0)
atomic-openshift-master-controllers[56139]: /builddir/build/BUILD/atomic-openshift-git-0.d7bd9b6/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/flowcontrol/throttle.go:50 +0x3d fp=0xc829691aa8 sp=0xc829691a70
atomic-openshift-master-controllers[56139]: github.com/openshift/origin/pkg/cmd/server/origin.(*MasterConfig).RunImageImportController(0xc820be4000)
atomic-openshift-master-controllers[56139]: /builddir/build/BUILD/atomic-openshift-git-0.d7bd9b6/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/origin/run_components.go:401 +0xa2 fp=0xc829691b80 sp=0xc829691aa8
atomic-openshift-master-controllers[56139]: github.com/openshift/origin/pkg/cmd/server/start.startControllers(0xc820be4000, 0xc820a8b9e0, 0x0, 0x0)
atomic-openshift-master-controllers[56139]: /builddir/build/BUILD/atomic-openshift-git-0.d7bd9b6/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/start/start_master.go:697 +0x1763 fp=0xc829691ef0 sp=0xc829691b80
atomic-openshift-master-controllers[56139]: github.com/openshift/origin/pkg/cmd/server/start.(*Master).Start.func1(0xc820be4000, 0xc820a8b9e0)
atomic-openshift-master-controllers[56139]: /builddir/build/BUILD/atomic-openshift-git-0.d7bd9b6/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/start/start_master.go:415 +0x3d fp=0xc829691f70 sp=0xc829691ef0

Version-Release number of selected component (if applicable):

atomic-openshift-3.3.0.35-1.git.0.d7bd9b6.el7.x86_64

How reproducible:

Always, in the customer environment

Steps to Reproduce:
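1. Start atomic-openshift-master-controllers with maxScheduledImageImportsPerMinute: -1 set in the master configuration (the trigger identified in comment 1)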

Actual results:

master-controllers panics and crashes with "panic: cannot find suitable quantum"

Expected results:

master-controllers starts normally

Additional info:

Upstream code: https://github.com/juju/ratelimit/blob/master/ratelimit.go#L64
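
The raw argument words in the trace can be decoded to see what the rate limiter was actually given. The following is a minimal sketch that only reinterprets the hexadecimal values printed above; the helper name decodeTraceArgs is hypothetical and exists only for this illustration.

```go
package main

import (
	"fmt"
	"math"
)

// decodeTraceArgs (hypothetical helper) reinterprets the raw argument words
// printed in the panic trace:
//   rate64Bits: first word passed to ratelimit.NewBucketWithRate (a float64)
//   rate32Bits: first word passed to flowcontrol.NewTokenBucketRateLimiter (a float32)
//   burstBits:  second word of both calls (a signed integer)
func decodeTraceArgs(rate64Bits uint64, rate32Bits uint32, burstBits uint64) (float64, float32, int64) {
	return math.Float64frombits(rate64Bits),
		math.Float32frombits(rate32Bits),
		int64(burstBits) // two's-complement reinterpretation of the word
}

func main() {
	rate64, rate32, burst := decodeTraceArgs(0xbf91111120000000, 0xbc888889, 0xfffffffffffffffe)
	fmt.Println("rate (float64):", rate64)
	fmt.Println("rate (float32):", rate32)
	fmt.Println("burst:", burst)
}
```

Both the rate and the burst decode to negative numbers, and the float64 rate is the value quoted in the "cannot find suitable quantum" message.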

Comment 1 Matthew Robson 2016-10-25 19:30:06 UTC
Per the discussion, this is an issue with ImportRateLimiter: flowcontrol.NewTokenBucketRateLimiter [1] when setting maxScheduledImageImportsPerMinute: -1


imagePolicyConfig:
  disableScheduledImport: false
  maxImagesBulkImportedPerRepository: 1
  maxScheduledImageImportsPerMinute: -1
  scheduledImageImportMinimumIntervalSeconds: 60

This is based on the following line from the panic:

atomic-openshift-master-controllers[56139]: /builddir/build/BUILD/atomic-openshift-git-0.d7bd9b6/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/origin/run_components.go:401 +0xa2 fp=0xc829691b80 sp=0xc829691aa8

[1] https://github.com/openshift/origin/blob/master/pkg/cmd/server/origin/run_components.go#L408
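
The arguments in the trace are consistent with the limiter being built from the configured value as a per-second rate of value/60 and a burst of value*2. That formula is an inference from the trace, not a quotation of run_components.go, and the function names below are hypothetical. Under that assumption, a short sketch shows why -1 produces unusable inputs while a large positive value does not:

```go
package main

import "fmt"

// limiterArgs is a hypothetical reconstruction of how the per-minute setting
// might be turned into token bucket parameters. The /60 and *2 factors are
// assumptions inferred from the values seen in the panic trace.
func limiterArgs(perMinute int) (qps float32, burst int) {
	return float32(perMinute) / 60.0, perMinute * 2
}

// usable reports whether both parameters are positive, which is what a
// token bucket needs in order to compute a fill interval.
func usable(qps float32, burst int) bool {
	return qps > 0 && burst > 0
}

func main() {
	for _, v := range []int{-1, 60, 5000} {
		qps, burst := limiterArgs(v)
		fmt.Printf("perMinute=%d qps=%v burst=%d usable=%t\n", v, qps, burst, usable(qps, burst))
	}
}
```

With -1 both derived values are negative, which matches the negative rate reported in the panic message.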

Comment 2 Alexey Gladkov 2016-10-31 14:41:14 UTC
It seems this has been broken for a long time. I tested v1.2.1 [1] and it also has this error.

[1] https://github.com/openshift/origin/releases/tag/v1.2.1

Comment 3 Michal Fojtik 2016-10-31 14:44:02 UTC
Adding UpcomingRelease, as this is not a blocker (it is not a regression). We still have to fix this, but I won't block the release on it.

Comment 4 Paul Weil 2016-10-31 14:47:29 UTC
Michal - there is a workaround for this IIRC - set the max to a very high value.  If it won't be fixed (why? too risky?) we should at least get a known issues doc or kb for this.

Comment 5 Michal Fojtik 2016-10-31 14:57:24 UTC
(In reply to Paul Weil from comment #4)
> Michal - there is a workaround for this IIRC - set the max to a very high
> value.  If it won't be fixed (why? too risky?) we should at least get a
> known issues doc or kb for this.

I think you can set disableScheduledImport: true in case you want to disable the scheduled import? I think we can validate this and only allow positive values, but we will have to backport this.

Comment 6 Paul Weil 2016-10-31 15:00:27 UTC
-1 is supposed to mean unlimited (https://docs.openshift.org/latest/install_config/master_node_configuration.html).  Disabling isn't going to help in this situation.  But setting it to something like 5000 would be effectively unlimited and avoid the -1 bug.

Comment 7 Michal Fojtik 2017-02-01 12:18:48 UTC
I think the solution Paul proposed (use a very high number to get "unlimited" behavior) seems correct.

Would that be sufficient to close this bug?

Comment 8 Takayoshi Kimura 2017-02-02 00:24:31 UTC
No. According to the current doc, the -1 value should work, so at least we need to fix the doc if we won't fix the code.

Comment 9 Michal Fojtik 2017-02-02 14:17:00 UTC
(In reply to Takayoshi Kimura from comment #8)
> No. According to the current doc, the -1 value should work, so at least we
> need to fix the doc if we won't fix the code.

I would rather update the documentation. Thanks!

Comment 11 Michal Fojtik 2017-02-02 14:22:00 UTC
Docs PR: https://github.com/openshift/openshift-docs/pull/3638

Comment 12 Michal Fojtik 2017-02-20 14:16:44 UTC
(We should check for -1 in validation and reject it; only positive values should be allowed.)

Comment 13 Paul Weil 2017-03-08 21:41:33 UTC
PR: https://github.com/openshift/origin/pull/13315

Comment 14 Chuan Yu 2017-03-20 09:52:34 UTC
The doc change is OK.

To check the handling of the value -1, I used the devenv-rhel7_6073 image. With 'maxScheduledImageImportsPerMinute: -1' set, OpenShift starts successfully and no longer panics. Is this the correct result?

Comment 15 Paul Weil 2017-03-20 12:59:12 UTC
Yes, that value should disable rate limiting.
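
One way to get the behavior described here is to pick an always-accepting limiter whenever the configured value is negative, and only build a token bucket for non-negative values. The sketch below is self-contained and does not reproduce the code in the origin PR; the RateLimiter interface, both limiter types, newImportRateLimiter, the value*2 burst, and the treatment of 0 (a bucket that never accepts) are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// RateLimiter is a hypothetical minimal interface for this sketch.
type RateLimiter interface {
	TryAccept() bool
}

// unlimited accepts every request; it stands in for "no rate limiting".
type unlimited struct{}

func (unlimited) TryAccept() bool { return true }

// tokenBucket is a simple refill-on-demand token bucket.
type tokenBucket struct {
	mu        sync.Mutex
	tokens    float64
	capacity  float64
	perSecond float64
	last      time.Time
}

func (b *tokenBucket) TryAccept() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	now := time.Now()
	// Refill according to elapsed time, capped at the bucket capacity.
	b.tokens += now.Sub(b.last).Seconds() * b.perSecond
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.last = now
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// newImportRateLimiter (hypothetical) never hands a negative rate to the
// bucket: negative settings such as -1 select the unlimited limiter.
func newImportRateLimiter(perMinute int) RateLimiter {
	if perMinute < 0 {
		return unlimited{}
	}
	capacity := float64(perMinute * 2) // assumed burst factor
	return &tokenBucket{
		tokens:    capacity,
		capacity:  capacity,
		perSecond: float64(perMinute) / 60.0,
		last:      time.Now(),
	}
}

func main() {
	fmt.Println(newImportRateLimiter(-1).TryAccept()) // unlimited limiter
	limited := newImportRateLimiter(1)
	fmt.Println(limited.TryAccept(), limited.TryAccept(), limited.TryAccept())
}
```

Keeping the decision in one constructor means callers never need to know that -1 is special; they only call TryAccept.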

Comment 16 Chuan Yu 2017-03-23 06:57:10 UTC
Will this bug be fixed in OCP 3.3? If so, I will verify it with a 3.3 puddle.

Comment 17 Paul Weil 2017-03-23 11:55:02 UTC
This is only for 3.6 and will not be backported.  Current workaround is still to set a high number on previous releases.

Comment 18 Chuan Yu 2017-03-24 09:49:34 UTC
Waiting for a new 3.6 build to verify; changing the status to MODIFIED.

Comment 19 Troy Dawson 2017-04-11 21:06:03 UTC
This has been merged into OCP and is in OCP v3.6.27 or newer.

Comment 21 Chuan Yu 2017-04-12 02:49:48 UTC
Verified in OCP 3.6.27.
# openshift version
openshift v3.6.27
kubernetes v1.5.2+43a9be4
etcd 3.1.0

Comment 23 errata-xmlrpc 2017-08-10 05:15:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716

