Bug 1388319 - master-controllers panics and crashes when setting maxScheduledImageImportsPerMinute: -1
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Master
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Assigned To: Paul Weil
QA Contact: Chuan Yu
Depends On:
Blocks: 1494133
Reported: 2016-10-25 02:21 EDT by Takayoshi Kimura
Modified: 2017-09-21 10:08 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
maxScheduledImageImportsPerMinute was previously documented as accepting -1 as a value to allow unlimited imports. When using -1 the cluster would experience a panic. maxScheduledImageImportsPerMinute now correctly accepts -1 as an unlimited value. Administrators who have set maxScheduledImageImportsPerMinute to an extremely high number as a workaround may leave the existing setting or now use -1.
Story Points: ---
Clone Of:
: 1494133 (view as bug list)
Environment:
Last Closed: 2017-08-10 01:15:47 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker: Red Hat Product Errata
ID: RHEA-2017:1716
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat OpenShift Container Platform 3.6 RPM Release Advisory
Last Updated: 2017-08-10 05:02:50 EDT

Description Takayoshi Kimura 2016-10-25 02:21:31 EDT
Description of problem:

On startup, master-controllers always panics and crashes with "panic: cannot find suitable quantum"

atomic-openshift-master-controllers[56139]: panic: cannot find suitable quantum for -0.01666666753590107
atomic-openshift-master-controllers[56139]: goroutine 200 [running]:
atomic-openshift-master-controllers[56139]: panic(0x36107a0, 0xc8296a9220)
atomic-openshift-master-controllers[56139]: /usr/lib/golang/src/runtime/panic.go:481 +0x3e6 fp=0xc829691a08 sp=0xc829691988
atomic-openshift-master-controllers[56139]: github.com/openshift/origin/vendor/github.com/juju/ratelimit.NewBucketWithRate(0xbf91111120000000, 0xfffffffffffffffe, 0x40)
atomic-openshift-master-controllers[56139]: /builddir/build/BUILD/atomic-openshift-git-0.d7bd9b6/_output/local/go/src/github.com/openshift/origin/vendor/github.com/juju/ratelimit/ratelimit.go:64 +0x150 fp=0xc829691a70 sp=0xc829691a08
atomic-openshift-master-controllers[56139]: github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/flowcontrol.NewTokenBucketRateLimiter(0xbc888889, 0xfffffffffffffffe, 0x0, 0x0)
atomic-openshift-master-controllers[56139]: /builddir/build/BUILD/atomic-openshift-git-0.d7bd9b6/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/util/flowcontrol/throttle.go:50 +0x3d fp=0xc829691aa8 sp=0xc829691a70
atomic-openshift-master-controllers[56139]: github.com/openshift/origin/pkg/cmd/server/origin.(*MasterConfig).RunImageImportController(0xc820be4000)
atomic-openshift-master-controllers[56139]: /builddir/build/BUILD/atomic-openshift-git-0.d7bd9b6/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/origin/run_components.go:401 +0xa2 fp=0xc829691b80 sp=0xc829691aa8
atomic-openshift-master-controllers[56139]: github.com/openshift/origin/pkg/cmd/server/start.startControllers(0xc820be4000, 0xc820a8b9e0, 0x0, 0x0)
atomic-openshift-master-controllers[56139]: /builddir/build/BUILD/atomic-openshift-git-0.d7bd9b6/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/start/start_master.go:697 +0x1763 fp=0xc829691ef0 sp=0xc829691b80
atomic-openshift-master-controllers[56139]: github.com/openshift/origin/pkg/cmd/server/start.(*Master).Start.func1(0xc820be4000, 0xc820a8b9e0)
atomic-openshift-master-controllers[56139]: /builddir/build/BUILD/atomic-openshift-git-0.d7bd9b6/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/start/start_master.go:415 +0x3d fp=0xc829691f70 sp=0xc829691ef0

Version-Release number of selected component (if applicable):

atomic-openshift-3.3.0.35-1.git.0.d7bd9b6.el7.x86_64

How reproducible:

Always in customer env

Steps to Reproduce:
1. Start atomic-openshift-master-controllers

Actual results:

master-controllers panics and crashes with "panic: cannot find suitable quantum"

Expected results:

Boot normally

Additional info:

Upstream code: https://github.com/juju/ratelimit/blob/master/ratelimit.go#L64
Comment 1 Matthew Robson 2016-10-25 15:30:06 EDT
Per the discussion, this is an issue with the ImportRateLimiter, which is created with flowcontrol.NewTokenBucketRateLimiter [1], when maxScheduledImageImportsPerMinute is set to -1:


imagePolicyConfig:
  disableScheduledImport: false
  maxImagesBulkImportedPerRepository: 1
  maxScheduledImageImportsPerMinute: -1
  scheduledImageImportMinimumIntervalSeconds: 60

Based on this line from the panic:

atomic-openshift-master-controllers[56139]: /builddir/build/BUILD/atomic-openshift-git-0.d7bd9b6/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/origin/run_components.go:401 +0xa2 fp=0xc829691b80 sp=0xc829691aa8

[1] https://github.com/openshift/origin/blob/master/pkg/cmd/server/origin/run_components.go#L408
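
To illustrate how this setting could turn into the negative rate seen in the panic, here is a rough sketch. It is not the origin code at run_components.go; the helper limiterArgs is hypothetical, and both the division by 60 and the doubling for the burst are inferred from the argument values in the stack trace rather than taken from the source.

```go
package main

import "fmt"

// limiterArgs is a hypothetical helper, not the actual origin code. It mirrors
// what the stack trace suggests: the per-minute setting is divided by 60 to
// get a per-second rate, and the burst is twice the setting.
func limiterArgs(maxPerMinute int) (qps float32, burst int) {
	return float32(maxPerMinute) / 60, maxPerMinute * 2
}

func main() {
	for _, perMinute := range []int{-1, 60, 5000} {
		qps, burst := limiterArgs(perMinute)
		// A token bucket needs a positive rate and capacity to be usable.
		usable := qps > 0 && burst > 0
		fmt.Printf("maxScheduledImageImportsPerMinute=%d -> qps=%v burst=%d usable=%t\n",
			perMinute, qps, burst, usable)
	}
}
```

With -1 the sketch produces a negative rate and a negative burst, neither of which a token bucket can work with, while any positive setting produces usable values.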
Comment 2 Alexey Gladkov 2016-10-31 10:41:14 EDT
It seems this has been broken for a long time. I tested v1.2.1 [1] and it has the same error.

[1] https://github.com/openshift/origin/releases/tag/v1.2.1
Comment 3 Michal Fojtik 2016-10-31 10:44:02 EDT
Adding UpcomingRelease as this is not a blocker (it is not a regression). We still have to fix this, but I won't block the release on it.
Comment 4 Paul Weil 2016-10-31 10:47:29 EDT
Michal - there is a workaround for this IIRC - set the max to a very high value.  If it won't be fixed (why? too risky?) we should at least get a known issues doc or kb for this.
Comment 5 Michal Fojtik 2016-10-31 10:57:24 EDT
(In reply to Paul Weil from comment #4)
> Michal - there is a workaround for this IIRC - set the max to a very high
> value.  If it won't be fixed (why? too risky?) we should at least get a
> known issues doc or kb for this.

I think you can set disableScheduledImport: true in case you want to disable the scheduled import? I think we can validate this and only allow positive values, but we will have to backport this.
Comment 6 Paul Weil 2016-10-31 11:00:27 EDT
-1 is supposed to mean unlimited (https://docs.openshift.org/latest/install_config/master_node_configuration.html).  Disabling isn't going to help in this situation.  But setting it to something like 5000 would effectively give unlimited and avoid the -1 bug.
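
For reference, the workaround applied to the configuration fragment from comment 1 might look like the following. Only maxScheduledImageImportsPerMinute is changed; the other values are copied from that fragment, and 5000 is just the example figure mentioned above, not a recommended setting.

```yaml
imagePolicyConfig:
  disableScheduledImport: false
  maxImagesBulkImportedPerRepository: 1
  # Workaround on affected releases: a large positive value instead of -1.
  maxScheduledImageImportsPerMinute: 5000
  scheduledImageImportMinimumIntervalSeconds: 60
```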
Comment 7 Michal Fojtik 2017-02-01 07:18:48 EST
I think the solution Paul proposed (use a very high number to get "unlimited" behavior) seems correct.

Would that be sufficient to close this bug?
Comment 8 Takayoshi Kimura 2017-02-01 19:24:31 EST
No. According to the current doc, the -1 value should work, so at least we need to fix the doc if we won't fix the code.
Comment 9 Michal Fojtik 2017-02-02 09:17:00 EST
(In reply to Takayoshi Kimura from comment #8)
> No according to the current doc, the -1 value should work, so at least we
> need to fix the doc if we won't fix the code.

I would rather update the documentation. Thanks!
Comment 11 Michal Fojtik 2017-02-02 09:22:00 EST
Docs PR: https://github.com/openshift/openshift-docs/pull/3638
Comment 12 Michal Fojtik 2017-02-20 09:16:44 EST
(We should check for -1 in validation and reject it; only positive values should be allowed.)
Comment 13 Paul Weil 2017-03-08 16:41:33 EST
PR: https://github.com/openshift/origin/pull/13315
Comment 14 Chuan Yu 2017-03-20 05:52:34 EDT
The doc change is OK.

To check the handling of -1, I used the devenv-rhel7_6073 image. With 'maxScheduledImageImportsPerMinute: -1' set, openshift starts successfully and no longer panics. Is this the correct result?
Comment 15 Paul Weil 2017-03-20 08:59:12 EDT
Yes, that value should disable rate limiting.
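
As an illustration of the behavior described here, the sketch below treats -1 as "no rate limiting" and builds a simple token bucket for positive values. It is not the code from the PR. The names importLimiter, unlimited, bucket, and newImportLimiter are hypothetical, and rejecting 0 and values below -1 is an assumption of this sketch rather than something stated in this bug.

```go
package main

import (
	"fmt"
	"time"
)

// importLimiter is a hypothetical interface used only in this sketch.
type importLimiter interface {
	TryAccept() bool
}

// unlimited accepts every request; it stands in for "rate limiting disabled".
type unlimited struct{}

func (unlimited) TryAccept() bool { return true }

// bucket is a minimal token bucket. It is not safe for concurrent use.
type bucket struct {
	tokens, capacity, perSec float64
	last                     time.Time
}

func (b *bucket) TryAccept() bool {
	now := time.Now()
	// Refill according to elapsed time, capped at the bucket capacity.
	b.tokens += now.Sub(b.last).Seconds() * b.perSec
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.last = now
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// newImportLimiter maps the per-minute setting to a limiter: -1 disables
// limiting, positive values get a token bucket, anything else is rejected
// (the rejection rule is an assumption of this sketch).
func newImportLimiter(maxPerMinute int) (importLimiter, error) {
	switch {
	case maxPerMinute == -1:
		return unlimited{}, nil
	case maxPerMinute <= 0:
		return nil, fmt.Errorf("maxScheduledImageImportsPerMinute must be positive or -1, got %d", maxPerMinute)
	}
	capacity := float64(maxPerMinute)
	return &bucket{tokens: capacity, capacity: capacity, perSec: capacity / 60, last: time.Now()}, nil
}

func main() {
	for _, v := range []int{-1, 2, 0} {
		l, err := newImportLimiter(v)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Printf("setting %d: first request accepted = %t\n", v, l.TryAccept())
	}
}
```

The point of the design is that the negative value never reaches the token bucket arithmetic, so the "cannot find suitable quantum" condition cannot arise from -1.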
Comment 16 Chuan Yu 2017-03-23 02:57:10 EDT
Will this bug be fixed in OCP 3.3? If so, I will verify it with a 3.3 puddle.
Comment 17 Paul Weil 2017-03-23 07:55:02 EDT
This is only for 3.6 and will not be backported.  Current workaround is still to set a high number on previous releases.
Comment 18 Chuan Yu 2017-03-24 05:49:34 EDT
Waiting for a new 3.6 build to verify; changing the status to MODIFIED.
Comment 19 Troy Dawson 2017-04-11 17:06:03 EDT
This has been merged into ocp and is in OCP v3.6.27 or newer.
Comment 21 Chuan Yu 2017-04-11 22:49:48 EDT
Verified in OCP 3.6.27.
# openshift version
openshift v3.6.27
kubernetes v1.5.2+43a9be4
etcd 3.1.0
Comment 23 errata-xmlrpc 2017-08-10 01:15:47 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716
