Bug 1508061 - master-controllers panic running conformance. concurrent map iteration and map write in start_master.go
Summary: master-controllers panic running conformance. concurrent map iteration and m...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Master
Version: 3.7.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.7.0
Assignee: Michal Fojtik
QA Contact: Mike Fiedler
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-10-31 19:00 UTC by Mike Fiedler
Modified: 2021-03-11 16:09 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-18 13:23:26 UTC
Target Upstream Version:
Embargoed:


Attachments
system log (987.66 KB, application/x-gzip)
2017-10-31 19:00 UTC, Mike Fiedler


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:3464 0 normal SHIPPED_LIVE Red Hat OpenShift Container Platform 3.7 bug fix and enhancement update 2017-12-18 18:22:05 UTC

Description Mike Fiedler 2017-10-31 19:00:33 UTC
Created attachment 1346056
system log

Description of problem:

While trying out the fix for the master-api panic in bug 1506375, I hit a different panic in master-controllers. The process crashed and restarted. This again happened while running the conformance test.


Full log attached - search on 'fatal error'



Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: fatal error: concurrent map iteration and map write
Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: goroutine 158 [running]:
Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: runtime.throw(0x5c74762, 0x26)
Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: /usr/lib/golang/src/runtime/panic.go:596 +0x95 fp=0xc4215394a0 sp=0xc421539480
Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: runtime.mapiternext(0xc421539650)
Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: /usr/lib/golang/src/runtime/hashmap.go:737 +0x7ee fp=0xc421539550 sp=0xc4215394a0
Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: github.com/openshift/origin/pkg/cmd/util/flags.Apply(0xc421094f90, 0xc42116a120, 0x3c22f3d, 0x53f59a0, 0xc4214e8220)
Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: /builddir/build/BUILD/atomic-openshift-git-0.45c2d34/_output/local/go/src/github.com/openshift/origin/pkg/cmd/util/flags/flags.go:17 +0x363 fp=0xc4215396c0 sp=0xc421539550
Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: github.com/openshift/origin/pkg/cmd/util/flags.Resolve(0xc421094f90, 0xc4214e8220, 0x408765, 0x0, 0xc421539768)
Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: /builddir/build/BUILD/atomic-openshift-git-0.45c2d34/_output/local/go/src/github.com/openshift/origin/pkg/cmd/util/flags/flags.go:36 +0x96 fp=0xc421539700 sp=0xc4215396c0
Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: github.com/openshift/origin/pkg/cmd/server/start.getOpenshiftControllerOptions(0xc421094f90, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x5b9c33f, 0x2)
Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: /builddir/build/BUILD/atomic-openshift-git-0.45c2d34/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/start/controllers.go:73 +0x7e fp=0xc421539778 sp=0xc421539700
Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: github.com/openshift/origin/pkg/cmd/server/start.(*Master).Start.func2(0xf2c4c00, 0xc4215f7d10, 0xf2ece00, 0xc420ad76b0, 0xc4201ce710, 0xc420748d20, 0x41, 0xc420ad6c60, 0xc4216cfe00, 0xc421083140, ...)
Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: /builddir/build/BUILD/atomic-openshift-git-0.45c2d34/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/start/start_master.go:488 +0x23d fp=0xc421539f60 sp=0xc421539778



Version-Release number of selected component (if applicable): 3.7.0-0.188.0


How reproducible: Hit it once out of 2 times running conformance.  Will clean up and see if it is reliably reproducible.


Steps to Reproduce:  see bug 1506375 for full steps

Comment 2 Michal Fojtik 2017-11-01 09:36:17 UTC
Proposed fix: https://github.com/openshift/origin/pull/17127

Comment 3 openshift-github-bot 2017-11-01 19:59:21 UTC
Commits pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/9b2f23d0536eea23544bb7fe9f322e3616624759
Bug 1508061: Fix panic when accessing controller args

https://github.com/openshift/origin/commit/7968b969cc6e33b3e3f40caee6d46f133b83c753
Merge pull request #17127 from mfojtik/fix-initialization-panic

Automatic merge from submit-queue.

Bug 1508061: Fix panic during openshift controller options initialization

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1508061

Basically, `newKubeControllerManager` mutates the cmdLineArgs map while `getOpenshiftControllerOptions` is concurrently trying to read it.

@deads2k @sttts PTAL, I consider this a 3.7 blocker.
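
For readers unfamiliar with this class of failure: the Go runtime aborts the process with "concurrent map iteration and map write" when one goroutine ranges over a map while another writes to it. The following is a minimal sketch of one common way to avoid that, namely giving the reader a private copy taken before the writer starts. It is not taken from the PR and does not claim to show what the actual fix does; the `cloneArgs` helper and the `map[string][]string` shape of the arguments are assumptions made purely for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// cloneArgs is a hypothetical helper (not from openshift/origin). It returns
// a deep copy of a flag-style argument map so a reader can iterate its own
// copy while another goroutine keeps mutating the original.
func cloneArgs(in map[string][]string) map[string][]string {
	out := make(map[string][]string, len(in))
	for k, v := range in {
		// Copy the slice too, so later element writes are not shared.
		out[k] = append([]string(nil), v...)
	}
	return out
}

func main() {
	args := map[string][]string{"port": {"8444"}}

	// Take the snapshot before any other goroutine starts writing.
	snapshot := cloneArgs(args)

	var wg sync.WaitGroup
	wg.Add(2)

	// Writer: stands in for a component that adjusts the shared arguments.
	go func() {
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			args[fmt.Sprintf("extra-%d", i)] = []string{"true"}
		}
	}()

	// Reader: iterates only its private snapshot, never the map being written.
	go func() {
		defer wg.Done()
		n := 0
		for range snapshot {
			n++
		}
		fmt.Println("reader saw", n, "argument(s)")
	}()

	wg.Wait()
	fmt.Println("writer left", len(args), "argument(s)")
}
```

If the reader ranged over `args` directly instead of `snapshot`, this program could fail with the same fatal error shown in the description. Guarding every access with a `sync.RWMutex`, or simply finishing all mutation before starting the reader, are alternative designs; running with `go run -race` helps spot such cases.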

Comment 4 Mike Fiedler 2017-11-09 23:51:26 UTC
Verified on 3.7.5.   Ran conformance multiple times with master restarts and no panics observed.

Comment 8 errata-xmlrpc 2017-12-18 13:23:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3464

