Bug 1508061

Summary: master-controllers panic running conformance. concurrent map iteration and map write in start_master.go
Product: OpenShift Container Platform
Reporter: Mike Fiedler <mifiedle>
Component: Master
Assignee: Michal Fojtik <mfojtik>
Status: CLOSED ERRATA
QA Contact: Mike Fiedler <mifiedle>
Severity: high
Docs Contact:
Priority: high
Version: 3.7.0
CC: aos-bugs, decarr, eparis, jokerman, mmccomas, rkant
Target Milestone: ---
Target Release: 3.7.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-12-18 13:23:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description Flags
system log none

Description Mike Fiedler 2017-10-31 19:00:33 UTC
Created attachment 1346056
system log

Description of problem:

While trying out the fix for the master-api panic in bug 1506375, I hit a different panic in master-controllers. It crashed and restarted. This was again while running the conformance test.


Full log attached - search on 'fatal error'



Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: fatal error: concurrent map iteration and map write
Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: goroutine 158 [running]:
Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: runtime.throw(0x5c74762, 0x26)
Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: /usr/lib/golang/src/runtime/panic.go:596 +0x95 fp=0xc4215394a0 sp=0xc421539480
Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: runtime.mapiternext(0xc421539650)
Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: /usr/lib/golang/src/runtime/hashmap.go:737 +0x7ee fp=0xc421539550 sp=0xc4215394a0
Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: github.com/openshift/origin/pkg/cmd/util/flags.Apply(0xc421094f90, 0xc42116a120, 0x3c22f3d, 0x53f59a0, 0xc4214e8220)
Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: /builddir/build/BUILD/atomic-openshift-git-0.45c2d34/_output/local/go/src/github.com/openshift/origin/pkg/cmd/util/flags/flags.go:17 +0x363 fp=0xc4215396c0 sp=0xc421539550
Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: github.com/openshift/origin/pkg/cmd/util/flags.Resolve(0xc421094f90, 0xc4214e8220, 0x408765, 0x0, 0xc421539768)
Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: /builddir/build/BUILD/atomic-openshift-git-0.45c2d34/_output/local/go/src/github.com/openshift/origin/pkg/cmd/util/flags/flags.go:36 +0x96 fp=0xc421539700 sp=0xc4215396c0
Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: github.com/openshift/origin/pkg/cmd/server/start.getOpenshiftControllerOptions(0xc421094f90, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x5b9c33f, 0x2)
Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: /builddir/build/BUILD/atomic-openshift-git-0.45c2d34/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/start/controllers.go:73 +0x7e fp=0xc421539778 sp=0xc421539700
Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: github.com/openshift/origin/pkg/cmd/server/start.(*Master).Start.func2(0xf2c4c00, 0xc4215f7d10, 0xf2ece00, 0xc420ad76b0, 0xc4201ce710, 0xc420748d20, 0x41, 0xc420ad6c60, 0xc4216cfe00, 0xc421083140, ...)
Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: /builddir/build/BUILD/atomic-openshift-git-0.45c2d34/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/start/start_master.go:488 +0x23d fp=0xc421539f60 sp=0xc421539778



Version-Release number of selected component (if applicable): 3.7.0-0.188.0


How reproducible: Hit it once out of 2 times running conformance.  Will clean up and see if it is reliably reproducible.


Steps to Reproduce:  see bug 1506375 for full steps

Comment 2 Michal Fojtik 2017-11-01 09:36:17 UTC
Proposed fix: https://github.com/openshift/origin/pull/17127

Comment 3 openshift-github-bot 2017-11-01 19:59:21 UTC
Commits pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/9b2f23d0536eea23544bb7fe9f322e3616624759
Bug 1508061: Fix panic when accessing controller args

https://github.com/openshift/origin/commit/7968b969cc6e33b3e3f40caee6d46f133b83c753
Merge pull request #17127 from mfojtik/fix-initialization-panic

Automatic merge from submit-queue.

Bug 1508061: Fix panic during openshift controller options initialization

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1508061

Basically `newKubeControllerManager` mutates the cmdLineArgs in parallel to `getOpenshiftControllerOptions` which is trying to read it.
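
For illustration, one general way to avoid this kind of race is to give each goroutine its own copy of the argument map before either goroutine starts. The following is a minimal Go sketch under that assumption. It is not taken from the pull request and may differ from the actual fix; `copyArgs`, the map contents, and the goroutine bodies are hypothetical names chosen for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// copyArgs (hypothetical helper) returns a deep copy of a flag-name to
// values map, so the copy can be mutated or iterated independently.
func copyArgs(in map[string][]string) map[string][]string {
	out := make(map[string][]string, len(in))
	for k, v := range in {
		out[k] = append([]string(nil), v...)
	}
	return out
}

func main() {
	// Stand-in for the shared command-line arguments.
	shared := map[string][]string{"port": {"8444"}}

	// Take both snapshots before any goroutine starts, so the shared
	// map is never read and written at the same time.
	writerArgs := copyArgs(shared)
	readerArgs := copyArgs(shared)

	var wg sync.WaitGroup
	wg.Add(2)

	// Writer: mutates only its private copy.
	go func() {
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			writerArgs[fmt.Sprintf("flag-%d", i)] = []string{"x"}
		}
	}()

	// Reader: iterates only its private copy.
	go func() {
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			for range readerArgs {
			}
		}
	}()

	wg.Wait()
	fmt.Println(len(readerArgs), len(writerArgs))
}
```

If both goroutines instead shared a single map, the Go runtime could abort with the same "concurrent map iteration and map write" fatal error seen in the log. Guarding the shared map with a `sync.RWMutex` is an alternative when copying is too expensive.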

@deads2k @sttts PTAL, I consider this a 3.7 blocker.

Comment 4 Mike Fiedler 2017-11-09 23:51:26 UTC
Verified on 3.7.5.   Ran conformance multiple times with master restarts and no panics observed.

Comment 8 errata-xmlrpc 2017-12-18 13:23:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3464