Created attachment 1346056
system log

Description of problem:

While trying the fix for the master-api panic in bug 1506375, hit a different panic in master-controllers. It crashed and restarted. This was again running the conformance test. Full log attached; search for 'fatal error'.

Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: fatal error: concurrent map iteration and map write
Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: goroutine 158 [running]:
Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: runtime.throw(0x5c74762, 0x26)
Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: /usr/lib/golang/src/runtime/panic.go:596 +0x95 fp=0xc4215394a0 sp=0xc421539480
Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: runtime.mapiternext(0xc421539650)
Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: /usr/lib/golang/src/runtime/hashmap.go:737 +0x7ee fp=0xc421539550 sp=0xc4215394a0
Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: github.com/openshift/origin/pkg/cmd/util/flags.Apply(0xc421094f90, 0xc42116a120, 0x3c22f3d, 0x53f59a0, 0xc4214e8220)
Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: /builddir/build/BUILD/atomic-openshift-git-0.45c2d34/_output/local/go/src/github.com/openshift/origin/pkg/cmd/util/flags/flags.go:17 +0x363 fp=0xc4215396c0 sp=0xc421539550
Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: github.com/openshift/origin/pkg/cmd/util/flags.Resolve(0xc421094f90, 0xc4214e8220, 0x408765, 0x0, 0xc421539768)
Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: /builddir/build/BUILD/atomic-openshift-git-0.45c2d34/_output/local/go/src/github.com/openshift/origin/pkg/cmd/util/flags/flags.go:36 +0x96 fp=0xc421539700 sp=0xc4215396c0
Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: github.com/openshift/origin/pkg/cmd/server/start.getOpenshiftControllerOptions(0xc421094f90, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x5b9c33f, 0x2)
Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: /builddir/build/BUILD/atomic-openshift-git-0.45c2d34/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/start/controllers.go:73 +0x7e fp=0xc421539778 sp=0xc421539700
Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: github.com/openshift/origin/pkg/cmd/server/start.(*Master).Start.func2(0xf2c4c00, 0xc4215f7d10, 0xf2ece00, 0xc420ad76b0, 0xc4201ce710, 0xc420748d20, 0x41, 0xc420ad6c60, 0xc4216cfe00, 0xc421083140, ...)
Oct 31 14:08:18 ip-172-31-44-71.us-west-2.compute.internal atomic-openshift-master-controllers[10071]: /builddir/build/BUILD/atomic-openshift-git-0.45c2d34/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/start/start_master.go:488 +0x23d fp=0xc421539f60 sp=0xc421539778

Version-Release number of selected component (if applicable):
3.7.0-0.188.0

How reproducible:
Hit it once out of 2 times running conformance. Will clean up and see if it is reliably reproducible.

Steps to Reproduce:
See bug 1506375 for full steps.
Proposed fix: https://github.com/openshift/origin/pull/17127
Commits pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/9b2f23d0536eea23544bb7fe9f322e3616624759
Bug 1508061: Fix panic when accessing controller args

https://github.com/openshift/origin/commit/7968b969cc6e33b3e3f40caee6d46f133b83c753
Merge pull request #17127 from mfojtik/fix-initialization-panic

Automatic merge from submit-queue.

Bug 1508061: Fix panic during openshift controller options initialization

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1508061

Basically `newKubeControllerManager` mutates the cmdLineArgs in parallel to `getOpenshiftControllerOptions`, which is trying to read it.

@deads2k @sttts PTAL, I consider this a 3.7 blocker.
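For readers unfamiliar with this class of failure: the Go runtime aborts the process with "concurrent map iteration and map write" when one goroutine ranges over a map while another writes to it, and that fatal error cannot be recovered. The sketch below shows one general way to avoid it, by guarding the map with a sync.RWMutex and handing readers a deep copy. It is not the change made in pull request #17127, which is not reproduced in this report; the guardedArgs type and its methods are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// guardedArgs is a hypothetical wrapper around a map of command-line
// arguments (flag name -> values). It is NOT a type from openshift/origin.
type guardedArgs struct {
	mu   sync.RWMutex
	args map[string][]string
}

func newGuardedArgs() *guardedArgs {
	return &guardedArgs{args: map[string][]string{}}
}

// Set stores a copy of values under the exclusive lock.
func (g *guardedArgs) Set(name string, values ...string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.args[name] = append([]string(nil), values...)
}

// Snapshot returns a deep copy taken under the read lock, so the caller
// can iterate over it while other goroutines keep calling Set.
func (g *guardedArgs) Snapshot() map[string][]string {
	g.mu.RLock()
	defer g.mu.RUnlock()
	out := make(map[string][]string, len(g.args))
	for k, v := range g.args {
		out[k] = append([]string(nil), v...)
	}
	return out
}

func main() {
	g := newGuardedArgs()
	g.Set("port", "10252")

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // writer: plays the role of the code mutating the args
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			g.Set(fmt.Sprintf("flag-%d", i), "true")
		}
	}()
	go func() { // reader: plays the role of the code iterating the args
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			for range g.Snapshot() {
			}
		}
	}()
	wg.Wait()

	fmt.Println(len(g.Snapshot())) // prints 1001
}
```

Running a program like this with `go run -race` is a cheap way to confirm that no unsynchronized access remains. Copying on read costs an allocation per snapshot, which is usually acceptable for startup-time option handling but would be a poor fit for a hot path.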
Verified on 3.7.5. Ran conformance multiple times with master restarts; no panics observed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:3464