Bug 1525799

Summary: master-controllers panic and crash repeatedly with "fatal error: concurrent map writes"
Product: OpenShift Container Platform Reporter: Takayoshi Kimura <tkimura>
Component: Master Assignee: Dan Mace <dmace>
Status: CLOSED ERRATA QA Contact: Wang Haoran <haowang>
Severity: high Docs Contact:
Priority: high    
Version: 3.6.0 CC: aos-bugs, byount, chrkim, decarr, jokerman, mfojtik, mmccomas
Target Milestone: ---   
Target Release: 3.6.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-01-23 17:59:00 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1519277    
Bug Blocks:    

Description Takayoshi Kimura 2017-12-14 05:03:33 UTC
Description of problem:

master-controllers panics and crashes repeatedly with "fatal error: concurrent map writes" while processing StatefulSets:

atomic-openshift-master-controllers[122375]: I1214 02:36:04.316381  122375 stateful_set.go:420] Syncing StatefulSet myproject/mypet with 5 pods
atomic-openshift-master-controllers[122375]: I1214 02:36:04.316442  122375 stateful_set_control.go:147] StatefulSet mypet is waiting for Pod mypet-1 to be Running and Ready
atomic-openshift-master-controllers[122375]: I1214 02:36:04.316448  122375 stateful_set.go:425] Succesfully synced StatefulSet myproject/mypet successful
atomic-openshift-master-controllers[122375]: fatal error: concurrent map writes
atomic-openshift-master-controllers[122375]: goroutine 2536 [running]:
atomic-openshift-master-controllers[122375]: runtime.throw(0x511810e, 0x15)
atomic-openshift-master-controllers[122375]: /usr/lib/golang/src/runtime/panic.go:566 +0x95 fp=0xc433a3f100 sp=0xc433a3f0e0
atomic-openshift-master-controllers[122375]: runtime.mapassign1(0x4850280, 0xc4311f2720, 0xc433a3f2e0, 0xc433a3f2d0)
atomic-openshift-master-controllers[122375]: /usr/lib/golang/src/runtime/hashmap.go:458 +0x8ef fp=0xc433a3f1e8 sp=0xc433a3f100
atomic-openshift-master-controllers[122375]: github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/api/v1.Convert_v1_Pod_To_api_Pod(0xc42d6fa900, 0xc43828db00, 0x0, 0x0, 0x7000100, 0x0)

Version-Release number of selected component (if applicable):

3.6 upgraded from 3.5

How reproducible:

Always in the customer environment; the process crashes every 2 minutes.

Steps to Reproduce:
1.
2.
3.

Actual results:

The process crashes and generates core dumps, which fill the disk.

Expected results:

no crash

Additional info:

Comment 14 Michal Fojtik 2017-12-14 10:25:24 UTC
Dan, Tomas found: https://github.com/kubernetes/kubernetes/pull/52092

Any chance we can backport that to 3.6? (if we have not already done so)

Comment 15 Dan Mace 2017-12-14 14:02:49 UTC
(In reply to Michal Fojtik from comment #14)
> Dan, Tomas found: https://github.com/kubernetes/kubernetes/pull/52092
> 
> Any chance we can backport that to 3.6? (if we have not already done so)

Good news: the 3.6 backport is already under way. See https://bugzilla.redhat.com/show_bug.cgi?id=1519277.

Comment 17 Wang Haoran 2018-01-05 03:33:35 UTC
Verified according to this comment: https://bugzilla.redhat.com/show_bug.cgi?id=1519277#c17

Comment 20 errata-xmlrpc 2018-01-23 17:59:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0113