Bug 1473031 - fatal error: concurrent map read and map write
Summary: fatal error: concurrent map read and map write
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 3.7.0
Assignee: Ben Bennett
QA Contact: zhaozhanqi
Depends On:
Reported: 2017-07-19 21:50 UTC by Eric Paris
Modified: 2022-08-04 22:20 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Missing locking around a router data structure.
Consequence: The router pod would (very occasionally) crash and restart.
Fix: Add the appropriate locking.
Result: The invalid data access no longer occurs, so the router does not crash.
Clone Of:
Last Closed: 2017-11-28 22:04:10 UTC
Target Upstream Version:

Attachments
Logs with backtrace (38.57 KB, text/plain)
2017-07-19 21:51 UTC, Eric Paris

System | ID | Private | Priority | Status | Summary | Last Updated
Origin (GitHub) | 15385 | 0 | None | None | None | 2017-07-21 14:42:54 UTC
Red Hat Product Errata | RHSA-2017:3188 | 0 | normal | SHIPPED_LIVE | Moderate: Red Hat OpenShift Container Platform 3.7 security, bug, and enhancement update | 2017-11-29 02:34:54 UTC

Description Eric Paris 2017-07-19 21:50:12 UTC
I found a router that had restarted. I looked at the logs for the previous pod and found:

  - spec.tls.key: Invalid value: "redacted key data": unrecognized PEM block DSA PRIVATE KEY
E0718 14:14:42.201169       1 router_controller.go:311] invalid route configuration
fatal error: concurrent map read and map write

Comment 1 Eric Paris 2017-07-19 21:51:02 UTC
Created attachment 1301446
Logs with backtrace

Comment 3 Jordan Liggitt 2017-07-20 16:02:23 UTC
The state map is read outside of the lock on line 770:

	if existingConfig, exists := r.state[backendKey]; exists {
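
For illustration: the Go runtime aborts with this fatal error when it detects one goroutine reading a map while another writes to it, and the process cannot recover from it. Below is a minimal sketch of the safe pattern, where reads take the same lock as writes. The names routerState, backendConfig, put, and lookup are hypothetical and are not the router's actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// backendConfig is a hypothetical stand-in for the router's per-backend state.
type backendConfig struct {
	Host string
}

// routerState is a hypothetical, simplified model of a struct that owns a
// state map shared between goroutines.
type routerState struct {
	lock  sync.Mutex
	state map[string]backendConfig
}

func newRouterState() *routerState {
	return &routerState{state: make(map[string]backendConfig)}
}

// put writes to the map while holding the lock.
func (r *routerState) put(key string, cfg backendConfig) {
	r.lock.Lock()
	defer r.lock.Unlock()
	r.state[key] = cfg
}

// lookup reads the map while holding the same lock. A read such as
// `cfg, exists := r.state[key]` done without the lock races with put.
func (r *routerState) lookup(key string) (backendConfig, bool) {
	r.lock.Lock()
	defer r.lock.Unlock()
	cfg, exists := r.state[key]
	return cfg, exists
}

func main() {
	r := newRouterState()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		key := fmt.Sprintf("backend-%d", i)
		wg.Add(2)
		// One writer and one reader per key, running concurrently.
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				r.put(key, backendConfig{Host: "example.test"})
			}
		}()
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				r.lookup(key)
			}
		}()
	}
	wg.Wait()
	fmt.Println("finished without a concurrent map access")
}
```

Running a program like this with `go run -race` is one way to confirm that no unsynchronized access remains.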

Comment 4 openshift-github-bot 2017-07-22 11:58:47 UTC
Commit pushed to master at https://github.com/openshift/origin

Moved locking to protect a read of a map in the router

The locking was not protecting a read, so a simultaneous write would
crash the router.  I made a bunch of new functions that implemented
the functional part of the function without the locking, then made the
locking functions acquire the lock and then call the internal part.
Then in the rename, I moved the lock acquisition earlier and called
the internal functions.

In brief: re-jiggered the code so we could lock properly.

Fixes bug 1473031 (https://bugzilla.redhat.com/show_bug.cgi?id=1473031)
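
The refactoring described in this commit message can be sketched roughly as follows. This is an illustration under assumptions, not the actual change: table, addInternal, deleteInternal, Add, Delete, and Rename are hypothetical names. Because sync.Mutex is not reentrant, a function that already holds the lock cannot call the locking wrappers, which is why the lock-free internal functions are needed.

```go
package main

import (
	"fmt"
	"sync"
)

// table is a hypothetical stand-in for a struct that guards a map with a lock.
type table struct {
	lock  sync.Mutex
	state map[string]string
}

func newTable() *table {
	return &table{state: make(map[string]string)}
}

// addInternal does the work of Add. The caller must hold tbl.lock.
func (tbl *table) addInternal(key, value string) {
	tbl.state[key] = value
}

// deleteInternal does the work of Delete. The caller must hold tbl.lock.
func (tbl *table) deleteInternal(key string) {
	delete(tbl.state, key)
}

// Add is the locking wrapper: acquire the lock, then call the internal part.
func (tbl *table) Add(key, value string) {
	tbl.lock.Lock()
	defer tbl.lock.Unlock()
	tbl.addInternal(key, value)
}

// Delete is the locking wrapper around deleteInternal.
func (tbl *table) Delete(key string) {
	tbl.lock.Lock()
	defer tbl.lock.Unlock()
	tbl.deleteInternal(key)
}

// Rename acquires the lock before its first read of the map and then calls
// the internal functions, so the read, delete, and add happen under one lock.
func (tbl *table) Rename(oldKey, newKey string) bool {
	tbl.lock.Lock()
	defer tbl.lock.Unlock()
	value, exists := tbl.state[oldKey]
	if !exists {
		return false
	}
	tbl.deleteInternal(oldKey)
	tbl.addInternal(newKey, value)
	return true
}

func main() {
	tbl := newTable()
	tbl.Add("old", "value")
	fmt.Println(tbl.Rename("old", "new"))
}
```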

Comment 6 zhaozhanqi 2017-09-27 09:45:49 UTC
Verified this bug on v3.7.0-0.127.0.

Created routes using the following script:
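# Assumes NAME_PREFIX and ntimes are set by the caller.
function _create_routes() {
    local name=$1
    echo "  - worker name: ${name} ... "
    # Stagger the workers with a short random delay (0.00 to 0.02 s).
    sleep 0.0$((RANDOM%3))

    # Create a random number (0 to 9) of routes for this worker.
    for idx in $(seq $((RANDOM%10))); do
      local route_name="${NAME_PREFIX}-${name}-id-${idx}"
      oc expose service tc-500001 --name="${route_name}"
    done
}  #  End of function  _create_routes.

#  main():

# Start ${ntimes} background workers that create routes concurrently.
for i in $(seq "${ntimes}"); do
  _create_routes "worker-${i}" &
done

_create_routes "main"
wait  # Wait for the background workers to finish.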


The error above did not appear in the haproxy pod logs.

So I have verified this bug. Please correct me if these steps are not sufficient. Thanks.

Comment 9 errata-xmlrpc 2017-11-28 22:04:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

