Bug 1473031 - fatal error: concurrent map read and map write
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 3.7.0
Assigned To: Ben Bennett
QA Contact: zhaozhanqi
Keywords: NeedsTestCase
Reported: 2017-07-19 17:50 EDT by Eric Paris
Modified: 2017-11-28 17:04 EST

Doc Type: Bug Fix
Doc Text:
Cause: Missing locking around a router data structure.
Consequence: The router pod would (very occasionally) crash and restart.
Fix: Add the appropriate locking.
Result: The invalid data access does not crash the router.
Last Closed: 2017-11-28 17:04:10 EST
Type: Bug

Attachments
Logs with backtrace (38.57 KB, text/plain)
2017-07-19 17:51 EDT, Eric Paris


External Trackers
Tracker          ID     Priority  Status  Summary  Last Updated
Origin (Github)  15385  None      None    None     2017-07-21 10:42 EDT

Description Eric Paris 2017-07-19 17:50:12 EDT
I found a router pod that had restarted.

ose-haproxy-router:v3.6.126.1

I looked at the logs from the previous pod and found:

  - spec.tls.key: Invalid value: "redacted key data": unrecognized PEM block DSA PRIVATE KEY
E0718 14:14:42.201169       1 router_controller.go:311] invalid route configuration
fatal error: concurrent map read and map write
Comment 1 Eric Paris 2017-07-19 17:51 EDT
Created attachment 1301446
Logs with backtrace
Comment 3 Jordan Liggitt 2017-07-20 12:02:23 EDT
The state map is read outside of the lock on line 770:

	if existingConfig, exists := r.state[backendKey]; exists {
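
For illustration, here is a minimal sketch of the kind of guarded read this points at. Only the `state` field name and the lookup expression come from the line above; the `router` and `backendConfig` types, the `lookup` and `store` helpers, and the choice of `sync.Mutex` are assumptions made for the example, not the router's actual code. The Go runtime detects an unsynchronized map read that overlaps a write and aborts the process with the fatal error in the title, so every reader and writer has to take the same lock.

```go
package main

import (
	"fmt"
	"sync"
)

// backendConfig and router are hypothetical stand-ins for the router's
// real types; only the field name "state" comes from the bug report.
type backendConfig struct {
	Host string
}

type router struct {
	lock  sync.Mutex
	state map[string]backendConfig
}

// lookup takes the lock before reading the map, so a concurrent write
// cannot overlap with the read.
func (r *router) lookup(backendKey string) (backendConfig, bool) {
	r.lock.Lock()
	defer r.lock.Unlock()
	existingConfig, exists := r.state[backendKey]
	return existingConfig, exists
}

// store takes the same lock for the write.
func (r *router) store(backendKey string, cfg backendConfig) {
	r.lock.Lock()
	defer r.lock.Unlock()
	r.state[backendKey] = cfg
}

func main() {
	r := &router{state: map[string]backendConfig{}}

	// Several goroutines write and read the same keys at once.
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			key := fmt.Sprintf("backend-%d", i%2)
			r.store(key, backendConfig{Host: "example.test"})
			r.lookup(key)
		}(i)
	}
	wg.Wait()

	_, exists := r.lookup("backend-0")
	fmt.Println("backend-0 present:", exists)
}
```

Running a sketch like this with `go run -race` is one way to check that no unguarded access remains.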
Comment 4 openshift-github-bot 2017-07-22 07:58:47 EDT
Commit pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/0b305fba3645f1313b54c30e7890a7a6cf4290f1
Moved locking to protect a read of a map in the router

The locking was not protecting a read, so a simultaneous write would
crash the router.  I made a bunch of new functions that implemented
the functional part of the function without the locking, then made the
locking functions acquire the lock and then call the internal part.
Then in the rename, I moved the lock acquisition earlier and called
the internal functions.

In brief: re-jiggered the code so we could lock properly.

Fixes bug 1473031 (https://bugzilla.redhat.com/show_bug.cgi?id=1473031)
Comment 6 zhaozhanqi 2017-09-27 05:45:49 EDT
Verified this bug on v3.7.0-0.127.0.

Created routes concurrently using the following script:
*****test.sh*******************
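#!/bin/bash
# NAME_PREFIX is not defined in the script as posted; it must be set in
# the environment before running.
: "${NAME_PREFIX:?set NAME_PREFIX before running}"

# Create a random number (0-9) of routes for the service tc-500001.
function _create_routes() {
    local name=$1
    echo "  - worker name: ${name} ... "
    # Stagger the workers by 0.00 to 0.02 seconds.
    sleep "0.0$((RANDOM%3))"

    for idx in $(seq $((RANDOM%10))); do
      local route_name="${NAME_PREFIX}-${name}-id-${idx}"
      oc expose service tc-500001 --name="${route_name}"
    done

}  #  End of function  _create_routes.


#
#  main():
#
ntimes=${1:-20}

# Start the workers in the background so routes are created concurrently.
for i in $(seq "${ntimes}"); do
  _create_routes "worker-${i}" &
done

_create_routes "main"

# Wait for the background workers to finish before exiting.
wait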

*****************

The error above no longer appears in the haproxy pod logs.

Marking this bug as verified. Please correct me if these steps are not sufficient. Thanks.
Comment 9 errata-xmlrpc 2017-11-28 17:04:10 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188
