Bug 2003164 - OLM, fatal error: concurrent map writes
Summary: OLM, fatal error: concurrent map writes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.10.0
Assignee: Alexander Greene
QA Contact: Jian Zhang
URL:
Whiteboard:
Depends On:
Blocks: 2054848
Reported: 2021-09-10 13:56 UTC by Oscar Casal Sanchez
Modified: 2023-09-18 04:25 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: OLM modified objects retrieved from the client lister cache. Consequence: Occasionally, an update event on the cluster was processed at the same time that OLM modified a cached object, causing concurrent map writes. Fix: OLM no longer modifies objects retrieved from the lister cache; where a change is needed, it modifies a copy of the object instead. Result: OLM no longer experiences concurrent map writes.
Clone Of:
Environment:
Last Closed: 2022-03-10 16:09:11 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github openshift operator-framework-olm pull 242 0 None open Bug 2003164: Do not modify object from the lister cache (#2562) 2022-01-18 20:17:39 UTC
Red Hat Product Errata RHSA-2022:0056 0 None None None 2022-03-10 16:09:58 UTC

Description Oscar Casal Sanchez 2021-09-10 13:56:28 UTC
### Description of problem

The OLM pod restarted, and the logs of the previous container show the following error:

~~~
$ oc logs <olm operator> -p > olm_logs
...
fatal error: concurrent map writes

goroutine 577 [running]:
runtime.throw(0x1eb5227, 0x15)
        /usr/lib/golang/src/runtime/panic.go:1116 +0x72 fp=0xc0286dba60 sp=0xc0286dba30 pc=0x43c1f2
runtime.mapassign_faststr(0x1c06100, 0xc01f70d6b0, 0x1ead4ad, 0x10, 0x3067bc0)
        /usr/lib/golang/src/runtime/map_faststr.go:211 +0x3f1 fp=0xc0286dbac8 sp=0xc0286dba60 pc=0x4190f1
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/olm.(*Operator).transitionCSVState(0xc0005cb2c0, 0xc004ea3480, 0x15, 0xc004ea3460, 0x1d, 0xc00d5472f0, 0x21, 0x0, 0x0, 0xc004ea39a0, ...)
        /build/pkg/controller/operators/olm/operator.go:1412 +0x6d46 fp=0xc0286de5e0 sp=0xc0286dbac8 pc=0x19546c6
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/olm.(*Operator).syncClusterServiceVersion(0xc0005cb2c0, 0x1e7e6a0, 0xc00ccfc008, 0xc01b0fd110, 0x1f84690)
        /build/pkg/controller/operators/olm/operator.go:1093 +0x6d8 fp=0xc0286df548 sp=0xc0286de5e0 pc=0x194aa78
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/olm.(*Operator).syncClusterServiceVersion-fm(0x1e7e6a0, 0xc00ccfc008, 0xc00ccfc008, 0x1)
        /build/pkg/controller/operators/olm/operator.go:1066 +0x3e fp=0xc0286df580 sp=0xc0286df548 pc=0x196fede
github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer.LegacySyncHandler.ToSyncerWithDelete.func1(0x216bda0, 0xc000b92040, 0x2137520, 0xc00d44fb20, 0xc00d44fb20, 0x1d13940)
        /build/pkg/lib/queueinformer/queueinformer.go:183 +0x26e fp=0xc0286df610 sp=0xc0286df580 pc=0x15e112e
github.com/operator-framework/operator-lifecycle-manager/pkg/lib/kubestate.SyncFunc.Sync(0xc0009a2360, 0x216bda0, 0xc000b92040, 0x2137520, 0xc00d44fb20, 0xc01b0fce01, 0x0)
        /build/pkg/lib/kubestate/kubestate.go:184 +0x4e fp=0xc0286df650 sp=0xc0286df610 pc=0x15d91ee
github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer.(*QueueInformer).Sync(...)
...
~~~

### Version-Release number of selected component (if applicable):
~~~
$ oc get clusterversion
NAME     VERSION  AVAILABLE  PROGRESSING  SINCE  STATUS
version  4.7.24   True       False        3d     Cluster version is 4.7.24
~~~


The complete OLM stack trace will be linked to this bug.
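
For illustration only, here is a minimal Go sketch of the pattern that the Doc Text on this bug names as the fix: copy an object obtained from a shared cache before writing to its maps. This is not the OLM code or the actual patch. The `CSV` type, the `cache` type, and the `annotate` helper are hypothetical stand-ins for a ClusterServiceVersion, an informer lister cache, and a sync handler, and the sketch uses only the Go standard library.

```go
package main

import (
	"fmt"
	"sync"
)

// CSV is a hypothetical stand-in for a ClusterServiceVersion.
// Only the annotations map matters for this illustration.
type CSV struct {
	Name        string
	Annotations map[string]string
}

// DeepCopy returns a copy that shares no map with the receiver.
func (c *CSV) DeepCopy() *CSV {
	out := &CSV{Name: c.Name}
	if c.Annotations != nil {
		out.Annotations = make(map[string]string, len(c.Annotations))
		for k, v := range c.Annotations {
			out.Annotations[k] = v
		}
	}
	return out
}

// cache is a hypothetical stand-in for a lister cache: every caller
// of Get receives the same pointer, so the object must be read-only.
type cache struct {
	mu    sync.RWMutex
	items map[string]*CSV
}

func newCache() *cache { return &cache{items: map[string]*CSV{}} }

func (c *cache) Set(obj *CSV) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[obj.Name] = obj
}

func (c *cache) Get(name string) *CSV {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.items[name]
}

// annotate writes to a private copy and never to the cached object.
// Writing to obj.Annotations directly from several goroutines is the
// kind of access that can end in "fatal error: concurrent map writes".
func annotate(c *cache, name, key, value string) *CSV {
	obj := c.Get(name)
	if obj == nil {
		return nil
	}
	out := obj.DeepCopy()
	if out.Annotations == nil {
		out.Annotations = map[string]string{}
	}
	out.Annotations[key] = value
	return out
}

func main() {
	c := newCache()
	c.Set(&CSV{Name: "etcdoperator", Annotations: map[string]string{}})

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			annotate(c, "etcdoperator", "example/worker", fmt.Sprint(i))
		}(i)
	}
	wg.Wait()

	// The cached object was only read, so its map is unchanged.
	fmt.Println("annotations on cached object:", len(c.Get("etcdoperator").Annotations))
}
```

Each goroutine only reads the shared map while copying it, and concurrent reads of a Go map are safe as long as nothing writes to it. Running a sketch like this with `go run -race` is one way to confirm that no unsynchronized write reaches the cached object.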

Comment 12 Alexander Greene 2021-11-01 15:45:20 UTC
@

Comment 20 Jian Zhang 2022-01-19 03:06:40 UTC
1, Install OCP with the fix PR via cluster-bot:
mac:~ jianzhang$ oc get clusterversion
NAME      VERSION                                                   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.ci.test-2022-01-19-022044-ci-ln-f3hfx0k-latest   True        False         2m38s   Cluster version is 4.10.0-0.ci.test-2022-01-19-022044-ci-ln-f3hfx0k-latest

2, Install an Operator, for example the etcd operator:

mac:~ jianzhang$ oc get sub -A
NAMESPACE             NAME   PACKAGE   SOURCE                CHANNEL
openshift-operators   etcd   etcd      community-operators   clusterwide-alpha

mac:~ jianzhang$ oc get ip -n openshift-operators 
NAME            CSV                               APPROVAL    APPROVED
install-k85n7   etcdoperator.v0.9.4-clusterwide   Automatic   true

mac:~ jianzhang$ oc get csv -n openshift-operators 
NAME                              DISPLAY   VERSION             REPLACES                          PHASE
etcdoperator.v0.9.4-clusterwide   etcd      0.9.4-clusterwide   etcdoperator.v0.9.2-clusterwide   Succeeded

3, The unpack job works well.
mac:~ jianzhang$ oc get job -n openshift-marketplace
NAME                                                              COMPLETIONS   DURATION   AGE
fc2985520e4b70156d718aaadf0ad87196b40b20788981bf3bf6cbe09a3fdcd   1/1           31s        52s

mac:~ jianzhang$ oc get pod -n openshift-marketplace
NAME                                                              READY   STATUS      RESTARTS      AGE
certified-operators-6qrvq                                         1/1     Running     0             25m
community-operators-twn49                                         1/1     Running     0             25m
fc2985520e4b70156d718aaadf0ad87196b40b20788981bf3bf6cbe09a9hhkq   0/1     Completed   0             58s
marketplace-operator-584ddfcb4b-xmz4c                             1/1     Running     1 (17m ago)   28m
redhat-marketplace-nt7br                                          1/1     Running     0             25m
redhat-operators-wxkxg                                            1/1     Running     0             25m

4, The olm-operator works well.
mac:~ jianzhang$ oc get pods -n openshift-operator-lifecycle-manager 
NAME                                      READY   STATUS      RESTARTS      AGE
catalog-operator-69c5fd5cd5-2q7lv         1/1     Running     0             30m
collect-profiles-27376005-zpdv8           0/1     Completed   0             19m
collect-profiles-27376020-b8h25           0/1     Completed   0             5m31s
olm-operator-85454c5f67-bdrg7             1/1     Running     0             30m
package-server-manager-5c47544f69-zdpgq   1/1     Running     1 (19m ago)   30m
packageserver-df9998ff-5j9gk              1/1     Running     0             28m
packageserver-df9998ff-d8jps              1/1     Running     0             28m

Verify it.

Comment 22 Jian Zhang 2022-01-21 08:37:49 UTC
Change the status to VERIFIED based on comment 20.

Comment 26 errata-xmlrpc 2022-03-10 16:09:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

Comment 27 Red Hat Bugzilla 2023-09-18 04:25:55 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.

