Bug 2003164

Summary: OLM, fatal error: concurrent map writes
Product: OpenShift Container Platform
Reporter: Oscar Casal Sanchez <ocasalsa>
Component: OLM
Assignee: Alexander Greene <agreene>
OLM sub component: OLM
QA Contact: Jian Zhang <jiazha>
Status: CLOSED ERRATA
Docs Contact:
Severity: medium
Priority: medium
CC: agreene, anbhatta, ddelcian, jarduini, krizza, pescorza, tflannag
Version: 4.7
Target Milestone: ---
Target Release: 4.10.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: OLM modified an object it had retrieved from the client lister cache.
Consequence: Occasionally, an update event on the cluster arrived while OLM was modifying the cached object, causing concurrent map writes.
Fix: OLM no longer modifies objects retrieved from the lister cache; where a change is needed, it modifies a copy of the object instead.
Result: OLM no longer experiences concurrent map writes.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-03-10 16:09:11 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2054848

Description Oscar Casal Sanchez 2021-09-10 13:56:28 UTC
### Description of problem

The OLM operator pod was restarted, and the logs of the previous container show the following error:

~~~
$ oc logs <olm operator> -p > olm_logs
...
fatal error: concurrent map writes

goroutine 577 [running]:
runtime.throw(0x1eb5227, 0x15)
        /usr/lib/golang/src/runtime/panic.go:1116 +0x72 fp=0xc0286dba60 sp=0xc0286dba30 pc=0x43c1f2
runtime.mapassign_faststr(0x1c06100, 0xc01f70d6b0, 0x1ead4ad, 0x10, 0x3067bc0)
        /usr/lib/golang/src/runtime/map_faststr.go:211 +0x3f1 fp=0xc0286dbac8 sp=0xc0286dba60 pc=0x4190f1
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/olm.(*Operator).transitionCSVState(0xc0005cb2c0, 0xc004ea3480, 0x15, 0xc004ea3460, 0x1d, 0xc00d5472f0, 0x21, 0x0, 0x0, 0xc004ea39a0, ...)
        /build/pkg/controller/operators/olm/operator.go:1412 +0x6d46 fp=0xc0286de5e0 sp=0xc0286dbac8 pc=0x19546c6
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/olm.(*Operator).syncClusterServiceVersion(0xc0005cb2c0, 0x1e7e6a0, 0xc00ccfc008, 0xc01b0fd110, 0x1f84690)
        /build/pkg/controller/operators/olm/operator.go:1093 +0x6d8 fp=0xc0286df548 sp=0xc0286de5e0 pc=0x194aa78
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/olm.(*Operator).syncClusterServiceVersion-fm(0x1e7e6a0, 0xc00ccfc008, 0xc00ccfc008, 0x1)
        /build/pkg/controller/operators/olm/operator.go:1066 +0x3e fp=0xc0286df580 sp=0xc0286df548 pc=0x196fede
github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer.LegacySyncHandler.ToSyncerWithDelete.func1(0x216bda0, 0xc000b92040, 0x2137520, 0xc00d44fb20, 0xc00d44fb20, 0x1d13940)
        /build/pkg/lib/queueinformer/queueinformer.go:183 +0x26e fp=0xc0286df610 sp=0xc0286df580 pc=0x15e112e
github.com/operator-framework/operator-lifecycle-manager/pkg/lib/kubestate.SyncFunc.Sync(0xc0009a2360, 0x216bda0, 0xc000b92040, 0x2137520, 0xc00d44fb20, 0xc01b0fce01, 0x0)
        /build/pkg/lib/kubestate/kubestate.go:184 +0x4e fp=0xc0286df650 sp=0xc0286df610 pc=0x15d91ee
github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer.(*QueueInformer).Sync(...)
...
~~~
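
For illustration only: the trace ends in `runtime.mapassign_faststr`, a write to a map with string keys, and the Doc Text field attributes the crash to OLM modifying an object that came from the lister cache. The Go sketch below shows the general "modify a copy, not the cached object" pattern that the Doc Text describes. It is not the OLM code or the actual patch. `CSV`, `Lister`, and `annotate` are hypothetical stand-ins invented for this example; real Kubernetes API types have generated `DeepCopy` methods that play the role of the hand-written one here.

```go
package main

import (
	"fmt"
	"sync"
)

// CSV is a hypothetical stand-in for a cached API object. Only the
// annotations map matters for this illustration.
type CSV struct {
	Name        string
	Annotations map[string]string
}

// DeepCopy returns a copy that shares no map with the receiver.
func (c *CSV) DeepCopy() *CSV {
	out := &CSV{Name: c.Name}
	if c.Annotations != nil {
		out.Annotations = make(map[string]string, len(c.Annotations))
		for k, v := range c.Annotations {
			out.Annotations[k] = v
		}
	}
	return out
}

// Lister is a hypothetical cache that hands out shared pointers,
// similar in spirit to an informer-backed lister.
type Lister struct {
	mu    sync.RWMutex
	items map[string]*CSV
}

// Get returns the shared cached object. Callers must treat it as read-only.
func (l *Lister) Get(name string) *CSV {
	l.mu.RLock()
	defer l.mu.RUnlock()
	return l.items[name]
}

// annotate sets an annotation on a private copy of the cached object,
// so the shared map is only ever read, never written.
func annotate(l *Lister, name, key, value string) *CSV {
	out := l.Get(name).DeepCopy()
	if out.Annotations == nil {
		out.Annotations = map[string]string{}
	}
	out.Annotations[key] = value
	return out
}

func main() {
	l := &Lister{items: map[string]*CSV{
		"etcd": {Name: "etcd", Annotations: map[string]string{}},
	}}

	// Several workers handle the same object at once. Writing to
	// l.Get("etcd").Annotations directly here could abort the process
	// with "fatal error: concurrent map writes".
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			annotate(l, "etcd", "worker", fmt.Sprint(i))
		}(i)
	}
	wg.Wait()

	// Prints 0: the cached object was never modified.
	fmt.Println(len(l.Get("etcd").Annotations))
}
```

In a real controller the modified copy would then be sent to the API server, and the cache would pick up the change through the resulting update event.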

### Version-Release number of selected component (if applicable):
~~~
$ oc get clusterversion
NAME     VERSION  AVAILABLE  PROGRESSING  SINCE  STATUS
version  4.7.24   True       False        3d     Cluster version is 4.7.24
~~~


The complete stack trace from the OLM operator will be attached to this bug.

Comment 12 Alexander Greene 2021-11-01 15:45:20 UTC

Comment 20 Jian Zhang 2022-01-19 03:06:40 UTC
1, Install OCP with the fix PR via cluster-bot:
mac:~ jianzhang$ oc get clusterversion
NAME      VERSION                                                   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.ci.test-2022-01-19-022044-ci-ln-f3hfx0k-latest   True        False         2m38s   Cluster version is 4.10.0-0.ci.test-2022-01-19-022044-ci-ln-f3hfx0k-latest

2, Install an Operator, for example the etcd operator:

mac:~ jianzhang$ oc get sub -A
NAMESPACE             NAME   PACKAGE   SOURCE                CHANNEL
openshift-operators   etcd   etcd      community-operators   clusterwide-alpha

mac:~ jianzhang$ oc get ip -n openshift-operators 
NAME            CSV                               APPROVAL    APPROVED
install-k85n7   etcdoperator.v0.9.4-clusterwide   Automatic   true

mac:~ jianzhang$ oc get csv -n openshift-operators 
NAME                              DISPLAY   VERSION             REPLACES                          PHASE
etcdoperator.v0.9.4-clusterwide   etcd      0.9.4-clusterwide   etcdoperator.v0.9.2-clusterwide   Succeeded

3, The unpack job completes successfully.
mac:~ jianzhang$ oc get job -n openshift-marketplace
NAME                                                              COMPLETIONS   DURATION   AGE
fc2985520e4b70156d718aaadf0ad87196b40b20788981bf3bf6cbe09a3fdcd   1/1           31s        52s

mac:~ jianzhang$ oc get pod -n openshift-marketplace
NAME                                                              READY   STATUS      RESTARTS      AGE
certified-operators-6qrvq                                         1/1     Running     0             25m
community-operators-twn49                                         1/1     Running     0             25m
fc2985520e4b70156d718aaadf0ad87196b40b20788981bf3bf6cbe09a9hhkq   0/1     Completed   0             58s
marketplace-operator-584ddfcb4b-xmz4c                             1/1     Running     1 (17m ago)   28m
redhat-marketplace-nt7br                                          1/1     Running     0             25m
redhat-operators-wxkxg                                            1/1     Running     0             25m

4, The olm-operator pod is running with no restarts.
mac:~ jianzhang$ oc get pods -n openshift-operator-lifecycle-manager 
NAME                                      READY   STATUS      RESTARTS      AGE
catalog-operator-69c5fd5cd5-2q7lv         1/1     Running     0             30m
collect-profiles-27376005-zpdv8           0/1     Completed   0             19m
collect-profiles-27376020-b8h25           0/1     Completed   0             5m31s
olm-operator-85454c5f67-bdrg7             1/1     Running     0             30m
package-server-manager-5c47544f69-zdpgq   1/1     Running     1 (19m ago)   30m
packageserver-df9998ff-5j9gk              1/1     Running     0             28m
packageserver-df9998ff-d8jps              1/1     Running     0             28m

Looks good; verified.

Comment 22 Jian Zhang 2022-01-21 08:37:49 UTC
Changing the status to VERIFIED based on comment 20.

Comment 26 errata-xmlrpc 2022-03-10 16:09:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

Comment 27 Red Hat Bugzilla 2023-09-18 04:25:55 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days