Bug 2101357 - catalog-operator fatal error: concurrent map writes
Summary: catalog-operator fatal error: concurrent map writes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Severity: high
Priority: high
Target Milestone: ---
Target Release: 4.12.0
Assignee: Per da Silva
QA Contact: Jian Zhang
URL:
Whiteboard:
Depends On:
Blocks: 2117324
 
Reported: 2022-06-27 09:50 UTC by Jian Zhang
Modified: 2023-09-18 04:40 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: An internal issue caused the reconciler to update a resource's annotations without first making a copy of the resource, which triggered a fatal concurrent map write error.
Consequence: The reconciler process would be killed.
Fix: Ensured that all resource changes are made to a copy of the underlying resource, so the concurrent map write error no longer occurs.
Result: The reconciler is no longer killed by this error.
Clone Of:
Environment:
Last Closed: 2023-01-17 19:50:47 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Github openshift operator-framework-olm pull 337 0 None open Bug 2101357: concurrent write bug fix 2022-07-12 14:01:29 UTC
Red Hat Knowledge Base (Solution) 6988724 0 None None None 2022-12-02 15:10:15 UTC
Red Hat Product Errata RHSA-2022:7399 0 None None None 2023-01-17 19:51:07 UTC

Description Jian Zhang 2022-06-27 09:50:10 UTC
Description of problem:
The catalog-operator container was killed by a fatal "concurrent map writes" error; the tail of its (truncated) termination message from the pod status follows:
        message: "her.go:105 +0xe5\ncreated by k8s.io/apimachinery/pkg/watch.NewStreamWatcher\n\t/build/vendor/k8s.io/apimachinery/pkg/watch/streamwatcher.go:76
          +0x130\n\ngoroutine 5545 [select, 7 minutes]:\ngolang.org/x/net/http2.(*clientStream).writeRequest(0xc00240a780,
          0xc003321a00)\n\t/build/vendor/golang.org/x/net/http2/transport.go:1345
          +0x9c9\ngolang.org/x/net/http2.(*clientStream).doRequest(0xc002efea80?,
          0xc0009cc7a0?)\n\t/build/vendor/golang.org/x/net/http2/transport.go:1207
          +0x1e\ncreated by golang.org/x/net/http2.(*ClientConn).RoundTrip\n\t/build/vendor/golang.org/x/net/http2/transport.go:1136
          +0x30a\n\ngoroutine 5678 [select, 3 minutes]:\ngolang.org/x/net/http2.(*clientStream).writeRequest(0xc000b70480,
          0xc0035d4500)\n\t/build/vendor/golang.org/x/net/http2/transport.go:1345
          +0x9c9\ngolang.org/x/net/http2.(*clientStream).doRequest(0x6e5326?, 0xc002999e90?)\n\t/build/vendor/golang.org/x/net/http2/transport.go:1207
          +0x1e\ncreated by golang.org/x/net/http2.(*ClientConn).RoundTrip\n\t/build/vendor/golang.org/x/net/http2/transport.go:1136
          +0x30a\n\ngoroutine 5836 [select, 1 minutes]:\ngolang.org/x/net/http2.(*clientStream).writeRequest(0xc003b00180,
          0xc003ff8a00)\n\t/build/vendor/golang.org/x/net/http2/transport.go:1345
          +0x9c9\ngolang.org/x/net/http2.(*clientStream).doRequest(0x6e5326?, 0xc003a1c8d0?)\n\t/build/vendor/golang.org/x/net/http2/transport.go:1207
          +0x1e\ncreated by golang.org/x/net/http2.(*ClientConn).RoundTrip\n\t/build/vendor/golang.org/x/net/http2/transport.go:1136
          +0x30a\n\ngoroutine 5905 [chan receive, 1 minutes]:\ngithub.com/operator-framework/operator-lifecycle-manager/pkg/controller/registry/resolver.(*sourceInvalidator).GetValidChannel.func1()\n\t/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/registry/resolver/source_registry.go:51
          +0x85\ncreated by github.com/operator-framework/operator-lifecycle-manager/pkg/controller/registry/resolver.(*sourceInvalidator).GetValidChannel\n\t/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/registry/resolver/source_registry.go:50
          +0x231\n"
        reason: Error
        startedAt: "2022-06-27T00:00:59Z"


Version-Release number of selected component (if applicable):
mac:~ jianzhang$ oc exec catalog-operator-66cb8fd8c5-j7vkx -- olm --version
OLM version: 0.19.0
git commit: 8c2bd46147a90d58e98de73d34fd79477769f11f
mac:~ jianzhang$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-06-25-081133   True        False         10h     Cluster version is 4.11.0-0.nightly-2022-06-25-081133

How reproducible:
always

Steps to Reproduce:
1. Install OCP 4.11
2. Check OLM pods


Actual results:

mac:~ jianzhang$ oc get pods
NAME                                      READY   STATUS      RESTARTS     AGE
catalog-operator-66cb8fd8c5-j7vkx         1/1     Running     2 (8h ago)   10h
collect-profiles-27605340-wgsvf           0/1     Completed   0            42m
collect-profiles-27605355-ffgxd           0/1     Completed   0            27m
collect-profiles-27605370-w7ds7           0/1     Completed   0            12m
olm-operator-6cfd444b8f-r5q4t             1/1     Running     0            10h
package-server-manager-66589d4bf8-csr7j   1/1     Running     0            10h
packageserver-59977db6cf-nkn5w            1/1     Running     0            10h
packageserver-59977db6cf-nxbnx            1/1     Running     0            10h


mac:~ jianzhang$ oc get pods catalog-operator-66cb8fd8c5-j7vkx  -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/network-status: |-
      [{
          "name": "openshift-sdn",
          "interface": "eth0",
          "ips": [
              "10.130.0.26"
          ],
          "default": true,
          "dns": {}
      }]
    k8s.v1.cni.cncf.io/networks-status: |-
      [{
          "name": "openshift-sdn",
          "interface": "eth0",
          "ips": [
              "10.130.0.26"
          ],
          "default": true,
          "dns": {}
      }]
    openshift.io/scc: nonroot-v2
    seccomp.security.alpha.kubernetes.io/pod: runtime/default
  creationTimestamp: "2022-06-26T23:12:45Z"
  generateName: catalog-operator-66cb8fd8c5-
  labels:
    app: catalog-operator
    pod-template-hash: 66cb8fd8c5
  name: catalog-operator-66cb8fd8c5-j7vkx
  namespace: openshift-operator-lifecycle-manager
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: catalog-operator-66cb8fd8c5
    uid: bcf173be-97bc-4152-8cec-d45f820e167c
  resourceVersion: "67395"
  uid: bb34c37b-b22d-4412-bbc0-1e57a1b2bd3a
spec:
  containers:
  - args:
    - --namespace
    - openshift-marketplace
    - --configmapServerImage=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:033bd57b54bb2e827b0ae95a03cb6f7ceeb65422e85de635185cfed88c17b2b4
    - --opmImage=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:033bd57b54bb2e827b0ae95a03cb6f7ceeb65422e85de635185cfed88c17b2b4
    - --util-image
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8d42dfe153a5bfa3a66e5e9d4664b4939aed376b4afbd9e098291c259daca2f8
    - --writeStatusName
    - operator-lifecycle-manager-catalog
    - --tls-cert
    - /srv-cert/tls.crt
    - --tls-key
    - /srv-cert/tls.key
    - --client-ca
    - /profile-collector-cert/tls.crt
    command:
    - /bin/catalog
    env:
    - name: RELEASE_VERSION
      value: 4.11.0-0.nightly-2022-06-25-081133
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8d42dfe153a5bfa3a66e5e9d4664b4939aed376b4afbd9e098291c259daca2f8
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /healthz
        port: 8443
        scheme: HTTPS
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    name: catalog-operator
    ports:
    - containerPort: 8443
      name: metrics
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /healthz
        port: 8443
        scheme: HTTPS
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      requests:
        cpu: 10m
        memory: 80Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: FallbackToLogsOnError
    volumeMounts:
    - mountPath: /srv-cert
      name: srv-cert
      readOnly: true
    - mountPath: /profile-collector-cert
      name: profile-collector-cert
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-7txv8
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: ip-10-0-173-128.ap-southeast-1.compute.internal
  nodeSelector:
    kubernetes.io/os: linux
    node-role.kubernetes.io/master: ""
  preemptionPolicy: PreemptLowerPriority
  priority: 2000000000
  priorityClassName: system-cluster-critical
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    runAsNonRoot: true
    runAsUser: 65534
    seLinuxOptions:
      level: s0:c20,c0
    seccompProfile:
      type: RuntimeDefault
  serviceAccount: olm-operator-serviceaccount
  serviceAccountName: olm-operator-serviceaccount
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 120
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 120
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  volumes:
  - name: srv-cert
    secret:
      defaultMode: 420
      secretName: catalog-operator-serving-cert
  - name: profile-collector-cert
    secret:
      defaultMode: 420
      secretName: pprof-cert
  - name: kube-api-access-7txv8
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
      - configMap:
          items:
          - key: service-ca.crt
            path: service-ca.crt
          name: openshift-service-ca.crt
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-06-26T23:16:59Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2022-06-27T01:19:09Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2022-06-27T01:19:09Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2022-06-26T23:16:59Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: cri-o://f3fce12480556b5ab279236c290acdf15e1bc850426078730dcfd333ecda6795
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8d42dfe153a5bfa3a66e5e9d4664b4939aed376b4afbd9e098291c259daca2f8
    imageID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8d42dfe153a5bfa3a66e5e9d4664b4939aed376b4afbd9e098291c259daca2f8
    lastState:
      terminated:
        containerID: cri-o://5e3ce04f1cf538f679af008dcb62f9477131aae1772f527d6207fdc7bb247a55
        exitCode: 2
        finishedAt: "2022-06-27T01:19:08Z"
        message: "her.go:105 +0xe5\ncreated by k8s.io/apimachinery/pkg/watch.NewStreamWatcher\n\t/build/vendor/k8s.io/apimachinery/pkg/watch/streamwatcher.go:76
          +0x130\n\ngoroutine 5545 [select, 7 minutes]:\ngolang.org/x/net/http2.(*clientStream).writeRequest(0xc00240a780,
          0xc003321a00)\n\t/build/vendor/golang.org/x/net/http2/transport.go:1345
          +0x9c9\ngolang.org/x/net/http2.(*clientStream).doRequest(0xc002efea80?,
          0xc0009cc7a0?)\n\t/build/vendor/golang.org/x/net/http2/transport.go:1207
          +0x1e\ncreated by golang.org/x/net/http2.(*ClientConn).RoundTrip\n\t/build/vendor/golang.org/x/net/http2/transport.go:1136
          +0x30a\n\ngoroutine 5678 [select, 3 minutes]:\ngolang.org/x/net/http2.(*clientStream).writeRequest(0xc000b70480,
          0xc0035d4500)\n\t/build/vendor/golang.org/x/net/http2/transport.go:1345
          +0x9c9\ngolang.org/x/net/http2.(*clientStream).doRequest(0x6e5326?, 0xc002999e90?)\n\t/build/vendor/golang.org/x/net/http2/transport.go:1207
          +0x1e\ncreated by golang.org/x/net/http2.(*ClientConn).RoundTrip\n\t/build/vendor/golang.org/x/net/http2/transport.go:1136
          +0x30a\n\ngoroutine 5836 [select, 1 minutes]:\ngolang.org/x/net/http2.(*clientStream).writeRequest(0xc003b00180,
          0xc003ff8a00)\n\t/build/vendor/golang.org/x/net/http2/transport.go:1345
          +0x9c9\ngolang.org/x/net/http2.(*clientStream).doRequest(0x6e5326?, 0xc003a1c8d0?)\n\t/build/vendor/golang.org/x/net/http2/transport.go:1207
          +0x1e\ncreated by golang.org/x/net/http2.(*ClientConn).RoundTrip\n\t/build/vendor/golang.org/x/net/http2/transport.go:1136
          +0x30a\n\ngoroutine 5905 [chan receive, 1 minutes]:\ngithub.com/operator-framework/operator-lifecycle-manager/pkg/controller/registry/resolver.(*sourceInvalidator).GetValidChannel.func1()\n\t/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/registry/resolver/source_registry.go:51
          +0x85\ncreated by github.com/operator-framework/operator-lifecycle-manager/pkg/controller/registry/resolver.(*sourceInvalidator).GetValidChannel\n\t/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/registry/resolver/source_registry.go:50
          +0x231\n"
        reason: Error
        startedAt: "2022-06-27T00:00:59Z"
    name: catalog-operator
    ready: true
    restartCount: 2
    started: true
    state:
      running:
        startedAt: "2022-06-27T01:19:08Z"
  hostIP: 10.0.173.128
  phase: Running
  podIP: 10.130.0.26
  podIPs:
  - ip: 10.130.0.26
  qosClass: Burstable
  startTime: "2022-06-26T23:16:59Z"


Expected results:
The catalog-operator pod runs without restarts caused by the fatal "concurrent map writes" error.

Additional info:
Operators can be subscribed successfully.
mac:~ jianzhang$ oc get sub -A
NAMESPACE                    NAME                     PACKAGE                  SOURCE            CHANNEL
jian                         learn                    learn                    qe-app-registry   beta
openshift-logging            cluster-logging          cluster-logging          qe-app-registry   stable
openshift-operators-redhat   elasticsearch-operator   elasticsearch-operator   qe-app-registry   stable
mac:~ jianzhang$ 
mac:~ jianzhang$ oc get pods -n jian
NAME                                                              READY   STATUS      RESTARTS   AGE
552b4660850a7fe1e1f142091eb5e4305f18af151727c56f70aa5dffc1dg8cg   0/1     Completed   0          54m
learn-operator-666b687bfb-7qppm                                   1/1     Running     0          54m
qe-app-registry-hbzxg                                             1/1     Running     0          58m
mac:~ jianzhang$ oc get csv -n jian
NAME                            DISPLAY                            VERSION   REPLACES                PHASE
elasticsearch-operator.v5.5.0   OpenShift Elasticsearch Operator   5.5.0                             Succeeded
learn-operator.v0.0.3           Learn Operator                     0.0.3     learn-operator.v0.0.2   Succeeded

Comment 3 Jian Zhang 2022-06-28 02:05:39 UTC
mac:~ jianzhang$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-06-25-132614   True        False         145m    Cluster version is 4.11.0-0.nightly-2022-06-25-132614

mac:~ jianzhang$ oc get pods
NAME                                      READY   STATUS      RESTARTS        AGE
catalog-operator-869fb4bd4d-lbhgj         1/1     Running     3 (9m25s ago)   170m
collect-profiles-27606330-4wg5r           0/1     Completed   0               33m
collect-profiles-27606345-lmk4q           0/1     Completed   0               18m
collect-profiles-27606360-mksv6           0/1     Completed   0               3m17s
olm-operator-5f485d9d5f-wczjc             1/1     Running     0               170m
package-server-manager-6cf996b4cc-79lrw   1/1     Running     2 (156m ago)    170m
packageserver-5f668f98d7-2vjdn            1/1     Running     0               165m
packageserver-5f668f98d7-mb2wc            1/1     Running     0               165m
mac:~ jianzhang$ oc get pods catalog-operator-869fb4bd4d-lbhgj -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/network-status: |-
      [{
          "name": "openshift-sdn",
          "interface": "eth0",
          "ips": [
              "10.130.0.34"
          ],
          "default": true,
          "dns": {}
      }]
    k8s.v1.cni.cncf.io/networks-status: |-
      [{
          "name": "openshift-sdn",
          "interface": "eth0",
          "ips": [
              "10.130.0.34"
          ],
          "default": true,
          "dns": {}
      }]
    openshift.io/scc: nonroot-v2
    seccomp.security.alpha.kubernetes.io/pod: runtime/default
  creationTimestamp: "2022-06-27T23:13:12Z"
  generateName: catalog-operator-869fb4bd4d-
  labels:
    app: catalog-operator
    pod-template-hash: 869fb4bd4d
  name: catalog-operator-869fb4bd4d-lbhgj
  namespace: openshift-operator-lifecycle-manager
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: catalog-operator-869fb4bd4d
    uid: 3a1a8cd3-2151-4650-a96b-c1951d461c67
  resourceVersion: "75671"
  uid: 2f06d663-0697-4e88-9e9a-f802c6641efe
spec:
  containers:
  - args:
    - --namespace
    - openshift-marketplace
    - --configmapServerImage=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:40552deca32531d2fe754f703613f24b514e27ffaa660b57d760f9cd984d9000
    - --opmImage=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:40552deca32531d2fe754f703613f24b514e27ffaa660b57d760f9cd984d9000
    - --util-image
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ac69cd6ac451019bc6060627a1cceb475d2b6b521803addb31b47e501d5e5935
    - --writeStatusName
    - operator-lifecycle-manager-catalog
    - --tls-cert
    - /srv-cert/tls.crt
    - --tls-key
    - /srv-cert/tls.key
    - --client-ca
    - /profile-collector-cert/tls.crt
    command:
    - /bin/catalog
    env:
    - name: RELEASE_VERSION
      value: 4.11.0-0.nightly-2022-06-25-132614
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ac69cd6ac451019bc6060627a1cceb475d2b6b521803addb31b47e501d5e5935
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /healthz
        port: 8443
        scheme: HTTPS
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    name: catalog-operator
    ports:
    - containerPort: 8443
      name: metrics
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /healthz
        port: 8443
        scheme: HTTPS
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      requests:
        cpu: 10m
        memory: 80Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: FallbackToLogsOnError
    volumeMounts:
    - mountPath: /srv-cert
      name: srv-cert
      readOnly: true
    - mountPath: /profile-collector-cert
      name: profile-collector-cert
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-sdjns
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: ip-10-0-190-130.ap-south-1.compute.internal
  nodeSelector:
    kubernetes.io/os: linux
    node-role.kubernetes.io/master: ""
  preemptionPolicy: PreemptLowerPriority
  priority: 2000000000
  priorityClassName: system-cluster-critical
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    runAsNonRoot: true
    runAsUser: 65534
    seLinuxOptions:
      level: s0:c20,c0
    seccompProfile:
      type: RuntimeDefault
  serviceAccount: olm-operator-serviceaccount
  serviceAccountName: olm-operator-serviceaccount
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 120
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 120
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  volumes:
  - name: srv-cert
    secret:
      defaultMode: 420
      secretName: catalog-operator-serving-cert
  - name: profile-collector-cert
    secret:
      defaultMode: 420
      secretName: pprof-cert
  - name: kube-api-access-sdjns
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
      - configMap:
          items:
          - key: service-ca.crt
            path: service-ca.crt
          name: openshift-service-ca.crt
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-06-27T23:16:47Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2022-06-28T01:53:53Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2022-06-28T01:53:53Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2022-06-27T23:16:47Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: cri-o://2a83d3ba503aa760d27aeef28bf03af84b9ff134ea76c6545251f3253507c22b
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ac69cd6ac451019bc6060627a1cceb475d2b6b521803addb31b47e501d5e5935
    imageID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ac69cd6ac451019bc6060627a1cceb475d2b6b521803addb31b47e501d5e5935
    lastState:
      terminated:
        containerID: cri-o://2766bb3bad38c02f531915aac0321ffb18c7a993a7d84b27e8c7199935f7c858
        exitCode: 2
        finishedAt: "2022-06-28T01:53:52Z"
        message: "izer/streaming/streaming.go:77 +0xa7\nk8s.io/client-go/rest/watch.(*Decoder).Decode(0xc002b12500)\n\t/build/vendor/k8s.io/client-go/rest/watch/decoder.go:49
          +0x4f\nk8s.io/apimachinery/pkg/watch.(*StreamWatcher).receive(0xc0036178c0)\n\t/build/vendor/k8s.io/apimachinery/pkg/watch/streamwatcher.go:105
          +0xe5\ncreated by k8s.io/apimachinery/pkg/watch.NewStreamWatcher\n\t/build/vendor/k8s.io/apimachinery/pkg/watch/streamwatcher.go:76
          +0x130\n\ngoroutine 3479 [sync.Cond.Wait]:\nsync.runtime_notifyListWait(0xc003a987c8,
          0x8)\n\t/usr/lib/golang/src/runtime/sema.go:513 +0x13d\nsync.(*Cond).Wait(0xc001f9bf10?)\n\t/usr/lib/golang/src/sync/cond.go:56
          +0x8c\ngolang.org/x/net/http2.(*pipe).Read(0xc003a987b0, {0xc0036e4001,
          0xdff, 0xdff})\n\t/build/vendor/golang.org/x/net/http2/pipe.go:76 +0xeb\ngolang.org/x/net/http2.transportResponseBody.Read({0x0?},
          {0xc0036e4001?, 0x2?, 0x203511d?})\n\t/build/vendor/golang.org/x/net/http2/transport.go:2407
          +0x85\nencoding/json.(*Decoder).refill(0xc002fc0640)\n\t/usr/lib/golang/src/encoding/json/stream.go:165
          +0x17f\nencoding/json.(*Decoder).readValue(0xc002fc0640)\n\t/usr/lib/golang/src/encoding/json/stream.go:140
          +0xbb\nencoding/json.(*Decoder).Decode(0xc002fc0640, {0x1d377c0, 0xc003523098})\n\t/usr/lib/golang/src/encoding/json/stream.go:63
          +0x78\nk8s.io/apimachinery/pkg/util/framer.(*jsonFrameReader).Read(0xc003127770,
          {0xc0036dd500, 0x1000, 0x1500})\n\t/build/vendor/k8s.io/apimachinery/pkg/util/framer/framer.go:152
          +0x19c\nk8s.io/apimachinery/pkg/runtime/serializer/streaming.(*decoder).Decode(0xc003502aa0,
          0xc001f9bf10?, {0x2366870, 0xc0044dca80})\n\t/build/vendor/k8s.io/apimachinery/pkg/runtime/serializer/streaming/streaming.go:77
          +0xa7\nk8s.io/client-go/rest/watch.(*Decoder).Decode(0xc00059f700)\n\t/build/vendor/k8s.io/client-go/rest/watch/decoder.go:49
          +0x4f\nk8s.io/apimachinery/pkg/watch.(*StreamWatcher).receive(0xc0044dcd40)\n\t/build/vendor/k8s.io/apimachinery/pkg/watch/streamwatcher.go:105
          +0xe5\ncreated by k8s.io/apimachinery/pkg/watch.NewStreamWatcher\n\t/build/vendor/k8s.io/apimachinery/pkg/watch/streamwatcher.go:76
          +0x130\n"
        reason: Error
        startedAt: "2022-06-28T01:06:59Z"
    name: catalog-operator
    ready: true
    restartCount: 3
    started: true
    state:
      running:
        startedAt: "2022-06-28T01:53:53Z"
  hostIP: 10.0.190.130
  phase: Running
  podIP: 10.130.0.34
  podIPs:
  - ip: 10.130.0.34
  qosClass: Burstable
  startTime: "2022-06-27T23:16:47Z"

Comment 4 Jian Zhang 2022-06-28 02:09:23 UTC
mac:~ jianzhang$ oc get pods package-server-manager-6cf996b4cc-79lrw -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/network-status: |-
      [{
          "name": "openshift-sdn",
          "interface": "eth0",
          "ips": [
              "10.130.0.13"
          ],
          "default": true,
          "dns": {}
      }]
    k8s.v1.cni.cncf.io/networks-status: |-
      [{
          "name": "openshift-sdn",
          "interface": "eth0",
          "ips": [
              "10.130.0.13"
          ],
          "default": true,
          "dns": {}
      }]
    openshift.io/scc: nonroot-v2
    seccomp.security.alpha.kubernetes.io/pod: runtime/default
  creationTimestamp: "2022-06-27T23:13:10Z"
  generateName: package-server-manager-6cf996b4cc-
  labels:
    app: package-server-manager
    pod-template-hash: 6cf996b4cc
  name: package-server-manager-6cf996b4cc-79lrw
  namespace: openshift-operator-lifecycle-manager
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: package-server-manager-6cf996b4cc
    uid: 191cb627-a1f3-452b-aed2-336de7871004
  resourceVersion: "19056"
  uid: 63cb7200-32d6-4b0d-bfb2-ba725cda7fbd
spec:
  containers:
  - args:
    - --name
    - $(PACKAGESERVER_NAME)
    - --namespace
    - $(PACKAGESERVER_NAMESPACE)
    command:
    - /bin/psm
    - start
    env:
    - name: PACKAGESERVER_NAME
      value: packageserver
    - name: PACKAGESERVER_IMAGE
      value: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ac69cd6ac451019bc6060627a1cceb475d2b6b521803addb31b47e501d5e5935
    - name: PACKAGESERVER_NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    - name: RELEASE_VERSION
      value: 4.11.0-0.nightly-2022-06-25-132614
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ac69cd6ac451019bc6060627a1cceb475d2b6b521803addb31b47e501d5e5935
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /healthz
        port: 8080
        scheme: HTTP
      initialDelaySeconds: 30
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    name: package-server-manager
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /healthz
        port: 8080
        scheme: HTTP
      initialDelaySeconds: 30
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      requests:
        cpu: 10m
        memory: 50Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: FallbackToLogsOnError
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-k9hh2
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: ip-10-0-190-130.ap-south-1.compute.internal
  nodeSelector:
    kubernetes.io/os: linux
    node-role.kubern

Comment 5 Jian Zhang 2022-06-28 02:10:02 UTC
preemptionPolicy: PreemptLowerPriority
  priority: 2000000000
  priorityClassName: system-cluster-critical
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    runAsNonRoot: true
    runAsUser: 65534
    seLinuxOptions:
      level: s0:c20,c0
    seccompProfile:
      type: RuntimeDefault
  serviceAccount: olm-operator-serviceaccount
  serviceAccountName: olm-operator-serviceaccount
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 120
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 120
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  volumes:
  - name: kube-api-access-k9hh2
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
      - configMap:
          items:
          - key: service-ca.crt
            path: service-ca.crt
          name: openshift-service-ca.crt
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-06-27T23:16:47Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2022-06-27T23:27:28Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2022-06-27T23:27:28Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2022-06-27T23:16:47Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: cri-o://2c56089fabbaa6d7067feb231750ad20422e299dc33dabc0cd19ceca5951ed3a
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ac69cd6ac451019bc6060627a1cceb475d2b6b521803addb31b47e501d5e5935
    imageID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ac69cd6ac451019bc6060627a1cceb475d2b6b521803addb31b47e501d5e5935
    lastState:
      terminated:
        containerID: cri-o://76cc5448de346ec636070a71868432c8f3aa213e16fe211c8fe18a3fee112d23
        exitCode: 1
        finishedAt: "2022-06-27T23:26:36Z"
        message: "d/vendor/github.com/spf13/cobra/command.go:856\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/build/vendor/github.com/spf13/cobra/command.go:974\ngithub.com/spf13/cobra.(*Command).Execute\n\t/build/vendor/github.com/spf13/cobra/command.go:902\nmain.main\n\t/build/cmd/package-server-manager/main.go:36\nruntime.main\n\t/usr/lib/golang/src/runtime/proc.go:250\n1.6563723963629913e+09\tERROR\tFailed
          to get API Group-Resources\t{\"error\": \"Get \\\"https://172.30.0.1:443/api?timeout=32s\\\":
          dial tcp 172.30.0.1:443: connect: connection refused\"}\nsigs.k8s.io/controller-runtime/pkg/cluster.New\n\t/build/vendor/sigs.k8s.io/controller-runtime/pkg/cluster/cluster.go:160\nsigs.k8s.io/controller-runtime/pkg/manager.New\n\t/build/vendor/sigs.k8s.io/controller-runtime/pkg/manager/manager.go:322\nmain.run\n\t/build/cmd/package-server-manager/main.go:67\ngithub.com/spf13/cobra.(*Command).execute\n\t/build/vendor/github.com/spf13/cobra/command.go:856\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/build/vendor/github.com/spf13/cobra/command.go:974\ngithub.com/spf13/cobra.(*Command).Execute\n\t/build/vendor/github.com/spf13/cobra/command.go:902\nmain.main\n\t/build/cmd/package-server-manager/main.go:36\nruntime.main\n\t/usr/lib/golang/src/runtime/proc.go:250\n1.6563723963631017e+09\tERROR\tsetup\tfailed
          to setup manager instance\t{\"error\": \"Get \\\"https://172.30.0.1:443/api?timeout=32s\\\":
          dial tcp 172.30.0.1:443: connect: connection refused\"}\ngithub.com/spf13/cobra.(*Command).execute\n\t/build/vendor/github.com/spf13/cobra/command.go:856\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/build/vendor/github.com/spf13/cobra/command.go:974\ngithub.com/spf13/cobra.(*Command).Execute\n\t/build/vendor/github.com/spf13/cobra/command.go:902\nmain.main\n\t/build/cmd/package-server-manager/main.go:36\nruntime.main\n\t/usr/lib/golang/src/runtime/proc.go:250\nError:
          Get \"https://172.30.0.1:443/api?timeout=32s\": dial tcp 172.30.0.1:443:
          connect: connection refused\nencountered an error while executing the binary:
          Get \"https://172.30.0.1:443/api?timeout=32s\": dial tcp 172.30.0.1:443:
          connect: connection refused\n"
        reason: Error
        startedAt: "2022-06-27T23:26:11Z"
    name: package-server-manager
    ready: true
    restartCount: 2
    started: true
    state:
      running:
        startedAt: "2022-06-27T23:26:54Z"
  hostIP: 10.0.190.130
  phase: Running
  podIP: 10.130.0.13
  podIPs:
  - ip: 10.130.0.13
  qosClass: Burstable
  startTime: "2022-06-27T23:16:47Z"

Comment 6 Jian Zhang 2022-06-29 08:43:51 UTC
Observed further restarts caused by the error:

mac:~ jianzhang$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-06-28-160049   True        False         5h57m   Cluster version is 4.11.0-0.nightly-2022-06-28-160049

mac:~ jianzhang$ oc get pods
NAME                                     READY   STATUS      RESTARTS      AGE
catalog-operator-7b88dddfbc-rsfhz        1/1     Running     6 (26m ago)   5h51m
collect-profiles-27608160-6m7r6          0/1     Completed   0             37m
collect-profiles-27608175-94n56          0/1     Completed   0             22m
collect-profiles-27608190-nbzcf          0/1     Completed   0             7m55s
olm-operator-5977ffb855-lgfn8            1/1     Running     0             9h
package-server-manager-75db6dcfc-hql4v   1/1     Running     0             9h
packageserver-5955fb79cd-9n56n           1/1     Running     0             9h
packageserver-5955fb79cd-xf6f6           1/1     Running     0             9h

mac:~ jianzhang$ oc get pods catalog-operator-7b88dddfbc-rsfhz -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/network-status: |-
      [{
          "name": "openshift-sdn",
          "interface": "eth0",
          "ips": [
              "10.130.0.121"
          ],
          "default": true,
          "dns": {}
      }]
    k8s.v1.cni.cncf.io/networks-status: |-
      [{
          "name": "openshift-sdn",
          "interface": "eth0",
          "ips": [
              "10.130.0.121"
          ],
          "default": true,
          "dns": {}
      }]
    openshift.io/scc: nonroot-v2
    seccomp.security.alpha.kubernetes.io/pod: runtime/default
  creationTimestamp: "2022-06-29T02:46:23Z"
  generateName: catalog-operator-7b88dddfbc-
  labels:
    app: catalog-operator
    pod-template-hash: 7b88dddfbc
  name: catalog-operator-7b88dddfbc-rsfhz
  namespace: openshift-operator-lifecycle-manager
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: catalog-operator-7b88dddfbc
    uid: 4c796f4d-b2f2-4e0d-b2de-6e8ab5f45ce2
  resourceVersion: "278732"
  uid: 10e8c6e9-8f1c-4484-b934-c5208016c426
spec:
  containers:
  - args:
    - --debug
    - --namespace
    - openshift-marketplace
    - --configmapServerImage=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:40552deca32531d2fe754f703613f24b514e27ffaa660b57d760f9cd984d9000
    - --opmImage=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:40552deca32531d2fe754f703613f24b514e27ffaa660b57d760f9cd984d9000
    - --util-image
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ac69cd6ac451019bc6060627a1cceb475d2b6b521803addb31b47e501d5e5935
    - --writeStatusName
    - operator-lifecycle-manager-catalog
    - --tls-cert
    - /srv-cert/tls.crt
    - --tls-key
    - /srv-cert/tls.key
    - --client-ca
    - /profile-collector-cert/tls.crt
    command:
    - /bin/catalog
    env:
    - name: RELEASE_VERSION
      value: 4.11.0-0.nightly-2022-06-28-160049
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ac69cd6ac451019bc6060627a1cceb475d2b6b521803addb31b47e501d5e5935
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /healthz
        port: 8443
        scheme: HTTPS
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    name: catalog-operator
    ports:
    - containerPort: 8443
      name: metrics
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /healthz
        port: 8443
        scheme: HTTPS
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      requests:
        cpu: 10m
        memory: 80Mi
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: FallbackToLogsOnError
    volumeMounts:
    - mountPath: /srv-cert
      name: srv-cert
      readOnly: true
    - mountPath: /profile-collector-cert
      name: profile-collector-cert
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-cv642
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  imagePullSecrets:
  - name: olm-operator-serviceaccount-dockercfg-f26bf
  nodeName: ip-10-0-130-83.ap-south-1.compute.internal
  nodeSelector:
    kubernetes.io/os: linux
    node-role.kubernetes.io/master: ""
  preemptionPolicy: PreemptLowerPriority
  priority: 2000000000
  priorityClassName: system-cluster-critical
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    runAsNonRoot: true
    runAsUser: 65534
    seLinuxOptions:
      level: s0:c20,c0
    seccompProfile:
      type: RuntimeDefault
  serviceAccount: olm-operator-serviceaccount
  serviceAccountName: olm-operator-serviceaccount
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 120
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 120
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  volumes:
  - name: srv-cert
    secret:
      defaultMode: 420
      secretName: catalog-operator-serving-cert
  - name: profile-collector-cert
    secret:
      defaultMode: 420
      secretName: pprof-cert
  - name: kube-api-access-cv642
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
      - configMap:
          items:
          - key: service-ca.crt
            path: service-ca.crt
          name: openshift-service-ca.crt
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-06-29T02:46:23Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2022-06-29T08:11:56Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2022-06-29T08:11:56Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2022-06-29T02:46:23Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: cri-o://64a99805cca11e9af66452d739a91259566a35a9e580bfac81dd7205f9272e18
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ac69cd6ac451019bc6060627a1cceb475d2b6b521803addb31b47e501d5e5935
    imageID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ac69cd6ac451019bc6060627a1cceb475d2b6b521803addb31b47e501d5e5935
    lastState:
      terminated:
        containerID: cri-o://938e2eb90465caf8ac99ff4405cfbb9a0f03b16598474f944cca44c3e70af9df
        exitCode: 2
        finishedAt: "2022-06-29T08:11:54Z"
        message: "ub.com/operator-framework/operator-lifecycle-manager/pkg/lib/kubestate/kubestate.go:128
          +0xc3 fp=0xc003c11180 sp=0xc003c11118 pc=0x1a36603\ngithub.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/subscription.(*subscriptionSyncer).Sync(0xc000a2f420,
          {0x237a238, 0xc0004ce4c0}, {0x236a448, 0xc004993ce0})\n\t/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/subscription/syncer.go:77
          +0x535 fp=0xc003c11648 sp=0xc003c11180 pc=0x1a4f535\ngithub.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer.(*QueueInformer).Sync(...)\n\t/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer/queueinformer.go:35\ngithub.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer.(*operator).processNextWorkItem(0xc0001e3ad0,
          {0x237a238, 0xc0004ce4c0}, 0xc000cee360)\n\t/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer/queueinformer_operator.go:287
          +0x57c fp=0xc003c11f70 sp=0xc003c11648 pc=0x1a3ca7c\ngithub.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer.(*operator).worker(0x10000c0008fd6e0?,
          {0x237a238, 0xc0004ce4c0}, 0xc0004837b8?)\n\t/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer/queueinformer_operator.go:231
          +0x45 fp=0xc003c11fb0 sp=0xc003c11f70 pc=0x1a3c4a5\ngithub.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer.(*operator).start.func3()\n\t/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer/queueinformer_operator.go:221
          +0x32 fp=0xc003c11fe0 sp=0xc003c11fb0 pc=0x1a3c152\nruntime.goexit()\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1571
          +0x1 fp=0xc003c11fe8 sp=0xc003c11fe0 pc=0x4719c1\ncreated by github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer.(*operator).start\n\t/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer/queueinformer_operator.go:221
          +0x557\n"
        reason: Error
        startedAt: "2022-06-29T07:56:16Z"
    name: catalog-operator
    ready: true
    restartCount: 6
    started: true
    state:
      running:
        startedAt: "2022-06-29T08:11:55Z"
  hostIP: 10.0.130.83
  phase: Running
  podIP: 10.130.0.121
  podIPs:
  - ip: 10.130.0.121
  qosClass: Burstable
  startTime: "2022-06-29T02:46:23Z"

Comment 7 Jian Zhang 2022-07-04 03:50:38 UTC
Please ignore comments 4 and 5; they have nothing to do with this issue.

Comment 8 Jian Zhang 2022-07-04 06:57:24 UTC
Check the `previous` log.

mac:~ jianzhang$ oc logs catalog-operator-f8ddcb57b-j5rf2 --previous
time="2022-07-03T23:49:00Z" level=info msg="log level info"
...
...
time="2022-07-04T03:43:25Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
time="2022-07-04T03:43:25Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
fatal error: concurrent map writes
fatal error: concurrent map writes

goroutine 559 [running]:
runtime.throw({0x2048546?, 0x0?})
	/usr/lib/golang/src/runtime/panic.go:992 +0x71 fp=0xc001f9c508 sp=0xc001f9c4d8 pc=0x43e9f1
runtime.mapassign_faststr(0x1d09880, 0xc0031847b0, {0x2079799, 0x2e})
	/usr/lib/golang/src/runtime/map_faststr.go:295 +0x38b fp=0xc001f9c570 sp=0xc001f9c508 pc=0x419b4b
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/registry/reconciler.Pod(0xc001f4a900, {0x203dd0f, 0xf}, {0xc00132ccc0, 0x38}, {0xc003582d50, 0x13}, 0xc00452c1e0, 0xc0031847b0, 0x5, ...)
	/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/registry/reconciler/reconciler.go:227 +0xbdf fp=0xc001f9cbb0 sp=0xc001f9c570 pc=0x1a475ff
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/registry/reconciler.(*grpcCatalogSourceDecorator).Pod(0xc001f9ce50, {0xc003582d50, 0x13})
	/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/registry/reconciler/grpc.go:125 +0xf9 fp=0xc001f9cc30 sp=0xc001f9cbb0 pc=0x1a42c99
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/registry/reconciler.(*GrpcRegistryReconciler).currentPodsWithCorrectImageAndSpec(0xc001f9ce68?, {0xc001f4a900}, {0xc003582d50, 0x13})
	/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/registry/reconciler/grpc.go:190 +0x198 fp=0xc001f9ce48 sp=0xc001f9cc30 pc=0x1a437b8
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/registry/reconciler.(*GrpcRegistryReconciler).CheckRegistryServer(0xc000bcbf80?, 0x493b77?)
	/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/registry/reconciler/grpc.go:453 +0x4c fp=0xc001f9ce88 sp=0xc001f9ce48 pc=0x1a45fcc
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/subscription.(*catalogHealthReconciler).healthy(0x38ca8453?, 0xc001f4a900)
	/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/subscription/reconciler.go:196 +0x7e fp=0xc001f9ced0 sp=0xc001f9ce88 pc=0x1a4ae1e
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/subscription.(*catalogHealthReconciler).health(0x1bc37c0?, 0xc003e7e7e0, 0x8?)
	/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/subscription/reconciler.go:159 +0x2a fp=0xc001f9cf10 sp=0xc001f9ced0 pc=0x1a4ac8a
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/subscription.(*catalogHealthReconciler).catalogHealth(0xc000a59a90, {0xc003356a50, 0x11})
	/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/subscription/reconciler.go:137 +0x387 fp=0xc001f9d040 sp=0xc001f9cf10 pc=0x1a4a827
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/subscription.(*catalogHealthReconciler).Reconcile(0xc000a59a90, {0x237a238, 0xc0003fe5c0}, {0x7f9f6e5b3900?, 0xc0050f64f0?})
	/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/subscription/reconciler.go:82 +0x21e fp=0xc001f9d118 sp=0xc001f9d040 pc=0x1a4a1be
github.com/operator-framework/operator-lifecycle-manager/pkg/lib/kubestate.ReconcilerChain.Reconcile({0xc0008e43c0?, 0x3, 0xc001f9d258?}, {0x237a238, 0xc0003fe5c0}, {0x7f9f6e5b3328?, 0xc0050f6490?})
	/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/lib/kubestate/kubestate.go:128 +0xc3 fp=0xc001f9d180 sp=0xc001f9d118 pc=0x1a36603
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/subscription.(*subscriptionSyncer).Sync(0xc0004dfd50, {0x237a238, 0xc0003fe5c0}, {0x236a448, 0xc003cbb760})
	/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/subscription/syncer.go:77 +0x535 fp=0xc001f9d648 sp=0xc001f9d180 pc=0x1a4f535
github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer.(*QueueInformer).Sync(...)
	/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer/queueinformer.go:35
github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer.(*operator).processNextWorkItem(0xc00057a580, {0x237a238, 0xc0003fe5c0}, 0xc000954720)
	/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer/queueinformer_operator.go:287 +0x57c fp=0xc001f9df70 sp=0xc001f9d648 pc=0x1a3ca7c
github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer.(*operator).worker(0x0?, {0x237a238, 0xc0003fe5c0}, 0x0?)
	/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer/queueinformer_operator.go:231 +0x45 fp=0xc001f9dfb0 sp=0xc001f9df70 pc=0x1a3c4a5
github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer.(*operator).start.func3()
	/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer/queueinformer_operator.go:221 +0x32 fp=0xc001f9dfe0 sp=0xc001f9dfb0 pc=0x1a3c152
runtime.goexit()
	/usr/lib/golang/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc001f9dfe8 sp=0xc001f9dfe0 pc=0x4719c1
created by github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer.(*operator).start
	/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer/queueinformer_operator.go:221 +0x557


It seems to have failed at: https://github.com/operator-framework/operator-lifecycle-manager/blob/master/pkg/controller/registry/reconciler/reconciler.go#L227

Comment 9 Alexander Greene 2022-07-05 16:01:22 UTC
As Jian pointed out, the catalog operator is failing due to a concurrent write at https://github.com/operator-framework/operator-lifecycle-manager/blob/master/pkg/controller/registry/reconciler/reconciler.go#L227.


This is happening because:
- The grpcCatalogSourceDecorator includes a pointer to a catalogSource: https://github.com/operator-framework/operator-lifecycle-manager/blob/0b7970c3d609e1f483a0ba68f599a5ffd2e139b5/pkg/controller/registry/reconciler/grpc.go#L32-L35
- The grpcCatalogSourceDecorator's Annotations function returns the catalogSource's annotations: https://github.com/operator-framework/operator-lifecycle-manager/blob/0b7970c3d609e1f483a0ba68f599a5ffd2e139b5/pkg/controller/registry/reconciler/grpc.go#L66
- The grpcCatalogSourceDecorator's Pod function uses the annotations returned from its Annotations function: https://github.com/operator-framework/operator-lifecycle-manager/blob/0b7970c3d609e1f483a0ba68f599a5ffd2e139b5/pkg/controller/registry/reconciler/grpc.go#L125
- The catalogSource's annotations are then updated within the Pod function here: https://github.com/operator-framework/operator-lifecycle-manager/blob/master/pkg/controller/registry/reconciler/reconciler.go#L227

Line 227 in reconciler.go directly mutates the catalogSource's annotations. The grpcCatalogSourceDecorator's Annotations function should return a copy of the annotations, or the decorator should be created with a deep copy of the catalogSource, to avoid mutating an object in the lister cache.

This doesn't seem to be a blocker, but we should get a fix in swiftly.

Comment 10 Jian Zhang 2022-07-13 05:02:04 UTC
1, Create a cluster with the fixed PR via the Cluster-bot.
mac:~ jianzhang$ oc get clusterversion
NAME      VERSION                                                   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.ci.test-2022-07-13-022646-ci-ln-41fvni2-latest   True        False         126m    Cluster version is 4.11.0-0.ci.test-2022-07-13-022646-ci-ln-41fvni2-latest

2, Subscribe some operators.
mac:~ jianzhang$ oc get sub -A
NAMESPACE                    NAME                     PACKAGE                  SOURCE                CHANNEL
default                      etcd                     etcd                     community-operators   singlenamespace-alpha
openshift-logging            cluster-logging          cluster-logging          redhat-operators      stable
openshift-operators-redhat   elasticsearch-operator   elasticsearch-operator   redhat-operators      stable

mac:~ jianzhang$ oc get csv -n openshift-operators-redhat
NAME                           DISPLAY                            VERSION   REPLACES   PHASE
elasticsearch-operator.5.4.2   OpenShift Elasticsearch Operator   5.4.2                Succeeded
mac:~ jianzhang$ oc get csv -n openshift-logging 
NAME                           DISPLAY                            VERSION   REPLACES   PHASE
cluster-logging.5.4.2          Red Hat OpenShift Logging          5.4.2                Succeeded
elasticsearch-operator.5.4.2   OpenShift Elasticsearch Operator   5.4.2                Succeeded
mac:~ jianzhang$ oc get csv -n default
NAME                           DISPLAY                            VERSION   REPLACES              PHASE
elasticsearch-operator.5.4.2   OpenShift Elasticsearch Operator   5.4.2                           Succeeded
etcdoperator.v0.9.4            etcd                               0.9.4     etcdoperator.v0.9.2   Succeeded

3, Check OLM catalog-operator pods status.
mac:~ jianzhang$ oc get pods
NAME                                      READY   STATUS      RESTARTS   AGE
catalog-operator-546db7cdf5-7pldg         1/1     Running     0          145m
collect-profiles-27628110-lr2nv           0/1     Completed   0          30m
collect-profiles-27628125-br8b8           0/1     Completed   0          15m
collect-profiles-27628140-m64gp           0/1     Completed   0          38s
olm-operator-754d7f6f56-26qhw             1/1     Running     0          145m
package-server-manager-77d5cbf696-v9w4p   1/1     Running     0          145m
packageserver-6884994d98-2smtw            1/1     Running     0          143m
packageserver-6884994d98-5d7jg            1/1     Running     0          143m


mac:~ jianzhang$ oc logs catalog-operator-546db7cdf5-7pldg  --previous
Error from server (BadRequest): previous terminated container "catalog-operator" in pod "catalog-operator-546db7cdf5-7pldg" not found

No terminated container was found; the catalog-operator works well. Verifying it.

Comment 12 Jian Zhang 2022-07-18 07:23:08 UTC
Changed the status to VERIFIED based on comment 10.

Comment 14 Andreas Bleischwitz 2022-11-18 16:31:29 UTC
The issue seems to affect OpenShift 4.10.39 as well:

% oc logs catalog-operator-6479d65774-znsnk...
time="2022-11-18T14:44:20Z" level=info msg="log level info"
time="2022-11-18T14:44:20Z" level=info msg="TLS keys set, using https for metrics"
time="2022-11-18T14:44:28Z" level=info msg="connection established. cluster-version: v1.23.12+7566c4d"
...
time="2022-11-18T16:02:39Z" level=info msg=syncing event=update reconciling="*v1alpha1.Subscription" selflink=
fatal error: concurrent map writes

goroutine 746 [running]:
runtime.throw({0x1ee9827, 0xc00e2a85a8})
	/usr/lib/golang/src/runtime/panic.go:1198 +0x71 fp=0xc00e2a8550 sp=0xc00e2a8520 pc=0x43bcd1
runtime.mapassign_faststr(0x1be0e40, 0xc004351200, {0x1f17e43, 0x2e})
	/usr/lib/golang/src/runtime/map_faststr.go:211 +0x39c fp=0xc00e2a85b8 sp=0xc00e2a8550 pc=0x41949c
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/registry/reconciler.Pod(0xc009b1a480, {0x1edf878, 0xf}, {0xc0172e1e40, 0x1}, {0xc018a283a8, 0x13}, 0xc004351200, 0xc014ebc0f0, 0x5, ...)
	/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/registry/reconciler/reconciler.go:227 +0xbbf fp=0xc00e2a8bf0 sp=0xc00e2a85b8 pc=0x195e1ff
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/registry/reconciler.(*grpcCatalogSourceDecorator).Pod(0xc00e2a8e90, {0xc018a283a8, 0x13})
	/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/registry/reconciler/grpc.go:125 +0xf9 fp=0xc00e2a8c70 sp=0xc00e2a8bf0 pc=0x1959919
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/registry/reconciler.(*GrpcRegistryReconciler).currentPodsWithCorrectImageAndSpec(0xc00e2a8ea8, {0xc009b1a480}, {0xc018a283a8, 0x13})
	/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/registry/reconciler/grpc.go:190 +0x1b1 fp=0xc00e2a8e88 sp=0xc00e2a8c70 pc=0x195a491
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/registry/reconciler.(*GrpcRegistryReconciler).CheckRegistryServer(0xc00119f980, 0xc0163d6780)
	/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/registry/reconciler/grpc.go:453 +0x4c fp=0xc00e2a8ec8 sp=0xc00e2a8e88 pc=0x195cb2c
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/subscription.(*catalogHealthReconciler).healthy(0x3719777f, 0xc009b1a480)
	/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/subscription/reconciler.go:196 +0x7e fp=0xc00e2a8f10 sp=0xc00e2a8ec8 pc=0x19619be
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/subscription.(*catalogHealthReconciler).health(0x1ab0320, 0xc00e7890e0, 0x4)
	/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/subscription/reconciler.go:159 +0x2a fp=0xc00e2a8f50 sp=0xc00e2a8f10 pc=0x196182a
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/subscription.(*catalogHealthReconciler).catalogHealth(0xc0005f5860, {0xc000467b18, 0xc001c6bad0})
	/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/subscription/reconciler.go:137 +0x372 fp=0xc00e2a9060 sp=0xc00e2a8f50 pc=0x19614f2
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/subscription.(*catalogHealthReconciler).Reconcile(0xc0005f5860, {0x21b7b70, 0xc0009c33c0}, {0x7fb4bc391e88, 0xc001c6b940})
	/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/subscription/reconciler.go:82 +0x20e fp=0xc00e2a9128 sp=0xc00e2a9060 pc=0x1960ece
github.com/operator-framework/operator-lifecycle-manager/pkg/lib/kubestate.ReconcilerChain.Reconcile({0xc0005ade40, 0x3, 0xc001c6b900}, {0x21b7b70, 0xc0009c33c0}, {0x7fb4bc3918b0, 0xc001c6b900})
	/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/lib/kubestate/kubestate.go:128 +0xc3 fp=0xc00e2a9190 sp=0xc00e2a9128 pc=0x194cfe3
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/subscription.(*subscriptionSyncer).Sync(0xc0007ba620, {0x21b7b70, 0xc0009c33c0}, {0x21a1318, 0xc015a37de0})
	/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators/catalog/subscription/syncer.go:77 +0x532 fp=0xc00e2a9660 sp=0xc00e2a9190 pc=0x1964992
github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer.(*QueueInformer).Sync(...)
	/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer/queueinformer.go:35
github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer.(*operator).processNextWorkItem(0xc000022160, {0x21b7b70, 0xc0009c33c0}, 0xc00121b140)
	/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer/queueinformer_operator.go:287 +0x895 fp=0xc00e2a9f70 sp=0xc00e2a9660 pc=0x1953915
github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer.(*operator).worker(0xc0003c84e0, {0x21b7b70, 0xc0009c33c0}, 0x0)
	/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer/queueinformer_operator.go:231 +0x45 fp=0xc00e2a9fb0 sp=0xc00e2a9f70 pc=0x1953025
github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer.(*operator).start·dwrap·7()
	/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer/queueinformer_operator.go:221 +0x32 fp=0xc00e2a9fe0 sp=0xc00e2a9fb0 pc=0x1952cd2
runtime.goexit()
	/usr/lib/golang/src/runtime/asm_amd64.s:1581 +0x1 fp=0xc00e2a9fe8 sp=0xc00e2a9fe0 pc=0x46f181
created by github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer.(*operator).start
	/build/vendor/github.com/operator-framework/operator-lifecycle-manager/pkg/lib/queueinformer/queueinformer_operator.go:221 +0x55c

Clusterversion: 4.10.39

Are those crashes also blocking updates of already installed operators? We saw that an installed operator wasn't marked as updatable even though there is a newer version in the packagemanifest.
Also, are there any workarounds or mitigations to avoid those crashes?

Comment 25 errata-xmlrpc 2023-01-17 19:50:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399

Comment 26 Red Hat Bugzilla 2023-09-18 04:40:17 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

