Cause: Within a CatalogSource resource, the RegistryServiceStatus stores service information that is used to generate the address OLM relies on to connect to the associated catalog pod.
Consequence: If the RegistryServiceStatus is not nil but is missing the namespace, name, and port information for its service, OLM is unable to recover until the CatalogSource's associated pod has an invalid image or spec.
Fix: When reconciling a CatalogSource, OLM now ensures that the CatalogSource's RegistryServiceStatus is valid and updates the CatalogSource's status to reflect the change. The address generated from this service information is also stored in the CatalogSource's status, in the `.status.connectionState.address` field (`Status.GRPCConnectionState.Address` in the Go API). If the address changes, OLM updates this field to reflect the new address.
Result: The `.status.connectionState.address` field of a CatalogSource should no longer be empty or malformed (for example, `..svc:`).
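The `..svc:` value in this report is exactly what a `<name>.<namespace>.svc:<port>` address looks like when the service name, namespace, and port are all empty. The Go sketch below models that formatting and the repair described under Fix. It is a simplified illustration under stated assumptions, not OLM's actual code: the struct types are trimmed, hypothetical stand-ins for OLM's API types, `isValid` and `reconcileStatus` are hypothetical helpers, and the service name, namespace, and port are passed in directly instead of being read from the cluster.

```go
package main

import "fmt"

// RegistryServiceStatus is a trimmed, hypothetical stand-in for the service
// fields a CatalogSource keeps in status.registryService.
type RegistryServiceStatus struct {
	Protocol         string
	ServiceName      string
	ServiceNamespace string
	Port             string
}

// Address formats the in-cluster DNS address used to reach the catalog pod.
// With name, namespace, and port all empty it yields "..svc:".
func (s RegistryServiceStatus) Address() string {
	return fmt.Sprintf("%s.%s.svc:%s", s.ServiceName, s.ServiceNamespace, s.Port)
}

// CatalogSourceStatus is a hypothetical, trimmed status type.
type CatalogSourceStatus struct {
	RegistryService *RegistryServiceStatus
	Address         string // stands in for status.connectionState.address
}

// isValid (hypothetical) treats a non-nil status as usable only when
// name, namespace, and port are all set.
func isValid(s *RegistryServiceStatus) bool {
	return s != nil && s.ServiceName != "" && s.ServiceNamespace != "" && s.Port != ""
}

// reconcileStatus (hypothetical) repairs an incomplete registry service
// status, then refreshes the stored address if it differs. It reports
// whether the status was modified.
func reconcileStatus(st *CatalogSourceStatus, name, namespace, port string) bool {
	changed := false
	if !isValid(st.RegistryService) {
		st.RegistryService = &RegistryServiceStatus{
			Protocol:         "grpc",
			ServiceName:      name,
			ServiceNamespace: namespace,
			Port:             port,
		}
		changed = true
	}
	if addr := st.RegistryService.Address(); addr != st.Address {
		st.Address = addr
		changed = true
	}
	return changed
}

func main() {
	// The broken state from this report: registryService present but empty.
	st := &CatalogSourceStatus{
		RegistryService: &RegistryServiceStatus{Protocol: "grpc"},
		Address:         "..svc:",
	}
	fmt.Println(reconcileStatus(st, "community-operators", "openshift-marketplace", "50051"))
	fmt.Println(st.Address)
	// A second pass finds nothing left to fix.
	fmt.Println(reconcileStatus(st, "community-operators", "openshift-marketplace", "50051"))
}
```

Running it prints `true`, then `community-operators.openshift-marketplace.svc:50051`, then `false`, because the second reconcile finds a valid status and an unchanged address.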
Description of problem:
Upgrade path: 4.5.41-x86_64 --> 4.6.0-0.nightly-2021-11-22-174225
After the upgrade, .status.connectionState.address of the community-operators CatalogSource is not correct:
zhaoxia@xzha-mac JIRA-2196 % oc get catsrc community-operators -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  annotations:
    operatorframework.io/managed-by: marketplace-operator
  creationTimestamp: "2021-11-24T08:39:47Z"
  generation: 2
  labels:
    olm-visibility: hidden
    openshift-marketplace: "true"
    opsrc-datastore: "true"
    opsrc-provider: community
  name: community-operators
  namespace: openshift-marketplace
  resourceVersion: "106325"
  selfLink: /apis/operators.coreos.com/v1alpha1/namespaces/openshift-marketplace/catalogsources/community-operators
  uid: 6084e6b3-5d13-458d-9cca-a68f02ee36de
spec:
  displayName: Community Operators
  icon:
    base64data: ""
    mediatype: ""
  image: registry.redhat.io/redhat/community-operator-index:v4.6
  priority: -400
  publisher: Red Hat
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 10m0s
status:
  connectionState:
    address: '..svc:'
    lastConnect: "2021-11-24T11:56:04Z"
    lastObservedState: TRANSIENT_FAILURE
  latestImageRegistryPoll: "2021-11-24T11:52:03Z"
  registryService:
    createdAt: "2021-11-24T08:39:48Z"
    protocol: grpc
zhaoxia@xzha-mac JIRA-2196 % oc get packagemanifests | grep -i comm
zhaoxia@xzha-mac JIRA-2196 %
Version-Release number of selected component (if applicable):
zhaoxia@xzha-mac JIRA-2196 % oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2021-11-22-174225   True        False         23m     Cluster version is 4.6.0-0.nightly-2021-11-22-174225
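How reproducible:
Not always
Steps to Reproduce:
1. Upgrade from 4.5.41-x86_64 to 4.6.0-0.nightly-2021-11-22-174225.
2. Check the CatalogSource (oc get catsrc community-operators -o yaml).
Actual results:
address: '..svc:'
Expected results:
The address has the correct value, for example community-operators.openshift-marketplace.svc:50051.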
Additional info:
If I delete the pod community-operators-crxrv, the address is correct once the new pod is created.
Verify:
After upgrading to 4.10.0-0.nightly-2022-01-27-144113, the issue does not occur and the address is correct.
zhaoxia@xzha-mac ~ % oc get clusterversion
NAME      VERSION                                AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-01-27-144113   True        False         29m     Cluster version is 4.10.0-0.nightly-2022-01-27-144113
zhaoxia@xzha-mac ~ % oc get catsrc -A -o yaml| grep address
address: certified-operators.openshift-marketplace.svc:50051
address: community-operators.openshift-marketplace.svc:50051
address: qe-app-registry.openshift-marketplace.svc:50051
address: redhat-marketplace.openshift-marketplace.svc:50051
address: redhat-operators.openshift-marketplace.svc:50051
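The verification above boils down to checking that every address has the form `<service>.<namespace>.svc:<port>` with no empty component. A minimal Go sketch of that check follows, assuming service and namespace names never contain dots (Kubernetes service and namespace names are DNS labels). `addressLooksValid` is a hypothetical helper for illustration, not an OLM or `oc` function.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"strings"
)

// addressLooksValid is a hypothetical check that addr has the shape
// <service>.<namespace>.svc:<port>, with every component non-empty and a
// numeric port in 1-65535. It assumes neither name contains a dot.
func addressLooksValid(addr string) bool {
	host, port, err := net.SplitHostPort(addr)
	if err != nil {
		return false
	}
	// An empty or non-numeric port (as in "..svc:") fails here.
	if p, err := strconv.Atoi(port); err != nil || p < 1 || p > 65535 {
		return false
	}
	if !strings.HasSuffix(host, ".svc") {
		return false
	}
	parts := strings.Split(strings.TrimSuffix(host, ".svc"), ".")
	return len(parts) == 2 && parts[0] != "" && parts[1] != ""
}

func main() {
	addrs := []string{
		"..svc:", // value reported after the 4.5 -> 4.6 upgrade
		"community-operators.openshift-marketplace.svc:50051",
		"redhat-operators.openshift-marketplace.svc:50051",
	}
	for _, a := range addrs {
		fmt.Printf("%-55q valid=%v\n", a, addressLooksValid(a))
	}
}
```

Parsing with `net.SplitHostPort` rather than splitting on `:` by hand keeps the port handling strict; the dot-free-name assumption is what makes the two-part split of the host unambiguous.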
Checked the latest upgrade CI results; the issue does not occur.
LGTM, verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2022:0056