1973582 – [upgrade from 4.5 to 4.6] .status.connectionState.address of catsrc certified-operators is not correct

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	OLM
Sub Component:
Version:	4.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	4.9.0
Assignee:	Alexander Greene
QA Contact:	xzha
Docs Contact:
URL:
Whiteboard:
Depends On:	1967621
Blocks:

Reported:	2021-06-18 08:23 UTC by xzha
Modified:	2021-10-18 17:35 UTC (History)
CC List:	5 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Cause: When updating a Catalog Source, a Get call is immediately followed by a Delete call on a number of resources related to the Catalog Source. Consequence: In some instances, a resource has already been deleted from the cluster but still exists in the cache. The Get call therefore succeeds, but the following Delete call fails because the resource does not exist on the cluster. As a result, the catalog address is not updated to the new source. Fix: OLM now ignores the error returned by the Delete call if the resource is not found. Result: OLM no longer reports an error when updating a catalog because of a caching issue that causes the Delete call to return a "Resource Not Found" error.
Clone Of:
Environment:
Last Closed:	2021-10-18 17:35:39 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
openshift-operator-lifecycle-manager log (7.81 MB, application/x-tar) 2021-06-18 08:23 UTC, xzha	no flags	Details

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2021:3759	0	None	None	None	2021-10-18 17:35:41 UTC

Description xzha 2021-06-18 08:23:10 UTC

Created attachment 1792003 [details]
openshift-operator-lifecycle-manager log

Description of problem:
After upgrading from 4.5.40-x86_64 to 4.6.35-x86_64, the .status.connectionState.address of catsrc certified-operators is not correct: it is '..svc:', which cannot be resolved.

[root@preserve-olm-agent-test ~]# oc get catsrc certified-operators -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  annotations:
    operatorframework.io/managed-by: marketplace-operator
  creationTimestamp: "2021-06-18T01:08:51Z"
  generation: 2
  labels:
    olm-visibility: hidden
    openshift-marketplace: "true"
    opsrc-datastore: "true"
    opsrc-provider: certified
  name: certified-operators
  namespace: openshift-marketplace
  resourceVersion: "211535"
  selfLink: /apis/operators.coreos.com/v1alpha1/namespaces/openshift-marketplace/catalogsources/certified-operators
  uid: 92c1031b-245b-4292-92fc-d958019fc1c5
spec:
  displayName: Certified Operators
  icon:
    base64data: ""
    mediatype: ""
  image: registry.redhat.io/redhat/certified-operator-index:v4.6
  priority: -200
  publisher: Red Hat
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 10m0s
status:
  connectionState:
    address: '..svc:'
    lastConnect: "2021-06-18T08:01:33Z"
    lastObservedState: TRANSIENT_FAILURE
  latestImageRegistryPoll: "2021-06-18T07:53:29Z"
  registryService:
    createdAt: "2021-06-18T01:08:52Z"
    protocol: grpc


time="2021-06-18T04:12:45Z" level=error msg="failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp: lookup ..svc: no such host\"" catalog="{certified-operators openshift-marketplace}"
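
The '..svc:' address is what one would get if the address were assembled as <serviceName>.<serviceNamespace>.svc:<port> from a registryService status whose name, namespace, and port fields are all empty, as in the YAML above. The sketch below illustrates that assumption only. The registryService type and address method are hypothetical stand-ins, not OLM's actual code, and port 50051 is an illustrative value.

```go
package main

import "fmt"

// registryService mirrors only the fields assumed to feed the address.
// In the YAML above, none of these fields are set.
type registryService struct {
	ServiceName      string
	ServiceNamespace string
	Port             string
}

// address joins the fields in the assumed <name>.<namespace>.svc:<port> form.
func (s registryService) address() string {
	return fmt.Sprintf("%s.%s.svc:%s", s.ServiceName, s.ServiceNamespace, s.Port)
}

func main() {
	// All fields empty, as in the reported status.
	fmt.Println(registryService{}.address()) // prints ..svc:

	// Fields populated (illustrative values).
	healthy := registryService{"certified-operators", "openshift-marketplace", "50051"}
	fmt.Println(healthy.address()) // prints certified-operators.openshift-marketplace.svc:50051
}
```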


Version-Release number of selected component (if applicable):
upgrade from 4.5.40-x86_64 to 4.6.35-x86_64
[root@preserve-olm-agent-test ~]# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.35    True        False         4h30m   Cluster version is 4.6.35

How reproducible:
Not always (intermittent)

Steps to Reproduce:
1. Upgrade from 4.5.40-x86_64 to 4.6.35-x86_64.

Actual results:
.status.connectionState.address of catsrc certified-operators is not correct

Expected results:
.status.connectionState.address of catsrc certified-operators is correct

Additional info:
The log from the openshift-operator-lifecycle-manager namespace is attached.

Comment 5 Kevin Rizza 2021-08-09 17:49:20 UTC

Looks like https://bugzilla.redhat.com/show_bug.cgi?id=1967621 was resolved. We believe this is likely the same issue. Can QE confirm and, if so, mark this one as a duplicate?

Comment 7 xzha 2021-08-11 09:02:01 UTC

Checked the upgrade CI results; they look good so far, and the issue was not found on 4.9 releases. Marking as verified.

Test case "[upgrade] Check the marketplace status" passes.

LGTM, verified.

Comment 9 W. Trevor King 2021-09-30 17:25:34 UTC

(In reply to Kevin Rizza from comment #5)
> Looks like https://bugzilla.redhat.com/show_bug.cgi?id=1967621 was resolved.
> We believe this is likely the same issue. Can QE confirm and, if so, mark
> this one as a duplicate?

(In reply to xzha from comment #7)
> LGTM, verified.

Do we really want both this bug and bug 1967621 in the 4.9 errata?  I thought the confirmation from comment 7 would lead to this being closed as a dup, per the request in comment 5.

Comment 10 xzha 2021-10-08 01:42:08 UTC

Hi,
There is no need to include both this bug and bug 1967621 in the 4.9 errata; including only bug 1967621 is fine.

Comment 12 errata-xmlrpc 2021-10-18 17:35:39 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759
