1728536 – Cannot get all packagemanifest resources due to 504 timeout

Bug 1728536 - Cannot get all packagemanifest resources due to 504 timeout

Summary: Cannot get all packagemanifest resources due to 504 timeout

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	OLM
Sub Component:
Version:	4.1.z
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	high
Target Milestone:	---
Target Release:	4.1.z
Assignee:	Nick Hale
QA Contact:	Jian Zhang
Docs Contact:
URL:
Whiteboard:
Depends On:	1746406
Blocks:

Reported:	2019-07-10 06:40 UTC by Jian Zhang
Modified:	2020-03-12 21:23 UTC
CC List:	14 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1746406
Environment:
Last Closed:	2020-03-12 21:23:46 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2020:0691	0	None	None	None	2020-03-12 21:23:49 UTC

Comment 1 chhu 2019-07-11 07:08:59 UTC

Hi, Evan

Could you please take a look? This bug is blocking our current testing.
Please see the reproduction steps below:

1. Install ocp4.1.4
2. Add new el7 worker
3. Create project  
# oc create ns kubevirt-hyperconverged
# oc project kubevirt-hyperconverged

4. Log in to the web console, click the kubevirt-hyperconverged project,
  click Catalog -> Operator management, click create Subscription;
  no error is shown

5. Create an OperatorGroup
[root@hp-dl360g9-16 ~]# cat <<EOF | oc create -f -
> apiVersion: operators.coreos.com/v1alpha2
> kind: OperatorGroup
> metadata:
>   name: hco-operatorgroup
>   namespace: kubevirt-hyperconverged
> EOF
operatorgroup.operators.coreos.com/hco-operatorgroup created

6. Add patch for insecureRegistries
oc patch --type=merge --patch='{"spec":{"registrySources":{"insecureRegistries":["brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888"]}}}' image.config.openshift.io/cluster
image.config.openshift.io/cluster patched

Or edit /etc/containers/registries.conf:
[registries.insecure]
registries = ["brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888"]
# systemctl restart crio

7. Create a catalog source
[root@hp-dl360g9-16 ~]# cat <<EOF | oc create -f -
> apiVersion: operators.coreos.com/v1alpha1
> kind: CatalogSource
> metadata:
>   name: hco-catalogsource
>   namespace: openshift-operator-lifecycle-manager
>   imagePullPolicy: Always
> spec:
>   sourceType: grpc
>   image: brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/container-native-virtualization/hco-bundle-registry:v2.0.0-36
>   displayName: KubeVirt HyperConverged
>   publisher: Red Hat
> EOF
catalogsource.operators.coreos.com/hco-catalogsource created

8. Check that the pod hco-catalogsource-qzt4r is running:

[root@hp-dl360g9-16 ~]# oc get pods -n openshift-operator-lifecycle-manager
NAME                               READY   STATUS    RESTARTS   AGE
catalog-operator-dc45db975-zsgk9   1/1     Running   0          10d
hco-catalogsource-qzt4r            0/1     Running   0          54s
......

9. Log in to the web console, click the kubevirt-hyperconverged project,
  click Catalog -> Operator management, click create Subscription,
  and get an error:
 https://user-images.githubusercontent.com/15416633/60945176-75f6d300-a31d-11e9-871b-242eb310538f.png


Regards,
Chenli Hu
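
The readiness check from step 8 can be sketched as a small filter over the `oc get pods` table. This is a minimal sketch, not part of the original report: the function name `not_ready_pods` is hypothetical, and it assumes the default table layout shown in step 8, where the second column is READY in `ready/total` form.

```shell
#!/bin/sh
# Hypothetical helper: read `oc get pods` table output on stdin and print
# the name of every pod whose READY column shows fewer ready containers
# than the total (for example 0/1).
not_ready_pods() {
  awk 'NR > 1 && $2 ~ /^[0-9]+\/[0-9]+$/ {
         split($2, r, "/")
         if (r[1] != r[2]) print $1
       }'
}
```

It could be used as `oc get pods -n openshift-operator-lifecycle-manager | not_ready_pods`. Against the output in step 8 it would flag hco-catalogsource-qzt4r, which is Running but 0/1 ready.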

Comment 2 Evan Cordell 2019-07-11 12:33:41 UTC

It looks like the hco catalogsource pod is not ready, right? Can you look at that pod or grab logs?

Even if that is the issue the packageserver shouldn’t 504 just because one catalog is bad. We will try to make a small repro for this.
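
One way to gather what is asked for here is sketched below. It is an illustration only: the function names `inspect_pod` and `list_packagemanifests` are hypothetical, and the namespace and pod name must be supplied by the caller. Only standard `oc describe`, `oc logs`, and `oc get` invocations are used.

```shell
#!/bin/sh
# Hypothetical helper: show pod details (including recent events)
# and then the pod's logs. Usage: inspect_pod <namespace> <pod>
inspect_pod() {
  ns="$1"
  pod="$2"
  oc describe pod "$pod" -n "$ns"
  oc logs "$pod" -n "$ns"
}

# Hypothetical helper: list packagemanifests across all namespaces to see
# whether the packageserver still answers while one catalog is unhealthy.
list_packagemanifests() {
  oc get packagemanifests --all-namespaces
}
```

For the pod from comment 1 this would be `inspect_pod openshift-operator-lifecycle-manager hco-catalogsource-qzt4r`, followed by `list_packagemanifests`.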

Comment 3 Jian Zhang 2019-07-12 09:40:25 UTC

Evan,

> It looks like the hco catalogsource pod is not ready, right? Can you look at that pod or grab logs?

This issue also occurs after a RHEL 7 worker (location: Beijing) is added, even if nothing else is done.
I have added the cluster info to the "Additional info:" section above for further debugging.

Comment 4 Fabian Deutsch 2019-07-15 09:21:29 UTC

Ryan, FYI

Comment 5 Vu Dinh 2019-07-30 15:16:29 UTC

Hey Chenli,

I have been trying to reproduce this issue with a 4.1.4 cluster created by cluster-bot. Unfortunately, I have been unable to reproduce it so far.
There are two things that I have done differently from the steps that you provided:

1. I didn't add a new el7 worker node, as the cluster already has multiple worker nodes.
2. I tried to add the brew image reference, but it didn't work due to a VPN access problem with my AWS cluster. Instead, I used the actual "hco-bundle-registry" image from registry.redhat.io (registry.redhat.io/container-native-virtualization/hco-bundle-registry:v2.0.0).

In my experience, the "hco-catalogsource" pod came up and ran as expected, and I noticed no errors on my end. I was able to create a subscription for kubevirt-hyperconverged just fine.

Thanks,
Vu
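
The only difference in the catalog source between the two attempts is the image reference, so the manifest can be parameterized on it. The sketch below is an assumption about how one might do that, not the commands actually run: `render_catalogsource` is a hypothetical name, the other fields are copied from step 7 in comment 1, and the `imagePullPolicy` line from step 7 is left out because it is not a metadata field.

```shell
#!/bin/sh
# Hypothetical helper: print a CatalogSource manifest for the given
# registry image. Usage: render_catalogsource <image>
render_catalogsource() {
  image="$1"
  cat <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: hco-catalogsource
  namespace: openshift-operator-lifecycle-manager
spec:
  sourceType: grpc
  image: ${image}
  displayName: KubeVirt HyperConverged
  publisher: Red Hat
EOF
}
```

The output can be piped to `oc create -f -`, for example `render_catalogsource registry.redhat.io/container-native-virtualization/hco-bundle-registry:v2.0.0 | oc create -f -`.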

Comment 23 errata-xmlrpc 2020-03-12 21:23:46 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0691
