Bug 1728536

Summary: Cannot get all packagemanifest resources due to a 504 timeout
Product: OpenShift Container Platform
Reporter: Jian Zhang <jiazha>
Component: OLM
Assignee: Nick Hale <nhale>
OLM sub component: OLM
QA Contact: Jian Zhang <jiazha>
Status: CLOSED ERRATA
Severity: high
Priority: urgent
CC: bandrade, chezhang, chhu, chuo, ecordell, fdeutsch, jfan, kuiwang, nhale, piqin, rhallise, scolange, tbuskey, wjiang
Version: 4.1.z
Target Release: 4.1.z
Hardware: Unspecified
OS: Unspecified
Last Closed: 2020-03-12 21:23:46 UTC
Type: Bug
Bug Depends On: 1746406

Comment 1 chhu 2019-07-11 07:08:59 UTC
Hi, Evan

Could you please take a look? This bug is blocking our current testing.
The steps to reproduce are below:

1. Install OCP 4.1.4.
2. Add a new el7 (RHEL 7) worker node.
3. Create a project:
# oc create ns kubevirt-hyperconverged
# oc project kubevirt-hyperconverged

4. Log in to the web console, select the kubevirt-hyperconverged project,
  click Catalog -> Operator Management, then click Create Subscription.
  No error is shown.

5. Create an OperatorGroup
[root@hp-dl360g9-16 ~]# cat <<EOF | oc create -f -
> apiVersion: operators.coreos.com/v1alpha2
> kind: OperatorGroup
> metadata:
>   name: hco-operatorgroup
>   namespace: kubevirt-hyperconverged
> EOF
operatorgroup.operators.coreos.com/hco-operatorgroup created

6. Add the brew registry to insecureRegistries:
oc patch --type=merge --patch='{"spec":{"registrySources":{"insecureRegistries":["brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888"]}}}' image.config.openshift.io/cluster
image.config.openshift.io/cluster patched

Alternatively, edit /etc/containers/registries.conf and restart CRI-O:
[registries.insecure]
registries = ["brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888"]
# systemctl restart crio

7. Create a catalog source
[root@hp-dl360g9-16 ~]# cat <<EOF | oc create -f -
> apiVersion: operators.coreos.com/v1alpha1
> kind: CatalogSource
> metadata:
>   name: hco-catalogsource
>   namespace: openshift-operator-lifecycle-manager
>   imagePullPolicy: Always
> spec:
>   sourceType: grpc
>   image: brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/container-native-virtualization/hco-bundle-registry:v2.0.0-36
>   displayName: KubeVirt HyperConverged
>   publisher: Red Hat
> EOF
catalogsource.operators.coreos.com/hco-catalogsource created

8. Check the pods; hco-catalogsource-qzt4r is in the Running state (READY 0/1):

[root@hp-dl360g9-16 ~]# oc get pods -n openshift-operator-lifecycle-manager
NAME                               READY   STATUS    RESTARTS   AGE
catalog-operator-dc45db975-zsgk9   1/1     Running   0          10d
hco-catalogsource-qzt4r            0/1     Running   0          54s
......

9. Log in to the web console again, select the kubevirt-hyperconverged project,
  click Catalog -> Operator Management, then click Create Subscription.
  This time an error is shown (screenshot):
 https://user-images.githubusercontent.com/15416633/60945176-75f6d300-a31d-11e9-871b-242eb310538f.png


Regards,
Chenli Hu

Comment 2 Evan Cordell 2019-07-11 12:33:41 UTC
It looks like the hco catalogsource pod is not ready, right? Can you look at that pod or grab logs?

Even if that is the issue, the packageserver shouldn't return a 504 just because one catalog is bad. We will try to put together a small reproducer for this.
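Evan's diagnosis rests on the READY column in step 8 of Comment 1: "0/1" means the pod's single container is running but not yet reporting ready, even though STATUS says Running. A small filter can flag such pods from `oc get pods` output. This is a minimal sketch: `not_ready_pods` is a hypothetical helper name, not an `oc` feature, and it assumes the default table layout with READY as the second column.

```shell
#!/usr/bin/env bash
# not_ready_pods: hypothetical helper (not part of the oc CLI).
# Reads `oc get pods` table output on stdin and prints the name of every pod
# whose READY column (assumed to be column 2, e.g. "0/1") reports fewer ready
# containers than the pod has in total.
not_ready_pods() {
  awk 'NR > 1 {                  # skip the header row
    split($2, ready, "/")        # "0/1" -> ready[1]="0", ready[2]="1"
    if (ready[1] + 0 < ready[2] + 0) print $1
  }'
}
```

Usage: `oc get pods -n openshift-operator-lifecycle-manager | not_ready_pods`. Fed the output shown in step 8, it prints only `hco-catalogsource-qzt4r`; the `oc logs` and `oc describe pod` output for that pod is then the natural thing to attach, as Evan asks.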

Comment 3 Jian Zhang 2019-07-12 09:40:25 UTC
Evan,

> It looks like the hco catalogsource pod is not ready, right? Can you look at that pod or grab logs?

This issue also occurs after a RHEL 7 worker (location: Beijing) is added, even without doing anything else.
I have added the cluster info to the "Additional info:" section above for deeper debugging.

Comment 4 Fabian Deutsch 2019-07-15 09:21:29 UTC
Ryan, FYI

Comment 5 Vu Dinh 2019-07-30 15:16:29 UTC
Hey Chenli,

I have been trying to reproduce this issue on a 4.1.4 cluster created by cluster-bot, but so far I have been unable to.
I did two things differently from the steps you provided:

1. I didn't add a new el7 worker node, as the cluster already has multiple worker nodes.
2. I tried to use the brew image reference, but it didn't work due to a VPN access problem with my AWS cluster. Instead, I used the published "hco-bundle-registry" image from registry.redhat.io (registry.redhat.io/container-native-virtualization/hco-bundle-registry:v2.0.0).

The "hco-catalogsource" pod came up and ran as expected, and I saw no errors on my end. I was able to create a subscription for kubevirt-hyperconverged without any problems.

Thanks,
Vu

Comment 23 errata-xmlrpc 2020-03-12 21:23:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0691