Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2115814

Summary: Issues with samples in a disconnected cluster in OCP 4.9
Product: OpenShift Container Platform
Reporter: Andy Bartlett <andbartl>
Component: Dev Console
Assignee: Christoph Jerolimov <cjerolim>
Status: CLOSED ERRATA
QA Contact: spathak <spathak>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.9
CC: cjerolim, nmukherj
Target Milestone: ---
Target Release: 4.12.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-01-17 19:54:29 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Andy Bartlett 2022-08-05 13:02:21 UTC
Description of problem:

Hi,
 My customer has reported the following issue:

We upgraded from 4.9.33 to 4.10.22 on our development and test clusters. We are now able to see the samples after a loading screen of approximately 30 seconds. However, we noticed that these samples use code from GitHub. We do sync the images, but since the sample code is hosted on a remote source, we cannot use the samples on our clusters. Does this mean that the samples are not usable on our clusters?

Version-Release number of selected component (if applicable):

OCP 4.9.33

How reproducible:
100%


Comment 1 Christoph Jerolimov 2022-08-05 15:44:14 UTC
@andbartl, this happens after updating to 4.10, right? If so, I would like to update the affected version to 4.10.


My guess (we still need to verify this) is that the catalog UI waits until all network calls have finished, successfully or not, before it shows the samples.

Unfortunately, the network call for the devfile samples doesn't fail immediately (why not?). It takes "exactly" 30 seconds until the call times out, and only then do we show the result, which in this case means the other samples.

What we can do:
1. Reproduce this on a fully disconnected cluster. (I tested this today on a disconnected cluster and learned that, in that setup, this means a disconnected cluster with a proxy :))
2. Check whether any parameter tells us that we're running on a fully disconnected cluster, so that we can skip fetching Devfile samples (from the official devfile registry) in that case.
3. Update the developer catalog so that it shows results after n seconds (n roughly 2-5 seconds?) even if other network calls are still pending. Waiting until either all network calls finish or these n seconds elapse makes sense, so that the UI doesn't flicker when everything responds within 100-500 ms.

Comment 5 Christoph Jerolimov 2022-10-26 22:44:57 UTC
Hi Andy (andbartl),

TL;DR: The Samples page has the same issue as the developer catalog. Both show Devfiles, and we had timeout issues with them on disconnected clusters. We have implemented and backported two fixes for Samples.

1. Proxy support, so that Devfiles can be loaded on a disconnected cluster. (I still need to check whether the import works as well. I will follow up on this ASAP.)
2. The developer catalog shows all loaded items after 3 seconds, regardless of any network call that takes longer.

Both changes are available in releases 4.12.0, 4.11.8, and 4.10.37.

Proxy support is also available in 4.9.50. The UI fix that shows the other items after 3 seconds is in our merge queue and should be part of the next 4.9 release.

Additional fixes that are not released yet:

3. A reduction of the Devfile API timeout from 30 to 10 seconds is in code review.
4. We implemented a reduced timeout when loading Helm charts for the Developer catalog.

Let me know if you need more details. I will try to update this ticket from time to time until all PRs are merged.


========================================================================

Here is a full overview of all related issues. (October 27th)

========================================================================

## 1. Developer catalog fails to load => Proxy support added when loading Devfiles

Old versions of the Devfile API ignore the proxy configuration on a disconnected cluster. The new version uses the proxy configuration correctly. This doesn't help fully disconnected clusters: with this fix alone, the API calls still time out after 30 seconds. (See the next two fixes!)

- 4.12.0  / https://bugzilla.redhat.com/show_bug.cgi?id=2112812 / https://github.com/openshift/console/pull/12011
- 4.11.5  / https://issues.redhat.com/browse/OCPBUGS-1030       / https://github.com/openshift/console/pull/12028
- 4.10.35 / https://issues.redhat.com/browse/OCPBUGS-1634       / https://github.com/openshift/console/pull/12040
- 4.9.50  / https://issues.redhat.com/browse/OCPBUGS-1635       / https://github.com/openshift/console/pull/12041
- 4.8     / doesn't load devfiles from the devfile registry, so no update is needed

========================================================================

## 2. Show already loaded catalog items after a timeout (3sec)

The first issue was that the Developer catalog and the Samples catalog waited 30 seconds (until the Devfile network call timed out) before showing anything. This was a frontend issue, which we fixed: after 3 seconds we now show everything that has loaded by then. It still takes 30 seconds until the error is shown, at least until the timeout reduction in the next fix is merged.

- 4.12.0  / https://issues.redhat.com/browse/OCPBUGS-270  / https://github.com/openshift/console/pull/12019
- 4.11.8  / https://issues.redhat.com/browse/OCPBUGS-1523 / https://github.com/openshift/console/pull/12070
- 4.10.37 / https://issues.redhat.com/browse/OCPBUGS-1759 / https://github.com/openshift/console/pull/12106
- 4.9.?   / https://issues.redhat.com/browse/OCPBUGS-2008 / https://github.com/openshift/console/pull/12136 / in merge queue
- 4.8.?   / planned when 4.9 is merged
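
To illustrate the idea behind this fix, here is a minimal TypeScript sketch. It is not the actual console code: `loadCatalog`, `CatalogItem`, and `CatalogSnapshot` are hypothetical names, and the deadline is a parameter (the fix above uses 3 seconds). The function returns as soon as all sources have settled or the deadline has passed, whichever comes first, so a fast cluster is not delayed and a slow source cannot block the rest.

```typescript
// Hypothetical types and names for illustration only.
type CatalogItem = { name: string; source: string };
type CatalogSnapshot = { items: CatalogItem[]; pending: string[] };

// Resolves when every source has settled or after `deadlineMs`,
// whichever happens first. Sources that have not settled by then
// are listed in `pending`; failed sources contribute no items.
function loadCatalog(
  sources: Record<string, Promise<CatalogItem[]>>,
  deadlineMs: number,
): Promise<CatalogSnapshot> {
  const items: CatalogItem[] = [];
  let pending = Object.keys(sources);
  const markDone = (name: string): void => {
    pending = pending.filter((other) => other !== name);
  };

  const allSettled = Promise.all(
    Object.keys(sources).map((name) =>
      sources[name].then(
        (loaded) => {
          items.push(...loaded);
          markDone(name);
        },
        () => markDone(name),
      ),
    ),
  );

  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<void>((resolve) => {
    timer = setTimeout(resolve, deadlineMs);
  });

  return Promise.race([allSettled, deadline]).then(() => {
    // Stop the timer if the sources won the race.
    clearTimeout(timer);
    return { items: items.slice(), pending: pending.slice() };
  });
}
```

A caller would render `items` right away and could show a loading hint or, later, an error for the sources still listed in `pending`.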

========================================================================

## 3. Developer catalog fails to load => Reduce Devfile timeout

On fully disconnected clusters, the API call to the devfile registry takes up to 30 seconds. The devfile registry calls now use a reduced timeout of 10 seconds. Whatever delays the network call, this helps the UI show an error earlier.

- 4.12.0 / https://issues.redhat.com/browse/OCPBUGS-1106 / https://github.com/openshift/console/pull/12043 / needs validation
- 4.11.? / https://issues.redhat.com/browse/OCPBUGS-2716 / https://github.com/openshift/console/pull/12186 / in code review
- 4.10.? / https://issues.redhat.com/browse/OCPBUGS-2717 / https://github.com/openshift/console/pull/12191 / in code review
- 4.9.?  / https://issues.redhat.com/browse/OCPBUGS-2718 / https://github.com/openshift/console/pull/12192 / in code review
- 4.8     / doesn't load devfiles from the devfile registry, so no update is needed
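
A shorter request timeout could be sketched like this in TypeScript. This is an illustration under assumptions, not the console's implementation: `withTimeout` is a hypothetical helper. It uses the standard `AbortController` API so that the wrapped call is actually cancelled once the timeout passes, provided the call honours the signal.

```typescript
// Hypothetical helper, not part of the OpenShift console code base.
// Runs `work` with an AbortSignal and aborts that signal after `timeoutMs`.
// `work` must react to the signal (as fetch does) for the timeout to take effect.
function withTimeout<T>(
  work: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  return work(controller.signal).then(
    (value) => {
      clearTimeout(timer);
      return value;
    },
    (error) => {
      clearTimeout(timer);
      throw error;
    },
  );
}
```

With `fetch`, a call could look like `withTimeout((signal) => fetch(url, { signal }), 10_000)`, where `url` stands for whatever registry endpoint applies. The request is then rejected after about 10 seconds if no response has arrived.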

========================================================================

## 4. No Helm chart could be loaded if one chart repository timed out (timeout per chart repository reduced to 5 seconds)

- 4.12.0   / https://issues.redhat.com/browse/OCPBUGS-803  / https://github.com/openshift/console/pull/12096
- 4.11.8   / https://issues.redhat.com/browse/OCPBUGS-1782 / https://github.com/openshift/console/pull/12107

(internal follow up)
- 4.12.0   / https://issues.redhat.com/browse/OCPBUGS-2344 / https://github.com/openshift/console/pull/12141
- 4.11.8   / https://issues.redhat.com/browse/OCPBUGS-2515 / https://github.com/openshift/console/pull/12182

UI change to show alerts when some chart repositories could not be fetched
- 4.12.0   / https://issues.redhat.com/browse/OCPBUGS-1959 / https://github.com/openshift/console/pull/12200

- 4.8-4.10 / tbd.
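
The per-repository timeout can be sketched as follows in TypeScript. This is an illustration only, not the code from the PRs above: `withDeadline`, `loadCharts`, and `RepoResult` are hypothetical names, and the timeout is a parameter (the fix uses 5 seconds). Each repository gets its own deadline, and failures are collected instead of rejecting the whole load, so one unreachable repository no longer hides the charts of the others.

```typescript
// Hypothetical shapes for illustration only.
type RepoResult =
  | { repo: string; ok: true; charts: string[] }
  | { repo: string; ok: false; error: string };

// Rejects if `promise` has not settled within `ms` milliseconds.
// The underlying work is not cancelled; its late result is ignored.
function withDeadline<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`timed out after ${ms} ms`)),
      ms,
    );
    promise.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}

// Loads all repositories in parallel. The returned promise never rejects:
// every repository yields either its charts or an error message.
function loadCharts(
  repos: Record<string, () => Promise<string[]>>,
  perRepoTimeoutMs: number,
): Promise<RepoResult[]> {
  return Promise.all(
    Object.keys(repos).map((repo) =>
      withDeadline(repos[repo](), perRepoTimeoutMs).then(
        (charts): RepoResult => ({ repo, ok: true, charts }),
        (error): RepoResult => ({
          repo,
          ok: false,
          error: error instanceof Error ? error.message : String(error),
        }),
      ),
    ),
  );
}
```

The entries with `ok: false` are what a UI could turn into alerts about chart repositories that could not be fetched.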

========================================================================

Comment 6 Christoph Jerolimov 2022-10-26 22:48:01 UTC
I set this to in progress and recommend closing it once the 4.9 PR https://github.com/openshift/console/pull/12136 is merged.

Comment 8 Christoph Jerolimov 2022-11-25 10:12:46 UTC
Hi @andbartl

We have already backported most of the PRs. In particular, "Show already loaded catalog items after a timeout (3sec)" is now released as part of 4.9.52. The 4.8 backport is still in progress.

I'm closing this. For 4.8, you can follow this ticket: https://issues.redhat.com/browse/OCPBUGS-4120

Comment 11 spathak@redhat.com 2022-11-30 14:47:52 UTC
Verified on a fully disconnected cluster with version: 4.9.0-0.nightly-2022-11-30-072039
Browser version: Chrome 106

Comment 12 Christoph Jerolimov 2022-12-19 10:23:04 UTC
Changed Doc Type to "No Doc Update" because, from the customer's perspective, this issue is very similar to https://issues.redhat.com/browse/OCPBUGS-270

We kept two bugs because we improved both the backend and the frontend.

Comment 14 errata-xmlrpc 2023-01-17 19:54:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399