Bug 1878246 - [WebScale] Disconnected cluster still reaches out to the internet
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Documentation
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Brandi Munilla
QA Contact: Amit Ugol
Docs Contact: Vikram Goyal
URL:
Whiteboard:
Depends On:
Blocks: 1878842
 
Reported: 2020-09-11 17:24 UTC by Jason Huddleston
Modified: 2023-12-15 19:18 UTC
CC List: 14 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Cloned to: 1878842
Environment:
Last Closed: 2021-03-22 19:33:17 UTC
Target Upstream Version:
Embargoed:



Description Jason Huddleston 2020-09-11 17:24:13 UTC
Description of problem:

If a cluster is deployed with disconnected options in an environment where it still has access to the internet, it will pull content from registry.redhat.io and quay.io. If an internal registry is defined, the cluster should not reach out to other sources.
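
For reference, "deployed with disconnected options" here means an install-config.yaml whose imageContentSources point at the internal registry, roughly like the following sketch (the mirror hostname and repository paths are placeholders, not values from this cluster):

imageContentSources:
- mirrors:
  - mirror.example.com:5000/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-release
- mirrors:
  - mirror.example.com:5000/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-v4.0-art-dev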

Version-Release number of the following components:

$ oc version
Client Version: 4.4.5
Server Version: 4.4.5
Kubernetes Version: v1.17.1


How reproducible:

Steps to Reproduce:
1. Install a cluster pointed to local image repo ( Artifactory )
2. If cluster has access to the internet you will see the full operator catalog show up and all of the sample applications


Actual results:
The full operator catalog and all of the sample applications show up, pulled from registry.redhat.io and quay.io, even though a local registry is configured.

Expected results:

Cluster should not reach out to the internet if local sources are provided. 

Additional info:

Comment 1 W. Trevor King 2020-09-11 20:07:10 UTC
Seems unlikely to me that this is an installer issue.  It might be an OLM issue?  A must-gather would be useful for debugging [1], but I don't see one in the associated case.

[1]: https://docs.openshift.com/container-platform/4.1/support/gathering-cluster-data.html#support_gathering_data_gathering-cluster-data

Comment 2 Abhinav Dahiya 2020-09-11 20:46:18 UTC
> 2. If cluster has access to the internet you will see the full operator catalog show up and all of the sample applications

Seems like a samples operator bug.

Comment 3 Gabe Montero 2020-09-11 21:29:07 UTC
@Abhinav @Trevor

In the past I've asked how CVO managed operators could determine that a cluster is "deployed with disconnected options" and was told there was no explicit way to determine that.

Hence, yes, by design, the samples operator queries registry.redhat.io to see whether the cluster is isolated or not, hence "disconnected".

Please elaborate on what the samples operator should be looking for, from a cluster deployment perspective, to determine whether the cluster is meant to be disconnected.

I would love to start to leverage that.

Adam / Ben FYI

Comment 4 W. Trevor King 2020-09-11 21:52:02 UTC
> Install a cluster pointed to local image repo ( Artifactory )

Can we get more detail on how this was implemented?  Samples sets up ImageStreams backed by two types of images:

a. openshift/cli [1] and similar, which are backed by images referenced by the release image [2].  Those images will be mirrored into the local repository when you run 'oc adm release mirror ...' (see the sketch after this list), which will suggest imageContentSources to use so that CRI-O and other pullers will fetch them from the local registry instead of the canonical registry [3].
b. golang [4] and similar, which are not referenced from the release image and thus not mirrored by 'oc adm release mirror ...'.  If you want to avoid the samples operator trying to hit the canonical registry for (b), you should mirror those samples into your local registry and make sure imageContentSources also includes the entries required for this [5].  Or fill in skippedImagestreams or other samples configuration [6].
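
A minimal sketch of the mirroring step for (a), assuming a placeholder mirror host and an example 4.5 release version (neither is taken from this report):

$ oc adm release mirror \
    --from=quay.io/openshift-release-dev/ocp-release:4.5.9-x86_64 \
    --to=mirror.example.com:5000/ocp4/openshift4 \
    --to-release-image=mirror.example.com:5000/ocp4/openshift4:4.5.9-x86_64

The command prints, among other things, the suggested imageContentSources stanza to paste into install-config.yaml, which is what [3] describes.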

I don't see a need for the samples operator to automatically detect and react to a restricted-network state.

[1]: https://github.com/openshift/cluster-samples-operator/blob/14d1ed194c97a5d1c28b9f5a530252c404a0e1a5/manifests/08-openshift-imagestreams.yaml#L4-L16
[2]: https://github.com/openshift/cluster-samples-operator/blob/14d1ed194c97a5d1c28b9f5a530252c404a0e1a5/manifests/image-references#L5-L8
[3]: https://docs.openshift.com/container-platform/4.5/installing/installing_bare_metal/installing-restricted-networks-bare-metal.html#installation-bare-metal-config-yaml_installing-restricted-networks-bare-metal
[4]: https://github.com/openshift/cluster-samples-operator/blob/14d1ed194c97a5d1c28b9f5a530252c404a0e1a5/assets/operator/ocp-x86_64/golang/imagestreams/golang-rhel.json#L50
[5]: https://docs.openshift.com/container-platform/4.5/openshift_images/samples-operator-alt-registry.html
[6]: https://docs.openshift.com/container-platform/4.5/openshift_images/configuring-samples-operator.html#samples-operator-configuration_configuring-samples-operator

Comment 5 Ben Parees 2020-09-11 22:12:40 UTC
> I don't see a need for the samples operator to automatically detect and react to a restricted-network state.

The intent is so the samples operator can avoid firing off alerts about failing imports (or creating imagestreams that will fail to import and confuse users about why they are broken) when there is no reason to expect the imports could possibly succeed.


As an aside, this probably needs to be split into two bugs, one for the samples operator and one for the OLM catalog, though as Gabe alluded to, unless there is an explicit config mechanism for a component to determine "this cluster is supposed to be disconnected, don't try to reach out to the internet even if it's accessible", they will both likely get closed as "can't fix".  The presence of an ICSP that refs registry.redhat.io is not sufficient to make that determination (you might just have that for performance or redundancy/resiliency reasons).
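
For readers unfamiliar with the ICSP being referred to, a minimal sketch follows; the mirror host and repository are placeholders, not from this case:

apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  name: samples-mirror
spec:
  repositoryDigestMirrors:
  - mirrors:
    - mirror.example.com:5000/rhscl
    source: registry.redhat.io/rhscl

Nothing about such an object says "never go to the source registry", which is why its presence alone cannot signal disconnected intent.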

Comment 6 W. Trevor King 2020-09-11 22:50:07 UTC
> The intent is so the samples operator can avoid firing off alerts about failing imports (or creating imagestreams that will fail to import and confuse users about why they are broken) when there is no reason to expect the imports could possibly succeed.

Isn't the solution to "$IMPORT will never succeed" adding it to skippedImagestreams or setting managementState=Removed?

Comment 7 Ben Parees 2020-09-11 22:59:25 UTC
> Isn't the solution to "$IMPORT will never succeed" adding it to skippedImagestreams or setting managementState=Removed?

Yes, which is what the samples operator does for you automatically (sets itself to Removed), so that each disconnected admin doesn't have to run into this problem themselves and address it on day 2.
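
For completeness, the day-2 equivalents of that automatic behavior are patches on the samples Config; a sketch, with example imagestream names that are not specific to this case:

$ oc patch configs.samples.operator.openshift.io cluster --type merge \
    --patch '{"spec":{"skippedImagestreams":["golang","nodejs"]}}'

$ oc patch configs.samples.operator.openshift.io cluster --type merge \
    --patch '{"spec":{"managementState":"Removed"}}'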

Comment 8 Ben Parees 2020-09-11 23:01:22 UTC
> Isn't the solution to "$IMPORT will never succeed" adding it to skippedImagestreams or setting managementState=Removed?

Regardless, the samples operator *not* marking itself Removed when it thinks we're disconnected (on the assumption you'd just make the admin do it) would not address this customer's issue, which is that they want the samples operator to try *harder* to mark itself Removed (or otherwise not import things) when the cluster should act disconnected even when it is not disconnected.

Comment 9 W. Trevor King 2020-09-11 23:11:44 UTC
> ...on the assumption you'd just make the admin do it...

Yes, I think this is "admins who are ok not having particular samples should explicitly tell the samples operator that".  If the operator extrapolates from "I can't reach these images today" to decide "I'll never reach these images", then it could misconstrue a temporary network hiccup and never recover.

> ...the cluster should act disconnected even when it is not disconnected...

And you can do this by setting skippedImagestreams or managementState=Removed, or by configuring ICSPs.  Although in the ICSP case, the cluster might occasionally fall back to the canonical registry if an attempt to reach the mirrors hiccups for some reason.  If that's a concern... I guess we'd need to either blackhole the canonical registries or talk to the CRI-O folks about a mirror setup where you explicitly block access to the canonical registries instead of just putting them at the back of the fallback chain.
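
One existing mechanism for explicitly blocking pulls from the canonical registries is the cluster image config's registrySources.blockedRegistries; a sketch only, offered as an illustration rather than the agreed fix for this bug, and it blocks container-runtime pulls cluster-wide, so it assumes everything still needed has been mirrored:

apiVersion: config.openshift.io/v1
kind: Image
metadata:
  name: cluster
spec:
  registrySources:
    blockedRegistries:
    - registry.redhat.io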

Comment 16 Gabe Montero 2020-09-14 16:04:46 UTC
OK, I was able to use the support documented in https://docs.openshift.com/container-platform/4.5/installing/install_config/installing-customizing.html
to force the samples operator to bootstrap as Removed.

I added a YAML file to the openshift directory produced by 'openshift-install create manifests' with the following content:

apiVersion: samples.operator.openshift.io/v1
kind: Config
metadata:
  name: cluster
spec:
  architectures:
  - x86_64
  managementState: Removed


This is what is required for users who do not want the default samples installed when a cluster is deployed with disconnected options in an environment where it has access to the internet.
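
In other words, the flow is roughly as follows; the directory and file names are arbitrary placeholders:

$ openshift-install create manifests --dir=<installation_directory>
$ cp cluster-samples-config.yaml <installation_directory>/openshift/
# then continue the install as usual, e.g. 'openshift-install create cluster'
# or 'create ignition-configs', depending on the install method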

I'm sending this bugzilla over to our Docs team to update the disconnected-related docs, cross-referencing the install and samples docs as needed around the scenario in question, to note this fact.

Comment 17 Gabe Montero 2020-09-14 16:18:23 UTC
Ben reminded me there is also an OLM catalog piece to this.

Will clone this bug and send it to that team for review.
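
For reference, the usual knob on the OLM side in restricted networks is disabling the default catalog sources; a sketch of the documented restricted-network approach, not necessarily what the cloned bug will conclude:

$ oc patch OperatorHub cluster --type json \
    -p '[{"op": "add", "path": "/spec/disableAllDefaultSources", "value": true}]'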

Comment 19 Brandi Munilla 2020-10-26 17:22:43 UTC
We updated the related content here: https://github.com/openshift/openshift-docs/pull/26134

The PR went through SME and QE review before merging, so I'm updating the status to RELEASE_PENDING.

Link to updated documentation:
https://docs.openshift.com/container-platform/4.5/openshift_images/configuring-samples-operator.html#samples-operator-restricted-network-install-with-access

Please let me know if additional documentation updates are required. Thank you!

Comment 21 Red Hat Bugzilla 2023-09-15 01:30:33 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days

