Description of problem:
If a cluster is deployed with disconnected options in an environment where it has access to the internet, it still pulls content from registry.redhat.io and quay.io. If an internal registry is defined, the cluster should not reach out to other sources.

Version-Release number of the following components:
$ oc version
Client Version: 4.4.5
Server Version: 4.4.5
Kubernetes Version: v1.17.1

Steps to Reproduce:
1. Install a cluster pointed to a local image repo (Artifactory).
2. If the cluster has access to the internet, the full operator catalog shows up, along with all of the sample applications.

Expected results:
The cluster should not reach out to the internet if local sources are provided.
Seems unlikely to me that this is an installer issue. It might be an OLM issue? A must-gather would be useful for debugging [1], but I don't see one in the associated case.

[1]: https://docs.openshift.com/container-platform/4.1/support/gathering-cluster-data.html#support_gathering_data_gathering-cluster-data
> 2. If cluster has access to the internet you will see the full operator catalog show up and all of the sample applications

Seems like a samples operator bug.
@Abhinav @Trevor In the past I've asked how CVO-managed operators could determine that a cluster is "deployed with disconnected options", and was told there was no explicit way to determine that. So yes, by design, the samples operator queries registry.redhat.io to see whether the cluster is isolated, i.e. "disconnected". Please elaborate on what the samples operator should look for, from a cluster deployment perspective, to tell whether the cluster is meant to be disconnected. I would love to start leveraging that. Adam / Ben FYI
> Install a cluster pointed to local image repo ( Artifactory )

Can we get more detail on how this was implemented? Samples sets up ImageStreams backed by two types of images:

a. openshift/cli [1] and similar, which are backed by images referenced by the release image [2]. Those images are mirrored into the local repository when you run 'oc adm release mirror ...', which suggests the imageContentSources to use so that CRI-O and other pullers fetch them from the local registry instead of the canonical registry [3].
b. golang [4] and similar, which are not referenced from the release image and thus are not mirrored by 'oc adm release mirror ...'.

If you want to avoid samples trying to hit the canonical registry for (b), you should mirror those samples into your local registry and make sure imageContentSources also includes the entries required for this [5]. Or fill in skippedImagestreams or other samples configuration [6]. I don't see a need for the samples operator to automatically detect and react to a restricted-network state.

[1]: https://github.com/openshift/cluster-samples-operator/blob/14d1ed194c97a5d1c28b9f5a530252c404a0e1a5/manifests/08-openshift-imagestreams.yaml#L4-L16
[2]: https://github.com/openshift/cluster-samples-operator/blob/14d1ed194c97a5d1c28b9f5a530252c404a0e1a5/manifests/image-references#L5-L8
[3]: https://docs.openshift.com/container-platform/4.5/installing/installing_bare_metal/installing-restricted-networks-bare-metal.html#installation-bare-metal-config-yaml_installing-restricted-networks-bare-metal
[4]: https://github.com/openshift/cluster-samples-operator/blob/14d1ed194c97a5d1c28b9f5a530252c404a0e1a5/assets/operator/ocp-x86_64/golang/imagestreams/golang-rhel.json#L50
[5]: https://docs.openshift.com/container-platform/4.5/openshift_images/samples-operator-alt-registry.html
[6]: https://docs.openshift.com/container-platform/4.5/openshift_images/configuring-samples-operator.html#samples-operator-configuration_configuring-samples-operator
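For illustration only, here is the rough shape of an install-config.yaml imageContentSources stanza covering both (a) and (b). The mirror host artifactory.example.com and all repository paths are hypothetical placeholders. The registry.redhat.io/rhscl entry is just one example of a (b)-style samples repository; which repositories actually need entries depends on the ImageStreams you keep.

```yaml
# install-config.yaml (excerpt). Hostnames and paths are hypothetical.
imageContentSources:
# (a) Release payload images, e.g. the image behind openshift/cli.
#     'oc adm release mirror' prints entries of this shape.
- mirrors:
  - artifactory.example.com/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-release
- mirrors:
  - artifactory.example.com/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
# (b) Example only: samples content outside the release image.
#     It must be mirrored separately; 'oc adm release mirror' does not copy it.
- mirrors:
  - artifactory.example.com/ocp4/rhscl
  source: registry.redhat.io/rhscl
```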
> I don't see a need for the samples operator to automatically detect and react to a restricted-network state.

The intent is to let the samples operator avoid firing off alerts about failing imports (or creating ImageStreams that will fail to import and confuse users about why they are broken) when there is no reason to expect the imports could possibly succeed.

As an aside, this probably needs to be split into two bugs, one for the samples operator and one for the OLM catalog. As Gabe alluded to, though, unless there is an explicit config mechanism that lets a component determine "this cluster is supposed to be disconnected, don't try to reach out to the internet even if it's accessible", both will likely be closed as "can't fix". The presence of an ICSP that references registry.redhat.io is not sufficient to make that determination (you might have one just for performance or redundancy/resiliency reasons).
> the intent is so the samples operator can avoid firing off alerts about failing imports(or creating imagestreams that will fail to import and confuse users about why they are broken) when there is no reason to expect the imports could possibly succeed.

Isn't the solution to "$IMPORT will never succeed" adding it to skippedImagestreams or setting managementState=Removed?
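Here is a minimal sketch of the skippedImagestreams route. It assumes the golang ImageStream (the (b) example above) is the one that can never import; substitute whichever ImageStreams apply. Unlike managementState: Removed, this leaves the samples operator managing all the other samples.

```yaml
# Samples operator configuration. The skipped ImageStream name is illustrative.
apiVersion: samples.operator.openshift.io/v1
kind: Config
metadata:
  name: cluster
spec:
  managementState: Managed
  # ImageStreams listed here are neither created nor imported by the operator.
  skippedImagestreams:
  - golang
```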
> Isn't the solution to "$IMPORT will never succeed" adding it to skippedImagestreams or setting managementState=Removed?

Yes, which is what the samples operator does for you automatically (it sets itself to Removed), so that each disconnected admin doesn't have to run into this problem and address it on day 2.
> Isn't the solution to "$IMPORT will never succeed" adding it to skippedImagestreams or setting managementState=Removed?

Regardless, having the samples operator *not* mark itself Removed when it thinks we're disconnected (on the assumption that you'd just make the admin do it) would not address this customer's issue. They want the samples operator to try *harder* to mark itself Removed (or otherwise not import things) when the cluster should act disconnected even though it is not actually disconnected.
> ...on the assumption you'd just make the admin do it...

Yes, I think this is "admins who are OK not having particular samples should explicitly tell the samples operator that". If the operator extrapolates from "I can't reach these images today" to decide "I'll never reach these images", then it could misconstrue a temporary network hiccup and never recover.

> ...the cluster should act disconnected even when it is not disconnected...

And you can do this by setting skippedImagestreams, managementState=Removed, or ICSPs. In the ICSP case, though, the cluster might occasionally fall back to the canonical registry if an attempt to reach the mirrors hiccups for some reason. If that's a concern, I guess we'd need to either blackhole the canonical registries or talk to the CRI-O folks about a mirror setup that explicitly blocks access to the canonical registries instead of just putting them at the back of the fallback chain.
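As a sketch of the day-2 ICSP option, an ImageContentSourcePolicy along these lines redirects pulls for a repository to a mirror. The policy name and mirror location are hypothetical. As described above, the source registry stays at the end of the fallback chain, so this alone does not block the canonical registry.

```yaml
# Day-2 counterpart of an imageContentSources entry.
# The policy name and mirror host are hypothetical.
# repositoryDigestMirrors only applies to images pulled by digest.
apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  name: example-samples-mirror
spec:
  repositoryDigestMirrors:
  - mirrors:
    - artifactory.example.com/ocp4/rhscl
    source: registry.redhat.io/rhscl
```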
OK I was able to employ the support documented in https://docs.openshift.com/container-platform/4.5/installing/install_config/installing-customizing.html to force samples operator to bootstrap as removed. I added a yaml file in the openshift directory produced by openshift-install create manifests with the following content:

    apiVersion: samples.operator.openshift.io/v1
    kind: Config
    metadata:
      name: cluster
    spec:
      architectures:
      - x86_64
      managementState: Removed

This is what is required for users who do not want the default samples installed if a cluster is deployed with disconnected options in an environment where it has access to the internet. I'm sending this bugzilla over to our Docs team to update the disconnected related doc, cross referencing the install and samples doc as needed around the scenario in question, to note this fact.
Ben reminded me there is also an OLM catalog piece to this as well. Will clone this bug and send it to that team for review.
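For the OLM catalog piece, one knob available in 4.4 and later is the cluster-scoped OperatorHub resource. Setting disableAllDefaultSources stops the default, internet-backed CatalogSources from being created. The sketch below assumes it is applied to the cluster with oc apply; it is not a confirmed resolution of the cloned bug.

```yaml
# Disables the default OperatorHub catalog sources (redhat-operators, etc.).
apiVersion: config.openshift.io/v1
kind: OperatorHub
metadata:
  name: cluster
spec:
  disableAllDefaultSources: true
```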
We updated the related content here: https://github.com/openshift/openshift-docs/pull/26134

The PR went through SME and QE review before merging, so I'm updating the status to RELEASE_PENDING.

Link to updated documentation on docs.openshift.com:
https://docs.openshift.com/container-platform/4.5/openshift_images/configuring-samples-operator.html#samples-operator-restricted-network-install-with-access

Please let me know if additional documentation updates are required. Thank you!
Link to updated documentation on the Customer Portal: https://access.redhat.com/documentation/en-us/openshift_container_platform/4.7/html-single/images/index#samples-operator-overview_configuring-samples-operator
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days