As per the enhancement merged for the 4.6 release [1], the CVO should use the CLUSTER_PROFILE env variable to select the manifests to apply. As of today this does not work, because the code simply does not exist.

[1] https://github.com/openshift/enhancements/pull/200

---

The prerequisite https://bugzilla.redhat.com/show_bug.cgi?id=1871890 is not done yet and two PRs are still open:

https://github.com/openshift/cluster-storage-operator/pull/117
https://github.com/operator-framework/operator-lifecycle-manager/pull/1887

After that, the env variable can be used by the CVO with, for instance, this PR: https://github.com/openshift/cluster-version-operator/pull/404
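For context, the selection is annotation-driven: each release manifest opts into a profile via an include.release.openshift.io/<profile> annotation, and the CVO is expected to apply only the manifests annotated for the profile named in CLUSTER_PROFILE. A minimal sketch of how to see which manifests a given profile would pick up, assuming the annotation keys match the profile names from the enhancement (RELEASE_IMAGE is a placeholder for whatever release pullspec you are testing):

$ PROFILE=single-node-developer
$ oc adm release extract --to /tmp/manifests "${RELEASE_IMAGE}"
$ # manifests that opt into the chosen profile
$ grep -rl "include.release.openshift.io/${PROFILE}" /tmp/manifests | head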
I am following https://github.com/openshift/enhancements/blob/master/enhancements/update/cluster-profiles.md#with-the-installer to test this.

$ export OPENSHIFT_INSTALL_EXPERIMENTAL_CLUSTER_PROFILE=single-node-developer
$ openshift-install version
openshift-install 4.7.0-0.nightly-2021-01-12-203716
built from commit b3dae7f4736bcd1dbf5a1e0ddafa826ee1738d81
release image registry.ci.openshift.org/ocp/release@sha256:c97466158d19a6e6b5563da4365d42ebe5579421b1163f3a2d6778ceb5388aed
$ openshift-install create cluster

The installation completed successfully, but I did not see the CLUSTER_PROFILE env variable injected into the CVO deployment.

$ oc -n openshift-cluster-version get deployment.apps/cluster-version-operator -o yaml | grep -i PROFILE
<empty>

Run the following command (in the extracted manifests directory) to find manifests that are included for the `self-managed-high-availability` profile but not for `single-node-developer`:

$ for i in `ls`; do if grep -q "include.release.openshift.io/self-managed-high-availability" $i; then if ! grep -q "include.release.openshift.io/single-node-developer" $i; then echo $i; fi; fi; done
<--snip-->
0000_90_cluster-update-keys_configmap.yaml
<--snip-->

Take 0000_90_cluster-update-keys_configmap.yaml as the testing target.

$ cat 0000_90_cluster-update-keys_configmap.yaml
<--snip-->
kind: ConfigMap
metadata:
  annotations:
    include.release.openshift.io/self-managed-high-availability: "true"
    release.openshift.io/verification-config-map: ""
  creationTimestamp: null
  name: release-verification
  namespace: openshift-config-managed

$ oc get cm release-verification -n openshift-config-managed
NAME                   DATA   AGE
release-verification   3      147m

Per my understanding, `release-verification` should NOT be created once the user sets CLUSTER_PROFILE=single-node-developer.

On the bootstrap node, check /usr/local/bin/bootkube.sh:

if [ ! -f cvo-bootstrap.done ]
then
	echo "Rendering Cluster Version Operator Manifests..."

	rm --recursive --force cvo-bootstrap

	bootkube_podman_run \
		--volume "$PWD:/assets:z" \
		--env CLUSTER_PROFILE="single-node-developer" \
		"${RELEASE_IMAGE_DIGEST}" \
		render \
		--output-dir=/assets/cvo-bootstrap \
		--release-image="${RELEASE_IMAGE_DIGEST}"

	cp cvo-bootstrap/bootstrap/* bootstrap-manifests/
	cp cvo-bootstrap/manifests/* manifests/

	## FIXME: CVO should use `/etc/kubernetes/bootstrap-secrets/kubeconfig` instead
	cp auth/kubeconfig-loopback /etc/kubernetes/kubeconfig
	touch cvo-bootstrap.done
fi

The setting already takes effect in bootkube.sh, but it seems this env var is never respected by the CVO. Am I missing anything?
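A follow-up check I can run (a sketch, assuming the rendered output is still present on the bootstrap node under the cvo-bootstrap/ directory that bootkube.sh writes and copies into manifests/): grep the rendered manifests for files that carry the self-managed-high-availability annotation but not the single-node-developer one, mirroring the loop above.

$ grep -rl "include.release.openshift.io/self-managed-high-availability" cvo-bootstrap/manifests \
    | xargs -r grep -L "include.release.openshift.io/single-node-developer"

If files show up there, the render step itself did not filter them for the single-node-developer profile; if it is clean, the filtering is presumably lost later, when the in-cluster CVO, which has no CLUSTER_PROFILE set per the deployment check above, syncs the payload.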
The work we did for 4.7 is preparatory work for 4.8. It doesn't work with the installer. The current implementation is really made for IBM Cloud: if you deploy the CVO as they do and pass the CLUSTER_PROFILE env var, then it will be used. With the installer, it's different: we haven't added the env variable to the manifests yet. It will be added in 4.8. If we did it here, it would break the upgrade from 4.6, as the old CVO doesn't know what to do with the {{ .ClusterProfile }} template variable in manifests.

Also, I don't think we have 2 complete profiles. I guess we only have the default one. single-node-developer is still in progress: https://bugzilla.redhat.com/show_bug.cgi?id=1915473
Thanks for the details. I don't have access to IBM Cloud. Can you help verify this bug, or can you tell me a simple way to verify it on a common cloud, such as IPI on AWS?
We need at least bug 1891068 to be addressed to add a missing annotation for our current two profiles.
For verification, building on Guillaume's suggestion in comment 3, you could install a vanilla AWS cluster, scale the CVO Deployment down to zero, bump the CVO Deployment to set CLUSTER_PROFILE=single-node-production-edge, and scale the CVO Deployment back up to one. Then check the resulting CVO logs to confirm that it is only pushing single-node-production-edge manifests.

Looking for a useful manifest in a recent 4.7 image:

$ oc adm release extract --to manifests quay.io/openshift-release-dev/ocp-release:4.7.0-fc.2-x86_64
$ grep -rA 2 self-managed-high-availability manifests | grep -B6 single-node-production-edge | head -n7
manifests/0000_50_console-operator_sample-application-quickstart.yaml:    include.release.openshift.io/self-managed-high-availability: "true"
manifests/0000_50_console-operator_sample-application-quickstart.yaml-    include.release.openshift.io/single-node-developer: "true"
manifests/0000_50_console-operator_sample-application-quickstart.yaml-spec:
--
manifests/0000_50_console-operator_ocs-install-tour-quickstart.yaml:    include.release.openshift.io/self-managed-high-availability: "true"
manifests/0000_50_console-operator_ocs-install-tour-quickstart.yaml-    include.release.openshift.io/single-node-developer: "true"
manifests/0000_50_console-operator_ocs-install-tour-quickstart.yaml-    include.release.openshift.io/single-node-production-edge: "true"
$ head -n4 manifests/0000_50_console-operator_sample-application-quickstart.yaml
apiVersion: console.openshift.io/v1
kind: ConsoleQuickStart
metadata:
  name: sample-application

So the CVO should not attempt to sync the sample-application ConsoleQuickStart object in the single-node-production-edge profile. We aren't currently actually supporting changing profiles on the fly like this, but if we get lucky and the CVO doesn't blow up on the profile change, seeing the CVO get past the spot where that manifest used to live without trying to push that manifest will show that the code from PR 404 is working.
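Condensed into commands (a sketch; oc set env is one way to bump the Deployment, and editing the Deployment YAML directly works just as well):

$ oc -n openshift-cluster-version scale deployment/cluster-version-operator --replicas=0
$ oc -n openshift-cluster-version set env deployment/cluster-version-operator CLUSTER_PROFILE=single-node-production-edge
$ oc -n openshift-cluster-version scale deployment/cluster-version-operator --replicas=1
$ # exact log lines depend on verbosity, but sample-application should not appear among the synced manifests
$ oc -n openshift-cluster-version logs deployment/cluster-version-operator | grep -i sample-application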
Thanks for your detailed steps. Verified this bug with 4.7.0-0.nightly-2021-01-12-203716.

1. Run an IPI install on AWS.

2. oc scale --replicas=0 deployment.apps/cluster-version-operator -n openshift-cluster-version

3. Edit deployment.apps/cluster-version-operator to set CLUSTER_PROFILE=single-node-production-edge:

<--snip-->
    spec:
      containers:
      - args:
        - start
        - --release-image=registry.ci.openshift.org/ocp/release@sha256:c97466158d19a6e6b5563da4365d42ebe5579421b1163f3a2d6778ceb5388aed
        - --enable-auto-update=false
        - --enable-default-cluster-version=true
        - --serving-cert-file=/etc/tls/serving-cert/tls.crt
        - --serving-key-file=/etc/tls/serving-cert/tls.key
        - --v=5
        env:
        - name: CLUSTER_PROFILE
          value: single-node-production-edge
<--snip-->

4. oc scale --replicas=1 deployment.apps/cluster-version-operator -n openshift-cluster-version

5. Check the 'sample-application' ConsoleQuickStart:

$ oc get ConsoleQuickStart
NAME                 AGE
add-healthchecks     21h
explore-pipelines    21h
explore-serverless   21h
monitor-sampleapp    21h
ocs-install-tour     21h
sample-application   21h

6. Delete the 'ocs-install-tour' and 'sample-application' ConsoleQuickStarts together:

$ oc delete ConsoleQuickStart ocs-install-tour sample-application
consolequickstart.console.openshift.io "ocs-install-tour" deleted
consolequickstart.console.openshift.io "sample-application" deleted
$ oc get ConsoleQuickStart
NAME                AGE
add-healthchecks    21h
explore-pipelines   21h
explore-serverless  21h
monitor-sampleapp   21h

7. Wait some minutes.

8. Check again: "ocs-install-tour" is recreated, but "sample-application" is not.

$ oc get ConsoleQuickStart
NAME                AGE
add-healthchecks    21h
explore-pipelines   21h
explore-serverless  21h
monitor-sampleapp   21h
ocs-install-tour    3m25s

9. Edit deployment.apps/cluster-version-operator to remove the CLUSTER_PROFILE=single-node-production-edge setting.

10. Wait some minutes; the CVO is redeployed. Wait until sample-application is synced, then check again:

$ oc get ConsoleQuickStart
NAME                 AGE
add-healthchecks     21h
explore-pipelines    21h
explore-serverless   21h
monitor-sampleapp    21h
ocs-install-tour     12m
sample-application   17s

Now 'sample-application' is synced and recreated, so that means CLUSTER_PROFILE takes effect.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633