+++ This bug was initially created as a clone of Bug #2212773 +++

Description of problem (please be as detailed as possible and provide log snippets):
When using both private link and non-private link clusters, the ocs-provider-server service comes up on the non-private (public) subnets of the VPC. As a result, the endpoint is exposed, and it can be pinged from outside those subnets.
The AWS ELB that gets created is of type Classic, which doesn't support private link clusters. We therefore need to move to a Network Load Balancer and make it internal-facing, so that it is only accessible from within the VPC. This requires adding annotations to the service, because the AWS controller reads the service annotations when reconciling it.
More info: https://docs.google.com/document/d/10J-J8EuDm8Q-ZMtY0A3mtmHOx8Xvhn-i28faxfWZwts/edit?usp=sharing

Version of all relevant components (if applicable):

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Is this issue reproducible?

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1.
2.
3.

Actual results:
ocs provider server is deployed on public subnets

Expected results:
ocs provider server should be deployed on private subnets

Additional info:

--- Additional comment from RHEL Program Management on 2023-06-06 10:12:39 UTC ---
This bug, having no release flag set previously, is now set with release flag 'odf-4.13.0' to '?', and so is being proposed to be fixed at the ODF 4.13.0 release. Note that the 3 Acks (pm_ack, devel_ack, qa_ack), if any previously set while release flag was missing, have now been reset since the Acks are to be set against a release flag.
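For reference, the annotation-based approach proposed in the original problem description can be sketched as a Service manifest. This is a minimal, dependency-free Go sketch, not the operator's code: the struct types are hypothetical stand-ins for the k8s.io/api/core/v1 types, and the two annotation keys are assumed to be the ones read by the legacy in-tree AWS cloud provider (the AWS Load Balancer Controller uses a different set, e.g. `service.beta.kubernetes.io/aws-load-balancer-scheme`). The service name, namespace, and port 50051 are taken from the test output in this bug.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical, minimal mirrors of the Kubernetes Service fields used here;
// real code would use the types from k8s.io/api/core/v1 instead.
type ServicePort struct {
	Port int `json:"port"`
}

type ObjectMeta struct {
	Name        string            `json:"name"`
	Namespace   string            `json:"namespace"`
	Annotations map[string]string `json:"annotations,omitempty"`
}

type ServiceSpec struct {
	Type  string        `json:"type"`
	Ports []ServicePort `json:"ports"`
}

type Service struct {
	APIVersion string      `json:"apiVersion"`
	Kind       string      `json:"kind"`
	Metadata   ObjectMeta  `json:"metadata"`
	Spec       ServiceSpec `json:"spec"`
}

// internalNLBService (hypothetical helper) returns a LoadBalancer Service
// annotated so that the in-tree AWS cloud provider is asked for an
// internal-facing NLB instead of an internet-facing Classic ELB.
func internalNLBService(name, namespace string, port int) Service {
	return Service{
		APIVersion: "v1",
		Kind:       "Service",
		Metadata: ObjectMeta{
			Name:      name,
			Namespace: namespace,
			Annotations: map[string]string{
				// Request a Network Load Balancer rather than a Classic ELB.
				"service.beta.kubernetes.io/aws-load-balancer-type": "nlb",
				// Place the load balancer on private subnets only.
				"service.beta.kubernetes.io/aws-load-balancer-internal": "true",
			},
		},
		Spec: ServiceSpec{
			Type:  "LoadBalancer",
			Ports: []ServicePort{{Port: port}},
		},
	}
}

func main() {
	svc := internalNLBService("ocs-provider-server", "openshift-storage", 50051)
	out, err := json.MarshalIndent(svc, "", "  ")
	if err != nil {
		panic(err)
	}
	// Print the manifest as JSON (kubectl/oc also accept JSON manifests).
	fmt.Println(string(out))
}
```

As the next comment explains, the final decision was not to have ocs-operator add provider-specific annotations like these.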
--- Additional comment from Rewant on 2023-06-19 05:33:46 UTC ---
The ocs-operator should not be responsible for adding annotations based on the cloud provider. Instead, the load balancer should be created externally, based on the cloud provider and the type of network (private/public). Hence, we are adding a new field to the StorageCluster CR to toggle the service type.
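The behaviour of the new StorageCluster field, as later verified in the test steps (empty defaults to NodePort, NodePort and LoadBalancer are accepted, anything else is rejected), can be sketched as a small validation function. This is an illustrative Go sketch only: `resolveProviderServiceType` and the local `ServiceType` type are hypothetical, and only the error text is copied from the ocs-operator log shown in this bug.

```go
package main

import (
	"fmt"
)

// ServiceType mirrors corev1.ServiceType (a string type) so that the
// sketch stays dependency-free.
type ServiceType string

const (
	ServiceTypeNodePort     ServiceType = "NodePort"
	ServiceTypeLoadBalancer ServiceType = "LoadBalancer"
)

// resolveProviderServiceType (hypothetical) maps the StorageCluster field
// value to the Service type to reconcile for the provider API server.
func resolveProviderServiceType(requested ServiceType) (ServiceType, error) {
	switch requested {
	case "":
		// No value set: fall back to the default, NodePort.
		return ServiceTypeNodePort, nil
	case ServiceTypeNodePort, ServiceTypeLoadBalancer:
		return requested, nil
	default:
		// Any other value is rejected and surfaces as a reconcile error.
		return "", fmt.Errorf("providerAPIServer only supports service of type %s and %s",
			ServiceTypeNodePort, ServiceTypeLoadBalancer)
	}
}

func main() {
	for _, v := range []ServiceType{"", "LoadBalancer", "someValue"} {
		got, err := resolveProviderServiceType(v)
		fmt.Printf("%q -> %q, err=%v\n", v, got, err)
	}
}
```

Returning an error rather than silently falling back to NodePort keeps a typo in the CR visible in the operator logs.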
*** Bug 2218867 has been marked as a duplicate of this bug. ***
Giving devel ack per Rewant's request.
I tested the BZ with the following steps:
1. Deploy an AWS 4.13 cluster without ODF.
2. Disable the default redhat-operators catalog source:
   $ oc patch operatorhub.config.openshift.io/cluster -p='{"spec":{"sources":[{"disabled":true,"name":"redhat-operators"}]}}' --type=merge
3. Extract the ICSP from the catalog image and apply it, using the following commands (run on my local machine):
   $ oc image extract --filter-by-os linux/amd64 --registry-config ~/IBMProjects/ocs-ci/data/pull-secret quay.io/rhceph-dev/ocs-registry:latest-stable-4.13 --confirm --path /icsp.yaml:~/IBMProjects/ocs-ci/icsp
   $ oc apply -f ~/IBMProjects/ocs-ci/icsp/icsp.yaml
5. Wait for the MachineConfigPool to be ready:
   $ oc get MachineConfigPool worker
6. Create the Namespace, CatalogSource, and Subscription using the YAML file from https://bugzilla.redhat.com/show_bug.cgi?id=2218863#c9:
   $ oc apply -f ~/Downloads/deploy-with-olm.yaml
7. Wait until the ocs-operator pod is ready in the openshift-storage namespace.
8. Create the StorageCluster using the YAML file from https://bugzilla.redhat.com/show_bug.cgi?id=2218863#c10. (If there is an issue with the NooBaa CRDs, we may also need to apply this YAML file: https://raw.githubusercontent.com/red-hat-storage/mcg-osd-deployer/1eec1147b1ae70e938fa42dabc60453b8cd9449b/shim/crds/noobaa.noobaa.io.yaml.) The field 'providerAPIServer' is empty in this YAML file.
9. Check the pods:
   $ oc get pods
   NAME                                   READY   STATUS    RESTARTS        AGE
   ocs-metrics-exporter-c67ff7957-v8p5b   1/1     Running   0               21m
   ocs-operator-5f569b66cf-2dkcz          1/1     Running   5 (9m33s ago)   22m
   rook-ceph-operator-7757dffc4c-ccdb5    1/1     Running   3 (77s ago)     7m42s
10. Check the service and confirm that the provider server service is of type NodePort, which is the default value:
   $ oc get service
   NAME                  TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)           AGE
   ocs-provider-server   NodePort   172.30.104.23   <none>        50051:31659/TCP   8m3s
11. Edit the ocs-storagecluster and change the value of "providerAPIServerServiceType" to LoadBalancer.
   Check the pods again:
   $ oc get pods | grep ocs
   ocs-metrics-exporter-c67ff7957-v8p5b   1/1     Running   0             25m
   ocs-operator-5f569b66cf-2dkcz          1/1     Running   5 (13m ago)   26m
   ocs-provider-server-599c77c4db-2jm6t   1/1     Running   0             99s
12. Check the service type again:
   $ oc get service
   NAME                   TYPE           CLUSTER-IP      EXTERNAL-IP                                                              PORT(S)             AGE
   ocs-metrics-exporter   ClusterIP      172.30.30.232   <none>                                                                   8080/TCP,8081/TCP   2m3s
   ocs-provider-server    LoadBalancer   172.30.104.23   ac72c186bbc7847669404e8b05d02e19-487159474.us-east-2.elb.amazonaws.com   50051:31659/TCP     12m
13. Edit the ocs-storagecluster and change the value of "providerAPIServerServiceType" to NodePort. Check the service again:
   $ oc get service
   NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
   ocs-metrics-exporter   ClusterIP   172.30.30.232   <none>        8080/TCP,8081/TCP   4m
   ocs-provider-server    NodePort    172.30.104.23   <none>        50051:31659/TCP     14m
14. Edit the ocs-storagecluster and change the value of "providerAPIServerServiceType" to a dummy value "someValue". Check the ocs-operator logs and confirm the expected error:
   $ oc logs ocs-operator-5f569b66cf-2dkcz | tail -n 1
   {"level":"error","ts":"2023-08-08T10:20:50Z","msg":"Reconciler error","controller":"storagecluster","controllerGroup":"ocs.openshift.io","controllerKind":"StorageCluster","StorageCluster":{"name":"ocs-storagecluster","namespace":"openshift-storage"},"namespace":"openshift-storage","name":"ocs-storagecluster","reconcileID":"c02bb1f2-ddf9-46a0-a986-bef98c60a07f","error":"providerAPIServer only supports service of type NodePort and LoadBalancer","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235"}
15. Edit the ocs-storagecluster and change the value of "providerAPIServerServiceType" to LoadBalancer. Check the service again:
   $ oc get service
   NAME                   TYPE           CLUSTER-IP       EXTERNAL-IP                                                               PORT(S)             AGE
   ocs-metrics-exporter   ClusterIP      172.30.30.232    <none>                                                                    8080/TCP,8081/TCP   36m
   ocs-provider-server    LoadBalancer   172.30.104.23    ac72c186bbc7847669404e8b05d02e19-1874304719.us-east-2.elb.amazonaws.com   50051:31659/TCP     46m
   rook-ceph-exporter     ClusterIP      172.30.107.181   <none>                                                                    9926/TCP            19m
   rook-ceph-mgr          ClusterIP      172.30.111.135   <none>                                                                    9283/TCP            19m

Additional info:
Link to the Jenkins job: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/27909/

OC version:
Client Version: 4.10.24
Server Version: 4.13.0-0.nightly-2023-08-07-165810
Kubernetes Version: v1.26.6+73ac561

OCS version:
ocs-operator.v4.13.2-rhodf   OpenShift Container Storage   4.13.2-rhodf   ocs-operator.v4.13.1-rhodf   Installing

Cluster version:
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.nightly-2023-08-07-165810   True        False         111m    Cluster version is 4.13.0-0.nightly-2023-08-07-165810

Rook version:
rook: v4.13.2-0.b57f0c7db8116e754fc77b55825d7fd75c6f1aa3
go: go1.19.10

Ceph version:
ceph version 17.2.6-100.el9cp (ea4e3ef8df2cf26540aae06479df031dcfc80343) quincy (stable)
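Steps 11, 13, 14 and 15 change the field by editing the StorageCluster interactively. The same change could be scripted with a JSON merge patch, e.g. `oc patch storagecluster ocs-storagecluster -n openshift-storage --type merge -p '<body>'`. The Go sketch below only builds that patch body; it assumes the field lives at `spec.providerAPIServerServiceType` (the bug names the field but not its path), and `providerServiceTypePatch` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// providerServiceTypePatch (hypothetical) builds a JSON merge patch that sets
// spec.providerAPIServerServiceType on a StorageCluster. The field path is
// an assumption based on the field name used in the test steps.
func providerServiceTypePatch(serviceType string) (string, error) {
	patch := map[string]interface{}{
		"spec": map[string]interface{}{
			"providerAPIServerServiceType": serviceType,
		},
	}
	b, err := json.Marshal(patch)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// The values exercised in steps 11, 13 and 14 of the test.
	for _, v := range []string{"LoadBalancer", "NodePort", "someValue"} {
		body, err := providerServiceTypePatch(v)
		if err != nil {
			panic(err)
		}
		// Print the equivalent oc command for each value.
		fmt.Printf("oc patch storagecluster ocs-storagecluster -n openshift-storage --type merge -p '%s'\n", body)
	}
}
```

A merge patch touches only the named field, so the rest of the StorageCluster spec is left as-is between runs.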
We found an issue with the new ocs 4.11.10-1 image. In the case of NodePort Service, the ocs operator pod keeps requiring. We need to fix this from the ocs-operator side, backport the fix and test it again.