Bug 2213117
| Summary: | [Backport to 4.11.z]OCS Provider Server service comes up on public subnets | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Jilju Joy <jijoy> |
| Component: | ocs-operator | Assignee: | Rewant <resoni> |
| Status: | ON_QA --- | QA Contact: | Itzhak <ikave> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.11 | CC: | ebenahar, ikave, nigoyal, odf-bz-bot, resoni, sheggodu |
| Target Milestone: | --- | Keywords: | AutomationBackLog |
| Target Release: | ODF 4.11.10 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | 4.11.10-3 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | 2212773 | Environment: | |
| Last Closed: | Type: | --- | |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 2212773, 2218863, 2218867 | ||
| Bug Blocks: | 2213114 | ||
|
Description
Jilju Joy
2023-06-07 06:58:27 UTC
Giving devel ack on Rewant's request. I tested the BZ with the following steps:
1. Deploy an AWS 4.11 cluster without ODF.
2. Disable the default redhat-operators catalog source:
$ oc patch operatorhub.config.openshift.io/cluster -p='{"spec":{"sources":[{"disabled":true,"name":"redhat-operators"}]}}' --type=merge
3. Get and apply the ICSP from the catalog image using these commands (on my local machine):
$ oc image extract --filter-by-os linux/amd64 --registry-config ~/IBMProjects/ocs-ci/data/pull-secret quay.io/rhceph-dev/ocs-registry:latest-stable-4.11.10 --confirm --path /icsp.yaml:~/IBMProjects/ocs-ci/icsp
$ oc apply -f ~/IBMProjects/ocs-ci/icsp/icsp.yaml
5. Wait for the MachineConfigPool to be ready.
$ oc get MachineConfigPool worker
6. Create the Namespace, CatalogSource, and Subscription using the YAML file above: https://bugzilla.redhat.com/show_bug.cgi?id=2213117#c7.
$ oc apply -f ~/Downloads/deploy-with-olm.yaml
7. Wait until the ocs-operator pod is ready in the openshift-storage namespace.
8. Create the StorageCluster using the YAML file above: https://bugzilla.redhat.com/show_bug.cgi?id=2213117#c8. The field 'providerAPIServer' is empty in this YAML file.
(If there is an issue with the NooBaa CRDs, we may also need to apply this YAML file: https://raw.githubusercontent.com/red-hat-storage/mcg-osd-deployer/1eec1147b1ae70e938fa42dabc60453b8cd9449b/shim/crds/noobaa.noobaa.io.yaml)
9. Check the pods:
$ oc get pods
NAME READY STATUS RESTARTS AGE
noobaa-operator-76464cdd89-k6nst 1/1 Running 0 18m
ocs-metrics-exporter-77c684f4ff-w2mtv 1/1 Running 0 7m48s
ocs-operator-85b8d66b86-wpg49 1/1 Running 0 18m
rook-ceph-operator-764f4cbc78-66bm9 1/1 Running 0 18m
10. Check the services and see that the ocs-provider-server service is of type NodePort, which is the default value:
$ oc get service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
noobaa-operator-service ClusterIP 172.30.37.82 <none> 443/TCP 12m
ocs-provider-server NodePort 172.30.137.131 <none> 50051:31659/TCP 4m41s
11. Edit the ocs-storagecluster and change the value of "providerAPIServerServiceType" to LoadBalancer. Check the pods again:
$ oc get pods
NAME READY STATUS RESTARTS AGE
noobaa-operator-76464cdd89-k6nst 1/1 Running 0 21m
ocs-metrics-exporter-77c684f4ff-w2mtv 1/1 Running 0 10m
ocs-operator-85b8d66b86-wpg49 1/1 Running 0 21m
ocs-provider-server-5d7f659cf-t42ms 1/1 Running 0 12s
rook-ceph-operator-764f4cbc78-66bm9 1/1 Running 0 21m
12. Check the service type again:
$ oc get service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
noobaa-operator-service ClusterIP 172.30.37.82 <none> 443/TCP 17m
ocs-provider-server LoadBalancer 172.30.137.131 a8225434e09614eda83bfe20c964088b-2002374777.us-east-2.elb.amazonaws.com 50051:31659/TCP 9m16s
13. Edit the ocs-storagecluster and change the value of "providerAPIServerServiceType" to NodePort. Check the service again:
$ oc get service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
noobaa-operator-service ClusterIP 172.30.37.82 <none> 443/TCP 57m
ocs-provider-server NodePort 172.30.137.131 <none> 50051:31659/TCP 49m
14. Edit the ocs-storagecluster and change the value of "providerAPIServerServiceType" to a dummy value "someValue". Check the ocs-operator logs and see the expected error:
$ oc logs ocs-operator-85b8d66b86-wpg49 | tail -n 1
{"level":"error","ts":1690802919.280578,"logger":"controller.storagecluster","msg":"Reconciler error","reconciler group":"ocs.openshift.io","reconciler kind":"StorageCluster","name":"ocs-storagecluster","namespace":"openshift-storage","error":"providerAPIServer only supports service of type NodePort and LoadBalancer","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"}
15. Edit the ocs-storagecluster and change the value of "providerAPIServerServiceType" to LoadBalancer. Check the service again:
$ oc get service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
noobaa-operator-service ClusterIP 172.30.37.82 <none> 443/TCP 131m
ocs-provider-server LoadBalancer 172.30.137.131 a8225434e09614eda83bfe20c964088b-1866348329.us-east-2.elb.amazonaws.com 50051:31659/TCP 123m
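The manual edit-and-check loop in steps 10 through 15 could also be scripted. The following is only a rough sketch, not what was run for this verification. It assumes that "providerAPIServerServiceType" sits directly under "spec" of the StorageCluster, and that the resources live in the openshift-storage namespace (taken from the operator log in step 14). The function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only. Assumptions: the namespace, the resource names, and the
# spec.providerAPIServerServiceType field path are not confirmed by this BZ.
NAMESPACE="${NAMESPACE:-openshift-storage}"
STORAGECLUSTER="${STORAGECLUSTER:-ocs-storagecluster}"
SERVICE="${SERVICE:-ocs-provider-server}"

# Set the service type on the StorageCluster with a merge patch
# (a non-interactive alternative to "oc edit").
set_provider_service_type() {
  oc patch storagecluster "$STORAGECLUSTER" -n "$NAMESPACE" --type=merge \
    -p "{\"spec\":{\"providerAPIServerServiceType\":\"$1\"}}"
}

# Print the current type of the provider server service.
get_provider_service_type() {
  oc get service "$SERVICE" -n "$NAMESPACE" -o jsonpath='{.spec.type}'
}

# Poll until the service reports the expected type, or give up.
# Usage: wait_for_provider_service_type <type> [attempts] [delay_seconds]
wait_for_provider_service_type() {
  wfp_expected="$1"
  wfp_attempts="${2:-30}"
  wfp_delay="${3:-10}"
  wfp_i=1
  while [ "$wfp_i" -le "$wfp_attempts" ]; do
    wfp_current="$(get_provider_service_type 2>/dev/null)" || wfp_current=""
    if [ "$wfp_current" = "$wfp_expected" ]; then
      echo "service $SERVICE is $wfp_current"
      return 0
    fi
    wfp_i=$((wfp_i + 1))
    sleep "$wfp_delay"
  done
  echo "service $SERVICE is '$wfp_current', expected '$wfp_expected'" >&2
  return 1
}

# Extract the "error" field from a JSON log line read on stdin.
# Uses sed only, so it does not handle escaped quotes inside the message.
extract_log_error() {
  sed -n 's/.*"error":"\([^"]*\)".*/\1/p'
}
```

For example, `set_provider_service_type LoadBalancer && wait_for_provider_service_type LoadBalancer` would cover steps 11 and 12, and `oc logs <ocs-operator pod> | tail -n 1 | extract_log_error` would print only the error message from step 14.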
Additional info:
Link to the Jenkins job: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/27592/.
Versions:
OC version:
Client Version: 4.10.24
Server Version: 4.11.0-0.nightly-2023-07-29-013834
Kubernetes Version: v1.24.15+a9da4a8
OCS version:
ocs-operator.v4.11.10 OpenShift Container Storage 4.11.10 ocs-operator.v4.11.9 Succeeded
Cluster version
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.11.0-0.nightly-2023-07-29-013834 True False 74m Cluster version is 4.11.0-0.nightly-2023-07-29-013834
We found an issue with the new ocs 4.11.10-1 image. In the case of a NodePort Service, the ocs-operator pod keeps requeuing. We need to fix this on the ocs-operator side, backport the fix, and test it again. |