Bug 2213114
| Summary: | [Backport to 4.12.z]OCS Provider Server service comes up on public subnets | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Jilju Joy <jijoy> |
| Component: | ocs-operator | Assignee: | Rewant <resoni> |
| Status: | ON_QA --- | QA Contact: | Itzhak <ikave> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.12 | CC: | ebenahar, ikave, nigoyal, odf-bz-bot, resoni, sheggodu |
| Target Milestone: | --- | ||
| Target Release: | ODF 4.12.6 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | 4.12.6-3 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | 2212773 | Environment: | |
| Last Closed: | Type: | --- | |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 2212773, 2213117, 2218863, 2218867 | ||
| Bug Blocks: | |||
Description
Jilju Joy
2023-06-07 06:57:34 UTC
Giving devel ack on Rewant's request.
I tested the BZ with the following steps:
1. Deploy an AWS 4.12 cluster without ODF.
2. Disable the default redhat-operators catalog source:
$ oc patch operatorhub.config.openshift.io/cluster -p='{"spec":{"sources":[{"disabled":true,"name":"redhat-operators"}]}}' --type=merge
3. Extract the ICSP from the catalog image and apply it, using the following commands (run on my local machine):
$ oc image extract --filter-by-os linux/amd64 --registry-config ~/IBMProjects/ocs-ci/data/pull-secret quay.io/rhceph-dev/ocs-registry:latest-stable-4.12 --confirm --path /icsp.yaml:~/IBMProjects/ocs-ci/icsp
$ oc apply -f ~/IBMProjects/ocs-ci/icsp/icsp.yaml
5. Wait for the MachineConfigPool to be ready.
$ oc get MachineConfigPool worker
6. Create the Namespace, CatalogSource, and Subscription using the YAML file above: https://bugzilla.redhat.com/show_bug.cgi?id=2213114#c9.
$ oc apply -f ~/Downloads/deploy-with-olm.yaml
7. Wait until the ocs-operator pod is ready in the openshift-storage namespace.
8. Create the StorageCluster using the YAML file above: https://bugzilla.redhat.com/show_bug.cgi?id=2213114#c10. The field 'providerAPIServer' is empty in this YAML file.
(If there is an issue with the NooBaa CRDs, we may also need to apply this YAML file: https://raw.githubusercontent.com/red-hat-storage/mcg-osd-deployer/1eec1147b1ae70e938fa42dabc60453b8cd9449b/shim/crds/noobaa.noobaa.io.yaml)
9. Check the pods:
$ oc get pods
NAME                                    READY   STATUS    RESTARTS        AGE
ocs-metrics-exporter-7567744868-zs9tm   1/1     Running   0               20m
ocs-operator-7b866f884c-zftfx           1/1     Running   6 (4m58s ago)   20m
rook-ceph-operator-7f74bd847c-2mptm     1/1     Running   6 (4m37s ago)   20m
10. Check the service and confirm that the ocs-provider-server service type is NodePort, which is the default value:
$ oc get service
NAME                  TYPE       CLUSTER-IP    EXTERNAL-IP   PORT(S)           AGE
ocs-provider-server   NodePort   172.30.45.3   <none>        50051:31659/TCP   32s
11. Edit the ocs-storagecluster and change the value of "providerAPIServerServiceType" to LoadBalancer. Check the pods again:
$ oc get pods
NAME                                    READY   STATUS             RESTARTS        AGE
ocs-metrics-exporter-7567744868-zs9tm   1/1     Running            0               38m
ocs-operator-7b866f884c-8j9b6           1/1     Running            4 (3m54s ago)   12m
ocs-provider-server-55cc5d648d-w7g7t    1/1     Running            0               2m31s
rook-ceph-operator-7f74bd847c-2mptm     0/1     CrashLoopBackOff   8 (3m25s ago)   38m
12. Check the service type again:
$ oc get service
NAME                   TYPE           CLUSTER-IP      EXTERNAL-IP                                                              PORT(S)             AGE
ocs-metrics-exporter   ClusterIP      172.30.39.247   <none>                                                                   8080/TCP,8081/TCP   24s
ocs-provider-server    LoadBalancer   172.30.45.3     afb0aa7144fa2419bb35319aeb119af9-558404978.us-east-2.elb.amazonaws.com   50051:31659/TCP     10m
13. Edit the ocs-storagecluster and change the value of "providerAPIServerServiceType" to NodePort. Check the service again:
$ oc get service
NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
ocs-metrics-exporter   ClusterIP   172.30.39.247   <none>        8080/TCP,8081/TCP   2m4s
ocs-provider-server    NodePort    172.30.45.3     <none>        50051:31659/TCP     12m
14. Edit the ocs-storagecluster and change the value of "providerAPIServerServiceType" to a dummy value "bar". Check the ocs-operator logs and see the expected error:
$ oc logs ocs-operator-7b866f884c-8j9b6 | tail -n 1
{"level":"error","ts":1690887618.9425237,"msg":"Reconciler error","controller":"storagecluster","controllerGroup":"ocs.openshift.io","controllerKind":"StorageCluster","storageCluster":{"name":"ocs-storagecluster","namespace":"openshift-storage"},"namespace":"openshift-storage","name":"ocs-storagecluster","reconcileID":"778a3caf-9fef-4737-88cb-58ee8330d9fb","error":"providerAPIServer only supports service of type NodePort and LoadBalancer","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234"}
15. Edit the ocs-storagecluster and change the value of "providerAPIServerServiceType" to LoadBalancer. Check the service again:
$ oc get service
NAME                   TYPE           CLUSTER-IP      EXTERNAL-IP                                                              PORT(S)             AGE
ocs-metrics-exporter   ClusterIP      172.30.39.247   <none>                                                                   8080/TCP,8081/TCP   29m
ocs-provider-server    LoadBalancer   172.30.45.3     afb0aa7144fa2419bb35319aeb119af9-181750681.us-east-2.elb.amazonaws.com   50051:31659/TCP     39m
rook-ceph-mgr          ClusterIP      172.30.184.49   <none>                                                                   9283/TCP            17m
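Steps 11 through 15 switch the service type by editing the StorageCluster by hand. A minimal sketch of the same switch as a non-interactive helper is shown here. It assumes the field sits directly under .spec of the StorageCluster, and it uses the resource name and namespace that appear in the operator log in step 14 (ocs-storagecluster in openshift-storage). Both function names are hypothetical and are not part of ocs-operator or ocs-ci. The local check only mirrors the two values named in the operator's error message; it is not the operator's own validation code.

```shell
#!/bin/sh
# Hypothetical helpers for re-running steps 11-15; not part of ocs-operator.

# Print a JSON merge-patch body for the given service type.
# Only NodePort and LoadBalancer are accepted, matching the values
# named in the operator error from step 14.
build_service_type_patch() {
    case "$1" in
        NodePort|LoadBalancer)
            printf '{"spec":{"providerAPIServerServiceType":"%s"}}\n' "$1"
            ;;
        *)
            echo "unsupported service type: $1" >&2
            return 1
            ;;
    esac
}

# Patch the StorageCluster, then list the provider server service.
# Assumption: providerAPIServerServiceType is a direct child of .spec.
set_provider_service_type() {
    _patch=$(build_service_type_patch "$1") || return 1
    oc patch storagecluster ocs-storagecluster -n openshift-storage \
        --type=merge -p "$_patch" &&
    oc get service ocs-provider-server -n openshift-storage
}
```

For example, `set_provider_service_type LoadBalancer` would cover step 11. Because the helper rejects values such as "bar" before calling `oc`, the negative test in step 14 still needs a manual edit of the StorageCluster.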
Additional info:
Link to the Jenkins job: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/27647/.
Versions:
OC version:
Client Version: 4.10.24
Server Version: 4.12.0-0.nightly-2023-07-31-091252
Kubernetes Version: v1.25.11+1485cc9
OCS version:
ocs-operator.v4.12.6-rhodf OpenShift Container Storage 4.12.6-rhodf ocs-operator.v4.12.5-rhodf Succeeded
Cluster version
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.0-0.nightly-2023-07-31-091252   True        False         81m     Cluster version is 4.12.0-0.nightly-2023-07-31-091252
Rook version:
rook: v4.12.6-0.bc1e9806c3281090b58872e303e947ff5437c078
go: go1.18.10
We found an issue with the new ocs 4.11.10-1 image. In the case of a NodePort Service, the ocs-operator pod keeps requiring. We need to fix this from the ocs-operator side, backport the fix, and test it again.
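When re-testing, the pod listings in steps 9 and 11 are the place where operator instability shows up (non-zero restart counts, CrashLoopBackOff). As a rough sketch, a small filter over `oc get pods` output can make such pods stand out. The function name is hypothetical, and the filter assumes the default column order NAME, READY, STATUS, RESTARTS, AGE. Note that it also flags pods in states such as Completed, which may be expected.

```shell
#!/bin/sh
# Hypothetical helper: reads `oc get pods` output on stdin and prints
# NAME, STATUS and RESTARTS for every pod that is not Running or that
# has restarted at least once. The header row is skipped.
flag_unstable_pods() {
    awk 'NR > 1 && ($3 != "Running" || $4 + 0 > 0) { print $1, $3, $4 }'
}
```

Typical use would be `oc get pods -n openshift-storage | flag_unstable_pods`; empty output means every listed pod is Running with zero restarts.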