Bug 2218863
| Summary: | [Backport to 4.13.z] OCS Provider Server service comes up on public subnets | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Jilju Joy <jijoy> |
| Component: | ocs-operator | Assignee: | Rewant <resoni> |
| Status: | ON_QA --- | QA Contact: | Itzhak <ikave> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.13 | CC: | ebenahar, ikave, muagarwa, nigoyal, odf-bz-bot, resoni, sheggodu |
| Target Milestone: | --- | ||
| Target Release: | ODF 4.13.2 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | 4.13.2-3 | Doc Type: | No Doc Update |
| Doc Text: | Story Points: | --- | |
| Clone Of: | 2212773 | Environment: | |
| Last Closed: | Type: | --- | |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 2212773, 2218867 | ||
| Bug Blocks: | 2213114, 2213117 | ||
Description
Jilju Joy
2023-06-30 10:17:19 UTC
*** Bug 2218867 has been marked as a duplicate of this bug. ***
Giving devel ack at Rewant's request.
I tested the BZ with the following steps:
1. Deploy an AWS 4.13 cluster without ODF.
2. Disable the default Red Hat operator source (redhat-operators):
$ oc patch operatorhub.config.openshift.io/cluster -p='{"spec":{"sources":[{"disabled":true,"name":"redhat-operators"}]}}' --type=merge
3. Extract the ICSP from the catalog image and apply it, using these commands (run from my local machine):
$ oc image extract --filter-by-os linux/amd64 --registry-config ~/IBMProjects/ocs-ci/data/pull-secret quay.io/rhceph-dev/ocs-registry:latest-stable-4.13 --confirm --path /icsp.yaml:~/IBMProjects/ocs-ci/icsp
$ oc apply -f ~/IBMProjects/ocs-ci/icsp/icsp.yaml
5. Wait for the MachineConfigPool to be ready.
$ oc get MachineConfigPool worker
6. Create the Namespace, CatalogSource, and Subscription using the YAML file from comment 9: https://bugzilla.redhat.com/show_bug.cgi?id=2218863#c9.
$ oc apply -f ~/Downloads/deploy-with-olm.yaml
7. Wait until the ocs-operator pod is ready in the openshift-storage namespace.
8. Create the StorageCluster using the YAML file from comment 10: https://bugzilla.redhat.com/show_bug.cgi?id=2218863#c10.
(If there is an issue with the NooBaa CRDs, we may also need to apply this YAML file: https://raw.githubusercontent.com/red-hat-storage/mcg-osd-deployer/1eec1147b1ae70e938fa42dabc60453b8cd9449b/shim/crds/noobaa.noobaa.io.yaml). The field 'providerAPIServer' is empty in this YAML file.
9. Check the pods:
$ oc get pods
NAME READY STATUS RESTARTS AGE
ocs-metrics-exporter-c67ff7957-v8p5b 1/1 Running 0 21m
ocs-operator-5f569b66cf-2dkcz 1/1 Running 5 (9m33s ago) 22m
rook-ceph-operator-7757dffc4c-ccdb5 1/1 Running 3 (77s ago) 7m42s
10. Check the service and confirm that ocs-provider-server is of type NodePort, which is the default value:
$ oc get service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
ocs-provider-server NodePort 172.30.104.23 <none> 50051:31659/TCP 8m3s
11. Edit the ocs-storagecluster and change the value of "providerAPIServerServiceType" to LoadBalancer. Check the pods again:
$ oc get pods | grep ocs
ocs-metrics-exporter-c67ff7957-v8p5b 1/1 Running 0 25m
ocs-operator-5f569b66cf-2dkcz 1/1 Running 5 (13m ago) 26m
ocs-provider-server-599c77c4db-2jm6t 1/1 Running 0 99s
12. Check the service type again:
$ oc get service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
ocs-metrics-exporter ClusterIP 172.30.30.232 <none> 8080/TCP,8081/TCP 2m3s
ocs-provider-server LoadBalancer 172.30.104.23 ac72c186bbc7847669404e8b05d02e19-487159474.us-east-2.elb.amazonaws.com 50051:31659/TCP 12m
13. Edit the ocs-storagecluster and change the value of "providerAPIServerServiceType" to NodePort. Check the service again:
$ oc get service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
ocs-metrics-exporter ClusterIP 172.30.30.232 <none> 8080/TCP,8081/TCP 4m
ocs-provider-server NodePort 172.30.104.23 <none> 50051:31659/TCP 14m
14. Edit the ocs-storagecluster and change the value of "providerAPIServerServiceType" to a dummy value "someValue". Check the ocs-operator logs and see the expected error:
$ oc logs ocs-operator-5f569b66cf-2dkcz | tail -n 1
{"level":"error","ts":"2023-08-08T10:20:50Z","msg":"Reconciler error","controller":"storagecluster","controllerGroup":"ocs.openshift.io","controllerKind":"StorageCluster","StorageCluster":{"name":"ocs-storagecluster","namespace":"openshift-storage"},"namespace":"openshift-storage","name":"ocs-storagecluster","reconcileID":"c02bb1f2-ddf9-46a0-a986-bef98c60a07f","error":"providerAPIServer only supports service of type NodePort and LoadBalancer","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235"}
15. Edit the ocs-storagecluster and change the value of "providerAPIServerServiceType" to LoadBalancer. Check the service again:
$ oc get service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
ocs-metrics-exporter ClusterIP 172.30.30.232 <none> 8080/TCP,8081/TCP 36m
ocs-provider-server LoadBalancer 172.30.104.23 ac72c186bbc7847669404e8b05d02e19-1874304719.us-east-2.elb.amazonaws.com 50051:31659/TCP 46m
rook-ceph-exporter ClusterIP 172.30.107.181 <none> 9926/TCP 19m
rook-ceph-mgr ClusterIP 172.30.111.135 <none> 9283/TCP 19m
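Steps 11 to 15 can also be repeated without editing the resource by hand. The following is only a minimal sketch, not what was run during this verification. It assumes the field sits at .spec.providerAPIServerServiceType of the StorageCluster and that the resource lives in the openshift-storage namespace (the namespace shown in the operator log). The function names build_service_type_patch and provider_service_type are hypothetical helpers, not part of oc or ocs-operator.

```shell
#!/usr/bin/env bash

# Print a JSON merge patch that sets providerAPIServerServiceType.
# Only NodePort and LoadBalancer are accepted, mirroring the operator
# error "providerAPIServer only supports service of type NodePort and
# LoadBalancer". Field path .spec.providerAPIServerServiceType is an
# assumption.
build_service_type_patch() {
  local svc_type="$1"
  case "$svc_type" in
    NodePort|LoadBalancer)
      printf '{"spec":{"providerAPIServerServiceType":"%s"}}\n' "$svc_type"
      ;;
    *)
      echo "unsupported service type: $svc_type" >&2
      return 1
      ;;
  esac
}

# Read `oc get service` output on stdin and print the TYPE column
# of the ocs-provider-server row.
provider_service_type() {
  awk '$1 == "ocs-provider-server" { print $2 }'
}

# Usage against a live cluster (needs cluster access, not run here):
#   oc patch storagecluster ocs-storagecluster -n openshift-storage \
#     --type=merge -p "$(build_service_type_patch LoadBalancer)"
#   oc get service -n openshift-storage | provider_service_type
```

Rejecting unknown values on the client side skips the reconciler error from step 14, so step 14 itself still needs the invalid value applied directly to exercise the operator check.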
Additional info:
Link to the Jenkins job: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/27909/
OC version:
Client Version: 4.10.24
Server Version: 4.13.0-0.nightly-2023-08-07-165810
Kubernetes Version: v1.26.6+73ac561
OCS version:
ocs-operator.v4.13.2-rhodf OpenShift Container Storage 4.13.2-rhodf ocs-operator.v4.13.1-rhodf Installing
Cluster version:
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.13.0-0.nightly-2023-08-07-165810 True False 111m Cluster version is 4.13.0-0.nightly-2023-08-07-165810
Rook version:
rook: v4.13.2-0.b57f0c7db8116e754fc77b55825d7fd75c6f1aa3
go: go1.19.10
Ceph version:
ceph version 17.2.6-100.el9cp (ea4e3ef8df2cf26540aae06479df031dcfc80343) quincy (stable)
We found an issue with the new ocs 4.11.10-1 image. In the case of NodePort Service, the ocs operator pod keeps requiring. We need to fix this from the ocs-operator side, backport the fix and test it again.