Description of problem:

In ODF v4.10.z the provider service (ocs-provider-server) is of type NodePort, so a consumer can use any worker node IP with port 31659 as the storageProviderEndpoint to connect to the provider. In ODF v4.11.z this service is changed to type LoadBalancer, so consumers must update the storageProviderEndpoint to <load-balancer-hostname>:50051. Consumers cannot reach the provider until the storageProviderEndpoint is updated.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Install the ODF-MS Provider addon
2. Install the ODF-MS Consumer addon
3. Update the ODF version to v4.11.0 on both clusters

Actual results:
Consumer cannot connect with the Provider.

Expected results:
Consumer should be able to connect with the Provider.

Additional info:
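A quick way to compare the endpoint a consumer is configured with against the endpoint the upgraded provider advertises (a minimal sketch using plain oc commands; the spec/status field paths are assumptions based on the outputs later in this report):

# Consumer cluster: endpoint currently configured in the StorageCluster spec
$ oc -n openshift-storage get storagecluster ocs-storagecluster \
    -o jsonpath='{.spec.externalStorage.storageProviderEndpoint}{"\n"}'

# Provider cluster: endpoint advertised in the StorageCluster status
$ oc -n openshift-storage get storagecluster ocs-storagecluster \
    -o jsonpath='{.status.storageProviderEndpoint}{"\n"}'

# Provider cluster: external hostname assigned to the LoadBalancer service
$ oc -n openshift-storage get svc ocs-provider-server \
    -o jsonpath='{.status.loadBalancer.ingress[0].hostname}{"\n"}'

If the two endpoints differ, the consumer cannot reach the provider until its spec is updated.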
Background: Earlier, in 4.10, we were using a NodePort service, which was prone to failures, so we changed it to a LoadBalancer service. That creates another problem: as soon as the provider is upgraded, consumers lose the connection and someone has to change the endpoint on each consumer. To get out of this situation we are now creating both services, so consumers stay connected to the provider. This also gives consumers a good amount of time to update their endpoint to the LoadBalancer service. We will deprecate the NodePort service in the future.

Verification:
As soon as we upgrade the provider, a consumer should not lose the connection with the provider.
There should be two services on the provider: one of type NodePort and one of type LoadBalancer.
We should be able to connect to the provider after changing the endpoint (to the LoadBalancer one) in the consumer.
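A minimal sketch of checking these expectations after the provider upgrade, using the same commands that appear in the outputs below (service and resource names are the ones used throughout this report):

# Provider cluster: both provider services (NodePort and LoadBalancer) should exist
$ oc -n openshift-storage get svc | grep ocs-provider-server

# Consumer cluster: the external CephCluster should still report Connected / HEALTH_OK
$ oc -n openshift-storage get cephcluster
$ oc -n openshift-storage get storagecluster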
Tested upgrade to ODF 4.11.0. Provider storagecluster is not Ready after upgrade. BEFORE upgrading the provider and consumer to ODF 4.11: ODF 4.10.2-3 OCP 4.10.18 From provider cluster: $ oc -n openshift-storage get storagecluster -o yaml| grep -i Endpoint storageProviderEndpoint: 10.0.138.52:31659 $ oc get nodes -o wide NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME ip-10-0-134-89.ec2.internal Ready master 4h43m v1.23.5+3afdacb 10.0.134.89 <none> Red Hat Enterprise Linux CoreOS 410.84.202206080346-0 (Ootpa) 4.18.0-305.49.1.el8_4.x86_64 cri-o://1.23.3-3.rhaos4.10.git5fe1720.el8 ip-10-0-138-52.ec2.internal Ready worker 4h37m v1.23.5+3afdacb 10.0.138.52 <none> Red Hat Enterprise Linux CoreOS 410.84.202206080346-0 (Ootpa) 4.18.0-305.49.1.el8_4.x86_64 cri-o://1.23.3-3.rhaos4.10.git5fe1720.el8 ip-10-0-139-224.ec2.internal Ready infra,worker 4h26m v1.23.5+3afdacb 10.0.139.224 <none> Red Hat Enterprise Linux CoreOS 410.84.202206080346-0 (Ootpa) 4.18.0-305.49.1.el8_4.x86_64 cri-o://1.23.3-3.rhaos4.10.git5fe1720.el8 ip-10-0-150-162.ec2.internal Ready master 4h43m v1.23.5+3afdacb 10.0.150.162 <none> Red Hat Enterprise Linux CoreOS 410.84.202206080346-0 (Ootpa) 4.18.0-305.49.1.el8_4.x86_64 cri-o://1.23.3-3.rhaos4.10.git5fe1720.el8 ip-10-0-150-242.ec2.internal Ready infra,worker 4h26m v1.23.5+3afdacb 10.0.150.242 <none> Red Hat Enterprise Linux CoreOS 410.84.202206080346-0 (Ootpa) 4.18.0-305.49.1.el8_4.x86_64 cri-o://1.23.3-3.rhaos4.10.git5fe1720.el8 ip-10-0-155-183.ec2.internal Ready worker 4h37m v1.23.5+3afdacb 10.0.155.183 <none> Red Hat Enterprise Linux CoreOS 410.84.202206080346-0 (Ootpa) 4.18.0-305.49.1.el8_4.x86_64 cri-o://1.23.3-3.rhaos4.10.git5fe1720.el8 ip-10-0-161-252.ec2.internal Ready worker 4h37m v1.23.5+3afdacb 10.0.161.252 <none> Red Hat Enterprise Linux CoreOS 410.84.202206080346-0 (Ootpa) 4.18.0-305.49.1.el8_4.x86_64 cri-o://1.23.3-3.rhaos4.10.git5fe1720.el8 ip-10-0-164-146.ec2.internal Ready master 4h43m v1.23.5+3afdacb 10.0.164.146 <none> Red Hat Enterprise Linux CoreOS 410.84.202206080346-0 (Ootpa) 4.18.0-305.49.1.el8_4.x86_64 cri-o://1.23.3-3.rhaos4.10.git5fe1720.el8 ip-10-0-173-63.ec2.internal Ready infra,worker 4h26m v1.23.5+3afdacb 10.0.173.63 <none> Red Hat Enterprise Linux CoreOS 410.84.202206080346-0 (Ootpa) 4.18.0-305.49.1.el8_4.x86_64 cri-o://1.23.3-3.rhaos4.10.git5fe1720.el8 $ oc -n openshift-storage get pods -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES addon-ocs-provider-qe-catalog-7btmx 1/1 Running 0 4h17m 10.131.0.20 ip-10-0-138-52.ec2.internal <none> <none> alertmanager-managed-ocs-alertmanager-0 2/2 Running 0 4h17m 10.131.0.19 ip-10-0-138-52.ec2.internal <none> <none> alertmanager-managed-ocs-alertmanager-1 2/2 Running 0 4h14m 10.128.2.20 ip-10-0-155-183.ec2.internal <none> <none> alertmanager-managed-ocs-alertmanager-2 2/2 Running 0 4h14m 10.128.2.26 ip-10-0-155-183.ec2.internal <none> <none> csi-addons-controller-manager-b4495976c-b966j 2/2 Running 0 4h24m 10.131.0.9 ip-10-0-138-52.ec2.internal <none> <none> ocs-metrics-exporter-5dcf6f88df-m55j7 1/1 Running 0 4h14m 10.128.2.25 ip-10-0-155-183.ec2.internal <none> <none> ocs-operator-5985b8b5f4-zjpcw 1/1 Running 0 4h14m 10.128.2.13 ip-10-0-155-183.ec2.internal <none> <none> ocs-osd-controller-manager-688ddc4cfb-9xp2h 3/3 Running 0 4h17m 10.131.0.13 ip-10-0-138-52.ec2.internal <none> <none> ocs-provider-server-766b88c486-d2dmv 1/1 Running 0 4h14m 10.128.2.8 ip-10-0-155-183.ec2.internal <none> <none> odf-console-58f6b6f5bb-sx4b4 1/1 
Running 0 4h14m 10.128.2.12 ip-10-0-155-183.ec2.internal <none> <none> odf-operator-controller-manager-584df64f8-4zjm6 2/2 Running 0 4h17m 10.131.0.21 ip-10-0-138-52.ec2.internal <none> <none> prometheus-managed-ocs-prometheus-0 3/3 Running 0 4h17m 10.131.0.14 ip-10-0-138-52.ec2.internal <none> <none> prometheus-operator-8547cc9f89-ksfdt 1/1 Running 0 4h17m 10.131.0.16 ip-10-0-138-52.ec2.internal <none> <none> rook-ceph-crashcollector-ip-10-0-138-52.ec2.internal-5c9c6rzb2n 1/1 Running 0 4h21m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none> rook-ceph-crashcollector-ip-10-0-155-183.ec2.internal-8578jqp8r 1/1 Running 0 4h16m 10.0.155.183 ip-10-0-155-183.ec2.internal <none> <none> rook-ceph-crashcollector-ip-10-0-161-252.ec2.internal-779fqh62v 1/1 Running 0 4h12m 10.0.161.252 ip-10-0-161-252.ec2.internal <none> <none> rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-59b95ccfwjktw 2/2 Running 0 4h20m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none> rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-58995c4dznfb7 2/2 Running 0 4h14m 10.0.155.183 ip-10-0-155-183.ec2.internal <none> <none> rook-ceph-mgr-a-854875bc7c-bmgjt 2/2 Running 0 4h21m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none> rook-ceph-mon-a-5d5874b99-cp2kg 2/2 Running 0 4h14m 10.0.161.252 ip-10-0-161-252.ec2.internal <none> <none> rook-ceph-mon-b-684d77d5cf-hjl5q 2/2 Running 0 4h17m 10.0.155.183 ip-10-0-155-183.ec2.internal <none> <none> rook-ceph-mon-c-6d4cf68965-hx6nh 2/2 Running 0 4h23m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none> rook-ceph-operator-5678fcf74-9j8r6 1/1 Running 0 4h14m 10.128.2.24 ip-10-0-155-183.ec2.internal <none> <none> rook-ceph-osd-0-5c85bd4675-fhfr7 2/2 Running 0 4h14m 10.0.161.252 ip-10-0-161-252.ec2.internal <none> <none> rook-ceph-osd-1-dfcb97c8d-2mrd4 2/2 Running 0 4h14m 10.0.161.252 ip-10-0-161-252.ec2.internal <none> <none> rook-ceph-osd-10-57f4f48547-xlqq8 2/2 Running 0 4h20m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none> rook-ceph-osd-11-df6d566cd-r42gw 2/2 Running 0 4h20m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none> rook-ceph-osd-12-7d5ddf4566-dlhc6 2/2 Running 0 4h20m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none> rook-ceph-osd-13-dbf878d66-hkkqr 2/2 Running 0 4h20m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none> rook-ceph-osd-14-858657646-t4wmr 2/2 Running 0 4h20m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none> rook-ceph-osd-2-7795548cc8-dtvdm 2/2 Running 0 4h14m 10.0.161.252 ip-10-0-161-252.ec2.internal <none> <none> rook-ceph-osd-3-5c4896ff45-wsc6v 2/2 Running 0 4h14m 10.0.161.252 ip-10-0-161-252.ec2.internal <none> <none> rook-ceph-osd-4-7b59fd5b6b-nfgjm 2/2 Running 0 4h17m 10.0.155.183 ip-10-0-155-183.ec2.internal <none> <none> rook-ceph-osd-5-89d747ccd-z6hbn 2/2 Running 0 4h17m 10.0.155.183 ip-10-0-155-183.ec2.internal <none> <none> rook-ceph-osd-6-756ff79568-pqldc 2/2 Running 0 4h17m 10.0.155.183 ip-10-0-155-183.ec2.internal <none> <none> rook-ceph-osd-7-7546c6df7-gdw92 2/2 Running 0 4h17m 10.0.155.183 ip-10-0-155-183.ec2.internal <none> <none> rook-ceph-osd-8-74bf4c4954-fhgp9 2/2 Running 0 4h17m 10.0.155.183 ip-10-0-155-183.ec2.internal <none> <none> rook-ceph-osd-9-77568f5856-fgv59 2/2 Running 0 4h14m 10.0.161.252 ip-10-0-161-252.ec2.internal <none> <none> rook-ceph-osd-prepare-default-0-data-0hsbn5-hk4tf 0/1 Completed 0 4h21m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none> rook-ceph-osd-prepare-default-0-data-357khh-z89m7 0/1 Completed 0 4h21m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none> 
rook-ceph-osd-prepare-default-1-data-1mj7vb-nptmt 0/1 Completed 0 4h21m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none> rook-ceph-osd-prepare-default-1-data-46tzst-jbmf7 0/1 Completed 0 4h21m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none> rook-ceph-osd-prepare-default-2-data-2vphcr-dhs4v 0/1 Completed 0 4h21m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none> rook-ceph-tools-79ccc8ddc5-rggt4 1/1 Running 0 4h14m 10.0.155.183 ip-10-0-155-183.ec2.internal <none> <none> $ oc get csv NAME DISPLAY VERSION REPLACES PHASE mcg-operator.v4.10.4 NooBaa Operator 4.10.4 mcg-operator.v4.10.3 Succeeded ocs-operator.v4.10.2 OpenShift Container Storage 4.10.2 ocs-operator.v4.10.1 Succeeded ocs-osd-deployer.v2.0.2 OCS OSD Deployer 2.0.2 ocs-osd-deployer.v2.0.1 Succeeded odf-csi-addons-operator.v4.10.4 CSI Addons 4.10.4 odf-csi-addons-operator.v4.10.2 Succeeded odf-operator.v4.10.2 OpenShift Data Foundation 4.10.2 odf-operator.v4.10.1 Succeeded ose-prometheus-operator.4.10.0 Prometheus Operator 4.10.0 ose-prometheus-operator.4.8.0 Succeeded route-monitor-operator.v0.1.422-151be96 Route Monitor Operator 0.1.422-151be96 route-monitor-operator.v0.1.420-b65f47e Succeeded $ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.10.18 True False 5h5m Cluster version is 4.10.18 $ oc get storagecluster NAME AGE PHASE EXTERNAL CREATED AT VERSION ocs-storagecluster 4h29m Ready 2022-06-29T08:23:04Z $ oc get storagesystem NAME STORAGE-SYSTEM-KIND STORAGE-SYSTEM-NAME ocs-storagecluster-storagesystem storagecluster.ocs.openshift.io/v1 ocs-storagecluster $ oc get cephcluster NAME DATADIRHOSTPATH MONCOUNT AGE PHASE MESSAGE HEALTH EXTERNAL ocs-storagecluster-cephcluster /var/lib/rook 3 4h29m Ready Cluster created successfully HEALTH_OK $ oc get managedocs -o yaml apiVersion: v1 items: - apiVersion: ocs.openshift.io/v1alpha1 kind: ManagedOCS metadata: creationTimestamp: "2022-06-29T08:22:51Z" finalizers: - managedocs.ocs.openshift.io generation: 1 name: managedocs namespace: openshift-storage resourceVersion: "84251" uid: 2d609aa9-6fbd-49cd-9ab3-d3f41001ebb5 spec: {} status: components: alertmanager: state: Ready prometheus: state: Ready storageCluster: state: Ready reconcileStrategy: strict kind: List metadata: resourceVersion: "" selfLink: "" $ oc get svc NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE addon-ocs-provider-qe-catalog ClusterIP 172.30.92.188 <none> 50051/TCP 4h35m alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 4h33m csi-addons-controller-manager-metrics-service ClusterIP 172.30.96.99 <none> 8443/TCP 4h32m noobaa-operator-service ClusterIP 172.30.17.229 <none> 443/TCP 4h33m ocs-metrics-exporter ClusterIP 172.30.186.247 <none> 8080/TCP,8081/TCP 4h32m ocs-osd-controller-manager-metrics-service ClusterIP 172.30.238.148 <none> 8443/TCP 4h33m ocs-provider-server NodePort 172.30.117.57 <none> 50051:31659/TCP 4h33m odf-console-service ClusterIP 172.30.92.91 <none> 9001/TCP 4h33m odf-operator-controller-manager-metrics-service ClusterIP 172.30.254.88 <none> 8443/TCP 4h34m prometheus ClusterIP 172.30.201.202 <none> 9339/TCP 4h33m prometheus-operated ClusterIP None <none> 9090/TCP 4h33m rook-ceph-mgr ClusterIP 172.30.44.21 <none> 9283/TCP 4h27m $ oc get svc ocs-provider-server -o yaml apiVersion: v1 kind: Service metadata: annotations: service.alpha.openshift.io/serving-cert-signed-by: openshift-service-serving-signer@1656489985 service.beta.openshift.io/serving-cert-secret-name: ocs-provider-server-cert service.beta.openshift.io/serving-cert-signed-by: 
openshift-service-serving-signer@1656489985 creationTimestamp: "2022-06-29T08:23:05Z" name: ocs-provider-server namespace: openshift-storage ownerReferences: - apiVersion: ocs.openshift.io/v1 kind: StorageCluster name: ocs-storagecluster uid: 112f14d9-024e-4053-a117-04ac0fc348ee resourceVersion: "40080" uid: ae7338b2-8412-4369-a8eb-c2918945f3aa spec: clusterIP: 172.30.117.57 clusterIPs: - 172.30.117.57 externalTrafficPolicy: Cluster internalTrafficPolicy: Cluster ipFamilies: - IPv4 ipFamilyPolicy: SingleStack ports: - nodePort: 31659 port: 50051 protocol: TCP targetPort: ocs-provider selector: app: ocsProviderApiServer sessionAffinity: None type: NodePort status: loadBalancer: {} ------------------------------------------------------------------------------------- From consumer cluster: $ oc -n openshift-storage get storagecluster -o yaml| grep -i Endpoint storageProviderEndpoint: 10.0.138.52:31659 $ oc get nodes -o wide NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME ip-10-0-128-172.ec2.internal Ready infra,worker 3h4m v1.23.5+3afdacb 10.0.128.172 <none> Red Hat Enterprise Linux CoreOS 410.84.202206080346-0 (Ootpa) 4.18.0-305.49.1.el8_4.x86_64 cri-o://1.23.3-3.rhaos4.10.git5fe1720.el8 ip-10-0-140-201.ec2.internal Ready worker 3h18m v1.23.5+3afdacb 10.0.140.201 <none> Red Hat Enterprise Linux CoreOS 410.84.202206080346-0 (Ootpa) 4.18.0-305.49.1.el8_4.x86_64 cri-o://1.23.3-3.rhaos4.10.git5fe1720.el8 ip-10-0-142-150.ec2.internal Ready master 3h23m v1.23.5+3afdacb 10.0.142.150 <none> Red Hat Enterprise Linux CoreOS 410.84.202206080346-0 (Ootpa) 4.18.0-305.49.1.el8_4.x86_64 cri-o://1.23.3-3.rhaos4.10.git5fe1720.el8 ip-10-0-149-134.ec2.internal Ready infra,worker 3h4m v1.23.5+3afdacb 10.0.149.134 <none> Red Hat Enterprise Linux CoreOS 410.84.202206080346-0 (Ootpa) 4.18.0-305.49.1.el8_4.x86_64 cri-o://1.23.3-3.rhaos4.10.git5fe1720.el8 ip-10-0-150-132.ec2.internal Ready worker 3h17m v1.23.5+3afdacb 10.0.150.132 <none> Red Hat Enterprise Linux CoreOS 410.84.202206080346-0 (Ootpa) 4.18.0-305.49.1.el8_4.x86_64 cri-o://1.23.3-3.rhaos4.10.git5fe1720.el8 ip-10-0-158-115.ec2.internal Ready master 3h23m v1.23.5+3afdacb 10.0.158.115 <none> Red Hat Enterprise Linux CoreOS 410.84.202206080346-0 (Ootpa) 4.18.0-305.49.1.el8_4.x86_64 cri-o://1.23.3-3.rhaos4.10.git5fe1720.el8 ip-10-0-165-8.ec2.internal Ready master 3h24m v1.23.5+3afdacb 10.0.165.8 <none> Red Hat Enterprise Linux CoreOS 410.84.202206080346-0 (Ootpa) 4.18.0-305.49.1.el8_4.x86_64 cri-o://1.23.3-3.rhaos4.10.git5fe1720.el8 ip-10-0-173-10.ec2.internal Ready infra,worker 3h5m v1.23.5+3afdacb 10.0.173.10 <none> Red Hat Enterprise Linux CoreOS 410.84.202206080346-0 (Ootpa) 4.18.0-305.49.1.el8_4.x86_64 cri-o://1.23.3-3.rhaos4.10.git5fe1720.el8 ip-10-0-174-54.ec2.internal Ready worker 3h18m v1.23.5+3afdacb 10.0.174.54 <none> Red Hat Enterprise Linux CoreOS 410.84.202206080346-0 (Ootpa) 4.18.0-305.49.1.el8_4.x86_64 cri-o://1.23.3-3.rhaos4.10.git5fe1720.el8 $ oc get pods -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES 08aa0deaba969e7904ad889667c93cc277552a20b17685c3beb6e478fe572dd 0/1 Completed 0 3h4m 10.131.0.31 ip-10-0-174-54.ec2.internal <none> <none> 1dcc34b3a106d396dc409ff46b9d0db7fbac8634502fc768b71230b464vzxk8 0/1 Completed 0 3h4m 10.131.0.34 ip-10-0-174-54.ec2.internal <none> <none> 3bad4d15272db3fa9a7f04749a3b88f88091663dc8d1e7454b68a1c5e9jb6mw 0/1 Completed 0 3h4m 10.131.0.32 ip-10-0-174-54.ec2.internal <none> <none> 
6e9a6d05bebac324419c47259d443223a75858ee9e6eb87751b0ddb24b278gg 0/1 Completed 0 3h4m 10.131.0.33 ip-10-0-174-54.ec2.internal <none> <none> a0d6d7ea93ef0f905e0d25c9e9506251b53905213368078caad6aee4bc8p9vm 0/1 Completed 0 3h4m 10.131.0.35 ip-10-0-174-54.ec2.internal <none> <none> addon-ocs-consumer-qe-catalog-68v92 1/1 Running 0 3h4m 10.131.0.28 ip-10-0-174-54.ec2.internal <none> <none> alertmanager-managed-ocs-alertmanager-0 2/2 Running 0 3h3m 10.128.2.9 ip-10-0-140-201.ec2.internal <none> <none> alertmanager-managed-ocs-alertmanager-1 2/2 Running 0 3h3m 10.128.2.10 ip-10-0-140-201.ec2.internal <none> <none> alertmanager-managed-ocs-alertmanager-2 2/2 Running 0 3h3m 10.128.2.11 ip-10-0-140-201.ec2.internal <none> <none> csi-addons-controller-manager-b4495976c-fgtks 2/2 Running 0 3h1m 10.131.0.50 ip-10-0-174-54.ec2.internal <none> <none> csi-cephfsplugin-2v5xz 3/3 Running 0 3h2m 10.0.174.54 ip-10-0-174-54.ec2.internal <none> <none> csi-cephfsplugin-6gd74 3/3 Running 0 3h2m 10.0.140.201 ip-10-0-140-201.ec2.internal <none> <none> csi-cephfsplugin-d58fm 3/3 Running 3 3h2m 10.0.150.132 ip-10-0-150-132.ec2.internal <none> <none> csi-cephfsplugin-provisioner-599bbfcd9-5mkzd 6/6 Running 0 3h2m 10.128.2.16 ip-10-0-140-201.ec2.internal <none> <none> csi-cephfsplugin-provisioner-599bbfcd9-wfcv2 6/6 Running 0 3h2m 10.131.0.49 ip-10-0-174-54.ec2.internal <none> <none> csi-rbdplugin-5s42s 4/4 Running 0 3h2m 10.0.140.201 ip-10-0-140-201.ec2.internal <none> <none> csi-rbdplugin-cs8mn 4/4 Running 4 3h2m 10.0.150.132 ip-10-0-150-132.ec2.internal <none> <none> csi-rbdplugin-provisioner-86755fff69-cwd5f 7/7 Running 0 3h2m 10.128.2.15 ip-10-0-140-201.ec2.internal <none> <none> csi-rbdplugin-provisioner-86755fff69-jwhpw 7/7 Running 0 3h2m 10.131.0.48 ip-10-0-174-54.ec2.internal <none> <none> csi-rbdplugin-t86r5 4/4 Running 0 3h2m 10.0.174.54 ip-10-0-174-54.ec2.internal <none> <none> ocs-metrics-exporter-5dcf6f88df-gpqj6 1/1 Running 0 3h2m 10.128.2.13 ip-10-0-140-201.ec2.internal <none> <none> ocs-operator-5985b8b5f4-dlmr8 1/1 Running 0 3h2m 10.131.0.45 ip-10-0-174-54.ec2.internal <none> <none> ocs-osd-controller-manager-5bb548944-znb4v 3/3 Running 0 3h3m 10.131.0.37 ip-10-0-174-54.ec2.internal <none> <none> odf-console-58f6b6f5bb-jc8dd 1/1 Running 0 3h3m 10.131.0.43 ip-10-0-174-54.ec2.internal <none> <none> odf-operator-controller-manager-584df64f8-n5m52 2/2 Running 0 3h3m 10.131.0.36 ip-10-0-174-54.ec2.internal <none> <none> prometheus-managed-ocs-prometheus-0 3/3 Running 0 3h3m 10.128.2.8 ip-10-0-140-201.ec2.internal <none> <none> prometheus-operator-8547cc9f89-jvwwx 1/1 Running 0 3h3m 10.131.0.42 ip-10-0-174-54.ec2.internal <none> <none> redhat-operators-t5hqt 1/1 Running 0 3h4m 10.131.0.30 ip-10-0-174-54.ec2.internal <none> <none> rook-ceph-operator-5678fcf74-z2mvt 1/1 Running 0 3h2m 10.128.2.12 ip-10-0-140-201.ec2.internal <none> <none> rook-ceph-tools-7cfb87c645-vlffd 1/1 Running 0 162m 10.0.174.54 ip-10-0-174-54.ec2.internal <none> <none> $ oc get csv NAME DISPLAY VERSION REPLACES PHASE mcg-operator.v4.10.4 NooBaa Operator 4.10.4 mcg-operator.v4.10.3 Succeeded ocs-operator.v4.10.2 OpenShift Container Storage 4.10.2 ocs-operator.v4.10.1 Succeeded ocs-osd-deployer.v2.0.2 OCS OSD Deployer 2.0.2 ocs-osd-deployer.v2.0.1 Succeeded odf-csi-addons-operator.v4.10.4 CSI Addons 4.10.4 odf-csi-addons-operator.v4.10.2 Succeeded odf-operator.v4.10.2 OpenShift Data Foundation 4.10.2 odf-operator.v4.10.1 Succeeded ose-prometheus-operator.4.10.0 Prometheus Operator 4.10.0 ose-prometheus-operator.4.8.0 Succeeded 
route-monitor-operator.v0.1.422-151be96 Route Monitor Operator 0.1.422-151be96 route-monitor-operator.v0.1.420-b65f47e Succeeded $ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.10.18 True False 3h41m Cluster version is 4.10.18 $ oc get storagecluster NAME AGE PHASE EXTERNAL CREATED AT VERSION ocs-storagecluster 3h3m Ready true 2022-06-29T09:48:56Z $ oc get storagesystem NAME STORAGE-SYSTEM-KIND STORAGE-SYSTEM-NAME ocs-storagecluster-storagesystem storagecluster.ocs.openshift.io/v1 ocs-storagecluster $ oc get cephcluster NAME DATADIRHOSTPATH MONCOUNT AGE PHASE MESSAGE HEALTH EXTERNAL ocs-storagecluster-cephcluster 3h4m Connected Cluster connected successfully HEALTH_OK true $ oc get managedocs -o yaml apiVersion: v1 items: - apiVersion: ocs.openshift.io/v1alpha1 kind: ManagedOCS metadata: creationTimestamp: "2022-06-29T09:48:37Z" finalizers: - managedocs.ocs.openshift.io generation: 1 name: managedocs namespace: openshift-storage resourceVersion: "50222" uid: 9cbce1e2-9e61-4054-b54d-9cd70fb78175 spec: {} status: components: alertmanager: state: Ready prometheus: state: Ready storageCluster: state: Ready reconcileStrategy: strict kind: List metadata: resourceVersion: "" selfLink: "" $ oc get svc NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE addon-ocs-consumer-qe-catalog ClusterIP 172.30.123.223 <none> 50051/TCP 3h9m alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 3h8m csi-addons-controller-manager-metrics-service ClusterIP 172.30.156.17 <none> 8443/TCP 3h7m csi-cephfsplugin-metrics ClusterIP 172.30.167.58 <none> 8080/TCP,8081/TCP 3h7m csi-rbdplugin-metrics ClusterIP 172.30.242.248 <none> 8080/TCP,8081/TCP 3h7m noobaa-operator-service ClusterIP 172.30.107.87 <none> 443/TCP 3h8m ocs-metrics-exporter ClusterIP 172.30.127.9 <none> 8080/TCP,8081/TCP 3h7m ocs-osd-controller-manager-metrics-service ClusterIP 172.30.124.249 <none> 8443/TCP 3h9m odf-console-service ClusterIP 172.30.162.31 <none> 9001/TCP 3h8m odf-operator-controller-manager-metrics-service ClusterIP 172.30.146.98 <none> 8443/TCP 3h9m prometheus ClusterIP 172.30.28.30 <none> 9339/TCP 3h8m prometheus-operated ClusterIP None <none> 9090/TCP 3h8m redhat-operators ClusterIP 172.30.151.142 <none> 50051/TCP 3h9m rook-ceph-mgr-external ClusterIP 172.30.167.4 <none> 9283/TCP 3h7m ============================================================================================================= ============================================================================================================= AFTER upgrading the provider cluster to ODF 4.11 From provider cluster: $ oc logs ocs-operator-6c75d4bc49-7pk72 --tail 10 {"level":"info","ts":1656510972.439494,"logger":"controllers.StorageCluster","msg":"Service create/update succeeded","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"} {"level":"info","ts":1656510972.4395082,"logger":"controllers.StorageCluster","msg":"status.storageProviderEndpoint is updated","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster","Endpoint":"aae7338b284124369a8ebc2918945f3a-1835295265.us-east-1.elb.amazonaws.com:50051"} {"level":"error","ts":1656510972.4534578,"logger":"controllers.StorageCluster","msg":"Failed to create/update service","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster","Name":"ocs-provider-server-node-port-svc","error":"Service \"ocs-provider-server-node-port-svc\" is invalid: spec.ports[0].nodePort: Invalid value: 31659: provided port is already 
allocated","stacktrace":"github.com/red-hat-storage/ocs-operator/controllers/storagecluster.(*ocsProviderServer).ensureCreated\n\t/remote-source/app/controllers/storagecluster/provider_server.go:63\ngithub.com/red-hat-storage/ocs-operator/controllers/storagecluster.(*StorageClusterReconciler).reconcilePhases\n\t/remote-source/app/controllers/storagecluster/reconcile.go:411\ngithub.com/red-hat-storage/ocs-operator/controllers/storagecluster.(*StorageClusterReconciler).Reconcile\n\t/remote-source/app/controllers/storagecluster/reconcile.go:161\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"} {"level":"error","ts":1656510972.4705205,"logger":"controller.storagecluster","msg":"Reconciler error","reconciler group":"ocs.openshift.io","reconciler kind":"StorageCluster","name":"ocs-storagecluster","namespace":"openshift-storage","error":"Service \"ocs-provider-server-node-port-svc\" is invalid: spec.ports[0].nodePort: Invalid value: 31659: provided port is already allocated","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"} {"level":"info","ts":1656511034.934945,"logger":"controllers.StorageCluster","msg":"Reconciling StorageCluster.","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster","StorageCluster":{"name":"ocs-storagecluster","namespace":"openshift-storage"}} {"level":"info","ts":1656511034.9349763,"logger":"controllers.StorageCluster","msg":"Spec.AllowRemoteStorageConsumers is enabled. 
Creating Provider API resources","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"} {"level":"info","ts":1656511034.9423943,"logger":"controllers.StorageCluster","msg":"Service create/update succeeded","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"} {"level":"info","ts":1656511034.94241,"logger":"controllers.StorageCluster","msg":"status.storageProviderEndpoint is updated","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster","Endpoint":"aae7338b284124369a8ebc2918945f3a-1835295265.us-east-1.elb.amazonaws.com:50051"} {"level":"error","ts":1656511034.9563773,"logger":"controllers.StorageCluster","msg":"Failed to create/update service","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster","Name":"ocs-provider-server-node-port-svc","error":"Service \"ocs-provider-server-node-port-svc\" is invalid: spec.ports[0].nodePort: Invalid value: 31659: provided port is already allocated","stacktrace":"github.com/red-hat-storage/ocs-operator/controllers/storagecluster.(*ocsProviderServer).ensureCreated\n\t/remote-source/app/controllers/storagecluster/provider_server.go:63\ngithub.com/red-hat-storage/ocs-operator/controllers/storagecluster.(*StorageClusterReconciler).reconcilePhases\n\t/remote-source/app/controllers/storagecluster/reconcile.go:411\ngithub.com/red-hat-storage/ocs-operator/controllers/storagecluster.(*StorageClusterReconciler).Reconcile\n\t/remote-source/app/controllers/storagecluster/reconcile.go:161\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"} {"level":"error","ts":1656511034.9755225,"logger":"controller.storagecluster","msg":"Reconciler error","reconciler group":"ocs.openshift.io","reconciler kind":"StorageCluster","name":"ocs-storagecluster","namespace":"openshift-storage","error":"Service \"ocs-provider-server-node-port-svc\" is invalid: spec.ports[0].nodePort: Invalid value: 31659: provided port is already allocated","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"} $ oc get csv NAME DISPLAY VERSION REPLACES PHASE mcg-operator.v4.11.0 NooBaa Operator 4.11.0 mcg-operator.v4.10.4 Succeeded ocs-operator.v4.11.0 OpenShift Container Storage 4.11.0 ocs-operator.v4.10.4 Succeeded ocs-osd-deployer.v2.0.2 OCS OSD Deployer 2.0.2 ocs-osd-deployer.v2.0.1 Installing odf-csi-addons-operator.v4.11.0 CSI Addons 4.11.0 odf-csi-addons-operator.v4.10.4 Succeeded odf-operator.v4.11.0 OpenShift Data Foundation 4.11.0 odf-operator.v4.10.2 Succeeded ose-prometheus-operator.4.10.0 
Prometheus Operator 4.10.0 ose-prometheus-operator.4.8.0 Succeeded route-monitor-operator.v0.1.422-151be96 Route Monitor Operator 0.1.422-151be96 route-monitor-operator.v0.1.420-b65f47e Succeeded $ oc get storagecluster NAME AGE PHASE EXTERNAL CREATED AT VERSION ocs-storagecluster 5h38m Error 2022-06-29T08:23:04Z $ oc get csv odf-operator.v4.11.0 -o yaml | grep full_version full_version: 4.11.0-107 $ oc get pods -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES addon-ocs-provider-qe-catalog-mztvf 1/1 Running 0 62m 10.129.2.33 ip-10-0-161-252.ec2.internal <none> <none> alertmanager-managed-ocs-alertmanager-0 2/2 Running 0 5h30m 10.131.0.19 ip-10-0-138-52.ec2.internal <none> <none> alertmanager-managed-ocs-alertmanager-1 2/2 Running 0 5h28m 10.128.2.20 ip-10-0-155-183.ec2.internal <none> <none> alertmanager-managed-ocs-alertmanager-2 2/2 Running 0 5h28m 10.128.2.26 ip-10-0-155-183.ec2.internal <none> <none> csi-addons-controller-manager-6bdb87bf84-kqmnw 2/2 Running 0 24m 10.129.2.62 ip-10-0-161-252.ec2.internal <none> <none> ocs-metrics-exporter-599b56c475-vvvk9 1/1 Running 0 23m 10.129.2.69 ip-10-0-161-252.ec2.internal <none> <none> ocs-operator-6c75d4bc49-7pk72 1/1 Running 0 24m 10.129.2.65 ip-10-0-161-252.ec2.internal <none> <none> ocs-osd-controller-manager-6cbb8889fc-6c6qg 2/3 Running 0 28m 10.129.2.38 ip-10-0-161-252.ec2.internal <none> <none> ocs-provider-server-6fff49c89c-l748v 1/1 Running 0 26m 10.129.2.48 ip-10-0-161-252.ec2.internal <none> <none> odf-console-6f84b6444c-22xlb 1/1 Running 0 27m 10.129.2.42 ip-10-0-161-252.ec2.internal <none> <none> odf-operator-controller-manager-5d975d6485-lch8k 2/2 Running 0 27m 10.129.2.41 ip-10-0-161-252.ec2.internal <none> <none> prometheus-managed-ocs-prometheus-0 3/3 Running 0 5h30m 10.131.0.14 ip-10-0-138-52.ec2.internal <none> <none> prometheus-operator-8547cc9f89-ksfdt 1/1 Running 0 5h30m 10.131.0.16 ip-10-0-138-52.ec2.internal <none> <none> rook-ceph-crashcollector-ip-10-0-138-52.ec2.internal-7cdcbtdkn9 1/1 Running 0 23m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none> rook-ceph-crashcollector-ip-10-0-155-183.ec2.internal-7cd8jcv4l 1/1 Running 0 23m 10.0.155.183 ip-10-0-155-183.ec2.internal <none> <none> rook-ceph-crashcollector-ip-10-0-161-252.ec2.internal-56c7n224z 1/1 Running 0 23m 10.0.161.252 ip-10-0-161-252.ec2.internal <none> <none> rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-6fdd776dcrrxz 2/2 Running 0 21m 10.0.161.252 ip-10-0-161-252.ec2.internal <none> <none> rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-78994f45bgz2g 2/2 Running 0 21m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none> rook-ceph-mgr-a-656d959cc7-zrtqb 2/2 Running 0 20m 10.0.155.183 ip-10-0-155-183.ec2.internal <none> <none> rook-ceph-mon-a-644fbf869b-kbjmk 2/2 Running 0 21m 10.0.161.252 ip-10-0-161-252.ec2.internal <none> <none> rook-ceph-mon-b-6795596bc6-665s6 2/2 Running 0 22m 10.0.155.183 ip-10-0-155-183.ec2.internal <none> <none> rook-ceph-mon-c-74669f8648-sq724 2/2 Running 0 22m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none> rook-ceph-operator-94b7f48d-c67mx 1/1 Running 0 24m 10.129.2.64 ip-10-0-161-252.ec2.internal <none> <none> rook-ceph-osd-0-85dd9c45fc-zgskb 2/2 Running 0 19m 10.0.161.252 ip-10-0-161-252.ec2.internal <none> <none> rook-ceph-osd-1-587b76f58-vfhtj 2/2 Running 0 19m 10.0.161.252 ip-10-0-161-252.ec2.internal <none> <none> rook-ceph-osd-10-76447fb585-xv869 2/2 Running 0 19m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none> rook-ceph-osd-11-5d6bb4f9b-tqt47 2/2 Running 0 19m 10.0.138.52 
ip-10-0-138-52.ec2.internal <none> <none> rook-ceph-osd-12-7d8f877cdd-znjpw 2/2 Running 0 19m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none> rook-ceph-osd-13-869dc74c84-wvf6j 2/2 Running 0 19m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none> rook-ceph-osd-14-557c496d88-985bz 2/2 Running 0 19m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none> rook-ceph-osd-2-5769747688-bqjhw 2/2 Running 0 19m 10.0.161.252 ip-10-0-161-252.ec2.internal <none> <none> rook-ceph-osd-3-64cf6cd755-lvcsx 2/2 Running 0 19m 10.0.161.252 ip-10-0-161-252.ec2.internal <none> <none> rook-ceph-osd-4-7544d7565b-x9nt4 2/2 Running 0 18m 10.0.155.183 ip-10-0-155-183.ec2.internal <none> <none> rook-ceph-osd-5-5f6f876d7b-brsqn 2/2 Running 0 18m 10.0.155.183 ip-10-0-155-183.ec2.internal <none> <none> rook-ceph-osd-6-7b9c8d869d-5xkkm 2/2 Running 0 18m 10.0.155.183 ip-10-0-155-183.ec2.internal <none> <none> rook-ceph-osd-7-6d46bd6c95-r7ff8 2/2 Running 0 18m 10.0.155.183 ip-10-0-155-183.ec2.internal <none> <none> rook-ceph-osd-8-84df8c57b-zjjlt 2/2 Running 0 18m 10.0.155.183 ip-10-0-155-183.ec2.internal <none> <none> rook-ceph-osd-9-5d78db49db-jlgt7 2/2 Running 0 19m 10.0.161.252 ip-10-0-161-252.ec2.internal <none> <none> rook-ceph-osd-prepare-default-0-data-0hsbn5-hk4tf 0/1 Completed 0 5h34m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none> rook-ceph-osd-prepare-default-0-data-357khh-z89m7 0/1 Completed 0 5h34m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none> rook-ceph-osd-prepare-default-1-data-1mj7vb-nptmt 0/1 Completed 0 5h34m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none> rook-ceph-osd-prepare-default-1-data-46tzst-jbmf7 0/1 Completed 0 5h34m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none> rook-ceph-osd-prepare-default-2-data-2vphcr-dhs4v 0/1 Completed 0 5h34m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none> rook-ceph-tools-75c98bc644-hxgt8 1/1 Running 0 23m 10.129.2.67 ip-10-0-161-252.ec2.internal <none> <none> $ oc get svc NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE addon-ocs-provider-qe-catalog ClusterIP 172.30.92.188 <none> 50051/TCP 5h41m alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 5h40m csi-addons-controller-manager-metrics-service ClusterIP 172.30.96.99 <none> 8443/TCP 5h39m ocs-metrics-exporter ClusterIP 172.30.186.247 <none> 8080/TCP,8081/TCP 5h39m ocs-osd-controller-manager-metrics-service ClusterIP 172.30.238.148 <none> 8443/TCP 5h40m ocs-provider-server LoadBalancer 172.30.117.57 aae7338b284124369a8ebc2918945f3a-1835295265.us-east-1.elb.amazonaws.com 50051:31659/TCP 5h40m odf-console-service ClusterIP 172.30.92.91 <none> 9001/TCP 5h40m odf-operator-controller-manager-metrics-service ClusterIP 172.30.254.88 <none> 8443/TCP 5h41m prometheus ClusterIP 172.30.201.202 <none> 9339/TCP 5h40m prometheus-operated ClusterIP None <none> 9090/TCP 5h40m rook-ceph-mgr ClusterIP 172.30.44.21 <none> 9283/TCP 5h34m $ oc get svc ocs-provider-server -o yaml apiVersion: v1 kind: Service metadata: annotations: service.alpha.openshift.io/serving-cert-signed-by: openshift-service-serving-signer@1656489985 service.beta.openshift.io/serving-cert-secret-name: ocs-provider-server-cert service.beta.openshift.io/serving-cert-signed-by: openshift-service-serving-signer@1656489985 creationTimestamp: "2022-06-29T08:23:05Z" finalizers: - service.kubernetes.io/load-balancer-cleanup name: ocs-provider-server namespace: openshift-storage ownerReferences: - apiVersion: ocs.openshift.io/v1 kind: StorageCluster name: ocs-storagecluster uid: 112f14d9-024e-4053-a117-04ac0fc348ee 
resourceVersion: "309780" uid: ae7338b2-8412-4369-a8eb-c2918945f3aa spec: allocateLoadBalancerNodePorts: true clusterIP: 172.30.117.57 clusterIPs: - 172.30.117.57 externalTrafficPolicy: Cluster internalTrafficPolicy: Cluster ipFamilies: - IPv4 ipFamilyPolicy: SingleStack ports: - nodePort: 31659 port: 50051 protocol: TCP targetPort: ocs-provider selector: app: ocsProviderApiServer sessionAffinity: None type: LoadBalancer status: loadBalancer: ingress: - hostname: aae7338b284124369a8ebc2918945f3a-1835295265.us-east-1.elb.amazonaws.com $ oc -n openshift-storage get storagecluster -o yaml| grep -i Endpoint storageProviderEndpoint: aae7338b284124369a8ebc2918945f3a-1835295265.us-east-1.elb.amazonaws.com:50051 $ oc get managedocs managedocs -o yaml apiVersion: ocs.openshift.io/v1alpha1 kind: ManagedOCS metadata: creationTimestamp: "2022-06-29T08:22:51Z" finalizers: - managedocs.ocs.openshift.io generation: 1 name: managedocs namespace: openshift-storage resourceVersion: "309802" uid: 2d609aa9-6fbd-49cd-9ab3-d3f41001ebb5 spec: {} status: components: alertmanager: state: Ready prometheus: state: Ready storageCluster: state: Pending reconcileStrategy: strict ----------------------------------------------------------------------------------------------------------- From consumer cluster: (consumer is not upgraded to ODF 4.11) $ oc -n openshift-storage get storagecluster -o yaml| grep -i Endpoint storageProviderEndpoint: 10.0.138.52:31659 $ oc get storagecluster NAME AGE PHASE EXTERNAL CREATED AT VERSION ocs-storagecluster 4h26m Progressing true 2022-06-29T09:48:56Z $ oc get csv NAME DISPLAY VERSION REPLACES PHASE mcg-operator.v4.10.4 NooBaa Operator 4.10.4 mcg-operator.v4.10.3 Succeeded ocs-operator.v4.10.2 OpenShift Container Storage 4.10.2 ocs-operator.v4.10.1 Succeeded ocs-osd-deployer.v2.0.2 OCS OSD Deployer 2.0.2 ocs-osd-deployer.v2.0.1 Installing odf-csi-addons-operator.v4.10.4 CSI Addons 4.10.4 odf-csi-addons-operator.v4.10.2 Succeeded odf-operator.v4.10.2 OpenShift Data Foundation 4.10.2 odf-operator.v4.10.1 Succeeded ose-prometheus-operator.4.10.0 Prometheus Operator 4.10.0 ose-prometheus-operator.4.8.0 Succeeded route-monitor-operator.v0.1.422-151be96 Route Monitor Operator 0.1.422-151be96 route-monitor-operator.v0.1.420-b65f47e Succeeded $ oc get pods -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES 08aa0deaba969e7904ad889667c93cc277552a20b17685c3beb6e478fe572dd 0/1 Completed 0 4h28m 10.131.0.31 ip-10-0-174-54.ec2.internal <none> <none> 1dcc34b3a106d396dc409ff46b9d0db7fbac8634502fc768b71230b464vzxk8 0/1 Completed 0 4h28m 10.131.0.34 ip-10-0-174-54.ec2.internal <none> <none> 3bad4d15272db3fa9a7f04749a3b88f88091663dc8d1e7454b68a1c5e9jb6mw 0/1 Completed 0 4h28m 10.131.0.32 ip-10-0-174-54.ec2.internal <none> <none> 6e9a6d05bebac324419c47259d443223a75858ee9e6eb87751b0ddb24b278gg 0/1 Completed 0 4h28m 10.131.0.33 ip-10-0-174-54.ec2.internal <none> <none> a0d6d7ea93ef0f905e0d25c9e9506251b53905213368078caad6aee4bc8p9vm 0/1 Completed 0 4h28m 10.131.0.35 ip-10-0-174-54.ec2.internal <none> <none> addon-ocs-consumer-qe-catalog-4sbcj 1/1 Running 0 72m 10.129.2.55 ip-10-0-150-132.ec2.internal <none> <none> alertmanager-managed-ocs-alertmanager-0 2/2 Running 0 4h27m 10.128.2.9 ip-10-0-140-201.ec2.internal <none> <none> alertmanager-managed-ocs-alertmanager-1 2/2 Running 0 4h27m 10.128.2.10 ip-10-0-140-201.ec2.internal <none> <none> alertmanager-managed-ocs-alertmanager-2 2/2 Running 0 4h27m 10.128.2.11 ip-10-0-140-201.ec2.internal <none> <none> 
csi-addons-controller-manager-b4495976c-fgtks 2/2 Running 0 4h25m 10.131.0.50 ip-10-0-174-54.ec2.internal <none> <none> csi-cephfsplugin-2v5xz 3/3 Running 0 4h26m 10.0.174.54 ip-10-0-174-54.ec2.internal <none> <none> csi-cephfsplugin-6gd74 3/3 Running 0 4h26m 10.0.140.201 ip-10-0-140-201.ec2.internal <none> <none> csi-cephfsplugin-d58fm 3/3 Running 3 4h26m 10.0.150.132 ip-10-0-150-132.ec2.internal <none> <none> csi-cephfsplugin-provisioner-599bbfcd9-5mkzd 6/6 Running 0 4h26m 10.128.2.16 ip-10-0-140-201.ec2.internal <none> <none> csi-cephfsplugin-provisioner-599bbfcd9-wfcv2 6/6 Running 0 4h26m 10.131.0.49 ip-10-0-174-54.ec2.internal <none> <none> csi-rbdplugin-5s42s 4/4 Running 0 4h26m 10.0.140.201 ip-10-0-140-201.ec2.internal <none> <none> csi-rbdplugin-cs8mn 4/4 Running 4 4h26m 10.0.150.132 ip-10-0-150-132.ec2.internal <none> <none> csi-rbdplugin-provisioner-86755fff69-cwd5f 7/7 Running 0 4h26m 10.128.2.15 ip-10-0-140-201.ec2.internal <none> <none> csi-rbdplugin-provisioner-86755fff69-jwhpw 7/7 Running 0 4h26m 10.131.0.48 ip-10-0-174-54.ec2.internal <none> <none> csi-rbdplugin-t86r5 4/4 Running 0 4h26m 10.0.174.54 ip-10-0-174-54.ec2.internal <none> <none> ocs-metrics-exporter-5dcf6f88df-gpqj6 1/1 Running 0 4h27m 10.128.2.13 ip-10-0-140-201.ec2.internal <none> <none> ocs-operator-5985b8b5f4-dlmr8 1/1 Running 0 4h26m 10.131.0.45 ip-10-0-174-54.ec2.internal <none> <none> ocs-osd-controller-manager-5bb548944-znb4v 2/3 Running 0 4h27m 10.131.0.37 ip-10-0-174-54.ec2.internal <none> <none> odf-console-58f6b6f5bb-jc8dd 1/1 Running 0 4h27m 10.131.0.43 ip-10-0-174-54.ec2.internal <none> <none> odf-operator-controller-manager-584df64f8-n5m52 2/2 Running 0 4h27m 10.131.0.36 ip-10-0-174-54.ec2.internal <none> <none> prometheus-managed-ocs-prometheus-0 3/3 Running 0 4h27m 10.128.2.8 ip-10-0-140-201.ec2.internal <none> <none> prometheus-operator-8547cc9f89-jvwwx 1/1 Running 0 4h27m 10.131.0.42 ip-10-0-174-54.ec2.internal <none> <none> redhat-operators-t5hqt 1/1 Running 0 4h28m 10.131.0.30 ip-10-0-174-54.ec2.internal <none> <none> rook-ceph-operator-5678fcf74-z2mvt 1/1 Running 0 4h27m 10.128.2.12 ip-10-0-140-201.ec2.internal <none> <none> rook-ceph-tools-7cfb87c645-vlffd 1/1 Running 0 4h6m 10.0.174.54 ip-10-0-174-54.ec2.internal <none> <none> $ oc get managedocs managedocs -o yaml apiVersion: ocs.openshift.io/v1alpha1 kind: ManagedOCS metadata: creationTimestamp: "2022-06-29T09:48:37Z" finalizers: - managedocs.ocs.openshift.io generation: 1 name: managedocs namespace: openshift-storage resourceVersion: "265394" uid: 9cbce1e2-9e61-4054-b54d-9cd70fb78175 spec: {} status: components: alertmanager: state: Ready prometheus: state: Ready storageCluster: state: Pending reconcileStrategy: strict $ oc logs ocs-operator-5985b8b5f4-dlmr8 --tail 2 {"level":"info","ts":1656512483.6247618,"logger":"controllers.StorageCluster","msg":"Reconciling external StorageCluster.","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster","StorageCluster":"openshift-storage/ocs-storagecluster"} {"level":"error","ts":1656512483.6407418,"logger":"controllers.StorageCluster","msg":"External-OCS:GetStorageConfig:StorageConsumer is not ready yet. 
Will requeue after 5 second","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster","error":"rpc error: code = Unavailable desc = waiting for the rook resources to be provisioned","stacktrace":"github.com/red-hat-storage/ocs-operator/controllers/storagecluster.(*StorageClusterReconciler).getExternalConfigFromProvider\n\t/remote-source/app/controllers/storagecluster/external_ocs.go:158\ngithub.com/red-hat-storage/ocs-operator/controllers/storagecluster.(*ocsExternalResources).ensureCreated\n\t/remote-source/app/controllers/storagecluster/external_resources.go:257\ngithub.com/red-hat-storage/ocs-operator/controllers/storagecluster.(*StorageClusterReconciler).reconcilePhases\n\t/remote-source/app/controllers/storagecluster/reconcile.go:402\ngithub.com/red-hat-storage/ocs-operator/controllers/storagecluster.(*StorageClusterReconciler).Reconcile\n\t/remote-source/app/controllers/storagecluster/reconcile.go:161\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214"} Must-gather collected after upgrading provider cluster to ODF 4.11: Provider cluster must-gather - http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-j29-pr/jijoy-j29-pr_20220629T074927/logs/testcases_1656511818/ Consumer cluster must-gather - http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-j29-c1/jijoy-j29-c1_20220629T091015/logs/testcases_1656511889/
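For context on the "provided port is already allocated" errors in the operator log above: the upgraded ocs-provider-server LoadBalancer service keeps nodePort 31659 (see its spec earlier in this comment), so the additional NodePort service the operator tries to create is rejected by the API server. A hypothetical sketch of the conflicting Service, with the name taken from the error message and the ports/selector from the existing service:

# Sketch only: roughly what the operator tries to create; creation fails because
# nodePort 31659 is already held by the existing ocs-provider-server service.
apiVersion: v1
kind: Service
metadata:
  name: ocs-provider-server-node-port-svc
  namespace: openshift-storage
spec:
  type: NodePort
  selector:
    app: ocsProviderApiServer
  ports:
  - port: 50051
    targetPort: ocs-provider
    nodePort: 31659   # conflicts with the nodePort kept by the LoadBalancer service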
Verification steps with the new fix:

As soon as we upgrade the provider, a consumer should not lose the connection with the provider.
There should be only one service, of type LoadBalancer. The same service should still be usable for node-port access through the worker nodes, in addition to the new load-balancer endpoint.
We should be able to connect to the provider after changing the endpoint (to the LoadBalancer one) in the consumer.
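A minimal sketch of these checks from a workstation with access to both clusters (the plain TCP checks with nc are an assumption about available tooling; any TCP connectivity test works):

# Provider: only one provider service should exist, of type LoadBalancer
$ oc -n openshift-storage get svc ocs-provider-server

# Node-port style access should still work through any worker node
$ oc get nodes -o wide              # pick a worker INTERNAL-IP
$ nc -vz <worker-internal-ip> 31659

# Load-balancer access on port 50051
$ nc -vz <load-balancer-hostname> 50051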
(In reply to Nitin Goyal from comment #5)
> Verification steps with the new fix:
> 
> As soon as we upgrade the provider, a consumer should not lose the
> connection with the provider.

Verified this after upgrading the provider cluster from ODF 4.10.4 to 4.11.0-113.

> There should be only one service with the load balancer type. We should be
> able to use this service for node port access as well via using the worker
> nodes also with the new load balancer endpoint.

> We should be able to connect to the provider after changing the EP (load
> balancer) in the consumer.

Hi Nitin,
Is this an automatic process?
Verified in version: ODF 4.11.0-113 OCP 4.10.20 ocs-osd-deployer.v2.0.3 Upgraded provider and consumer cluster successfully from ODF 4.10.4 to 4.11.0-113 Before upgrading the provider custer: $ oc get svc NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE addon-ocs-provider-qe-catalog ClusterIP 172.30.45.237 <none> 50051/TCP 5h32m alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 5h30m csi-addons-controller-manager-metrics-service ClusterIP 172.30.2.154 <none> 8443/TCP 5h31m noobaa-operator-service ClusterIP 172.30.212.211 <none> 443/TCP 5h31m ocs-metrics-exporter ClusterIP 172.30.190.198 <none> 8080/TCP,8081/TCP 5h30m ocs-osd-controller-manager-metrics-service ClusterIP 172.30.156.38 <none> 8443/TCP 5h31m ocs-provider-server NodePort 172.30.232.83 <none> 50051:31659/TCP 5h30m odf-console-service ClusterIP 172.30.187.160 <none> 9001/TCP 5h31m odf-operator-controller-manager-metrics-service ClusterIP 172.30.112.169 <none> 8443/TCP 5h31m prometheus ClusterIP 172.30.184.104 <none> 9339/TCP 5h30m prometheus-operated ClusterIP None <none> 9090/TCP 5h30m rook-ceph-mgr ClusterIP 172.30.79.251 <none> 9283/TCP 5h24m After upgrading the provider cluster: $ oc get svc NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE addon-ocs-provider-qe-catalog ClusterIP 172.30.45.237 <none> 50051/TCP 5h52m alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 5h50m csi-addons-controller-manager-metrics-service ClusterIP 172.30.2.154 <none> 8443/TCP 5h52m ocs-metrics-exporter ClusterIP 172.30.190.198 <none> 8080/TCP,8081/TCP 5h50m ocs-osd-controller-manager-metrics-service ClusterIP 172.30.156.38 <none> 8443/TCP 5h51m ocs-provider-server LoadBalancer 172.30.232.83 ab5f87e5fef9e44f89b12eab3be358c2-1799690877.us-east-1.elb.amazonaws.com 50051:31659/TCP 5h50m odf-console-service ClusterIP 172.30.187.160 <none> 9001/TCP 5h51m odf-operator-controller-manager-metrics-service ClusterIP 172.30.112.169 <none> 8443/TCP 5h51m prometheus ClusterIP 172.30.184.104 <none> 9339/TCP 5h50m prometheus-operated ClusterIP None <none> 9090/TCP 5h50m rook-ceph-mgr ClusterIP 172.30.79.251 <none> 9283/TCP 5h44m From provider cluster after upgrade: $ oc get storagecluster NAME AGE PHASE EXTERNAL CREATED AT VERSION ocs-storagecluster 9h Ready 2022-07-13T05:13:47Z $ oc get csv NAME DISPLAY VERSION REPLACES PHASE mcg-operator.v4.11.0 NooBaa Operator 4.11.0 mcg-operator.v4.10.4 Succeeded ocs-operator.v4.11.0 OpenShift Container Storage 4.11.0 ocs-operator.v4.10.4 Succeeded ocs-osd-deployer.v2.0.3 OCS OSD Deployer 2.0.3 ocs-osd-deployer.v2.0.2 Succeeded odf-csi-addons-operator.v4.11.0 CSI Addons 4.11.0 odf-csi-addons-operator.v4.10.4 Succeeded odf-operator.v4.11.0 OpenShift Data Foundation 4.11.0 odf-operator.v4.10.4 Succeeded ose-prometheus-operator.4.10.0 Prometheus Operator 4.10.0 ose-prometheus-operator.4.8.0 Succeeded route-monitor-operator.v0.1.422-151be96 Route Monitor Operator 0.1.422-151be96 route-monitor-operator.v0.1.420-b65f47e Succeeded From consumer cluster after upgrade: $ oc get storagecluster NAME AGE PHASE EXTERNAL CREATED AT VERSION ocs-storagecluster 5h50m Ready true 2022-07-13T08:59:25Z $ oc get csv NAME DISPLAY VERSION REPLACES PHASE mcg-operator.v4.11.0 NooBaa Operator 4.11.0 mcg-operator.v4.10.4 Succeeded ocs-operator.v4.11.0 OpenShift Container Storage 4.11.0 ocs-operator.v4.10.4 Succeeded ocs-osd-deployer.v2.0.3 OCS OSD Deployer 2.0.3 ocs-osd-deployer.v2.0.2 Succeeded odf-csi-addons-operator.v4.11.0 CSI Addons 4.11.0 odf-csi-addons-operator.v4.10.4 Succeeded odf-operator.v4.11.0 OpenShift Data 
Foundation 4.11.0 odf-operator.v4.10.4 Succeeded ose-prometheus-operator.4.10.0 Prometheus Operator 4.10.0 ose-prometheus-operator.4.8.0 Succeeded route-monitor-operator.v0.1.422-151be96 Route Monitor Operator 0.1.422-151be96 route-monitor-operator.v0.1.420-b65f47e Succeeded Adding must-gather logs for reference: Must gather before upgrading provider and consumer cluster from ODF 4.10.4 to 4.11.0-113: Consumer http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-j13-c3/jijoy-j13-c3_20220713T081317/logs/testcases_1657705862/ Provider http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-j13-pr/jijoy-j13-pr_20220713T043423/logs/testcases_1657705913/ Provider upgrade test report: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-j13-pr/jijoy-j13-pr_20220713T043423/logs/test_report_1657709000.html Test run to create PVCs and pod after upgrading provider cluster and before upgrading consumer cluster: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-j13-c3/jijoy-j13-c3_20220713T081317/logs/test_report_1657710514.html Must gather logs collected after upgrading the provider cluster to ODF 4.11.0-113: Consumer http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-j13-c3/jijoy-j13-c3_20220713T081317/logs/testcases_1657715380/ Provider http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-j13-pr/jijoy-j13-pr_20220713T043423/logs/testcases_1657715387/ Consumer upgrade test report: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-j13-c3/jijoy-j13-c3_20220713T081317/logs/test_report_1657717959.html Must gather logs collected after upgrading the consumer cluster to ODF 4.11.0-113: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-j13-c3/jijoy-j13-c3_20220713T081317/logs/testcases_1657721461/
(In reply to Jilju Joy from comment #6)
> (In reply to Nitin Goyal from comment #5)
> > Verification steps with the new fix:
> > 
> > As soon as we upgrade the provider, a consumer should not lose the
> > connection with the provider.
> Verified this after upgrading the provider cluster ODF 4.10.4 to 4.11.0-113.
> 
> > There should be only one service with the load balancer type. We should be
> > able to use this service for node port access as well via using the worker
> > nodes also with the new load balancer endpoint.
> 
> > We should be able to connect to the provider after changing the EP (load
> > balancer) in the consumer.
> Hi Nitin,
> Is this an automatic process ?

No, this is a manual process. The StorageCluster CR endpoint needs to be changed manually on the consumer cluster. You can get the new endpoint from the StorageCluster CR status on the provider cluster.
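A minimal sketch of that manual step (the spec path under externalStorage on the consumer's StorageCluster is an assumption; verify it against the CR before patching):

# Provider cluster: read the new endpoint from the StorageCluster status
$ oc -n openshift-storage get storagecluster ocs-storagecluster \
    -o jsonpath='{.status.storageProviderEndpoint}{"\n"}'

# Consumer cluster: point the StorageCluster at the new load-balancer endpoint
$ oc -n openshift-storage patch storagecluster ocs-storagecluster --type merge \
    -p '{"spec":{"externalStorage":{"storageProviderEndpoint":"<lb-hostname>:50051"}}}'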
(In reply to Nitin Goyal from comment #8)
> (In reply to Jilju Joy from comment #6)
> > (In reply to Nitin Goyal from comment #5)
> > > Verification steps with the new fix:
> > > 
> > > As soon as we upgrade the provider, a consumer should not lose the
> > > connection with the provider.
> > Verified this after upgrading the provider cluster ODF 4.10.4 to 4.11.0-113.
> > 
> > > There should be only one service with the load balancer type. We should be
> > > able to use this service for node port access as well via using the worker
> > > nodes also with the new load balancer endpoint.
> > 
> > > We should be able to connect to the provider after changing the EP (load
> > > balancer) in the consumer.
> > Hi Nitin,
> > Is this an automatic process ?
> 
> No this is a manual process. Storagecluster CR endpoint needs to be changed
> manually on the consumer cluster. You can get this new endpoint in the
> storagecluster CR status from the provider cluster.

Hi Neha, FYI: if this manual step is not done, we will hit the issue described in bug #2060487.