Bug 2099958
| Summary: | While upgrading from v4.10.z to v4.11.z, the consumer connection is lost because the nodePort service is replaced with a loadBalancer service | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Dhruv Bindra <dbindra> |
| Component: | ocs-operator | Assignee: | Nitin Goyal <nigoyal> |
| Status: | VERIFIED --- | QA Contact: | Jilju Joy <jijoy> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.10 | CC: | aeyal, muagarwa, nberry, nigoyal, odf-bz-bot, sostapov |
| Target Milestone: | --- | ||
| Target Release: | ODF 4.11.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | 4.11.0-113 | Doc Type: | No Doc Update |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | Bug | |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Dhruv Bindra
2022-06-22 05:09:34 UTC
Background: In 4.10 we were using a NodePort service, which was prone to failures, so we changed it to a LoadBalancer service. That creates another problem: as soon as the provider is upgraded, consumers lose the connection and someone has to change the endpoint in the consumers. To get out of this situation we are now creating both services, so consumers stay connected to the provider. It also gives consumers a good amount of time to update their endpoint to the LoadBalancer service. We will deprecate the NodePort service in the future.
Verification: As soon as we upgrade the provider, a consumer should not lose the connection with the provider. We should have two services on the provider, one NodePort and one LoadBalancer. We should be able to connect to the provider after changing the endpoint (LoadBalancer) in the consumer.
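A quick way to check the intended state described above (a sketch; the NodePort service name ocs-provider-server-node-port-svc is taken from the operator logs later in this bug):
$ oc -n openshift-storage get svc ocs-provider-server ocs-provider-server-node-port-svc
# expected: two services exposing port 50051, one of type LoadBalancer and one of type NodePort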
Tested upgrade to ODF 4.11.0. Provider storagecluster is not Ready after upgrade.
BEFORE upgrading the provider and consumer to ODF 4.11:
ODF 4.10.2-3
OCP 4.10.18
From provider cluster:
$ oc -n openshift-storage get storagecluster -o yaml| grep -i Endpoint
storageProviderEndpoint: 10.0.138.52:31659
$ oc get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
ip-10-0-134-89.ec2.internal Ready master 4h43m v1.23.5+3afdacb 10.0.134.89 <none> Red Hat Enterprise Linux CoreOS 410.84.202206080346-0 (Ootpa) 4.18.0-305.49.1.el8_4.x86_64 cri-o://1.23.3-3.rhaos4.10.git5fe1720.el8
ip-10-0-138-52.ec2.internal Ready worker 4h37m v1.23.5+3afdacb 10.0.138.52 <none> Red Hat Enterprise Linux CoreOS 410.84.202206080346-0 (Ootpa) 4.18.0-305.49.1.el8_4.x86_64 cri-o://1.23.3-3.rhaos4.10.git5fe1720.el8
ip-10-0-139-224.ec2.internal Ready infra,worker 4h26m v1.23.5+3afdacb 10.0.139.224 <none> Red Hat Enterprise Linux CoreOS 410.84.202206080346-0 (Ootpa) 4.18.0-305.49.1.el8_4.x86_64 cri-o://1.23.3-3.rhaos4.10.git5fe1720.el8
ip-10-0-150-162.ec2.internal Ready master 4h43m v1.23.5+3afdacb 10.0.150.162 <none> Red Hat Enterprise Linux CoreOS 410.84.202206080346-0 (Ootpa) 4.18.0-305.49.1.el8_4.x86_64 cri-o://1.23.3-3.rhaos4.10.git5fe1720.el8
ip-10-0-150-242.ec2.internal Ready infra,worker 4h26m v1.23.5+3afdacb 10.0.150.242 <none> Red Hat Enterprise Linux CoreOS 410.84.202206080346-0 (Ootpa) 4.18.0-305.49.1.el8_4.x86_64 cri-o://1.23.3-3.rhaos4.10.git5fe1720.el8
ip-10-0-155-183.ec2.internal Ready worker 4h37m v1.23.5+3afdacb 10.0.155.183 <none> Red Hat Enterprise Linux CoreOS 410.84.202206080346-0 (Ootpa) 4.18.0-305.49.1.el8_4.x86_64 cri-o://1.23.3-3.rhaos4.10.git5fe1720.el8
ip-10-0-161-252.ec2.internal Ready worker 4h37m v1.23.5+3afdacb 10.0.161.252 <none> Red Hat Enterprise Linux CoreOS 410.84.202206080346-0 (Ootpa) 4.18.0-305.49.1.el8_4.x86_64 cri-o://1.23.3-3.rhaos4.10.git5fe1720.el8
ip-10-0-164-146.ec2.internal Ready master 4h43m v1.23.5+3afdacb 10.0.164.146 <none> Red Hat Enterprise Linux CoreOS 410.84.202206080346-0 (Ootpa) 4.18.0-305.49.1.el8_4.x86_64 cri-o://1.23.3-3.rhaos4.10.git5fe1720.el8
ip-10-0-173-63.ec2.internal Ready infra,worker 4h26m v1.23.5+3afdacb 10.0.173.63 <none> Red Hat Enterprise Linux CoreOS 410.84.202206080346-0 (Ootpa) 4.18.0-305.49.1.el8_4.x86_64 cri-o://1.23.3-3.rhaos4.10.git5fe1720.el8
$ oc -n openshift-storage get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
addon-ocs-provider-qe-catalog-7btmx 1/1 Running 0 4h17m 10.131.0.20 ip-10-0-138-52.ec2.internal <none> <none>
alertmanager-managed-ocs-alertmanager-0 2/2 Running 0 4h17m 10.131.0.19 ip-10-0-138-52.ec2.internal <none> <none>
alertmanager-managed-ocs-alertmanager-1 2/2 Running 0 4h14m 10.128.2.20 ip-10-0-155-183.ec2.internal <none> <none>
alertmanager-managed-ocs-alertmanager-2 2/2 Running 0 4h14m 10.128.2.26 ip-10-0-155-183.ec2.internal <none> <none>
csi-addons-controller-manager-b4495976c-b966j 2/2 Running 0 4h24m 10.131.0.9 ip-10-0-138-52.ec2.internal <none> <none>
ocs-metrics-exporter-5dcf6f88df-m55j7 1/1 Running 0 4h14m 10.128.2.25 ip-10-0-155-183.ec2.internal <none> <none>
ocs-operator-5985b8b5f4-zjpcw 1/1 Running 0 4h14m 10.128.2.13 ip-10-0-155-183.ec2.internal <none> <none>
ocs-osd-controller-manager-688ddc4cfb-9xp2h 3/3 Running 0 4h17m 10.131.0.13 ip-10-0-138-52.ec2.internal <none> <none>
ocs-provider-server-766b88c486-d2dmv 1/1 Running 0 4h14m 10.128.2.8 ip-10-0-155-183.ec2.internal <none> <none>
odf-console-58f6b6f5bb-sx4b4 1/1 Running 0 4h14m 10.128.2.12 ip-10-0-155-183.ec2.internal <none> <none>
odf-operator-controller-manager-584df64f8-4zjm6 2/2 Running 0 4h17m 10.131.0.21 ip-10-0-138-52.ec2.internal <none> <none>
prometheus-managed-ocs-prometheus-0 3/3 Running 0 4h17m 10.131.0.14 ip-10-0-138-52.ec2.internal <none> <none>
prometheus-operator-8547cc9f89-ksfdt 1/1 Running 0 4h17m 10.131.0.16 ip-10-0-138-52.ec2.internal <none> <none>
rook-ceph-crashcollector-ip-10-0-138-52.ec2.internal-5c9c6rzb2n 1/1 Running 0 4h21m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none>
rook-ceph-crashcollector-ip-10-0-155-183.ec2.internal-8578jqp8r 1/1 Running 0 4h16m 10.0.155.183 ip-10-0-155-183.ec2.internal <none> <none>
rook-ceph-crashcollector-ip-10-0-161-252.ec2.internal-779fqh62v 1/1 Running 0 4h12m 10.0.161.252 ip-10-0-161-252.ec2.internal <none> <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-59b95ccfwjktw 2/2 Running 0 4h20m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-58995c4dznfb7 2/2 Running 0 4h14m 10.0.155.183 ip-10-0-155-183.ec2.internal <none> <none>
rook-ceph-mgr-a-854875bc7c-bmgjt 2/2 Running 0 4h21m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none>
rook-ceph-mon-a-5d5874b99-cp2kg 2/2 Running 0 4h14m 10.0.161.252 ip-10-0-161-252.ec2.internal <none> <none>
rook-ceph-mon-b-684d77d5cf-hjl5q 2/2 Running 0 4h17m 10.0.155.183 ip-10-0-155-183.ec2.internal <none> <none>
rook-ceph-mon-c-6d4cf68965-hx6nh 2/2 Running 0 4h23m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none>
rook-ceph-operator-5678fcf74-9j8r6 1/1 Running 0 4h14m 10.128.2.24 ip-10-0-155-183.ec2.internal <none> <none>
rook-ceph-osd-0-5c85bd4675-fhfr7 2/2 Running 0 4h14m 10.0.161.252 ip-10-0-161-252.ec2.internal <none> <none>
rook-ceph-osd-1-dfcb97c8d-2mrd4 2/2 Running 0 4h14m 10.0.161.252 ip-10-0-161-252.ec2.internal <none> <none>
rook-ceph-osd-10-57f4f48547-xlqq8 2/2 Running 0 4h20m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none>
rook-ceph-osd-11-df6d566cd-r42gw 2/2 Running 0 4h20m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none>
rook-ceph-osd-12-7d5ddf4566-dlhc6 2/2 Running 0 4h20m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none>
rook-ceph-osd-13-dbf878d66-hkkqr 2/2 Running 0 4h20m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none>
rook-ceph-osd-14-858657646-t4wmr 2/2 Running 0 4h20m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none>
rook-ceph-osd-2-7795548cc8-dtvdm 2/2 Running 0 4h14m 10.0.161.252 ip-10-0-161-252.ec2.internal <none> <none>
rook-ceph-osd-3-5c4896ff45-wsc6v 2/2 Running 0 4h14m 10.0.161.252 ip-10-0-161-252.ec2.internal <none> <none>
rook-ceph-osd-4-7b59fd5b6b-nfgjm 2/2 Running 0 4h17m 10.0.155.183 ip-10-0-155-183.ec2.internal <none> <none>
rook-ceph-osd-5-89d747ccd-z6hbn 2/2 Running 0 4h17m 10.0.155.183 ip-10-0-155-183.ec2.internal <none> <none>
rook-ceph-osd-6-756ff79568-pqldc 2/2 Running 0 4h17m 10.0.155.183 ip-10-0-155-183.ec2.internal <none> <none>
rook-ceph-osd-7-7546c6df7-gdw92 2/2 Running 0 4h17m 10.0.155.183 ip-10-0-155-183.ec2.internal <none> <none>
rook-ceph-osd-8-74bf4c4954-fhgp9 2/2 Running 0 4h17m 10.0.155.183 ip-10-0-155-183.ec2.internal <none> <none>
rook-ceph-osd-9-77568f5856-fgv59 2/2 Running 0 4h14m 10.0.161.252 ip-10-0-161-252.ec2.internal <none> <none>
rook-ceph-osd-prepare-default-0-data-0hsbn5-hk4tf 0/1 Completed 0 4h21m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none>
rook-ceph-osd-prepare-default-0-data-357khh-z89m7 0/1 Completed 0 4h21m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none>
rook-ceph-osd-prepare-default-1-data-1mj7vb-nptmt 0/1 Completed 0 4h21m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none>
rook-ceph-osd-prepare-default-1-data-46tzst-jbmf7 0/1 Completed 0 4h21m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none>
rook-ceph-osd-prepare-default-2-data-2vphcr-dhs4v 0/1 Completed 0 4h21m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none>
rook-ceph-tools-79ccc8ddc5-rggt4 1/1 Running 0 4h14m 10.0.155.183 ip-10-0-155-183.ec2.internal <none> <none>
$ oc get csv
NAME DISPLAY VERSION REPLACES PHASE
mcg-operator.v4.10.4 NooBaa Operator 4.10.4 mcg-operator.v4.10.3 Succeeded
ocs-operator.v4.10.2 OpenShift Container Storage 4.10.2 ocs-operator.v4.10.1 Succeeded
ocs-osd-deployer.v2.0.2 OCS OSD Deployer 2.0.2 ocs-osd-deployer.v2.0.1 Succeeded
odf-csi-addons-operator.v4.10.4 CSI Addons 4.10.4 odf-csi-addons-operator.v4.10.2 Succeeded
odf-operator.v4.10.2 OpenShift Data Foundation 4.10.2 odf-operator.v4.10.1 Succeeded
ose-prometheus-operator.4.10.0 Prometheus Operator 4.10.0 ose-prometheus-operator.4.8.0 Succeeded
route-monitor-operator.v0.1.422-151be96 Route Monitor Operator 0.1.422-151be96 route-monitor-operator.v0.1.420-b65f47e Succeeded
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.10.18 True False 5h5m Cluster version is 4.10.18
$ oc get storagecluster
NAME AGE PHASE EXTERNAL CREATED AT VERSION
ocs-storagecluster 4h29m Ready 2022-06-29T08:23:04Z
$ oc get storagesystem
NAME STORAGE-SYSTEM-KIND STORAGE-SYSTEM-NAME
ocs-storagecluster-storagesystem storagecluster.ocs.openshift.io/v1 ocs-storagecluster
$ oc get cephcluster
NAME DATADIRHOSTPATH MONCOUNT AGE PHASE MESSAGE HEALTH EXTERNAL
ocs-storagecluster-cephcluster /var/lib/rook 3 4h29m Ready Cluster created successfully HEALTH_OK
$ oc get managedocs -o yaml
apiVersion: v1
items:
- apiVersion: ocs.openshift.io/v1alpha1
kind: ManagedOCS
metadata:
creationTimestamp: "2022-06-29T08:22:51Z"
finalizers:
- managedocs.ocs.openshift.io
generation: 1
name: managedocs
namespace: openshift-storage
resourceVersion: "84251"
uid: 2d609aa9-6fbd-49cd-9ab3-d3f41001ebb5
spec: {}
status:
components:
alertmanager:
state: Ready
prometheus:
state: Ready
storageCluster:
state: Ready
reconcileStrategy: strict
kind: List
metadata:
resourceVersion: ""
selfLink: ""
$ oc get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
addon-ocs-provider-qe-catalog ClusterIP 172.30.92.188 <none> 50051/TCP 4h35m
alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 4h33m
csi-addons-controller-manager-metrics-service ClusterIP 172.30.96.99 <none> 8443/TCP 4h32m
noobaa-operator-service ClusterIP 172.30.17.229 <none> 443/TCP 4h33m
ocs-metrics-exporter ClusterIP 172.30.186.247 <none> 8080/TCP,8081/TCP 4h32m
ocs-osd-controller-manager-metrics-service ClusterIP 172.30.238.148 <none> 8443/TCP 4h33m
ocs-provider-server NodePort 172.30.117.57 <none> 50051:31659/TCP 4h33m
odf-console-service ClusterIP 172.30.92.91 <none> 9001/TCP 4h33m
odf-operator-controller-manager-metrics-service ClusterIP 172.30.254.88 <none> 8443/TCP 4h34m
prometheus ClusterIP 172.30.201.202 <none> 9339/TCP 4h33m
prometheus-operated ClusterIP None <none> 9090/TCP 4h33m
rook-ceph-mgr ClusterIP 172.30.44.21 <none> 9283/TCP 4h27m
$ oc get svc ocs-provider-server -o yaml
apiVersion: v1
kind: Service
metadata:
annotations:
service.alpha.openshift.io/serving-cert-signed-by: openshift-service-serving-signer@1656489985
service.beta.openshift.io/serving-cert-secret-name: ocs-provider-server-cert
service.beta.openshift.io/serving-cert-signed-by: openshift-service-serving-signer@1656489985
creationTimestamp: "2022-06-29T08:23:05Z"
name: ocs-provider-server
namespace: openshift-storage
ownerReferences:
- apiVersion: ocs.openshift.io/v1
kind: StorageCluster
name: ocs-storagecluster
uid: 112f14d9-024e-4053-a117-04ac0fc348ee
resourceVersion: "40080"
uid: ae7338b2-8412-4369-a8eb-c2918945f3aa
spec:
clusterIP: 172.30.117.57
clusterIPs:
- 172.30.117.57
externalTrafficPolicy: Cluster
internalTrafficPolicy: Cluster
ipFamilies:
- IPv4
ipFamilyPolicy: SingleStack
ports:
- nodePort: 31659
port: 50051
protocol: TCP
targetPort: ocs-provider
selector:
app: ocsProviderApiServer
sessionAffinity: None
type: NodePort
status:
loadBalancer: {}
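For reference, the storageProviderEndpoint above (10.0.138.52:31659) is just a worker node's InternalIP paired with the service's nodePort. A minimal sketch to reconstruct it (any worker's InternalIP works for a NodePort service):
$ NODE_IP=$(oc get nodes -l node-role.kubernetes.io/worker -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
$ NODE_PORT=$(oc -n openshift-storage get svc ocs-provider-server -o jsonpath='{.spec.ports[0].nodePort}')
$ echo "${NODE_IP}:${NODE_PORT}"   # e.g. 10.0.138.52:31659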
-------------------------------------------------------------------------------------
From consumer cluster:
$ oc -n openshift-storage get storagecluster -o yaml| grep -i Endpoint
storageProviderEndpoint: 10.0.138.52:31659
$ oc get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
ip-10-0-128-172.ec2.internal Ready infra,worker 3h4m v1.23.5+3afdacb 10.0.128.172 <none> Red Hat Enterprise Linux CoreOS 410.84.202206080346-0 (Ootpa) 4.18.0-305.49.1.el8_4.x86_64 cri-o://1.23.3-3.rhaos4.10.git5fe1720.el8
ip-10-0-140-201.ec2.internal Ready worker 3h18m v1.23.5+3afdacb 10.0.140.201 <none> Red Hat Enterprise Linux CoreOS 410.84.202206080346-0 (Ootpa) 4.18.0-305.49.1.el8_4.x86_64 cri-o://1.23.3-3.rhaos4.10.git5fe1720.el8
ip-10-0-142-150.ec2.internal Ready master 3h23m v1.23.5+3afdacb 10.0.142.150 <none> Red Hat Enterprise Linux CoreOS 410.84.202206080346-0 (Ootpa) 4.18.0-305.49.1.el8_4.x86_64 cri-o://1.23.3-3.rhaos4.10.git5fe1720.el8
ip-10-0-149-134.ec2.internal Ready infra,worker 3h4m v1.23.5+3afdacb 10.0.149.134 <none> Red Hat Enterprise Linux CoreOS 410.84.202206080346-0 (Ootpa) 4.18.0-305.49.1.el8_4.x86_64 cri-o://1.23.3-3.rhaos4.10.git5fe1720.el8
ip-10-0-150-132.ec2.internal Ready worker 3h17m v1.23.5+3afdacb 10.0.150.132 <none> Red Hat Enterprise Linux CoreOS 410.84.202206080346-0 (Ootpa) 4.18.0-305.49.1.el8_4.x86_64 cri-o://1.23.3-3.rhaos4.10.git5fe1720.el8
ip-10-0-158-115.ec2.internal Ready master 3h23m v1.23.5+3afdacb 10.0.158.115 <none> Red Hat Enterprise Linux CoreOS 410.84.202206080346-0 (Ootpa) 4.18.0-305.49.1.el8_4.x86_64 cri-o://1.23.3-3.rhaos4.10.git5fe1720.el8
ip-10-0-165-8.ec2.internal Ready master 3h24m v1.23.5+3afdacb 10.0.165.8 <none> Red Hat Enterprise Linux CoreOS 410.84.202206080346-0 (Ootpa) 4.18.0-305.49.1.el8_4.x86_64 cri-o://1.23.3-3.rhaos4.10.git5fe1720.el8
ip-10-0-173-10.ec2.internal Ready infra,worker 3h5m v1.23.5+3afdacb 10.0.173.10 <none> Red Hat Enterprise Linux CoreOS 410.84.202206080346-0 (Ootpa) 4.18.0-305.49.1.el8_4.x86_64 cri-o://1.23.3-3.rhaos4.10.git5fe1720.el8
ip-10-0-174-54.ec2.internal Ready worker 3h18m v1.23.5+3afdacb 10.0.174.54 <none> Red Hat Enterprise Linux CoreOS 410.84.202206080346-0 (Ootpa) 4.18.0-305.49.1.el8_4.x86_64 cri-o://1.23.3-3.rhaos4.10.git5fe1720.el8
$ oc get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
08aa0deaba969e7904ad889667c93cc277552a20b17685c3beb6e478fe572dd 0/1 Completed 0 3h4m 10.131.0.31 ip-10-0-174-54.ec2.internal <none> <none>
1dcc34b3a106d396dc409ff46b9d0db7fbac8634502fc768b71230b464vzxk8 0/1 Completed 0 3h4m 10.131.0.34 ip-10-0-174-54.ec2.internal <none> <none>
3bad4d15272db3fa9a7f04749a3b88f88091663dc8d1e7454b68a1c5e9jb6mw 0/1 Completed 0 3h4m 10.131.0.32 ip-10-0-174-54.ec2.internal <none> <none>
6e9a6d05bebac324419c47259d443223a75858ee9e6eb87751b0ddb24b278gg 0/1 Completed 0 3h4m 10.131.0.33 ip-10-0-174-54.ec2.internal <none> <none>
a0d6d7ea93ef0f905e0d25c9e9506251b53905213368078caad6aee4bc8p9vm 0/1 Completed 0 3h4m 10.131.0.35 ip-10-0-174-54.ec2.internal <none> <none>
addon-ocs-consumer-qe-catalog-68v92 1/1 Running 0 3h4m 10.131.0.28 ip-10-0-174-54.ec2.internal <none> <none>
alertmanager-managed-ocs-alertmanager-0 2/2 Running 0 3h3m 10.128.2.9 ip-10-0-140-201.ec2.internal <none> <none>
alertmanager-managed-ocs-alertmanager-1 2/2 Running 0 3h3m 10.128.2.10 ip-10-0-140-201.ec2.internal <none> <none>
alertmanager-managed-ocs-alertmanager-2 2/2 Running 0 3h3m 10.128.2.11 ip-10-0-140-201.ec2.internal <none> <none>
csi-addons-controller-manager-b4495976c-fgtks 2/2 Running 0 3h1m 10.131.0.50 ip-10-0-174-54.ec2.internal <none> <none>
csi-cephfsplugin-2v5xz 3/3 Running 0 3h2m 10.0.174.54 ip-10-0-174-54.ec2.internal <none> <none>
csi-cephfsplugin-6gd74 3/3 Running 0 3h2m 10.0.140.201 ip-10-0-140-201.ec2.internal <none> <none>
csi-cephfsplugin-d58fm 3/3 Running 3 3h2m 10.0.150.132 ip-10-0-150-132.ec2.internal <none> <none>
csi-cephfsplugin-provisioner-599bbfcd9-5mkzd 6/6 Running 0 3h2m 10.128.2.16 ip-10-0-140-201.ec2.internal <none> <none>
csi-cephfsplugin-provisioner-599bbfcd9-wfcv2 6/6 Running 0 3h2m 10.131.0.49 ip-10-0-174-54.ec2.internal <none> <none>
csi-rbdplugin-5s42s 4/4 Running 0 3h2m 10.0.140.201 ip-10-0-140-201.ec2.internal <none> <none>
csi-rbdplugin-cs8mn 4/4 Running 4 3h2m 10.0.150.132 ip-10-0-150-132.ec2.internal <none> <none>
csi-rbdplugin-provisioner-86755fff69-cwd5f 7/7 Running 0 3h2m 10.128.2.15 ip-10-0-140-201.ec2.internal <none> <none>
csi-rbdplugin-provisioner-86755fff69-jwhpw 7/7 Running 0 3h2m 10.131.0.48 ip-10-0-174-54.ec2.internal <none> <none>
csi-rbdplugin-t86r5 4/4 Running 0 3h2m 10.0.174.54 ip-10-0-174-54.ec2.internal <none> <none>
ocs-metrics-exporter-5dcf6f88df-gpqj6 1/1 Running 0 3h2m 10.128.2.13 ip-10-0-140-201.ec2.internal <none> <none>
ocs-operator-5985b8b5f4-dlmr8 1/1 Running 0 3h2m 10.131.0.45 ip-10-0-174-54.ec2.internal <none> <none>
ocs-osd-controller-manager-5bb548944-znb4v 3/3 Running 0 3h3m 10.131.0.37 ip-10-0-174-54.ec2.internal <none> <none>
odf-console-58f6b6f5bb-jc8dd 1/1 Running 0 3h3m 10.131.0.43 ip-10-0-174-54.ec2.internal <none> <none>
odf-operator-controller-manager-584df64f8-n5m52 2/2 Running 0 3h3m 10.131.0.36 ip-10-0-174-54.ec2.internal <none> <none>
prometheus-managed-ocs-prometheus-0 3/3 Running 0 3h3m 10.128.2.8 ip-10-0-140-201.ec2.internal <none> <none>
prometheus-operator-8547cc9f89-jvwwx 1/1 Running 0 3h3m 10.131.0.42 ip-10-0-174-54.ec2.internal <none> <none>
redhat-operators-t5hqt 1/1 Running 0 3h4m 10.131.0.30 ip-10-0-174-54.ec2.internal <none> <none>
rook-ceph-operator-5678fcf74-z2mvt 1/1 Running 0 3h2m 10.128.2.12 ip-10-0-140-201.ec2.internal <none> <none>
rook-ceph-tools-7cfb87c645-vlffd 1/1 Running 0 162m 10.0.174.54 ip-10-0-174-54.ec2.internal <none> <none>
$ oc get csv
NAME DISPLAY VERSION REPLACES PHASE
mcg-operator.v4.10.4 NooBaa Operator 4.10.4 mcg-operator.v4.10.3 Succeeded
ocs-operator.v4.10.2 OpenShift Container Storage 4.10.2 ocs-operator.v4.10.1 Succeeded
ocs-osd-deployer.v2.0.2 OCS OSD Deployer 2.0.2 ocs-osd-deployer.v2.0.1 Succeeded
odf-csi-addons-operator.v4.10.4 CSI Addons 4.10.4 odf-csi-addons-operator.v4.10.2 Succeeded
odf-operator.v4.10.2 OpenShift Data Foundation 4.10.2 odf-operator.v4.10.1 Succeeded
ose-prometheus-operator.4.10.0 Prometheus Operator 4.10.0 ose-prometheus-operator.4.8.0 Succeeded
route-monitor-operator.v0.1.422-151be96 Route Monitor Operator 0.1.422-151be96 route-monitor-operator.v0.1.420-b65f47e Succeeded
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.10.18 True False 3h41m Cluster version is 4.10.18
$ oc get storagecluster
NAME AGE PHASE EXTERNAL CREATED AT VERSION
ocs-storagecluster 3h3m Ready true 2022-06-29T09:48:56Z
$ oc get storagesystem
NAME STORAGE-SYSTEM-KIND STORAGE-SYSTEM-NAME
ocs-storagecluster-storagesystem storagecluster.ocs.openshift.io/v1 ocs-storagecluster
$ oc get cephcluster
NAME DATADIRHOSTPATH MONCOUNT AGE PHASE MESSAGE HEALTH EXTERNAL
ocs-storagecluster-cephcluster 3h4m Connected Cluster connected successfully HEALTH_OK true
$ oc get managedocs -o yaml
apiVersion: v1
items:
- apiVersion: ocs.openshift.io/v1alpha1
kind: ManagedOCS
metadata:
creationTimestamp: "2022-06-29T09:48:37Z"
finalizers:
- managedocs.ocs.openshift.io
generation: 1
name: managedocs
namespace: openshift-storage
resourceVersion: "50222"
uid: 9cbce1e2-9e61-4054-b54d-9cd70fb78175
spec: {}
status:
components:
alertmanager:
state: Ready
prometheus:
state: Ready
storageCluster:
state: Ready
reconcileStrategy: strict
kind: List
metadata:
resourceVersion: ""
selfLink: ""
$ oc get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
addon-ocs-consumer-qe-catalog ClusterIP 172.30.123.223 <none> 50051/TCP 3h9m
alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 3h8m
csi-addons-controller-manager-metrics-service ClusterIP 172.30.156.17 <none> 8443/TCP 3h7m
csi-cephfsplugin-metrics ClusterIP 172.30.167.58 <none> 8080/TCP,8081/TCP 3h7m
csi-rbdplugin-metrics ClusterIP 172.30.242.248 <none> 8080/TCP,8081/TCP 3h7m
noobaa-operator-service ClusterIP 172.30.107.87 <none> 443/TCP 3h8m
ocs-metrics-exporter ClusterIP 172.30.127.9 <none> 8080/TCP,8081/TCP 3h7m
ocs-osd-controller-manager-metrics-service ClusterIP 172.30.124.249 <none> 8443/TCP 3h9m
odf-console-service ClusterIP 172.30.162.31 <none> 9001/TCP 3h8m
odf-operator-controller-manager-metrics-service ClusterIP 172.30.146.98 <none> 8443/TCP 3h9m
prometheus ClusterIP 172.30.28.30 <none> 9339/TCP 3h8m
prometheus-operated ClusterIP None <none> 9090/TCP 3h8m
redhat-operators ClusterIP 172.30.151.142 <none> 50051/TCP 3h9m
rook-ceph-mgr-external ClusterIP 172.30.167.4 <none> 9283/TCP 3h7m
=============================================================================================================
=============================================================================================================
AFTER upgrading the provider cluster to ODF 4.11
From provider cluster:
$ oc logs ocs-operator-6c75d4bc49-7pk72 --tail 10
{"level":"info","ts":1656510972.439494,"logger":"controllers.StorageCluster","msg":"Service create/update succeeded","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":1656510972.4395082,"logger":"controllers.StorageCluster","msg":"status.storageProviderEndpoint is updated","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster","Endpoint":"aae7338b284124369a8ebc2918945f3a-1835295265.us-east-1.elb.amazonaws.com:50051"}
{"level":"error","ts":1656510972.4534578,"logger":"controllers.StorageCluster","msg":"Failed to create/update service","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster","Name":"ocs-provider-server-node-port-svc","error":"Service \"ocs-provider-server-node-port-svc\" is invalid: spec.ports[0].nodePort: Invalid value: 31659: provided port is already allocated","stacktrace":"github.com/red-hat-storage/ocs-operator/controllers/storagecluster.(*ocsProviderServer).ensureCreated\n\t/remote-source/app/controllers/storagecluster/provider_server.go:63\ngithub.com/red-hat-storage/ocs-operator/controllers/storagecluster.(*StorageClusterReconciler).reconcilePhases\n\t/remote-source/app/controllers/storagecluster/reconcile.go:411\ngithub.com/red-hat-storage/ocs-operator/controllers/storagecluster.(*StorageClusterReconciler).Reconcile\n\t/remote-source/app/controllers/storagecluster/reconcile.go:161\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"}
{"level":"error","ts":1656510972.4705205,"logger":"controller.storagecluster","msg":"Reconciler error","reconciler group":"ocs.openshift.io","reconciler kind":"StorageCluster","name":"ocs-storagecluster","namespace":"openshift-storage","error":"Service \"ocs-provider-server-node-port-svc\" is invalid: spec.ports[0].nodePort: Invalid value: 31659: provided port is already allocated","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"}
{"level":"info","ts":1656511034.934945,"logger":"controllers.StorageCluster","msg":"Reconciling StorageCluster.","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster","StorageCluster":{"name":"ocs-storagecluster","namespace":"openshift-storage"}}
{"level":"info","ts":1656511034.9349763,"logger":"controllers.StorageCluster","msg":"Spec.AllowRemoteStorageConsumers is enabled. Creating Provider API resources","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":1656511034.9423943,"logger":"controllers.StorageCluster","msg":"Service create/update succeeded","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster"}
{"level":"info","ts":1656511034.94241,"logger":"controllers.StorageCluster","msg":"status.storageProviderEndpoint is updated","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster","Endpoint":"aae7338b284124369a8ebc2918945f3a-1835295265.us-east-1.elb.amazonaws.com:50051"}
{"level":"error","ts":1656511034.9563773,"logger":"controllers.StorageCluster","msg":"Failed to create/update service","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster","Name":"ocs-provider-server-node-port-svc","error":"Service \"ocs-provider-server-node-port-svc\" is invalid: spec.ports[0].nodePort: Invalid value: 31659: provided port is already allocated","stacktrace":"github.com/red-hat-storage/ocs-operator/controllers/storagecluster.(*ocsProviderServer).ensureCreated\n\t/remote-source/app/controllers/storagecluster/provider_server.go:63\ngithub.com/red-hat-storage/ocs-operator/controllers/storagecluster.(*StorageClusterReconciler).reconcilePhases\n\t/remote-source/app/controllers/storagecluster/reconcile.go:411\ngithub.com/red-hat-storage/ocs-operator/controllers/storagecluster.(*StorageClusterReconciler).Reconcile\n\t/remote-source/app/controllers/storagecluster/reconcile.go:161\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"}
{"level":"error","ts":1656511034.9755225,"logger":"controller.storagecluster","msg":"Reconciler error","reconciler group":"ocs.openshift.io","reconciler kind":"StorageCluster","name":"ocs-storagecluster","namespace":"openshift-storage","error":"Service \"ocs-provider-server-node-port-svc\" is invalid: spec.ports[0].nodePort: Invalid value: 31659: provided port is already allocated","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"}
$ oc get csv
NAME DISPLAY VERSION REPLACES PHASE
mcg-operator.v4.11.0 NooBaa Operator 4.11.0 mcg-operator.v4.10.4 Succeeded
ocs-operator.v4.11.0 OpenShift Container Storage 4.11.0 ocs-operator.v4.10.4 Succeeded
ocs-osd-deployer.v2.0.2 OCS OSD Deployer 2.0.2 ocs-osd-deployer.v2.0.1 Installing
odf-csi-addons-operator.v4.11.0 CSI Addons 4.11.0 odf-csi-addons-operator.v4.10.4 Succeeded
odf-operator.v4.11.0 OpenShift Data Foundation 4.11.0 odf-operator.v4.10.2 Succeeded
ose-prometheus-operator.4.10.0 Prometheus Operator 4.10.0 ose-prometheus-operator.4.8.0 Succeeded
route-monitor-operator.v0.1.422-151be96 Route Monitor Operator 0.1.422-151be96 route-monitor-operator.v0.1.420-b65f47e Succeeded
$ oc get storagecluster
NAME AGE PHASE EXTERNAL CREATED AT VERSION
ocs-storagecluster 5h38m Error 2022-06-29T08:23:04Z
$ oc get csv odf-operator.v4.11.0 -o yaml | grep full_version
full_version: 4.11.0-107
$ oc get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
addon-ocs-provider-qe-catalog-mztvf 1/1 Running 0 62m 10.129.2.33 ip-10-0-161-252.ec2.internal <none> <none>
alertmanager-managed-ocs-alertmanager-0 2/2 Running 0 5h30m 10.131.0.19 ip-10-0-138-52.ec2.internal <none> <none>
alertmanager-managed-ocs-alertmanager-1 2/2 Running 0 5h28m 10.128.2.20 ip-10-0-155-183.ec2.internal <none> <none>
alertmanager-managed-ocs-alertmanager-2 2/2 Running 0 5h28m 10.128.2.26 ip-10-0-155-183.ec2.internal <none> <none>
csi-addons-controller-manager-6bdb87bf84-kqmnw 2/2 Running 0 24m 10.129.2.62 ip-10-0-161-252.ec2.internal <none> <none>
ocs-metrics-exporter-599b56c475-vvvk9 1/1 Running 0 23m 10.129.2.69 ip-10-0-161-252.ec2.internal <none> <none>
ocs-operator-6c75d4bc49-7pk72 1/1 Running 0 24m 10.129.2.65 ip-10-0-161-252.ec2.internal <none> <none>
ocs-osd-controller-manager-6cbb8889fc-6c6qg 2/3 Running 0 28m 10.129.2.38 ip-10-0-161-252.ec2.internal <none> <none>
ocs-provider-server-6fff49c89c-l748v 1/1 Running 0 26m 10.129.2.48 ip-10-0-161-252.ec2.internal <none> <none>
odf-console-6f84b6444c-22xlb 1/1 Running 0 27m 10.129.2.42 ip-10-0-161-252.ec2.internal <none> <none>
odf-operator-controller-manager-5d975d6485-lch8k 2/2 Running 0 27m 10.129.2.41 ip-10-0-161-252.ec2.internal <none> <none>
prometheus-managed-ocs-prometheus-0 3/3 Running 0 5h30m 10.131.0.14 ip-10-0-138-52.ec2.internal <none> <none>
prometheus-operator-8547cc9f89-ksfdt 1/1 Running 0 5h30m 10.131.0.16 ip-10-0-138-52.ec2.internal <none> <none>
rook-ceph-crashcollector-ip-10-0-138-52.ec2.internal-7cdcbtdkn9 1/1 Running 0 23m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none>
rook-ceph-crashcollector-ip-10-0-155-183.ec2.internal-7cd8jcv4l 1/1 Running 0 23m 10.0.155.183 ip-10-0-155-183.ec2.internal <none> <none>
rook-ceph-crashcollector-ip-10-0-161-252.ec2.internal-56c7n224z 1/1 Running 0 23m 10.0.161.252 ip-10-0-161-252.ec2.internal <none> <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-6fdd776dcrrxz 2/2 Running 0 21m 10.0.161.252 ip-10-0-161-252.ec2.internal <none> <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-78994f45bgz2g 2/2 Running 0 21m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none>
rook-ceph-mgr-a-656d959cc7-zrtqb 2/2 Running 0 20m 10.0.155.183 ip-10-0-155-183.ec2.internal <none> <none>
rook-ceph-mon-a-644fbf869b-kbjmk 2/2 Running 0 21m 10.0.161.252 ip-10-0-161-252.ec2.internal <none> <none>
rook-ceph-mon-b-6795596bc6-665s6 2/2 Running 0 22m 10.0.155.183 ip-10-0-155-183.ec2.internal <none> <none>
rook-ceph-mon-c-74669f8648-sq724 2/2 Running 0 22m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none>
rook-ceph-operator-94b7f48d-c67mx 1/1 Running 0 24m 10.129.2.64 ip-10-0-161-252.ec2.internal <none> <none>
rook-ceph-osd-0-85dd9c45fc-zgskb 2/2 Running 0 19m 10.0.161.252 ip-10-0-161-252.ec2.internal <none> <none>
rook-ceph-osd-1-587b76f58-vfhtj 2/2 Running 0 19m 10.0.161.252 ip-10-0-161-252.ec2.internal <none> <none>
rook-ceph-osd-10-76447fb585-xv869 2/2 Running 0 19m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none>
rook-ceph-osd-11-5d6bb4f9b-tqt47 2/2 Running 0 19m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none>
rook-ceph-osd-12-7d8f877cdd-znjpw 2/2 Running 0 19m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none>
rook-ceph-osd-13-869dc74c84-wvf6j 2/2 Running 0 19m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none>
rook-ceph-osd-14-557c496d88-985bz 2/2 Running 0 19m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none>
rook-ceph-osd-2-5769747688-bqjhw 2/2 Running 0 19m 10.0.161.252 ip-10-0-161-252.ec2.internal <none> <none>
rook-ceph-osd-3-64cf6cd755-lvcsx 2/2 Running 0 19m 10.0.161.252 ip-10-0-161-252.ec2.internal <none> <none>
rook-ceph-osd-4-7544d7565b-x9nt4 2/2 Running 0 18m 10.0.155.183 ip-10-0-155-183.ec2.internal <none> <none>
rook-ceph-osd-5-5f6f876d7b-brsqn 2/2 Running 0 18m 10.0.155.183 ip-10-0-155-183.ec2.internal <none> <none>
rook-ceph-osd-6-7b9c8d869d-5xkkm 2/2 Running 0 18m 10.0.155.183 ip-10-0-155-183.ec2.internal <none> <none>
rook-ceph-osd-7-6d46bd6c95-r7ff8 2/2 Running 0 18m 10.0.155.183 ip-10-0-155-183.ec2.internal <none> <none>
rook-ceph-osd-8-84df8c57b-zjjlt 2/2 Running 0 18m 10.0.155.183 ip-10-0-155-183.ec2.internal <none> <none>
rook-ceph-osd-9-5d78db49db-jlgt7 2/2 Running 0 19m 10.0.161.252 ip-10-0-161-252.ec2.internal <none> <none>
rook-ceph-osd-prepare-default-0-data-0hsbn5-hk4tf 0/1 Completed 0 5h34m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none>
rook-ceph-osd-prepare-default-0-data-357khh-z89m7 0/1 Completed 0 5h34m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none>
rook-ceph-osd-prepare-default-1-data-1mj7vb-nptmt 0/1 Completed 0 5h34m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none>
rook-ceph-osd-prepare-default-1-data-46tzst-jbmf7 0/1 Completed 0 5h34m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none>
rook-ceph-osd-prepare-default-2-data-2vphcr-dhs4v 0/1 Completed 0 5h34m 10.0.138.52 ip-10-0-138-52.ec2.internal <none> <none>
rook-ceph-tools-75c98bc644-hxgt8 1/1 Running 0 23m 10.129.2.67 ip-10-0-161-252.ec2.internal <none> <none>
$ oc get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
addon-ocs-provider-qe-catalog ClusterIP 172.30.92.188 <none> 50051/TCP 5h41m
alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 5h40m
csi-addons-controller-manager-metrics-service ClusterIP 172.30.96.99 <none> 8443/TCP 5h39m
ocs-metrics-exporter ClusterIP 172.30.186.247 <none> 8080/TCP,8081/TCP 5h39m
ocs-osd-controller-manager-metrics-service ClusterIP 172.30.238.148 <none> 8443/TCP 5h40m
ocs-provider-server LoadBalancer 172.30.117.57 aae7338b284124369a8ebc2918945f3a-1835295265.us-east-1.elb.amazonaws.com 50051:31659/TCP 5h40m
odf-console-service ClusterIP 172.30.92.91 <none> 9001/TCP 5h40m
odf-operator-controller-manager-metrics-service ClusterIP 172.30.254.88 <none> 8443/TCP 5h41m
prometheus ClusterIP 172.30.201.202 <none> 9339/TCP 5h40m
prometheus-operated ClusterIP None <none> 9090/TCP 5h40m
rook-ceph-mgr ClusterIP 172.30.44.21 <none> 9283/TCP 5h34m
$ oc get svc ocs-provider-server -o yaml
apiVersion: v1
kind: Service
metadata:
annotations:
service.alpha.openshift.io/serving-cert-signed-by: openshift-service-serving-signer@1656489985
service.beta.openshift.io/serving-cert-secret-name: ocs-provider-server-cert
service.beta.openshift.io/serving-cert-signed-by: openshift-service-serving-signer@1656489985
creationTimestamp: "2022-06-29T08:23:05Z"
finalizers:
- service.kubernetes.io/load-balancer-cleanup
name: ocs-provider-server
namespace: openshift-storage
ownerReferences:
- apiVersion: ocs.openshift.io/v1
kind: StorageCluster
name: ocs-storagecluster
uid: 112f14d9-024e-4053-a117-04ac0fc348ee
resourceVersion: "309780"
uid: ae7338b2-8412-4369-a8eb-c2918945f3aa
spec:
allocateLoadBalancerNodePorts: true
clusterIP: 172.30.117.57
clusterIPs:
- 172.30.117.57
externalTrafficPolicy: Cluster
internalTrafficPolicy: Cluster
ipFamilies:
- IPv4
ipFamilyPolicy: SingleStack
ports:
- nodePort: 31659
port: 50051
protocol: TCP
targetPort: ocs-provider
selector:
app: ocsProviderApiServer
sessionAffinity: None
type: LoadBalancer
status:
loadBalancer:
ingress:
- hostname: aae7338b284124369a8ebc2918945f3a-1835295265.us-east-1.elb.amazonaws.com
$ oc -n openshift-storage get storagecluster -o yaml| grep -i Endpoint
storageProviderEndpoint: aae7338b284124369a8ebc2918945f3a-1835295265.us-east-1.elb.amazonaws.com:50051
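The new endpoint can also be read directly from the service status (field path per the service YAML above):
$ oc -n openshift-storage get svc ocs-provider-server -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
aae7338b284124369a8ebc2918945f3a-1835295265.us-east-1.elb.amazonaws.com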
$ oc get managedocs managedocs -o yaml
apiVersion: ocs.openshift.io/v1alpha1
kind: ManagedOCS
metadata:
creationTimestamp: "2022-06-29T08:22:51Z"
finalizers:
- managedocs.ocs.openshift.io
generation: 1
name: managedocs
namespace: openshift-storage
resourceVersion: "309802"
uid: 2d609aa9-6fbd-49cd-9ab3-d3f41001ebb5
spec: {}
status:
components:
alertmanager:
state: Ready
prometheus:
state: Ready
storageCluster:
state: Pending
reconcileStrategy: strict
-----------------------------------------------------------------------------------------------------------
From consumer cluster:
(consumer is not upgraded to ODF 4.11)
$ oc -n openshift-storage get storagecluster -o yaml| grep -i Endpoint
storageProviderEndpoint: 10.0.138.52:31659
$ oc get storagecluster
NAME AGE PHASE EXTERNAL CREATED AT VERSION
ocs-storagecluster 4h26m Progressing true 2022-06-29T09:48:56Z
$ oc get csv
NAME DISPLAY VERSION REPLACES PHASE
mcg-operator.v4.10.4 NooBaa Operator 4.10.4 mcg-operator.v4.10.3 Succeeded
ocs-operator.v4.10.2 OpenShift Container Storage 4.10.2 ocs-operator.v4.10.1 Succeeded
ocs-osd-deployer.v2.0.2 OCS OSD Deployer 2.0.2 ocs-osd-deployer.v2.0.1 Installing
odf-csi-addons-operator.v4.10.4 CSI Addons 4.10.4 odf-csi-addons-operator.v4.10.2 Succeeded
odf-operator.v4.10.2 OpenShift Data Foundation 4.10.2 odf-operator.v4.10.1 Succeeded
ose-prometheus-operator.4.10.0 Prometheus Operator 4.10.0 ose-prometheus-operator.4.8.0 Succeeded
route-monitor-operator.v0.1.422-151be96 Route Monitor Operator 0.1.422-151be96 route-monitor-operator.v0.1.420-b65f47e Succeeded
$ oc get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
08aa0deaba969e7904ad889667c93cc277552a20b17685c3beb6e478fe572dd 0/1 Completed 0 4h28m 10.131.0.31 ip-10-0-174-54.ec2.internal <none> <none>
1dcc34b3a106d396dc409ff46b9d0db7fbac8634502fc768b71230b464vzxk8 0/1 Completed 0 4h28m 10.131.0.34 ip-10-0-174-54.ec2.internal <none> <none>
3bad4d15272db3fa9a7f04749a3b88f88091663dc8d1e7454b68a1c5e9jb6mw 0/1 Completed 0 4h28m 10.131.0.32 ip-10-0-174-54.ec2.internal <none> <none>
6e9a6d05bebac324419c47259d443223a75858ee9e6eb87751b0ddb24b278gg 0/1 Completed 0 4h28m 10.131.0.33 ip-10-0-174-54.ec2.internal <none> <none>
a0d6d7ea93ef0f905e0d25c9e9506251b53905213368078caad6aee4bc8p9vm 0/1 Completed 0 4h28m 10.131.0.35 ip-10-0-174-54.ec2.internal <none> <none>
addon-ocs-consumer-qe-catalog-4sbcj 1/1 Running 0 72m 10.129.2.55 ip-10-0-150-132.ec2.internal <none> <none>
alertmanager-managed-ocs-alertmanager-0 2/2 Running 0 4h27m 10.128.2.9 ip-10-0-140-201.ec2.internal <none> <none>
alertmanager-managed-ocs-alertmanager-1 2/2 Running 0 4h27m 10.128.2.10 ip-10-0-140-201.ec2.internal <none> <none>
alertmanager-managed-ocs-alertmanager-2 2/2 Running 0 4h27m 10.128.2.11 ip-10-0-140-201.ec2.internal <none> <none>
csi-addons-controller-manager-b4495976c-fgtks 2/2 Running 0 4h25m 10.131.0.50 ip-10-0-174-54.ec2.internal <none> <none>
csi-cephfsplugin-2v5xz 3/3 Running 0 4h26m 10.0.174.54 ip-10-0-174-54.ec2.internal <none> <none>
csi-cephfsplugin-6gd74 3/3 Running 0 4h26m 10.0.140.201 ip-10-0-140-201.ec2.internal <none> <none>
csi-cephfsplugin-d58fm 3/3 Running 3 4h26m 10.0.150.132 ip-10-0-150-132.ec2.internal <none> <none>
csi-cephfsplugin-provisioner-599bbfcd9-5mkzd 6/6 Running 0 4h26m 10.128.2.16 ip-10-0-140-201.ec2.internal <none> <none>
csi-cephfsplugin-provisioner-599bbfcd9-wfcv2 6/6 Running 0 4h26m 10.131.0.49 ip-10-0-174-54.ec2.internal <none> <none>
csi-rbdplugin-5s42s 4/4 Running 0 4h26m 10.0.140.201 ip-10-0-140-201.ec2.internal <none> <none>
csi-rbdplugin-cs8mn 4/4 Running 4 4h26m 10.0.150.132 ip-10-0-150-132.ec2.internal <none> <none>
csi-rbdplugin-provisioner-86755fff69-cwd5f 7/7 Running 0 4h26m 10.128.2.15 ip-10-0-140-201.ec2.internal <none> <none>
csi-rbdplugin-provisioner-86755fff69-jwhpw 7/7 Running 0 4h26m 10.131.0.48 ip-10-0-174-54.ec2.internal <none> <none>
csi-rbdplugin-t86r5 4/4 Running 0 4h26m 10.0.174.54 ip-10-0-174-54.ec2.internal <none> <none>
ocs-metrics-exporter-5dcf6f88df-gpqj6 1/1 Running 0 4h27m 10.128.2.13 ip-10-0-140-201.ec2.internal <none> <none>
ocs-operator-5985b8b5f4-dlmr8 1/1 Running 0 4h26m 10.131.0.45 ip-10-0-174-54.ec2.internal <none> <none>
ocs-osd-controller-manager-5bb548944-znb4v 2/3 Running 0 4h27m 10.131.0.37 ip-10-0-174-54.ec2.internal <none> <none>
odf-console-58f6b6f5bb-jc8dd 1/1 Running 0 4h27m 10.131.0.43 ip-10-0-174-54.ec2.internal <none> <none>
odf-operator-controller-manager-584df64f8-n5m52 2/2 Running 0 4h27m 10.131.0.36 ip-10-0-174-54.ec2.internal <none> <none>
prometheus-managed-ocs-prometheus-0 3/3 Running 0 4h27m 10.128.2.8 ip-10-0-140-201.ec2.internal <none> <none>
prometheus-operator-8547cc9f89-jvwwx 1/1 Running 0 4h27m 10.131.0.42 ip-10-0-174-54.ec2.internal <none> <none>
redhat-operators-t5hqt 1/1 Running 0 4h28m 10.131.0.30 ip-10-0-174-54.ec2.internal <none> <none>
rook-ceph-operator-5678fcf74-z2mvt 1/1 Running 0 4h27m 10.128.2.12 ip-10-0-140-201.ec2.internal <none> <none>
rook-ceph-tools-7cfb87c645-vlffd 1/1 Running 0 4h6m 10.0.174.54 ip-10-0-174-54.ec2.internal <none> <none>
$ oc get managedocs managedocs -o yaml
apiVersion: ocs.openshift.io/v1alpha1
kind: ManagedOCS
metadata:
creationTimestamp: "2022-06-29T09:48:37Z"
finalizers:
- managedocs.ocs.openshift.io
generation: 1
name: managedocs
namespace: openshift-storage
resourceVersion: "265394"
uid: 9cbce1e2-9e61-4054-b54d-9cd70fb78175
spec: {}
status:
components:
alertmanager:
state: Ready
prometheus:
state: Ready
storageCluster:
state: Pending
reconcileStrategy: strict
$ oc logs ocs-operator-5985b8b5f4-dlmr8 --tail 2
{"level":"info","ts":1656512483.6247618,"logger":"controllers.StorageCluster","msg":"Reconciling external StorageCluster.","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster","StorageCluster":"openshift-storage/ocs-storagecluster"}
{"level":"error","ts":1656512483.6407418,"logger":"controllers.StorageCluster","msg":"External-OCS:GetStorageConfig:StorageConsumer is not ready yet. Will requeue after 5 second","Request.Namespace":"openshift-storage","Request.Name":"ocs-storagecluster","error":"rpc error: code = Unavailable desc = waiting for the rook resources to be provisioned","stacktrace":"github.com/red-hat-storage/ocs-operator/controllers/storagecluster.(*StorageClusterReconciler).getExternalConfigFromProvider\n\t/remote-source/app/controllers/storagecluster/external_ocs.go:158\ngithub.com/red-hat-storage/ocs-operator/controllers/storagecluster.(*ocsExternalResources).ensureCreated\n\t/remote-source/app/controllers/storagecluster/external_resources.go:257\ngithub.com/red-hat-storage/ocs-operator/controllers/storagecluster.(*StorageClusterReconciler).reconcilePhases\n\t/remote-source/app/controllers/storagecluster/reconcile.go:402\ngithub.com/red-hat-storage/ocs-operator/controllers/storagecluster.(*StorageClusterReconciler).Reconcile\n\t/remote-source/app/controllers/storagecluster/reconcile.go:161\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214"}
Must-gather collected after upgrading provider cluster to ODF 4.11:
Provider cluster must-gather - http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-j29-pr/jijoy-j29-pr_20220629T074927/logs/testcases_1656511818/
Consumer cluster must-gather - http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-j29-c1/jijoy-j29-c1_20220629T091015/logs/testcases_1656511889/
Verification steps with the new fix: As soon as we upgrade the provider, a consumer should not lose the connection with the provider. There should be only one service, of the load balancer type. We should be able to use this service for node port access as well, via the worker nodes, in addition to the new load balancer endpoint. We should be able to connect to the provider after changing the EP (load balancer) in the consumer.

(In reply to Nitin Goyal from comment #5)
> As soon as we upgrade the provider, a consumer should not lose the
> connection with the provider.
Verified this after upgrading the provider cluster from ODF 4.10.4 to 4.11.0-113.

> There should be only one service with the load balancer type. We should be
> able to use this service for node port access as well via the worker
> nodes, with the new load balancer endpoint.
> We should be able to connect to the provider after changing the EP (load
> balancer) in the consumer.
Hi Nitin,
Is this an automatic process?

Verified in version:
ODF 4.11.0-113
OCP 4.10.20
ocs-osd-deployer.v2.0.3
Upgraded provider and consumer cluster successfully from ODF 4.10.4 to 4.11.0-113
Before upgrading the provider cluster:
$ oc get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
addon-ocs-provider-qe-catalog ClusterIP 172.30.45.237 <none> 50051/TCP 5h32m
alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 5h30m
csi-addons-controller-manager-metrics-service ClusterIP 172.30.2.154 <none> 8443/TCP 5h31m
noobaa-operator-service ClusterIP 172.30.212.211 <none> 443/TCP 5h31m
ocs-metrics-exporter ClusterIP 172.30.190.198 <none> 8080/TCP,8081/TCP 5h30m
ocs-osd-controller-manager-metrics-service ClusterIP 172.30.156.38 <none> 8443/TCP 5h31m
ocs-provider-server NodePort 172.30.232.83 <none> 50051:31659/TCP 5h30m
odf-console-service ClusterIP 172.30.187.160 <none> 9001/TCP 5h31m
odf-operator-controller-manager-metrics-service ClusterIP 172.30.112.169 <none> 8443/TCP 5h31m
prometheus ClusterIP 172.30.184.104 <none> 9339/TCP 5h30m
prometheus-operated ClusterIP None <none> 9090/TCP 5h30m
rook-ceph-mgr ClusterIP 172.30.79.251 <none> 9283/TCP 5h24m
After upgrading the provider cluster:
$ oc get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
addon-ocs-provider-qe-catalog ClusterIP 172.30.45.237 <none> 50051/TCP 5h52m
alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 5h50m
csi-addons-controller-manager-metrics-service ClusterIP 172.30.2.154 <none> 8443/TCP 5h52m
ocs-metrics-exporter ClusterIP 172.30.190.198 <none> 8080/TCP,8081/TCP 5h50m
ocs-osd-controller-manager-metrics-service ClusterIP 172.30.156.38 <none> 8443/TCP 5h51m
ocs-provider-server LoadBalancer 172.30.232.83 ab5f87e5fef9e44f89b12eab3be358c2-1799690877.us-east-1.elb.amazonaws.com 50051:31659/TCP 5h50m
odf-console-service ClusterIP 172.30.187.160 <none> 9001/TCP 5h51m
odf-operator-controller-manager-metrics-service ClusterIP 172.30.112.169 <none> 8443/TCP 5h51m
prometheus ClusterIP 172.30.184.104 <none> 9339/TCP 5h50m
prometheus-operated ClusterIP None <none> 9090/TCP 5h50m
rook-ceph-mgr ClusterIP 172.30.79.251 <none> 9283/TCP 5h44m
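To check the "node port access" part of the verification steps, one option is a plain TCP probe against a worker node on the retained nodePort (a sketch; nc on the client machine and network reachability of the node are assumptions):
$ NODE_IP=$(oc get nodes -l node-role.kubernetes.io/worker -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
$ nc -zv "${NODE_IP}" 31659   # should succeed: the LoadBalancer service still allocates nodePort 31659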
From provider cluster after upgrade:
$ oc get storagecluster
NAME AGE PHASE EXTERNAL CREATED AT VERSION
ocs-storagecluster 9h Ready 2022-07-13T05:13:47Z
$ oc get csv
NAME DISPLAY VERSION REPLACES PHASE
mcg-operator.v4.11.0 NooBaa Operator 4.11.0 mcg-operator.v4.10.4 Succeeded
ocs-operator.v4.11.0 OpenShift Container Storage 4.11.0 ocs-operator.v4.10.4 Succeeded
ocs-osd-deployer.v2.0.3 OCS OSD Deployer 2.0.3 ocs-osd-deployer.v2.0.2 Succeeded
odf-csi-addons-operator.v4.11.0 CSI Addons 4.11.0 odf-csi-addons-operator.v4.10.4 Succeeded
odf-operator.v4.11.0 OpenShift Data Foundation 4.11.0 odf-operator.v4.10.4 Succeeded
ose-prometheus-operator.4.10.0 Prometheus Operator 4.10.0 ose-prometheus-operator.4.8.0 Succeeded
route-monitor-operator.v0.1.422-151be96 Route Monitor Operator 0.1.422-151be96 route-monitor-operator.v0.1.420-b65f47e Succeeded
From consumer cluster after upgrade:
$ oc get storagecluster
NAME AGE PHASE EXTERNAL CREATED AT VERSION
ocs-storagecluster 5h50m Ready true 2022-07-13T08:59:25Z
$ oc get csv
NAME DISPLAY VERSION REPLACES PHASE
mcg-operator.v4.11.0 NooBaa Operator 4.11.0 mcg-operator.v4.10.4 Succeeded
ocs-operator.v4.11.0 OpenShift Container Storage 4.11.0 ocs-operator.v4.10.4 Succeeded
ocs-osd-deployer.v2.0.3 OCS OSD Deployer 2.0.3 ocs-osd-deployer.v2.0.2 Succeeded
odf-csi-addons-operator.v4.11.0 CSI Addons 4.11.0 odf-csi-addons-operator.v4.10.4 Succeeded
odf-operator.v4.11.0 OpenShift Data Foundation 4.11.0 odf-operator.v4.10.4 Succeeded
ose-prometheus-operator.4.10.0 Prometheus Operator 4.10.0 ose-prometheus-operator.4.8.0 Succeeded
route-monitor-operator.v0.1.422-151be96 Route Monitor Operator 0.1.422-151be96 route-monitor-operator.v0.1.420-b65f47e Succeeded
Adding must-gather logs for reference:
Must gather before upgrading provider and consumer cluster from ODF 4.10.4 to 4.11.0-113:
Consumer http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-j13-c3/jijoy-j13-c3_20220713T081317/logs/testcases_1657705862/
Provider http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-j13-pr/jijoy-j13-pr_20220713T043423/logs/testcases_1657705913/
Provider upgrade test report:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-j13-pr/jijoy-j13-pr_20220713T043423/logs/test_report_1657709000.html
Test run to create PVCs and pod after upgrading provider cluster and before upgrading consumer cluster:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-j13-c3/jijoy-j13-c3_20220713T081317/logs/test_report_1657710514.html
Must gather logs collected after upgrading the provider cluster to ODF 4.11.0-113:
Consumer http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-j13-c3/jijoy-j13-c3_20220713T081317/logs/testcases_1657715380/
Provider http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-j13-pr/jijoy-j13-pr_20220713T043423/logs/testcases_1657715387/
Consumer upgrade test report:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-j13-c3/jijoy-j13-c3_20220713T081317/logs/test_report_1657717959.html
Must gather logs collected after upgrading the consumer cluster to ODF 4.11.0-113:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-j13-c3/jijoy-j13-c3_20220713T081317/logs/testcases_1657721461/
(In reply to Jilju Joy from comment #6)
> > We should be able to connect to the provider after changing the EP (load
> > balancer) in the consumer.
> Hi Nitin,
> Is this an automatic process ?

No, this is a manual process. The StorageCluster CR endpoint needs to be changed manually on the consumer cluster. You can get the new endpoint from the StorageCluster CR status on the provider cluster.

(In reply to Nitin Goyal from comment #8)
> No this is a manual process. Storagecluster CR endpoint needs to be changed
> manually on the consumer cluster. You can get this new endpoint in the
> storagecluster CR status from the provider cluster.

Hi Neha, FYI: if this manual step is not done, we will hit the issue described in bug #2060487.
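For reference, a sketch of this manual step (the consumer-side spec path spec.externalStorage.storageProviderEndpoint is an assumption inferred from the grep output earlier in this bug; verify it against your StorageCluster CR):
# against the provider cluster: read the new load balancer endpoint from the StorageCluster status
$ NEW_EP=$(oc -n openshift-storage get storagecluster ocs-storagecluster -o jsonpath='{.status.storageProviderEndpoint}')
# against the consumer cluster: point the StorageCluster at the new endpoint
$ oc -n openshift-storage patch storagecluster ocs-storagecluster --type merge \
    -p "{\"spec\":{\"externalStorage\":{\"storageProviderEndpoint\":\"${NEW_EP}\"}}}"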