Submariner was reinstalled on an existing setup. When Submariner was removed, it deleted all the exported services. When the globalnet operator started again, it recreated all the exported services, but this time with different IPs. For example: cluster c1 had the exported IP for a service as 242.1.255.*, but after globalnet was reinstalled, the exported services were recreated with 242.0.255.*. Rook is still using the 242.1.255.* addresses saved in the config map and has no clue about 242.0.255.*. In Rook we don't really allow mon IPs to change; that's not a supported case. For the global IP scenario, we allow it by failing over one mon at a time. But this is a different situation: all the mon (global) IPs got changed at once when Submariner was reinstalled. @tnielsen Do you think this is a scenario that Rook should support?
Used the following steps to add the mons back to quorum. Here we edit only one mon and let the Rook operator fail over the other mons.

Obtain the following information from the cluster:
- fsid
- mon-e exported IP: this can be obtained from `oc get service | grep submariner`. Let's say the exported IP for mon-e is 242.0.255.251 in this case.

Then:

- Scale down the OCS operator and Rook deployments:
    oc scale deployment ocs-operator --replicas=0 -n openshift-storage
    oc scale deployment rook-ceph-operator --replicas=0 -n openshift-storage

- Update the mon deployment to use the correct exported IP in `spec.containers[0].args`:
    --public-addr=242.0.255.251

- Copy the mon-e deployment:
    oc get deployment rook-ceph-mon-e -o yaml > rook-ceph-mon-e-deployment-c1.yaml

- Edit rook-ceph-mon-endpoints to use the correct exported IP for mon-e.

- Patch the rook-ceph-mon-e deployment to stop this mon from working without deleting the mon pod:
    kubectl patch deployment rook-ceph-mon-e --type='json' -p '[{"op":"remove", "path":"/spec/template/spec/containers/0/livenessProbe"}]'
    kubectl patch deployment rook-ceph-mon-e -p '{"spec": {"template": {"spec": {"containers": [{"name": "mon", "command": ["sleep", "infinity"], "args": []}]}}}}'

- Connect to the mon-e pod:
    oc exec -it <rook-ceph-mon-e> sh

- Inside the mon-e pod:
  - Create a temporary monmap:
      monmaptool --create --add e 242.0.255.251 --set-min-mon-release --enable-all-features --clobber /tmp/monmap --fsid <ceph fsid>
  - Remove the mon-e entry:
      monmaptool --rm e /tmp/monmap
  - Add the v2 protocol (add the v1 protocol as well if the cluster supports both):
      monmaptool --addv e [v2:242.0.255.251:3300] /tmp/monmap
  - Inject this monmap into mon-e:
      ceph-mon -i e --inject-monmap /tmp/monmap
  - Exit the mon-e pod.

- Scale the OCS and Rook deployments back up:
    oc scale deployment ocs-operator --replicas=1 -n openshift-storage
    oc scale deployment rook-ceph-operator --replicas=1 -n openshift-storage

- Wait for the Rook operator to fail over the other mons.
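The monmap-rewrite portion of the steps above can be parameterized so the same commands work for any mon and any exported IP. Below is a minimal dry-run sketch: it only prints the commands that would be run inside the mon pod. MON_ID and NEW_IP are the example values from this comment; FSID is left as a placeholder exactly as in the original steps, and the `--set-min-mon-release` flag is omitted here because its value was not given above.

```shell
# Dry-run helper: print the monmap commands for one mon.
# These are assumptions/examples, not values for your cluster.
MON_ID="e"
NEW_IP="242.0.255.251"
FSID="<ceph fsid>"   # placeholder, as in the original steps

# v2 messenger address for the mon (port 3300)
V2_ADDR="[v2:${NEW_IP}:3300]"

cat <<EOF
monmaptool --create --add ${MON_ID} ${NEW_IP} --enable-all-features --clobber /tmp/monmap --fsid ${FSID}
monmaptool --rm ${MON_ID} /tmp/monmap
monmaptool --addv ${MON_ID} ${V2_ADDR} /tmp/monmap
ceph-mon -i ${MON_ID} --inject-monmap /tmp/monmap
EOF
```

Printing the commands first makes it easy to review them before pasting into the mon pod shell.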
c1 cluster is `healthy` now after using the above workaround:

```
sh-5.1$ ceph status
  cluster:
    id:     6bee5946-d3e4-4999-8110-24ed4325fbe2
    health: HEALTH_OK

  services:
    mon:        3 daemons, quorum e,g,h (age 21m)
    mgr:        a(active, since 24m)
    mds:        1/1 daemons up, 1 hot standby
    osd:        3 osds: 3 up (since 21m), 3 in (since 10d)
    rbd-mirror: 1 daemon active (1 hosts)
    rgw:        1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   12 pools, 169 pgs
    objects: 3.05k objects, 3.3 GiB
    usage:   5.9 GiB used, 1.5 TiB / 1.5 TiB avail
    pgs:     169 active+clean

  io:
    client: 31 KiB/s rd, 1.5 MiB/s wr, 36 op/s rd, 322 op/s wr
```

c2 has daemons crashing, but the mons are up now:

```
sh-5.1$ ceph status
  cluster:
    id:     c2c61349-f7b5-47c5-8fd6-f687ea46b450
    health: HEALTH_WARN
            1599 daemons have recently crashed

  services:
    mon:        3 daemons, quorum e,h,i (age 32m)
    mgr:        a(active, since 34m)
    mds:        1/1 daemons up, 1 hot standby
    osd:        3 osds: 3 up (since 33m), 3 in (since 10d)
    rbd-mirror: 1 daemon active (1 hosts)
    rgw:        1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   12 pools, 169 pgs
    objects: 2.89k objects, 4.2 GiB
    usage:   12 GiB used, 1.5 TiB / 1.5 TiB avail
    pgs:     169 active+clean

  io:
    client: 17 KiB/s rd, 64 KiB/s wr, 21 op/s rd, 6 op/s wr
```
(In reply to Santosh Pillai from comment #5)
> Used the following steps to add mons back to quorum. Here we edit only one
> mon and let rook operator to failover other mons.
>
> Obtain the following information from the cluster
> fsid
> mon e exported IP: This can be obtained from `oc get service | grep submariner`.
> Lets say the exported IP for mon-e is 242.0.255.251 in this case
>
> - Scale down OCS operator and Rook deployments
>     oc scale deployment ocs-operator --replicas=0 -n openshift-storage
>     oc scale deployment rook-ceph-operator --replicas=0 -n openshift-storage
>
> - Update mon deployment to use correct exported IP in `spec.containers[0].args`
>     --public-addr=242.0.255.251
>
> - Copy mon-e deployment
>     oc get deployment rook-ceph-mon-e -o yaml > rook-ceph-mon-e-deployment-c1.yaml
>
> - Edit rook-ceph-mon-endpoints to use correct exported IP for mon-e
>
> - Patch the rook-ceph-mon-e Deployment to stop this mon working without
>   deleting the mon pod:
>     kubectl patch deployment rook-ceph-mon-e --type='json' -p '[{"op":"remove", "path":"/spec/template/spec/containers/0/livenessProbe"}]'
>     kubectl patch deployment rook-ceph-mon-e -p '{"spec": {"template": {"spec": {"containers": [{"name": "mon", "command": ["sleep", "infinity"], "args": []}]}}}}'
>
> - Connect to mon-e pod:
>     oc exec -it <rook-ceph-mon-e> sh
>
> - Inside mon-e pod:
>   - Create a temporary monmap
>       monmaptool --create --add e 242.0.255.251 --set-min-mon-release --enable-all-features --clobber /tmp/monmap --fsid <ceph fsid>
>   - Remove this mon-e entry
>       monmaptool --rm e /tmp/monmap
>   - Add v2 protocol (Add V1 protocol as well if cluster supports both)
>       monmaptool --addv e [v2:242.0.255.251:3300] /tmp/monmap
>   - Inject this monmap to mon-e
>       ceph-mon -i e --inject-monmap /tmp/monmap
>   - Exit mon-e pod
>
> - Scale back ocs and rook deployments:
>     oc scale deployment ocs-operator --replicas=1 -n openshift-storage
>     oc scale deployment rook-ceph-operator --replicas=1 -n openshift-storage
>
> - Wait for rook operator to failover other mons

One last step is to restart the rbd-mirror pods on both clusters. Mirroring health is OK on both clusters now:

    oc get cephblockpool ocs-storagecluster-cephblockpool -n openshift-storage -o jsonpath='{.status.mirroringStatus.summary}{"\n"}'
    {"daemon_health":"OK","health":"OK","image_health":"OK","states":{"replaying":20}}
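As a small scripted sanity check on the mirroring status, the summary JSON printed by the jsonpath query above can be scanned for any health field that is not "OK". This is a sketch; SUMMARY is pasted in as a literal rather than fetched from the cluster:

```shell
# Check that every *health field in the mirroringStatus summary is "OK".
# SUMMARY is the JSON output shown in the comment above.
SUMMARY='{"daemon_health":"OK","health":"OK","image_health":"OK","states":{"replaying":20}}'

# Extract all health fields, then keep only the ones whose value is not OK.
BAD="$(printf '%s' "$SUMMARY" | grep -o '"[a-z_]*health":"[A-Z_]*"' | grep -v ':"OK"' || true)"

if [ -z "$BAD" ]; then
  echo "mirroring healthy"
else
  echo "unhealthy fields: $BAD"
fi
```

In practice the `SUMMARY=...` line would be replaced by the `oc get cephblockpool ... -o jsonpath=...` command itself.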
Santosh, great to see the workaround for getting the cluster back up in this scenario of reinstalling Submariner.

This scenario is very disruptive. Ceph requires immutable IP addresses for the mons, so we cannot support this scenario automatically in Rook. The only way we can hope to support it is that if/when it happens in production, the customer will need to contact the support team to step through these complicated recovery steps. Even better if we can get this recovery working with the krew plugin, which would just need an addition to the existing --restore-quorum command to support the changed IPs. Then there is the separate question of the best way for the support team to use the krew plugin (or an alternative) that is fully tested by QE.
Based on the previous comments, moving this out of 4.13, since we can't support anything except the customer working with support on the disaster recovery steps.
Hi Vikhyat,

Did you get a chance to check the last comment by Travis regarding doc support?
(In reply to Santosh Pillai from comment #29)
> Hi Vikhyat
>
> Did you get a chance to check the last comment by Travis regarding doc
> support?

Hi Santosh,

Yes, updating the IP should be easy - this is documented here for standalone clusters: https://access.redhat.com/solutions/3093781. I think the basic steps should be the same for ODF. Adding @assingh, who can help from the ODF side.
(In reply to Vikhyat Umrao from comment #30)
> (In reply to Santosh Pillai from comment #29)
> > Hi Vikhyat
> >
> > Did you get a chance to check the last comment by Travis regarding doc
> > support?
>
> Hi Santosh,
>
> Yes, updating IP should be easy - this is documented here
> https://access.redhat.com/solutions/3093781 for standalone clusters. I think
> the basic steps should be the same for ODF. Adding @assingh who
> can help from ODF side.

Ahh, I see in comment #5 you were already able to achieve it, and the question is whether we need to document it or not. I think yes, we should document it @
@bkunal and Ashish - can you please check from the KCS point of view?
Thanks for the doc, Bipin. I'll take a look at it tomorrow.
Hi Bipin,

This has been tested only once (and it didn't go well, if I remember correctly), and I suspect we might hit other issues during hub recovery or node failure scenarios. I therefore suggest having it re-tested (it may need a tracker and an assignee). Seeking thoughts.
Santosh, please work on the KCS as suggested by Travis https://access.redhat.com/node/add/kcs-solution
Observed the same behaviour when trying to uninstall and reinstall Submariner in a cluster replacement scenario. The mds pods are also in CLBO state. All the ceph commands are stuck. The Submariner connection status was also degraded.

```
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-74675cc7jhnd6   1/2   CrashLoopBackOff   141 (2m58s ago)   14h
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-6bf88858r9nbc   1/2   CrashLoopBackOff   141 (53s ago)     14h
```

```
oc get cm rook-ceph-mon-endpoints -o yaml
apiVersion: v1
data:
  csi-cluster-config-json: '[{"clusterID":"openshift-storage","monitors":["242.1.255.248:3300","242.1.255.250:3300","242.1.255.249:3300"],"cephFS":{"netNamespaceFilePath":"","subvolumeGroup":"","kernelMountOptions":"","fuseMountOptions":""},"rbd":{"netNamespaceFilePath":"","radosNamespace":""},"nfs":{"netNamespaceFilePath":""},"readAffinity":{"enabled":false,"crushLocationLabels":null},"namespace":""}]'
  data: f=242.1.255.248:3300,d=242.1.255.250:3300,e=242.1.255.249:3300
  mapping: '{"node":{"d":null,"e":null,"f":null}}'
  maxMonId: "5"
  outOfQuorum: ""
kind: ConfigMap
metadata:
  creationTimestamp: "2024-04-28T19:39:59Z"
  finalizers:
  - ceph.rook.io/disaster-protection
  name: rook-ceph-mon-endpoints
  namespace: openshift-storage
  ownerReferences:
  - apiVersion: ceph.rook.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: CephCluster
    name: ocs-storagecluster-cephcluster
    uid: 1937512d-7903-4bfc-bfe7-90523c26662e
  resourceVersion: "102498"
  uid: e172ee65-8cbf-4ce5-b5ff-bb62e9e62db7
```

```
oc get service | grep submariner
submariner-3qslc37nybfedm2rae7lou4wkknd25iu   ClusterIP   172.30.21.74     242.0.255.252   3300/TCP   14h
submariner-alzs2tkhcukgmx55rmttuo6o22vrqubb   ClusterIP   172.30.222.93    242.0.255.253   3300/TCP   14h
submariner-cpqogstf25uynxvpgw4u34ak42mvdmdb   ClusterIP   172.30.192.247   242.0.255.249   6800/TCP   14h
submariner-vui6efiepwvd4jr4b7gjvmuicpviuwqg   ClusterIP   172.30.181.231   242.0.255.250   6800/TCP   14h
submariner-w4mfbcdvdi2oqcadnvzhxdwg75glri52   ClusterIP   172.30.9.17      242.0.255.254   3300/TCP   14h
submariner-zoh2rvm7zupu5kldmqt6mduidrg6wek2   ClusterIP   172.30.120.52    242.0.255.251   6800/TCP   14h
```

```
oc get service submariner-cpqogstf25uynxvpgw4u34ak42mvdmdb -o yaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2024-04-28T19:58:06Z"
  finalizers:
  - submariner.io/globalnet-internal-service
  labels:
    submariner.io/exportedServiceRef: rook-ceph-osd-2
  name: submariner-cpqogstf25uynxvpgw4u34ak42mvdmdb
  namespace: openshift-storage
  resourceVersion: "168660"
  uid: 7c930a7e-3d7f-4e11-8c15-bf5f163654a4
spec:
  clusterIP: 172.30.192.247
  clusterIPs:
  - 172.30.192.247
  externalIPs:
  - 242.0.255.249
  externalTrafficPolicy: Cluster
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: osd-port-v2
    port: 6800
    protocol: TCP
    targetPort: 6800
  selector:
    app: rook-ceph-osd
    ceph-osd-id: "2"
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
```

Noticed that different IPs are being assigned after the re-installation of Submariner.

Versions:
ODF - 4.16.0-77
ACM - 2.10.2
Submariner - 0.17.0
OCP - 4.16

subctl gather logs - http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/keerthana/4.16/cluster_replacemnet/submariner/submariner-20240429084042/
subctl verify - https://privatebin.corp.redhat.com/?62e391efec155b9c#775F4X1YYPLeUNUemdFV6XBT4mja2yUnTB8TH8vfvBGL
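The mismatch is easy to spot mechanically: the mon endpoints in the configmap still carry the old 242.1.255.* prefix, while the new Submariner services export 242.0.255.* addresses. A small sketch comparing the two, with values copied from the output above:

```shell
# Stale endpoints from the rook-ceph-mon-endpoints configmap "data" field,
# and one of the newly exported IPs from `oc get service | grep submariner`.
ENDPOINTS="f=242.1.255.248:3300,d=242.1.255.250:3300,e=242.1.255.249:3300"
NEW_EXPORTED_IP="242.0.255.252"

# First three octets of the first endpoint vs. the new exported IP.
OLD_PREFIX="$(printf '%s' "$ENDPOINTS" | sed 's/^[a-z]*=\([0-9]*\.[0-9]*\.[0-9]*\)\..*/\1/')"
NEW_PREFIX="$(printf '%s' "$NEW_EXPORTED_IP" | cut -d. -f1-3)"

if [ "$OLD_PREFIX" != "$NEW_PREFIX" ]; then
  echo "stale mon endpoints: configmap uses ${OLD_PREFIX}.* but submariner now exports ${NEW_PREFIX}.*"
fi
```

Running this against a live cluster would mean substituting the two literals with the corresponding `oc get` queries.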
*** Bug 2277936 has been marked as a duplicate of this bug. ***
Tried the manual workaround on a new cluster. The mons failed over to using the correct global IPs. Had to update the `--public-addr` argument in the osd deployments.

The only problem I see now is that one of the OSDs is not coming up due to:

```
Events:
  Type     Reason           Age                    From               Message
  ----     ------           ----                   ----               -------
  Normal   Scheduled        143m                   default-scheduler  Successfully assigned openshift-storage/rook-ceph-osd-1-7cf9d97d6-5ww9r to compute-2
  Warning  FailedMapVolume  2m57s (x70 over 143m)  kubelet            MapVolume.MapPodDevice failed for volume "pvc-5bf80e1b-f943-44cc-b447-6dfdd1080fd1" : rpc error: code = AlreadyExists desc = block volume already mounted in more than one place
```

which might be a different issue altogether!

Ceph status:

```
oc rsh -n openshift-storage $(oc get pods -o wide -n openshift-storage|grep rook-ceph-operator|awk '{print$1}') ceph --conf=/var/lib/rook/openshift-storage/openshift-storage.config status
  cluster:
    id:     084efd57-e82f-4db6-ae39-f005f98c815b
    health: HEALTH_WARN
            1 osds down
            1 host (1 osds) down
            1 rack (1 osds) down
            Degraded data redundancy: 4165/12495 objects degraded (33.333%), 115 pgs degraded, 169 pgs undersized

  services:
    mon: 3 daemons, quorum d,g,h (age 3h)
    mgr: a(active, since 104m), standbys: b
    mds: 1/1 daemons up, 1 hot standby
    osd: 3 osds: 2 up (since 2h), 3 in (since 3d)
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   12 pools, 169 pgs
    objects: 4.17k objects, 8.3 GiB
    usage:   16 GiB used, 4.0 TiB / 4 TiB avail
    pgs:     4165/12495 objects degraded (33.333%)
             115 active+undersized+degraded
             54  active+undersized

  io:
    client: 938 B/s rd, 2.2 KiB/s wr, 1 op/s rd, 0 op/s wr
```

Still investigating.
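For the `--public-addr` update on the OSD deployments, a sed rewrite over the saved manifest avoids hand-editing each args list. A hypothetical sketch: OSD_ARGS stands in for the relevant fragment of the container args, and the IPs are the example values from this scenario.

```shell
# Rewrite the --public-addr argument to the new exported (global) IP.
# OSD_ARGS is an illustrative fragment, not the full deployment manifest.
OSD_ARGS='--public-addr=242.1.255.252'
NEW_IP='242.0.255.249'

UPDATED="$(printf '%s' "$OSD_ARGS" | sed "s/--public-addr=[0-9.]*/--public-addr=${NEW_IP}/")"
echo "$UPDATED"   # --public-addr=242.0.255.249
```

The same substitution could be run over the full `oc get deployment ... -o yaml` dump before re-applying it.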
(In reply to Santosh Pillai from comment #49)
> Tried the manual workaround on a new cluster. The mons got failed over to
> using correct global IPs. Had to update the `--public-addr` argument in the
> osd deployments.
>
> Only problem I see now is one of the OSD is not coming up due to
> ```
> Events:
>   Type     Reason           Age                    From               Message
>   ----     ------           ----                   ----               -------
>   Normal   Scheduled        143m                   default-scheduler  Successfully assigned openshift-storage/rook-ceph-osd-1-7cf9d97d6-5ww9r to compute-2
>   Warning  FailedMapVolume  2m57s (x70 over 143m)  kubelet            MapVolume.MapPodDevice failed for volume "pvc-5bf80e1b-f943-44cc-b447-6dfdd1080fd1" : rpc error: code = AlreadyExists desc = block volume already mounted in more than one place
> ```
>
> which might be a different issue altogether!
>
> Ceph status:
> ```
> oc rsh -n openshift-storage $(oc get pods -o wide -n openshift-storage|grep rook-ceph-operator|awk '{print$1}') ceph --conf=/var/lib/rook/openshift-storage/openshift-storage.config status
>   cluster:
>     id:     084efd57-e82f-4db6-ae39-f005f98c815b
>     health: HEALTH_WARN
>             1 osds down
>             1 host (1 osds) down
>             1 rack (1 osds) down
>             Degraded data redundancy: 4165/12495 objects degraded (33.333%), 115 pgs degraded, 169 pgs undersized
>
>   services:
>     mon: 3 daemons, quorum d,g,h (age 3h)
>     mgr: a(active, since 104m), standbys: b
>     mds: 1/1 daemons up, 1 hot standby
>     osd: 3 osds: 2 up (since 2h), 3 in (since 3d)
>     rgw: 1 daemon active (1 hosts, 1 zones)
>
>   data:
>     volumes: 1/1 healthy
>     pools:   12 pools, 169 pgs
>     objects: 4.17k objects, 8.3 GiB
>     usage:   16 GiB used, 4.0 TiB / 4 TiB avail
>     pgs:     4165/12495 objects degraded (33.333%)
>              115 active+undersized+degraded
>              54  active+undersized
>
>   io:
>     client: 938 B/s rd, 2.2 KiB/s wr, 1 op/s rd, 0 op/s wr
> ```
>
> still investigating.

This could be some issue with the env, because I was not able to ssh into the node `compute-2` where the OSD is failing to start.
Cluster health is OK after restarting the `compute-2` node mentioned in the above comment.

```
oc rsh -n openshift-storage $(oc get pods -o wide -n openshift-storage|grep rook-ceph-operator|awk '{print$1}') ceph --conf=/var/lib/rook/openshift-storage/openshift-storage.config status
  cluster:
    id:     084efd57-e82f-4db6-ae39-f005f98c815b
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum d,g,h (age 53m)
    mgr: a(active, since 42m), standbys: b
    mds: 1/1 daemons up, 1 hot standby
    osd: 3 osds: 3 up (since 53m), 3 in (since 3d)
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   12 pools, 169 pgs
    objects: 4.17k objects, 8.3 GiB
    usage:   24 GiB used, 6.0 TiB / 6 TiB avail
    pgs:     169 active+clean

  io:
    client: 852 B/s rd, 2.0 KiB/s wr, 1 op/s rd, 0 op/s wr
```
Santosh, what are the next steps for this BZ?
(In reply to Mudit Agarwal from comment #52)
> Santosh, what are the next steps for this BZ?

Testing is still in progress for the DR cluster with globalnet.
moving this back to on_QA based on comment #51
Awesome. Thanks, Annette, for following up with the Submariner team. Good to know that cluster replacement can work without deleting submariner-globalnet on the surviving cluster. This will help bypass the issue, and hence the workaround mentioned in this BZ, for cluster-replacement scenarios.
Please update the RDT flag/text appropriately.
Since we now have a different approach to uninstalling and reinstalling Submariner, we won't need the workaround mentioned in comment #5, so we can safely close this BZ. We do need a doc BZ to elaborate on the uninstall and reinstall of Submariner.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days