Bug 2196628
| Summary: | [RDR] [Globalnet enabled] Rook ceph mon endpoints are not updated with new IPs when Submariner is re-installed | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Aman Agrawal <amagrawa> |
| Component: | rook | Assignee: | Santosh Pillai <sapillai> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | kmanohar |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.13 | CC: | aclewett, asriram, assingh, asuryana, bkunal, kmanohar, kramdoss, muagarwa, nyechiel, odf-bz-bot, rtalur, sagrawal, sapillai, sgaddam, skitt, tnielsen, vthapar, vumrao |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2024-09-24 07:35:46 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
|
Comment 4
Santosh Pillai
2023-05-11 02:51:02 UTC
Used the following steps to add the mons back to quorum. Here we edit only one mon and let the Rook operator fail over the other mons.

Obtain the following information from the cluster:
- fsid
- mon-e exported IP: this can be obtained from `oc get service | grep submariner`. Let's say the exported IP for mon-e is 242.0.255.251 in this case.

Steps:
- Scale down the OCS operator and Rook operator deployments:
      oc scale deployment ocs-operator --replicas=0 -n openshift-storage
      oc scale deployment rook-ceph-operator --replicas=0 -n openshift-storage
- Update the mon deployment to use the correct exported IP in the `--public-addr` argument under `spec.containers[0].args`:
      --public-addr=242.0.255.251
- Copy the mon-e deployment:
      oc get deployment rook-ceph-mon-e -o yaml > rook-ceph-mon-e-deployment-c1.yaml
- Edit rook-ceph-mon-endpoints to use the correct exported IP for mon-e.
- Patch the rook-ceph-mon-e deployment to stop the mon process from running, without deleting the mon pod:
      kubectl patch deployment rook-ceph-mon-e --type='json' -p '[{"op":"remove", "path":"/spec/template/spec/containers/0/livenessProbe"}]'
      kubectl patch deployment rook-ceph-mon-e -p '{"spec": {"template": {"spec": {"containers": [{"name": "mon", "command": ["sleep", "infinity"], "args": []}]}}}}'
- Connect to the mon-e pod:
      oc exec -it <rook-ceph-mon-e-pod> -- sh
- Inside the mon-e pod:
  - Create a temporary monmap:
        monmaptool --create --add e 242.0.255.251 --set-min-mon-release --enable-all-features --clobber /tmp/monmap --fsid <ceph fsid>
  - Remove the mon-e entry:
        monmaptool --rm e /tmp/monmap
  - Add it back with the v2 protocol (add the v1 protocol as well if the cluster supports both):
        monmaptool --addv e [v2:242.0.255.251:3300] /tmp/monmap
  - Inject this monmap into mon-e:
        ceph-mon -i e --inject-monmap /tmp/monmap
- Exit the mon-e pod.
- Scale the OCS and Rook operator deployments back up:
      oc scale deployment ocs-operator --replicas=1 -n openshift-storage
      oc scale deployment rook-ceph-operator --replicas=1 -n openshift-storage
- Wait for the Rook operator to fail over the other mons.
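The sequence above can be collected into a small dry-run helper that prints the commands for review instead of running them. This is a hypothetical sketch (the `print_mon_recovery` function is not part of Rook or ODF), and it intentionally omits the interactive edits of the mon deployment and configmap:

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the mon recovery commands from the
# steps above for a given mon name, exported IP, and fsid, so the sequence
# can be reviewed before anything is run against a live cluster.
print_mon_recovery() {
  mon="$1"; ip="$2"; fsid="$3"
  cat <<EOF
oc scale deployment ocs-operator --replicas=0 -n openshift-storage
oc scale deployment rook-ceph-operator --replicas=0 -n openshift-storage
# ...edit the mon deployment and rook-ceph-mon-endpoints, then inside the mon pod:
monmaptool --create --add $mon $ip --enable-all-features --clobber /tmp/monmap --fsid $fsid
monmaptool --rm $mon /tmp/monmap
monmaptool --addv $mon [v2:$ip:3300] /tmp/monmap
ceph-mon -i $mon --inject-monmap /tmp/monmap
oc scale deployment ocs-operator --replicas=1 -n openshift-storage
oc scale deployment rook-ceph-operator --replicas=1 -n openshift-storage
EOF
}

print_mon_recovery e 242.0.255.251 6bee5946-d3e4-4999-8110-24ed4325fbe2
```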
The c1 cluster is healthy now after applying the above workaround:
```
sh-5.1$ ceph status
  cluster:
    id:     6bee5946-d3e4-4999-8110-24ed4325fbe2
    health: HEALTH_OK

  services:
    mon:        3 daemons, quorum e,g,h (age 21m)
    mgr:        a(active, since 24m)
    mds:        1/1 daemons up, 1 hot standby
    osd:        3 osds: 3 up (since 21m), 3 in (since 10d)
    rbd-mirror: 1 daemon active (1 hosts)
    rgw:        1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   12 pools, 169 pgs
    objects: 3.05k objects, 3.3 GiB
    usage:   5.9 GiB used, 1.5 TiB / 1.5 TiB avail
    pgs:     169 active+clean

  io:
    client: 31 KiB/s rd, 1.5 MiB/s wr, 36 op/s rd, 322 op/s wr
```
The c2 cluster has daemons crashing, but the mons are up now:
```
sh-5.1$ ceph status
  cluster:
    id:     c2c61349-f7b5-47c5-8fd6-f687ea46b450
    health: HEALTH_WARN
            1599 daemons have recently crashed

  services:
    mon:        3 daemons, quorum e,h,i (age 32m)
    mgr:        a(active, since 34m)
    mds:        1/1 daemons up, 1 hot standby
    osd:        3 osds: 3 up (since 33m), 3 in (since 10d)
    rbd-mirror: 1 daemon active (1 hosts)
    rgw:        1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   12 pools, 169 pgs
    objects: 2.89k objects, 4.2 GiB
    usage:   12 GiB used, 1.5 TiB / 1.5 TiB avail
    pgs:     169 active+clean

  io:
    client: 17 KiB/s rd, 64 KiB/s wr, 21 op/s rd, 6 op/s wr
```
(In reply to Santosh Pillai from comment #5)

One last step is to restart the rbd-mirror pods on both clusters. Mirroring health is OK on both clusters now:

    oc get cephblockpool ocs-storagecluster-cephblockpool -n openshift-storage -o jsonpath='{.status.mirroringStatus.summary}{"\n"}'
    {"daemon_health":"OK","health":"OK","image_health":"OK","states":{"replaying":20}}

Santosh, great to see the workaround for getting the cluster back up in this scenario of reinstalling Submariner. This scenario is very disruptive: Ceph requires immutable IP addresses for the mons, so we cannot support this scenario automatically in Rook. The only way we can hope to support it is that, if/when it happens in production, the customer will need to contact the support team to step through these complicated recovery steps. Even better if we can get this recovery working with the krew plugin, which would just need an addition to the existing --restore-quorum command to support the changing IP. Then there is the separate question of the best way for the support team to use the krew plugin (or an alternative) that is fully tested by QE.

Based on previous comments, moving out of 4.13 since we can't support anything except the customer working with support on disaster recovery steps.

Hi Vikhyat, did you get a chance to check the last comment by Travis regarding doc support?

(In reply to Santosh Pillai from comment #29)

Hi Santosh, yes, updating the IP should be easy - this is documented at https://access.redhat.com/solutions/3093781 for standalone clusters. I think the basic steps should be the same for ODF. Adding @assingh, who can help from the ODF side.

(In reply to Vikhyat Umrao from comment #30)

Ah, I see in comment #5 you were already able to achieve it, and the question is whether we need to document it or not. I think yes, we should document it. @bkunal and Ashish - can you please check from the KCS point of view?

Thanks for the doc, Bipin. I'll take a look at it tomorrow.

Hi Bipin, this was tested only once (which didn't go well, if I remember correctly) and I suspect we might hit other issues (during hub recovery or node failure scenarios), so I suggest it be re-tested (this may need a tracker and an assignee). Seeking thoughts.

Santosh, please work on the KCS as suggested by Travis: https://access.redhat.com/node/add/kcs-solution

Observed the same behaviour when trying to uninstall and reinstall Submariner in a cluster replacement scenario. The MDS pods are also in CrashLoopBackOff state, all the ceph commands are stuck, and the Submariner connection status was also degraded.
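The mirroring summary returned by the jsonpath query above is plain JSON, so a script can gate on its health fields. A minimal sketch, using the sample output shown above as a hard-coded string (on a live cluster, `summary` would come from the `oc get cephblockpool ... -o jsonpath=...` command):

```shell
#!/bin/sh
# Check the mirroring health fields of a cephblockpool mirroringStatus summary.
# The sample value below is the output shown above; on a live cluster it would
# come from `oc get cephblockpool ... -o jsonpath='{.status.mirroringStatus.summary}'`.
summary='{"daemon_health":"OK","health":"OK","image_health":"OK","states":{"replaying":20}}'

for field in daemon_health health image_health; do
  if echo "$summary" | grep -q "\"$field\":\"OK\""; then
    echo "$field OK"
  else
    echo "$field NOT OK"
  fi
done
```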
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-74675cc7jhnd6 1/2 CrashLoopBackOff 141 (2m58s ago) 14h
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-6bf88858r9nbc 1/2 CrashLoopBackOff 141 (53s ago) 14h
___________________________________________________________________________________________________________________
```
oc get cm rook-ceph-mon-endpoints -o yaml
apiVersion: v1
data:
  csi-cluster-config-json: '[{"clusterID":"openshift-storage","monitors":["242.1.255.248:3300","242.1.255.250:3300","242.1.255.249:3300"],"cephFS":{"netNamespaceFilePath":"","subvolumeGroup":"","kernelMountOptions":"","fuseMountOptions":""},"rbd":{"netNamespaceFilePath":"","radosNamespace":""},"nfs":{"netNamespaceFilePath":""},"readAffinity":{"enabled":false,"crushLocationLabels":null},"namespace":""}]'
  data: f=242.1.255.248:3300,d=242.1.255.250:3300,e=242.1.255.249:3300
  mapping: '{"node":{"d":null,"e":null,"f":null}}'
  maxMonId: "5"
  outOfQuorum: ""
kind: ConfigMap
metadata:
  creationTimestamp: "2024-04-28T19:39:59Z"
  finalizers:
  - ceph.rook.io/disaster-protection
  name: rook-ceph-mon-endpoints
  namespace: openshift-storage
  ownerReferences:
  - apiVersion: ceph.rook.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: CephCluster
    name: ocs-storagecluster-cephcluster
    uid: 1937512d-7903-4bfc-bfe7-90523c26662e
  resourceVersion: "102498"
  uid: e172ee65-8cbf-4ce5-b5ff-bb62e9e62db7
```
______________________________________________________________________________
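The `data` field of the rook-ceph-mon-endpoints ConfigMap is a comma-separated list of `mon=ip:port` pairs, which is the field the workaround edits. A small sketch splitting it into one mon per line, using the values shown above:

```shell
#!/bin/sh
# Split the rook-ceph-mon-endpoints "data" field into one mon per line.
# The sample value is the one shown above; on a live cluster it would come
# from `oc get cm rook-ceph-mon-endpoints -o jsonpath='{.data.data}'`.
data="f=242.1.255.248:3300,d=242.1.255.250:3300,e=242.1.255.249:3300"

echo "$data" | tr ',' '\n' | while IFS='=' read -r mon ep; do
  echo "mon $mon -> $ep"
done
```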
```
oc get service | grep submariner
submariner-3qslc37nybfedm2rae7lou4wkknd25iu   ClusterIP   172.30.21.74    242.0.255.252   3300/TCP   14h
submariner-alzs2tkhcukgmx55rmttuo6o22vrqubb   ClusterIP   172.30.222.93   242.0.255.253   3300/TCP   14h
submariner-cpqogstf25uynxvpgw4u34ak42mvdmdb   ClusterIP   172.30.192.247  242.0.255.249   6800/TCP   14h
submariner-vui6efiepwvd4jr4b7gjvmuicpviuwqg   ClusterIP   172.30.181.231  242.0.255.250   6800/TCP   14h
submariner-w4mfbcdvdi2oqcadnvzhxdwg75glri52   ClusterIP   172.30.9.17     242.0.255.254   3300/TCP   14h
submariner-zoh2rvm7zupu5kldmqt6mduidrg6wek2   ClusterIP   172.30.120.52   242.0.255.251   6800/TCP   14h
```
_________________________________________________________________________________________________________________________
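In the listing above, the fourth column is the globalnet external IP and the fifth is the exported port; these are the addresses the mon and OSD deployments must advertise. A sketch extracting `ip:port` pairs from the captured output (the two sample lines are taken from the listing above):

```shell
#!/bin/sh
# Extract the globalnet external IP and port from the captured
# `oc get service | grep submariner` output above. Columns:
#   $1 name, $2 type, $3 clusterIP, $4 externalIP, $5 port/proto, $6 age
services='submariner-3qslc37nybfedm2rae7lou4wkknd25iu ClusterIP 172.30.21.74 242.0.255.252 3300/TCP 14h
submariner-cpqogstf25uynxvpgw4u34ak42mvdmdb ClusterIP 172.30.192.247 242.0.255.249 6800/TCP 14h'

echo "$services" | awk '{ split($5, p, "/"); print $4 ":" p[1] }'
```

Port 3300 services front mons (msgr v2) and 6800 services front OSDs, which is how the listing distinguishes the two.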
```
oc get service submariner-cpqogstf25uynxvpgw4u34ak42mvdmdb -o yaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2024-04-28T19:58:06Z"
  finalizers:
  - submariner.io/globalnet-internal-service
  labels:
    submariner.io/exportedServiceRef: rook-ceph-osd-2
  name: submariner-cpqogstf25uynxvpgw4u34ak42mvdmdb
  namespace: openshift-storage
  resourceVersion: "168660"
  uid: 7c930a7e-3d7f-4e11-8c15-bf5f163654a4
spec:
  clusterIP: 172.30.192.247
  clusterIPs:
  - 172.30.192.247
  externalIPs:
  - 242.0.255.249
  externalTrafficPolicy: Cluster
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: osd-port-v2
    port: 6800
    protocol: TCP
    targetPort: 6800
  selector:
    app: rook-ceph-osd
    ceph-osd-id: "2"
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
```
______________________________________________________________________________________________________________
Noticed that different IPs are assigned after re-installation of Submariner.
ODF - 4.16.0-77
ACM - 2.10.2
Submariner - 0.17.0
OCP - 4.16
subtcl gather logs - http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/keerthana/4.16/cluster_replacemnet/submariner/submariner-20240429084042/
subctl verify - https://privatebin.corp.redhat.com/?62e391efec155b9c#775F4X1YYPLeUNUemdFV6XBT4mja2yUnTB8TH8vfvBGL
*** Bug 2277936 has been marked as a duplicate of this bug. ***

Tried the manual workaround on a new cluster. The mons were failed over to the correct global IPs. Had to update the `--public-addr` argument in the OSD deployments as well.
The only problem I see now is that one of the OSDs is not coming up, due to:
```
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 143m default-scheduler Successfully assigned openshift-storage/rook-ceph-osd-1-7cf9d97d6-5ww9r to compute-2
Warning FailedMapVolume 2m57s (x70 over 143m) kubelet MapVolume.MapPodDevice failed for volume "pvc-5bf80e1b-f943-44cc-b447-6dfdd1080fd1" : rpc error: code = AlreadyExists desc = block volume already mounted in more than one place
```
which might be a different issue altogether!
Ceph status:
```
oc rsh -n openshift-storage $(oc get pods -o wide -n openshift-storage|grep rook-ceph-operator|awk '{print$1}') ceph --conf=/var/lib/rook/openshift-storage/openshift-storage.config status
  cluster:
    id:     084efd57-e82f-4db6-ae39-f005f98c815b
    health: HEALTH_WARN
            1 osds down
            1 host (1 osds) down
            1 rack (1 osds) down
            Degraded data redundancy: 4165/12495 objects degraded (33.333%), 115 pgs degraded, 169 pgs undersized

  services:
    mon: 3 daemons, quorum d,g,h (age 3h)
    mgr: a(active, since 104m), standbys: b
    mds: 1/1 daemons up, 1 hot standby
    osd: 3 osds: 2 up (since 2h), 3 in (since 3d)
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   12 pools, 169 pgs
    objects: 4.17k objects, 8.3 GiB
    usage:   16 GiB used, 4.0 TiB / 4 TiB avail
    pgs:     4165/12495 objects degraded (33.333%)
             115 active+undersized+degraded
             54 active+undersized

  io:
    client: 938 B/s rd, 2.2 KiB/s wr, 1 op/s rd, 0 op/s wr
```
Still investigating.
(In reply to Santosh Pillai from comment #49)

This could be an issue with the environment, because I was not able to ssh into the node `compute-2` where the OSD is failing to start.
Cluster health is OK after restarting the `compute-2` node mentioned in the above comment:
```
❯ oc rsh -n openshift-storage $(oc get pods -o wide -n openshift-storage|grep rook-ceph-operator|awk '{print$1}') ceph --conf=/var/lib/rook/openshift-storage/openshift-storage.config status
  cluster:
    id:     084efd57-e82f-4db6-ae39-f005f98c815b
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum d,g,h (age 53m)
    mgr: a(active, since 42m), standbys: b
    mds: 1/1 daemons up, 1 hot standby
    osd: 3 osds: 3 up (since 53m), 3 in (since 3d)
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   12 pools, 169 pgs
    objects: 4.17k objects, 8.3 GiB
    usage:   24 GiB used, 6.0 TiB / 6 TiB avail
    pgs:     169 active+clean

  io:
    client: 852 B/s rd, 2.0 KiB/s wr, 1 op/s rd, 0 op/s wr
```
Santosh, what are the next steps for this BZ?

(In reply to Mudit Agarwal from comment #52)

Testing is still in progress for the DR cluster with globalnet.

Moving this back to ON_QA based on comment #51.

Awesome. Thanks, Annette, for following up with the Submariner team. Good to know that cluster replacement can work without deleting submariner-globalnet on the surviving cluster. This will help bypass the issue, and hence the workaround mentioned in this BZ, for cluster-replacement scenarios.

Please update the RDT flag/text appropriately.

Since we now have a different approach to uninstall and reinstall Submariner, we won't need the workaround mentioned in comment #5, so we can safely close this BZ. We do need a doc BZ to elaborate on the uninstall and reinstall of Submariner.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.