Bug 2196628

Summary: [RDR] Rook Ceph mon endpoints are not updated with new IPs when Submariner is reinstalled
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Aman Agrawal <amagrawa>
Component: rook
Assignee: Santosh Pillai <sapillai>
Status: ASSIGNED
QA Contact: Neha Berry <nberry>
Severity: high
Priority: unspecified
Version: 4.13
CC: assingh, asuryana, bkunal, muagarwa, nyechiel, odf-bz-bot, sapillai, sgaddam, skitt, tnielsen, vthapar, vumrao
Target Milestone: ---
Flags: vumrao: needinfo? (assingh)
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Type: Bug

Comment 4 Santosh Pillai 2023-05-11 02:51:02 UTC
Submariner was reinstalled on an existing setup. When Submariner was removed, it deleted all the exported services.
 
When the Globalnet operator started again, it recreated all the exported services, but this time with different IPs.
For example: on cluster c1 a service previously had the exported IP 242.1.255.*, but after Globalnet was reinstalled the exported services were recreated with 242.0.255.*. Rook still uses the 242.1.255.* addresses saved in its config map and has no knowledge of 242.0.255.*.

In Rook we don't normally allow mon IPs to change; that is not a supported case. For the global-IP scenario we allow it by failing over one mon at a time. But this is a different situation: all of the mon (global) IPs changed at once when Submariner was reinstalled.
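
A quick way to confirm the mismatch (a minimal sketch, assuming the default openshift-storage namespace and the rook-ceph-mon-endpoints ConfigMap that Rook maintains):

```
# Exported (global) IPs currently assigned by Submariner/Globalnet
oc -n openshift-storage get service | grep submariner

# Mon endpoints Rook has recorded; these still show the old 242.1.255.* addresses
oc -n openshift-storage get configmap rook-ceph-mon-endpoints -o jsonpath='{.data.data}{"\n"}'
```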

@tnielsen Do you think this can be a scenario that rook should support?

Comment 5 Santosh Pillai 2023-05-15 10:27:11 UTC
Used the following steps to bring the mons back into quorum. Here we edit only one mon and let the Rook operator fail over the other mons.

Obtain the following information from the cluster (see the sketch below):
- fsid
- mon-e exported IP: this can be obtained from `oc get service | grep submariner`. Let's say the exported IP for mon-e is 242.0.255.251 in this case.
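
A hedged sketch for collecting both values, assuming the default openshift-storage namespace; the `rook-ceph-mon` secret key names can vary between Rook versions:

```
# fsid: read it from the secret Rook keeps for the mons (works even without mon quorum)
oc -n openshift-storage get secret rook-ceph-mon -o jsonpath='{.data.fsid}' | base64 -d; echo

# Exported (global) IP for mon-e, taken from the Submariner-exported service
oc -n openshift-storage get service | grep submariner | grep mon-e
```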

- Scale down OCS operator and Rook deployments
	oc scale deployment ocs-operator --replicas=0 -n openshift-storage
	oc scale deployment rook-ceph-operator --replicas=0 -n openshift-storage

- Update the mon-e deployment to use the correct exported IP in the `--public-addr` argument under `spec.template.spec.containers[0].args`, i.e. `--public-addr=242.0.255.251` in this case (see the sketch below).
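
A minimal sketch of that edit, assuming the `--public-addr` flag sits in the mon container's args list (its position varies, so `oc edit` is safer than a positional patch):

```
# Open the mon-e deployment and fix the public address argument:
oc -n openshift-storage edit deployment rook-ceph-mon-e
#   spec.template.spec.containers[0].args:
#     - --public-addr=242.0.255.251    # replace the stale 242.1.255.* address
```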
  	
- Copy mon-e deployment
  	oc get deployment rook-ceph-mon-e -o yaml > rook-ceph-mon-e-deployment-c1.yaml
  	
- Edit the rook-ceph-mon-endpoints ConfigMap to use the correct exported IP for mon-e (see the sketch below).
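
A hedged sketch of that ConfigMap edit; the `data` field normally holds a `<mon>=<ip>:<port>` list, though the exact format can differ between Rook versions:

```
# Point the mon-e entry at the new exported IP; leave the other mons alone,
# the Rook operator will fail them over later.
oc -n openshift-storage edit configmap rook-ceph-mon-endpoints
#   data:
#     data: e=242.0.255.251:6789,g=<unchanged>,h=<unchanged>
```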

- Patch the rook-ceph-mon-e Deployment so the mon daemon stops running without deleting the mon pod (the container is switched to `sleep infinity` so we can exec into it):
	kubectl patch deployment rook-ceph-mon-e --type='json' -p '[{"op":"remove", "path":"/spec/template/spec/containers/0/livenessProbe"}]'
	kubectl patch deployment rook-ceph-mon-e -p '{"spec": {"template": {"spec": {"containers": [{"name": "mon", "command": ["sleep", "infinity"], "args": []}]}}}}'

- Connect to the mon-e pod:
   	oc exec -it <rook-ceph-mon-e-pod> -- sh

- Inside the mon-e pod:
   - Create a temporary monmap
   	monmaptool --create --add e 242.0.255.251 --set-min-mon-release <ceph release> --enable-all-features --clobber /tmp/monmap --fsid <ceph fsid>
   - Remove the mon-e entry (it is re-added with the correct protocol below)
   	monmaptool --rm e /tmp/monmap
   - Add the v2 protocol address (add v1 as well if the cluster supports both)
   	monmaptool --addv e [v2:242.0.255.251:3300] /tmp/monmap
   - Inject this monmap into mon-e
   	ceph-mon -i e --inject-monmap /tmp/monmap
   - Exit the mon-e pod
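
For reference, the same in-pod sequence as a single block. This is only a sketch: `<ceph fsid>` and `<ceph release>` come from the values gathered earlier, and the commented v1+v2 line is the assumed form when the cluster runs both messenger protocols:

```
# Run inside the mon-e container
NEW_IP=242.0.255.251      # new exported (global) IP for mon-e

# Build a fresh monmap with the cluster fsid, then drop the auto-added entry
monmaptool --create --add e ${NEW_IP} --set-min-mon-release <ceph release> \
    --enable-all-features --clobber /tmp/monmap --fsid <ceph fsid>
monmaptool --rm e /tmp/monmap

# Re-add mon-e with the v2 address (use the second form if v1 is also enabled)
monmaptool --addv e "[v2:${NEW_IP}:3300]" /tmp/monmap
# monmaptool --addv e "[v2:${NEW_IP}:3300,v1:${NEW_IP}:6789]" /tmp/monmap

# Inject the rebuilt monmap into mon-e, then exit the pod
ceph-mon -i e --inject-monmap /tmp/monmap
```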
   
- Scale the OCS and Rook operator deployments back up:
 	oc scale deployment ocs-operator --replicas=1 -n openshift-storage
 	oc scale deployment rook-ceph-operator --replicas=1 -n openshift-storage
 	
- Wait for the Rook operator to fail over the other mons (see the sketch below).
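
A hedged way to watch the failover, assuming the default mon label and the standard rook-ceph-tools toolbox deployment:

```
# Watch the mon pods being replaced one at a time
oc -n openshift-storage get pods -l app=rook-ceph-mon -w

# Once the new mons are running, confirm quorum from the toolbox pod
oc -n openshift-storage exec deploy/rook-ceph-tools -- ceph status
```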

Comment 6 Santosh Pillai 2023-05-15 12:19:34 UTC
The c1 cluster is healthy now after applying the above workaround:

```
sh-5.1$ ceph status
  cluster:
    id:     6bee5946-d3e4-4999-8110-24ed4325fbe2
    health: HEALTH_OK
 
  services:
    mon:        3 daemons, quorum e,g,h (age 21m)
    mgr:        a(active, since 24m)
    mds:        1/1 daemons up, 1 hot standby
    osd:        3 osds: 3 up (since 21m), 3 in (since 10d)
    rbd-mirror: 1 daemon active (1 hosts)
    rgw:        1 daemon active (1 hosts, 1 zones)
 
  data:
    volumes: 1/1 healthy
    pools:   12 pools, 169 pgs
    objects: 3.05k objects, 3.3 GiB
    usage:   5.9 GiB used, 1.5 TiB / 1.5 TiB avail
    pgs:     169 active+clean
 
  io:
    client:   31 KiB/s rd, 1.5 MiB/s wr, 36 op/s rd, 322 op/s wr
```


c2 reports recently crashed daemons, but the mons are up now.

``` 
sh-5.1$ ceph status  
  cluster:
    id:     c2c61349-f7b5-47c5-8fd6-f687ea46b450
    health: HEALTH_WARN
            1599 daemons have recently crashed
 
  services:
    mon:        3 daemons, quorum e,h,i (age 32m)
    mgr:        a(active, since 34m)
    mds:        1/1 daemons up, 1 hot standby
    osd:        3 osds: 3 up (since 33m), 3 in (since 10d)
    rbd-mirror: 1 daemon active (1 hosts)
    rgw:        1 daemon active (1 hosts, 1 zones)
 
  data:
    volumes: 1/1 healthy
    pools:   12 pools, 169 pgs
    objects: 2.89k objects, 4.2 GiB
    usage:   12 GiB used, 1.5 TiB / 1.5 TiB avail
    pgs:     169 active+clean
 
  io:
    client:   17 KiB/s rd, 64 KiB/s wr, 21 op/s rd, 6 op/s wr
```
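
If those crash reports all date from the period when the mons were unreachable, the warning can be cleared once the cluster is stable again. A hedged aside, run from the toolbox pod after confirming no new crashes are appearing:

```
# Review, then archive the accumulated crash reports so the HEALTH_WARN clears
ceph crash ls
ceph crash archive-all
```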

Comment 7 Santosh Pillai 2023-05-15 12:38:06 UTC
(In reply to Santosh Pillai from comment #5)
> Used the following steps to add mons back to quorum. Here we edit only one
> mon and let rook operator to failover other mons.
> [...]
> - Wait for rook operator to failover other mons

One last step is to restart the rbd-mirror pods on both clusters (see the sketch below).
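
A minimal sketch of that restart, assuming the standard Rook label `app=rook-ceph-rbd-mirror` on the mirror daemon pods (run against each cluster):

```
# Delete the rbd-mirror pods; their deployment recreates them with the refreshed mon endpoints
oc -n openshift-storage delete pod -l app=rook-ceph-rbd-mirror
```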

Mirroring health is OK on both clusters now:

oc get cephblockpool ocs-storagecluster-cephblockpool -n openshift-storage -o jsonpath='{.status.mirroringStatus.summary}{"\n"}'
{"daemon_health":"OK","health":"OK","image_health":"OK","states":{"replaying":20}}

Comment 8 Travis Nielsen 2023-05-15 19:18:34 UTC
Santosh, great to see the workaround for getting the cluster back up in this scenario of reinstalling Submariner.

This scenario is very disruptive. Ceph requires immutable IP addresses for the mons.

We cannot support this scenario automatically in Rook.

The only way we can hope to support this scenario is that if/when it happens in production, the customer contacts the support team to step through these complicated recovery steps.
Even better would be to get this recovery working with the krew plugin, which would just need an addition to the existing --restore-quorum command to support the changed IPs. Then there is the separate question of the best way for the support team to use the krew plugin (or an alternative) in a form that is fully tested by QE.
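
For context, the existing plugin entry point referenced here looks roughly like the following. This is only a sketch of the upstream kubectl-rook-ceph plugin as it stands (syntax and flags may differ by plugin version, and the IP-change support discussed above does not exist yet):

```
# Restore mon quorum from a single healthy mon (here mon e), as the plugin supports today
kubectl rook-ceph -n openshift-storage mons restore-quorum e
```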

Comment 9 Travis Nielsen 2023-05-16 15:11:34 UTC
Based on previous comments, moving this out of 4.13, since the only supported path is the customer working with the support team through the disaster recovery steps.

Comment 29 Santosh Pillai 2023-08-08 05:41:51 UTC
Hi Vikhyat 

Did you get a chance to check the last comment by Travis regarding doc support?

Comment 30 Vikhyat Umrao 2023-08-08 21:59:24 UTC
(In reply to Santosh Pillai from comment #29)
> Hi Vikhyat 
> 
> Did you get a chance to check the last comment by Travis regarding doc
> support?

Hi Santosh,

Yes, updating the IP should be easy - this is documented at https://access.redhat.com/solutions/3093781 for standalone clusters. I think the basic steps should be the same for ODF. Adding @assingh, who can help from the ODF side.

Comment 31 Vikhyat Umrao 2023-08-08 22:05:59 UTC
(In reply to Vikhyat Umrao from comment #30)
> Yes, updating IP should be easy - this is documented here
> https://access.redhat.com/solutions/3093781 for standalone clusters. [...]

Ah, I see in comment #5 you were already able to achieve it, and the question is whether we need to document it or not. I think yes, we should document it @

Comment 32 Vikhyat Umrao 2023-08-08 22:06:23 UTC
@

Comment 33 Vikhyat Umrao 2023-08-08 22:07:10 UTC
@bkunal and Ashish - can you please check from the KCS point of view?

Comment 36 Santosh Pillai 2023-08-16 13:49:16 UTC
Thanks for the doc, Bipin. I'll take a look at it tomorrow.