Bug 2089530

Summary: [MetroDR] explain challenges of hybrid deployment for Metro DR cluster configuration
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Martin Bukatovic <mbukatov>
Component: documentation
Assignee: Anjana Suparna Sriram <asriram>
Status: ASSIGNED ---
QA Contact:
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.10
CC: asriram, dparkes, hnallurv, mmuench, odf-bz-bot, olakra, vkolli
Target Milestone: ---
Flags: hnallurv: needinfo? (vkolli)
       hnallurv: needinfo? (olakra)
Target Release: ODF 4.14.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Martin Bukatovic 2022-05-23 22:51:22 UTC
Describe the issue
==================

The chapter discussing requirements for deploying an RHCS stretch cluster for
MetroDR doesn't explain the peculiarities of a setup where the
arbiter/tiebreaker MON node is deployed in a cloud while the remaining Ceph
zones run in 2 different on-premise data centers.

This configuration is unique and is covered neither in the Ceph docs nor in
the ODF docs.

Describe the task you were trying to accomplish
===============================================

I need to understand how to plan the deployment of a stretched Ceph cluster
so that:

- 2 zones (each zone with a few OSDs and 2 MONs) are deployed on premise
- one node with the arbiter/tiebreaker MON is deployed in a cloud
- the configuration will be used as part of a MetroDR solution
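
The planned layout above can be sketched as a small model for sanity-checking
the MON count. This is only an illustration: the site names are hypothetical,
and the check is plain majority arithmetic over 5 MONs. It does not model
Ceph stretch-mode behaviour (OSD peering, election strategy, and so on).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str       # hypothetical site label, not taken from any real cluster
    mons: int       # number of MON daemons planned for the site
    location: str   # "on-premise" or "cloud"


# Layout from the description: 2 on-premise zones with 2 MONs each,
# plus a single arbiter/tiebreaker MON hosted in a cloud.
SITES = [
    Site("datacenter-1", mons=2, location="on-premise"),
    Site("datacenter-2", mons=2, location="on-premise"),
    Site("arbiter", mons=1, location="cloud"),
]


def has_quorum(surviving_mons: int, total_mons: int) -> bool:
    """MON quorum needs a strict majority of all configured MONs."""
    return surviving_mons > total_mons // 2


def survives_loss_of(*failed: str, sites=SITES) -> bool:
    """Return True if the remaining MONs still form a majority."""
    total = sum(s.mons for s in sites)
    surviving = sum(s.mons for s in sites if s.name not in failed)
    return has_quorum(surviving, total)


if __name__ == "__main__":
    for site in SITES:
        print(f"lose {site.name}: quorum kept = {survives_loss_of(site.name)}")
    print("lose datacenter-1 and arbiter:",
          survives_loss_of("datacenter-1", "arbiter"))
```

With this layout, losing any single site keeps a majority of MONs, while
losing one data center together with the arbiter does not. That is why
reliable connectivity from both data centers to the cloud-hosted arbiter
matters.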

Suggestions for improvement
===========================

We need to:

- Provide generic and minimal instructions that are applicable to all
  customer environments (some customers may use AWS VPN, others could use
  existing infrastructure that connects their local and cloud networks ...).
- Align the terminology with the Red Hat Ceph Storage docs, e.g. with
  Red Hat Ceph Storage considerations and recommendations[1] or
  Security Zones[2]

Information to convey includes:

- don't use a public IP address for the arbiter/tiebreaker MON node, use a
  site-to-site VPN (we don't need to go into much detail here)
- use a separate "cluster network"[2] (we raise this requirement in the
  current docs, but without explanation and context for the MetroDR use case)
- explain "client network", "public network" and "cluster network" in a
  MetroDR configuration (e.g. "public network" won't be public in the sense
  of being reachable from the global public internet, client network is not
  applicable (if I recall it right), ...)
- which nodes (from the whole MetroDR perspective) need to have direct network
  access to the arbiter node
- mistakes to avoid when configuring routing/firewall between the sites
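
As an illustration of the first point, a planning check for the arbiter MON
address could look like the sketch below. It assumes the site-to-site VPN
uses RFC 1918 private addressing, and the subnets in `PUBLIC_NETWORKS` are a
made-up address plan standing in for whatever is set as Ceph's
"public network"; neither comes from the docs.

```python
import ipaddress

# RFC 1918 private IPv4 ranges.
RFC1918 = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]

# Hypothetical address plan: one subnet per data center plus one for the
# cloud side of the VPN. Replace with the subnets actually planned.
PUBLIC_NETWORKS = ["10.1.0.0/16", "10.2.0.0/16", "10.3.0.0/24"]


def is_rfc1918(addr: str) -> bool:
    """Return True if addr falls in one of the RFC 1918 private ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918)


def check_mon_address(addr: str, public_networks=PUBLIC_NETWORKS) -> list:
    """Return a list of problems found with a planned MON address."""
    problems = []
    ip = ipaddress.ip_address(addr)
    if not is_rfc1918(addr):
        problems.append("not an RFC 1918 private address")
    if not any(ip in ipaddress.ip_network(net) for net in public_networks):
        problems.append("outside the planned Ceph public network subnets")
    return problems


if __name__ == "__main__":
    for candidate in ("10.3.0.5", "192.168.1.1", "8.8.8.8"):
        print(candidate, check_mon_address(candidate) or "ok")
```

An empty result only means the address is consistent with the plan; it says
nothing about whether routing and firewall rules between the sites actually
allow the traffic.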

[1] https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5/html/installation_guide/red-hat-ceph-storage-considerations-and-recommendations
[2] https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5/html-single/data_security_and_hardening_guide/index#security-zones-sec

Document URL
============

https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.10/html/configuring_openshift_data_foundation_for_metro-dr_with_advanced_cluster_management/requirements-for-deploying-rhcs-stretch-cluster_rhodf

Chapter/Section Number and Title
================================

Chapter 3. Requirements for deploying Red Hat Ceph Storage stretch cluster with arbiter

Product Version
===============

ODF 4.10

Environment Details
===================

MetroDR with ACM: deployment with stretched RHCS cluster across 2 data centers
and one MON hosted in a cloud

Any other versions of this document that also need this update
==============================================================

This text was introduced in 4.10, and should be fixed at least in 4.11.

Additional information
======================

I was discussing this with Daniel Parkes, who agrees that this kind of
information is missing in the docs, and can provide further details and
guidance.

Comment 6 Martin Bukatovic 2022-07-08 13:38:38 UTC
Looking at the current draft from stage (commit ef84b6f):

I don't see any update in the "3.3. Network configuration requirements" section.

Comment 22 Venkat Kolli 2023-02-15 16:22:13 UTC
It's OK to move the cloud arbiter testing to 4.13, as long as we have confirmed the 100 ms latency test in 4.12.

Comment 23 Venkat Kolli 2023-02-15 16:22:44 UTC
It's OK to move the cloud arbiter testing to 4.13, as long as we have confirmed the 100 ms latency test in 4.12.