Bug 2089530 - [MetroDR] explain challenges of hybrid deployment for Metro DR cluster configuration [NEEDINFO]
Summary: [MetroDR] explain challenges of hybrid deployment for Metro DR cluster config...
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: documentation
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ODF 4.14.0
Assignee: Anjana Suparna Sriram
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-05-23 22:51 UTC by Martin Bukatovic
Modified: 2023-08-09 16:43 UTC (History)
7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:
hnallurv: needinfo? (vkolli)
hnallurv: needinfo? (olakra)



Description Martin Bukatovic 2022-05-23 22:51:22 UTC
Describe the issue
==================

The chapter discussing requirements for deploying a RHCS stretch cluster for
MetroDR doesn't explain the peculiarities of a deployment in which the
arbiter/tiebreaker MON node runs in a cloud while the remaining Ceph zones run
in 2 different on-premise data centers.

This configuration is unique and is covered neither in the Ceph docs nor in
the ODF docs.

Describe the task you were trying to accomplish
===============================================

I need to understand how to plan the deployment of a stretched Ceph cluster so
that:

- 2 zones (each with a few OSDs and 2 MONs) are deployed on premise
- one node with the arbiter/tiebreaker MON is deployed in a cloud
- the configuration will be used as part of the MetroDR solution

Suggestions for improvement
===========================

We need to:

- Provide generic and minimal instructions applicable to all customer
  environments (some customers may use AWS VPN, others could use existing
  infrastructure facilitating a network connection between local and cloud
  networks, ...).
- Align the terminology with the Red Hat Ceph Storage docs, e.g. with
  Red Hat Ceph Storage considerations and recommendations[1] or
  Security Zones[2].

Information to convey includes:

- don't use a public IP address for the arbiter/tiebreaker MON node; use a
  site-to-site VPN (we don't need to go into much detail here)
- use a separate "cluster network"[2] (we raise this requirement in the current
  docs, but without explanation and context for the MetroDR use case)
- explain the "client network", "public network" and "cluster network" in a
  MetroDR configuration (e.g. the "public network" won't be public in the sense
  of global public internet access, the client network is not applicable (if I
  recall it right), ...)
- which nodes (from the whole MetroDR perspective) need to have direct network
  access to the arbiter node
- mistakes to avoid when configuring routing/firewall between the sites
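To make the public/cluster network split above concrete, the docs could include
a short example along these lines. This is only a sketch: the subnets are
placeholders, and whether the cluster network should span the VPN at all is
exactly the kind of guidance the requested doc update should settle:

```shell
# Example only: placeholder subnets for a stretch cluster with a cloud arbiter.
# Site A and site B carry OSD replication traffic on a dedicated cluster
# network; the arbiter MON only needs to be reachable on the public network
# (MON ports 3300/v2 and 6789/v1), e.g. over a site-to-site VPN, never via a
# public IP.
ceph config set global public_network "10.0.1.0/24,10.0.2.0/24,10.0.3.0/24"
ceph config set global cluster_network "192.168.1.0/24,192.168.2.0/24"
```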

[1] https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5/html/installation_guide/red-hat-ceph-storage-considerations-and-recommendations
[2] https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5/html-single/data_security_and_hardening_guide/index#security-zones-sec

Document URL
============

https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.10/html/configuring_openshift_data_foundation_for_metro-dr_with_advanced_cluster_management/requirements-for-deploying-rhcs-stretch-cluster_rhodf

Chapter/Section Number and Title
================================

Chapter 3. Requirements for deploying Red Hat Ceph Storage stretch cluster with arbiter

Product Version
===============

ODF 4.10

Environment Details
===================

MetroDR with ACM: deployment with stretched RHCS cluster across 2 data centers
and one MON hosted in a cloud

Any other versions of this document that also needs this update
===============================================================

This text was introduced in 4.10, and should be fixed at least in 4.11.

Additional information
======================

I was discussing this with Daniel Parkes, who agrees that this kind of
information is missing from the docs, and he can provide further details and
guidance.

Comment 6 Martin Bukatovic 2022-07-08 13:38:38 UTC
Looking at the current draft from stage (commit ef84b6f):

I don't see any update in "3.3. Network configuration requirements" section.

Comment 22 Venkat Kolli 2023-02-15 16:22:13 UTC
It's OK to move the cloud arbiter testing to 4.13, as long as we have confirmed the 100 ms latency test in 4.12.

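As a side note on the latency test mentioned in comment 22, a pre-deployment
sanity check of the RTT to the arbiter could be as simple as the sketch below;
`arbiter.example.com` is a placeholder hostname, and the check should be run
from each of the two data centers over the same path (e.g. the VPN) that the
MONs will actually use:

```shell
# Rough RTT check from each data center to the arbiter node (placeholder name).
# The documented MetroDR requirement is an RTT below 100 ms between sites.
ping -c 20 -q arbiter.example.com
```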

