Bug 1368271

Summary: RFE: Satellite 6 Intercontinental HA/Resilience (stretch clustered DC and multiple geos)
Product: Red Hat Satellite
Reporter: Calvin Hartwell <chartwel>
Component: Docs Architecture Guide
Assignee: satellite-doc-list
Status: CLOSED WONTFIX
QA Contact: satellite-doc-list
Severity: low
Docs Contact:
Priority: low
Version: 6.1.9
CC: ahumbe, rheron
Target Milestone: Unspecified
Keywords: FutureFeature
Target Release: Unused
Hardware: x86_64
OS: All
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-02-12 07:45:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1425213

Description Calvin Hartwell 2016-08-18 22:44:05 UTC
Description of problem:

The current HA reference architecture for Satellite 6 is good but doesn't cover every customer scenario I've encountered. The reference architecture assumes that shared storage (for the Satellite servers), shared networks and load balancers are available.

- Consider a large global customer scenario: 

Customer Kittens Inc. has datacenters not only in one country but in several countries across the world. For example, assume the Satellite server is located in Northern France, and there are also Capsules in Southern France, the USA, Germany and the UK.

If the Satellite server dies, business-critical functions such as patching and provisioning stop, and an outage of several hours or days would be unacceptable. When the Satellite server is down, Foreman (and the related services on the Satellite server) is unavailable, so the Capsules lose some core functionality.

One potential fix could be to have a Satellite server in every country. However, inter-satellite sync does not currently synchronise everything across multiple Satellite servers, only repositories, so constructs such as host collections, host groups and activation keys are not synchronised.
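To make the gap concrete, here is a minimal Python sketch that compares the non-repository objects held by two Satellite servers and reports what a per-country standby would be missing. It assumes each server's inventory has already been exported into plain dictionaries; the function `missing_on_standby` and the inventory layout are hypothetical and do not correspond to any Satellite API or to how inter-satellite sync works internally.

```python
from typing import Dict, Set

# Hypothetical export format: object type -> set of object names.
# This is NOT a Satellite API structure; it exists only for illustration.
Inventory = Dict[str, Set[str]]

# Object types that, per this RFE, inter-satellite sync does not carry over.
UNSYNCED_TYPES = ("host_collections", "host_groups", "activation_keys")


def missing_on_standby(primary: Inventory, standby: Inventory) -> Dict[str, Set[str]]:
    """Return, per object type, the names present on primary but absent on standby."""
    gaps: Dict[str, Set[str]] = {}
    for obj_type in UNSYNCED_TYPES:
        missing = primary.get(obj_type, set()) - standby.get(obj_type, set())
        if missing:  # only report types that actually have drift
            gaps[obj_type] = missing
    return gaps


if __name__ == "__main__":
    france = {
        "host_collections": {"web", "db"},
        "host_groups": {"rhel7-base"},
        "activation_keys": {"ak-prod", "ak-dev"},
    }
    uk = {"host_collections": {"web"}, "activation_keys": {"ak-prod"}}
    print(missing_on_standby(france, uk))
```

Anything this kind of diff reports would have to be recreated by hand (or by custom tooling) on the standby before it could take over from the primary.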

There is also no way to automatically register Capsules to another Satellite server if the master Satellite server fails. This could be needed, for example, if the Satellite server in Northern France died while the Capsule in Southern France was still active and just needed a new master.

- Consider a smaller customer that doesn't have shared storage or networks across its two datacenters:

Customer Cute-Cats Corp has multiple datacenters, each complete with a Satellite 6 server, but no shared networks or storage. There are no Capsules, and the customer's client servers can register to either Satellite server. For most applications you could simply put a load balancer (F5, Riverbed, etc.) in front of the traffic, or fail over through DNS, to provide resiliency. Because of the restrictive client setup, however, this is not possible: a client machine can only be registered to one Satellite server at a time.

One potential fix could be to make the client smarter, so that it can unregister from one Satellite server and register to another depending on whether a Satellite 6 server is down. The client would need more trust (though this would be done as root) so that it could change its certificates (goferd, rhsm, Puppet, etc.) based on which master it has chosen.
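The client-side logic described above could look roughly like the following Python sketch. It is a hypothetical illustration only: the server names, the injected `is_healthy` check and the step strings are assumptions, not real subscription-manager, goferd or Puppet commands. The client stays on its current master while it is healthy; otherwise it picks the first healthy server in priority order and lists the re-registration steps it would have to run as root.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class FailoverPlan:
    target: Optional[str]  # server to use; None if nothing is reachable
    steps: List[str]       # ordered actions the client would run as root


def plan_failover(current: str, candidates: List[str],
                  is_healthy: Callable[[str], bool]) -> FailoverPlan:
    """Choose a master in priority order and list the (hypothetical) steps needed."""
    if is_healthy(current):
        return FailoverPlan(target=current, steps=[])  # no change required
    for server in candidates:
        if server != current and is_healthy(server):
            return FailoverPlan(
                target=server,
                steps=[
                    f"unregister from {current}",
                    f"register to {server}",
                    "replace rhsm certificates",
                    "reconfigure goferd",
                    "replace puppet certificates",
                ],
            )
    return FailoverPlan(target=None, steps=[])  # every candidate is down
```

For example, with `current="sat-fr-north"` down and `candidates=["sat-fr-north", "sat-uk", "sat-us"]`, the plan targets `sat-uk`. Keeping the health check as an injected callable leaves open how "down" is detected (HTTP probe, DNS, remote command), which is the part a real implementation would have to define carefully.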

Cloud customers could also choose to register directly to the Red Hat CDN if their local Satellite 6 server or Capsule dies. This whole approach (if configured correctly) could even be driven through remote command execution: the Satellite 6 slave server could fail over the clients when it detects that the master server has failed.
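The detection step on the slave server could be as simple as a heartbeat counter. The following Python sketch is a hypothetical illustration: the `MasterMonitor` class, the threshold value and the idea that a `True` result would kick off remote-execution failover of the clients are all assumptions, not existing Satellite behaviour.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MasterMonitor:
    """Hypothetical heartbeat-based failure detector run by a standby Satellite."""
    threshold: int = 3         # consecutive missed heartbeats before declaring failure
    missed: int = 0
    failed_over: bool = False  # ensures failover is triggered only once
    events: List[str] = field(default_factory=list)

    def record(self, heartbeat_ok: bool) -> bool:
        """Record one heartbeat result; return True exactly once, when failover should start."""
        if heartbeat_ok:
            self.missed = 0    # any successful heartbeat resets the counter
            return False
        self.missed += 1
        if self.missed >= self.threshold and not self.failed_over:
            self.failed_over = True
            self.events.append(f"master declared failed after {self.missed} missed heartbeats")
            return True
        return False
```

Requiring several consecutive misses, rather than reacting to a single one, is meant to stop a transient WAN blip between geos from moving every client; the `failed_over` flag keeps the slave from re-running the client failover on every later missed heartbeat.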

The issue with this approach, however, relates to subscriptions: the customer would effectively need twice their current subscription count so that both Satellite 6 servers could handle a failover.

Version-Release number of selected component (if applicable):

6.1.X
6.2.X

How reproducible:

Easy to reproduce. 

Steps to Reproduce:
1. Read and implement Sat 6 HA document 

Actual results:

Sat 6 HA architecture is limited in design to a single DC, or to multiple colocated DCs with good shared networks and shared storage.

Expected results:

The customer should have a resilient Satellite 6 infrastructure that can scale across datacenters and across geographies.

Additional info:

Comment 3 Ashish Humbe 2019-02-12 07:45:53 UTC
There are no plans to provide an HA solution for Satellite 6 in the near future, so customers are advised to virtualize the Satellite and Capsule servers and use the hypervisor tools to provide high availability for the virtual machines hosting the Satellite or Capsule.

For more details refer: https://access.redhat.com/solutions/3402361