Bug 1936529

Summary: [RFE] Mechanism to fence service API endpoints off
Product: Red Hat OpenStack
Component: openstack-tripleo
Version: 17.0 (Wallaby)
Status: NEW
Severity: medium
Priority: medium
Type: Bug
Keywords: FutureFeature, RFE, Triaged
Hardware: Unspecified
OS: Unspecified
Reporter: Carlos Goncalves <cgoncalves>
Assignee: James Slagle <jslagle>
QA Contact: Joe H. Rahme <jhakimra>
CC: akaiser, bdobreli, cjeanner, dalvarez, enothen, jpretori, mburns, michjohn, sathlang, tkajinam, tmicheli

Description Carlos Goncalves 2021-03-08 17:09:33 UTC
Cloud operators may need to fence all or a subset of service API endpoints off from their users.

  * Neutron ML2/OVS to ML2/OVN migration requires Neutron API fencing
    Race conditions can occur: users accessing the Neutron API may create or modify objects while the migration tool is recreating those resources in OVN. The resulting mismatches make the migration fail (see BZ #1936018).

  * Updates of Octavia-enabled deployments from OSP 13 z12 or older to OSP 13 z13 or newer
    Octavia was upgraded to the Train version in OSP 13 z13. The upgrade requires a database schema update, so while it runs, users must not be able to perform mutating operations (create, update, delete) on Octavia load-balancing resources. A note was added to the documentation. Even so, we have seen cases where the cloud operator did not fence the Octavia API, either because they were unaware of the note or because they expected Director to handle it seamlessly (see BZ #1927169).

  * Faulty service instances that are not detected by the service health checks (tripleo-common) remain in the HAProxy server list, so traffic is still forwarded to them.

User stories:
  * As a cloud operator, I want to fence all or a subset of service API endpoints off so API users have no access to them.
  * As a cloud operator, I want to fence all or a subset of service API endpoints off so API users have limited access (e.g. read-only).

One possible fencing mechanism is to adjust the HAProxy configuration.
For example, taking servers out of rotation (user story #1) or allowing requests to HTTP 2xx (user story #2).
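Taking servers out of rotation (user story #1) can also be done on a running HAProxy through its Runtime API, using the `set server <backend>/<server> state maint` command. Once every server of a backend is in maintenance, HAProxy answers requests for that backend with 503.

Below is a minimal Python sketch, assuming the Runtime API is exposed on a UNIX stats socket with admin level. The socket path, backend name, and server names are hypothetical placeholders, not the values TripleO uses. State changes made this way live only in the running process.

```python
import socket

# States accepted by the HAProxy Runtime API "set server <backend>/<server> state <state>".
VALID_STATES = {"ready", "drain", "maint"}


def server_state_commands(backend: str, servers: list[str], state: str) -> list[str]:
    """Build one Runtime API command per server to change its administrative state."""
    if state not in VALID_STATES:
        raise ValueError(f"unsupported server state: {state!r}")
    return [f"set server {backend}/{server} state {state}" for server in servers]


def send_runtime_command(socket_path: str, command: str) -> str:
    """Send a single command over the HAProxy stats socket and return the raw reply.

    In non-interactive mode HAProxy answers one command and then closes the
    connection, so reading until EOF collects the whole reply.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(command.encode() + b"\n")
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks).decode()
```

For example, `server_state_commands("neutron", ["controller-0", "controller-1"], "maint")` (hypothetical names) yields two commands to pass to `send_runtime_command`. Running the same call with `"ready"` puts the servers back into rotation.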

Comment 1 Carlos Goncalves 2021-03-09 09:39:41 UTC
The last sentence in comment #0 should read instead as follows:
  "For example, taking servers out of rotation (user story #1) or allowing GET requests to the API endpoints (user story #2)."
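Both user stories can be expressed with HAProxy's `http-request deny` rule. For read-only access, the rule uses the predefined `METH_GET` ACL, which matches both GET and HEAD requests.

Below is a minimal Python sketch, assuming the rules are injected into each API `listen` section. The mode names, the `render_section` helper, and the section contents are hypothetical and not part of TripleO. Returning 503 rather than 403 is a design choice here, meant to signal a temporary condition to API clients.

```python
# HAProxy's predefined METH_GET ACL matches both GET and HEAD.
READ_ONLY_METHODS = {"GET", "HEAD"}


def fencing_directives(mode: str) -> list[str]:
    """Return the http-request rules to add to an API listen section."""
    if mode == "open":
        return []
    if mode == "full":
        # User story #1: reject every request.
        return ["http-request deny deny_status 503"]
    if mode == "read-only":
        # User story #2: reject everything except GET/HEAD.
        return ["http-request deny deny_status 503 if !METH_GET"]
    raise ValueError(f"unknown fencing mode: {mode!r}")


def is_allowed(mode: str, method: str) -> bool:
    """Mirror the rules above in Python, e.g. to check expected behaviour in tests."""
    if mode == "open":
        return True
    if mode == "full":
        return False
    if mode == "read-only":
        return method.upper() in READ_ONLY_METHODS
    raise ValueError(f"unknown fencing mode: {mode!r}")


def render_section(name: str, body: list[str], mode: str) -> str:
    """Hypothetical helper: prepend the fencing rules to a listen section body."""
    lines = [f"listen {name}"]
    lines += [f"  {rule}" for rule in fencing_directives(mode)]
    lines += [f"  {line}" for line in body]
    return "\n".join(lines)
```

Keeping the rule in the `listen` section means the backend servers stay in rotation for GET requests, so monitoring that polls the API keeps working.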

Comment 2 Daniel Alvarez Sanchez 2021-03-09 10:01:25 UTC
Echoing the need for this RFE from the Neutron side.

I like the HAProxy way for this fencing mechanism because:

- I believe it's faster to restart haproxy than all the services to honor the fencing.
- Allowing GET operations is probably going to cope better with any potential monitoring mechanisms in place.
- Seems like a global solution for all the endpoints.


+1 from me :)

Another solution we briefly discussed yesterday is to use policies, but I think that is less flexible than using HAProxy.

Comment 3 Sofer Athlan-Guyot 2021-03-15 17:00:46 UTC
Hi,

+1 on HAProxy, but note that any change made to the HAProxy configuration on a live environment will be reset to the configured default during an update.

Just let me know if some more information is needed around that use case.

Thanks,