Bug 1517507

Summary:	[BUG TRACKER] Active/standby amphoras: an amphora boot failure while creating an LB should put the LB into ERROR state
Product:	Red Hat OpenStack	Reporter:	Alexander Stafeyev <astafeye>
Component:	openstack-octavia	Assignee:	Nir Magnezi <nmagnezi>
Status:	CLOSED CURRENTRELEASE	QA Contact:	Alexander Stafeyev <astafeye>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	12.0 (Pike)	CC:	amuller, aschultz, astafeye, cgoncalves, ihrachys, jlibosva, jschluet, lpeer, majopela, twilson
Target Milestone:	---	Keywords:	TestOnly, Triaged, ZStream
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2019-07-24 03:16:13 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1698576

Description Alexander Stafeyev 2017-11-26 10:15:54 UTC

When Octavia is configured with the "active/standby" topology and we create an LB, at least 2 amphoras are booted.
If 1 of those 2 amphoras fails to boot, we should have info/error messages in the Octavia logs, and a message should be printed that alerts us to the amphora failure.

Storyboard - https://storyboard.openstack.org/#!/story/2001315


* Octavia upstream is not managed in Launchpad but in StoryBoard

Comment 2 Nir Magnezi 2017-12-05 10:46:47 UTC

Hi Alex,

Currently (at least until we implement flavors), the loadbalancer topology is a system-wide configuration. Moreover, that configuration is not exposed to the end user in any way.

When a user creates a loadbalancer, he has no notion of what is happening behind the scenes. That user can only get an indication of whether or not his loadbalancer is operational by looking at the operating_status and provisioning_status.

It is important to take the above into account since:
1. The user can't actually see the amphoras, those reside on the admin tenant.
2. The user normally does not have access to the deployment logs.
3. The loadbalancer creation is an asynchronous process, thus we cannot block and wait for the outcome in order to print an error.
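
As a minimal sketch of what point 3 means for a client: since creation is asynchronous, the caller can only poll provisioning_status until it leaves the pending state. The function name and the fetch_status callable below are hypothetical, not part of Octavia or any client library; fetch_status stands in for whatever call returns the load balancer's current provisioning_status.

```python
import time
from typing import Callable

# Values of provisioning_status that end the wait in this sketch.
TERMINAL_STATUSES = {"ACTIVE", "ERROR"}


def wait_for_provisioning(
    fetch_status: Callable[[], str],
    timeout: float = 300.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until provisioning_status is terminal or the timeout expires.

    fetch_status is a hypothetical callable that returns the current
    provisioning_status string of the load balancer being created.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status} after {timeout} seconds")
        sleep(interval)
```

The sleep parameter is injectable only so the loop can be exercised without real delays.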

Additionally, in the scenario you mentioned, in which there is no capacity to boot two amphoras, you expect a warning. I beg to differ. I would expect that at the end of the process the loadbalancer creation will fail and will get into an ERROR state (rather than leave it working with a single amphora).
The idea is that the OpenStack deployment simply does not have enough resources to properly create the loadbalancer and we cannot guarantee the SLA for it. A similar concept is when you try to boot an instance with a large flavor, and you simply don't have the capacity for it.
This is btw, in contrast to a highly available loadbalancer that was successfully created with two amphoras, and somewhere along the way one of the amphora instances died.
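
To illustrate the expectation above (this is only a sketch of the proposed policy, not Octavia's actual code; the function name and its input are hypothetical): at creation time the outcome is all-or-nothing, so a single failed amphora boot leaves the loadbalancer in ERROR rather than running on one amphora.

```python
from typing import Sequence


def status_after_create(boot_results: Sequence[bool]) -> str:
    """Return the provisioning_status a newly created LB should end in.

    boot_results is a hypothetical input holding one boolean per requested
    amphora (True means the amphora booted successfully).
    """
    # Every requested amphora must have booted; a partial set is a failure.
    if boot_results and all(boot_results):
        return "ACTIVE"
    return "ERROR"
```

An amphora that dies after a successful creation is a different case and is deliberately not covered by this check.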

P.S.
I do think we warn the operator when we fail to boot active/standby amphoras: https://github.com/openstack/octavia/blob/7bf8804177d3b7a9a4384c2b6d349228ecdced23/octavia/controller/worker/tasks/compute_tasks.py#L228-L232

Thoughts?

Comment 3 Alexander Stafeyev 2017-12-10 08:57:55 UTC

(In reply to Nir Magnezi from comment #2)
> Hi Alex,
> 
> Currently (at least until we implement flavors), the loadbalancer topology
> is a system-wide configuration. Moreover, that configuration is not exposed
> to the end user in any way.
> 
> When a user creates a loadbalancer, he has no notion of what is happening
> behind the scenes. That user can only get an indication of whether or not
> his loadbalancer is operational by looking at the operating_status and
> provisioning_status.
> 
> It is important to take the above into account since:
> 1. The user can't actually see the amphoras, those reside on the admin
> tenant.
> 2. The user normally does not have access to the deployment logs.
> 3. The loadbalancer creation is an asynchronous process, thus we cannot
> block and wait for the outcome in order to print an error.
> 
> Additionally, in the scenario you mentioned, in which there is no capacity
> to boot two amphoras, you expect a warning. I beg to differ. I would expect
> that at the end of the process the loadbalancer creation will fail and will
> get into an ERROR state (rather than leave it working with a single amphora).
> The idea is that the OpenStack deployment simply does not have enough
> resources to properly create the loadbalancer and we cannot guarantee the
> SLA for it. A similar concept is when you try to boot an instance with a
> large flavor, and you simply don't have the capacity for it.
> This is btw, in contrast to a highly available loadbalancer that was
> successfully created with two amphoras, and somewhere along the way one of
> the amphora instances died.
> 
> P.S.
> I do think we warn the operator when we fail to boot active/standby amphoras:
> https://github.com/openstack/octavia/blob/7bf8804177d3b7a9a4384c2b6d349228ecdced23/octavia/controller/worker/tasks/compute_tasks.py#L228-L232
> 
> Thoughts?

Hi Nir, 
I agree with your logic. If the user expects HA and does not get it, it would be better to put the LB in ERROR state, as you mentioned.
I will edit the topic.

Comment 4 Nir Magnezi 2017-12-25 10:01:26 UTC

Since there is no development currently needed here, I'm setting this as TestOnly.

Comment 6 Nir Magnezi 2017-12-25 10:58:40 UTC

Since this is TestOnly, moving to POST.

Comment 11 Nir Magnezi 2018-08-30 20:30:48 UTC

Alex,

This is a TestOnly bug.
Are you planning to test it as a part of OSP13z or OSP14?
Del-Rel wants to know when this can be moved to a Modified state.

Thanks,
Nir

Comment 12 Alexander Stafeyev 2018-09-02 08:08:06 UTC

(In reply to Nir Magnezi from comment #11)
> Alex,
> 
> This is a TestOnly bug.
> Are you planning to test it as a part of OSP13z or OSP14?
> Del-Rel wants to know when this can be moved to a Modified state.
> 
> Thanks,
> Nir

Hi Nir, 
I will test it in 14. Thanks