Bug 1877689

Summary:	[Octavia][16.1] Amphora Log Offloading, logs are gone if controller-0 stops
Product:	Red Hat OpenStack	Reporter:	Omer Schwartz <oschwart>
Component:	openstack-octavia	Assignee:	Nate Johnston <njohnston>
Status:	CLOSED NOTABUG	QA Contact:	Omer Schwartz <oschwart>
Severity:	medium	Docs Contact:
Priority:	unspecified
Version:	16.1 (Train)	CC:	ihrachys, lpeer, majopela, michjohn, scohen, tfreger
Target Milestone:	---	Keywords:	UserExperience
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2020-09-13 07:32:55 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1623977

Description Omer Schwartz 2020-09-10 08:04:10 UTC

Description of problem:
If Controller-0 is stopped, the administrative and tenant flow logs are not offloaded; they are simply gone (they do not appear on any other controller).

Version-Release number of selected component (if applicable):
(overcloud) [stack@undercloud-0 ~]$ cat /var/lib/rhos-release/latest-installed
16.1  -p RHOS-16.1-RHEL-8-20200813.n.0

How reproducible:
100%; I reproduced it on my environment.

Steps to Reproduce:
1. Deploy OSP 16.1 in HA
2. Set OctaviaLogOffload: true in /home/stack/virt/extra_templates.yaml.
3. Due to bug https://bugzilla.redhat.com/show_bug.cgi?id=1856835, copy the templates folder as follows:
sudo cp -r /usr/share/ansible/roles/octavia_controller_post_config/templates /usr/share/ansible/roles/octavia-controller-post-config/templates
4. Run overcloud_deploy.sh.
5. Stop controller-0.
6. Create a LB.
7. Check the other controllers (controller-1 and controller-2): there is no octavia-amphora.log file (the file that holds the offloaded logs).

Actual results:
No octavia-amphora.log file in Controller-1 or Controller-2.

Expected results:
We expect to see an octavia-amphora.log file with the LB creation details on one of the other controllers.

Additional info:

Comment 1 Michael Johnson 2020-09-11 23:36:35 UTC

Yes, this is expected.

The rsyslog infrastructure and containers are set up to use the UDP rsyslog protocol.

This feature was implemented to be low overhead and high volume (a single load balancer can produce tens of thousands of messages per second with tenant traffic logging enabled). Because the important issues that occur inside the amphora are already logged on the controllers, this was set up as a lowest-overhead, best-effort implementation.

The amphora queues messages until the target server is available again, and in some situations it eventually switches to one of the secondary servers.
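To make that best-effort behavior concrete, here is a toy Python model of a forwarder that queues while its active target is down and fails over after a number of failed attempts. It is not rsyslog's or Octavia's implementation: the class name, the queue size, and the retry threshold are hypothetical illustration values. It shows the two outcomes described above: messages either wait in the queue (and never reach controller-1 or controller-2 if no failover happens), or, with a small queue, the oldest ones are discarded before failover completes.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class BestEffortForwarder:
    """Toy model (hypothetical, not rsyslog): bounded queue plus failover."""
    targets: list                      # ordered targets, primary first
    queue_size: int = 5                # hypothetical bounded queue length
    retries_before_failover: int = 3   # hypothetical failover threshold
    queue: deque = field(default_factory=deque)
    active: int = 0                    # index of the current target
    failures: int = 0                  # consecutive failed delivery attempts
    dropped: int = 0                   # messages discarded because the queue was full

    def log(self, msg: str, up: set) -> list:
        """Enqueue msg, then try to deliver. `up` is the set of reachable targets."""
        if len(self.queue) >= self.queue_size:
            self.queue.popleft()       # best effort: discard the oldest message
            self.dropped += 1
        self.queue.append(msg)
        return self._flush(up)

    def _flush(self, up: set) -> list:
        target = self.targets[self.active]
        delivered = []
        if target in up:
            self.failures = 0
            while self.queue:          # drain the backlog to the reachable target
                delivered.append((target, self.queue.popleft()))
        else:
            self.failures += 1
            if self.failures >= self.retries_before_failover:
                # Switch to the next target; delivery is retried on the next message.
                self.active = (self.active + 1) % len(self.targets)
                self.failures = 0
        return delivered


# controller-0 is stopped; the queue is too small to survive until failover.
fw = BestEffortForwarder(["controller-0", "controller-1"], queue_size=2)
out = []
for i in range(4):
    out += fw.log(f"m{i}", {"controller-1"})
print(out, fw.dropped)
```

With `queue_size=2`, `m0` and `m1` are discarded while the forwarder is still retrying controller-0, so only `m2` and `m3` reach controller-1. A larger queue in this model would deliver everything once failover happens, at the cost of holding more messages in memory.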

There are many levels of reliability for logging. As you go up this chain you add overhead and increase the amount of system resources (CPU, RAM, and disk) required.

Level 1, as implemented, uses UDP transport and minimal queuing. It has the lowest CPU, RAM, and disk space requirements.
Level 2 would switch the transport to TCP. Octavia supports this, but TripleO is not set up for it. This increases the CPU and RAM overhead on both the amphora and the controllers, and it can still drop messages.
Level 3 is full bidirectional confirmation, which requires switching to RELP. It requires significant queuing resources on the amphora and significantly increases the RAM and CPU requirements on the controller.
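As a generic illustration of Level 1 (not the amphora's actual rsyslog configuration), Python's standard-library `logging.handlers.SysLogHandler` sends each record as a single UDP datagram when `socktype` is `socket.SOCK_DGRAM`; the sender does not wait for any acknowledgement. In this sketch a localhost socket stands in for a controller, and the function and logger names are hypothetical.

```python
import logging
import logging.handlers
import socket


def send_udp_syslog(message: str) -> bytes:
    """Send one syslog record over UDP to a local stand-in 'controller'."""
    # Stand-in controller: a UDP socket bound to an ephemeral localhost port.
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))
    receiver.settimeout(2.0)
    host, port = receiver.getsockname()

    logger = logging.getLogger("amphora-demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    # SOCK_DGRAM selects UDP (Level 1); SOCK_STREAM would select TCP (Level 2).
    handler = logging.handlers.SysLogHandler(address=(host, port),
                                             socktype=socket.SOCK_DGRAM)
    logger.addHandler(handler)
    try:
        # Fire and forget: emit() hands the datagram to the kernel and returns.
        logger.info(message)
        data, _ = receiver.recvfrom(4096)
    finally:
        logger.removeHandler(handler)
        handler.close()
        receiver.close()
    return data


print(send_udp_syslog("LB created"))
```

The received payload starts with a syslog priority prefix such as `<14>` (facility user, severity info). Nothing in the UDP path tells the sender whether the datagram arrived; if the listener were down, the record would simply be lost, which is the trade-off Level 1 accepts. RELP (Level 3) has no counterpart in the Python standard library.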

Since these logs were considered "nice to have", we implemented level 1 reliability. This meets the criteria requested in BZ 1623977 with little impact to the cloud.

If there is a need for a higher level of reliability for these logs, we can move up levels, but that will require work and may be best entered as an additional RFE.

Comment 2 Omer Schwartz 2020-09-13 07:17:14 UTC

Makes sense to me, thanks for the information Michael, I am closing this bug.