Bug 1877689 - [Octavia][16.1] Amphora Log Offloading, logs are gone if controller-0 stops
Summary: [Octavia][16.1] Amphora Log Offloading, logs are gone if controller-0 stops
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-octavia
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Nate Johnston
QA Contact: Omer Schwartz
URL:
Whiteboard:
Depends On:
Blocks: 1623977
Reported: 2020-09-10 08:04 UTC by Omer Schwartz
Modified: 2020-09-13 07:32 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-13 07:32:55 UTC
Target Upstream Version:
Embargoed:



Description Omer Schwartz 2020-09-10 08:04:10 UTC
Description of problem:
If Controller-0 is stopped, the administrative/tenant flow logs are not offloaded; they are simply gone (they do not appear on any other controller).

Version-Release number of selected component (if applicable):
(overcloud) [stack@undercloud-0 ~]$ cat /var/lib/rhos-release/latest-installed
16.1  -p RHOS-16.1-RHEL-8-20200813.n.0

How reproducible:
100%. I reproduced it in my environment.

Steps to Reproduce:
1. Deploy OSP 16.1 in HA
2. Set OctaviaLogOffload: true in /home/stack/virt/extra_templates.yaml
3. Due to bug https://bugzilla.redhat.com/show_bug.cgi?id=1856835, copy the templates folder in the following way:
sudo cp -r /usr/share/ansible/roles/octavia_controller_post_config/templates /usr/share/ansible/roles/octavia-controller-post-config/templates
4. Run overcloud_deploy.sh.
5. Stop controller-0.
6. Create a LB.
7. Check the other controllers (1 and 2): neither has an octavia-amphora.log file (which would contain the offloaded logs).

Actual results:
No octavia-amphora.log file in Controller-1 or Controller-2.

Expected results:
We expect to see an octavia-amphora.log file with the LB creation details on at least one of the other controllers.

Additional info:

Comment 1 Michael Johnson 2020-09-11 23:36:35 UTC
Yes, this is expected.

The rsyslog infrastructure/containers are set up for the UDP rsyslog protocol.

This feature was implemented to be low overhead and high volume (a single load balancer can produce tens of thousands of messages per second with tenant traffic logging enabled). As the important log issues that occur inside the amphora are already logged on the controllers, this was set up as a lowest-overhead, best-effort implementation.

It will queue messages until the target server is available again, and in some situations it will eventually switch to one of the secondary servers.
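
As a rough illustration of what "best effort with queuing" means in practice, here is a toy Python model. It is not rsyslog's actual queue implementation; the class name, the queue size of 3, and the drop-newest policy are all assumptions chosen only to show that messages emitted while the target is down can be lost once a small queue fills up.

```python
from collections import deque


class BestEffortForwarder:
    """Toy model of a best-effort log forwarder with a small bounded queue.

    Hypothetical illustration only; it does not reproduce rsyslog internals.
    """

    def __init__(self, max_queued=3):
        self.queue = deque()
        self.max_queued = max_queued  # assumed small queue ("minimal queuing")
        self.delivered = []
        self.dropped = 0
        self.target_up = True

    def flush(self):
        # Deliver everything that was queued while the target was down.
        while self.queue:
            self.delivered.append(self.queue.popleft())

    def emit(self, message):
        if self.target_up:
            self.flush()
            self.delivered.append(message)
        elif len(self.queue) < self.max_queued:
            self.queue.append(message)
        else:
            # Queue is full: the new message is lost (assumed drop policy).
            self.dropped += 1


fwd = BestEffortForwarder(max_queued=3)
fwd.emit("lb created")
fwd.target_up = False          # the log target (e.g. controller-0) stops
for i in range(5):
    fwd.emit(f"flow {i}")      # only the first three fit in the queue
fwd.target_up = True           # the target comes back
fwd.emit("lb deleted")
print(fwd.delivered, fwd.dropped)
```

In this model the messages that fit in the queue are delivered once the target returns, and the rest are counted as dropped. If the target never returns and no secondary takes over, nothing queued is ever delivered.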

There are many levels of reliability for logging. As you go up this chain you add overhead and increase the amount of system resources (CPU, RAM, and disk) required.

Level 1, as implemented, is UDP transport with minimal queuing. Lowest CPU, RAM, and disk space requirements.
Level 2 would be switching the transport to TCP. Octavia supports this, but TripleO is not set up for it. This increases the CPU and RAM overhead on both the amphora and the controllers. This can still drop messages.
Level 3 is full bidirectional confirmation. This requires switching to RELP. It requires significant queuing resources on the amphora and significantly increases the RAM and CPU requirements on the controller.
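
The difference between levels 1 and 2 can be seen at the socket level with a small Python sketch. It is only an illustration under assumptions: it talks to a closed port on 127.0.0.1 instead of a real rsyslog receiver, the helper names are hypothetical, and the `<14>` priority prefix is just an example syslog header. It shows that a UDP sender gets no indication that nobody is listening, while a TCP sender at least notices that the connection cannot be established.

```python
import socket

PAYLOAD = b"<14>amphora: test message"


def unused_local_port():
    # Bind to port 0 so the OS picks a free TCP port, then release it,
    # leaving a port on which nothing is listening.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def udp_send(port, payload=PAYLOAD):
    # Fire and forget: sendto() reports the bytes handed to the kernel,
    # not whether any receiver got them.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        return s.sendto(payload, ("127.0.0.1", port))


def tcp_send(port, payload=PAYLOAD + b"\n"):
    # TCP needs an established connection, so a missing receiver is
    # detected at connect time.
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=1) as s:
            s.sendall(payload)
        return True
    except OSError:
        return False


port = unused_local_port()
print("UDP bytes sent:", udp_send(port))
print("TCP delivered:", tcp_send(port))
```

A successful TCP send still does not prove the receiver processed the message, which is why level 2 can drop messages and level 3 needs an application-level acknowledgement such as RELP.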

Since these logs were considered "nice to have", we implemented level 1 reliability. This meets the criteria requested in BZ 1623977 with little impact to the cloud.

If there is a need for a higher level of reliability for these logs, we can move up levels, but that will require work and may be best entered as an additional RFE.

Comment 2 Omer Schwartz 2020-09-13 07:17:14 UTC
Makes sense to me. Thanks for the information, Michael. I am closing this bug.

