Bug 1877689
Summary: | [Octavia][16.1] Amphora Log Offloading, logs are gone if controller-0 stops | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Omer Schwartz <oschwart> |
Component: | openstack-octavia | Assignee: | Nate Johnston <njohnston> |
Status: | CLOSED NOTABUG | QA Contact: | Omer Schwartz <oschwart> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 16.1 (Train) | CC: | ihrachys, lpeer, majopela, michjohn, scohen, tfreger |
Target Milestone: | --- | Keywords: | UserExperience |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-09-13 07:32:55 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1623977 |
Description
Omer Schwartz
2020-09-10 08:04:10 UTC
Yes, this is expected. The rsyslog infrastructure and containers are set up for the UDP rsyslog protocol. This feature was implemented for low overhead and high volume (a single load balancer can produce tens of thousands of messages per second with tenant traffic logging enabled). Because the important issues that occur inside the amphora are already logged on the controllers, this was set up as a lowest-overhead, best-effort implementation. It will queue messages until the target server is available again, and in some situations it will eventually switch to one of the secondary servers.

There are many levels of reliability for logging. As you go up this chain, you add overhead and increase the amount of system resources (CPU, RAM, and disk) required.

Level 1, as implemented: UDP transport and minimal queuing. Lowest CPU, RAM, and disk space requirements.

Level 2: switching the transport to TCP. Octavia supports this, but TripleO is not set up for it. This increases the CPU and RAM overhead on both the amphora and the controllers, and it can still drop messages.

Level 3: full bidirectional confirmation. This requires switching to RELP. It requires significant queuing resources on the amphora and significantly increases the RAM and CPU requirements on the controller.

Since these logs were considered "nice to have", we implemented level 1 reliability. This meets the criteria requested in BZ 1623977 with little impact to the cloud. If there is a need for a higher level of reliability for these logs, we can move up levels, but that will require work and may be best entered as an additional RFE.

Makes sense to me, thanks for the information, Michael. I am closing this bug.
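For anyone reading this later, the practical difference between the UDP (level 1) and TCP (level 2) transports discussed in this bug can be sketched with plain Python sockets. This is only an illustration under stated assumptions, not how rsyslog or the amphora is implemented: the helper names (`frame`, `send_udp`, `send_tcp`, `unused_port`) are hypothetical, the `local0`/`info` priority is an arbitrary choice, and the "stopped controller" is simulated by a local port with no listener.

```python
import socket

# Syslog priority = facility * 8 + severity; local0 (16) and info (6) are
# arbitrary choices for this illustration.
PRI_LOCAL0_INFO = 16 * 8 + 6


def frame(message: str) -> bytes:
    """Prefix the message with a syslog-style <PRI> header."""
    return f"<{PRI_LOCAL0_INFO}>{message}".encode("utf-8")


def send_udp(message: str, host: str, port: int) -> int:
    """Fire-and-forget: the sender is not told whether anyone received it."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(frame(message), (host, port))


def send_tcp(message: str, host: str, port: int, timeout: float = 1.0) -> int:
    """Connection-oriented: raises OSError if the target is not listening."""
    payload = frame(message) + b"\n"
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
    return len(payload)


def unused_port(host: str = "127.0.0.1") -> int:
    """Return a port that was free a moment ago (stands in for a stopped server)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind((host, 0))
        return probe.getsockname()[1]


if __name__ == "__main__":
    port = unused_port()
    # UDP: the send call returns normally even though nothing is listening,
    # so the message is silently lost.
    print("udp bytes handed to the kernel:", send_udp("amphora log line", "127.0.0.1", port))
    # TCP: the sender at least learns that the target is down.
    try:
        send_tcp("amphora log line", "127.0.0.1", port)
    except OSError as exc:
        print("tcp send failed:", exc)
```

With UDP the sender never notices that the target is gone, which matches the behavior reported here when controller-0 stops. With TCP the failure is visible to the sender, but as noted above messages can still be dropped unless a confirming protocol such as RELP is used.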