Bug 1618772 - CPU load skyrocketing due to API and Health Manager not being able to connect to the bus
Summary: CPU load skyrocketing due to API and Health Manager not being able to connect...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z3
Target Release: 13.0 (Queens)
Assignee: Carlos Goncalves
QA Contact: Alexander Stafeyev
URL:
Whiteboard:
Duplicates: 1631234
Depends On:
Blocks: 1624037
 
Reported: 2018-08-17 14:43 UTC by Carlos Goncalves
Modified: 2025-01-27 13:50 UTC
CC List: 14 users

Fixed In Version: openstack-tripleo-heat-templates-8.0.4-33.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1624037
Environment:
Last Closed: 2018-11-13 22:28:18 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad | 1787608 | 0 | None | None | None | 2018-08-17 15:37:10 UTC
OpenStack gerrit | 593083 | 0 | None | MERGED | Add OctaviaEventStreamerDriver and change default | 2021-01-18 08:53:51 UTC
OpenStack gerrit | 598240 | 0 | None | MERGED | Add OctaviaEventStreamerDriver and change default | 2021-01-18 08:53:51 UTC
OpenStack gerrit | 598241 | 0 | None | MERGED | Add OctaviaEventStreamerDriver and change default | 2021-01-18 08:53:52 UTC
Red Hat Bugzilla | 1607276 | 0 | high | CLOSED | All existing amphora instances are deleting when RabbitMQ is down | 2022-07-09 14:30:58 UTC
Red Hat Issue Tracker | OSP-3755 | 0 | None | None | None | 2021-12-10 17:16:32 UTC
Red Hat Product Errata | RHBA-2018:3587 | 0 | None | None | None | 2018-11-13 22:29:36 UTC

Internal Links: 1607276

Description Carlos Goncalves 2018-08-17 14:43:36 UTC
Description of problem:
The CPU load caused by the health manager skyrocketed (1-, 5- and 15-minute load averages: 794.80, 467.48, 491.34). The logs show connections to RabbitMQ being refused.
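A load average counts the tasks waiting to run (on Linux, also tasks in uninterruptible sleep), averaged over the window, so it only means something relative to the number of CPUs. The controller's CPU count is not stated in this report; the sketch below normalizes the reported figures against a hypothetical 32 CPUs, and shows the same check against the live host via the standard-library os.getloadavg().

```python
import os


def normalized_load(load_averages, cpu_count):
    """Divide each load average by the CPU count.

    Values well above 1.0 mean more tasks were waiting to run than
    there were CPUs to run them.
    """
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return tuple(round(load / cpu_count, 2) for load in load_averages)


if __name__ == "__main__":
    # Load averages reported in this bug (1, 5 and 15 minutes).
    reported = (794.80, 467.48, 491.34)
    # HYPOTHETICAL: the report does not state the controller's CPU count.
    print(normalized_load(reported, 32))
    # Same check against the current host, where supported (Unix only).
    if hasattr(os, "getloadavg"):
        print(normalized_load(os.getloadavg(), os.cpu_count() or 1))
```

Even with 32 CPUs, the 1-minute figure works out to roughly 25 waiting tasks per CPU.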


Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-8.0.4-16.el7ost.noarch
openstack-octavia-health-manager-2.0.1-6.d137eaagit.el7ost.noarch
openstack-octavia-common-2.0.1-6.d137eaagit.el7ost.noarch
openstack-octavia-api-2.0.1-6.d137eaagit.el7ost.noarch
openstack-octavia-housekeeping-2.0.1-6.d137eaagit.el7ost.noarch
openstack-octavia-worker-2.0.1-6.d137eaagit.el7ost.noarch


How reproducible:
100%

Steps to Reproduce:
1. Create load balancer
2. Stop RabbitMQ on the controller node that the Octavia services are connected to.
3. Observe the Octavia logs and processes on the controller nodes, and their CPU load.

Actual results:
Very high CPU load. Amphorae would probably start to fail over, as reported in rhbz#1607276.


2018-08-17 09:31:49.651 1810 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: error: [Errno 111] Connection refused
2018-08-17 09:31:49.613 1810 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: error: [Errno 111] Connection refused
2018-08-17 09:31:49.652 1810 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: error: [Errno 111] Connection refused
2018-08-17 09:31:49.653 1810 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: error: [Errno 111] Connection refused

==> /var/log/containers/octavia/api.log <==
2018-08-17 09:31:51.643 1 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: error: [Errno 111] Connection refused

==> /var/log/containers/octavia/health-manager.log <==
2018-08-17 09:31:49.568 1810 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: error: [Errno 111] Connection refused
2018-08-17 09:31:49.631 1810 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: error: [Errno 111] Connection refused
2018-08-17 09:31:49.658 1810 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: error: [Errno 111] Connection refused
2018-08-17 09:31:49.656 1810 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: error: [Errno 111] Connection refused
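To gauge how quickly the retries pile up, the warnings above can be counted per process and per second. This is a sketch that assumes the oslo.log line layout visible in these excerpts (date, time, PID, level, logger, request context, message). It matches the misspelled "heartbeart" exactly as it appears in the logged message.

```python
import re
from collections import Counter

# oslo.log layout seen in the excerpts above:
# "<date> <time>.<ms> <pid> <LEVEL> <logger> [<context>] <message>"
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2})\.\d+ "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<logger>\S+) \[[^\]]*\] (?P<msg>.*)$"
)


def count_refused_heartbeats(lines):
    """Count refused heartbeat warnings per (pid, second)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if not match:
            continue  # e.g. "==> /var/log/... <==" headers printed by tail
        msg = match.group("msg")
        # "heartbeart" is spelled as it appears in the logged message.
        if "heartbeart thread" in msg and "Connection refused" in msg:
            counts[(match.group("pid"), match.group("time"))] += 1
    return counts
```

For example, passing open("/var/log/containers/octavia/health-manager.log") to count_refused_heartbeats() on the health-manager excerpt above yields four refused heartbeats for PID 1810 within 09:31:49.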

[root@controller-0 heat-admin]# ps aux | grep octavia
42437 129463 0.0 0.2 367452 84596 ? Ss Aug14 0:01 /usr/bin/python2 /usr/bin/octavia-health-manager --config-file /usr/share/octavia/octavia-dist.conf --config-file /etc/octavia/octavia.conf --log-file /var/log/octavia/health-manager.log --config-dir /etc/octavia/conf.d/common --config-dir /etc/octavia/conf.d/octavia-health-manager
42437 129530 0.0 0.2 588828 81076 ? Sl Aug14 2:54 /usr/bin/python2 /usr/bin/octavia-health-manager --config-file /usr/share/octavia/octavia-dist.conf --config-file /etc/octavia/octavia.conf --log-file /var/log/octavia/health-manager.log --config-dir /etc/octavia/conf.d/common --config-dir /etc/octavia/conf.d/octavia-health-manager
42437 129531 0.4 0.2 371644 81528 ? S Aug14 18:47 /usr/bin/python2 /usr/bin/octavia-health-manager --config-file /usr/share/octavia/octavia-dist.conf --config-file /etc/octavia/octavia.conf --log-file /var/log/octavia/health-manager.log --config-dir /etc/octavia/conf.d/common --config-dir /etc/octavia/conf.d/octavia-health-manager
42437 130349 2.2 0.3 671604 123768 ? Ssl Aug14 90:21 /usr/bin/python2 /usr/bin/octavia-api --config-file /usr/share/octavia/octavia-dist.conf --config-file /etc/octavia/octavia.conf --log-file /var/log/octavia/api.log --config-dir /etc/octavia/conf.d/common --config-dir /etc/octavia/conf.d/octavia-api
42437 131439 1.4 0.3 595464 106908 ? Ssl Aug14 60:46 /usr/bin/python2 /usr/bin/octavia-housekeeping --config-file /usr/share/octavia/octavia-dist.conf --config-file /etc/octavia/octavia.conf --log-file /var/log/octavia/housekeeping.log --config-dir /etc/octavia/conf.d/common --config-dir /etc/octavia/conf.d/octavia-housekeeping
42437 131605 1.4 0.1 177180 38448 ? Ss Aug14 60:07 octavia-worker: master process [/usr/bin/octavia-worker --config-file /usr/share/octavia/octavia-dist.conf --config-file /etc/octavia/octavia.conf --log-file /var/log/octavia/worker.log --config-dir /etc/octavia/conf.d/common --config-dir /etc/octavia/conf.d/octavia-worker]
42437 131698 1.3 0.3 1290132 115248 ? Sl Aug14 54:32 octavia-worker: ConsumerService worker(0)
42437 631264 38.0 3.7 16265304 1218968 ? Sl Aug14 1524:16 /usr/bin/python2 /usr/bin/octavia-health-manager --config-file /usr/share/octavia/octavia-dist.conf --config-file /etc/octavia/octavia.conf --log-file /var/log/octavia/health-manager.log --config-dir /etc/octavia/conf.d/common --config-dir /etc/octavia/conf.d/octavia-health-manager
42437 631265 20.1 2.2 9169856 732844 ? Sl Aug14 807:07 /usr/bin/python2 /usr/bin/octavia-health-manager --config-file /usr/share/octavia/octavia-dist.conf --config-file /etc/octavia/octavia.conf --log-file /var/log/octavia/health-manager.log --config-dir /etc/octavia/conf.d/common --config-dir /etc/octavia/conf.d/octavia-health-manager
42437 631266 31.9 3.1 14336816 1037200 ? Sl Aug14 1280:07 /usr/bin/python2 /usr/bin/octavia-health-manager --config-file /usr/share/octavia/octavia-dist.conf --config-file /etc/octavia/octavia.conf --log-file /var/log/octavia/health-manager.log --config-dir /etc/octavia/conf.d/common --config-dir /etc/octavia/conf.d/octavia-health-manager
42437 631267 26.8 2.7 11699104 903332 ? Sl Aug14 1074:25 /usr/bin/python2 /usr/bin/octavia-health-manager --config-file /usr/share/octavia/octavia-dist.conf --config-file /etc/octavia/octavia.conf --log-file /var/log/octavia/health-manager.log --config-dir /etc/octavia/conf.d/common --config-dir /etc/octavia/conf.d/octavia-health-manager
42437 631268 46.4 4.2 18236484 1402736 ? Sl Aug14 1862:21 /usr/bin/python2 /usr/bin/octavia-health-manager --config-file /usr/share/octavia/octavia-dist.conf --config-file /etc/octavia/octavia.conf --log-file /var/log/octavia/health-manager.log --config-dir /etc/octavia/conf.d/common --config-dir /etc/octavia/conf.d/octavia-health-manager
42437 631269 11.8 2.0 8880656 668704 ? Sl Aug14 473:48 /usr/bin/python2 /usr/bin/octavia-health-manager --config-file /usr/share/octavia/octavia-dist.conf --config-file /etc/octavia/octavia.conf --log-file /var/log/octavia/health-manager.log --config-dir /etc/octavia/conf.d/common --config-dir /etc/octavia/conf.d/octavia-health-manager
42437 631270 11.7 1.9 8197496 637928 ? Sl Aug14 469:27 /usr/bin/python2 /usr/bin/octavia-health-manager --config-file /usr/share/octavia/octavia-dist.conf --config-file /etc/octavia/octavia.conf --log-file /var/log/octavia/health-manager.log --config-dir /etc/octavia/conf.d/common --config-dir /etc/octavia/conf.d/octavia-health-manager
42437 631271 46.7 4.0 17608884 1311676 ? Sl Aug14 1874:30 /usr/bin/python2 /usr/bin/octavia-health-manager --config-file /usr/share/octavia/octavia-dist.conf --config-file /etc/octavia/octavia.conf --log-file /var/log/octavia/health-manager.log --config-dir /etc/octavia/conf.d/common --config-dir /etc/octavia/conf.d/octavia-health-manager
root 679181 0.0 0.0 112708 984 pts/1 S+ 09:33 0:00 grep --color=auto octavia
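The ps listing above can be summarized by summing the %CPU column for one command. A minimal sketch, assuming the standard `ps aux` column order (the header row was filtered out by grep here). Keep in mind that ps computes %CPU as CPU time used divided by time since the process started, so the sum is a lifetime average rather than a current reading.

```python
def total_cpu_percent(ps_lines, command_substring):
    """Sum %CPU over `ps aux` lines whose COMMAND contains a substring.

    Expected columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND.
    ps reports %CPU as CPU time used divided by time since the process
    started, so this is a lifetime average, not a current reading.
    """
    total = 0.0
    for line in ps_lines:
        fields = line.split(None, 10)  # keep COMMAND (with spaces) intact
        if len(fields) < 11:
            continue
        if command_substring in fields[10]:
            total += float(fields[2])
    return round(total, 1)
```

Applied to the eleven octavia-health-manager rows above, it returns 233.8: on average, those processes have together consumed more than two CPUs' worth of time since they started on Aug14.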


Expected results:

The [health_manager]/event_streamer_driver option should be set by default to noop_event_streamer. The queue_event_streamer driver is deprecated and will be removed around the time neutron-lbaas reaches EOL.

THT (tripleo-heat-templates) is the component that sets octavia::health_manager::event_streamer_driver.
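To verify the expected behavior on a deployed controller, read the effective value of [health_manager]/event_streamer_driver. The sketch below is a simplification. It parses INI text with Python's configparser and applies files in the order given, so that later files override earlier ones, as with the repeated --config-file options in the ps output. When the option is unset, it falls back to noop_event_streamer, Octavia's upstream default. It does not model --config-dir handling or other oslo.config details.

```python
import configparser

# Octavia's upstream default when the option is not set at all.
OCTAVIA_DEFAULT_STREAMER = "noop_event_streamer"
DEPRECATED_STREAMER = "queue_event_streamer"


def effective_event_streamer(*conf_texts):
    """Return [health_manager]/event_streamer_driver after merging files.

    Files are applied in order, so a later file overrides an earlier one.
    This ignores oslo.config specifics such as --config-dir handling.
    """
    parser = configparser.ConfigParser(interpolation=None)
    for text in conf_texts:
        parser.read_string(text)
    return parser.get("health_manager", "event_streamer_driver",
                      fallback=OCTAVIA_DEFAULT_STREAMER)


def uses_deprecated_streamer(*conf_texts):
    return effective_event_streamer(*conf_texts) == DEPRECATED_STREAMER
```

Inside the health-manager container, one might pass the contents of /usr/share/octavia/octavia-dist.conf followed by /etc/octavia/octavia.conf, the same order as on the command lines above.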

Additional info:
Restarting the affected Octavia services seems to resolve the issue temporarily, because they reconnect to a different RabbitMQ endpoint running on another controller node.
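The restart observation suggests that the services recover only once they reach a broker that is up. The following generic sketch shows reconnect logic that rotates through several broker hosts and backs off exponentially after each full round of refusals, so that a dead broker does not cause a busy retry loop. It is illustrative only. It is not oslo.messaging's implementation, and it is not the shipped fix, which changed the default event streamer driver. The function name, the host names, and the injected connect/sleep callables are hypothetical.

```python
import itertools
import time


def connect_with_failover(hosts, connect, base_delay=1.0, max_delay=30.0,
                          max_attempts=None, sleep=time.sleep):
    """Try broker hosts round-robin, backing off exponentially between rounds.

    `connect(host)` returns a connection or raises ConnectionError
    (ConnectionRefusedError, i.e. errno 111, is a subclass).
    Returns a (host, connection) tuple; re-raises the last error once
    `max_attempts` connection attempts have failed.
    """
    if not hosts:
        raise ValueError("at least one broker host is required")
    delay = base_delay
    attempts = 0
    for host in itertools.cycle(hosts):
        try:
            return host, connect(host)
        except ConnectionError:
            attempts += 1
            if max_attempts is not None and attempts >= max_attempts:
                raise
            if attempts % len(hosts) == 0:
                # Every host refused in this round: wait instead of spinning,
                # doubling the wait each round up to max_delay.
                sleep(delay)
                delay = min(delay * 2, max_delay)
```

Injecting `sleep` keeps the sketch testable. Capping the delay bounds how long a client waits after the broker returns.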

Comment 4 Carlos Goncalves 2018-10-08 16:10:45 UTC
*** Bug 1631234 has been marked as a duplicate of this bug. ***

Comment 22 errata-xmlrpc 2018-11-13 22:28:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3587

