Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1618772

Summary: CPU load skyrocketing due to API and Health Manager not being able to connect to the bus
Product: Red Hat OpenStack
Component: openstack-tripleo-heat-templates
Version: 13.0 (Queens)
Target Milestone: z3
Target Release: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: high
Priority: high
Keywords: Triaged, ZStream
Reporter: Carlos Goncalves <cgoncalves>
Assignee: Carlos Goncalves <cgoncalves>
QA Contact: Alexander Stafeyev <astafeye>
CC: akaris, amuller, apevec, bbonguar, cgoncalves, dhill, ggrimaux, jmelvin, juriarte, lhh, mburns, oblaut, ramishra, srevivo
Fixed In Version: openstack-tripleo-heat-templates-8.0.4-33.el7ost
Doc Type: If docs needed, set a value
Cloned As: 1624037 (view as bug list)
Bug Blocks: 1624037
Type: Bug
Last Closed: 2018-11-13 22:28:18 UTC

Description Carlos Goncalves 2018-08-17 14:43:36 UTC
Description of problem:
The CPU load of the Octavia health manager skyrocketed (load average: 794.80, 467.48, 491.34). Logs show connections to RabbitMQ being refused.
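
The symptom can be confirmed on the affected controller with something like the following (log path taken from this report; a sketch, not a prescribed diagnostic):

uptime
grep 'Connection refused' /var/log/containers/octavia/health-manager.log | tail -5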


Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-8.0.4-16.el7ost.noarch
openstack-octavia-health-manager-2.0.1-6.d137eaagit.el7ost.noarch
openstack-octavia-common-2.0.1-6.d137eaagit.el7ost.noarch
openstack-octavia-api-2.0.1-6.d137eaagit.el7ost.noarch
openstack-octavia-housekeeping-2.0.1-6.d137eaagit.el7ost.noarch
openstack-octavia-worker-2.0.1-6.d137eaagit.el7ost.noarch


How reproducible:
100%

Steps to Reproduce:
1. Create a load balancer.
2. Stop rabbitmq on the controller node the Octavia services are connected to.
3. Observe the Octavia logs, the processes on the controller nodes, and their CPU load (a command sketch follows below).
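
For illustration, a rough shell equivalent of these steps on a containerized OSP 13 controller; the subnet ID is a placeholder, and the rabbitmq container name is an assumption that may differ per deployment:

# 1. Create a load balancer
openstack loadbalancer create --name lb1 --vip-subnet-id <subnet-id>
# 2. Stop rabbitmq on the controller the Octavia services are connected to
sudo docker stop rabbitmq-bundle-docker-0
# 3. Watch the Octavia logs and the controller CPU load
sudo tail -f /var/log/containers/octavia/*.log
uptime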

Actual results:
Very high CPU load. Amphorae would probably start to fail over, as reported in rhbz#1607276.


2018-08-17 09:31:49.651 1810 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: error: [Errno 111] Connection refused
2018-08-17 09:31:49.613 1810 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: error: [Errno 111] Connection refused
2018-08-17 09:31:49.652 1810 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: error: [Errno 111] Connection refused
2018-08-17 09:31:49.653 1810 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: error: [Errno 111] Connection refused

==> /var/log/containers/octavia/api.log <==
2018-08-17 09:31:51.643 1 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: error: [Errno 111] Connection refused

==> /var/log/containers/octavia/health-manager.log <==
2018-08-17 09:31:49.568 1810 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: error: [Errno 111] Connection refused
2018-08-17 09:31:49.631 1810 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: error: [Errno 111] Connection refused
2018-08-17 09:31:49.658 1810 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: error: [Errno 111] Connection refused
2018-08-17 09:31:49.656 1810 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: error: [Errno 111] Connection refused

[root@controller-0 heat-admin]# ps aux | grep octavia
42437 129463 0.0 0.2 367452 84596 ? Ss Aug14 0:01 /usr/bin/python2 /usr/bin/octavia-health-manager --config-file /usr/share/octavia/octavia-dist.conf --config-file /etc/octavia/octavia.conf --log-file /var/log/octavia/health-manager.log --config-dir /etc/octavia/conf.d/common --config-dir /etc/octavia/conf.d/octavia-health-manager
42437 129530 0.0 0.2 588828 81076 ? Sl Aug14 2:54 /usr/bin/python2 /usr/bin/octavia-health-manager --config-file /usr/share/octavia/octavia-dist.conf --config-file /etc/octavia/octavia.conf --log-file /var/log/octavia/health-manager.log --config-dir /etc/octavia/conf.d/common --config-dir /etc/octavia/conf.d/octavia-health-manager
42437 129531 0.4 0.2 371644 81528 ? S Aug14 18:47 /usr/bin/python2 /usr/bin/octavia-health-manager --config-file /usr/share/octavia/octavia-dist.conf --config-file /etc/octavia/octavia.conf --log-file /var/log/octavia/health-manager.log --config-dir /etc/octavia/conf.d/common --config-dir /etc/octavia/conf.d/octavia-health-manager
42437 130349 2.2 0.3 671604 123768 ? Ssl Aug14 90:21 /usr/bin/python2 /usr/bin/octavia-api --config-file /usr/share/octavia/octavia-dist.conf --config-file /etc/octavia/octavia.conf --log-file /var/log/octavia/api.log --config-dir /etc/octavia/conf.d/common --config-dir /etc/octavia/conf.d/octavia-api
42437 131439 1.4 0.3 595464 106908 ? Ssl Aug14 60:46 /usr/bin/python2 /usr/bin/octavia-housekeeping --config-file /usr/share/octavia/octavia-dist.conf --config-file /etc/octavia/octavia.conf --log-file /var/log/octavia/housekeeping.log --config-dir /etc/octavia/conf.d/common --config-dir /etc/octavia/conf.d/octavia-housekeeping
42437 131605 1.4 0.1 177180 38448 ? Ss Aug14 60:07 octavia-worker: master process [/usr/bin/octavia-worker --config-file /usr/share/octavia/octavia-dist.conf --config-file /etc/octavia/octavia.conf --log-file /var/log/octavia/worker.log --config-dir /etc/octavia/conf.d/common --config-dir /etc/octavia/conf.d/octavia-worker]
42437 131698 1.3 0.3 1290132 115248 ? Sl Aug14 54:32 octavia-worker: ConsumerService worker(0)
42437 631264 38.0 3.7 16265304 1218968 ? Sl Aug14 1524:16 /usr/bin/python2 /usr/bin/octavia-health-manager --config-file /usr/share/octavia/octavia-dist.conf --config-file /etc/octavia/octavia.conf --log-file /var/log/octavia/health-manager.log --config-dir /etc/octavia/conf.d/common --config-dir /etc/octavia/conf.d/octavia-health-manager
42437 631265 20.1 2.2 9169856 732844 ? Sl Aug14 807:07 /usr/bin/python2 /usr/bin/octavia-health-manager --config-file /usr/share/octavia/octavia-dist.conf --config-file /etc/octavia/octavia.conf --log-file /var/log/octavia/health-manager.log --config-dir /etc/octavia/conf.d/common --config-dir /etc/octavia/conf.d/octavia-health-manager
42437 631266 31.9 3.1 14336816 1037200 ? Sl Aug14 1280:07 /usr/bin/python2 /usr/bin/octavia-health-manager --config-file /usr/share/octavia/octavia-dist.conf --config-file /etc/octavia/octavia.conf --log-file /var/log/octavia/health-manager.log --config-dir /etc/octavia/conf.d/common --config-dir /etc/octavia/conf.d/octavia-health-manager
42437 631267 26.8 2.7 11699104 903332 ? Sl Aug14 1074:25 /usr/bin/python2 /usr/bin/octavia-health-manager --config-file /usr/share/octavia/octavia-dist.conf --config-file /etc/octavia/octavia.conf --log-file /var/log/octavia/health-manager.log --config-dir /etc/octavia/conf.d/common --config-dir /etc/octavia/conf.d/octavia-health-manager
42437 631268 46.4 4.2 18236484 1402736 ? Sl Aug14 1862:21 /usr/bin/python2 /usr/bin/octavia-health-manager --config-file /usr/share/octavia/octavia-dist.conf --config-file /etc/octavia/octavia.conf --log-file /var/log/octavia/health-manager.log --config-dir /etc/octavia/conf.d/common --config-dir /etc/octavia/conf.d/octavia-health-manager
42437 631269 11.8 2.0 8880656 668704 ? Sl Aug14 473:48 /usr/bin/python2 /usr/bin/octavia-health-manager --config-file /usr/share/octavia/octavia-dist.conf --config-file /etc/octavia/octavia.conf --log-file /var/log/octavia/health-manager.log --config-dir /etc/octavia/conf.d/common --config-dir /etc/octavia/conf.d/octavia-health-manager
42437 631270 11.7 1.9 8197496 637928 ? Sl Aug14 469:27 /usr/bin/python2 /usr/bin/octavia-health-manager --config-file /usr/share/octavia/octavia-dist.conf --config-file /etc/octavia/octavia.conf --log-file /var/log/octavia/health-manager.log --config-dir /etc/octavia/conf.d/common --config-dir /etc/octavia/conf.d/octavia-health-manager
42437 631271 46.7 4.0 17608884 1311676 ? Sl Aug14 1874:30 /usr/bin/python2 /usr/bin/octavia-health-manager --config-file /usr/share/octavia/octavia-dist.conf --config-file /etc/octavia/octavia.conf --log-file /var/log/octavia/health-manager.log --config-dir /etc/octavia/conf.d/common --config-dir /etc/octavia/conf.d/octavia-health-manager
root 679181 0.0 0.0 112708 984 pts/1 S+ 09:33 0:00 grep --color=auto octavia


Expected results:

The [health_manager]/event_streamer_driver option should be set by default to noop_event_streamer. The queue_event_streamer driver is deprecated and will be removed around the time neutron-lbaas reaches EOL.

THT is the component setting octavia::health_manager::event_streamer_driver.
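
In octavia.conf terms, the expected default would look like this (a sketch based on the option named above; noop_event_streamer is the upstream Octavia default):

[health_manager]
# Do not stream health events to neutron-lbaas over the RPC bus
event_streamer_driver = noop_event_streamer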

Additional info:
Restarting the affected Octavia services seems to temporarily resolve the issue, as they reconnect to a different RabbitMQ endpoint running on another controller node.
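
A sketch of that temporary workaround (container names assumed from a default containerized OSP 13 deployment):

sudo docker restart octavia_api octavia_health_manager octavia_housekeeping octavia_worker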

Comment 4 Carlos Goncalves 2018-10-08 16:10:45 UTC
*** Bug 1631234 has been marked as a duplicate of this bug. ***

Comment 22 errata-xmlrpc 2018-11-13 22:28:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3587
