Bug 1378324

Summary: Nova api logging "Too many heartbeats missed" message in logs
Product: Red Hat OpenStack
Reporter: PURANDHAR SAIRAM MANNIDI <pmannidi>
Component: openstack-nova
Assignee: Eoghan Glynn <eglynn>
Status: CLOSED WONTFIX
QA Contact: Prasanth Anbalagan <panbalag>
Severity: urgent
Priority: urgent
Version: 7.0 (Kilo)
CC: berrange, dasmith, eglynn, ggillies, kchamart, sbauza, sferdjao, sgordon, srevivo, stephenfin, vromanso
Keywords: ZStream
Target Release: 7.0 (Kilo)
Flags: pmannidi: needinfo? (eglynn)
Hardware: Unspecified
OS: Unspecified
Last Closed: 2016-09-23 09:50:37 UTC
Type: Bug

Description PURANDHAR SAIRAM MANNIDI 2016-09-22 07:02:04 UTC
Description of problem:
The Nova API is logging "Too many heartbeats missed" messages. Resources could not be accessed for some time.

The customer wants to know whether the configuration parameter 'heartbeat_timeout_threshold' should be enabled or disabled in a production setup.
According to http://docs.openstack.org/kilo/config-reference/content/nova-conf-changes-kilo.html, the option is marked EXPERIMENTAL:

[oslo_messaging_rabbit] heartbeat_timeout_threshold = 0
(IntOpt) Number of seconds after which the Rabbit broker is considered down if heartbeat's keep-alive fails (0 disables the heartbeat, >0 enables it. Enabling heartbeats requires kombu>=3.0.7 and amqp>=1.4.0). EXPERIMENTAL

Version-Release number of selected component (if applicable):
RHOSP 7.0

How reproducible:
Frequently

Comment 3 Stephen Finucane 2016-09-23 09:50:37 UTC
Having looked at this issue upstream [1], it appears that the 'heartbeat_timeout_threshold' configuration option in 'oslo.messaging' was changed to disabled by default shortly after being introduced [2], due to issues with various libraries [3]. However, all versions of 'oslo.messaging' used by Kilo [4] were released before this bugfix was included.

Backporting support for newer versions of 'oslo.messaging' simply to remove this message is not practical. The easiest fix is therefore to disable the feature manually by setting 'heartbeat_timeout_threshold' to 0. You can do this safely, knowing that timeouts will still occur when services go down thanks to TCP timeouts.

[1] https://bugs.launchpad.net/oslo.messaging/+bug/1436769
[2] https://bugs.launchpad.net/oslo.messaging/+bug/1436769/comments/12
[3] https://bugs.launchpad.net/oslo.messaging/+bug/1436769/comments/15
[4] https://github.com/openstack/nova/blob/2015.1.4/requirements.txt#L39
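
As a rough illustration of the workaround in this comment, the sketch below shows the setting as it would appear in nova.conf and a small check that reads it back. The section and option names come from the Kilo configuration reference quoted in the description; the sample file content and the helper name `heartbeat_threshold` are assumptions made for illustration and are not part of nova or oslo.messaging.

```python
import configparser
from typing import Optional

# Sample nova.conf content with the workaround applied (illustrative only).
# The section and option names are taken from the Kilo configuration reference.
SAMPLE_NOVA_CONF = """
[DEFAULT]
debug = False

[oslo_messaging_rabbit]
# 0 disables the heartbeat; any value > 0 enables it
heartbeat_timeout_threshold = 0
"""


def heartbeat_threshold(conf_text: str) -> Optional[int]:
    """Return the configured heartbeat_timeout_threshold, or None if unset.

    Hypothetical helper for illustration; not part of nova or oslo.messaging.
    """
    # interpolation=None: config values may contain literal '%' characters.
    # strict=False: tolerate repeated options instead of raising an error.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    section, option = "oslo_messaging_rabbit", "heartbeat_timeout_threshold"
    if not parser.has_option(section, option):
        return None
    return parser.getint(section, option)


if __name__ == "__main__":
    value = heartbeat_threshold(SAMPLE_NOVA_CONF)
    if value is None:
        print("heartbeat_timeout_threshold is not set; the library default applies")
    elif value == 0:
        print("heartbeat is disabled")
    else:
        print(f"heartbeat is enabled with a {value}s threshold")
```

When the option is not set at all, the effective value depends on the installed 'oslo.messaging' version, which is why the helper returns None rather than guessing a default. After changing the real configuration file, the affected Nova services would typically need to be restarted for the new value to take effect.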

Comment 4 awaugama 2017-08-30 17:52:20 UTC
WONTFIX/NOTABUG, therefore QE won't automate.