Description of problem:

Since RHOS 8, undercloud nova has been configured with sync_power_state_interval=-1, following the decisions described in bug #1245298. This setting does not appear to be applied on the RHOS 14 undercloud, so all of the behaviour described in that bug is back. I am not sure which component is the right one, since I do not know how much instack-undercloud is still used for the undercloud in RHOS 14.

Version-Release number of selected component (if applicable):
instack-undercloud-9.2.1-0.20180803181448.be5fa97.el7ost.noarch

How reproducible:
Always

Steps to Reproduce:
1. Run HA tests that include overcloud node resets.

Actual results:
Even though the overcloud nodes recover from the reset/failover, undercloud nova sometimes keeps shutting them down when it hits the interval at which it syncs the overcloud node power state with the state recorded in its database.

Expected results:
Undercloud nova should not touch the recovered nodes.

Additional info:
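For reference, this is the setting from bug #1245298 that appears to be missing on the RHOS 14 undercloud. A minimal sketch of the relevant part of /etc/nova/nova.conf on the undercloud:

```ini
[DEFAULT]
# Disable the periodic power-state sync task so undercloud nova does not
# power off overcloud nodes that were reset/failed over outside its
# control. A negative value disables the periodic task entirely.
sync_power_state_interval=-1
```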
Setting the config parameter sync_power_state_interval to -1 alone may not be enough. I can see a (new?) option in the nova config called handle_virt_lifecycle_events, which may also need to be set to false (it is true by default) to disable power state synchronization. With only sync_power_state_interval=-1 set, I could still observe the unwanted behaviour. This is the description in nova.conf:

# * If ``handle_virt_lifecycle_events`` in workarounds_group is
#   false and this option is negative, then instances that get out
#   of sync between the hypervisor and the Nova database will have
#   to be synchronized manually.
# (integer value)
#sync_power_state_interval=600

I do not fully understand what handle_virt_lifecycle_events is about, but setting it to false seems to help.
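Put together, the combination described in this comment would look like the following sketch of nova.conf on the undercloud (note that handle_virt_lifecycle_events lives in the [workarounds] section, not [DEFAULT]):

```ini
[DEFAULT]
# Disable the periodic power-state sync task.
sync_power_state_interval=-1

[workarounds]
# Additionally ignore lifecycle events sent up by the virt driver
# (true by default). Setting this to false disables the second path
# into the power-state sync code.
handle_virt_lifecycle_events=false
```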
This doesn't fall under the HardwareProvisioning DFG; moving to Compute.
So in terms of getting back the exact same config as the pre-containerized undercloud, here is a recap of the reviews:

- new puppet-nova param:
  master: https://review.openstack.org/599480 (merged)
  rocky: https://review.openstack.org/602039 (not merged)
- instack-undercloud moved to the new param (not really needed for the containerized undercloud):
  master: https://review.openstack.org/#/c/599580/ (merged)
  rocky: https://review.openstack.org/602042 (not merged)
- tht change needed for the undercloud:
  master: https://review.openstack.org/599423 (merged)
  rocky: https://review.openstack.org/602041 (not merged)

Comment #1 from Marian is a bit concerning, though, and it might very well be that the previous conf we had is now insufficient. Some feedback from the compute folks would be great to have here.
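On a containerized undercloud, the tht change listed above would be consumed through a Heat parameter rather than by editing nova.conf directly. A sketch of a custom environment file, assuming the parameter is named NovaSyncPowerStateInterval (the actual name and the file name here are assumptions; verify against the merged tht review):

```yaml
# custom-undercloud-params.yaml -- hypothetical file name
parameter_defaults:
  # Assumed parameter name introduced by https://review.openstack.org/599423;
  # check the merged change for the real name before using this.
  NovaSyncPowerStateInterval: -1
```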
handle_virt_lifecycle_events has been present since Liberty. Setting it to false is a workaround to try to reduce the possibility of racing on _sync_instance_power_state(), which is called by *both* the periodic task (unless sync_power_state_interval = -1) *and* the virt driver sending instance lifecycle events to the compute manager (unless handle_virt_lifecycle_events = false).

That being said, sending lifecycle events up from the virt driver to the compute manager is something that only the libvirt and Hyper-V drivers do, so handle_virt_lifecycle_events is irrelevant for managing the overcloud nodes, as the undercloud uses the ironic driver. Therefore, I believe finding a way to get sync_power_state_interval back to -1 should be enough.
(In reply to Artom Lifshitz from comment #4)
> handle_virt_lifecycle_events has been present since Liberty - setting it to
> false is a workaround to try and reduce the possibility of racing on the
> _sync_instance_power_state(), which is called by *both* the periodic task
> (unless sync_power_state_interval = -1) *and* the virt driver sending
> instance lifecycle events to the compute manager (unless
> handle_virt_lifecycle_events = false).
>
> That being said, sending lifecycle events up from the virt driver to the
> compute manager is something that only libvirt and hyperv do, so
> handle_virt_lifecycle is irrelevant in the case of the overcloud, as the
> undercloud uses the ironic driver. Therefore, I believe finding a way of
> getting sync_power_state_interval back to -1 should be enough.

Yes, it was probably a different hiccup that I had observed. I cannot see the problem now with only sync_power_state_interval = -1 set, so let's proceed with the patch as it is. Thanks.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:0045