Bug 1928055 - [RHOSP13][RCA] os-net-config was triggered for existing nodes during scale-out operation
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 13.0 (Queens)
Hardware: All
OS: All
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Rabi Mishra
QA Contact: Joe H. Rahme
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-02-12 09:40 UTC by Alex Stupnikov
Modified: 2022-08-30 11:44 UTC
CC List: 4 users

Fixed In Version: openstack-tripleo-heat-templates-8.4.1-79.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-06-16 10:58:58 UTC
Target Upstream Version:
Embargoed:




Links
OpenStack gerrit 775474 (MERGED): Always set NetworkDeploymentActions to its default (last updated 2021-02-17 20:45:31 UTC)
Red Hat Issue Tracker OSP-1936 (last updated 2022-08-30 11:44:32 UTC)
Red Hat Product Errata RHBA-2021:2385 (last updated 2021-06-16 10:59:29 UTC)

Description Alex Stupnikov 2021-02-12 09:40:06 UTC
Description of problem:

I would like to ask for a second look at an interesting problem reported by a customer, which looks like a possible bug in THT (tripleo-heat-templates).

There was an outage in the customer's production environment, triggered by a scale-out operation: os-net-config was called on existing nodes to re-provision their networking configuration (sosreports from the overcloud nodes are provided; the problem occurred at approximately Feb 08 09:18).

My first theory was that at some point the customer had "u'NetworkDeploymentActions': [u'CREATE', u'UPDATE']" defined in their templates, removed it before the scale-out operation, and got caught by the THT feature that prevents overcloud destruction when the scale-out command is called without arguments.

I asked the customer to provide the overcloud deployment plan (attached to the case), but couldn't find any "old" definitions of the NetworkDeploymentActions parameter.

I asked the customer to provide sosreports from the director node to double-check everything. According to the mistral-api logs, at "2021-01-19 09:33:59.951" the customer deployed the overcloud with "u'NetworkDeploymentActions': [u'CREATE', u'UPDATE']". From the mistral-engine logs, it looks like the scale-out operation was executed using templates without a NetworkDeploymentActions definition.

It looks like I am missing some small detail about how THT works, and I would like a second look from people with a better understanding.

Comment 1 Rabi Mishra 2021-02-13 06:12:20 UTC
Once you run a deployment with "NetworkDeploymentActions: ['CREATE', 'UPDATE']", you have to reset it explicitly with "NetworkDeploymentActions: ['CREATE']" so that network configs are not run during a scale-out (stack update). Simply removing the NetworkDeploymentActions parameter does not change it, because we use a patch update (i.e. existing parameters in the stack are reused unless new values are provided).

This is the current behaviour.
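To make the patch-update behaviour concrete, here is a minimal Python sketch under stated assumptions. It models the stored stack parameters as a plain dict and a patch update as a dict merge in which provided keys override stored ones and omitted keys are kept. The function names patch_update and effective_actions, the ComputeCount example parameter, and the ['CREATE'] default are illustrative assumptions, not Heat or TripleO code.

```python
# Hypothetical model of a PATCH-style stack update (existing parameters are
# reused unless new values are provided). Illustration only, not Heat code.

DEFAULT_ACTIONS = ["CREATE"]  # assumed template default


def patch_update(stored, provided):
    """Return effective parameters: provided keys override, omitted keys are kept."""
    merged = dict(stored)
    merged.update(provided)
    return merged


def effective_actions(params):
    """Value the stack would use; the default applies only if nothing was ever stored."""
    return params.get("NetworkDeploymentActions", DEFAULT_ACTIONS)


# Deployment run with UPDATE enabled.
stack = patch_update({}, {"NetworkDeploymentActions": ["CREATE", "UPDATE"]})

# Scale-out where the parameter was simply removed from the templates.
scaled = patch_update(stack, {"ComputeCount": 4})
print(effective_actions(scaled))  # ['CREATE', 'UPDATE']: network config runs again

# Explicit reset, as recommended in Comment 1.
reset = patch_update(scaled, {"NetworkDeploymentActions": ["CREATE"]})
print(effective_actions(reset))  # ['CREATE']
```

In this model the removed parameter survives the scale-out, which matches the customer's outage; only the explicit reset turns UPDATE off again.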

For instance, during ffwd-upgrade we change it in the prepare step [1] and then reset it in the converge step [2].


[1] https://github.com/openstack/tripleo-heat-templates/blob/stable/queens/environments/lifecycle/ffwd-upgrade-prepare.yaml#L21
[2] https://github.com/openstack/tripleo-heat-templates/blob/stable/queens/environments/lifecycle/ffwd-upgrade-converge.yaml#L14


However, I think we can change this behaviour as this is disruptive in some instances. I'll propose a fix.
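The merged gerrit change is titled "Always set NetworkDeploymentActions to its default". The following Python sketch is one hypothetical way to model that idea, not the actual patch: every deployment request starts from the default value and user-provided parameters are layered on top, so a value the user stops passing is reset instead of being inherited from the stack. The helper names build_request and patch_update, the ComputeCount example parameter, and the ['CREATE'] default are assumptions for illustration.

```python
# Hypothetical model of "always set NetworkDeploymentActions to its default".
# Illustration only; this is not the merged tripleo-heat-templates change.

DEFAULT_ACTIONS = ["CREATE"]  # assumed default value


def patch_update(stored, provided):
    """PATCH-style merge: provided keys override, omitted keys are kept."""
    merged = dict(stored)
    merged.update(provided)
    return merged


def build_request(user_params):
    """Always send the default first, then apply the user's overrides."""
    # Copy the list so later edits to a request never touch DEFAULT_ACTIONS.
    request = {"NetworkDeploymentActions": list(DEFAULT_ACTIONS)}
    request.update(user_params)
    return request


# Earlier run that explicitly enabled UPDATE.
stack = patch_update({}, build_request({"NetworkDeploymentActions": ["CREATE", "UPDATE"]}))

# Later scale-out that no longer mentions the parameter.
scaled = patch_update(stack, build_request({"ComputeCount": 4}))

print(scaled["NetworkDeploymentActions"])  # ['CREATE']
```

Under this model, an operator who really wants network configuration to run on an update can still pass ['CREATE', 'UPDATE'] explicitly for that one run, and the next run without it falls back to the default.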

Comment 2 Alex Stupnikov 2021-02-15 08:22:43 UTC
Thank you very much!

Comment 13 errata-xmlrpc 2021-06-16 10:58:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 13.0 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2385

