Bug 1763892 - Following the replacement of a controller, all the controlplane IP changed and openstack-openvswitch-agent wasn't restarted on any computes
Summary: Following the replacement of a controller, all the controlplane IP changed an...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: documentation
Version: 10.0 (Newton)
Hardware: x86_64
OS: Linux
medium
low
Target Milestone: async
: 13.0 (Queens)
Assignee: Greg Rakauskas
QA Contact: James Smith
URL:
Whiteboard: docs-accepted
Depends On:
Blocks: 1815285 1815286 1868236 1898650
 
Reported: 2019-10-21 20:27 UTC by David Hill
Modified: 2023-03-24 15:43 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1815285 1815286 (view as bug list)
Environment:
Last Closed: 2020-03-25 21:45:21 UTC
Target Upstream Version:
Embargoed:



Description David Hill 2019-10-21 20:27:29 UTC
Description of problem:
Following the replacement of a controller, all the control plane IPs changed and neutron-openvswitch-agent wasn't restarted on any computes, so all the OVS agents were showing as down in `neutron agent-list`:

Oct 20 22:26:07 overcloud-compute-0 os-collect-config: [2019-10-20 22:26:07,660] (heat-config) [INFO] {"deploy_stdout": "\u001b[mNotice: Scope(Class[Tripleo::Firewall::Post]): At this stage, all network traffic is blocked.\u001b[0m\n\u001b[mNotice: Compiled catalog for overcloud-compute-0.localdomain in environment production in 3.11 seconds\u001b[0m\n\u001b[mNotice: /Stage[main]/Tripleo::Profile::Base::Kernel/Kmod::Load[nf_conntrack_proto_sctp]/Exec[modprobe nf_conntrack_proto_sctp]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron/Oslo::Messaging::Rabbit[neutron_config]/Neutron_config[oslo_messaging_rabbit/rabbit_hosts]/value: value changed ['10.10.10.11:5672,10.10.10.10:5672,10.10.10.13:5672'] to ['10.10.10.10:5672,10.10.10.13:5672,10.10.10.12:5672']\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron::Deps/Anchor[neutron::config::end]: Triggered 'refresh' from 1 events\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron::Deps/Anchor[neutron::service::begin]: Triggered 'refresh' from 1 events\u001b[0m\n\u001b[mNotice: Finished catalog run in 5.85 seconds\u001b[0m\n", "deploy_stderr": "exception: connect failed\n", "deploy_status_code": 0}
Oct 20 22:26:07 overcloud-compute-0 os-collect-config: [2019-10-20 22:26:07,660] (heat-config) [DEBUG] [2019-10-20 22:25:16,767] (heat-config) [DEBUG] Running FACTER_heat_outputs_path="/var/run/heat-config/heat-config-puppet/e6a53e16-6629-4a73-a072-2dea3c36ebbd"  FACTER_fqdn="overcloud-compute-0.localdomain"  FACTER_deploy_config_name="ComputeDeployment_Step3"  puppet apply --detailed-exitcodes --modulepath /etc/puppet/modules:/opt/stack/puppet-modules:/usr/share/openstack-puppet/modules /var/lib/heat-config/heat-config-puppet/e6a53e16-6629-4a73-a072-2dea3c36ebbd.pp
Oct 20 22:26:07 overcloud-compute-0 os-collect-config: [2019-10-20 22:26:07,654] (heat-config) [INFO] Return code 2
Oct 20 22:26:07 overcloud-compute-0 os-collect-config: [2019-10-20 22:26:07,655] (heat-config) [INFO] #033[mNotice: Scope(Class[Tripleo::Firewall::Post]): At this stage, all network traffic is blocked.#033[0m
Oct 20 22:26:07 overcloud-compute-0 os-collect-config: #033[mNotice: Compiled catalog for overcloud-compute-0.localdomain in environment production in 3.11 seconds#033[0m
Oct 20 22:26:07 overcloud-compute-0 os-collect-config: #033[mNotice: /Stage[main]/Tripleo::Profile::Base::Kernel/Kmod::Load[nf_conntrack_proto_sctp]/Exec[modprobe nf_conntrack_proto_sctp]/returns: executed successfully#033[0m
Oct 20 22:26:07 overcloud-compute-0 os-collect-config: #033[mNotice: /Stage[main]/Neutron/Oslo::Messaging::Rabbit[neutron_config]/Neutron_config[oslo_messaging_rabbit/rabbit_hosts]/value: value changed ['10.10.10.11:5672,10.10.10.10:5672,10.10.10.13:5672'] to ['10.10.10.10:5672,10.10.10.13:5672,10.10.10.12:5672']#033[0m
Oct 20 22:26:07 overcloud-compute-0 os-collect-config: #033[mNotice: /Stage[main]/Neutron::Deps/Anchor[neutron::config::end]: Triggered 'refresh' from 1 events#033[0m
Oct 20 22:26:07 overcloud-compute-0 os-collect-config: #033[mNotice: /Stage[main]/Neutron::Deps/Anchor[neutron::service::begin]: Triggered 'refresh' from 1 events#033[0m
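The relevant change is buried in the Puppet notices above: rabbit_hosts lost the old controller's endpoint and gained the replacement's. A quick bash sketch (host lists copied verbatim from the log; `comm` and process substitution require bash) that isolates the changed endpoints:

```shell
#!/usr/bin/env bash
# Compare the old and new rabbit_hosts values from the Puppet log above
# to see which RabbitMQ endpoint was removed and which was added.
old='10.10.10.11:5672,10.10.10.10:5672,10.10.10.13:5672'
new='10.10.10.10:5672,10.10.10.13:5672,10.10.10.12:5672'

removed=$(comm -23 <(tr ',' '\n' <<<"$old" | sort) <(tr ',' '\n' <<<"$new" | sort))
added=$(comm -13 <(tr ',' '\n' <<<"$old" | sort) <(tr ',' '\n' <<<"$new" | sort))

echo "removed endpoint: $removed"   # the replaced controller: 10.10.10.11:5672
echo "added endpoint:   $added"     # its replacement:         10.10.10.12:5672
```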


Version-Release number of selected component (if applicable):
puppet-neutron-9.5.0-4.el7ost.noarch                        Sun Jan 14 15:03:16 2018


How reproducible:
This time

Steps to Reproduce:
1. Replaced a controller 

Actual results:
neutron-openvswitch-agent failed to connect to the RabbitMQ hosts, which had all changed

Expected results:
neutron-openvswitch-agent should have been restarted
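On RHOSP 10 (where, per comment 21, the restart really is required), a manual fan-out over the compute nodes would look roughly like the following sketch. The hostnames and the ssh wrapper are illustrative assumptions, not from the report; on RHOSP 10 the agent runs as a plain systemd service named neutron-openvswitch-agent.

```shell
# Illustrative sketch only: restart the OVS agent on each compute node after
# a controller replacement. Hostnames are examples; the loop echoes each
# command instead of running it -- drop the echo to actually execute.
computes='overcloud-compute-0 overcloud-compute-1'
for node in $computes; do
  cmd="ssh heat-admin@$node sudo systemctl restart neutron-openvswitch-agent"
  echo "$cmd"
done
```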

Additional info:

Comment 20 Greg Rakauskas 2020-03-25 21:45:21 UTC
Hi,

The necessary changes have been added to the "Director Installation and Usage" guide.

Customers can view these changes on the Red Hat Customer Portal now. (The 
details are below.)

Thanks,
--Greg

   - The NOTE in step 6, here:
     -------------------------
     https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html-single/director_installation_and_usage/index#preparing-for-controller-replacement

   - The last step, here:
     --------------------
     https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html-single/director_installation_and_usage/index#preparing-for-controller-

Comment 21 Greg Rakauskas 2020-11-17 17:42:57 UTC
Hi,

As discovered in BZ 1868236, restarting the OVS agent is unnecessary.

While a restart is required for RHOSP 10, it is unnecessary in RHOSP 13 and
later.

I am removing the changes made to the "Director Installation and Usage" guide in
these BZs:

   - BZ 1763892 - RHOSP 13 
   - BZ 1815285 - RHOSP 15 
   - BZ 1815286 - RHOSP 16.0 

--Greg

Comment 22 Greg Rakauskas 2020-11-17 20:17:56 UTC
Hi,

Removed step 7 in the "Director Installation and Usage" guide topic, "Preparing
for Controller replacement:"

   7. If you are using Open vSwitch (OVS) and replaced Controller nodes
      in the past without restarting the OVS agents, then restart the agents on
      the compute nodes before replacing this Controller. Restarting the OVS
      agents ensures that they have a full complement of RabbitMQ connections.

      Run the following command to restart the OVS agent:

      [heat-admin@overcloud-compute-0 ~]$ sudo docker restart neutron_ovs_agent

Removed step 11 in the "Director Installation and Usage" guide topic, "Cleaning
up after Controller node replacement:"

   11. If you are using Open vSwitch (OVS), and the IP address for the
       Controller node has changed, then you must restart the OVS agent on all
       compute nodes:

       [heat-admin@overcloud-compute-0 ~]$ sudo podman restart neutron_ovs_agent

Customers can see these changes here:

   https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html-single/director_installation_and_usage/index#preparing-for-controller-replacement

--Greg

