Description of problem:
After a controller was replaced, all the control plane IPs changed, but neutron-openvswitch-agent was not restarted on any compute node, so all the OVS agents showed as down in `neutron agent-list`:

Oct 20 22:26:07 overcloud-compute-0 os-collect-config: [2019-10-20 22:26:07,660] (heat-config) [INFO] {"deploy_stdout": "\u001b[mNotice: Scope(Class[Tripleo::Firewall::Post]): At this stage, all network traffic is blocked.\u001b[0m\n\u001b[mNotice: Compiled catalog for overcloud-compute-0.localdomain in environment production in 3.11 seconds\u001b[0m\n\u001b[mNotice: /Stage[main]/Tripleo::Profile::Base::Kernel/Kmod::Load[nf_conntrack_proto_sctp]/Exec[modprobe nf_conntrack_proto_sctp]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron/Oslo::Messaging::Rabbit[neutron_config]/Neutron_config[oslo_messaging_rabbit/rabbit_hosts]/value: value changed ['10.10.10.11:5672,10.10.10.10:5672,10.10.10.13:5672'] to ['10.10.10.10:5672,10.10.10.13:5672,10.10.10.12:5672']\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron::Deps/Anchor[neutron::config::end]: Triggered 'refresh' from 1 events\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron::Deps/Anchor[neutron::service::begin]: Triggered 'refresh' from 1 events\u001b[0m\n\u001b[mNotice: Finished catalog run in 5.85 seconds\u001b[0m\n", "deploy_stderr": "exception: connect failed\n", "deploy_status_code": 0}
Oct 20 22:26:07 overcloud-compute-0 os-collect-config: [2019-10-20 22:26:07,660] (heat-config) [DEBUG] [2019-10-20 22:25:16,767] (heat-config) [DEBUG] Running FACTER_heat_outputs_path="/var/run/heat-config/heat-config-puppet/e6a53e16-6629-4a73-a072-2dea3c36ebbd" FACTER_fqdn="overcloud-compute-0.localdomain" FACTER_deploy_config_name="ComputeDeployment_Step3" puppet apply --detailed-exitcodes --modulepath /etc/puppet/modules:/opt/stack/puppet-modules:/usr/share/openstack-puppet/modules /var/lib/heat-config/heat-config-puppet/e6a53e16-6629-4a73-a072-2dea3c36ebbd.pp
Oct 20 22:26:07 overcloud-compute-0 os-collect-config: [2019-10-20 22:26:07,654] (heat-config) [INFO] Return code 2
Oct 20 22:26:07 overcloud-compute-0 os-collect-config: [2019-10-20 22:26:07,655] (heat-config) [INFO] #033[mNotice: Scope(Class[Tripleo::Firewall::Post]): At this stage, all network traffic is blocked.#033[0m
Oct 20 22:26:07 overcloud-compute-0 os-collect-config: #033[mNotice: Compiled catalog for overcloud-compute-0.localdomain in environment production in 3.11 seconds#033[0m
Oct 20 22:26:07 overcloud-compute-0 os-collect-config: #033[mNotice: /Stage[main]/Tripleo::Profile::Base::Kernel/Kmod::Load[nf_conntrack_proto_sctp]/Exec[modprobe nf_conntrack_proto_sctp]/returns: executed successfully#033[0m
Oct 20 22:26:07 overcloud-compute-0 os-collect-config: #033[mNotice: /Stage[main]/Neutron/Oslo::Messaging::Rabbit[neutron_config]/Neutron_config[oslo_messaging_rabbit/rabbit_hosts]/value: value changed ['10.10.10.11:5672,10.10.10.10:5672,10.10.10.13:5672'] to ['10.10.10.10:5672,10.10.10.13:5672,10.10.10.12:5672']#033[0m
Oct 20 22:26:07 overcloud-compute-0 os-collect-config: #033[mNotice: /Stage[main]/Neutron::Deps/Anchor[neutron::config::end]: Triggered 'refresh' from 1 events#033[0m
Oct 20 22:26:07 overcloud-compute-0 os-collect-config: #033[mNotice: /Stage[main]/Neutron::Deps/Anchor[neutron::service::begin]: Triggered 'refresh' from 1 events#033[0m

Version-Release number of selected component (if applicable):
puppet-neutron-9.5.0-4.el7ost.noarch Sun Jan 14 15:03:16 2018

How reproducible:
This time

Steps to Reproduce:
1. Replaced a controller

Actual results:
neutron-openvswitch-agent failed to connect to the RabbitMQ hosts, all of which had changed.

Expected results:
neutron-openvswitch-agent should have been restarted.

Additional info:
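
To see at a glance which broker endpoints an agent would lose, the rabbit_hosts notice in the log above can be diffed. The following is only a minimal sketch, assuming the notice keeps the "value changed ['...'] to ['...']" form shown in the log; the function name rabbit_hosts_diff is hypothetical and is not part of puppet-neutron or any OpenStack tool.

```shell
# Hypothetical helper: given one Puppet notice line, print the RabbitMQ
# endpoints that were removed ("-") and added ("+") by the change.
# Written for POSIX sh, so variables are global rather than local.
rabbit_hosts_diff() {
  line=$1
  # sed expression prefix that captures the old (\1) and new (\2) lists.
  pattern="s/.*value changed \['\([^']*\)'\] to \['\([^']*\)'\].*/"
  old=$(printf '%s\n' "$line" | sed -n "${pattern}\1/p")
  new=$(printf '%s\n' "$line" | sed -n "${pattern}\2/p")

  # Endpoints present in the old list but missing from the new one.
  for h in $(printf '%s' "$old" | tr ',' ' '); do
    case ",$new," in
      *",$h,"*) ;;
      *) printf '%s %s\n' '-' "$h" ;;
    esac
  done

  # Endpoints present in the new list but missing from the old one.
  for h in $(printf '%s' "$new" | tr ',' ' '); do
    case ",$old," in
      *",$h,"*) ;;
      *) printf '%s %s\n' '+' "$h" ;;
    esac
  done
}
```

The function takes a single log line as its argument, for example one selected with grep 'rabbit_hosts' from the os-collect-config journal. It only reports what the Puppet run changed in the configuration; it does not tell you whether a running agent has picked up the new values.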
Hi,

The necessary changes have been added to the "Director Installation and Usage" guide. Customers can view these changes on the Red Hat Customer Portal now. (The details are below.)

Thanks,
--Greg

- The NOTE in step 6, here:
-------------------------
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html-single/director_installation_and_usage/index#preparing-for-controller-replacement

- The last step, here:
--------------------
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html-single/director_installation_and_usage/index#preparing-for-controller-
Hi,

As discovered in BZ 1868236, restarting the OVS agent is unnecessary. While a restart is required for RHOSP 10, it is not needed in RHOSP 13 and later.

I am removing the changes made to the "Director Installation and Usage" guide in these BZs:

- BZ 1763892 - RHOSP 13
- BZ 1815285 - RHOSP 15
- BZ 1815286 - RHOSP 16.0

--Greg
Hi,

Removed step 7 in the "Director Installation and Usage" guide topic, "Preparing for Controller replacement:"

7. If you are using Open Virtual Switch (OVS) and replaced Controller nodes in the past without restarting the OVS agents, then restart the agents on the compute nodes before replacing this Controller. Restarting the OVS agents ensures that they have a full complement of RabbitMQ connections.

Run the following command to restart the OVS agent:

[heat-admin@overcloud-compute-0 ~]$ sudo docker restart neutron_ovs_agent

Removed step 11 in the "Director Installation and Usage" guide topic, "Cleaning up after Controller node replacement:"

11. If you are using Open Virtual Switch (OVS), and the IP address for the Controller node has changed, then you must restart the OVS agent on all compute nodes:

[heat-admin@overcloud-compute-0 ~]$ sudo podman restart neutron_ovs_agent

Customers can see these changes here:

https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html-single/director_installation_and_usage/index#preparing-for-controller-replacement

--Greg