Description of problem:
After performing a compute scale down on RHOS 16.1, network agents for the deleted compute host still exist, and for OVN deployments these agents cannot be deleted. For OVS deployments, the agents reappear after they have been deleted.

Related Docs BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1841011

How reproducible:
always

Steps to Reproduce:
1. Deploy RHOS 16.1 with at least 2 compute nodes.
2. Scale down 1 compute host.
3. The OVN agents for the deleted compute host are still listed (Alive: XXX, State: UP).

Actual results:
(overcloud) [stack@undercloud-0 ~]$ openstack network agent list
+--------------------------------------+----------------------+---------------------------+-------------------+-------+-------+-------------------------------+
| ID                                   | Agent Type           | Host                      | Availability Zone | Alive | State | Binary                        |
+--------------------------------------+----------------------+---------------------------+-------------------+-------+-------+-------------------------------+
| 79362159-1532-4473-b4cf-295ac7970cb9 | OVN Controller agent | compute-0.redhat.local    | n/a               | XXX   | UP    | ovn-controller                |
| 49991374-00fa-4d70-9dc1-598e9c4c83d9 | OVN Metadata agent   | compute-0.redhat.local    | n/a               | XXX   | UP    | networking-ovn-metadata-agent |
| a0eeb079-4fe1-4bb0-be14-0a57a0c487ce | OVN Controller agent | compute-1.redhat.local    | n/a               | :-)   | UP    | ovn-controller                |
| 2cee6616-96c5-4bd8-89e9-12416e663a3d | OVN Metadata agent   | compute-1.redhat.local    | n/a               | :-)   | UP    | networking-ovn-metadata-agent |
| 8b8b2933-9ab1-4740-9358-fcabf5b4b88e | OVN Controller agent | controller-0.redhat.local | n/a               | :-)   | UP    | ovn-controller                |
+--------------------------------------+----------------------+---------------------------+-------------------+-------+-------+-------------------------------+

(overcloud) [stack@undercloud-0 ~]$ openstack network agent delete 79362159-1532-4473-b4cf-295ac7970cb9
Failed to delete network agent with ID '79362159-1532-4473-b4cf-295ac7970cb9': BadRequestException: 400: Client Error for url: http://10.0.0.148:9696/v2.0/agents/79362159-1532-4473-b4cf-295ac7970cb9, Bad agent request: OVN agents cannot be deleted.
1 of 1 network agents failed to delete.

(overcloud) [stack@undercloud-0 ~]$ openstack network agent delete 49991374-00fa-4d70-9dc1-598e9c4c83d9
Failed to delete network agent with ID '49991374-00fa-4d70-9dc1-598e9c4c83d9': BadRequestException: 400: Client Error for url: http://10.0.0.148:9696/v2.0/agents/49991374-00fa-4d70-9dc1-598e9c4c83d9, Bad agent request: OVN agents cannot be deleted.
1 of 1 network agents failed to delete.

Expected results:
Agents for the deleted node should not exist at all; their removal should be handled as part of the TripleO scale-down tasks.

Additional info:
Documentation: Section 15.3. Removing Compute nodes
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.0/html/director_installation_and_usage/scaling-overcloud-nodes

From the related BZ, notes:
```
Dan Macpherson tested this on 16.1 - results:

9. Remove the Open vSwitch agent from the node:

DDF: agents can't be deleted

Dan: This is true for OVN agents. This step can be deleted.
For OVS agents (not applicable to this procedure though): You can delete the agent but it gets recreated after you delete it. So we probably need to find out what keeps recreating the agent and disable it if possible.
```
*** Bug 1975264 has been marked as a duplicate of this bug. ***
Hello, my customer (Bank of Italy) is requesting a fix for this problem in OSP 16.2.2 (or the next z-stream). Could you provide an update on the progress of this bug? Thank you very much, Riccardo
Hello, as raised in Comment 25, could you include this bug (and its solution) in the "Removing Compute nodes" procedure [1]? Thank you very much in advance, Riccardo
[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.2/html/director_installation_and_usage/assembly_scaling-overcloud-nodes#proc_removing-compute-nodes_scaling-overcloud-nodes
In general, the agent API is designed so that you *have* to call 'openstack network agent delete' manually when you want an agent to be deleted. An agent will not (and should not) ever disappear on its own; instead, it is shown as down once it has not been reachable for DEFAULT.agent_down_time seconds.
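For example, the agents that Neutron already reports as down on a given host can be listed with a small shell helper. This is a minimal sketch, not part of any product tooling: the function name 'list_dead_agents' is hypothetical, it assumes the overcloud credentials are already loaded (e.g. 'source ~/overcloudrc'), and it assumes the Alive column is printed either as True/False or as :-)/XXX in '-f value' output, depending on the python-openstackclient version.

```shell
# Hypothetical helper: print the IDs of network agents on HOST that
# Neutron currently reports as not alive.
# Assumes overcloud credentials are loaded (e.g. source ~/overcloudrc).
list_dead_agents() {
    local host="$1"
    # '-f value -c ID -c Alive' prints one "<id> <alive>" pair per line.
    openstack network agent list --host "$host" -f value -c ID -c Alive |
        while read -r id alive; do
            # Assumption: Alive is rendered as True/False or :-)/XXX,
            # depending on the python-openstackclient version.
            case "$alive" in
                True|':-)') ;;              # alive: skip it
                *) printf '%s\n' "$id" ;;   # down: print its ID
            esac
        done
}
```

Usage: 'list_dead_agents compute-0.redhat.local' would print the IDs shown with XXX in the table from the bug description.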
So if we want those agents to disappear as part of a scale-down operation, then TripleO (or whatever tool performs the scale-down procedure) needs to call the agent delete command for the agents on the removed nodes. The documentation at https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.2/html/director_installation_and_usage/assembly_scaling-overcloud-nodes#proc_removing-compute-nodes_scaling-overcloud-nodes also mentions having to call 'openstack network agent delete'.
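As a rough illustration of that step, the helper below deletes every agent record registered for a removed host. It is a minimal sketch under stated assumptions, not what TripleO actually runs: 'delete_host_agents' is a hypothetical name, the host argument is assumed to match the agent's Host column exactly (e.g. compute-0.redhat.local), and overcloud credentials are assumed to be loaded. On RHOS 16.1 with OVN, the delete call is expected to fail with "OVN agents cannot be deleted", as shown in the bug description.

```shell
# Hypothetical helper: delete every network agent registered for HOST.
# Assumes overcloud credentials are loaded (e.g. source ~/overcloudrc).
delete_host_agents() {
    local host="$1"
    local ids
    # Declared separately so a failing 'agent list' is not masked by 'local'.
    ids="$(openstack network agent list --host "$host" -f value -c ID)" || return
    if [ -z "$ids" ]; then
        echo "no network agents found for $host" >&2
        return 0
    fi
    # Unquoted on purpose: each agent ID becomes a separate argument.
    # On RHOS 16.1 with OVN this is expected to fail with a 400
    # "OVN agents cannot be deleted" error.
    # shellcheck disable=SC2086
    openstack network agent delete $ids
}
```

For example, 'delete_host_agents compute-0.redhat.local'. Passing all IDs to a single 'openstack network agent delete' call uses the command's multi-ID form, which is also what the "1 of 1 network agents failed to delete" messages above report on.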
[600+ days bug note] Terry to follow up with support on whether comment 32 is a sufficient resolution.
*** Bug 2168403 has been marked as a duplicate of this bug. ***
*** Bug 2068069 has been marked as a duplicate of this bug. ***
*** Bug 2064794 has been marked as a duplicate of this bug. ***
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days