Description of problem:
In previous versions of OSP, we tell the end user to clean up neutron agents when removing nodes (e.g. controllers). With the switch to OVN, you cannot actually remove an ovn-controller agent, because the delete request returns an error.

+--------------------------------------+----------------------+---------------------------+-------------------+-------+----------------+-------------------------------+
| id                                   | agent_type           | host                      | availability_zone | alive | admin_state_up | binary                        |
+--------------------------------------+----------------------+---------------------------+-------------------+-------+----------------+-------------------------------+
| 0c447e90-4aa9-42c1-8d2e-d86a5c6bbcbb | OVN Controller agent | controller-0.redhat.local | n/a               | xxx   | True           | ovn-controller                |
| 331cc8b5-8b83-4ffa-8efe-e50aaf43b2c7 | OVN Controller agent | compute-1.redhat.local    | n/a               | :-)   | True           | ovn-controller                |
| 4bbf6c90-e5cf-42c6-9f69-949d64484640 | OVN Metadata agent   | compute-1.redhat.local    | n/a               | :-)   | True           | networking-ovn-metadata-agent |
| e914fb7f-4134-4453-90f3-9d89f79647e1 | OVN Controller agent | controller-1.redhat.local | n/a               | :-)   | True           | ovn-controller                |
| 9ac0774e-9df0-4f83-b7ab-8f63e8877724 | OVN Controller agent | controller-2.redhat.local | n/a               | :-)   | True           | ovn-controller                |
| cec804db-8c02-47d7-b7cc-304f8aafc7b7 | OVN Controller agent | compute-0.redhat.local    | n/a               | :-)   | True           | ovn-controller                |
| 01471f00-85d6-41b5-93cf-78b2105c08dd | OVN Metadata agent   | compute-0.redhat.local    | n/a               | :-)   | True           | networking-ovn-metadata-agent |
| 9d728dc6-137b-4cb7-8d36-81d0fd63a847 | OVN Controller agent | compute-2.redhat.local    | n/a               | :-)   | True           | ovn-controller                |
| 11866321-a80b-4db4-a8a8-e1dd939910ba | OVN Metadata agent   | compute-2.redhat.local    | n/a               | :-)   | True           | networking-ovn-metadata-agent |
| bb8c82e6-2982-45da-a7a1-33b60a696dba | OVN Controller agent | controller-3.redhat.local | n/a               | :-)   | True           | ovn-controller                |
+--------------------------------------+----------------------+---------------------------+-------------------+-------+----------------+-------------------------------+

source /home/stack/overcloudrc
neutron agent-delete 0c447e90-4aa9-42c1-8d2e-d86a5c6bbcbb
Bad agent request: OVN agents cannot be deleted.
Neutron server returns request_ids: ['req-88d8234f-4dc9-4f09-8375-91cdba356e3c']

Version-Release number of selected component (if applicable):

How reproducible:
Every time

Steps to Reproduce:
1. Deploy a cloud with OVN
2. Have a controller fail
3. Replace the controller with a new one
4. Attempt to clean up the old neutron 'agent'

Actual results:
You cannot remove an ovn-controller 'agent'.

Expected results:
We should be able to remove resources for non-existent systems.

Additional info:
Related Bug 1695073
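Before any cleanup, the dead agents have to be spotted in the agent list; in the table above, controller-0 is the one marked "xxx". A small Python sketch that parses the ASCII table printed by `neutron agent-list` and returns the rows whose alive column shows "xxx". It assumes the column headers shown in this report; `parse_agent_table`, `dead_agents` and `SAMPLE` are made-up names for illustration. On a real cloud, machine-readable output such as `openstack network agent list -f json` is more robust than scraping the table.

```python
from typing import Dict, List


def parse_agent_table(text: str) -> List[Dict[str, str]]:
    """Turn the ASCII table printed by `neutron agent-list` into a list of dicts.

    The first "|"-delimited line is treated as the header; border lines
    such as "+----+----+" and blank lines are skipped.
    """
    header: List[str] = []
    rows: List[Dict[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue  # border or blank line
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if not header:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def dead_agents(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the rows the client marked as dead ("xxx" in the alive column)."""
    return [row for row in rows if row.get("alive") == "xxx"]


# Trimmed-down sample taken from the agent list in this report.
SAMPLE = """
+--------------------------------------+----------------------+---------------------------+-------+
| id                                   | agent_type           | host                      | alive |
+--------------------------------------+----------------------+---------------------------+-------+
| 0c447e90-4aa9-42c1-8d2e-d86a5c6bbcbb | OVN Controller agent | controller-0.redhat.local | xxx   |
| 331cc8b5-8b83-4ffa-8efe-e50aaf43b2c7 | OVN Controller agent | compute-1.redhat.local    | :-)   |
+--------------------------------------+----------------------+---------------------------+-------+
"""

if __name__ == "__main__":
    for row in dead_agents(parse_agent_table(SAMPLE)):
        print(row["id"], row["host"])
```

The IDs printed this way are the candidates one would pass to the delete call once agent deletion is supported.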
This was by design, as deleting an agent was not well-defined for networking-ovn. We don't, exactly, have agents, so the agent API was mapped onto OVN as best we could, and we specifically return NotImplemented when deleting an agent. The "controller" agent is essentially an OVN_Southbound DB Chassis entry. When shut down cleanly, ovn-controller removes this entry, and it disappears automatically from the agent list. If there is a network interruption or hardware failure, the agent shows up as dead once the configurable "alive timeout" has passed. In the case that the server is never coming back, we could delete the Chassis entry ourselves when the "agent delete" API request is received. We should probably only allow the delete request when the agent shows up as down: if it is up and we delete the entry, I think ovn-controller will just re-add it. I can't imagine that we'd actually kill agent processes, keep them from restarting, etc. on an agent delete call; it would solely be for cleaning up the agent list display when a server was truly already gone. I'd have to check, but I think we could do something similar with the external_ids that store the metadata agent info.
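The proposal in the previous comment (delete the Chassis entry on an "agent delete" request, but only once the agent is reported dead) can be sketched as a toy model in Python. Everything here is an assumption for illustration: `FakeSouthbound`, `Chassis.last_seen`, `ALIVE_TIMEOUT`, `AgentStillAlive` and `delete_agent` are hypothetical names, not networking-ovn or ovsdbapp APIs, and the timeout value is arbitrary.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical stand-in for the configurable "alive timeout", in seconds.
ALIVE_TIMEOUT = 60.0


class AgentStillAlive(Exception):
    """Raised when a delete is requested for an agent that still looks alive."""


@dataclass
class Chassis:
    """Toy model of an OVN_Southbound Chassis row backing a "controller" agent."""
    name: str
    hostname: str
    last_seen: float  # hypothetical heartbeat timestamp (seconds since epoch)


class FakeSouthbound:
    """In-memory stand-in for the Southbound DB; not a real OVSDB client."""

    def __init__(self) -> None:
        self.chassis: Dict[str, Chassis] = {}

    def add(self, chassis: Chassis) -> None:
        self.chassis[chassis.name] = chassis

    def remove(self, name: str) -> None:
        del self.chassis[name]


def is_alive(chassis: Chassis, now: float, timeout: float = ALIVE_TIMEOUT) -> bool:
    """An agent counts as alive if it was seen within the last `timeout` seconds."""
    return now - chassis.last_seen <= timeout


def delete_agent(sb: FakeSouthbound, name: str, now: Optional[float] = None) -> None:
    """Delete the Chassis entry for a dead agent, refusing if it still looks alive.

    Only the DB entry is removed; no processes are touched, so this is purely
    a cleanup of the agent list for a host that is already gone.
    """
    now = time.time() if now is None else now
    chassis = sb.chassis.get(name)
    if chassis is None:
        raise KeyError(f"no such agent: {name}")
    if is_alive(chassis, now):
        # A running ovn-controller would most likely just re-add its entry.
        raise AgentStillAlive(name)
    sb.remove(name)
```

Refusing the delete for live agents keeps the API from doing something that ovn-controller would immediately undo, which matches the reasoning above.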
The upstream patches need to be backported to 16.1. In addition to adding support for deleting agents, they also improve performance by minimizing the number of DB writes (which are replicated to each connection on each server).
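The performance point rests on a simple cost model: every DB write is sent to every client connection on every server, so replication traffic grows with the number of writes multiplied by the number of connections. The toy Python model that follows (all names hypothetical) only illustrates that fan-out and one generic way to reduce it, skipping writes that would not change anything. It is not a description of what the upstream patches actually changed.

```python
from typing import Any, Callable, Dict, List

_MISSING = object()  # sentinel so a stored None still counts as "set"


class ReplicatedRow:
    """Toy model of a DB row whose every write is pushed to every connected client."""

    def __init__(self, clients: List[Callable[[str, Any], None]]) -> None:
        self.clients = clients
        self.columns: Dict[str, Any] = {}
        self.updates_sent = 0  # total update messages fanned out to clients

    def write(self, column: str, value: Any) -> None:
        """Store the value and notify every client, even if nothing changed."""
        self.columns[column] = value
        for notify in self.clients:
            notify(column, value)
            self.updates_sent += 1

    def write_if_changed(self, column: str, value: Any) -> bool:
        """Skip no-op writes so that nothing is replicated when nothing changed."""
        if self.columns.get(column, _MISSING) == value:
            return False
        self.write(column, value)
        return True
```

With three connected clients, two identical `write` calls send six updates, while two identical `write_if_changed` calls send only three.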
*** Bug 1946835 has been marked as a duplicate of this bug. ***
*** Bug 1982130 has been marked as a duplicate of this bug. ***
*** Bug 1887866 has been marked as a duplicate of this bug. ***
I don't know why this BZ is still in MODIFIED. It's been released in 16.1.7.
Hi Jakub,

Looks like this was actually released in 16.1.8, judging by the package version in the container catalogue:

https://catalog.redhat.com/software/containers/rhosp-rhel8/openstack-neutron-server-ovn/5de6be20dd19c71643b78104?tag=16.1.7-12.1646286259&push_date=1646661535000
https://catalog.redhat.com/software/containers/rhosp-rhel8/openstack-neutron-server-ovn/5de6be20dd19c71643b78104?tag=16.1.8-7&push_date=1648122338000

Looks like there might have been some confusion with the Fixed In Version:

"Fixed In Version: python-networking-ovn-7.3.1-1.20210714143306.el8ost → python-networking-ovn-7.3.1-1.20210809163307.4e24f4c.el8ost"

We just had some confusion about this with one of the customers and double-checked it. Please let me know if this is correct.
(In reply to ldenny from comment #9)
> Hi Jakub,
>
> Looks like this was actually released in 16.1.8 looking at the package
> version in the container catalogue:
>
> https://catalog.redhat.com/software/containers/rhosp-rhel8/openstack-neutron-server-ovn/5de6be20dd19c71643b78104?tag=16.1.7-12.1646286259&push_date=1646661535000
>
> https://catalog.redhat.com/software/containers/rhosp-rhel8/openstack-neutron-server-ovn/5de6be20dd19c71643b78104?tag=16.1.8-7&push_date=1648122338000
>
> Looks like there might have been some confusion with the fixed in version
>
> "Fixed In Version: python-networking-ovn-7.3.1-1.20210714143306.el8ost →
> python-networking-ovn-7.3.1-1.20210809163307.4e24f4c.el8ost"
>
> We just had some confusion with one of the customers and double checked it,
> please let me know if this is correct.

You're right. Looking at the changelog:

16.1.7: https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=1791046
- Simplify OVN Agent API implementation (rhbz#1788336)
- Avoid race condition when processing RowEvents (rhbz#1788336)

16.1.8: https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=1873883
- Reset "AgentCache" singleton in functional tests (rhbz#1788336)
- Don't update AgentCache when Chassis_Private.chassis == [] (rhbz#1788336)
- Convert OvnDbNotifyHandler rows to frozen rows (rhbz#1788336)
- Add support for deleting ml2/ovn agents (rhbz#1788336)
- Simplify OVN Agent API implementation (rhbz#1788336)
- Avoid race condition when processing RowEvents (rhbz#1788336)

Now, since this was not tested, I wonder if we should target it to z9 to get it properly verified. I'm moving this to ON_QA. Thanks for pointing this out, Lewis!
Awesome Jakub, thanks for the follow up!
According to our records, this should be resolved by python-networking-ovn-7.3.1-1.20220113183502.el8ost. This build is available now.
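To check whether an installed build already contains the fix, its NVR (name-version-release) can be compared against the build named in the previous comment. The Python sketch that follows is a simplified approximation of RPM's version comparison (alphanumeric segments, numeric segments compared as integers, a numeric segment sorting after an alphabetic one, leftover segments winning). It ignores epochs and the `~`/`^` operators, so for authoritative answers use `rpm.labelCompare` from the rpm Python bindings or `rpmdev-vercmp`. The helper names are hypothetical.

```python
import re
from itertools import zip_longest
from typing import List, Tuple


def _segments(part: str) -> List[str]:
    # Split into runs of digits or letters; separators like "." and "_" are dropped.
    return re.findall(r"\d+|[A-Za-z]+", part)


def vercmp(a: str, b: str) -> int:
    """Simplified rpmvercmp: returns 1 if a is newer, -1 if older, 0 if equal."""
    for x, y in zip_longest(_segments(a), _segments(b)):
        if x is None:
            return -1  # b has segments left over, so b is newer
        if y is None:
            return 1  # a has segments left over, so a is newer
        x_num, y_num = x.isdigit(), y.isdigit()
        if x_num and y_num:
            if int(x) != int(y):
                return 1 if int(x) > int(y) else -1
        elif x_num != y_num:
            return 1 if x_num else -1  # numeric segments sort after alphabetic ones
        elif x != y:
            return 1 if x > y else -1
    return 0


def parse_nvr(nvr: str) -> Tuple[str, str, str]:
    """Split "name-version-release"; the name itself may contain dashes."""
    name, version, release = nvr.rsplit("-", 2)
    return name, version, release


def includes_fix(installed: str, fixed: str) -> bool:
    """True if `installed` is the same package at the same or a newer build than `fixed`."""
    i_name, i_ver, i_rel = parse_nvr(installed)
    f_name, f_ver, f_rel = parse_nvr(fixed)
    if i_name != f_name:
        raise ValueError(f"different packages: {i_name} vs {f_name}")
    result = vercmp(i_ver, f_ver) or vercmp(i_rel, f_rel)
    return result >= 0


FIXED_BUILD = "python-networking-ovn-7.3.1-1.20220113183502.el8ost"
```

For example, the earlier "Fixed In Version" build python-networking-ovn-7.3.1-1.20210809163307.4e24f4c.el8ost compares as older than FIXED_BUILD, because its release date stamp is smaller.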
*** Bug 2114723 has been marked as a duplicate of this bug. ***