Bug 1932187 - No notification for port has been unplugged event when VM is turned off
Summary: No notification for port has been unplugged event when VM is turned off
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: OSP DFG:Compute
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-02-24 08:06 UTC by Alex Katz
Modified: 2023-03-21 19:40 UTC
CC List: 13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-28 14:37:11 UTC
Target Upstream Version:
Embargoed:




Links
Red Hat Issue Tracker OSP-4282 (last updated 2022-08-17 15:08:11 UTC)

Description Alex Katz 2021-02-24 08:06:47 UTC
Description of problem:
When I run 'openstack server stop <server_uuid>' I don't see any log entries for the `network-vif-unplugged` event in the server.log file (checked on all the controller nodes). The port remains in the ACTIVE state according to 'openstack port list'.

If I then run 'openstack server start <server_uuid>' I do see log entries for both the `network-vif-unplugged` and `network-vif-plugged` events in the server.log file:

2021-02-22 13:46:14.616 33 DEBUG neutron.notifiers.nova [-] Sending events: [{'server_uuid': '52928e3f-5cd8-4437-b818-89335ec88b58', 'name': 'network-vif-unplugged', 'status': 'completed', 'tag': '522f0812-fa63-47c6-8c85-12585203673f'}] send_events /usr/lib/python3.6/site-packages/neutron/notifiers/nova.py:246
2021-02-22 13:46:15.191 33 INFO neutron.notifiers.nova [-] Nova event response: {'server_uuid': '52928e3f-5cd8-4437-b818-89335ec88b58', 'name': 'network-vif-unplugged', 'status': 'completed', 'tag': '522f0812-fa63-47c6-8c85-12585203673f', 'code': 200}
2021-02-22 13:46:15.973 32 DEBUG neutron.notifiers.nova [-] Sending events: [{'server_uuid': '52928e3f-5cd8-4437-b818-89335ec88b58', 'name': 'network-vif-plugged', 'status': 'completed', 'tag': '522f0812-fa63-47c6-8c85-12585203673f'}] send_events /usr/lib/python3.6/site-packages/neutron/notifiers/nova.py:246
2021-02-22 13:46:17.056 32 INFO neutron.notifiers.nova [-] Nova event response: {'server_uuid': '52928e3f-5cd8-4437-b818-89335ec88b58', 'name': 'network-vif-plugged', 'status': 'completed', 'tag': '522f0812-fa63-47c6-8c85-12585203673f', 'code': 200}
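
For reference, a quick way to confirm this from the API side is to stop the server and read back the port status with openstacksdk; a minimal sketch, where the cloud name "overcloud" and the server name "my-test-vm" are placeholders:

# Minimal reproduction sketch using openstacksdk; "overcloud" and "my-test-vm"
# are placeholder names for the clouds.yaml entry and the instance.
import openstack

conn = openstack.connect(cloud="overcloud")

server = conn.compute.find_server("my-test-vm")
conn.compute.stop_server(server)
conn.compute.wait_for_server(server, status="SHUTOFF")

# Ports bound to the instance; with OVS hybrid plug they stay ACTIVE after the stop.
for port in conn.network.ports(device_id=server.id):
    print(port.id, port.status)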


Version-Release number of selected component (if applicable):
RHOS-16.1-RHEL-8-20210216.n.1


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Rodolfo Alonso 2021-03-02 18:21:17 UTC
Hi Alex:

Just for the record, the environment you lent me uses OVS with hybrid plugging. That means Nova creates a Linux bridge that is connected to the VM on one side and to OVS, via a veth pair, on the other side: https://docs.openstack.org/neutron/pike/contributor/internals/openvswitch_agent.html.

The problem is that when the VM is stopped, the TAP device (light blue in the linked diagram) is deleted, but the OVS port (qvo-xxx) is not. For the OVS agent (and therefore for the Neutron server), this port still exists and is active. When the VM is restarted, the Linux bridge and the veth pair are re-created (deleted and created again). Within a short interval, the OVS agent notifies that the port has been deleted (vif-unplugged) and attached again (vif-plugged).

Of course, this is not the expected behaviour: when the VM is stopped (and the TAP port deleted), the Neutron server should reflect the correct status of this port. But we don't receive any information from Nova, either via an RPC call or, the easiest way, by Nova deleting the Linux bridge and the veth pair. If the veth pair is deleted, the OVS agent will detect this event and inform the Neutron server, which will set the port to DOWN.
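
For anyone checking this on the compute host: the hybrid-plug device names are derived from the first 11 characters of the port UUID (the usual tap/qbr/qvb/qvo prefixes), so a small sketch like the one below shows which pieces survive the stop. The port UUID here is the one from the logs above; the naming convention is an assumption based on the standard Nova/os-vif scheme.

# Sketch: check which hybrid-plug devices still exist on the compute host for a port.
# Assumes the usual naming: "tap"/"qbr"/"qvb"/"qvo" + first 11 chars of the port UUID.
import os

port_id = "522f0812-fa63-47c6-8c85-12585203673f"
prefix = port_id[:11]

for dev in ("tap" + prefix, "qbr" + prefix, "qvb" + prefix, "qvo" + prefix):
    present = os.path.exists("/sys/class/net/" + dev)
    print(dev, "present" if present else "missing")

# After 'openstack server stop' only the tap device should be gone; qbr/qvb/qvo
# remain, which is why the OVS agent still reports the port as ACTIVE.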

You should flip this BZ to Nova.

Regards.

Comment 3 Alex Katz 2021-03-03 10:26:09 UTC
Hi Rodolfo,

Thank you for debugging and the update.

Comment 4 smooney 2021-03-05 15:53:23 UTC
os-vif should already be deleting the Linux bridge and veth pair:
https://github.com/openstack/os-vif/blob/master/vif_plug_ovs/ovs.py#L313-L324
and Nova should be calling unplug as part of server stop.
Looking at power off:

https://github.com/openstack/nova/blob/db666e2118972e501637141e48164a94f9bead54/nova/virt/libvirt/driver.py#L3576-L3580
the libvirt driver just calls destroy; it does not call unplug_vifs, so that is the issue here.
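
Just to illustrate what that would mean (a sketch only, not the actual Nova code or a proposed patch), power_off could reuse the driver's existing _unplug_vifs helper the same way the cleanup path does:

# Illustrative sketch only, not the actual Nova code or a proposed patch.
# power_off today only shuts down / destroys the domain; an unplug step would
# look roughly like this, reusing the driver's existing _unplug_vifs helper.
def power_off(self, instance, timeout=0, retry_interval=0):
    """Power off the specified instance."""
    if timeout:
        self._clean_shutdown(instance, timeout, retry_interval)
    self._destroy(instance)
    # Hypothetical extra step: tear down the host-side plumbing (qbr bridge,
    # veth pair) so the OVS agent sees the port disappear and Neutron sets it DOWN.
    network_info = instance.get_network_info()
    self._unplug_vifs(instance, network_info, ignore_errors=True)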

Comment 6 smooney 2021-03-05 19:58:08 UTC
By the way, just to be clear, there is no API contract that says that if you stop a VM its port status will change.
That is not a valid expectation to have.
For SR-IOV, for example, it won't change, and it never has for OVS with iptables (hybrid plug).
It may have changed for OVN, because libvirt was doing a port delete on domain destroy, but that is not part of the Nova or Neutron API
guarantees, and it is something that is up to each virt driver to implement differently.

For example, I would expect that Ironic will not do anything to update the status of the Neutron port when the server is powered off.
So at the API level you should not be able to observe this behavior in any way that lets you make decisions based on it; it is not part
of the public API contract.

Comment 11 smooney 2021-03-09 12:45:58 UTC
os-vif uses --may-exist when adding the port, so that it won't fail if the port is already there (e.g. when you restart nova-compute) and won't break network connectivity for the guest.

libvirt, on the other hand, does an 'ovs-vsctl --if-exists del-port' followed by an add-port on domain create. Normally this would cause the OVS port to be deleted and recreated, but
since we destroy the domain on power off, libvirt has already deleted the port by then, so when the domain powers on it just creates a new port, since there is none to delete.
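
Spelled out, the two idempotent command styles look like the following (illustration only; the bridge and interface names are examples, and neither os-vif nor libvirt literally runs a script like this):

# Illustration of the two idempotent ovs-vsctl styles described above.
import subprocess

port_name = "qvo522f0812-fa"  # example device name; the exact device differs by plugging mode

# os-vif style: a no-op if the port already exists, so a nova-compute restart
# does not disturb existing ports.
subprocess.run(["ovs-vsctl", "--may-exist", "add-port", "br-int", port_name], check=True)

# libvirt style on domain create: remove the port if present, then add it back.
subprocess.run(["ovs-vsctl", "--if-exists", "del-port", "br-int", port_name], check=True)
subprocess.run(["ovs-vsctl", "add-port", "br-int", port_name], check=True)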

In the libvirt driver, power_on is implemented by calling hard_reboot (start and stop were technically optional originally, but reboot was required, so it was added first).
hard_reboot calls destroy: https://github.com/openstack/nova/blob/31889ce296d1e1a62fe5825292479009118ddfab/nova/virt/libvirt/driver.py#L3454-L3455
which eventually calls _unplug_vifs: https://github.com/openstack/nova/blob/31889ce296d1e1a62fe5825292479009118ddfab/nova/virt/libvirt/driver.py#L1496-L1497

which, for iptables (hybrid plug), will call _unplug_bridge: https://github.com/openstack/os-vif/blob/master/vif_plug_ovs/ovs.py#L313-L324
This deletes the Linux bridge and the veth pair.

It then calls _create_guest_with_network: https://github.com/openstack/nova/blob/31889ce296d1e1a62fe5825292479009118ddfab/nova/virt/libvirt/driver.py#L3498-L3500
which plugs the VIFs again using os-vif: https://github.com/openstack/os-vif/blob/master/vif_plug_ovs/ovs.py#L203-L225, recreating the bridge and the veth pair.
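
So, as a compressed toy model of the sequence above (the function names only mirror the ones linked; this is not Nova code):

# Toy model of the power-on call chain described above; not the actual Nova code.
def _unplug_vifs(instance):
    print(instance, "-> qbr bridge and veth pair deleted; OVS agent reports vif-unplugged")

def destroy(instance):
    print(instance, "-> libvirt domain destroyed")
    _unplug_vifs(instance)  # reached via the driver's cleanup path

def _create_guest_with_network(instance):
    print(instance, "-> VIFs re-plugged via os-vif, bridge and veth pair recreated; OVS agent reports vif-plugged")

def _hard_reboot(instance):
    destroy(instance)
    _create_guest_with_network(instance)

def power_on(instance):
    _hard_reboot(instance)  # power_on is just a hard reboot

power_on("instance-00000001")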

This is the correct behavior for hard reboot, and power on is just a hard reboot. Technically we could skip the call to destroy for power on, but this has been the behaviour
Nova has had since before Neutron was a project, so there has been no regression here.

I would suggest deleting the Tobiko test, as this API is an internal API intended only for inter-service synchronization.
We even have a warning to that effect: https://docs.openstack.org/api-ref/compute/#create-external-events-os-server-external-events

I'll add this to the PTG agenda for discussion. I think in general it would be valid to unplug VIFs in power_off, but since this is virt-driver- and ML2-driver-specific behavior,
we should not be asserting this behavior outside of nova/neutron.

Comment 14 smooney 2021-07-28 14:37:11 UTC
We discussed this at the PTG and decided that this is not a bug and it should not be fixed:
https://etherpad.opendev.org/p/r.0a1509fc788d92391f50397f2ee4af9f (line 312)


(sean-k-mooney): should we unplug VIFs in power_off?

    (gibi): ping ralonsoh

    currently in the libvirt driver, power_on is implemented by calling hard_reboot

    power_off undefines the libvirt domain but does not call unplug_vifs

    hard reboot will both destroy the domain and clean up the network interfaces.

    the current power_off behavior results in ports being left configured on OVS while the VM is off, and then deleting and recreating them on power on.

    Nova has done this since before Neutron was a project, so it is expected, but should we tear down the backend networking config when we power off?

    also, should we do the same for host-mounted Cinder volumes? I assume they are unmounted, but the attachments are not unbound.

    I don't believe new calls to Neutron or Cinder to unbind the port binding or volume attachments would be correct, but we might want to remove the configuration from the host.

    context for this is https://bugzilla.redhat.com/show_bug.cgi?id=1932187 ; Tobiko is a Tempest alternative that should not be asserting this behavior in tests, since it is not part of the API contract

    and it changes based on both the Neutron ML2 driver and the Nova virt driver, so it is not generic. But it raises the question: should we keep the logic we have always had, or should we unplug in power off?

    AGREED:

    (sean-k-mooney): Not a bug / won't fix

Comment 15 smooney 2021-07-28 14:38:34 UTC
*** Bug 1983937 has been marked as a duplicate of this bug. ***

