Bug 2059189

Summary: [OSP16.2] Neutron with ML2/OVS - Can't do live-migration. RARP packets dropped
Product: Red Hat OpenStack
Reporter: ggrimaux
Component: openstack-neutron
Assignee: Rodolfo Alonso <ralonsoh>
Status: CLOSED NOTABUG
QA Contact: Eran Kuris <ekuris>
Severity: high
Priority: high
Version: 16.2 (Train)
CC: ccamposr, chrisw, ralonsoh, scohen
Keywords: Triaged
Hardware: Unspecified
OS: Unspecified
Last Closed: 2023-06-29 08:54:21 UTC
Type: Bug

Description ggrimaux 2022-02-28 13:05:37 UTC
Description of problem:
Since the fast forward upgrade (FFU) from OSP 13 to 16.2.1, live migration is not working.

After a live migration, the RARP packets sent by the instance are dropped and ping packets are lost.

The issue is similar to:
https://bugzilla.redhat.com/show_bug.cgi?id=2033156

Here is the debugging done so far.
tcpdump on the VM's qvo interface on the destination compute:

while true; do tcpdump -nne -i qvo02b333cd-9f ; done

14:19:03.164647 fa:16:3e:21:f9:82 > ff:ff:ff:ff:ff:ff, ethertype Reverse ARP (0x8035), length 60: Reverse Request who-is fa:16:3e:21:f9:82 tell fa:16:3e:21:f9:82, length 46
14:19:03.166141 fa:16:3e:21:f9:82 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.63.26.172 tell 10.63.26.172, length 28
14:19:03.166269 fa:16:3e:21:f9:82 > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), length 86: fe80::f816:3eff:fe21:f982 > ff02::1: ICMP6, neighbor advertisement, tgt is fe80::f816:3eff:fe21:f982, length 32
14:19:03.166352 fa:16:3e:21:f9:82 > 33:33:00:00:00:16, ethertype IPv6 (0x86dd), length 90: fe80::f816:3eff:fe21:f982 > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
14:19:03.203730 fa:16:3e:21:f9:82 > ff:ff:ff:ff:ff:ff, ethertype Reverse ARP (0x8035), length 60: Reverse Request who-is fa:16:3e:21:f9:82 tell fa:16:3e:21:f9:82, length 46
14:19:03.216256 fa:16:3e:21:f9:82 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.63.26.172 tell 10.63.26.172, length 28
14:19:03.216280 fa:16:3e:21:f9:82 > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), length 86: fe80::f816:3eff:fe21:f982 > ff02::1: ICMP6, neighbor advertisement, tgt is fe80::f816:3eff:fe21:f982, length 32
14:19:03.216281 fa:16:3e:21:f9:82 > 33:33:00:00:00:16, ethertype IPv6 (0x86dd), length 90: fe80::f816:3eff:fe21:f982 > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
14:19:03.353722 fa:16:3e:21:f9:82 > ff:ff:ff:ff:ff:ff, ethertype Reverse ARP (0x8035), length 60: Reverse Request who-is fa:16:3e:21:f9:82 tell fa:16:3e:21:f9:82, length 46
14:19:03.366310 fa:16:3e:21:f9:82 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.63.26.172 tell 10.63.26.172, length 28
14:19:03.366314 fa:16:3e:21:f9:82 > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), length 86: fe80::f816:3eff:fe21:f982 > ff02::1: ICMP6, neighbor advertisement, tgt is fe80::f816:3eff:fe21:f982, length 32
14:19:03.366315 fa:16:3e:21:f9:82 > 33:33:00:00:00:16, ethertype IPv6 (0x86dd), length 90: fe80::f816:3eff:fe21:f982 > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
14:19:03.603707 fa:16:3e:21:f9:82 > ff:ff:ff:ff:ff:ff, ethertype Reverse ARP (0x8035), length 60: Reverse Request who-is fa:16:3e:21:f9:82 tell fa:16:3e:21:f9:82, length 46
14:19:03.616351 fa:16:3e:21:f9:82 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.63.26.172 tell 10.63.26.172, length 28
14:19:03.616357 fa:16:3e:21:f9:82 > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), length 86: fe80::f816:3eff:fe21:f982 > ff02::1: ICMP6, neighbor advertisement, tgt is fe80::f816:3eff:fe21:f982, length 32
14:19:03.616358 fa:16:3e:21:f9:82 > 33:33:00:00:00:16, ethertype IPv6 (0x86dd), length 90: fe80::f816:3eff:fe21:f982 > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
14:19:03.953724 fa:16:3e:21:f9:82 > ff:ff:ff:ff:ff:ff, ethertype Reverse ARP (0x8035), length 60: Reverse Request who-is fa:16:3e:21:f9:82 tell fa:16:3e:21:f9:82, length 46
14:19:03.966218 fa:16:3e:21:f9:82 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.63.26.172 tell 10.63.26.172, length 28
14:19:03.966223 fa:16:3e:21:f9:82 > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), length 86: fe80::f816:3eff:fe21:f982 > ff02::1: ICMP6, neighbor advertisement, tgt is fe80::f816:3eff:fe21:f982, length 32
14:19:03.966224 fa:16:3e:21:f9:82 > 33:33:00:00:00:16, ethertype IPv6 (0x86dd), length 90: fe80::f816:3eff:fe21:f982 > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
14:19:05.986309 00:22:bd:f8:19:ff > fa:16:3e:21:f9:82, ethertype IPv4 (0x0800), length 98: 10.99.12.194 > 10.63.26.172: ICMP echo request, id 48904, seq 1, length 64
14:19:05.986506 fa:16:3e:21:f9:82 > 00:22:bd:f8:19:ff, ethertype IPv4 (0x0800), length 98: 10.63.26.172 > 10.99.12.194: ICMP echo reply, id 48904, seq 1, length 64
14:19:07.007854 00:22:bd:f8:19:ff > fa:16:3e:21:f9:82, ethertype IPv4 (0x0800), length 98: 10.99.12.194 > 10.63.26.172: ICMP echo request, id 48904, seq 2, length 64
14:19:07.008058 fa:16:3e:21:f9:82 > 00:22:bd:f8:19:ff, ethertype IPv4 (0x0800), length 98: 10.63.26.172 > 10.99.12.194: ICMP echo reply, id 48904, seq 2, length 64
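
The capture points to a timing problem: all five RARP frames (14:19:03.164647 to 14:19:03.953724) leave the VM before 14:19:04.529480, the first ofproto/trace run shown below in which br-link stops dropping the broadcast. A minimal Python sketch that checks this from the tcpdump text, assuming the `tcpdump -nne` line format shown above; the helper names are hypothetical:

```python
import re
from datetime import datetime

# tcpdump -nne prints "<HH:MM:SS.usec> <src> > <dst>, ethertype <name> (0x<code>), ..."
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{6}) \S+ > \S+, ethertype .*?\((0x[0-9a-f]{4})\)")
ETH_RARP = 0x8035
ETH_IPV4 = 0x0800


def parse_time(text):
    """Parse a time of day such as '14:19:03.953724' (no date; midnight wrap not handled)."""
    return datetime.strptime(text, "%H:%M:%S.%f")


def parse(lines):
    """Yield (time, ethertype, line) for each line in tcpdump -nne format."""
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            yield parse_time(m.group(1)), int(m.group(2), 16), line


def summarize(lines, flows_ready):
    """Count RARPs, how many precede flows_ready, and the gap to the first echo request."""
    rarps, first_echo = [], None
    for t, ethertype, line in parse(lines):
        if ethertype == ETH_RARP:
            rarps.append(t)
        elif ethertype == ETH_IPV4 and "echo request" in line and first_echo is None:
            first_echo = t
    ready = parse_time(flows_ready)
    return {
        "rarps": len(rarps),
        "rarps_before_flows": sum(1 for t in rarps if t < ready),
        "gap_s": (first_echo - rarps[-1]).total_seconds() if rarps and first_echo else None,
    }


# Trimmed copies of lines captured on qvo02b333cd-9f above.
CAPTURE = [
    "14:19:03.164647 fa:16:3e:21:f9:82 > ff:ff:ff:ff:ff:ff, ethertype Reverse ARP (0x8035), length 60",
    "14:19:03.203730 fa:16:3e:21:f9:82 > ff:ff:ff:ff:ff:ff, ethertype Reverse ARP (0x8035), length 60",
    "14:19:03.353722 fa:16:3e:21:f9:82 > ff:ff:ff:ff:ff:ff, ethertype Reverse ARP (0x8035), length 60",
    "14:19:03.603707 fa:16:3e:21:f9:82 > ff:ff:ff:ff:ff:ff, ethertype Reverse ARP (0x8035), length 60",
    "14:19:03.953724 fa:16:3e:21:f9:82 > ff:ff:ff:ff:ff:ff, ethertype Reverse ARP (0x8035), length 60",
    "14:19:05.986309 00:22:bd:f8:19:ff > fa:16:3e:21:f9:82, ethertype IPv4 (0x0800), length 98: "
    "10.99.12.194 > 10.63.26.172: ICMP echo request, id 48904, seq 1, length 64",
]

# 14:19:04.529480: first trace shown in which br-link forwards instead of dropping.
print(summarize(CAPTURE, "14:19:04.529480"))
```

With these lines it should report 5 RARPs, all 5 sent before that trace, and roughly 2.03 s between the last RARP and the first echo request that reaches the VM. For a full capture, pass the lines of a saved tcpdump output instead of CAPTURE.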

ofproto/trace loop to see when the OpenFlow rules got installed:

while true; do date +'%T.%6N'; ovs-appctl ofproto/trace br-int in_port=qvo02b333cd-9f,dl_src=fa:16:3e:21:f9:82,dl_dst=ff:ff:ff:ff:ff:ff; done

14:19:04.524891
Flow: in_port=16,vlan_tci=0x0000,dl_src=fa:16:3e:21:f9:82,dl_dst=ff:ff:ff:ff:ff:ff,dl_type=0x0000

bridge("br-int")
----------------
 0. priority 0, cookie 0x48fe92598072c04b
    goto_table:60
60. priority 3, cookie 0x48fe92598072c04b
    NORMAL
     -> no learned MAC for destination, flooding

    bridge("br-link")
    -----------------
         0. in_port=2, priority 2, cookie 0x939396ea29341af8
            drop

bridge("br-ex")
---------------
 0. in_port=1, priority 2, cookie 0xebac2dc27a47b9d8
    drop

Final flow: unchanged
Megaflow: recirc_id=0,eth,in_port=16,dl_src=fa:16:3e:21:f9:82,dl_dst=ff:ff:ff:ff:ff:ff,dl_type=0x0000
Datapath actions: 2
---
14:19:04.529480
Flow: in_port=16,vlan_tci=0x0000,dl_src=fa:16:3e:21:f9:82,dl_dst=ff:ff:ff:ff:ff:ff,dl_type=0x0000

bridge("br-int")
----------------
 0. priority 0, cookie 0x48fe92598072c04b
    goto_table:60
60. priority 3, cookie 0x48fe92598072c04b
    NORMAL
     -> no learned MAC for destination, flooding

    bridge("br-link")
    -----------------
         0. in_port=2,dl_vlan=10, priority 4, cookie 0x939396ea29341af8
            set_field:4566->vlan_vid
            NORMAL
             -> no learned MAC for destination, flooding

bridge("br-ex")
---------------
 0. in_port=1, priority 2, cookie 0xebac2dc27a47b9d8
    drop

Final flow: unchanged
Megaflow: recirc_id=0,eth,in_port=16,dl_src=fa:16:3e:21:f9:82,dl_dst=ff:ff:ff:ff:ff:ff,dl_type=0x0000
Datapath actions: push_vlan(vid=10,pcp=0),2,pop_vlan,push_vlan(vid=470,pcp=0),4,3
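
In the second trace br-link matches local VLAN 10 (the local VLAN the agent assigns to net-id 149ff7ec-140a-4e8b-8de6-12c954c34ec2 in the log below) and rewrites vlan_vid to 4566. Assuming the usual OpenFlow 1.3 encoding, in which a vlan_vid value carries the OFPVID_PRESENT bit (0x1000), 4566 = 0x1000 + 470, i.e. the port's segmentation_id, which is also what the datapath actions push. A small decoder for checking such traces (helper names are hypothetical):

```python
import re

OFPVID_PRESENT = 0x1000  # OpenFlow 1.3+ flag meaning "an 802.1Q tag is present"
VID_MASK = 0x0FFF        # the 12-bit VLAN ID itself


def decode_vlan_vid(value):
    """Split an OpenFlow vlan_vid field value into (tag_present, vlan_id)."""
    return bool(value & OFPVID_PRESENT), value & VID_MASK


def pushed_vlans(datapath_actions):
    """List the VLAN IDs pushed, in order, in an ofproto/trace 'Datapath actions:' value."""
    return [int(vid) for vid in re.findall(r"push_vlan\(vid=(\d+)", datapath_actions)]


# Values copied from the two traces above.
print(decode_vlan_vid(4566))  # (True, 470): set_field:4566->vlan_vid on br-link
print(pushed_vlans("2"))      # []: 14:19:04.524891, br-link still drops the frame
print(pushed_vlans("push_vlan(vid=10,pcp=0),2,pop_vlan,push_vlan(vid=470,pcp=0),4,3"))  # [10, 470]
```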

Logs from openvswitch-agent.log for the relevant time window:

/var/log/containers/neutron/openvswitch-agent.log
2022-02-23 14:19:04.494 80730 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-2a83d008-fb7a-4ebb-9b74-6b215261e63f - - - - -] Agent rpc_loop - iteration:40540 started
2022-02-23 14:19:04.495 80730 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-2a83d008-fb7a-4ebb-9b74-6b215261e63f - - - - -] Agent rpc_loop - iteration:40540 - starting polling. Elapsed:0.002
2022-02-23 14:19:04.496 80730 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-2a83d008-fb7a-4ebb-9b74-6b215261e63f - - - - -] Agent rpc_loop - iteration:40540 - port information retrieved. Elapsed:0.003
2022-02-23 14:19:04.497 80730 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-2a83d008-fb7a-4ebb-9b74-6b215261e63f - - - - -] Port 02b333cd-9f09-4735-a95b-4a6071b67c19 updated. Details: {'device': '02b333cd-9f09-4735-a95b-4a6071b67c19', 'device_id': 'acb27759-8e76-4a89-849c-0517295e0a6e', 'network_id': '149ff7ec-140a-4e8b-8de6-12c954c34ec2', 'port_id': '02b333cd-9f09-4735-a95b-4a6071b67c19', 'mac_address': 'fa:16:3e:21:f9:82', 'admin_state_up': True, 'network_type': 'vlan', 'segmentation_id': 470, 'physical_network': 'datacentre', 'fixed_ips': [{'subnet_id': 'bc131b06-586f-497e-a806-e86ee8bb3435', 'ip_address': '10.63.26.172'}], 'device_owner': 'compute:staging', 'allowed_address_pairs': [], 'port_security_enabled': False, 'qos_policy_id': None, 'network_qos_policy_id': None, 'profile': {}, 'vif_type': 'ovs', 'vnic_type': 'normal', 'security_groups': [], 'migrating_to': None}
2022-02-23 14:19:04.497 80730 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-2a83d008-fb7a-4ebb-9b74-6b215261e63f - - - - -] Assigning 10 as local vlan for net-id=149ff7ec-140a-4e8b-8de6-12c954c34ec2
2022-02-23 14:19:04.502 80730 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-2a83d008-fb7a-4ebb-9b74-6b215261e63f - - - - -] process_network_ports - iteration:40540 - treat_devices_added_or_updated completed. Skipped 0 and no activated binding devices 0 of 1 devices currently available. Time elapsed: 0.006
2022-02-23 14:19:04.504 80730 INFO neutron.agent.securitygroups_rpc [req-2a83d008-fb7a-4ebb-9b74-6b215261e63f - - - - -] Preparing filters for devices {'02b333cd-9f09-4735-a95b-4a6071b67c19'}
2022-02-23 14:19:04.524 80730 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-2a83d008-fb7a-4ebb-9b74-6b215261e63f - - - - -] process_network_ports - iteration:40540 - agent port security group processed in 0.028
2022-02-23 14:19:04.525 80730 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-2a83d008-fb7a-4ebb-9b74-6b215261e63f - - - - -] Skipping ARP spoofing rules for port 'qvo02b333cd-9f' because it has port security disabled
2022-02-23 14:19:04.679 80730 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-2a83d008-fb7a-4ebb-9b74-6b215261e63f - - - - -] Configuration for devices up ['02b333cd-9f09-4735-a95b-4a6071b67c19'] and devices down [] completed.
2022-02-23 14:19:04.679 80730 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-2a83d008-fb7a-4ebb-9b74-6b215261e63f - - - - -] Agent rpc_loop - iteration:40540 - ports processed. Elapsed:0.186
2022-02-23 14:19:04.679 80730 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-2a83d008-fb7a-4ebb-9b74-6b215261e63f - - - - -] Agent rpc_loop - iteration:40540 completed. Processed ports statistics: {'regular': {'added': 1, 'updated': 0, 'removed': 0}}. Elapsed:0.186
2022-02-23 14:19:06.494 80730 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-2a83d008-fb7a-4ebb-9b74-6b215261e63f - - - - -] Agent rpc_loop - iteration:40541 started
2022-02-23 14:19:06.495 80730 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-2a83d008-fb7a-4ebb-9b74-6b215261e63f - - - - -] Agent rpc_loop - iteration:40541 - starting polling. Elapsed:0.001
2022-02-23 14:19:06.496 80730 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-2a83d008-fb7a-4ebb-9b74-6b215261e63f - - - - -] Agent rpc_loop - iteration:40541 - port information retrieved. Elapsed:0.002
2022-02-23 14:19:06.497 80730 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-2a83d008-fb7a-4ebb-9b74-6b215261e63f - - - - -] Port 02b333cd-9f09-4735-a95b-4a6071b67c19 updated. Details: {'device': '02b333cd-9f09-4735-a95b-4a6071b67c19', 'device_id': 'acb27759-8e76-4a89-849c-0517295e0a6e', 'network_id': '149ff7ec-140a-4e8b-8de6-12c954c34ec2', 'port_id': '02b333cd-9f09-4735-a95b-4a6071b67c19', 'mac_address': 'fa:16:3e:21:f9:82', 'admin_state_up': True, 'network_type': 'vlan', 'segmentation_id': 470, 'physical_network': 'datacentre', 'fixed_ips': [{'subnet_id': 'bc131b06-586f-497e-a806-e86ee8bb3435', 'ip_address': '10.63.26.172'}], 'device_owner': 'compute:staging', 'allowed_address_pairs': [], 'port_security_enabled': False, 'qos_policy_id': None, 'network_qos_policy_id': None, 'profile': {}, 'vif_type': 'ovs', 'vnic_type': 'normal', 'security_groups': [], 'migrating_to': None}
2022-02-23 14:19:06.499 80730 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-2a83d008-fb7a-4ebb-9b74-6b215261e63f - - - - -] process_network_ports - iteration:40541 - treat_devices_added_or_updated completed. Skipped 0 and no activated binding devices 0 of 1 devices currently available. Time elapsed: 0.003
2022-02-23 14:19:06.499 80730 INFO neutron.agent.securitygroups_rpc [req-2a83d008-fb7a-4ebb-9b74-6b215261e63f - - - - -] Refresh firewall rules
2022-02-23 14:19:06.511 80730 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-2a83d008-fb7a-4ebb-9b74-6b215261e63f - - - - -] process_network_ports - iteration:40541 - agent port security group processed in 0.015
2022-02-23 14:19:06.512 80730 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-2a83d008-fb7a-4ebb-9b74-6b215261e63f - - - - -] Skipping ARP spoofing rules for port 'qvo02b333cd-9f' because it has port security disabled
2022-02-23 14:19:06.617 80730 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-2a83d008-fb7a-4ebb-9b74-6b215261e63f - - - - -] Configuration for devices up ['02b333cd-9f09-4735-a95b-4a6071b67c19'] and devices down [] completed.
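
The agent log can be lined up with the capture. A minimal sketch, assuming the capture and the log share the destination compute's clock and that the capture was taken on 2022-02-23 (`tcpdump -nne` prints no date). It extracts the records that mention the port and measures how long after the last RARP (14:19:03.953724) the agent first reported the device as up; the helper names are hypothetical:

```python
import re
from datetime import datetime

# "<date> <time.ms> <pid> <LEVEL> <logger> [<request context>] <message>"
LOG_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \d+ (\w+) (\S+) \[[^\]]*\] (.*)$"
)


def port_events(lines, port_id):
    """Return [(timestamp, message)] for log records whose message mentions port_id."""
    events = []
    for line in lines:
        m = LOG_RE.match(line)
        if m and port_id in m.group(4):
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
            events.append((ts, m.group(4)))
    return events


def seconds_until_wired(lines, port_id, last_rarp):
    """Seconds from the last RARP to the first 'Configuration for devices up' for the port."""
    for ts, message in port_events(lines, port_id):
        if message.startswith("Configuration for devices up"):
            return (ts - last_rarp).total_seconds()
    return None


PORT = "02b333cd-9f09-4735-a95b-4a6071b67c19"
CTX = ("80730 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent "
       "[req-2a83d008-fb7a-4ebb-9b74-6b215261e63f - - - - -]")
LOG = [
    f"2022-02-23 14:19:04.497 {CTX} Port {PORT} updated.",  # Details dict trimmed
    f"2022-02-23 14:19:04.679 {CTX} Configuration for devices up ['{PORT}'] and devices down [] completed.",
    f"2022-02-23 14:19:06.617 {CTX} Configuration for devices up ['{PORT}'] and devices down [] completed.",
]
LAST_RARP = datetime(2022, 2, 23, 14, 19, 3, 953724)  # assumed date, see above

print(seconds_until_wired(LOG, PORT, LAST_RARP))
```

On these records the result is about 0.73 s: the VM had already stopped announcing itself before the agent even started iteration 40540 (14:19:04.494).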


In our case the port has ovs_hybrid_plug set to True. Output for a port on the same network (149ff7ec-140a-4e8b-8de6-12c954c34ec2):
(overcloud) [stack@director ~]$ openstack port show 1f89def1-9942-4327-8348-f90d8fa6268a
+-------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+
| Field                   | Value                                                                                                                                            |
+-------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+
| admin_state_up          | UP                                                                                                                                               |
| allowed_address_pairs   |                                                                                                                                                  |
| binding_host_id         | ...                                                                                                                  |
| binding_profile         |                                                                                                                                                  |
| binding_vif_details     | bridge_name='br-int', connectivity='l2', datapath_type='system', ovs_hybrid_plug='True', port_filter='True'                                      |
| binding_vif_type        | ovs                                                                                                                                              |
| binding_vnic_type       | normal                                                                                                                                           |
| created_at              | 2022-02-25T08:43:55Z                                                                                                                             |
| data_plane_status       | None                                                                                                                                             |
| description             |                                                                                                                                                  |
| device_id               | 7bba5fdd-0cba-4373-b3da-39ceac0780f9                                                                                                             |
| device_owner            | compute:staging                                                                                                                                  |
| dns_assignment          | None                                                                                                                                             |
| dns_domain              | None                                                                                                                                             |
| dns_name                | None                                                                                                                                             |
| extra_dhcp_opts         |                                                                                                                                                  |
| fixed_ips               | ip_address='10.63.26.234', subnet_id='bc131b06-586f-497e-a806-e86ee8bb3435'                                                                      |
| id                      | 1f89def1-9942-4327-8348-f90d8fa6268a                                                                                                             |
| location                | cloud='', project.domain_id=, project.domain_name=, project.id='b6684a1f0b1044eab46f45361c94f312', project.name=, region_name='regionOne', zone= |
| mac_address             | fa:16:3e:da:e8:ea                                                                                                                                |
| name                    |                                                                                                                                                  |
| network_id              | 149ff7ec-140a-4e8b-8de6-12c954c34ec2                                                                                                             |
| port_security_enabled   | False                                                                                                                                            |
| project_id              | b6684a1f0b1044eab46f45361c94f312                                                                                                                 |
| propagate_uplink_status | None                                                                                                                                             |
| qos_policy_id           | None                                                                                                                                             |
| resource_request        | None                                                                                                                                             |
| revision_number         | 4                                                                                                                                                |
| security_group_ids      |                                                                                                                                                  |
| status                  | ACTIVE                                                                                                                                           |
| tags                    |                                                                                                                                                  |
| trunk_details           | None                                                                                                                                             |
| updated_at              | 2022-02-25T08:43:58Z                                                                                                                             |
+-------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+
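
With ovs_hybrid_plug, the VM's tap device is attached to a per-port Linux bridge (qbr) that reaches br-int through a veth pair (qvb/qvo), which is why the capture above runs on a qvo device. The sketch below parses the binding_vif_details column and guesses the device names for a port. The naming rule (3-letter prefix plus the first 11 characters of the port ID) is an assumption inferred from qvo02b333cd-9f above, and the helpers are hypothetical:

```python
import re

# Matches key='value' pairs as printed in the binding_vif_details column.
PAIR_RE = re.compile(r"(\w+)='([^']*)'")

# Linux interface names are limited to 15 characters; the names seen in this
# report (e.g. qvo02b333cd-9f) are a 3-letter prefix plus 11 characters of the port ID.
PORT_ID_CHARS = 11


def parse_vif_details(text):
    """Turn "a='x', b='y'" into {'a': 'x', 'b': 'y'}."""
    return dict(PAIR_RE.findall(text))


def hybrid_plug_devices(port_id):
    """Hypothetical helper: guess the tap/qbr/qvb/qvo names used with ovs_hybrid_plug."""
    suffix = port_id[:PORT_ID_CHARS]
    return {prefix: prefix + suffix for prefix in ("tap", "qbr", "qvb", "qvo")}


details = parse_vif_details(
    "bridge_name='br-int', connectivity='l2', datapath_type='system', "
    "ovs_hybrid_plug='True', port_filter='True'"
)
if details.get("ovs_hybrid_plug") == "True":
    print(hybrid_plug_devices("1f89def1-9942-4327-8348-f90d8fa6268a"))
```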

This was working fine before the FFU (13 to 16.2.1).

We have a sosreport from the destination compute.

I am working on this with a Red Hat consultant who can get you anything needed to troubleshoot this.

We need your help to understand why it fails here.


Version-Release number of selected component (if applicable):
OSP16.2.1
With ML2/OVS

How reproducible:
100% on this environment.

Steps to Reproduce:
1. Try a live migration.

Actual results:
Live-migration fails.

Expected results:
Live-migration works.

Additional info:
sosreport from the destination compute.

Comment 5 ggrimaux 2023-06-29 08:50:09 UTC
Enabling the following solved the issue:
  ComputeExtraConfig:
    neutron::config::server_config:
      nova/live_migration_events:
        value: True
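
The fix is expressed as TripleO hieradata. Assuming the puppet-neutron convention in which each `neutron::config::server_config` key is written as `section/option`, the snippet above should end up in neutron.conf as `live_migration_events = True` under `[nova]`. The sketch below only illustrates that key-to-INI mapping with the standard library's configparser; it is not how puppet applies the setting:

```python
import configparser
import io


def render_server_config(server_config):
    """Render {'section/option': {'value': v}} hieradata as neutron.conf-style INI text."""
    parser = configparser.ConfigParser()
    for key, spec in server_config.items():
        section, option = key.split("/", 1)  # 'nova/live_migration_events' -> ('nova', 'live_migration_events')
        # DEFAULT always exists in ConfigParser and cannot be added explicitly.
        if section != parser.default_section and not parser.has_section(section):
            parser.add_section(section)
        parser.set(section, option, str(spec["value"]))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


print(render_server_config({"nova/live_migration_events": {"value": True}}))
```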


You may close this BZ!

Thank you!