Bug 1966157
| Field | Value |
|---|---|
| Summary | NeutronOVSFirewallDriver: openvswitch - ovn wrong openflow programming when vms >1 exist on the same host |
| Product | Red Hat OpenStack |
| Component | openvswitch |
| Version | 16.2 (Train) |
| Status | CLOSED EOL |
| Severity | high |
| Priority | medium |
| Reporter | Yariv <yrachman> |
| Assignee | Haresh Khandelwal <hakhande> |
| QA Contact | Miguel Angel Nieto <mnietoji> |
| CC | aconole, apevec, astupnik, atzin, chrisw, cmilleta, ctrautma, egarciar, ekuris, eolivare, ffernand, ggrimaux, hakhande, jiji, jpretori, lariel, lhh, lsvaty, majopela, maord, mburns, mgeary, mleitner, mmichels, mnietoji, nusiddiq, oblaut, ralonsoh, rsafrono, scohen, supadhya |
| Keywords | AutomationBlocker, TestOnly, Triaged |
| Target Milestone | Alpha |
| Target Release | 17.0 |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | Known Issue |
| Doc Text | There is a limitation when using ML2/OVN with `provider:network_type geneve` with a Mellanox adapter on a Compute node that has more than one instance on the geneve network. The floating IP of only one of the instances will be reachable. You can track the progress of the resolution on this Bugzilla ticket. |
| Clones | 2014183 (view as bug list) |
| Last Closed | 2023-10-10 12:21:23 UTC |
| Type | Bug |
Description
Yariv
2021-05-31 14:30:04 UTC
BZ opened due to a regression for hw-offload on RHEL 8.4. This test is passing with the OVS backend for 16.2.

Another interesting issue:

(overcloud) [stack@undercloud-0 ~]$ ping 10.35.141.162
PING 10.35.141.162 (10.35.141.162) 56(84) bytes of data.
64 bytes from 10.35.141.162: icmp_seq=2 ttl=61 time=26.9 ms
^C
--- 10.35.141.162 ping statistics ---
2 packets transmitted, 1 received, 50% packet loss, time 1012ms
rtt min/avg/max/mdev = 26.877/26.877/26.877/0.000 ms
(overcloud) [stack@undercloud-0 ~]$ ping 10.35.141.163
PING 10.35.141.163 (10.35.141.163) 56(84) bytes of data.
^C
--- 10.35.141.163 ping statistics ---
8 packets transmitted, 0 received, 100% packet loss, time 7158ms

After waiting a few minutes:

(overcloud) [stack@undercloud-0 ~]$ ping 10.35.141.163
PING 10.35.141.163 (10.35.141.163) 56(84) bytes of data.
64 bytes from 10.35.141.163: icmp_seq=1 ttl=61 time=26.10 ms
64 bytes from 10.35.141.163: icmp_seq=2 ttl=61 time=0.619 ms
^C
--- 10.35.141.163 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.619/13.800/26.981/13.181 ms
(overcloud) [stack@undercloud-0 ~]$ ping 10.35.141.162
PING 10.35.141.162 (10.35.141.162) 56(84) bytes of data.
^C
--- 10.35.141.162 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1061ms

See the description above; sos report:
rhos-release.virt.bos.redhat.com:/var/www/html/log/ovn-hw-fip/
Adding the commands here:
[root@computeovshwoffload-0 ~]# ovs-vsctl show
edb712f9-21dd-4816-b0ab-e185994d2312
Manager "ptcp:6640:127.0.0.1"
is_connected: true
Bridge br-int
fail_mode: secure
datapath_type: system
Port ovn-f5aa6c-0
Interface ovn-f5aa6c-0
type: geneve
options: {csum="true", key=flow, remote_ip="10.10.161.127"}
bfd_status: {diagnostic="Control Detection Time Expired", flap_count="2", forwarding="false", remote_diagnostic="No Diagnostic", remote_state=down, state=down}
Port ovn-3ea0c7-0
Interface ovn-3ea0c7-0
type: geneve
options: {csum="true", key=flow, remote_ip="10.10.161.129"}
Port enp4s0f0_8
Interface enp4s0f0_8
Port ovn-1c2cb7-0
Interface ovn-1c2cb7-0
type: geneve
options: {csum="true", key=flow, remote_ip="10.10.161.135"}
bfd_status: {diagnostic="Control Detection Time Expired", flap_count="2", forwarding="false", remote_diagnostic="No Diagnostic", remote_state=down, state=down}
Port tap130d9819-20
Interface tap130d9819-20
Port enp4s0f0_0
Interface enp4s0f0_0
Port br-int
Interface br-int
type: internal
Port ovn-ccf1be-0
Interface ovn-ccf1be-0
type: geneve
options: {csum="true", key=flow, remote_ip="10.10.161.125"}
bfd_status: {diagnostic="Control Detection Time Expired", flap_count="2", forwarding="false", remote_diagnostic="No Diagnostic", remote_state=down, state=down}
Bridge br-link1
fail_mode: standalone
Port bond0
Interface enp6s0f0
Interface enp6s0f1
Port br-link1
Interface br-link1
type: internal
Bridge br-link0
fail_mode: standalone
Port br-link0
Interface br-link0
type: internal
Port mx-bond
Interface mx-bond
ovs_version: "2.15.1"
[root@computeovshwoffload-0 ~]#
[root@computeovshwoffload-0 ~]# ovs-vsctl list Open_vSwitch
_uuid : edb712f9-21dd-4816-b0ab-e185994d2312
bridges : [087d9b34-53da-42b1-9052-50dc7819cf0b, 1d9b091c-c8e8-4d50-8ee5-07b51d9f5b53, 8d8e1d31-df9a-4c03-9876-14de0828c795]
cur_cfg : 69
datapath_types : [netdev, system]
datapaths : {}
db_version : "8.2.0"
dpdk_initialized : false
dpdk_version : "DPDK 20.11.1"
external_ids : {hostname=computeovshwoffload-0.redhat.local, ovn-bridge=br-int, ovn-bridge-mappings="mx-network:br-link0,mgmt:br-link1", ovn-encap-ip="10.10.161.101", ovn-encap-type=geneve, ovn-openflow-probe-interval="60", ovn-remote="tcp:10.10.160.115:6642", ovn-remote-probe-interval="60000", rundir="/var/run/openvswitch", system-id="ff44434d-03f6-4cd2-9d57-94dfa476ca32"}
iface_types : [bareudp, erspan, geneve, gre, gtpu, internal, ip6erspan, ip6gre, lisp, patch, stt, system, tap, vxlan]
manager_options : [fb560279-de47-49d8-9883-bce497d67e1e]
next_cfg : 69
other_config : {hw-offload="true"}
ovs_version : "2.15.1"
ssl : []
statistics : {}
system_type : rhel
system_version : "8.4"
[root@computeovshwoffload-0 ~]# ovs-vsctl list Bridge
_uuid : 1d9b091c-c8e8-4d50-8ee5-07b51d9f5b53
auto_attach : []
controller : []
datapath_id : "00003cfdfe33a5c0"
datapath_type : ""
datapath_version : "<unknown>"
external_ids : {}
fail_mode : standalone
flood_vlans : []
flow_tables : {}
ipfix : []
mcast_snooping_enable: false
mirrors : []
name : br-link1
netflow : []
other_config : {}
ports : [74c7db58-b96b-4ae1-997a-4b4fe129eca4, d07bb466-f10a-4d4f-a30d-4bbc47d65f6b]
protocols : []
rstp_enable : false
rstp_status : {}
sflow : []
status : {}
stp_enable : false
_uuid : 087d9b34-53da-42b1-9052-50dc7819cf0b
auto_attach : []
controller : []
datapath_id : "0000ea39dbd27b00"
datapath_type : system
datapath_version : "<unknown>"
external_ids : {ct-zone-130d9819-2725-4acb-bd71-08bcc4627bb5_dnat="3", ct-zone-130d9819-2725-4acb-bd71-08bcc4627bb5_snat="2", ct-zone-21fe1f0f-c5c4-4fec-96b6-f75f0d6d84ca="1", ct-zone-3f31490f-4a85-40b4-a55c-e404afe37a14_dnat="9", ct-zone-3f31490f-4a85-40b4-a55c-e404afe37a14_snat="8", ct-zone-5816989a-7d23-4baf-bcad-0f19d7556a0b="4", ct-zone-760c2c71-7f41-487b-ae42-97c5a8e68dbf="5", ct-zone-bacb5372-84c4-442d-8af9-6fe912043908_dnat="6", ct-zone-bacb5372-84c4-442d-8af9-6fe912043908_snat="7", ct-zone-provnet-9944dd33-b0ff-41d3-b1dc-503febf31976="10", ovn-nb-cfg="332"}
fail_mode : secure
flood_vlans : []
flow_tables : {}
ipfix : []
mcast_snooping_enable: false
mirrors : []
name : br-int
netflow : []
other_config : {disable-in-band="true", hwaddr="ea:39:db:d2:7b:00"}
ports : [1fd0ad86-4fc5-4d7f-ba75-a7d83b5ec2cc, 47b547c4-e7ac-40bf-b0c5-3850cbfd3c43, 4f0ab7f9-0757-4bb1-919d-a447eccf03ae, 6e22db8e-de6f-448a-800b-27b7cac5cb9d, cf999796-32d0-4758-b5a7-92c1a9d342ae, d01b3af0-e70a-4c5e-933f-e06e1877a053, dfa292bb-6a5b-4f2a-af64-1b9b22d970a2, f0a86926-91bf-48a6-89ac-7a2fc84ed099]
protocols : []
rstp_enable : false
rstp_status : {}
sflow : []
status : {}
stp_enable : false
_uuid : 8d8e1d31-df9a-4c03-9876-14de0828c795
auto_attach : []
controller : []
datapath_id : "0000043f72b8bb5e"
datapath_type : ""
datapath_version : "<unknown>"
external_ids : {}
fail_mode : standalone
flood_vlans : []
flow_tables : {}
ipfix : []
mcast_snooping_enable: false
mirrors : []
name : br-link0
netflow : []
other_config : {}
ports : [46682d89-4531-4101-be78-00e743e37230, 65e8a46a-d494-486a-b1e1-a95613b46d11]
protocols : []
rstp_enable : false
rstp_status : {}
sflow : []
status : {}
stp_enable : false
[root@computeovshwoffload-0 ~]# ovs-vsctl list Interface
_uuid : f5e85f91-dd6b-4889-90ab-13db7dc572e1
admin_state : up
bfd : {enable="true"}
bfd_status : {diagnostic="Control Detection Time Expired", flap_count="2", forwarding="false", remote_diagnostic="No Diagnostic", remote_state=down, state=down}
cfm_fault : []
cfm_fault_status : []
cfm_flap_count : []
cfm_health : []
cfm_mpid : []
cfm_remote_mpids : []
cfm_remote_opstate : []
duplex : []
error : []
external_ids : {}
ifindex : 68
ingress_policing_burst: 0
ingress_policing_rate: 0
lacp_current : []
link_resets : 0
link_speed : []
link_state : up
lldp : {}
mac : []
mac_in_use : "7a:1d:c7:73:af:9c"
mtu : []
mtu_request : []
name : ovn-ccf1be-0
ofport : 4
ofport_request : []
options : {csum="true", key=flow, remote_ip="10.10.161.125"}
other_config : {}
statistics : {rx_bytes=445455, rx_packets=6513, tx_bytes=8671086, tx_packets=131274}
status : {tunnel_egress_iface=vlan161, tunnel_egress_iface_carrier=up}
type : geneve
_uuid : 53cfde91-9e82-476c-8179-159e39ed810f
admin_state : up
bfd : {}
bfd_status : {}
cfm_fault : []
cfm_fault_status : []
cfm_flap_count : []
cfm_health : []
cfm_mpid : []
cfm_remote_mpids : []
cfm_remote_opstate : []
duplex : []
error : []
external_ids : {}
ifindex : 62
ingress_policing_burst: 0
ingress_policing_rate: 0
lacp_current : []
link_resets : 1
link_speed : []
link_state : up
lldp : {}
mac : []
mac_in_use : "3c:fd:fe:33:a5:c0"
mtu : 9000
mtu_request : []
name : br-link1
ofport : 65534
ofport_request : []
options : {}
other_config : {}
statistics : {collisions=0, rx_bytes=735902, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_missed_errors=0, rx_over_err=0, rx_packets=15801, tx_bytes=3386, tx_dropped=0, tx_errors=0, tx_packets=47}
status : {driver_name=openvswitch}
type : internal
_uuid : ebc8ded2-c187-4287-94cc-451519dd07bf
admin_state : up
bfd : {}
bfd_status : {}
cfm_fault : []
cfm_fault_status : []
cfm_flap_count : []
cfm_health : []
cfm_mpid : []
cfm_remote_mpids : []
cfm_remote_opstate : []
duplex : full
error : []
external_ids : {iface-id="5816989a-7d23-4baf-bcad-0f19d7556a0b"}
ifindex : 87
ingress_policing_burst: 0
ingress_policing_rate: 0
lacp_current : []
link_resets : 0
link_speed : 10000000000
link_state : up
lldp : {}
mac : []
mac_in_use : "26:24:bc:47:5f:ca"
mtu : 1500
mtu_request : []
name : tap130d9819-20
ofport : 28
ofport_request : []
options : {}
other_config : {}
statistics : {collisions=0, rx_bytes=23784, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_missed_errors=0, rx_over_err=0, rx_packets=207, tx_bytes=52633, tx_dropped=0, tx_errors=0, tx_packets=740}
status : {driver_name=veth, driver_version="1.0", firmware_version=""}
type : ""
_uuid : 5ab9c9e3-0efa-4e59-a404-27bf20fdfeb2
admin_state : up
bfd : {enable="true"}
bfd_status : {diagnostic="Control Detection Time Expired", flap_count="2", forwarding="false", remote_diagnostic="No Diagnostic", remote_state=down, state=down}
cfm_fault : []
cfm_fault_status : []
cfm_flap_count : []
cfm_health : []
cfm_mpid : []
cfm_remote_mpids : []
cfm_remote_opstate : []
duplex : []
error : []
external_ids : {}
ifindex : 68
ingress_policing_burst: 0
ingress_policing_rate: 0
lacp_current : []
link_resets : 0
link_speed : []
link_state : up
lldp : {}
mac : []
mac_in_use : "b6:2a:c4:15:7c:9d"
mtu : []
mtu_request : []
name : ovn-1c2cb7-0
ofport : 3
ofport_request : []
options : {csum="true", key=flow, remote_ip="10.10.161.135"}
other_config : {}
statistics : {rx_bytes=409134, rx_packets=6199, tx_bytes=8647254, tx_packets=131019}
status : {tunnel_egress_iface=vlan161, tunnel_egress_iface_carrier=up}
type : geneve
_uuid : 869157d3-e7c3-4418-8c8d-5f65f9ca83d5
admin_state : up
bfd : {}
bfd_status : {}
cfm_fault : []
cfm_fault_status : []
cfm_flap_count : []
cfm_health : []
cfm_mpid : []
cfm_remote_mpids : []
cfm_remote_opstate : []
duplex : full
error : []
external_ids : {}
ifindex : 18
ingress_policing_burst: 0
ingress_policing_rate: 0
lacp_current : []
link_resets : 0
link_speed : 10000000000
link_state : up
lldp : {}
mac : []
mac_in_use : "04:3f:72:b8:bb:5e"
mtu : 9000
mtu_request : []
name : mx-bond
ofport : 1
ofport_request : []
options : {}
other_config : {}
statistics : {collisions=0, rx_bytes=80336928, rx_crc_err=0, rx_dropped=4497, rx_errors=0, rx_frame_err=0, rx_missed_errors=0, rx_over_err=0, rx_packets=1185360, tx_bytes=21290897, tx_dropped=0, tx_errors=0, tx_packets=311882}
status : {driver_name=bonding, driver_version="4.18.0-305.3.1.el8_4.x86_64", firmware_version="2"}
type : ""
_uuid : 012981dc-10fd-4d48-ae20-24c22c7ba9e2
admin_state : down
bfd : {}
bfd_status : {}
cfm_fault : []
cfm_fault_status : []
cfm_flap_count : []
cfm_health : []
cfm_mpid : []
cfm_remote_mpids : []
cfm_remote_opstate : []
duplex : []
error : []
external_ids : {}
ifindex : 67
ingress_policing_burst: 0
ingress_policing_rate: 0
lacp_current : []
link_resets : 0
link_speed : []
link_state : down
lldp : {}
mac : []
mac_in_use : "ea:39:db:d2:7b:00"
mtu : 1500
mtu_request : []
name : br-int
ofport : 65534
ofport_request : []
options : {}
other_config : {}
statistics : {collisions=0, rx_bytes=0, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_missed_errors=0, rx_over_err=0, rx_packets=0, tx_bytes=0, tx_dropped=0, tx_errors=0, tx_packets=0}
status : {driver_name=openvswitch}
type : internal
_uuid : 3526b34c-a22a-4c93-89ba-e5c4f09002e3
admin_state : up
bfd : {}
bfd_status : {}
cfm_fault : []
cfm_fault_status : []
cfm_flap_count : []
cfm_health : []
cfm_mpid : []
cfm_remote_mpids : []
cfm_remote_opstate : []
duplex : []
error : []
external_ids : {}
ifindex : 68
ingress_policing_burst: 0
ingress_policing_rate: 0
lacp_current : []
link_resets : 0
link_speed : []
link_state : up
lldp : {}
mac : []
mac_in_use : "7e:ae:de:54:55:90"
mtu : []
mtu_request : []
name : ovn-3ea0c7-0
ofport : 1
ofport_request : []
options : {csum="true", key=flow, remote_ip="10.10.161.129"}
other_config : {}
statistics : {rx_bytes=5140, rx_packets=68, tx_bytes=4912, tx_packets=94}
status : {tunnel_egress_iface=vlan161, tunnel_egress_iface_carrier=up}
type : geneve
_uuid : fc67aa9c-f166-4bbb-937d-d2b1889e8520
admin_state : down
bfd : {}
bfd_status : {}
cfm_fault : []
cfm_fault_status : []
cfm_flap_count : []
cfm_health : []
cfm_mpid : []
cfm_remote_mpids : []
cfm_remote_opstate : []
duplex : []
error : []
external_ids : {}
ifindex : 61
ingress_policing_burst: 0
ingress_policing_rate: 0
lacp_current : []
link_resets : 2
link_speed : []
link_state : down
lldp : {}
mac : []
mac_in_use : "04:3f:72:b8:bb:5e"
mtu : 9000
mtu_request : []
name : br-link0
ofport : 65534
ofport_request : []
options : {}
other_config : {}
statistics : {collisions=0, rx_bytes=0, rx_crc_err=0, rx_dropped=15847, rx_errors=0, rx_frame_err=0, rx_missed_errors=0, rx_over_err=0, rx_packets=0, tx_bytes=0, tx_dropped=0, tx_errors=0, tx_packets=0}
status : {driver_name=openvswitch}
type : internal
_uuid : 276bfcb5-1a91-404a-b5aa-7efe4c58926a
admin_state : up
bfd : {}
bfd_status : {}
cfm_fault : []
cfm_fault_status : []
cfm_flap_count : []
cfm_health : []
cfm_mpid : []
cfm_remote_mpids : []
cfm_remote_opstate : []
duplex : []
error : []
external_ids : {attached-mac="fa:16:3e:ae:f2:b1", iface-id="760c2c71-7f41-487b-ae42-97c5a8e68dbf", iface-status=active, ovn-installed="true", vm-uuid="fd5d78d2-dce2-4790-b446-f5a0a4e34249"}
ifindex : 30
ingress_policing_burst: 0
ingress_policing_rate: 0
lacp_current : []
link_resets : 0
link_speed : []
link_state : up
lldp : {}
mac : []
mac_in_use : "da:cc:cd:60:8a:e9"
mtu : 8942
mtu_request : []
name : enp4s0f0_0
ofport : 29
ofport_request : []
options : {}
other_config : {}
statistics : {collisions=0, rx_bytes=133508, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_missed_errors=0, rx_over_err=0, rx_packets=1679, tx_bytes=69898, tx_dropped=0, tx_errors=0, tx_packets=889}
status : {driver_name=mlx5e_rep, driver_version="4.18.0-305.3.1.el8_4.x86_64", firmware_version="16.27.6120 (DEL0000000015)"}
type : ""
_uuid : 38dd099e-9c69-4910-9add-86636404c528
admin_state : up
bfd : {}
bfd_status : {}
cfm_fault : []
cfm_fault_status : []
cfm_flap_count : []
cfm_health : []
cfm_mpid : []
cfm_remote_mpids : []
cfm_remote_opstate : []
duplex : full
error : []
external_ids : {}
ifindex : 7
ingress_policing_burst: 0
ingress_policing_rate: 0
lacp_current : []
link_resets : 0
link_speed : 10000000000
link_state : up
lldp : {}
mac : []
mac_in_use : "3c:fd:fe:33:a5:c2"
mtu : 9000
mtu_request : []
name : enp6s0f1
ofport : 2
ofport_request : []
options : {}
other_config : {}
statistics : {collisions=0, rx_bytes=43101385, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_missed_errors=0, rx_over_err=0, rx_packets=718156, tx_bytes=12603, tx_dropped=0, tx_errors=0, tx_packets=106}
status : {driver_name=i40e, driver_version="4.18.0-305.3.1.el8_4.x86_64", firmware_version="5.40 0x80002d36 18.0.17"}
type : ""
_uuid : f34e95c6-4d67-4a91-ae19-e4945249fd24
admin_state : up
bfd : {}
bfd_status : {}
cfm_fault : []
cfm_fault_status : []
cfm_flap_count : []
cfm_health : []
cfm_mpid : []
cfm_remote_mpids : []
cfm_remote_opstate : []
duplex : []
error : []
external_ids : {attached-mac="fa:16:3e:35:1c:81", iface-id="21fe1f0f-c5c4-4fec-96b6-f75f0d6d84ca", iface-status=active, ovn-installed="true", vm-uuid="c0d00331-9ee5-4242-8494-f6ae60442d16"}
ifindex : 38
ingress_policing_burst: 0
ingress_policing_rate: 0
lacp_current : []
link_resets : 0
link_speed : []
link_state : up
lldp : {}
mac : []
mac_in_use : "0a:19:98:0d:ff:c6"
mtu : 8942
mtu_request : []
name : enp4s0f0_8
ofport : 27
ofport_request : []
options : {}
other_config : {}
statistics : {collisions=0, rx_bytes=144624, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_missed_errors=0, rx_over_err=0, rx_packets=1871, tx_bytes=80485, tx_dropped=0, tx_errors=0, tx_packets=1028}
status : {driver_name=mlx5e_rep, driver_version="4.18.0-305.3.1.el8_4.x86_64", firmware_version="16.27.6120 (DEL0000000015)"}
type : ""
_uuid : 77fa407e-1b70-4f9f-8b7c-a0fbe207a433
admin_state : up
bfd : {enable="true"}
bfd_status : {diagnostic="Control Detection Time Expired", flap_count="2", forwarding="false", remote_diagnostic="No Diagnostic", remote_state=down, state=down}
cfm_fault : []
cfm_fault_status : []
cfm_flap_count : []
cfm_health : []
cfm_mpid : []
cfm_remote_mpids : []
cfm_remote_opstate : []
duplex : []
error : []
external_ids : {}
ifindex : 68
ingress_policing_burst: 0
ingress_policing_rate: 0
lacp_current : []
link_resets : 0
link_speed : []
link_state : up
lldp : {}
mac : []
mac_in_use : "02:2e:4b:89:e5:50"
mtu : []
mtu_request : []
name : ovn-f5aa6c-0
ofport : 2
ofport_request : []
options : {csum="true", key=flow, remote_ip="10.10.161.127"}
other_config : {}
statistics : {rx_bytes=499044, rx_packets=6693, tx_bytes=8728759, tx_packets=131496}
status : {tunnel_egress_iface=vlan161, tunnel_egress_iface_carrier=up}
type : geneve
_uuid : 32e14c57-950b-431e-94ba-03be0cd3e80b
admin_state : up
bfd : {}
bfd_status : {}
cfm_fault : []
cfm_fault_status : []
cfm_flap_count : []
cfm_health : []
cfm_mpid : []
cfm_remote_mpids : []
cfm_remote_opstate : []
duplex : full
error : []
external_ids : {}
ifindex : 6
ingress_policing_burst: 0
ingress_policing_rate: 0
lacp_current : []
link_resets : 0
link_speed : 10000000000
link_state : up
lldp : {}
mac : []
mac_in_use : "3c:fd:fe:33:a5:c0"
mtu : 9000
mtu_request : []
name : enp6s0f0
ofport : 1
ofport_request : []
options : {}
other_config : {}
statistics : {collisions=0, rx_bytes=43102825, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_missed_errors=0, rx_over_err=0, rx_packets=718180, tx_bytes=15425, tx_dropped=0, tx_errors=0, tx_packets=147}
status : {driver_name=i40e, driver_version="4.18.0-305.3.1.el8_4.x86_64", firmware_version="5.40 0x80002d36 18.0.17"}
type : ""
[root@computeovshwoffload-0 ~]#
Can you make sure that the system has the tc utilities installed? For example, the sosreport doesn't contain any of the 'tc' commands I would expect ('tc -s filter show {devname} ingress', etc.) - maybe you can capture them.
I'm not sure about that errno - ENOENT - usually I think it implies a generic error installing the flow along the hw datapath. I would expect if too many flows got offloaded, we would see ENOSPC, and if the flow wasn't supported we would see something like EOPNOTSUPP or similar.

Maybe mleitner can see something I don't.
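The errno heuristic above can be written down as a small sketch. Note that the errno-to-cause mapping below is Aaron's interpretation from this comment, not a documented kernel contract; the function name is hypothetical:

```python
import errno
import os

# Heuristic interpretation of errno values seen when hardware flow
# installation fails, per the discussion in this bug (not a kernel API guarantee).
OFFLOAD_ERRNO_HINTS = {
    errno.ENOENT: "generic error installing the flow along the hw datapath",
    errno.ENOSPC: "too many flows offloaded (hardware table full)",
    errno.EOPNOTSUPP: "flow match/action not supported by the hardware",
}

def explain_offload_errno(err: int) -> str:
    """Return a human-readable hint for an offload failure errno."""
    hint = OFFLOAD_ERRNO_HINTS.get(err, "unrecognized; check driver trace points")
    return f"{errno.errorcode.get(err, str(err))} ({os.strerror(err)}): {hint}"

print(explain_offload_errno(errno.ENOENT))
```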
(In reply to Aaron Conole from comment #17)
> I'm not sure about that errno - ENOENT - usually I think it implies a
> generic error installing the flow along the hw datapath. I would expect if
> too many flows got offloaded, we would see ENOSPC, and if the flow wasn't
> supported we would see something like EOPNOTSUPP or similar.
>
> Maybe mleitner can see something I don't.

While we don't have https://bugzilla.redhat.com/show_bug.cgi?id=1916418 we can use a perf probe on https://github.com/torvalds/linux/commit/7e3ce05e7f650371061d0b9eec1e1cf74ed6fca0 to find exactly where and why this error was returned.

Btw, interesting how the 1st packet gets through, and then others don't. That pretty much means the upcall handles it, updates the datapath, and then things get broken somehow. But if the filter failed to be added in tc, it should have been added in dp:ovs. Weird.

Bug not reproduced with the following puddle.

Same test is failing with this puddle:
RHOS-16.2-RHEL-8-20210811.n.1

Checking if the issue persists, will update.

(In reply to Yariv from comment #23)
> Bug not reproduced with the following puddle.
>
> Same test is failing with this puddle:
> RHOS-16.2-RHEL-8-20210811.n.1
>
> Checking if the issue persists, will update.

The problem still persists with RHOS-16.2-RHEL-8-20210811.n.1:

(overcloud) [stack@undercloud-0 ~]$ openstack server list --all --host computeovshwoffload-0.redhat.local
| ID | Name | Status | Networks | Image | Flavor |
|---|---|---|---|---|---|
| 02f9e32d-7078-4f7b-869f-90b75f70dc56 | tempest-TestNfvOffload-server-1658772737 | ACTIVE | mellanox-geneve-provider=20.20.220.192, 10.35.141.167; mellanox-vlan-provider=30.30.220.182 | rhel-guest-image-7-6-210-x86-64-qcow2 | |
| 8935550e-35d2-4177-a845-641e2a305c6e | tempest-TestNfvOffload-server-530477859 | ACTIVE | mellanox-geneve-provider=20.20.220.122, 10.35.141.172; mellanox-vlan-provider=30.30.220.125 | rhel-guest-image-7-6-210-x86-64-qcow2 | |

(overcloud) [stack@undercloud-0 ~]$ ping 10.35.141.167
PING 10.35.141.167 (10.35.141.167) 56(84) bytes of data.
64 bytes from 10.35.141.167: icmp_seq=1 ttl=61 time=17.9 ms
^C
--- 10.35.141.167 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 17.891/17.891/17.891/0.000 ms
(overcloud) [stack@undercloud-0 ~]$ ping 10.35.141.172
PING 10.35.141.172 (10.35.141.172) 56(84) bytes of data.

@mleitner, would you like to look at the machines?

(In reply to Yariv from comment #24)
> @mleitner, would you like to look at the machines?

Not really. :-) Haresh can debug OSP better than I do. I'm here if anything, though.

Restoring need-info that I cleared by mistake.

I have been debugging the issue together with Haresh and I think we have seen the root cause.
The problem happens when there are two VMs on the same compute node connected to the same provider network. There is an issue with the flow programming and packets go to the wrong VM, so the ping fails.

It is not related to floating IPs: ping fails between two IPs on the same provider network. If such an IP is used for a floating IP, then the floating IP will fail too. If there is a single VM per compute node, there is no issue.

Here I provide an example. I create 4 VMs (2 on each compute node). The VMs do not have floating IPs, so I will use the console:
(venv) (overcloud) [stack@undercloud-0 ~]$ openstack server list --a
+--------------------------------------+------------------------------------------+--------+------------------------------------------------------------------------------+---------------------------------------+--------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+------------------------------------------+--------+------------------------------------------------------------------------------+---------------------------------------+--------+
| 048a6bcd-5ada-4127-b95a-9962cf31e80f | tempest-TestNfvOffload-server-1590230603 | ACTIVE | mellanox-geneve-provider=20.20.220.112; mellanox-vlan-provider=30.30.220.147 | rhel-guest-image-7-6-210-x86-64-qcow2 | |
| f71166db-2a49-4d73-aa73-4573b3bf8db5 | tempest-TestNfvOffload-server-741569861 | ACTIVE | mellanox-geneve-provider=20.20.220.106; mellanox-vlan-provider=30.30.220.132 | rhel-guest-image-7-6-210-x86-64-qcow2 | |
| c71d4bdf-4aa7-4b69-8405-6633c5864a7d | tempest-TestNfvOffload-server-922847220 | ACTIVE | mellanox-geneve-provider=20.20.220.196; mellanox-vlan-provider=30.30.220.172 | rhel-guest-image-7-6-210-x86-64-qcow2 | |
| 1ac387cf-31cf-4cf6-97b4-acb3034cca31 | tempest-TestNfvOffload-server-1803927283 | ACTIVE | mellanox-geneve-provider=20.20.220.188; mellanox-vlan-provider=30.30.220.175 | rhel-guest-image-7-6-210-x86-64-qcow2 | |
+--------------------------------------+------------------------------------------+--------+------------------------------------------------------------------------------+---------------------------------------+
These are the ports and macs:
(venv) (overcloud) [stack@undercloud-0 ~]$ openstack port list | egrep "220.112|220.106|220.196|220.188|220.147|220.132|220.172|220.175"
| 54397464-6866-40dd-98cd-8c3e1f48e018 | tempest-port-smoke-1207250115 | fa:16:3e:40:bf:a6 | ip_address='30.30.220.132', subnet_id='82969e14-51b1-4993-90d2-0269dd0bdf8d' | ACTIVE |
| 5f893749-f4ba-4230-b46a-9a1491b0bc6c | tempest-port-smoke-331751474 | fa:16:3e:05:48:82 | ip_address='20.20.220.106', subnet_id='8aa962a5-354f-4406-8fa1-baa73ff14f2b' | ACTIVE |
| 7052864e-8624-493d-b23c-6b8ecbf54d8f | tempest-port-smoke-1880359353 | fa:16:3e:2d:1c:71 | ip_address='30.30.220.175', subnet_id='82969e14-51b1-4993-90d2-0269dd0bdf8d' | ACTIVE |
| 883f8950-5930-4549-a5e8-c6a8d44cf9c8 | tempest-port-smoke-1206809988 | fa:16:3e:e7:e1:1e | ip_address='20.20.220.112', subnet_id='8aa962a5-354f-4406-8fa1-baa73ff14f2b' | ACTIVE |
| 92c7fd6f-929c-4f2a-b378-6ac69946ffc8 | tempest-port-smoke-441515917 | fa:16:3e:b6:fa:2c | ip_address='20.20.220.188', subnet_id='8aa962a5-354f-4406-8fa1-baa73ff14f2b' | ACTIVE |
| a31551fc-bc18-4c11-971f-9acc6e12d51c | tempest-port-smoke-1291037828 | fa:16:3e:4c:0c:21 | ip_address='30.30.220.147', subnet_id='82969e14-51b1-4993-90d2-0269dd0bdf8d' | ACTIVE |
| ec05e264-e102-4c00-a04c-9873d1c7c9b9 | tempest-port-smoke-612065238 | fa:16:3e:52:ed:2a | ip_address='30.30.220.172', subnet_id='82969e14-51b1-4993-90d2-0269dd0bdf8d' | ACTIVE |
| ffbae10d-2440-4af7-8a29-7da30d537f59 | tempest-port-smoke-1068018430 | fa:16:3e:37:96:8b | ip_address='20.20.220.196', subnet_id='8aa962a5-354f-4406-8fa1-baa73ff14f2b' | ACTIVE |
These are the representor ports used:
hypervisor 192.0.50.18
29: enp4s0f0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master mx-bond state UP mode DEFAULT group default qlen 1000
link/ether 98:03:9b:9c:50:58 brd ff:ff:ff:ff:ff:ff
vf 8 link/ether fa:16:3e:37:96:8b brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
vf 9 link/ether fa:16:3e:05:48:82 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
50: enp4s0f1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master mx-bond state UP mode DEFAULT group default qlen 1000
link/ether 98:03:9b:9c:50:58 brd ff:ff:ff:ff:ff:ff permaddr 98:03:9b:9c:50:59
vf 8 link/ether fa:16:3e:52:ed:2a brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
vf 9 link/ether fa:16:3e:40:bf:a6 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
hypervisor 192.0.50.11
29: enp4s0f0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master mx-bond state UP mode DEFAULT group default qlen 1000
link/ether ec:0d:9a:7d:7d:32 brd ff:ff:ff:ff:ff:ff
vf 8 link/ether fa:16:3e:b6:fa:2c brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
vf 9 link/ether fa:16:3e:e7:e1:1e brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
50: enp4s0f1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master mx-bond state UP mode DEFAULT group default qlen 1000
link/ether ec:0d:9a:7d:7d:32 brd ff:ff:ff:ff:ff:ff permaddr ec:0d:9a:7d:7d:33
vf 8 link/ether fa:16:3e:2d:1c:71 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
vf 9 link/ether fa:16:3e:4c:0c:21 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
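The VF-to-MAC listing above can be turned into a port-to-representor map mechanically. This is a sketch: the sample lines are trimmed from this comment, and the `<pf>_<vf>` representor naming is an assumption inferred from the port names seen earlier in `ovs-vsctl show`:

```python
import re

# "vf <N> link/ether <mac> ..." lines nested under a PF in `ip link` output.
VF_RE = re.compile(r"vf (\d+) link/ether ([0-9a-f:]{17})")
# PF header lines such as "29: enp4s0f0: <...>".
PF_RE = re.compile(r"^\s*\d+: ([^:@\s]+):")

def vf_macs(ip_link_output):
    """Map VF MAC address -> assumed representor name '<pf>_<vf>'."""
    mapping, pf = {}, None
    for line in ip_link_output.splitlines():
        m = PF_RE.match(line)
        if m:
            pf = m.group(1)
            continue
        v = VF_RE.search(line)
        if v and pf:
            mapping[v.group(2)] = f"{pf}_{v.group(1)}"
    return mapping

sample = """\
29: enp4s0f0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000
    vf 8 link/ether fa:16:3e:37:96:8b brd ff:ff:ff:ff:ff:ff, spoof checking off
    vf 9 link/ether fa:16:3e:05:48:82 brd ff:ff:ff:ff:ff:ff, spoof checking off
"""
print(vf_macs(sample))
# {'fa:16:3e:37:96:8b': 'enp4s0f0_8', 'fa:16:3e:05:48:82': 'enp4s0f0_9'}
```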
Test: ping from 20.20.220.188 to 20.20.220.196 fails. If we run tcpdump on 20.20.220.106 while executing the ping, we can see the packets arriving there, but they do not go to 20.20.220.196, so the ping fails. Below are the flows, and we can see that the traffic is being delivered to enp4s0f0_9 instead of enp4s0f0_8.
flows:
Hypervisor 192.0.50.11
ufid:bb8da3b2-727e-415b-b55c-fb1f19310940, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(enp4s0f0_8),packet_type(ns=0/0,id=0/0),eth(src=fa:16:3e:b6:fa:2c,dst=fa:16:3e:37:96:8b),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=20.20.220.192/255.255.255.224,proto=0/0,tos=0/0x3,ttl=0/0,frag=no), packets:22, bytes:3520, used:0.360s, offloaded:yes, dp:tc, actions:set(tunnel(tun_id=0x3,dst=10.10.121.176,ttl=64,tp_dst=6081,key6(bad key length 1, expected 0)(01)geneve({class=0x102,type=0x80,len=4,0x20003}),flags(key))),genev_sys_6081
Hypervisor 192.0.50.18
ufid:734179ef-b530-42f8-aed6-e41ba813d5e7, skb_priority(0/0),tunnel(tun_id=0x3,src=10.10.121.146,dst=10.10.121.176,ttl=0/0,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x20004/0x7fffffff}),flags(+key)),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(genev_sys_6081),packet_type(ns=0/0,id=0/0),eth(src=fa:16:3e:b6:fa:2c,dst=00:00:00:00:00:00/01:00:00:00:00:00),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no), packets:68, bytes:6664, used:0.290s, offloaded:yes, dp:tc, actions:enp4s0f0_9
ufid:6ecf939a-0d5c-4c69-a396-5bbc424e5cfb, skb_priority(0/0),tunnel(tun_id=0x3,src=10.10.121.146,dst=10.10.121.176,ttl=0/0,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x20004/0x7fffffff}),flags(+key)),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(genev_sys_6081),packet_type(ns=0/0,id=0/0),eth(src=fa:16:3e:b6:fa:2c,dst=00:00:00:00:00:00/01:00:00:00:00:00),eth_type(0x0806),arp(sip=0.0.0.0/0.0.0.0,tip=0.0.0.0/0.0.0.0,op=0/0,sha=00:00:00:00:00:00/00:00:00:00:00:00,tha=00:00:00:00:00:00/00:00:00:00:00:00), packets:0, bytes:0, used:1.310s, offloaded:yes, dp:tc, actions:enp4s0f0_9
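The misdelivery in the dump above can be modeled in miniature: if the lookup key for a decapsulated geneve packet ignores the TLV option value (the `0x20003`/`0x20004` metadata seen in the flows), the entries for the two VFs collapse onto one key and all traffic follows whichever flow was installed first. The sketch below is a toy model; the field names are illustrative and do not correspond to real OVS or mlx5 structures.

```python
# Toy model of a hardware flow table that classifies decapsulated geneve
# packets to a VF representor. Field names are illustrative only.

def make_key(pkt, match_geneve_opts):
    """Build the lookup key for a tunnelled packet."""
    key = (pkt["tun_id"], pkt["tun_src"], pkt["tun_dst"], pkt["eth_src"])
    if match_geneve_opts:
        key += (pkt["geneve_opt"],)  # TLV option distinguishing the two VMs
    return key

def deliver(flows, pkt, match_geneve_opts):
    return flows.get(make_key(pkt, match_geneve_opts), "miss")

vm1 = {"tun_id": 3, "tun_src": "10.10.121.146", "tun_dst": "10.10.121.176",
       "eth_src": "fa:16:3e:b6:fa:2c", "geneve_opt": 0x20003}
vm2 = dict(vm1, geneve_opt=0x20004)  # differs only in the geneve option

# Broken case: options ignored -> both keys collide, second flow is dropped,
# and vm2's traffic is delivered to vm1's representor.
flows = {}
flows.setdefault(make_key(vm1, False), "enp4s0f0_9")
flows.setdefault(make_key(vm2, False), "enp4s0f0_8")  # same key, ignored!
assert deliver(flows, vm2, False) == "enp4s0f0_9"     # wrong VF

# Fixed case: options included in the key -> each VM gets its own flow.
flows = {make_key(vm1, True): "enp4s0f0_9", make_key(vm2, True): "enp4s0f0_8"}
assert deliver(flows, vm2, True) == "enp4s0f0_8"
```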
The issue can be related to this fix.
This fix solves an issue with HW offload of different geneve tunnels that have the same tunnel src/dst IP, ID and port but different geneve options.
commit 929a2faddd55290fbb0b73f453b200ed1b2b2947
Author: Dima Chumak <dchumak>
Date: Thu Feb 11 09:36:33 2021 +0200
net/mlx5e: Consider geneve_opts for encap contexts
Current algorithm for encap keys is legacy from initial vxlan
implementation and doesn't take into account all possible fields of a
tunnel. For example, for a Geneve tunnel, which may have additional TLV
options, they are ignored when comparing encap keys and a rule can be
attached to an incorrect encap entry.
Fix that by introducing encap_info_equal() operation in
struct mlx5e_tc_tunnel. Geneve tunnel type uses custom implementation,
which extends generic algorithm and considers options if they are set.
Fixes: 7f1a546e3222 ("net/mlx5e: Consider tunnel type for encap contexts")
Signed-off-by: Dima Chumak <dchumak>
Reviewed-by: Vlad Buslov <vladbu>
Signed-off-by: Saeed Mahameed <saeedm>
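The comparison the commit message describes can be sketched as follows: two tunnels may share one encap context only if all fields match, including the geneve TLV options. This is only a model of the idea behind `encap_info_equal()`; the class and function names below are hypothetical, not the real mlx5e structures.

```python
# Sketch of the encap-key comparison described in the commit message.
# Names are illustrative, not the actual kernel data structures.
from dataclasses import dataclass

@dataclass(frozen=True)
class EncapKey:
    src: str
    dst: str
    tun_id: int
    dst_port: int
    geneve_opts: tuple = ()  # TLV options; ignored by the legacy algorithm

def encap_equal_legacy(a, b):
    # Pre-fix behaviour: geneve options are not compared, so tunnels that
    # differ only in options wrongly share an encap entry.
    return (a.src, a.dst, a.tun_id, a.dst_port) == (b.src, b.dst, b.tun_id, b.dst_port)

def encap_equal_geneve(a, b):
    # Post-fix behaviour: the generic comparison extended with the options.
    return encap_equal_legacy(a, b) and a.geneve_opts == b.geneve_opts

t1 = EncapKey("10.10.121.146", "10.10.121.176", 3, 6081, ((0x102, 0x80, 0x20003),))
t2 = EncapKey("10.10.121.146", "10.10.121.176", 3, 6081, ((0x102, 0x80, 0x20004),))

assert encap_equal_legacy(t1, t2)      # wrongly considered the same tunnel
assert not encap_equal_geneve(t1, t2)  # correctly kept apart
```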
Thanks Maor.
Hi Amir, is it possible to get a 4.18 kernel patch with the fix? We can try it out and, if it fixes the issue, request a backport.
Thanks

(In reply to Haresh Khandelwal from comment #56)
> Hi Amir, is it possible to get a 4.18 kernel patch with the fix? We can try
> it out and, if it fixes the issue, request a backport.

Hi, we already have this fix in kernel-4.18.0-324.el8 and above from the RHEL 8.5 branch. Is it enough for your testing?

Thanks Amir. So, shall I assume that commit 929a2faddd55290fbb0b73f453b200ed1b2b2947 would fix this issue? RHOSP 16.2.x will be shipped with RHEL 8.4 throughout its life. The latest compose has kernel version 4.18.0-305.19.1.el8_4. I am not aware of how RHEL picks kernel versions or whether the next 8.4.z would have the fixed kernel. If not, then we need to backport it. Marcelo, can you help here? Thanks

8.4.z kernels will always be 4.18.0-305.*.el8_4. With that, yes, we would need to backport the fix to 8.4.z so that RHOSP can have it.
In theory we would need two tests here:
- one with a y-stream/8.5 kernel, to be sure the issue is fixed in y-stream; we don't want regressions for customers updating from 8.4.z to 8.5 or 8.6 later on.
- one with a test kernel on 8.4.z, to be sure that the fix is complete and no dependencies were missed; we don't want to backport something only to find out later "oops, missed this other commit".
We can skip one of them if there's enough confidence, though. I think the patch is spot on. If Nvidia agrees, we can proceed with just the 2nd test, with a test kernel for 8.4.z.

Hi Marcelo,
(In reply to Marcelo Ricardo Leitner from comment #59)
> 8.4.z kernels will always be 4.18.0-305.*.el8_4. With that, yes, we would
> need to backport the fix to 8.4.z so that RHOSP can have it.
Good, thanks.

> In theory we would need two tests here:
> - one with a y-stream/8.5 kernel, to be sure the issue is fixed in y-stream;
>   we don't want regressions for customers updating from 8.4.z to 8.5 or 8.6
>   later on.

RHOSP has no plan to ever use RHEL 8.5; RHOSP 17 will be based on RHEL 9.

> - one with a test kernel on 8.4.z, to be sure that the fix is complete and
>   no dependencies were missed; we don't want to backport something only to
>   find out later "oops, missed this other commit".
>
> We can skip one of them if there's enough confidence, though.
> I think the patch is spot on. If Nvidia agrees, we can proceed with just the
> 2nd test, with a test kernel for 8.4.z.

Yes, this BZ was found in our CI, so it would be easy to validate quickly once we have the fix. Thanks

(In reply to maord from comment #55)
> The issue can be related to this fix.
> This fix solves an issue with HW offload of different geneve tunnels with
> the same tunnel src/dst IP, ID and port but different geneve options.
>
> commit 929a2faddd55290fbb0b73f453b200ed1b2b2947
> Author: Dima Chumak <dchumak>
> Date: Thu Feb 11 09:36:33 2021 +0200
>
> net/mlx5e: Consider geneve_opts for encap contexts

For the record, this was originally backported via https://bugzilla.redhat.com/show_bug.cgi?id=1915308, and that's where the 8.4.z backport will need to be requested once the test confirms it.

@atzin any news on the test NVIDIA build?

(In reply to Karrar Fida from comment #63)
> @atzin any news on the test NVIDIA build?

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=40713341
Test kernel of RHEL-8.4 with 929a2faddd55 ("net/mlx5e: Consider geneve_opts for encap contexts")

We think this is test-only for the NFV team. Please switch it to our DFG if you think we are wrong!

Hi, I have tested with that patch and the problem is not solved.
I have installed the patch:

(undercloud) [stack@undercloud-0 ~]$ ssh heat-admin.50.20 "uname -a"
Warning: Permanently added '192.0.50.20' (ECDSA) to the list of known hosts.
Linux computeovshwoffload-0 4.18.0-305.26.1.el8_4.UNSUPPORTED_1966157.x86_64 #1 SMP Sun Oct 31 06:28:11 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux
(undercloud) [stack@undercloud-0 ~]$ ssh heat-admin.50.22 "uname -a"
Warning: Permanently added '192.0.50.22' (ECDSA) to the list of known hosts.
Linux computeovshwoffload-1 4.18.0-305.26.1.el8_4.UNSUPPORTED_1966157.x86_64 #1 SMP Sun Oct 31 06:28:11 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux

I have checked that I still have problems with the ping:

(overcloud) [stack@undercloud-0 ~]$ ping -c 1 -w 1 10.35.141.53;sleep 12; ping -c 1 -w 1 10.35.141.53
PING 10.35.141.53 (10.35.141.53) 56(84) bytes of data.

--- 10.35.141.53 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

PING 10.35.141.53 (10.35.141.53) 56(84) bytes of data.
64 bytes from 10.35.141.53: icmp_seq=1 ttl=61 time=10.0 ms

--- 10.35.141.53 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 10.013/10.013/10.013/0.000 ms

Hi Miguel,
If the test had been positive, we would have skipped this test, but then: please try the latest 8.5 kernel as well. Maybe something went sour in the backport to 8.4.z, such as a missed patch dependency. You can download it from here:
http://download.eng.bos.redhat.com/brewroot/packages/kernel/4.18.0/348.4.el8/
Thanks.

Hi,
I tried with the previous patch and it is not solving the problem either.
Apart from installing the RPMs and rebooting the computes, should I do anything else to ensure that the patch is installed properly? With uname -a I can see that I have the correct kernel version; is that enough?

(undercloud) [stack@undercloud-0 ~]$ ssh heat-admin.50.20 "uname -a"
Warning: Permanently added '192.0.50.20' (ECDSA) to the list of known hosts.
Linux computeovshwoffload-0 4.18.0-348.4.el8.x86_64 #1 SMP Mon Oct 25 15:08:07 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux
(undercloud) [stack@undercloud-0 ~]$ ssh heat-admin.50.22 "uname -a"
Warning: Permanently added '192.0.50.22' (ECDSA) to the list of known hosts.
Linux computeovshwoffload-1 4.18.0-348.4.el8.x86_64 #1 SMP Mon Oct 25 15:08:07 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux
(undercloud) [stack@undercloud-0 ~]$ ping -c 1 -w 1 10.35.141.51;sleep 12; ping -c 1 -w 1 10.35.141.51
PING 10.35.141.51 (10.35.141.51) 56(84) bytes of data.

--- 10.35.141.51 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

PING 10.35.141.51 (10.35.141.51) 56(84) bytes of data.
64 bytes from 10.35.141.51: icmp_seq=1 ttl=61 time=2.48 ms

--- 10.35.141.51 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 2.475/2.475/2.475/0.000 ms

(In reply to Miguel Angel Nieto from comment #70)
> I tried with previous patch and it is not solving the problem either.

Thanks. That's a very important piece of information.

> Apart from installing rpms and rebooting computes, should I do anything else
> to ensure that the patch is installed properly? With uname -a I can see
> that I have the correct kernel version, is it enough?

It is enough, yes.

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=40903423
Test kernel of RHEL-8.4 with 929a2faddd55 ("net/mlx5e: Consider geneve_opts for encap contexts")

I think that the build from comment 64 did not actually contain the fix, due to my mistake.

Hi Folks,
The fix that Maor suggested above actually solves a problem on the encap side, where different encap headers with different geneve options were not respected. Therefore the effects of this bug may be observed only on the receiver side, which does the matching and classification on the geneve options.
So a few questions:
1. Did u install and test the fix also on the client/traffic sender side?
2.
If what I mentioned is true, you should see this behavior also with HW offload turned off (as well as the same geneve options for all traffic in tcpdump). Can you please confirm that you verified it works without HW offload?
3. We need to understand whether this issue is already resolved upstream or whether it is something we should reproduce and debug properly in house. Can you please confirm for us?
Thanks,
Ariel

Hi, I'll answer the questions:
1. Yes, I patched the overcloud image during deployment with the new kernel, so the kernel update was applied everywhere. The issue happens on any compute node that has 2 or more VMs attached to the same geneve network.
2. When I tested the kernel patch I only tested with offload enabled, but from previous tests I did, I can confirm that the issue only happens with HW offload. There is no issue if offload is disabled.
3. I will try to get more information about this point.
Regards,
Miguel

(In reply to Miguel Angel Nieto from comment #74)
> 3. I will try to get more information about this point

You can use an ARK kernel for that, btw. It's the kernel-* packages at
https://odcs.fedoraproject.org/composes/production/latest-Fedora-ELN/compose/BaseOS/x86_64/os/Packages/
They should be fresh enough for this test, and they install nicely on RHEL 8.

Thanks for the repo. I tried it today. I didn't have any issue updating the kernel packages, but the servers are not booting properly after updating the kernel; I would say some services are not working properly, and ssh is broken. I will need some more time to see what is happening.

Hi Folks, any update on the upstream testing here?

Sorry for the delay. I will test it between Thursday and Friday.

Hi,
I tested with the upstream kernel and I didn't find the issue; I think it is working properly.
Linux computeovshwoffload-0 5.15.0-60.eln113.x86_64 #1 SMP Mon Nov 1 16:50:20 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
I pinged both VMs at the same time and got these flows:
VMS:
(overcloud) [stack@undercloud-0 ~]$ openstack server list --a | egrep "141.58|141.55"
| e378517a-66de-4363-882c-6a7b11035f24 | tempest-TestNfvOffload-server-2033717592 | ACTIVE | mellanox-geneve-provider=20.20.220.180, 10.35.141.58; mellanox-vlan-provider=30.30.220.116 | rhel-guest-image-7-6-210-x86-64-qcow2 | |
| d075c648-197f-460a-a88f-5f19933447e3 | tempest-TestNfvOffload-server-1328589616 | ACTIVE | mellanox-geneve-provider=20.20.220.171, 10.35.141.55; mellanox-vlan-provider=30.30.220.162 | rhel-guest-image-7-6-210-x86-64-qcow2 | |
PORTS
(overcloud) [stack@undercloud-0 ~]$ openstack port list | egrep "180|171"
| 89979eb3-b59d-4c6b-b81c-5264aecd60c8 | tempest-port-smoke-988384003 | fa:16:3e:03:8d:eb | ip_address='20.20.220.180', subnet_id='b25f99f4-5441-4df1-ab99-b3d1c5885042' | ACTIVE |
| 89d9df81-3bea-412d-a71b-663df1ed0ce7 | tempest-port-smoke-297925914 | fa:16:3e:de:26:4c | ip_address='20.20.220.171', subnet_id='b25f99f4-5441-4df1-ab99-b3d1c5885042' | ACTIVE |
VFS:
11: enp4s0f0: <BROADCAST,MULTICAST,PROMISC,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master mx-bond state UP mode DEFAULT group default qlen 1000
link/ether 98:03:9b:9c:50:58 brd ff:ff:ff:ff:ff:ff
vf 2 link/ether fa:16:3e:03:8d:eb brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
vf 9 link/ether fa:16:3e:de:26:4c brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state disable, trust off, query_rss off
FLOWS
ufid:951f5f75-93a2-482b-a090-9d28a84cf28e, skb_priority(0/0),skb_mark(0/0),ct_state(0/0x21),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(enp4s0f0_9),packet_type(ns=0/0,id=0/0),eth(src=fa:16:3e:de:26:4c,dst=fa:16:3e:96:d9:20),eth_type(0x0800),ipv4(src=20.20.220.128/255.255.255.192,dst=10.35.0.0/255.255.128.0,proto=1,tos=0/0x3,ttl=64,frag=no),icmp(type=0/0,code=0/0), packets:372, bytes:59520, used:0.850s, offloaded:yes, dp:tc, actions:set(tunnel(tun_id=0x2,dst=10.10.121.103,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x30002}),flags(csum|key))),set(eth(src=fa:16:3e:74:20:44,dst=9c:cc:83:58:1c:60)),set(ipv4(ttl=63)),genev_sys_6081
ufid:0951ac83-ff60-4464-a216-5f52fde3307f, skb_priority(0/0),skb_mark(0/0),ct_state(0/0x21),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(enp4s0f0_9),packet_type(ns=0/0,id=0/0),eth(src=fa:16:3e:de:26:4c,dst=fa:16:3e:96:d9:20),eth_type(0x0800),ipv4(src=20.20.220.128/255.255.255.192,dst=32.0.0.0/224.0.0.0,proto=17,tos=0/0x3,ttl=64,frag=no),udp(src=0/0,dst=0/0x800), packets:1, bytes:152, used:3.070s, offloaded:yes, dp:tc, actions:set(tunnel(tun_id=0x2,dst=10.10.121.103,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x30002}),flags(csum|key))),set(eth(src=fa:16:3e:74:20:44,dst=9c:cc:83:58:1c:60)),set(ipv4(ttl=63)),genev_sys_6081
ufid:aa2a6075-74e8-4022-b3d6-9d3192cc880d, skb_priority(0/0),skb_mark(0/0),ct_state(0/0x21),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(enp4s0f0_2),packet_type(ns=0/0,id=0/0),eth(src=fa:16:3e:03:8d:eb,dst=fa:16:3e:96:d9:20),eth_type(0x0800),ipv4(src=20.20.220.128/255.255.255.192,dst=10.35.0.0/255.255.128.0,proto=1,tos=0/0x3,ttl=64,frag=no),icmp(type=0/0,code=0/0), packets:379, bytes:60640, used:0.850s, offloaded:yes, dp:tc, actions:set(tunnel(tun_id=0x2,dst=10.10.121.103,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x30002}),flags(csum|key))),set(eth(src=fa:16:3e:74:20:44,dst=9c:cc:83:58:1c:60)),set(ipv4(ttl=63)),genev_sys_6081
ufid:28fb9f44-4f87-4fe8-b2e6-6ee76847673a, skb_priority(0/0),tunnel(tun_id=0x3,src=10.10.121.103,dst=10.10.121.169,ttl=0/0,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x60005/0x7fffffff}),flags(+key)),skb_mark(0/0),ct_state(0/0x21),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(genev_sys_6081),packet_type(ns=0/0,id=0/0),eth(src=fa:16:3e:96:d9:20,dst=00:00:00:00:00:00/01:00:00:00:00:00),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=1,tos=0/0,ttl=0/0,frag=no),icmp(type=0/0,code=0/0), packets:379, bytes:37142, used:0.850s, offloaded:yes, dp:tc, actions:enp4s0f0_2
ufid:8bbf6429-398f-46d3-b719-5fbbf506d539, skb_priority(0/0),tunnel(tun_id=0x3,src=10.10.121.103,dst=10.10.121.169,ttl=0/0,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x60003/0x7fffffff}),flags(+key)),skb_mark(0/0),ct_state(0/0x21),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(genev_sys_6081),packet_type(ns=0/0,id=0/0),eth(src=fa:16:3e:96:d9:20,dst=00:00:00:00:00:00/01:00:00:00:00:00),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=1,tos=0/0,ttl=0/0,frag=no),icmp(type=0/0,code=0/0), packets:372, bytes:36456, used:0.850s, offloaded:yes, dp:tc, actions:enp4s0f0_9
ufid:d7a56785-6997-45ed-b162-ffe59dab9364, skb_priority(0/0),tunnel(tun_id=0x3,src=10.10.121.103,dst=10.10.121.169,ttl=0/0,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x60003/0x7fffffff}),flags(+key)),skb_mark(0/0),ct_state(0/0x21),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(genev_sys_6081),packet_type(ns=0/0,id=0/0),eth(src=fa:16:3e:96:d9:20,dst=00:00:00:00:00:00/01:00:00:00:00:00),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=17,tos=0/0,ttl=0/0,frag=no),udp(src=0/0,dst=32768/0x8000), packets:3, bytes:270, used:3.070s, offloaded:yes, dp:tc, actions:enp4s0f0_9
ufid:4e9f7708-d134-48fd-9eb0-ad9f646afb14, recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(enp6s0f0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),eth(src=d0:07:ca:34:e9:17,dst=01:00:5e:00:00:01),eth_type(0x8100),vlan(vid=124,pcp=0/0x0),encap(eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=0/0,tos=0/0,ttl=0/0,frag=no)), packets:0, bytes:0, used:never, dp:ovs, actions:userspace(pid=4183418373,slow_path(match))
ufid:e3a5a9ff-3f1d-4a5d-ac75-0a006da3b30e, recirc_id(0),dp_hash(0/0),skb_priority(0/0),tunnel(tun_id=0x0,src=10.10.121.172,dst=10.10.121.169,ttl=0/0,flags(-df+csum+key)),in_port(genev_sys_6081),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=17,tos=0/0,ttl=0/0,frag=no),udp(src=0/0,dst=3784), packets:931, bytes:61446, used:0.923s, dp:ovs, actions:userspace(pid=3978135798,slow_path(bfd))
ufid:b652e835-b598-4b30-8be1-379a4b94df21, recirc_id(0),dp_hash(0/0),skb_priority(0/0),tunnel(tun_id=0x0,src=10.10.121.131,dst=10.10.121.169,ttl=0/0,flags(-df+csum+key)),in_port(genev_sys_6081),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=17,tos=0/0,ttl=0/0,frag=no),udp(src=0/0,dst=3784), packets:931, bytes:61446, used:0.359s, dp:ovs, actions:userspace(pid=3978135798,slow_path(bfd))
ufid:6d0b12db-380d-4f64-81e1-bc1d6614025e, recirc_id(0),dp_hash(0/0),skb_priority(0/0),tunnel(tun_id=0x0,src=10.10.121.103,dst=10.10.121.169,ttl=0/0,flags(-df+csum+key)),in_port(genev_sys_6081),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=17,tos=0/0,ttl=0/0,frag=no),udp(src=0/0,dst=3784), packets:938, bytes:61908, used:0.207s, dp:ovs, actions:userspace(pid=3978135798,slow_path(bfd))
ufid:3f0eb711-5f01-46ca-91ed-bf5cd2a0cb80, recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(enp4s0f0_9),skb_mark(0/0),ct_state(0/0x21),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),eth(src=fa:16:3e:de:26:4c,dst=fa:16:3e:96:d9:20),eth_type(0x0806),arp(sip=20.20.220.171,tip=20.20.220.254,op=1/0xff,sha=fa:16:3e:de:26:4c,tha=00:00:00:00:00:00), packets:0, bytes:0, used:never, dp:ovs, actions:userspace(pid=2567040171,slow_path(action))
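In this healthy dump, the key observation is that the two receive flows match distinct geneve option values (0x60005 vs 0x60003) and deliver to distinct representors (enp4s0f0_2 vs enp4s0f0_9). That check can be done mechanically by parsing the flow lines; the regex below is a rough assumption about the textual dump format, good enough only as a sanity-check sketch.

```python
import re

# Extract (geneve option value, output port) pairs from dump-flows-style
# lines. The regex is an assumption about the dump's textual format.
FLOW = re.compile(
    r"geneve\(\{[^}]*?,(0x[0-9a-f]+)(?:/0x[0-9a-f]+)?\}\).*actions:.*?(\w+)$")

def opt_to_port(lines):
    mapping = {}
    for line in lines:
        m = FLOW.search(line)
        if m and line.lstrip().startswith("ufid") and "in_port(genev_sys" in line:
            mapping.setdefault(m.group(1), set()).add(m.group(2))
    return mapping

# Abbreviated stand-ins for the two receive flows shown above.
dump = [
    "ufid:28fb...,tunnel(...,geneve({class=0x102,type=0x80,len=4,0x60005/0x7fffffff}),...),in_port(genev_sys_6081),... actions:enp4s0f0_2",
    "ufid:8bbf...,tunnel(...,geneve({class=0x102,type=0x80,len=4,0x60003/0x7fffffff}),...),in_port(genev_sys_6081),... actions:enp4s0f0_9",
]
mapping = opt_to_port(dump)
# Healthy: different option values steer to different VFs.
assert mapping == {"0x60005": {"enp4s0f0_2"}, "0x60003": {"enp4s0f0_9"}}
```

In the broken case at the top of this bug, both option values would map to the same representor, which is exactly what this kind of check would flag.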
So, do we have any idea between which versions/commits we should do the bisect?

Marcelo, do you have any idea for comment #81?

According to comment #70, it DID NOT work with 8.5 kernel 4.18.0-348.4.el8. That kernel has the driver rebased to v5.12 as per https://bugzilla.redhat.com/show_bug.cgi?id=1915308.

It also has tc rebased to "latest upstream" (fuzzy) by https://bugzilla.redhat.com/show_bug.cgi?id=1946986, which seems to be v5.13.

I don't see any net/openvswitch changes between 8.5 and current net-next, 89f971182417cb27abd82cfc48a7f36b99352ddc.

Comment #80 says it worked with v5.15.

With that, I'm thinking the haystack we're looking for this needle in is v5.12..v5.15.

And then, while checking the driver diff between 8.5 and 89f971182417cb27abd82cfc48a7f36b99352ddc, I noticed this commit:

$ git show 3442e0335e70f348728c17bca924ec507ad6358a
commit 3442e0335e70f348728c17bca924ec507ad6358a
Author: Yevgeny Kliteynik <kliteyn>
Date: Sun Feb 7 04:27:48 2021 +0200

    net/mlx5: DR, Add support for matching on geneve TLV option

    Enable matching on tunnel geneve TLV option using the flex parser.

Well, that's precisely what is being done here. The commit has:

@@ -360,10 +365,14 @@ static int dr_matcher_set_ste_builders(struct mlx5dr_matcher *matcher,
	if (dr_mask_is_tnl_vxlan_gpe(&mask, dmn))
		mlx5dr_ste_build_tnl_vxlan_gpe(ste_ctx, &sb[idx++],
					       &mask, inner, rx);
-	else if (dr_mask_is_tnl_geneve(&mask, dmn))
+	else if (dr_mask_is_tnl_geneve(&mask, dmn)) {
		mlx5dr_ste_build_tnl_geneve(ste_ctx, &sb[idx++],
					    &mask, inner, rx);
-
+		if (dr_mask_is_tnl_geneve_tlv_opt(&mask.misc3))
+			mlx5dr_ste_build_tnl_geneve_tlv_opt(ste_ctx, &sb[idx++],
							    &mask, &dmn->info.caps,
							    inner, rx);
+	}
	if (DR_MASK_IS_ETH_L4_MISC_SET(mask.misc3, outer))
		mlx5dr_ste_build_eth_l4_misc(ste_ctx, &sb[idx++],
					     &mask, inner, rx);

This is too deep in the driver for me now, but apparently up to this commit it was ignoring this part of the information.
This commit is on DR (direct routing, AKA software steering), which OSP is using.

Checking the patchset that introduced this commit, the cover letter mentions:
https://lore.kernel.org/netdev/20210420032018.58639-1-saeed%40kernel.org/T/

"""
3) Dynamic Flex parser support:
   Flex parser is a HW parser that can support protocols that are not
   natively supported by the HCA, such as Geneve (TLV options) and GTP-U.
   There are 8 such parsers, and each of them can be assigned to parse a
   specific set of protocols.

4) Enable matching on Geneve TLV options
"""

With the wording of #4, apparently that's the case.

The patch that we attempted earlier, 929a2faddd55 ("net/mlx5e: Consider geneve_opts for encap contexts"), AFAICT now, is meant for tx, right? While 3442e0335e70 ("net/mlx5: DR, Add support for matching on geneve TLV option") is on the rx side, and we would need both. Note that the test here is failing by delivering the packets from the wire to the wrong VF, which would be 'rx' in my wording here.

Depending on Nvidia's review now, perhaps we can narrow down that v5.12~v5.15 range further. Ariel, thoughts? Any other test that we can do?

(In reply to Marcelo Ricardo Leitner from comment #83)
> $ git show 3442e0335e70f348728c17bca924ec507ad6358a
> commit 3442e0335e70f348728c17bca924ec507ad6358a
> Author: Yevgeny Kliteynik <kliteyn>
> Date: Sun Feb 7 04:27:48 2021 +0200
>
> net/mlx5: DR, Add support for matching on geneve TLV option
>
> Enable matching on tunnel geneve TLV option using the flex parser.

This commit is probably slated for 8.6, via https://bugzilla.redhat.com/show_bug.cgi?id=1982191. But Alaa/Amir will know better. Not to be backported to 8.4, and it will therefore be release-noted.

(In reply to Marcelo Ricardo Leitner from comment #83)
> According to comment #70, it DID NOT work with 8.5 kernel 4.18.0-348.4.el8.
> That kernel has driver rebased to v5.12 as per
> https://bugzilla.redhat.com/show_bug.cgi?id=1915308.
I think we have a bingo. Nice catch, Marcelo. This is indeed affecting matching on geneve headers on the RX path. Looks like u have a valid test for the RX fix already. To validate the TX side, we need to try and send traffic with different geneve options (but the same tunnel IPs) from the same host and see that the different flows indeed have different options.

Thanks Ariel. With that, Amir, can we have an 8.4.z test kernel with this fix/series as well, please? Thanks.

(In reply to Marcelo Ricardo Leitner from comment #87)
> With that, Amir, can we have an 8.4.z test kernel with this fix/series as
> well, please?

Marcelo, we need to confirm that the repro u have is with SW steering, because the patch you pointed out is relevant only to that mode.
Ariel

Right. I thought I had asked folks that already, but if I did, I don't know where.
:-}
Yariv, Haresh, can you please confirm Ariel's question in comment #88?
Thanks,
Marcelo

Hi Ariel, Marcelo,
The steering mode in OSP (16.1.3 onwards) is smfs.
Thanks

And the test was using 16.2, ok. Thanks Haresh.

Ariel, should we assume that the change to smfs is unlikely to fail? Because if it did fail, the host would be using dmfs; but in a normal run I have never seen the change to smfs fail. Asking because AFAIK OSP ignores the failure and continues with dmfs.

As long as it is set while in legacy mode (not switchdev), it is not likely to fail.

With all the above, my understanding is that we can safely assume the host was using smfs at that moment. Please speak up if you (anyone) don't agree. :-)

Hi Miguel,
From comment #48, this issue is only related to ML2/OVN (and thus geneve); can you please update the bug summary and remove "ovs"?
Thanks

Considering the patch from comment #86 is already present in the 9.0 beta, and we agreed today not to backport this to RHEL 8 unless requested by a customer (well, or via a general driver update), we're good from the RHEL side on this issue.

Haresh, considering the above, what should we do with this bz then?

*** Bug 2014183 has been marked as a duplicate of this bug. ***

Raising severity due to the AutomationBlocker keyword
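As a side note on the smfs/dmfs discussion: the active steering mode can be checked directly on the host with devlink, since `flow_steering_mode` is a standard mlx5 devlink parameter. This is a diagnostic fragment, not part of the reproducer; the PCI address is a placeholder for the NIC in question.

```shell
# Show which steering mode the mlx5 device is currently using
# (PCI address is a placeholder; adjust for the actual NIC).
devlink dev param show pci/0000:04:00.0 name flow_steering_mode

# Switch to software (smfs) steering at runtime; per the discussion above,
# this is expected to succeed while the device is in legacy mode, not switchdev.
devlink dev param set pci/0000:04:00.0 name flow_steering_mode \
    value smfs cmode runtime
```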
I have 2 vms in each compute and I can ping all of them (overcloud) [stack@undercloud-0 ~]$ openstack server list --all-projects --long +--------------------------------------+------------------------------------------+--------+------------+-------------+------------------------------------------------------+----------------------------------------------+--------------------------------------+--------------------+-------------------+-----------------------------------+------------+ | ID | Name | Status | Task State | Power State | Networks | Image Name | Image ID | Flavor | Availability Zone | Host | Properties | +--------------------------------------+------------------------------------------+--------+------------+-------------+------------------------------------------------------+----------------------------------------------+--------------------------------------+--------------------+-------------------+-----------------------------------+------------+ | 9c4b9a42-ad31-4e4c-af8b-cdf6f616ef84 | tempest-TestNfvOffload-server-1581528699 | ACTIVE | None | Running | mellanox-geneve-provider=10.46.228.40, 20.20.220.178 | rhel-guest-image-nfv-2-8.7-1660.x86_64.qcow2 | c4d2c2e7-536e-4aa8-b3b7-e8f0f6d0cc90 | nfv_qe_ag_flavor_1 | nova | computehwoffload-r740.localdomain | | | 384e3861-fe40-4b32-a3e5-5df6bba27d08 | tempest-TestNfvOffload-server-123555249 | ACTIVE | None | Running | mellanox-geneve-provider=10.46.228.39, 20.20.220.149 | rhel-guest-image-nfv-2-8.7-1660.x86_64.qcow2 | c4d2c2e7-536e-4aa8-b3b7-e8f0f6d0cc90 | nfv_qe_ag_flavor_0 | nova | computehwoffload-r730.localdomain | | | fdc3a9a1-1271-46db-b6e6-1e69adac4944 | tempest-TestNfvOffload-server-1714201295 | ACTIVE | None | Running | mellanox-geneve-provider=10.46.228.35, 20.20.220.118 | rhel-guest-image-nfv-2-8.7-1660.x86_64.qcow2 | c4d2c2e7-536e-4aa8-b3b7-e8f0f6d0cc90 | nfv_qe_ag_flavor_1 | nova | computehwoffload-r740.localdomain | | | cb50f5de-7a1c-4ead-a7cc-2b5d12cc5f03 | 
tempest-TestNfvOffload-server-514728000 | ACTIVE | None | Running | mellanox-geneve-provider=10.46.228.34, 20.20.220.140 | rhel-guest-image-nfv-2-8.7-1660.x86_64.qcow2 | c4d2c2e7-536e-4aa8-b3b7-e8f0f6d0cc90 | nfv_qe_ag_flavor_0 | nova | computehwoffload-r730.localdomain | | +--------------------------------------+------------------------------------------+--------+------------+-------------+------------------------------------------------------+----------------------------------------------+--------------------------------------+--------------------+-------------------+-----------------------------------+------------+ (overcloud) [stack@undercloud-0 ~]$ ping -w 1 -c 1 10.46.228.40 PING 10.46.228.40 (10.46.228.40) 56(84) bytes of data. 64 bytes from 10.46.228.40: icmp_seq=1 ttl=61 time=7.34 ms --- 10.46.228.40 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 7.343/7.343/7.343/0.000 ms (overcloud) [stack@undercloud-0 ~]$ ping -w 1 -c 1 10.46.228.39 PING 10.46.228.39 (10.46.228.39) 56(84) bytes of data. 64 bytes from 10.46.228.39: icmp_seq=1 ttl=61 time=7.64 ms --- 10.46.228.39 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 7.637/7.637/7.637/0.000 ms (overcloud) [stack@undercloud-0 ~]$ ping -w 1 -c 1 10.46.228.35 PING 10.46.228.35 (10.46.228.35) 56(84) bytes of data. 64 bytes from 10.46.228.35: icmp_seq=1 ttl=61 time=5.51 ms --- 10.46.228.35 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 5.510/5.510/5.510/0.000 ms (overcloud) [stack@undercloud-0 ~]$ ping -w 1 -c 1 10.46.228.34 PING 10.46.228.34 (10.46.228.34) 56(84) bytes of data. 
64 bytes from 10.46.228.34: icmp_seq=1 ttl=61 time=2.19 ms

--- 10.46.228.34 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 2.189/2.189/2.189/0.000 ms

(overcloud) [stack@undercloud-0 ~]$ cat core_puddle_version
RHOS-17.1-RHEL-9-20230613.n.1