Bug 2133457

Summary: [16.2][OVS] [IPv6] icmpv6 is unreachable for short time after reboot overcloud
Product: Red Hat OpenStack Reporter: Fiorella Yanac <fyanac>
Component: openstack-neutron Assignee: Miro Tomaska <mtomaska>
Status: ASSIGNED --- QA Contact: Eran Kuris <ekuris>
Severity: high Docs Contact:
Priority: high    
Version: 16.2 (Train) CC: averdagu, bcafarel, chrisw, echaudro, ekuris, eolivare, mpattric, mtomaska, scohen, tfreger
Target Milestone: z10 Keywords: Automation, Triaged
Target Release: 16.2 (Train on RHEL 8.4) Flags: echaudro: needinfo-
mtomaska: needinfo? (mpattric)
echaudro: needinfo? (mpattric)
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2130394    
Bug Blocks:    

Comment 2 Mike Pattrick 2022-11-14 21:37:20 UTC
@echaudro This appears to be similar to bz2130394. The kernel ends up with an incorrect flow entry, and using dpctl to delete it allows the test to pass. Also noticed that the duplicate_upcall counter increases.

Comment 3 Eelco Chaudron 2022-11-15 08:44:32 UTC
(In reply to Michael Pattrick from comment #2)
> @echaudro This appears to be similar to bz2130394. The kernel
> ends up with an incorrect flow entry, and using dpctl to delete it allows
> the test to pass. Also noticed that the duplicate_upcall counter increases.

This is odd, as this would only happen if there is already traffic when the system reboots (restarts) and that traffic hits OVS right between adding the bridge and configuring the controller. But I guess it is still possible.

Mike, can you try the patched kernel to confirm this is the same issue?

Comment 4 Mike Pattrick 2022-11-15 16:59:54 UTC
We've run the test again with the patched kernel, and the issue remains.

Some additional information: when running tcpdump on interface qvo430839ba-85, we see this packet:

> ethertype IPv6 (0x86dd), length 86: 2001:db8::f816:3eff:fead:c7ee > 2001:db8::f816:3eff:fe29:2e85: ICMP6, neighbor advertisement, tgt is 2001:db8::f816:3eff:fead:c7ee, length 32

We have the following flows installed:

> duration=485.560s, table=0, n_packets=404, n_bytes=34232, priority=10,icmp6,in_port="qvo430839ba-85",icmp_type=136 actions=resubmit(,24)
> duration=485.562s, table=24, n_packets=0, n_bytes=0, priority=2,icmp6,in_port="qvo430839ba-85",icmp_type=136,nd_target=2001:db8::f816:3eff:fead:c7ee actions=resubmit(,60)
> duration=910.457s, table=24, n_packets=287, n_bytes=24298, idle_age=0, priority=0 actions=drop

The second entry should match, but we see zero packets on that one. We have the following flow installed in the kernel:

> recirc_id(0),skb_priority(0),in_port(qvo430839ba-85),eth(),eth_type(0x86dd),ipv6(proto=58,frag=no),icmpv6(type=136), packets:385, bytes:32630, used:0.280s, actions:drop

So the kernel has a flow installed for the drop, even though the resubmit flow has a higher priority and a more specific match.
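
To make the expectation above concrete, here is a minimal Python sketch of priority-based lookup over the two table=24 entries quoted in this comment. It is a toy model, not the OVS classifier: the `Flow` class, the `lookup` helper, and the dictionary packet representation are all hypothetical names introduced for illustration, and the match is simplified to exact field equality.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Flow:
    """Hypothetical, simplified stand-in for an OpenFlow table entry."""
    priority: int
    actions: str
    # field name -> required value; an empty dict matches every packet
    match: dict = field(default_factory=dict, hash=False)


def lookup(table, packet):
    """Return the highest-priority flow whose match fields all equal the packet's."""
    candidates = [
        flow for flow in table
        if all(packet.get(key) == value for key, value in flow.match.items())
    ]
    return max(candidates, key=lambda flow: flow.priority, default=None)


# The two table=24 flows from the dump above (counters omitted).
TABLE_24 = [
    Flow(
        priority=2,
        actions="resubmit(,60)",
        match={
            "in_port": "qvo430839ba-85",
            "icmp_type": 136,
            "nd_target": "2001:db8::f816:3eff:fead:c7ee",
        },
    ),
    Flow(priority=0, actions="drop"),
]

# The neighbor advertisement seen in the tcpdump output above.
NA_PACKET = {
    "in_port": "qvo430839ba-85",
    "icmp_type": 136,
    "nd_target": "2001:db8::f816:3eff:fead:c7ee",
}

chosen = lookup(TABLE_24, NA_PACKET)
print(chosen.actions)
```

Under this model the advertisement selects the priority=2 entry, and only a packet with a different nd_target falls through to the priority=0 drop. That is why a kernel flow that drops every icmpv6 type=136 packet on this port, without matching on the ND target, looks wrong.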

As noted above, when the offending flow is cleared from the kernel, this test passes immediately.
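
As a rough aid for spotting such an entry, the following Python sketch scans datapath flow dump text for neighbor advertisement flows whose only action is drop. It assumes the dump lines have the same textual shape as the kernel flow quoted above; the function name `find_dropped_na_flows` is hypothetical, and the exact dpctl invocation used to delete the flow is not given in this thread, so none is shown.

```python
import re

# Sample input: the kernel flow quoted earlier in this comment.
DUMP = (
    "recirc_id(0),skb_priority(0),in_port(qvo430839ba-85),eth(),"
    "eth_type(0x86dd),ipv6(proto=58,frag=no),icmpv6(type=136), "
    "packets:385, bytes:32630, used:0.280s, actions:drop"
)


def find_dropped_na_flows(dump_text, icmp_type=136):
    """Return the match part of each flow for the given ICMPv6 type that drops."""
    hits = []
    for line in dump_text.splitlines():
        line = line.strip()
        if f"icmpv6(type={icmp_type})" not in line:
            continue
        action = re.search(r"actions:(\S+)$", line)
        if action and action.group(1) == "drop":
            # Everything before the statistics is the flow's match specification.
            hits.append(line.split(", packets:", 1)[0])
    return hits


for match_spec in find_dropped_na_flows(DUMP):
    print(match_spec)
```

The returned match specification is the part of the line that identifies the flow, so it is the piece one would compare against the OpenFlow tables before deciding to clear the entry.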

Comment 5 Toni Freger 2023-03-07 08:31:06 UTC
(In reply to Eelco Chaudron from comment #3)
> (In reply to Michael Pattrick from comment #2)
> > @echaudro This appears to be similar to bz2130394. The kernel
> > ends up with an incorrect flow entry, and using dpctl to delete it allows
> > the test to pass. Also noticed that the duplicate_upcall counter increases.
> 
> This is odd as this would only happen if there is already traffic when the
> system reboots (restarts) and traffic hits OVS at right between adding the
> bridge and configuring the controller. But I guess still possible. 
> 
> Mike can you try the patched kernel to confirm this is the same issue?

@fyanac, @eolivare: can you please check whether this type of coverage is missing in tobiko? If it is, please review and decide whether the neutron team should add it to the backlog for automation coverage. Thanks!

Comment 14 Eran Kuris 2023-06-20 09:22:59 UTC
Updating the flags and removing the blocks, as these are not persistent failures.