Bug 1599647 - [Netvirt] Ping to Google from n/w namespace fails for some n/s on some controller nodes
Keywords:
Status: CLOSED DUPLICATE of bug 1588115
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: opendaylight
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z4
Target Release: 13.0 (Queens)
Assignee: Sridhar Gaddam
QA Contact: Noam Manos
URL:
Whiteboard: Netvirt
Depends On:
Blocks:
Reported: 2018-07-10 09:46 UTC by Janki
Modified: 2018-10-24 12:37 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
N/A
Last Closed: 2018-10-17 08:07:55 UTC
Target Upstream Version:
Embargoed:


Attachments
karaf logs, data dumps, ovs logs, flows, sosreport (1.87 MB, application/x-xz)
2018-07-10 09:46 UTC, Janki

Description Janki 2018-07-10 09:46:53 UTC
Created attachment 1457749
karaf logs, data dumps, ovs logs, flows, sosreport

Description of problem:
Pinging Google (8.8.8.8) from network namespaces fails on some controller nodes.

Version-Release number of selected component (if applicable):
OSP 13, OpenDaylight Oxygen

How reproducible:
Always

Steps to Reproduce:
1. Create 10 VXLAN networks and 10 routers.
2. Attach each network to its own router.
3. Attach all routers to the public network.
4. Log in to each overcloud controller node and ping Google from each DHCP namespace:

sudo ip netns exec qdhcp-<ns> ping -c4 8.8.8.8
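Repeating this command by hand for every namespace on three controllers is tedious. The loop below is a minimal sketch of the same check, assuming `ip netns list` prints one namespace per line (newer iproute2 versions may append an `(id: N)` suffix) and that the user can run `sudo` non-interactively. `qdhcp_namespaces`, `ping_cmd`, and `run_all` are hypothetical helpers, not part of any OpenStack or OpenDaylight tool.

```python
import subprocess


def qdhcp_namespaces(netns_output: str) -> list[str]:
    """Return the qdhcp-* namespace names from `ip netns list` output.

    Each line starts with the namespace name; an optional "(id: N)"
    suffix printed by newer iproute2 versions is ignored.
    """
    names = []
    for line in netns_output.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("qdhcp-"):
            names.append(fields[0])
    return names


def ping_cmd(namespace: str, target: str = "8.8.8.8", count: int = 4) -> list[str]:
    """Build the reproduction command as an argv list (no shell quoting needed)."""
    return ["sudo", "ip", "netns", "exec", namespace,
            "ping", "-c", str(count), target]


def run_all(target: str = "8.8.8.8") -> list[str]:
    """Ping target from every qdhcp namespace; return the namespaces that failed."""
    listing = subprocess.run(["sudo", "ip", "netns", "list"],
                             capture_output=True, text=True, check=True).stdout
    failed = []
    for ns in qdhcp_namespaces(listing):
        result = subprocess.run(ping_cmd(ns, target), capture_output=True, text=True)
        # ping exits non-zero when no reply is received or on error.
        ok = result.returncode == 0
        print(f"{'OK  ' if ok else 'FAIL'} {ns}")
        if not ok:
            failed.append(ns)
    return failed
```

Running `failed = run_all()` on each controller would list the namespaces that cannot reach 8.8.8.8, which makes the per-controller comparison under "Actual results" easy to repeat.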

Actual results:
Pings succeed from all namespaces on controller-0, from some namespaces on controller-1, and from none on controller-2.

Expected results:
All namespaces on all controller nodes should be able to ping 8.8.8.8.

Additional info:
Initial analysis shows that the group used by table 43 is missing.

Comment 1 Mike Kolesnik 2018-07-12 08:38:14 UTC
Janki, can you reproduce this by pinging an external IP from within a VM?

Comment 4 Sridhar Gaddam 2018-08-01 12:11:36 UTC
I had a look at the logs, and the issue is that some flows are missing from Table 43 on a controller node.

In a working deployment, Table 43 normally contains these three flows:
 cookie=0x822002d, duration=64494.775s, table=43, n_packets=13611, n_bytes=571662, priority=100,arp,arp_op=1 actions=group:5500
 cookie=0x822002e, duration=64494.775s, table=43, n_packets=1069, n_bytes=44898, priority=100,arp,arp_op=2 actions=CONTROLLER:65535,resubmit(,48)
 cookie=0x8220000, duration=64494.868s, table=43, n_packets=51451, n_bytes=5769354, priority=0 actions=goto_table:48

On Controller2, only one of these flows is present; the other two are missing:
 cookie=0x822002e, duration=64493.993s, table=43, n_packets=1069, n_bytes=44898, priority=100,arp,arp_op=2 actions=CONTROLLER:65535,resubmit(,48)
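Comparing the two dumps by eye works for three flows but not for a full pipeline. A minimal sketch of an automated comparison, assuming the single-line text format printed by `ovs-ofctl dump-flows` shown here: each flow is reduced to its table, priority, match fields, and actions, while counters, timers, and the cookie are ignored. `flow_key` and `missing_flows` are hypothetical helpers.

```python
# Fields that are not compared: counters and timers change over time, and
# the cookie is skipped so only table, priority, match, and actions count.
IGNORED = {"cookie", "duration", "n_packets", "n_bytes", "idle_age", "hard_age"}


def flow_key(line: str) -> tuple[str, str, tuple[str, ...], str]:
    """Reduce one dump-flows line to (table, priority, match fields, actions)."""
    # Split off actions first, since the action list itself contains commas.
    head, sep, actions = line.strip().partition(" actions=")
    if not sep:
        raise ValueError(f"no actions= in flow: {line!r}")
    table = priority = ""
    match = []
    for token in (t.strip() for t in head.split(",")):
        name, _, value = token.partition("=")
        if name in IGNORED:
            continue
        if name == "table":
            table = value
        elif name == "priority":
            priority = value
        else:
            match.append(token)  # e.g. "arp" or "arp_op=1"
    return table, priority, tuple(sorted(match)), actions


def missing_flows(expected: list[str], observed: list[str]) -> list[str]:
    """Return the expected flow lines with no equivalent in observed."""
    seen = {flow_key(line) for line in observed}
    return [line for line in expected if flow_key(line) not in seen]
```

Feeding the three flows from the working node as `expected` and the Controller2 dump as `observed` should return the `arp_op=1` flow (to `group:5500`) and the `priority=0` default flow.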

Interestingly, the missing flows are present in the config datastore. This looks similar to an issue we observed earlier, for which we added a workaround that resets the manager/controller on the OVS switch after installation.
However, the workaround decides whether to reset based on the presence or absence of a flow in a table.
That logic works when all the flows in the table are missing, but here only a few of the service's flows are missing, so the workaround was not applied.
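The gap described above can be illustrated with set operations. This is a sketch only: the workaround's actual code is not shown in this bug, and `needs_reset_any` and `needs_reset_all` are hypothetical names. The first models the behaviour described here (reset only when none of the expected flows is present); the second is a stricter check that resets whenever any expected flow is absent. Only `arp.check.table.43.arp.request` comes from the datastore dumps; the other flow IDs are hypothetical placeholders.

```python
# Expected Table 43 flow IDs. Only the first ID appears in this bug's dumps;
# the other two are hypothetical placeholders.
TABLE43_FLOWS = {
    "arp.check.table.43.arp.request",
    "arp.check.table.43.arp.response",  # hypothetical
    "table.43.default",                 # hypothetical
}


def needs_reset_any(expected: set[str], present: set[str]) -> bool:
    """Workaround as described: reset only if no expected flow is present."""
    return expected.isdisjoint(present)


def needs_reset_all(expected: set[str], present: set[str]) -> bool:
    """Stricter check: reset if at least one expected flow is missing."""
    return not expected <= present
```

With one of the three flows present, as on Controller2, `needs_reset_any` returns `False` (no reset), while `needs_reset_all` returns `True`. When all flows are missing, both return `True`, which is the case the workaround was written for.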

[sgaddam@dhcp-0-56 odl_dumps]$ grep -nri "id\": \"arp.check.table.43.arp.request" *
config___opendaylight-inventory__nodes.json:9977:                                "id": "arp.check.table.43.arp.request",
config___opendaylight-inventory__nodes.json:22583:                                "id": "arp.check.table.43.arp.request",
config___opendaylight-inventory__nodes.json:28829:                                "id": "arp.check.table.43.arp.request",
config___opendaylight-inventory__nodes.json:41864:                                "id": "arp.check.table.43.arp.request",
config___opendaylight-inventory__nodes.json:61342:                                "id": "arp.check.table.43.arp.request",
operational___opendaylight-inventory__nodes.json:29299:                                "id": "arp.check.table.43.arp.request",
operational___opendaylight-inventory__nodes.json:38334:                                "id": "arp.check.table.43.arp.request",
operational___opendaylight-inventory__nodes.json:58313:                                "id": "arp.check.table.43.arp.request",
operational___opendaylight-inventory__nodes.json:83399:                                "id": "arp.check.table.43.arp.request",
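The grep output shows `arp.check.table.43.arp.request` five times in the config datastore dump but only four times in the operational one, so one node has the flow configured but not installed. A minimal sketch of the same comparison on parsed JSON instead of raw text. It assumes only that each flow appears as a JSON object whose `"id"` field holds the flow ID; no other part of the OpenDaylight inventory schema is assumed. `count_flow_ids` and `compare_datastores` are hypothetical helpers.

```python
import json
from typing import Any


def count_flow_ids(doc: Any, flow_id: str) -> int:
    """Recursively count JSON objects anywhere in doc whose "id" equals flow_id."""
    if isinstance(doc, dict):
        here = 1 if doc.get("id") == flow_id else 0
        return here + sum(count_flow_ids(v, flow_id) for v in doc.values())
    if isinstance(doc, list):
        return sum(count_flow_ids(v, flow_id) for v in doc)
    return 0


def compare_datastores(config_path: str, operational_path: str,
                       flow_id: str) -> tuple[int, int]:
    """Return (config count, operational count) for flow_id."""
    with open(config_path) as f:
        config = json.load(f)
    with open(operational_path) as f:
        operational = json.load(f)
    return count_flow_ids(config, flow_id), count_flow_ids(operational, flow_id)
```

Called with `config___opendaylight-inventory__nodes.json`, `operational___opendaylight-inventory__nodes.json`, and `"arp.check.table.43.arp.request"`, this should mirror the grep counts if every grep match is a flow object's ID. Unlike the case-insensitive regex grep, it matches the ID exactly.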

Comment 6 Sridhar Gaddam 2018-08-28 05:36:43 UTC
As discussed, @Janki, please check whether this issue reproduces with the latest RPM.

Comment 7 Janki 2018-10-17 08:07:55 UTC
I haven't seen this in the last 35 iterations of a stability test run. The issue is a missing flow. We have a workaround that checks whether *any* flows, rather than *all* flows, are present. Tweaking it is not a good solution. I also talked with Sridhar: this is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1588115.

*** This bug has been marked as a duplicate of bug 1588115 ***

