Created attachment 1457749
karaf logs, data dumps, ovs logs, flows, sosreport

Description of problem:
Ping to Google (8.8.8.8) from the network namespaces fails on some controller nodes.

Version-Release number of selected component (if applicable):
OSP13, Oxygen

How reproducible:
Always

Steps to Reproduce:
Create 10 VxLAN networks, 10 routers.
Attach each network to each router.
Attach all routers to public network.
Log into each overcloud controller node and ping Google.
sudo ip netns exec qdhcp-<ns> ping -c4 8.8.8.8

Actual results:
All namespaces can ping from controller-0, some from controller-1 and none from controller-2.

Expected results:
All namespaces on all controller nodes should be able to ping.

Additional info:
Initial analysis shows the group from table 43 is missing.
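To repeat the ping step across every DHCP namespace on a controller, something like the following sketch could be used. It assumes `ip netns list` prints one namespace per line with the name as the first field, and that the caller has sudo rights. The function names are hypothetical and were not part of the original reproduction.

```python
import subprocess


def qdhcp_namespaces(netns_list_output):
    """Return the qdhcp-* namespace names found in `ip netns list` output."""
    names = []
    for line in netns_list_output.splitlines():
        parts = line.split()
        # The namespace name is the first field; an "(id: N)" suffix may follow.
        if parts and parts[0].startswith("qdhcp-"):
            names.append(parts[0])
    return names


def ping_from_namespace(ns, target="8.8.8.8", count=4):
    """Run the ping from the report inside one namespace; True on success."""
    cmd = ["sudo", "ip", "netns", "exec", ns, "ping", f"-c{count}", target]
    return subprocess.run(cmd, capture_output=True).returncode == 0


def check_all(target="8.8.8.8"):
    """Map each qdhcp namespace on this node to its ping result."""
    listing = subprocess.run(
        ["ip", "netns", "list"], capture_output=True, text=True, check=True
    ).stdout
    return {ns: ping_from_namespace(ns, target) for ns in qdhcp_namespaces(listing)}
```

Running `check_all()` on each controller in turn would give a per-namespace pass/fail map that can be compared between controller-0, controller-1 and controller-2.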
Janki, can you reproduce it with ping from within a VM to an external IP?
I had a look at the logs, and the issue is some missing flows in Table 43 on the Controller node.

Normally, in a working deployment, there are three flows in Table 43, as shown below.

cookie=0x822002d, duration=64494.775s, table=43, n_packets=13611, n_bytes=571662, priority=100,arp,arp_op=1 actions=group:5500
cookie=0x822002e, duration=64494.775s, table=43, n_packets=1069, n_bytes=44898, priority=100,arp,arp_op=2 actions=CONTROLLER:65535,resubmit(,48)
cookie=0x8220000, duration=64494.868s, table=43, n_packets=51451, n_bytes=5769354, priority=0 actions=goto_table:48

On Controller2, I could see only a single flow; the other two flows are missing.

cookie=0x822002e, duration=64493.993s, table=43, n_packets=1069, n_bytes=44898, priority=100,arp,arp_op=2 actions=CONTROLLER:65535,resubmit(,48)

Interestingly, the missing flows are present in the config datastore.

This looks similar to an issue we observed earlier, for which we added a workaround that resets the manager/controller on the OVS switch after installation. However, the workaround decides whether to reset by checking for the presence or absence of a flow in a table. That logic works when all the flows in the table are missing, but here only a few flows from the service are missing, so the workaround was not applied in this case.
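The difference between an "any flow present" check and an "all expected flows present" check can be sketched as follows. This is not the code of the actual workaround. It is an illustration that assumes the input is `ovs-ofctl dump-flows` style text like the flows quoted above, and that the three cookies listed there identify the expected table 43 flows. All function names are hypothetical.

```python
import re

# Cookies of the three table 43 flows quoted from the working deployment.
EXPECTED_TABLE_43 = {"0x822002d", "0x822002e", "0x8220000"}

# Matches "cookie=0x..., ... table=N," at the start of a dump-flows line.
FLOW_RE = re.compile(r"cookie=(0x[0-9a-fA-F]+),.*?\btable=(\d+),")


def cookies_in_table(dump_text, table):
    """Collect the cookies of all flows that belong to the given table."""
    found = set()
    for line in dump_text.splitlines():
        match = FLOW_RE.search(line)
        if match and int(match.group(2)) == table:
            found.add(match.group(1).lower())
    return found


def any_flow_present(dump_text, table):
    """Presence check: true as soon as one flow exists in the table."""
    return bool(cookies_in_table(dump_text, table))


def missing_cookies(dump_text, table, expected):
    """Stricter check: which expected flows are absent from the table."""
    return expected - cookies_in_table(dump_text, table)
```

With the single Controller2 flow as input, `any_flow_present` is true, so a presence-only check sees nothing wrong, while `missing_cookies` reports the two absent flows.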
[sgaddam@dhcp-0-56 odl_dumps]$ grep -nri "id\": \"arp.check.table.43.arp.request" *
config___opendaylight-inventory__nodes.json:9977: "id": "arp.check.table.43.arp.request",
config___opendaylight-inventory__nodes.json:22583: "id": "arp.check.table.43.arp.request",
config___opendaylight-inventory__nodes.json:28829: "id": "arp.check.table.43.arp.request",
config___opendaylight-inventory__nodes.json:41864: "id": "arp.check.table.43.arp.request",
config___opendaylight-inventory__nodes.json:61342: "id": "arp.check.table.43.arp.request",
operational___opendaylight-inventory__nodes.json:29299: "id": "arp.check.table.43.arp.request",
operational___opendaylight-inventory__nodes.json:38334: "id": "arp.check.table.43.arp.request",
operational___opendaylight-inventory__nodes.json:58313: "id": "arp.check.table.43.arp.request",
operational___opendaylight-inventory__nodes.json:83399: "id": "arp.check.table.43.arp.request",
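A quick way to compare the two datastores is to count the grep hits per dump file. The sketch below assumes the `file:line: match` output format of `grep -n` shown above and the `config___` / `operational___` file name prefixes of these dumps. The helper name is hypothetical, and the sample keeps only the file name and line number of each hit.

```python
from collections import Counter

# File name and line number of each grep hit from the comment above.
SAMPLE = """\
config___opendaylight-inventory__nodes.json:9977:
config___opendaylight-inventory__nodes.json:22583:
config___opendaylight-inventory__nodes.json:28829:
config___opendaylight-inventory__nodes.json:41864:
config___opendaylight-inventory__nodes.json:61342:
operational___opendaylight-inventory__nodes.json:29299:
operational___opendaylight-inventory__nodes.json:38334:
operational___opendaylight-inventory__nodes.json:58313:
operational___opendaylight-inventory__nodes.json:83399:
"""


def hits_per_datastore(grep_output):
    """Count grep hits per datastore, keyed by the dump file name prefix."""
    counts = Counter()
    for line in grep_output.splitlines():
        filename, sep, _ = line.partition(":")
        if not sep:
            continue  # not a "file:line:..." hit, e.g. a blank line
        datastore = filename.strip().split("___", 1)[0]
        counts[datastore] += 1
    return counts


print(hits_per_datastore(SAMPLE))
```

For the output above this gives five hits in the config dump and four in the operational dump, that is, one fewer ARP request flow in operational than in config.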
As discussed @Janki, please see if we can reproduce this issue with the latest rpm.
I haven't seen this in the last 35 iterations of a stability test run.

The issue is a missing flow. The workaround we have checks whether *any* flow is present, NOT whether *all* flows are present. Tweaking it is not a good solution. I also talked with Sridhar. This is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1588115.

*** This bug has been marked as a duplicate of bug 1588115 ***