Bug 1654796

Summary: [Netvirt][CI] vpn csit test "Check L3_Datapath Traffic Across Networks With Router" failing, flows missing in config datastore
Product: Red Hat OpenStack Reporter: Waldemar Znoinski <wznoinsk>
Component: opendaylight Assignee: Vishal Thapar <vthapar>
Status: CLOSED WONTFIX QA Contact: Noam Manos <nmanos>
Severity: medium Docs Contact:
Priority: medium    
Version: 14.0 (Rocky) CC: mkolesni
Target Milestone: z1 Keywords: Triaged, ZStream
Target Release: 14.0 (Rocky)   
Hardware: x86_64   
OS: Linux   
Whiteboard: Netvirt
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-03-06 16:15:36 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Waldemar Znoinski 2018-11-29 17:02:07 UTC
Description of problem:
Sometimes the VPN-related CSIT tests fail in downstream (d/s) OSP14 CI jobs.
error:
Keyword 'VpnOperations.Verify Flows Are Present For L3VPN' failed after retrying for 30 seconds. The last error was: ' cookie=0x8000003, duration=69.901s, table=21, n_packets=0, n_bytes=0, priority=42,icmp,metadata=0x33ca4/0xfffffe,nw_dst=10.1.1.1,icmp_type=8,icmp_code=0 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],set_field:fa:16:3e:a9:10:6b->eth_src,move:NXM_OF_IP_SRC[]->NXM_OF_IP_DST[],set_field:10.1.1.1->ip_src,set_field:0->icmp_type,load:0->NXM_OF_IN_PORT[],resubmit(,21)
 cookie=0x8000003, duration=61.135s, table=21, n_packets=0, n_bytes=0, priority=42,icmp,metadata=0x33ca4/0xfffffe,nw_dst=20.1.1.1,icmp_type=8,icmp_code=0 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],set_field:fa:16:3e:41:9b:f8->eth_src,move:NXM_OF_IP_SRC[]->NXM_OF_IP_DST[],set_field:20.1.1.1->ip_src,set_field:0->icmp_type,load:0->NXM_OF_IN_PORT[],resubmit(,21)
 cookie=0x8000003, duration=69.846s, table=21, n_packets=0, n_bytes=0, priority=42,ip,metadata=0x33ca4/0xfffffe,nw_dst=10.1.1.4 actions=set_field:0x16->tun_id,set_field:fa:16:3e:0d:2f:e9->eth_dst,load:0xf00->NXM_NX_REG6[],resubmit(,220)
 cookie=0x8000003, duration=69.808s, table=21, n_packets=0, n_bytes=0, priority=42,ip,metadata=0x33ca4/0xfffffe,nw_dst=10.1.1.3 actions=set_field:0x16->tun_id,set_field:fa:16:3e:6b:21:8d->eth_dst,load:0x1000->NXM_NX_REG6[],resubmit(,220)
    [ Message content over the limit has been removed. ]
...p,metadata=0x33ca4/0xfffffe,nw_dst=20.1.1.19 actions=set_field:0x55->tun_id,set_field:fa:16:3e:d8:f9:a5->eth_dst,load:0x200->NXM_NX_REG6[],resubmit(,220)
 cookie=0x8000003, duration=61.054s, table=21, n_packets=0, n_bytes=0, priority=42,ip,metadata=0x33ca4/0xfffffe,nw_dst=20.1.1.3 actions=set_field:0x55->tun_id,set_field:fa:16:3e:e3:ef:8a->eth_dst,load:0xf00->NXM_NX_REG6[],resubmit(,220)
 cookie=0x8000003, duration=60.990s, table=21, n_packets=0, n_bytes=0, priority=42,ip,metadata=0x33ca4/0xfffffe,nw_dst=20.1.1.4 actions=set_field:0x55->tun_id,set_field:fa:16:3e:9b:c0:fc->eth_dst,load:0x1000->NXM_NX_REG6[],resubmit(,220)
 cookie=0x8000003, duration=60.982s, table=21, n_packets=0, n_bytes=0, priority=42,ip,metadata=0x33ca4/0xfffffe,nw_dst=20.1.1.2 actions=set_field:0x55->tun_id,set_field:fa:16:3e:68:48:e2->eth_dst,load:0x1200->NXM_NX_REG6[],resubmit(,220)
 cookie=0x8000003, duration=6009.645s, table=21, n_packets=0, n_bytes=0, priority=34,ip,metadata=0x33c22/0xfffffe,nw_dst=10.0.0.0/24 actions=write_metadata:0x1772033c22/0xfffffffffe,goto_table:22
 cookie=0x8000003, duration=1639.930s, table=21, n_packets=0, n_bytes=0, priority=34,ip,metadata=0x33c6a/0xfffffe,nw_dst=10.10.10.0/24 actions=write_metadata:0x177c033c6a/0xfffffffffe,goto_table:22' does not contain '20.1.1.5'

It's still happening (e.g., I saw it a couple of days back in a CI run).

Sridhar looked at it briefly in the past; his hints:


In this use-case, we basically create two IPv4 tenant networks/subnets and associate them to a Neutron Router.
In each of the networks we spawn two VMs (one on Compute-0 and the other on Compute-1).

As part of "Check L3_Datapath Traffic Across Networks With Router", we want to ensure that each of the VMs spawned on the networks (i.e., 4 VMs in total) has a FIB entry (i.e., a table 21 entry).
I see that 3 VMs (10.1.1.31, 10.1.1.5 and 20.1.1.19) have the FIB entry, but for the fourth VM (i.e., IP address 20.1.1.5, which is spawned on Compute-1) the FIB entry is missing. Because of this, the test case is marked as a failure.
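
The check described above can be approximated offline against a saved flow dump. The following is a minimal sketch, not the CSIT keyword itself: the function names and SAMPLE_DUMP are hypothetical, the sample lines are taken from the error output in this report with their actions shortened, and the sketch compares nw_dst values exactly, whereas the keyword's error message ("does not contain") suggests a substring check.

```python
import re

# Matches the table number and the nw_dst match field of one dump-flows line.
FLOW_RE = re.compile(r"table=(\d+),.*?nw_dst=([0-9.]+(?:/\d+)?)")

# Two flows from the report's error output; the actions are shortened here.
SAMPLE_DUMP = """\
 cookie=0x8000003, duration=69.846s, table=21, n_packets=0, n_bytes=0, priority=42,ip,metadata=0x33ca4/0xfffffe,nw_dst=10.1.1.4 actions=resubmit(,220)
 cookie=0x8000003, duration=61.054s, table=21, n_packets=0, n_bytes=0, priority=42,ip,metadata=0x33ca4/0xfffffe,nw_dst=20.1.1.3 actions=resubmit(,220)
"""


def fib_destinations(flow_dump, table=21):
    """Return the set of nw_dst values matched by flows in the given table."""
    found = set()
    for line in flow_dump.splitlines():
        m = FLOW_RE.search(line)
        if m and int(m.group(1)) == table:
            found.add(m.group(2))
    return found


def missing_fib_entries(flow_dump, vm_ips, table=21):
    """Return the VM IPs that have no exact-match flow in the table, sorted."""
    present = fib_destinations(flow_dump, table)
    return sorted(ip for ip in vm_ips if ip not in present)


if __name__ == "__main__":
    # 20.1.1.5 is the address reported missing in this bug.
    print(missing_fib_entries(SAMPLE_DUMP, ["10.1.1.4", "20.1.1.5"]))
```

Running the same function against dumps collected from both compute nodes would show whether the entry is absent everywhere or only on one switch.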

I had a quick look at the Config Store entries and other YANG models and can confirm that the flows are missing even in the Config DataStore (i.e., something happened on the ODL/Netvirt side and VrfEntryListener/FIBManager in Netvirt did not even program the flows in the Config store). I tried to look at the Karaf logs and there were some ERRORs, but I couldn't quickly figure out the exact one that could have triggered this issue.
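
A similar comparison can be made on the config datastore side once the FIB entries have been exported as JSON. This is only a sketch under stated assumptions: the key names (fibEntries, vrfTables, vrfEntry, destPrefix) and the /32 host-prefix form are my assumptions about how the FIB model is serialized and are not taken from this report, so they should be verified against the deployed ODL version. The function names are hypothetical.

```python
def config_fib_prefixes(fib_json):
    """Collect every destination prefix found in an exported FIB document.

    Assumed layout (not confirmed by the report):
    {"fibEntries": {"vrfTables": [{"vrfEntry": [{"destPrefix": "..."}]}]}}
    """
    prefixes = set()
    tables = fib_json.get("fibEntries", {}).get("vrfTables", [])
    for table in tables:
        for entry in table.get("vrfEntry", []):
            prefix = entry.get("destPrefix")
            if prefix:
                prefixes.add(prefix)
    return prefixes


def missing_in_config(fib_json, vm_ips):
    """Return the VM IPs with no host (/32) prefix in the document, sorted."""
    prefixes = config_fib_prefixes(fib_json)
    return sorted(ip for ip in vm_ips if f"{ip}/32" not in prefixes)
```

If an address is missing here as well as in the switch flow dump, that points at the controller never writing the entry rather than at a failure to push it to OVS, which matches the observation above.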

Just thought of sharing this info so that if you see this test failure consistently you will have some background on what went wrong during this run.


Version-Release number of selected component (if applicable):
osp14
puddle: 2018-11-21.7
odl 8.3.0-7

How reproducible:
15%


Steps to Reproduce:
1. deploy OSP14 + ODL
2. run csit vpn test suite

Actual results:
random vpn csit test cases fail occasionally

Expected results:
vpn csit tests to pass

Additional info:
CI job at: https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com//job/DFG-opendaylight-odl-netvirt-14_director-rhel-virthost-3cont_2comp-ipv4-vxlan-ha-csit/91/

Comment 2 Franck Baudin 2019-03-06 16:15:36 UTC
As per deprecation notice [1], closing this bug. Please reopen if relevant for RHOSP13, as this is the only version shipping ODL.

[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/14/html-single/release_notes/index#deprecated_functionality

Comment 3 Franck Baudin 2019-03-06 16:17:27 UTC
As per deprecation notice [1], closing this bug. Please reopen if relevant for RHOSP13, as this is the only version shipping ODL.

[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/14/html-single/release_notes/index#deprecated_functionality