Bug 1654796 - [Netvirt][CI] vpn csit test "Check L3_Datapath Traffic Across Networks With Router" failing, flows missing in config datastore
Summary: [Netvirt][CI] vpn csit test "Check L3_Datapath Traffic Across Networks With Router" failing, flows missing in config datastore
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: opendaylight
Version: 14.0 (Rocky)
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: z1
Target Release: 14.0 (Rocky)
Assignee: Vishal Thapar
QA Contact: Noam Manos
URL:
Whiteboard: Netvirt
Depends On:
Blocks:
Reported: 2018-11-29 17:02 UTC by Waldemar Znoinski
Modified: 2019-03-06 16:17 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-03-06 16:15:36 UTC
Target Upstream Version:



Description Waldemar Znoinski 2018-11-29 17:02:07 UTC
Description of problem:
Sometimes the VPN-related CSIT tests fail in downstream (d/s) OSP14 CI jobs.
Error:
Keyword 'VpnOperations.Verify Flows Are Present For L3VPN' failed after retrying for 30 seconds. The last error was: ' cookie=0x8000003, duration=69.901s, table=21, n_packets=0, n_bytes=0, priority=42,icmp,metadata=0x33ca4/0xfffffe,nw_dst=10.1.1.1,icmp_type=8,icmp_code=0 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],set_field:fa:16:3e:a9:10:6b->eth_src,move:NXM_OF_IP_SRC[]->NXM_OF_IP_DST[],set_field:10.1.1.1->ip_src,set_field:0->icmp_type,load:0->NXM_OF_IN_PORT[],resubmit(,21)
 cookie=0x8000003, duration=61.135s, table=21, n_packets=0, n_bytes=0, priority=42,icmp,metadata=0x33ca4/0xfffffe,nw_dst=20.1.1.1,icmp_type=8,icmp_code=0 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],set_field:fa:16:3e:41:9b:f8->eth_src,move:NXM_OF_IP_SRC[]->NXM_OF_IP_DST[],set_field:20.1.1.1->ip_src,set_field:0->icmp_type,load:0->NXM_OF_IN_PORT[],resubmit(,21)
 cookie=0x8000003, duration=69.846s, table=21, n_packets=0, n_bytes=0, priority=42,ip,metadata=0x33ca4/0xfffffe,nw_dst=10.1.1.4 actions=set_field:0x16->tun_id,set_field:fa:16:3e:0d:2f:e9->eth_dst,load:0xf00->NXM_NX_REG6[],resubmit(,220)
 cookie=0x8000003, duration=69.808s, table=21, n_packets=0, n_bytes=0, priority=42,ip,metadata=0x33ca4/0xfffffe,nw_dst=10.1.1.3 actions=set_field:0x16->tun_id,set_field:fa:16:3e:6b:21:8d->eth_dst,load:0x1000->NXM_NX_REG6[],resubmit(,220)
    [ Message content over the limit has been removed. ]
...p,metadata=0x33ca4/0xfffffe,nw_dst=20.1.1.19 actions=set_field:0x55->tun_id,set_field:fa:16:3e:d8:f9:a5->eth_dst,load:0x200->NXM_NX_REG6[],resubmit(,220)
 cookie=0x8000003, duration=61.054s, table=21, n_packets=0, n_bytes=0, priority=42,ip,metadata=0x33ca4/0xfffffe,nw_dst=20.1.1.3 actions=set_field:0x55->tun_id,set_field:fa:16:3e:e3:ef:8a->eth_dst,load:0xf00->NXM_NX_REG6[],resubmit(,220)
 cookie=0x8000003, duration=60.990s, table=21, n_packets=0, n_bytes=0, priority=42,ip,metadata=0x33ca4/0xfffffe,nw_dst=20.1.1.4 actions=set_field:0x55->tun_id,set_field:fa:16:3e:9b:c0:fc->eth_dst,load:0x1000->NXM_NX_REG6[],resubmit(,220)
 cookie=0x8000003, duration=60.982s, table=21, n_packets=0, n_bytes=0, priority=42,ip,metadata=0x33ca4/0xfffffe,nw_dst=20.1.1.2 actions=set_field:0x55->tun_id,set_field:fa:16:3e:68:48:e2->eth_dst,load:0x1200->NXM_NX_REG6[],resubmit(,220)
 cookie=0x8000003, duration=6009.645s, table=21, n_packets=0, n_bytes=0, priority=34,ip,metadata=0x33c22/0xfffffe,nw_dst=10.0.0.0/24 actions=write_metadata:0x1772033c22/0xfffffffffe,goto_table:22
 cookie=0x8000003, duration=1639.930s, table=21, n_packets=0, n_bytes=0, priority=34,ip,metadata=0x33c6a/0xfffffe,nw_dst=10.10.10.0/24 actions=write_metadata:0x177c033c6a/0xfffffffffe,goto_table:22' does not contain '20.1.1.5'
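The failure message above appears to have the shape produced by Robot Framework's `Wait Until Keyword Succeeds` wrapping a `Should Contain` check: the table 21 dump is re-fetched until a timeout, and only the last error is reported. A minimal Python sketch of that retry-then-report pattern, to make the log easier to read; `verify_flows_present` and the `dump_table_21` callback are hypothetical stand-ins, not the actual CSIT keyword or how it obtains the flows.

```python
import time
from typing import Callable


def should_contain(container: str, item: str) -> None:
    """Substring check that fails with the same message shape as the CI log."""
    if item not in container:
        raise AssertionError(f"'{container}' does not contain '{item}'")


def wait_until_succeeds(check: Callable[[], None], timeout_s: float, interval_s: float) -> None:
    """Re-run `check` until it passes; once `timeout_s` has elapsed, report the last error."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            check()
            return
        except AssertionError as err:
            if time.monotonic() >= deadline:
                raise AssertionError(
                    f"failed after retrying for {timeout_s:g} seconds. "
                    f"The last error was: {err}"
                ) from err
            time.sleep(interval_s)


def verify_flows_present(
    dump_table_21: Callable[[], str],  # hypothetical: returns the current table 21 dump
    vm_ips: list[str],
    timeout_s: float = 30,
    interval_s: float = 1,
) -> None:
    """Hypothetical analogue of 'Verify Flows Are Present For L3VPN'."""
    def check() -> None:
        flows = dump_table_21()  # re-dump on every attempt so late flows are seen
        for ip in vm_ips:
            should_contain(flows, ip)

    wait_until_succeeds(check, timeout_s, interval_s)
```

The retry only helps if the flow shows up late; here it never did, so all 30 seconds were spent and the last dump is what got logged. In this sketch the check is a plain substring test, so an expected 20.1.1.5 would also be satisfied by a flow for 20.1.1.50; matching the parsed `nw_dst` field is stricter.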

It is still happening (I saw it in a CI run a couple of days ago).

Sridhar looked at it briefly in the past; his notes:


In this use case, we create two IPv4 tenant networks/subnets and associate them with a Neutron router.
In each network we spawn two VMs (one on Compute-0 and the other on Compute-1).

As part of "Check L3_Datapath Traffic Across Networks With Router", we want to ensure that each of the VMs spawned on the networks (4 VMs in total) has a FIB entry (i.e., a table 21 entry).
I see that three of the VMs (10.1.1.31, 10.1.1.5 and 20.1.1.19) have a FIB entry, but the fourth VM (IP address 20.1.1.5, spawned on Compute-1) does not; because of this, the test case is marked as failed.
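To see which VM lacks its FIB entry without reading the whole dump, one can extract the host `nw_dst` matches from table 21 and diff them against the expected VM IPs. A minimal sketch, assuming the `ovs-ofctl dump-flows` text format shown in the log; prefix matches such as `nw_dst=10.0.0.0/24` are skipped because they are subnet routes, not per-VM entries. The function names are hypothetical.

```python
import re

# Match-field patterns for one line of `ovs-ofctl dump-flows` output.
TABLE = re.compile(r"\btable=(\d+)")
NW_DST = re.compile(r"\bnw_dst=([\d.]+)(/\d+)?")


def fib_destinations(dump: str, table: int = 21) -> set[str]:
    """Return host destinations (no prefix length) matched by flows in `table`."""
    hosts = set()
    for line in dump.splitlines():
        t = TABLE.search(line)
        if t is None or int(t.group(1)) != table:
            continue  # not a flow in the table we care about
        m = NW_DST.search(line)
        if m is not None and m.group(2) is None:  # skip subnet routes like /24
            hosts.add(m.group(1))
    return hosts


def missing_fib_entries(dump: str, vm_ips: list[str]) -> list[str]:
    """Expected VM IPs that have no host entry in table 21, in input order."""
    present = fib_destinations(dump)
    return [ip for ip in vm_ips if ip not in present]
```

Host matches on other priority-42 entries (for example the ICMP ones for 10.1.1.1 and 20.1.1.1) are collected too, but they do not change which expected VM IPs come back as missing.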

I had a quick look at the config datastore entries and other YANG models, and can confirm that the flows are missing even in the config datastore (i.e., something went wrong on the ODL/NetVirt side, and VrfEntryListener/FIBManager in NetVirt never programmed the flows into the config datastore). I looked at the Karaf logs and there were some ERRORs, but I couldn't quickly identify the one that could have triggered this issue.

Sharing this info so that, if you see this test failure consistently, you have some background on what went wrong during this run.
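The reasoning above (the flow is absent from the config datastore, so the problem is on the ODL/NetVirt side rather than on the switch) can be written as a small triage table. This is a hypothetical sketch: it assumes you have already collected, per VM IP, whether the FIB flow exists in the config datastore and whether it appears in the OVS dump (how to collect those sets is not shown), and the labels, especially the "in config but not on the switch" interpretation, are assumptions rather than findings from this bug.

```python
from enum import Enum


class Layer(Enum):
    """Where a per-VM FIB flow most likely got lost (hypothetical labels)."""
    OK = "present in config datastore and on the switch"
    NETVIRT = "never written to config datastore (NetVirt FIB programming)"
    SOUTHBOUND = "in config datastore but not on the switch (assumed: push to OVS)"
    STALE = "on the switch but not in config datastore"


def classify(in_config: bool, on_switch: bool) -> Layer:
    """Map the two observations for one VM IP to a layer."""
    if in_config and on_switch:
        return Layer.OK
    if not in_config and not on_switch:
        return Layer.NETVIRT
    return Layer.SOUTHBOUND if in_config else Layer.STALE


def triage(config_ips: set[str], switch_ips: set[str], vm_ips: list[str]) -> dict[str, Layer]:
    """Classify each expected VM IP by where its FIB flow is (or is not) found."""
    return {ip: classify(ip in config_ips, ip in switch_ips) for ip in vm_ips}
```

With the four VMs from this run, only 20.1.1.5 lands in `Layer.NETVIRT`, which matches the observation that VrfEntryListener/FIBManager never wrote its flow.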


Version-Release number of selected component (if applicable):
osp14
puddle: 2018-11-21.7
odl 8.3.0-7

How reproducible:
15%


Steps to Reproduce:
1. deploy OSP14 + ODL
2. run csit vpn test suite

Actual results:
Random VPN CSIT test cases fail occasionally.

Expected results:
VPN CSIT tests pass.

Additional info:
CI job at: https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com//job/DFG-opendaylight-odl-netvirt-14_director-rhel-virthost-3cont_2comp-ipv4-vxlan-ha-csit/91/

Comment 2 Franck Baudin 2019-03-06 16:15:36 UTC
As per the deprecation notice [1], closing this bug. Please reopen if relevant for RHOSP13, as this is the only version shipping ODL.

[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/14/html-single/release_notes/index#deprecated_functionality

Comment 3 Franck Baudin 2019-03-06 16:17:27 UTC
As per the deprecation notice [1], closing this bug. Please reopen if relevant for RHOSP13, as this is the only version shipping ODL.

[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/14/html-single/release_notes/index#deprecated_functionality

