Bug 1597236 - [Netvirt] Tempest tests fail indicating FIP connectivity problems, vpnid=-1
Summary: [Netvirt] Tempest tests fail indicating FIP connectivity problems, vpnid=-1
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: opendaylight
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: z2
Target Release: 13.0 (Queens)
Assignee: Vishal Thapar
QA Contact: Waldemar Znoinski
URL:
Whiteboard: Netvirt
Duplicates: 1609334
Depends On:
Blocks:
Reported: 2018-07-02 10:29 UTC by Waldemar Znoinski
Modified: 2023-09-15 01:27 UTC (History)

Fixed In Version: opendaylight-8.3.0-2.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
N/A
Last Closed: 2018-08-29 16:20:16 UTC
Target Upstream Version:
Embargoed:


Attachments
sosreport and odltools from a moment cold_migration test failed (but didn't clean the resources yet) (8.40 MB, application/zip)
2018-07-02 10:34 UTC, Waldemar Znoinski


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
OpenDaylight Bug        NETVIRT-1358    0        None      None    None     2018-07-04 06:20:45 UTC
OpenDaylight gerrit     73725           0        None      None    None     2018-07-04 06:25:25 UTC
Red Hat Issue Tracker   ODL-275         0        None      None    None     2022-07-09 11:52:18 UTC
Red Hat Issue Tracker   OSP-17278       0        None      None    None     2022-07-09 11:52:38 UTC
Red Hat Product Errata  RHSA-2018:2598  0        None      None    None     2018-08-29 16:21:11 UTC

Description Waldemar Znoinski 2018-07-02 10:29:52 UTC
Description of problem:
Floating IP (FIP) connectivity to VMs spawned by Tempest fails.
The VPN ID is missing (vpnid=-1) when the flow is created.

Table:21, Host:127745018911474, DpnId:127745018911474/0x742ef4795af2, FlowId:DefaultFibRouteForSNATSNAT.127745018911474.21.-1,VpnId:8388607/0xfffffe,Reason:VpnInstance for VpnId not found
('Flow: ', '{"barrier": false, "flow-name": "DefaultFibRouteForSNATSNAT.127745018911474.21.-1", "idle-timeout": 0, "installHw": true, "priority": 10, "strict": false, "table_id": 21, "id": "DefaultFibRouteForSNATSNAT.127745018911474.21.-1", "cookie": "0x8000006", "hard-timeout": 0, "match": {"ethernet-match": {"ethernet-type": {"type": 2048}}, "metadata": {"metadata-mask": "0xfffffe", "metadata": "0xfffffe"}}, "instructions": {"instruction": [{"go-to-table": {"table_id": 26}, "order": 0}]}}')


Version-Release number of selected component (if applicable):
opendaylight-8.3.0-1


How reproducible:
~50% of CI jobs

Steps to Reproduce:
1.
2.
3.

Actual results:
tempest tests fail indicating FIP connectivity problems

Expected results:
tempest tests to pass

Additional info:

Comment 1 Waldemar Znoinski 2018-07-02 10:34:40 UTC
Created attachment 1455932
sosreport and odltools from a moment cold_migration test failed (but didn't clean the resources yet)

Comment 2 Vishal Thapar 2018-07-02 11:42:22 UTC
Some of the log entries are false alarms for routers that don't exist, likely the issue tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1519783

These are the different SNAT default flows for this external router:
Table:21, Host:host-192-168-24-6.localdomain, DpnId:497381571937/0x73ce407d61, FlowId:DefaultFibRouteForSNATSNAT.497381571937.21.100184,VpnId:100184/0x30eb0,Reason:None
Table:21, Host:host-192-168-24-7.localdomain, DpnId:127745018911474/0x742ef4795af2, FlowId:DefaultFibRouteForSNATSNAT.127745018911474.21.-1,VpnId:8388607/0xfffffe,Reason:VpnInstance for VpnId not found
Table:21, Host:host-192-168-24-17.localdomain, DpnId:123382746230149/0x703748c2ad85, FlowId:DefaultFibRouteForSNATSNAT.123382746230149.21.100184,VpnId:100184/0x30eb0,Reason:None
Table:21, Host:host-192-168-24-11.localdomain, DpnId:154348852276935/0x8c612482eec7, FlowId:DefaultFibRouteForSNATSNAT.154348852276935.21.100184,VpnId:100184/0x30eb0,Reason:None

Note that the other 3 OVS switches got their flows correctly; it failed on only one. That one is likely the selected NAPT switch. We need to confirm from the other models which switch is the selected NAPT switch.
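
The two VpnId columns in the dumps above are consistent with the VPN ID being shifted left by one bit and masked with 0xfffffe before it is written into the flow metadata: 100184 shows up as 0x30eb0, and -1 shows up as 0xfffffe, which reads back as 8388607. The sketch below only reproduces that arithmetic. The one-bit shift is inferred from the numbers in this report, and the class and method names are hypothetical, not netvirt code.

```java
/** Illustrative helper; the names are hypothetical and not part of netvirt. */
final class VpnMetadataSketch {
    // Mask copied from the "metadata-mask" field of the flow dump in the description.
    static final long VPN_ID_MASK = 0xfffffeL;

    /** Shift the VPN ID left by one bit and keep only the masked bits (assumed layout). */
    static long toMetadata(long vpnId) {
        return (vpnId << 1) & VPN_ID_MASK;
    }

    /** Recover the VPN ID as a tool reading the flow metadata would see it. */
    static long fromMetadata(long metadata) {
        return (metadata & VPN_ID_MASK) >> 1;
    }

    public static void main(String[] args) {
        // 100184 is the valid ID from the healthy switches; -1 is the invalid one.
        for (long vpnId : new long[] {100184L, -1L}) {
            long metadata = toMetadata(vpnId);
            System.out.printf("vpnId=%d metadata=0x%x decoded=%d%n",
                    vpnId, metadata, fromMetadata(metadata));
        }
    }
}
```

Under that assumption, an unresolved ID of -1 does not produce an obviously broken flow; it produces a syntactically valid match on a VPN that does not exist, which is why the tool reports "VpnInstance for VpnId not found".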

Comment 3 Vishal Thapar 2018-07-02 11:55:36 UTC
Confirmed this is indeed the selected NAPT switch:

{
    "napt-switches": {
        "router-to-napt-switch": [
            {
                "router-name": "75875187-4dd1-4211-b62f-aea702df4f54",
                "primary-switch-id": 127745018911474
            }
        ]
    }
}

Router ID also matches the one in logs:

2018-06-29T02:13:39,407 | WARN  | org.opendaylight.yang.gen.v1.urn.opendaylight.neutron.ports.rev150712.ports.attributes.ports.Port_AsyncDataTreeChangeListenerBase-DataTreeChangeHandler-0 | NeutronPortChangeListener        | 360 - org.opendaylight.netvirt.neutronvpn-impl - 0.6.3.redhat-1 | No router found for router GW port a7494061-ff84-49a3-a898-e7ea5ee0238b for router 75875187-4dd1-4211-b62f-aea702df4f54

2018-06-29T02:13:39,518 | INFO  | org.opendaylight.yang.gen.v1.urn.opendaylight.neutron.l3.rev150712.routers.attributes.routers.Router_AsyncClusteredDataTreeChangeListenerBase-DataTreeChangeHandler-0 | NeutronRouterChangeListener      | 354 - org.opendaylight.netvirt.ipv6service-impl - 0.6.3.redhat-1 | Add Router notification handler is invoked Uuid [_value=75875187-4dd1-4211-b62f-aea702df4f54].

But the NAPT code does show it has the correct routerId:
2018-06-29T02:13:39,550 | INFO  | org.opendaylight.yang.gen.v1.urn.opendaylight.netvirt.natservice.rev160111.napt.switches.RouterToNaptSwitch_AsyncDataTreeChangeListenerBase-DataTreeChangeHandler-0 | ConntrackBasedSnatService        | 358 - org.opendaylight.netvirt.natservice-impl - 0.6.3.redhat-1 | installSnatSpecificEntriesForNaptSwitch: called for router 75875187-4dd1-4211-b62f-aea702df4f54
2018-06-29T02:13:39,550 | INFO  | org.opendaylight.yang.gen.v1.urn.opendaylight.netvirt.natservice.rev160111.napt.switches.RouterToNaptSwitch_AsyncDataTreeChangeListenerBase-DataTreeChangeHandler-0 | ConntrackBasedSnatService        | 358 - org.opendaylight.netvirt.natservice-impl - 0.6.3.redhat-1 | installSnatSpecificEntriesForNaptSwitch : called for the primary NAPT switch dpnId 127745018911474
2018-06-29T02:13:39,551 | INFO  | org.opendaylight.yang.gen.v1.urn.opendaylight.netvirt.natservice.rev160111.napt.switches.RouterToNaptSwitch_AsyncDataTreeChangeListenerBase-DataTreeChangeHandler-0 | ConntrackBasedSnatService        | 358 - org.opendaylight.netvirt.natservice-impl - 0.6.3.redhat-1 | installTerminatingServiceTblEntry : creating entry for Terminating Service Table for switch 127745018911474, routerId 100184
2018-06-29T02:13:39,551 | INFO  | org.opendaylight.yang.gen.v1.urn.opendaylight.netvirt.natservice.rev160111.napt.switches.RouterToNaptSwitch_AsyncDataTreeChangeListenerBase-DataTreeChangeHandler-0 | ConntrackBasedSnatService        | 358 - org.opendaylight.netvirt.natservice-impl - 0.6.3.redhat-1 | createOutboundTblTrackEntry : called for switch 127745018911474, routerId 100184
2018-06-29T02:13:39,552 | INFO  | org.opendaylight.yang.gen.v1.urn.opendaylight.netvirt.natservice.rev160111.napt.switches.RouterToNaptSwitch_AsyncDataTreeChangeListenerBase-DataTreeChangeHandler-0 | ConntrackBasedSnatService        | 358 - org.opendaylight.netvirt.natservice-impl - 0.6.3.redhat-1 | createOutboundTblEntry : dpId 127745018911474 and routerId 100184
2018-06-29T02:13:39,553 | INFO  | org.opendaylight.yang.gen.v1.urn.opendaylight.netvirt.natservice.rev160111.napt.switches.RouterToNaptSwitch_AsyncDataTreeChangeListenerBase-DataTreeChangeHandler-0 | ConntrackBasedSnatService        | 358 - org.opendaylight.netvirt.natservice-impl - 0.6.3.redhat-1 | installNaptPfibFlow : dpId 127745018911474, extNetId 100000
2018-06-29T02:13:39,553 | INFO  | org.opendaylight.yang.gen.v1.urn.opendaylight.netvirt.natservice.rev160111.napt.switches.RouterToNaptSwitch_AsyncDataTreeChangeListenerBase-DataTreeChangeHandler-0 | ConntrackBasedSnatService        | 358 - org.opendaylight.netvirt.natservice-impl - 0.6.3.redhat-1 | installInboundEntry : dpId 127745018911474 and routerId 100184
2018-06-29T02:13:39,554 | INFO  | org.opendaylight.yang.gen.v1.urn.opendaylight.netvirt.natservice.rev160111.napt.switches.RouterToNaptSwitch_AsyncDataTreeChangeListenerBase-DataTreeChangeHandler-0 | ConntrackBasedSnatService        | 358 - org.opendaylight.netvirt.natservice-impl - 0.6.3.redhat-1 | installNaptPfibEntry : called for dpnId 127745018911474 and routerId 100184

Comment 4 Vishal Thapar 2018-07-02 12:10:38 UTC
And found the smoking gun:

2018-06-29T02:13:39,549 | INFO  | org.opendaylight.yang.gen.v1.urn.opendaylight.netvirt.natservice.rev160111.napt.switches.RouterToNaptSwitch_AsyncDataTreeChangeListenerBase-DataTreeChangeHandler-0 | AbstractSnatService              | 358 - org.opendaylight.netvirt.natservice-impl - 0.6.3.redhat-1 | installInboundTerminatingServiceTblEntry : creating entry for Terminating Service Table for switch 127745018911474, routerId -1

This is the flow for that entry; note the -1 in the flowId:

Table:36, Host:host-192-168-24-7.localdomain, DpnId:127745018911474/0x742ef4795af2, FlowId:SNAT.127745018911474.36.-1INBOUND,MplsLabel:100001,Reason:None
Flow: {"barrier": false, "flow-name": "SNAT.127745018911474.36.-1INBOUND", "idle-timeout": 0, "installHw": true, "priority": 42, "strict": false, "table_id": 36, "id": "SNAT.127745018911474.36.-1INBOUND", "cookie": "0x8000006", "hard-timeout": 0, "match": {"tunnel": {"tunnel-id": 100001}, "ethernet-match": {"ethernet-type": {"type": 2048}}}, "instructions": {"instruction": [{"order": 0, "apply-actions": {"action": [{"order": 0, "openflowplugin-extension-nicira-action:nx-reg-load": {"dst": {"of-metadata": [null], "start": 0, "end": 23}, "value": "0x30d42"}}, {"order": 1, "openflowplugin-extension-nicira-action:nx-resubmit": {"table": 44}}]}}]}}

Comment 5 Aswin Suryanarayanan 2018-07-02 13:06:15 UTC

We are checking for the presence of the value in the neutron model [1], but NatUtil tries to retrieve it from a vpnservice model [2]. Shouldn't that create an issue?

[1]https://github.com/opendaylight/netvirt/blob/stable/oxygen/neutronvpn/impl/src/main/java/org/opendaylight/netvirt/neutronvpn/NeutronvpnUtils.java#L37, 

[2]https://github.com/opendaylight/netvirt/blob/stable/oxygen/natservice/impl/src/main/java/org/opendaylight/netvirt/natservice/internal/NatUtil.java#L258


When we retrieve it in AbstractSnatService it is -1, but a little later, when we retrieve it in ConntrackBasedSnatService, it has a valid value.

Comment 6 Vishal Thapar 2018-07-02 13:45:37 UTC
(In reply to Aswin Suryanarayanan from comment #5)
> 
> we are checking for the presence of the value in neutron model [1]? nat util
> tries to retrieve it from a vpnservice model, shouldn't that create an issue?
> 
> [1]https://github.com/opendaylight/netvirt/blob/stable/oxygen/neutronvpn/
> impl/src/main/java/org/opendaylight/netvirt/neutronvpn/NeutronvpnUtils.
> java#L37, 
> 
> [2]https://github.com/opendaylight/netvirt/blob/stable/oxygen/natservice/
> impl/src/main/java/org/opendaylight/netvirt/natservice/internal/NatUtil.
> java#L258
> 
> 
> when we retrieve it in AbstractSnatService it seems to be -1 , but a while
> later when we retrieve it in ConntrackBasedSnatService it seems to have a
> valid value

Do the two retrieve it from different places, one from the neutron model and the other from the vpn model? BTW, the link in [1] points to an import statement.

Comment 7 Aswin Suryanarayanan 2018-07-02 13:57:52 UTC
Oh, copy-paste error. Yes, at first look they seem to be different models.

[1]https://github.com/opendaylight/netvirt/blob/stable/oxygen/neutronvpn/impl/src/main/java/org/opendaylight/netvirt/neutronvpn/NeutronvpnUtils.java#L371

Comment 8 Vishal Thapar 2018-07-04 06:25:26 UTC
If routerId is null, we do nothing, even if routerId becomes available later. So the fix is to wait for routerId to become available and, once it is, process it.
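
As a rough illustration of the "wait, then process" idea, the sketch below parks work for a router whose ID is not yet known and runs it once the ID shows up. It is not the change from gerrit 73725. The class, the method names, and the use of -1 as the "not yet available" marker are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongConsumer;

/** Hypothetical helper that defers per-router work until the router ID is known. */
final class PendingRouterWork {
    // Assumed marker for "ID not allocated yet", mirroring the -1 seen in the logs.
    static final long INVALID_ID = -1L;

    private final Map<String, Long> routerIds = new HashMap<>();
    private final Map<String, List<LongConsumer>> pending = new HashMap<>();

    /** Run the work immediately if the ID is known; otherwise park it for later. */
    synchronized void whenRouterIdAvailable(String routerName, LongConsumer work) {
        long id = routerIds.getOrDefault(routerName, INVALID_ID);
        if (id == INVALID_ID) {
            pending.computeIfAbsent(routerName, k -> new ArrayList<>()).add(work);
        } else {
            work.accept(id);
        }
    }

    /** Record the allocated ID and run everything that was parked for this router. */
    synchronized void onRouterIdAllocated(String routerName, long routerId) {
        routerIds.put(routerName, routerId);
        List<LongConsumer> parked = pending.remove(routerName);
        if (parked != null) {
            parked.forEach(work -> work.accept(routerId));
        }
    }
}
```

With this shape, a caller that arrives early (as AbstractSnatService appears to in comment 4) would never build a flow name from -1; the flow is built only inside the callback, with the real ID.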

Comment 14 Aswin Suryanarayanan 2018-07-31 06:34:34 UTC
*** Bug 1609334 has been marked as a duplicate of this bug. ***

Comment 21 Joanne O'Flynn 2018-08-14 09:38:21 UTC
This bug is marked for inclusion in the errata but does not currently contain draft documentation text. To ensure the timely release of this advisory please provide draft documentation text for this bug as soon as possible.

If you do not think this bug requires errata documentation, set the requires_doc_text flag to "-".


To add draft documentation text:

* Select the documentation type from the "Doc Type" drop down field.

* A template will be provided in the "Doc Text" field based on the "Doc Type" value selected. Enter draft text in the "Doc Text" field.

Comment 22 Waldemar Znoinski 2018-08-16 14:58:09 UTC
Checked failures from the last 2 weeks in downstream OSP13 CI (using the latest opendaylight RPM): no sign of this problem or of the vpnId messages.

Comment 24 errata-xmlrpc 2018-08-29 16:20:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2598

Comment 27 Red Hat Bugzilla 2023-09-15 01:27:26 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days

