Bug 1987249
| Summary: | [OVN] Neutron sets the type of LSP with ha_router_replicated_interface device owner to empty | |||
|---|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Kamil Sambor <ksambor> | |
| Component: | python-networking-ovn | Assignee: | Jakub Libosvar <jlibosva> | |
| Status: | CLOSED ERRATA | QA Contact: | Roman Safronov <rsafrono> | |
| Severity: | urgent | Docs Contact: | ||
| Priority: | high | |||
| Version: | 16.1 (Train) | CC: | apevec, atragler, jlibosva, lhh, lmartins, majopela, pmannidi, rsafrono, scohen, spower | |
| Target Milestone: | z7 | Keywords: | Triaged | |
| Target Release: | 16.1 (Train on RHEL 8.2) | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | python-networking-ovn-7.3.1-1.20210714143308.el8ost | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1998579 | Environment: | |
| Last Closed: | 2021-12-09 20:20:17 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
Description
Kamil Sambor
2021-07-29 10:24:35 UTC
I looked at the live env - thanks Roman for providing it. I found two things:
1) The metadata port was created on br-migration, which was later deleted. That left the metadata port unplugged, or the port was never re-plugged into br-int.
2) The LSP connected to the router is not set as a router port:
_uuid : 4e93ef48-2c78-4bc8-9bc9-74234929c4b8
addresses : [unknown]
dhcpv4_options : []
dhcpv6_options : []
dynamic_addresses : []
enabled : true
external_ids : {"neutron:cidrs"="192.168.168.1/24", "neutron:device_id"="8ad182d0-a6e9-4cee-b2b9-ba083b76c75f", "neutron:device_owner"="network:ha_router_replicated_interface", "neutron:network_name"=neutron-7b1de520-d7b6-4140-b178-74d4122faed7, "neutron:port_name"="", "neutron:project_id"="4d05a9e4a662427d9c0d619fc5ef0ffd", "neutron:revision_number"="2341", "neutron:security_group_ids"=""}
ha_chassis_group : []
name : "247d4fe4-8721-41ce-98e8-299fcb62257c"
options : {mcast_flood_reports="true", requested-chassis=controller-1.redhat.local}
parent_name : []
port_security : []
tag : []
tag_request : []
type : ""
up : false
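A minimal sketch of how the inconsistency in the record above could be detected, assuming the LSP row is available as a plain Python dict. The helper name `is_misconfigured_router_lsp` is hypothetical, and only the first device owner in the set is taken from this report; the other two are assumed to be the remaining Neutron router interface owners.

```python
# Device owners treated as router interfaces. The first value appears in the
# record above; the other two are assumptions added for illustration.
ROUTER_INTERFACE_OWNERS = frozenset({
    "network:ha_router_replicated_interface",
    "network:router_interface",
    "network:router_interface_distributed",
})


def is_misconfigured_router_lsp(lsp):
    """Return True if the LSP is owned by a router interface but its type is not 'router'."""
    owner = lsp.get("external_ids", {}).get("neutron:device_owner", "")
    return owner in ROUTER_INTERFACE_OWNERS and lsp.get("type", "") != "router"


# Trimmed copy of the Logical_Switch_Port record shown above.
lsp = {
    "name": "247d4fe4-8721-41ce-98e8-299fcb62257c",
    "type": "",
    "addresses": ["unknown"],
    "external_ids": {
        "neutron:device_owner": "network:ha_router_replicated_interface",
    },
}

print(is_misconfigured_router_lsp(lsp))
```

For the record above the check reports the port as misconfigured, because the device owner is a router interface while `type` is empty.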
The db-sync script sets the port to the correct state:
2021-08-01 16:37:04.868 48 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=1): SetLRouterPortInLSwitchPortCommand(lswitch_port=247d4fe4-8721-41ce-98e8-299fcb62257c, lrouter_port=lrp-247d4fe4-8721-41ce-98e8-299fcb62257c, is_gw_port=False, if_exists=True, lsp_address=router) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
but somebody then misconfigures the LSP:
record 324: 2021-08-05 17:10:18.778
table Logical_Switch_Port row "247d4fe4-8721-41ce-98e8-299fcb62257c" (4e93ef48):
addresses=[router]
options={router-port=lrp-247d4fe4-8721-41ce-98e8-299fcb62257c}
type=router
table Logical_Router_Port row "lrp-247d4fe4-8721-41ce-98e8-299fcb62257c" (d1e2e77b):
external_ids={"neutron:network_name"=neutron-7b1de520-d7b6-4140-b178-74d4122faed7, "neutron:revision_number"="2337", "neutron:router_name"="8ad182d0-a6e9-4cee-b2b9-ba083b76c75f", "neutron:subnet_ids"="affdbfe7-5478-4088-912b-3c549b0650be"}
record 325: 2021-08-05 17:10:18.785
table Logical_Switch_Port row "247d4fe4-8721-41ce-98e8-299fcb62257c" (4e93ef48):
up=true
record 326: 2021-08-05 17:10:19.299
table Logical_Switch_Port row "4ea7881d-bbe7-4415-926d-f49d0dc4adfb" (f97d8fc0):
addresses=[router]
options={router-port=lrp-4ea7881d-bbe7-4415-926d-f49d0dc4adfb}
type=router
table Logical_Router_Port row "lrp-4ea7881d-bbe7-4415-926d-f49d0dc4adfb" (8d52cebd):
external_ids={"neutron:network_name"=neutron-a5e872d7-320a-4368-8972-023fdcf6687a, "neutron:revision_number"="2336", "neutron:router_name"="d2ca7c41-ba1e-47ad-834e-fd3c56a0c297", "neutron:subnet_ids"="8966895c-75db-4d03-8e5e-501ee1c201dd"}
record 327: 2021-08-05 17:10:19.305
table Logical_Switch_Port row "4ea7881d-bbe7-4415-926d-f49d0dc4adfb" (f97d8fc0):
up=true
record 328: 2021-08-05 17:10:19.391
table Logical_Switch_Port row "247d4fe4-8721-41ce-98e8-299fcb62257c" (4e93ef48):
addresses=[unknown]
options={mcast_flood_reports="true", requested-chassis=controller-1.redhat.local}
external_ids={"neutron:cidrs"="192.168.168.1/24", "neutron:device_id"="8ad182d0-a6e9-4cee-b2b9-ba083b76c75f", "neutron:device_owner"="network:ha_router_replicated_interface", "neutron:network_name"=neutron-7b1de520-d7b6-4140-b178-74d4122faed7, "neutron:port_name"="", "neutron:project_id"="4d05a9e4a662427d9c0d619fc5ef0ffd", "neutron:revision_number"="2338", "neutron:security_group_ids"=""}
type=""
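The sequence above can be checked mechanically. The following is only a sketch, assuming the history records have already been parsed into `(record_id, row_name, changed_columns)` tuples; the tuple layout and the function name `find_type_regressions` are hypothetical, and the data is a trimmed copy of records 324-328.

```python
def find_type_regressions(records):
    """Return (row, record_id) for each update that moves a row away from type 'router'."""
    last_type = {}
    regressions = []
    for record_id, row, columns in records:
        if "type" not in columns:
            continue  # this update did not touch the type column
        new_type = columns["type"]
        if last_type.get(row) == "router" and new_type != "router":
            regressions.append((row, record_id))
        last_type[row] = new_type
    return regressions


# Trimmed copy of the Logical_Switch_Port updates from records 324-328.
records = [
    (324, "247d4fe4-8721-41ce-98e8-299fcb62257c", {"addresses": ["router"], "type": "router"}),
    (325, "247d4fe4-8721-41ce-98e8-299fcb62257c", {"up": True}),
    (326, "4ea7881d-bbe7-4415-926d-f49d0dc4adfb", {"addresses": ["router"], "type": "router"}),
    (327, "4ea7881d-bbe7-4415-926d-f49d0dc4adfb", {"up": True}),
    (328, "247d4fe4-8721-41ce-98e8-299fcb62257c", {"addresses": ["unknown"], "type": ""}),
]

print(find_type_regressions(records))
```

Only record 328 is flagged: it resets port 247d4fe4 from `type=router` back to an empty type less than a second after record 324 had fixed it.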
It seems to be triggered by one of the events, because one of the neutron servers contains the transaction responsible for it:
2021-08-05 12:03:39.797 27 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=1): SetLSwitchPortCommand(lport=247d4fe4-8721-41ce-98e8-299fcb62257c, columns={'external_ids': {'neutron:port_name': '', 'neutron:device_id': '8ad182d0-a6e9-4cee-b2b9-ba083b76c75f', 'neutron:project_id': '4d05a9e4a662427d9c0d619fc5ef0ffd', 'neutron:cidrs': '192.168.168.1/24', 'neutron:device_owner': 'network:ha_router_replicated_interface', 'neutron:network_name': 'neutron-7b1de520-d7b6-4140-b178-74d4122faed7', 'neutron:security_group_ids': '', 'neutron:revision_number': '2212'}, 'parent_name': [], 'tag': [], 'options': {'requested-chassis': 'controller-1.redhat.local', 'mcast_flood_reports': 'true'}, 'enabled': True, 'port_security': [], 'dhcpv4_options': [], 'dhcpv6_options': [], 'type': '', 'addresses': ['unknown'], 'ha_chassis_group': []}, if_exists=False) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
The culprit might be the device owner of the router port: network:ha_router_replicated_interface. I'm wondering if there was any change to the tests to use an L3 HA router.
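To illustrate the suspicion about the device owner, here is a sketch of deriving the router-specific LSP columns from the owner, so that a port update cannot reset them. This is not the networking-ovn code: `expected_lsp_columns` is a hypothetical helper, and the owner list beyond `network:ha_router_replicated_interface` is an assumption. The column values mirror what the db-sync transaction and record 324 set.

```python
# Owners treated as router interfaces. Only the HA replicated interface owner
# comes from this report; the other two are assumptions.
ROUTER_INTERFACE_OWNERS = frozenset({
    "network:ha_router_replicated_interface",
    "network:router_interface",
    "network:router_interface_distributed",
})


def expected_lsp_columns(port_id, device_owner):
    """Return the router-specific LSP columns for a port, or {} for non-router owners."""
    if device_owner not in ROUTER_INTERFACE_OWNERS:
        return {}
    return {
        "type": "router",
        "addresses": ["router"],
        # The logical router port name is the Neutron port ID prefixed with "lrp-".
        "options": {"router-port": "lrp-" + port_id},
    }


columns = expected_lsp_columns(
    "247d4fe4-8721-41ce-98e8-299fcb62257c",
    "network:ha_router_replicated_interface",
)
print(columns)
```

With these columns the port keeps `type=router` and `addresses=[router]`, whereas the SetLSwitchPortCommand logged above wrote `type=''` and `addresses=['unknown']` for the same device owner.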
This has no dev ack or pm ack, and the email for the blocker process has not been sent. Please review and, if it is a blocker, follow the process to get it approved.

I installed the previous puddle and I don't see the behavior described in comment 2. This is a regression between the puddles.

I think the culprit might be this patch - https://code.engineering.redhat.com/gerrit/c/networking-ovn/+/253716/2/networking_ovn/ml2/db_migration.py lines 58-60. When I used an older environment and changed the vif_details to what OVN uses, the issue re-appeared. Still investigating.

Disregard the previous comment. I found out that this happens when the router port is in DOWN state and the db-sync/maintenance task fixes it. This also happens on previous composes and it's not a regression.

(In reply to Jakub Libosvar from comment #7)
> This also happens on previous composes and it's not a regression.
Hi Kuba,
Did you succeed in reproducing the issue on this puddle: RHOS-16.1-RHEL-8-20210727.n.1?
Can you share the puddle ID of the "previous composes" that you mentioned?

(In reply to Eran Kuris from comment #8)
> (In reply to Jakub Libosvar from comment #7)
> > This also happens on previous composes and it's not a regression.
>
> Hi Kuba,
>
> Did you succeed in reproducing the issue on this puddle:
> RHOS-16.1-RHEL-8-20210727.n.1?
>
> Can you share the puddle ID of the "previous composes" that you
> mentioned?
I reproduced on RHOS-16.1-RHEL-8-20210604.n.0, and it's been reported for puddle RHOS-16.1-RHEL-8-20210727.n.1.

Was tested on RHOS-16.1-RHEL-8-20210928.n.1.
OVN migration workload VMs remain accessible after deleting neutron resources.
Tested nodvr2dvr, nodvr2nodvr and dvr2dvr ovs2ovn migration scenarios.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Red Hat OpenStack Platform 16.1.7 (Train) bug fix and enhancement advisory), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2021:3762