Bug 1874009 - [UPI Kuryr] Wrong tagging of compute node ports breaks kuryr ports pool functionality
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.5.z
Assignee: Luis Tomas Bolivar
QA Contact: GenadiC
URL:
Whiteboard:
Depends On: 1873448
Blocks:
Reported: 2020-08-31 09:56 UTC by OpenShift BugZilla Robot
Modified: 2020-09-30 14:06 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-30 14:06:16 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift installer pull 4120 | 0 | None | closed | [release-4.5] Bug 1874009: Ensure proper tagging of compute nodes ports | 2020-09-21 06:57:11 UTC
Red Hat Product Errata | RHBA-2020:3760 | 0 | None | None | None | 2020-09-30 14:06:40 UTC

Description OpenShift BugZilla Robot 2020-08-31 09:56:12 UTC
+++ This bug was initially created as a clone of Bug #1873448 +++

Created attachment 1712943 [details]
Kuryr controller logs

Description of problem:

The kuryr-controller pod remains in a crash loop after running the tempest and network policy (NP) tests on a 4.5 UPI deployment on OSP 13.
The namespaces created during the tempest and NP tests cannot be deleted because removing their ports fails with this error:

ERROR kuryr_kubernetes.controller.drivers.vif_pool [-] Error removing the port 01a994ca-5286-45c1-b6f6-ba6cb663837a: openstack.exceptions.ConflictException: ConflictException: 409: Client Error for url: https://10.46.22.24:13696/v2.0/ports/01a994ca-5286-45c1-b6f6-ba6cb663837a, Port 01a994ca-5286-45c1-b6f6-ba6cb663837a is currently a subport for trunk 1fb6802c-69d5-469a-95ee-9b157b0d608d.
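
For triage, the port and trunk IDs can be pulled out of log lines like the one above. The following is a minimal sketch in Python; the function name parse_subport_conflict and the regular expression are illustrative assumptions, not part of Kuryr, and the pattern is based only on the message wording quoted in this report.

```python
import re

# Pattern derived from the single log line quoted in this report.
# Assumption: other Neutron releases may word the message differently.
_SUBPORT_CONFLICT = re.compile(
    r"Port (?P<port>[0-9a-f-]{36}) is currently a subport "
    r"for trunk (?P<trunk>[0-9a-f-]{36})"
)


def parse_subport_conflict(line):
    """Return (port_id, trunk_id) if the line reports a subport conflict, else None."""
    match = _SUBPORT_CONFLICT.search(line)
    if match is None:
        return None
    return match.group("port"), match.group("trunk")


# Tail of the error message from the kuryr-controller log above.
line = (
    "Port 01a994ca-5286-45c1-b6f6-ba6cb663837a is currently a subport "
    "for trunk 1fb6802c-69d5-469a-95ee-9b157b0d608d."
)
print(parse_subport_conflict(line))
```

The extracted trunk ID can then be compared against the output of openstack network trunk list to see which node's trunk still holds the port.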

The ports are in ACTIVE status. This happens with the trunks on all the worker nodes.
Restarting the controller does not recover it; the controller keeps crash looping.

$ oc -n openshift-kuryr get pods
NAME                                   READY   STATUS    RESTARTS   AGE
kuryr-cni-2hn5w                        1/1     Running   0          18h
kuryr-cni-2zm85                        1/1     Running   0          18h
kuryr-cni-5jgtv                        1/1     Running   1          18h
kuryr-cni-9dr4x                        1/1     Running   0          18h
kuryr-cni-g9hq9                        1/1     Running   0          18h
kuryr-cni-k4rvv                        1/1     Running   0          18h
kuryr-controller-857bb8dc46-ps4xs      1/1     Running   106        18h
kuryr-dns-admission-controller-48hpl   1/1     Running   0          18h
kuryr-dns-admission-controller-9hdrb   1/1     Running   0          18h
kuryr-dns-admission-controller-b7qkb   1/1     Running   0          18h

$ oc get ns
NAME                                               STATUS        AGE
default                                            Active        20h
kube-node-lease                                    Active        20h
kube-public                                        Active        20h
kube-system                                        Active        20h
kuryr-namespace-2107688107                         Terminating   17h
network-policy-1136                                Terminating   16h
network-policy-1217                                Terminating   15h
network-policy-1649                                Terminating   15h
network-policy-1678                                Terminating   15h
network-policy-2176                                Terminating   16h
network-policy-2578                                Terminating   16h
network-policy-3199                                Terminating   16h
network-policy-3312                                Terminating   15h
network-policy-3340                                Terminating   16h
network-policy-5163                                Terminating   15h
network-policy-7220                                Terminating   16h
network-policy-7736                                Terminating   16h
network-policy-8173                                Terminating   15h
network-policy-8267                                Terminating   16h
network-policy-8403                                Terminating   16h
network-policy-8568                                Terminating   16h
network-policy-9343                                Terminating   16h
network-policy-9624                                Terminating   16h
network-policy-b-2382                              Terminating   15h
network-policy-b-2597                              Terminating   16h
network-policy-b-4786                              Terminating   15h
network-policy-b-512                               Terminating   16h
network-policy-b-5566                              Terminating   16h
network-policy-b-8452                              Terminating   16h
network-policy-c-6442                              Terminating   16h
openshift                                          Active        19h
openshift-apiserver                                Active        19h


Version-Release number of selected component (if applicable):

4.5.0-0.nightly-2020-08-27-110054
OSP 13 2020-08-05.1


How reproducible: Unknown, not enough data yet


Steps to Reproduce:
1. Install 4.5 UPI on OSP 13 with Kuryr
2. Run tempest and NP tests

Actual results: kuryr-controller in crashloop and namespaces in Terminating status


Expected results: no crashloops and successful namespace removals


Additional info:

$ openstack network trunk list
+--------------------------------------+-----------------------------+--------------------------------------+-------------+
| ID                                   | Name                        | Parent Port                          | Description |
+--------------------------------------+-----------------------------+--------------------------------------+-------------+
| 1fb6802c-69d5-469a-95ee-9b157b0d608d | ostest-6tf5m-worker-trunk-1 | 691f1761-3599-4c1e-86aa-e008aafce806 |             |
| 6122a1f3-a7ee-4cde-93a4-8ee5cef478dc | ostest-6tf5m-worker-trunk-2 | 9ca45ccb-a009-4aa6-b702-d4648e604a01 |             |
| 7a5eb902-73ec-415f-bcd0-d193d1fc0521 | ostest-6tf5m-master-trunk-0 | d5fe7a41-ef2f-4dea-a4b3-e95745c0bb44 |             |
| 9ac571e6-ce6d-4f50-b313-bcab8f0e6c00 | ostest-6tf5m-worker-trunk-0 | f5f614cb-098c-40e9-9ca0-eb37b00b5e15 |             |
| b563e48c-5a66-43ff-87d5-96fa661201f0 | ostest-6tf5m-master-trunk-1 | f604d46d-c0a0-4546-9bb9-29c91c00aa10 |             |
| dc2eaf3f-98e8-4bb8-b618-c9275e402a81 | ostest-6tf5m-master-trunk-2 | 786eef48-7dab-41ac-8d1f-85345758fe98 |             |
+--------------------------------------+-----------------------------+--------------------------------------+-------------+

--- Additional comment from juriarte on 2020-08-31 07:25:05 UTC ---

OCP master ports are tagged as openshiftClusterID=ostest-6tf5m, while OCP worker ports are tagged as [openshiftClusterID=ostest-6tf5m], with literal square brackets around the tag.
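
To illustrate why the bracketed form matters: assuming ports are looked up by an exact match on the cluster tag (an assumption made for illustration; this report does not show Kuryr's lookup code), a port carrying the bracketed tag will not match a filter for the plain tag. The port records and the helper ports_with_tag below are hypothetical.

```python
# Hypothetical port records; the tag values mirror the two forms reported above.
ports = [
    {"name": "master-port", "tags": ["openshiftClusterID=ostest-6tf5m"]},
    {"name": "worker-port", "tags": ["[openshiftClusterID=ostest-6tf5m]"]},
]


def ports_with_tag(ports, tag):
    """Return the names of ports whose tag list contains exactly `tag`."""
    return [p["name"] for p in ports if tag in p["tags"]]


matched = ports_with_tag(ports, "openshiftClusterID=ostest-6tf5m")
# Only the master port matches; the bracketed worker tag is a different string.
print(matched)
```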

Comment 2 Jon Uriarte 2020-09-21 07:03:44 UTC
Verified in:
4.5.0-0.nightly-2020-09-18-174934
OSP 13 2020-09-16.1

The installer works and the worker ports are now tagged correctly:

| tags                    | openshiftClusterID=ostest-dgn5c

so namespaces are correctly deleted.

Comment 7 errata-xmlrpc 2020-09-30 14:06:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.13 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3760

