Bug 1874840

Summary: [Kuryr] Cannot terminate namespaces due to error removing ports
Product: OpenShift Container Platform Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component: Networking Assignee: Luis Tomas Bolivar <ltomasbo>
Networking sub component: kuryr QA Contact: GenadiC <gcheresh>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: juriarte
Version: 4.5   
Target Milestone: ---   
Target Release: 4.5.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-09-21 17:42:06 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1873449    
Bug Blocks:    

Description OpenShift BugZilla Robot 2020-09-02 11:54:55 UTC
+++ This bug was initially created as a clone of Bug #1873449 +++

Created attachment 1712944 [details]
Kuryr controller logs

Description of problem:

The kuryr-controller pod stays in a crash loop after running the tempest and NP (network policy) tests on a 4.5 UPI deployment on OSP 13.
The namespaces created during those tests cannot be deleted because removing their ports fails:

ERROR kuryr_kubernetes.controller.drivers.vif_pool [-] Error removing the port 01a994ca-5286-45c1-b6f6-ba6cb663837a: openstack.exceptions.ConflictException: ConflictException: 409: Client Error for url: https://10.46.22.24:13696/v2.0/ports/01a994ca-5286-45c1-b6f6-ba6cb663837a, Port 01a994ca-5286-45c1-b6f6-ba6cb663837a is currently a subport for trunk 1fb6802c-69d5-469a-95ee-9b157b0d608d.

The ports are in active status. It happens with the trunks in all the worker nodes.
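
To see which trunks are holding which ports, the conflict messages can be pulled out of the attached controller log. The following is only a rough sketch: the function name is made up for illustration, and it assumes each log message sits on a single line (not hard-wrapped) and uses the exact wording of the 409 error quoted above.

```python
import re
from collections import defaultdict

# Matches the Neutron 409 message quoted in this report.
# Assumption: the message is not broken across lines in the log being scanned.
SUBPORT_ERROR = re.compile(
    r"Port (?P<port>[0-9a-f-]{36}) is currently a subport for trunk "
    r"(?P<trunk>[0-9a-f-]{36})"
)


def stuck_ports_by_trunk(log_text):
    """Return {trunk_id: sorted list of port_ids} for every subport conflict found.

    Hypothetical helper for triage only; it is not part of kuryr-kubernetes.
    """
    by_trunk = defaultdict(set)
    for match in SUBPORT_ERROR.finditer(log_text):
        by_trunk[match.group("trunk")].add(match.group("port"))
    return {trunk: sorted(ports) for trunk, ports in by_trunk.items()}
```

Feeding it the contents of the attached log would give one entry per trunk, which makes it easy to confirm whether every worker trunk is affected.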

$ oc -n openshift-kuryr get pods
NAME                                   READY   STATUS    RESTARTS   AGE
kuryr-cni-2hn5w                        1/1     Running   0          18h
kuryr-cni-2zm85                        1/1     Running   0          18h
kuryr-cni-5jgtv                        1/1     Running   1          18h
kuryr-cni-9dr4x                        1/1     Running   0          18h
kuryr-cni-g9hq9                        1/1     Running   0          18h
kuryr-cni-k4rvv                        1/1     Running   0          18h
kuryr-controller-857bb8dc46-ps4xs      1/1     Running   106        18h
kuryr-dns-admission-controller-48hpl   1/1     Running   0          18h
kuryr-dns-admission-controller-9hdrb   1/1     Running   0          18h
kuryr-dns-admission-controller-b7qkb   1/1     Running   0          18h

$ oc get ns
NAME                                               STATUS        AGE
default                                            Active        20h
kube-node-lease                                    Active        20h
kube-public                                        Active        20h
kube-system                                        Active        20h
kuryr-namespace-2107688107                         Terminating   17h
network-policy-1136                                Terminating   16h
network-policy-1217                                Terminating   15h
network-policy-1649                                Terminating   15h
network-policy-1678                                Terminating   15h
network-policy-2176                                Terminating   16h
network-policy-2578                                Terminating   16h
network-policy-3199                                Terminating   16h
network-policy-3312                                Terminating   15h
network-policy-3340                                Terminating   16h
network-policy-5163                                Terminating   15h
network-policy-7220                                Terminating   16h
network-policy-7736                                Terminating   16h
network-policy-8173                                Terminating   15h
network-policy-8267                                Terminating   16h
network-policy-8403                                Terminating   16h
network-policy-8568                                Terminating   16h
network-policy-9343                                Terminating   16h
network-policy-9624                                Terminating   16h
network-policy-b-2382                              Terminating   15h
network-policy-b-2597                              Terminating   16h
network-policy-b-4786                              Terminating   15h
network-policy-b-512                               Terminating   16h
network-policy-b-5566                              Terminating   16h
network-policy-b-8452                              Terminating   16h
network-policy-c-6442                              Terminating   16h
openshift                                          Active        19h
openshift-apiserver                                Active        19h


Version-Release number of selected component (if applicable):

4.5.0-0.nightly-2020-08-27-110054
OSP 13 2020-08-05.1


How reproducible: don't have enough data


Steps to Reproduce:
1. Install 4.5 UPI on OSP 13 with Kuryr
2. Run tempest and NP tests

Actual results: kuryr-controller in crashloop and namespaces in Terminating status


Expected results: no crashloops and successful namespace removals


Additional info:

$ openstack network trunk list
+--------------------------------------+-----------------------------+--------------------------------------+-------------+
| ID                                   | Name                        | Parent Port                          | Description |
+--------------------------------------+-----------------------------+--------------------------------------+-------------+
| 1fb6802c-69d5-469a-95ee-9b157b0d608d | ostest-6tf5m-worker-trunk-1 | 691f1761-3599-4c1e-86aa-e008aafce806 |             |
| 6122a1f3-a7ee-4cde-93a4-8ee5cef478dc | ostest-6tf5m-worker-trunk-2 | 9ca45ccb-a009-4aa6-b702-d4648e604a01 |             |
| 7a5eb902-73ec-415f-bcd0-d193d1fc0521 | ostest-6tf5m-master-trunk-0 | d5fe7a41-ef2f-4dea-a4b3-e95745c0bb44 |             |
| 9ac571e6-ce6d-4f50-b313-bcab8f0e6c00 | ostest-6tf5m-worker-trunk-0 | f5f614cb-098c-40e9-9ca0-eb37b00b5e15 |             |
| b563e48c-5a66-43ff-87d5-96fa661201f0 | ostest-6tf5m-master-trunk-1 | f604d46d-c0a0-4546-9bb9-29c91c00aa10 |             |
| dc2eaf3f-98e8-4bb8-b618-c9275e402a81 | ostest-6tf5m-master-trunk-2 | 786eef48-7dab-41ac-8d1f-85345758fe98 |             |
+--------------------------------------+-----------------------------+--------------------------------------+-------------+

--- Additional comment from ltomasbo on 2020-08-28 14:25:16 UTC ---

Error on the kuryr controller looks like:
2020-08-28 14:19:55.415 1 ERROR kuryr_kubernetes.controller.drivers.vif_pool [-] Error removing the port fe7832c7-954e-454b-91c1-bc7a5fc57458: openstack.exceptions.ConflictException: ConflictException: 409: Client Error for url: https://10.46.22.24:13696/v2.0/ports/fe7832c7-954e-454b-91c1-bc7a5fc57458, Port fe7832c7-954e-454b-91c1-bc7a5fc57458 is currently a subport for trunk 1fb6802c-69d5-469a-95ee-9b157b0d608d.
2020-08-28 14:19:55.415 1 ERROR kuryr_kubernetes.controller.drivers.vif_pool Traceback (most recent call last):
2020-08-28 14:19:55.415 1 ERROR kuryr_kubernetes.controller.drivers.vif_pool   File "/usr/local/lib/python3.6/site-packages/kuryr_kubernetes/controller/drivers/vif_pool.py", line 898, in _precreated_ports
2020-08-28 14:19:55.415 1 ERROR kuryr_kubernetes.controller.drivers.vif_pool     os_net.delete_port(port_id)
2020-08-28 14:19:55.415 1 ERROR kuryr_kubernetes.controller.drivers.vif_pool   File "/usr/local/lib/python3.6/site-packages/openstack/network/v2/_proxy.py", line 1749, in delete_port
2020-08-28 14:19:55.415 1 ERROR kuryr_kubernetes.controller.drivers.vif_pool     if_revision=if_revision)
2020-08-28 14:19:55.415 1 ERROR kuryr_kubernetes.controller.drivers.vif_pool   File "/usr/local/lib/python3.6/site-packages/openstack/proxy.py", line 46, in check
2020-08-28 14:19:55.415 1 ERROR kuryr_kubernetes.controller.drivers.vif_pool     return method(self, expected, actual, *args, **kwargs)
2020-08-28 14:19:55.415 1 ERROR kuryr_kubernetes.controller.drivers.vif_pool   File "/usr/local/lib/python3.6/site-packages/openstack/network/v2/_proxy.py", line 75, in _delete
2020-08-28 14:19:55.415 1 ERROR kuryr_kubernetes.controller.drivers.vif_pool     rv = res.delete(self, if_revision=if_revision)
2020-08-28 14:19:55.415 1 ERROR kuryr_kubernetes.controller.drivers.vif_pool   File "/usr/local/lib/python3.6/site-packages/openstack/resource.py", line 1622, in delete
2020-08-28 14:19:55.415 1 ERROR kuryr_kubernetes.controller.drivers.vif_pool     self._translate_response(response, has_body=False, **kwargs)
2020-08-28 14:19:55.415 1 ERROR kuryr_kubernetes.controller.drivers.vif_pool   File "/usr/local/lib/python3.6/site-packages/openstack/resource.py", line 1113, in _translate_response
2020-08-28 14:19:55.415 1 ERROR kuryr_kubernetes.controller.drivers.vif_pool     exceptions.raise_from_response(response, error_message=error_message)
2020-08-28 14:19:55.415 1 ERROR kuryr_kubernetes.controller.drivers.vif_pool   File "/usr/local/lib/python3.6/site-packages/openstack/exceptions.py", line 235, in raise_from_response
2020-08-28 14:19:55.415 1 ERROR kuryr_kubernetes.controller.drivers.vif_pool     http_status=http_status, request_id=request_id
2020-08-28 14:19:55.415 1 ERROR kuryr_kubernetes.controller.drivers.vif_pool openstack.exceptions.ConflictException: ConflictException: 409: Client Error for url: https://10.46.22.24:13696/v2.0/ports/fe7832c7-954e-454b-91c1-bc7a5fc57458, Port fe7832c7-954e-454b-91c1-bc7a5fc57458 is currently a subport for trunk 1fb6802c-69d5-469a-95ee-9b157b0d608d.
2020-08-28 14:19:55.415 1 ERROR kuryr_kubernetes.controller.drivers.vif_pool

--- Additional comment from ltomasbo on 2020-08-31 06:35:54 UTC ---

The problem is caused by wrong tagging of the worker node parent ports. The patch https://review.opendev.org/#/c/748670/ will help, as it ensures namespaces are deleted anyway. It won't solve the wrong tagging, though, which is the culprit and which breaks the ports pool functionality, because the existing created ports cannot be re-discovered.
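
For manual cleanup of an affected environment, the 409 can be avoided by detaching the port from its trunk before deleting it. The sketch below is not the linked patch and does nothing about the tagging problem. It assumes `os_net` is an openstacksdk network proxy (for example `conn.network` from `openstack.connect()`) and relies on its `trunks()`, `delete_trunk_subports()` and `delete_port()` calls. The function name is hypothetical.

```python
def release_and_delete_port(os_net, port_id):
    """Detach port_id from the trunk that lists it as a subport, then delete it.

    os_net is assumed to be an openstacksdk network proxy (conn.network).
    Returns the ID of the trunk the port was detached from, or None if the
    port was not a subport of any trunk.
    """
    detached_from = None
    for trunk in os_net.trunks():
        # Each sub_ports entry is a dict that carries at least a "port_id" key.
        sub_ports = trunk.sub_ports or []
        if any(sp.get("port_id") == port_id for sp in sub_ports):
            os_net.delete_trunk_subports(trunk.id, [{"port_id": port_id}])
            detached_from = trunk.id
            break
    # With the subport association gone, Neutron no longer rejects the delete.
    os_net.delete_port(port_id)
    return detached_from
```

Listing every trunk for each port is wasteful on large deployments. It is done here only to keep the example short.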

Comment 3 Jon Uriarte 2020-09-15 10:36:55 UTC
Verified in:
4.5.0-0.nightly-2020-09-14-124053
OSP 13 2020-09-03.2

Successful UPI installation.

Kuryr pods after running tempest, NP and conformance tests:

$ oc -n openshift-kuryr get pods
NAME                                   READY   STATUS    RESTARTS   AGE
kuryr-cni-6dmm2                        1/1     Running   0          12h
kuryr-cni-7s549                        1/1     Running   2          12h
kuryr-cni-bmj7h                        1/1     Running   0          12h
kuryr-cni-flv7m                        1/1     Running   0          12h
kuryr-cni-kdrrm                        1/1     Running   3          12h
kuryr-cni-mfcxt                        1/1     Running   2          12h
kuryr-controller-7f8675b9b8-7hq4g      1/1     Running   1          12h
kuryr-dns-admission-controller-6t8v8   1/1     Running   0          12h
kuryr-dns-admission-controller-bw4b7   1/1     Running   0          12h
kuryr-dns-admission-controller-vrlhp   1/1     Running   0          12h

The controller is not crashlooping and there are no namespaces in Terminating status.

Tests results:

Tempest:
Pass 26 Skip 13

NP:
22 Passed | 1 Failed:
[Fail] [sig-network] NetworkPolicy [LinuxOnly] NetworkPolicy between server and client [BeforeEach] should stop enforcing policies after they are deleted [Feature:NetworkPolicy-21] 
/home/cloud-user/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/network/network_policy.go:1427

Conformance:
13 fail, 261 pass, 0 skip (56m48s)

Comment 5 errata-xmlrpc 2020-09-21 17:42:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.11 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3719