+++ This bug was initially created as a clone of Bug #1873449 +++

Created attachment 1712944 [details]
Kuryr controller logs

Description of problem:

kuryr-controller pod remains in crashloop after running tempest and NP tests on 4.5 UPI deployment in OSP 13. The namespaces created during tempest and NP tests cannot be deleted due to error removing the ports:

ERROR kuryr_kubernetes.controller.drivers.vif_pool [-] Error removing the port 01a994ca-5286-45c1-b6f6-ba6cb663837a: openstack.exceptions.ConflictException: ConflictException: 409: Client Error for url: https://10.46.22.24:13696/v2.0/ports/01a994ca-5286-45c1-b6f6-ba6cb663837a, Port 01a994ca-5286-45c1-b6f6-ba6cb663837a is currently a subport for trunk 1fb6802c-69d5-469a-95ee-9b157b0d608d.

The ports are in active status. It happens with the trunks in all the worker nodes.

$ oc -n openshift-kuryr get pods
NAME                                   READY   STATUS    RESTARTS   AGE
kuryr-cni-2hn5w                        1/1     Running   0          18h
kuryr-cni-2zm85                        1/1     Running   0          18h
kuryr-cni-5jgtv                        1/1     Running   1          18h
kuryr-cni-9dr4x                        1/1     Running   0          18h
kuryr-cni-g9hq9                        1/1     Running   0          18h
kuryr-cni-k4rvv                        1/1     Running   0          18h
kuryr-controller-857bb8dc46-ps4xs      1/1     Running   106        18h
kuryr-dns-admission-controller-48hpl   1/1     Running   0          18h
kuryr-dns-admission-controller-9hdrb   1/1     Running   0          18h
kuryr-dns-admission-controller-b7qkb   1/1     Running   0          18h

$ oc get ns
NAME                         STATUS        AGE
default                      Active        20h
kube-node-lease              Active        20h
kube-public                  Active        20h
kube-system                  Active        20h
kuryr-namespace-2107688107   Terminating   17h
network-policy-1136          Terminating   16h
network-policy-1217          Terminating   15h
network-policy-1649          Terminating   15h
network-policy-1678          Terminating   15h
network-policy-2176          Terminating   16h
network-policy-2578          Terminating   16h
network-policy-3199          Terminating   16h
network-policy-3312          Terminating   15h
network-policy-3340          Terminating   16h
network-policy-5163          Terminating   15h
network-policy-7220          Terminating   16h
network-policy-7736          Terminating   16h
network-policy-8173          Terminating   15h
network-policy-8267          Terminating   16h
network-policy-8403          Terminating   16h
network-policy-8568          Terminating   16h
network-policy-9343          Terminating   16h
network-policy-9624          Terminating   16h
network-policy-b-2382        Terminating   15h
network-policy-b-2597        Terminating   16h
network-policy-b-4786        Terminating   15h
network-policy-b-512         Terminating   16h
network-policy-b-5566        Terminating   16h
network-policy-b-8452        Terminating   16h
network-policy-c-6442        Terminating   16h
openshift                    Active        19h
openshift-apiserver          Active        19h

Version-Release number of selected component (if applicable):
4.5.0-0.nightly-2020-08-27-110054
OSP 13 2020-08-05.1

How reproducible: don't have enough data

Steps to Reproduce:
1. Install 4.5 UPI on OSP 13 with Kuryr
2. Run tempest and NP tests

Actual results: kuryr-controller in crashloop and namespaces in Terminating status

Expected results: no crashloops and successful namespace removals

Additional info:

$ openstack network trunk list
+--------------------------------------+-----------------------------+--------------------------------------+-------------+
| ID                                   | Name                        | Parent Port                          | Description |
+--------------------------------------+-----------------------------+--------------------------------------+-------------+
| 1fb6802c-69d5-469a-95ee-9b157b0d608d | ostest-6tf5m-worker-trunk-1 | 691f1761-3599-4c1e-86aa-e008aafce806 |             |
| 6122a1f3-a7ee-4cde-93a4-8ee5cef478dc | ostest-6tf5m-worker-trunk-2 | 9ca45ccb-a009-4aa6-b702-d4648e604a01 |             |
| 7a5eb902-73ec-415f-bcd0-d193d1fc0521 | ostest-6tf5m-master-trunk-0 | d5fe7a41-ef2f-4dea-a4b3-e95745c0bb44 |             |
| 9ac571e6-ce6d-4f50-b313-bcab8f0e6c00 | ostest-6tf5m-worker-trunk-0 | f5f614cb-098c-40e9-9ca0-eb37b00b5e15 |             |
| b563e48c-5a66-43ff-87d5-96fa661201f0 | ostest-6tf5m-master-trunk-1 | f604d46d-c0a0-4546-9bb9-29c91c00aa10 |             |
| dc2eaf3f-98e8-4bb8-b618-c9275e402a81 | ostest-6tf5m-master-trunk-2 | 786eef48-7dab-41ac-8d1f-85345758fe98 |             |
+--------------------------------------+-----------------------------+--------------------------------------+-------------+

---
Additional comment from ltomasbo on 2020-08-28 14:25:16 UTC ---

Error on the kuryr controller looks like:

2020-08-28 14:19:55.415 1 ERROR kuryr_kubernetes.controller.drivers.vif_pool [-] Error removing the port fe7832c7-954e-454b-91c1-bc7a5fc57458: openstack.exceptions.ConflictException: ConflictException: 409: Client Error for url: https://10.46.22.24:13696/v2.0/ports/fe7832c7-954e-454b-91c1-bc7a5fc57458, Port fe7832c7-954e-454b-91c1-bc7a5fc57458 is currently a subport for trunk 1fb6802c-69d5-469a-95ee-9b157b0d608d.
2020-08-28 14:19:55.415 1 ERROR kuryr_kubernetes.controller.drivers.vif_pool Traceback (most recent call last):
2020-08-28 14:19:55.415 1 ERROR kuryr_kubernetes.controller.drivers.vif_pool   File "/usr/local/lib/python3.6/site-packages/kuryr_kubernetes/controller/drivers/vif_pool.py", line 898, in _precreated_ports
2020-08-28 14:19:55.415 1 ERROR kuryr_kubernetes.controller.drivers.vif_pool     os_net.delete_port(port_id)
2020-08-28 14:19:55.415 1 ERROR kuryr_kubernetes.controller.drivers.vif_pool   File "/usr/local/lib/python3.6/site-packages/openstack/network/v2/_proxy.py", line 1749, in delete_port
2020-08-28 14:19:55.415 1 ERROR kuryr_kubernetes.controller.drivers.vif_pool     if_revision=if_revision)
2020-08-28 14:19:55.415 1 ERROR kuryr_kubernetes.controller.drivers.vif_pool   File "/usr/local/lib/python3.6/site-packages/openstack/proxy.py", line 46, in check
2020-08-28 14:19:55.415 1 ERROR kuryr_kubernetes.controller.drivers.vif_pool     return method(self, expected, actual, *args, **kwargs)
2020-08-28 14:19:55.415 1 ERROR kuryr_kubernetes.controller.drivers.vif_pool   File "/usr/local/lib/python3.6/site-packages/openstack/network/v2/_proxy.py", line 75, in _delete
2020-08-28 14:19:55.415 1 ERROR kuryr_kubernetes.controller.drivers.vif_pool     rv = res.delete(self, if_revision=if_revision)
2020-08-28 14:19:55.415 1 ERROR kuryr_kubernetes.controller.drivers.vif_pool   File "/usr/local/lib/python3.6/site-packages/openstack/resource.py", line 1622, in delete
2020-08-28 14:19:55.415 1 ERROR kuryr_kubernetes.controller.drivers.vif_pool     self._translate_response(response, has_body=False, **kwargs)
2020-08-28 14:19:55.415 1 ERROR kuryr_kubernetes.controller.drivers.vif_pool   File "/usr/local/lib/python3.6/site-packages/openstack/resource.py", line 1113, in _translate_response
2020-08-28 14:19:55.415 1 ERROR kuryr_kubernetes.controller.drivers.vif_pool     exceptions.raise_from_response(response, error_message=error_message)
2020-08-28 14:19:55.415 1 ERROR kuryr_kubernetes.controller.drivers.vif_pool   File "/usr/local/lib/python3.6/site-packages/openstack/exceptions.py", line 235, in raise_from_response
2020-08-28 14:19:55.415 1 ERROR kuryr_kubernetes.controller.drivers.vif_pool     http_status=http_status, request_id=request_id
2020-08-28 14:19:55.415 1 ERROR kuryr_kubernetes.controller.drivers.vif_pool openstack.exceptions.ConflictException: ConflictException: 409: Client Error for url: https://10.46.22.24:13696/v2.0/ports/fe7832c7-954e-454b-91c1-bc7a5fc57458, Port fe7832c7-954e-454b-91c1-bc7a5fc57458 is currently a subport for trunk 1fb6802c-69d5-469a-95ee-9b157b0d608d.
2020-08-28 14:19:55.415 1 ERROR kuryr_kubernetes.controller.drivers.vif_pool

--- Additional comment from ltomasbo on 2020-08-31 06:35:54 UTC ---

The problem is due to wrong tagging of the worker node parent ports. This patch https://review.opendev.org/#/c/748670/ will help, as it will ensure namespaces are deleted anyway, but it won't solve the wrong tagging, which is the culprit and which breaks the ports pool functionality because the already created ports cannot be re-discovered.
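
For triage, it may help to group the failing ports by the trunk that still holds them, to confirm that every worker trunk is affected. The following is only a minimal sketch: it assumes the controller log has been saved as plain text and that the Neutron 409 message has the wording quoted above. The function name `subports_by_trunk` is hypothetical and not part of Kuryr.

```python
import re
from collections import defaultdict

# Matches the Neutron 409 message quoted in the controller log above.
# Assumption: port and trunk IDs are 36-character lowercase UUIDs.
SUBPORT_CONFLICT = re.compile(
    r"Port (?P<port>[0-9a-f-]{36}) is currently a subport "
    r"for trunk (?P<trunk>[0-9a-f-]{36})"
)


def subports_by_trunk(log_text):
    """Group port IDs whose deletion failed by the trunk still holding them."""
    grouped = defaultdict(set)
    for match in SUBPORT_CONFLICT.finditer(log_text):
        # A set drops the repeats produced by the controller retrying.
        grouped[match.group("trunk")].add(match.group("port"))
    return {trunk: sorted(ports) for trunk, ports in grouped.items()}
```

The resulting trunk IDs can then be compared against the `openstack network trunk list` output. If the log was copied from a terminal that wrapped long lines, the wrapped lines need to be rejoined first, otherwise the pattern will not match.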
Verified in:
4.5.0-0.nightly-2020-09-14-124053
OSP 13 2020-09-03.2

Successful UPI installation. Kuryr pods after running tempest, NP and conformance tests:

$ oc -n openshift-kuryr get pods
NAME                                   READY   STATUS    RESTARTS   AGE
kuryr-cni-6dmm2                        1/1     Running   0          12h
kuryr-cni-7s549                        1/1     Running   2          12h
kuryr-cni-bmj7h                        1/1     Running   0          12h
kuryr-cni-flv7m                        1/1     Running   0          12h
kuryr-cni-kdrrm                        1/1     Running   3          12h
kuryr-cni-mfcxt                        1/1     Running   2          12h
kuryr-controller-7f8675b9b8-7hq4g      1/1     Running   1          12h
kuryr-dns-admission-controller-6t8v8   1/1     Running   0          12h
kuryr-dns-admission-controller-bw4b7   1/1     Running   0          12h
kuryr-dns-admission-controller-vrlhp   1/1     Running   0          12h

The controller is not crashlooping and there are no namespaces in Terminating status.

Test results:

Tempest: Pass 26, Skip 13

NP: 22 Passed | 1 Failed:
[Fail] [sig-network] NetworkPolicy [LinuxOnly] NetworkPolicy between server and client [BeforeEach] should stop enforcing policies after they are deleted [Feature:NetworkPolicy-21]
/home/cloud-user/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/network/network_policy.go:1427

Conformance: 13 fail, 261 pass, 0 skip (56m48s)
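
The check for leftover namespaces can be scripted when re-verifying. This is a rough sketch only, assuming the default column layout of `oc get ns` (NAME, STATUS, AGE with a header row) as shown in the original report. The helper name `terminating_namespaces` is hypothetical.

```python
def terminating_namespaces(oc_get_ns_output):
    """Return namespace names whose STATUS column reads Terminating."""
    names = []
    # Skip the header row (NAME STATUS AGE); columns are whitespace-separated.
    for line in oc_get_ns_output.splitlines()[1:]:
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "Terminating":
            names.append(fields[0])
    return names
```

An empty result corresponds to the state verified here. Any names returned are namespaces still stuck in deletion.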
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.11 bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3719