Created attachment 1712943
Kuryr controller logs

Description of problem:

kuryr-controller pod remains in crashloop after running tempest and NP tests on 4.5 UPI deployment in OSP 13. The namespaces created during tempest and NP tests cannot be deleted due to error removing the ports:

ERROR kuryr_kubernetes.controller.drivers.vif_pool [-] Error removing the port 01a994ca-5286-45c1-b6f6-ba6cb663837a: openstack.exceptions.ConflictException: ConflictException: 409: Client Error for url: https://10.46.22.24:13696/v2.0/ports/01a994ca-5286-45c1-b6f6-ba6cb663837a, Port 01a994ca-5286-45c1-b6f6-ba6cb663837a is currently a subport for trunk 1fb6802c-69d5-469a-95ee-9b157b0d608d.

The ports are in active status. It happens with the trunks in all the worker nodes. Restarting the controller doesn't recover it and the controller will keep crashlooping.

$ oc -n openshift-kuryr get pods
NAME                                   READY   STATUS    RESTARTS   AGE
kuryr-cni-2hn5w                        1/1     Running   0          18h
kuryr-cni-2zm85                        1/1     Running   0          18h
kuryr-cni-5jgtv                        1/1     Running   1          18h
kuryr-cni-9dr4x                        1/1     Running   0          18h
kuryr-cni-g9hq9                        1/1     Running   0          18h
kuryr-cni-k4rvv                        1/1     Running   0          18h
kuryr-controller-857bb8dc46-ps4xs      1/1     Running   106        18h
kuryr-dns-admission-controller-48hpl   1/1     Running   0          18h
kuryr-dns-admission-controller-9hdrb   1/1     Running   0          18h
kuryr-dns-admission-controller-b7qkb   1/1     Running   0          18h

$ oc get ns
NAME                         STATUS        AGE
default                      Active        20h
kube-node-lease              Active        20h
kube-public                  Active        20h
kube-system                  Active        20h
kuryr-namespace-2107688107   Terminating   17h
network-policy-1136          Terminating   16h
network-policy-1217          Terminating   15h
network-policy-1649          Terminating   15h
network-policy-1678          Terminating   15h
network-policy-2176          Terminating   16h
network-policy-2578          Terminating   16h
network-policy-3199          Terminating   16h
network-policy-3312          Terminating   15h
network-policy-3340          Terminating   16h
network-policy-5163          Terminating   15h
network-policy-7220          Terminating   16h
network-policy-7736          Terminating   16h
network-policy-8173          Terminating   15h
network-policy-8267          Terminating   16h
network-policy-8403          Terminating   16h
network-policy-8568          Terminating   16h
network-policy-9343          Terminating   16h
network-policy-9624          Terminating   16h
network-policy-b-2382        Terminating   15h
network-policy-b-2597        Terminating   16h
network-policy-b-4786        Terminating   15h
network-policy-b-512         Terminating   16h
network-policy-b-5566        Terminating   16h
network-policy-b-8452        Terminating   16h
network-policy-c-6442        Terminating   16h
openshift                    Active        19h
openshift-apiserver          Active        19h

Version-Release number of selected component (if applicable):
4.5.0-0.nightly-2020-08-27-110054
OSP 13 2020-08-05.1

How reproducible:
don't have enough data

Steps to Reproduce:
1. Install 4.5 UPI on OSP 13 with Kuryr
2. Run tempest and NP tests

Actual results:
kuryr-controller in crashloop and namespaces in Terminating status

Expected results:
no crashloops and successful namespace removals

Additional info:

$ openstack network trunk list
+--------------------------------------+-----------------------------+--------------------------------------+-------------+
| ID                                   | Name                        | Parent Port                          | Description |
+--------------------------------------+-----------------------------+--------------------------------------+-------------+
| 1fb6802c-69d5-469a-95ee-9b157b0d608d | ostest-6tf5m-worker-trunk-1 | 691f1761-3599-4c1e-86aa-e008aafce806 |             |
| 6122a1f3-a7ee-4cde-93a4-8ee5cef478dc | ostest-6tf5m-worker-trunk-2 | 9ca45ccb-a009-4aa6-b702-d4648e604a01 |             |
| 7a5eb902-73ec-415f-bcd0-d193d1fc0521 | ostest-6tf5m-master-trunk-0 | d5fe7a41-ef2f-4dea-a4b3-e95745c0bb44 |             |
| 9ac571e6-ce6d-4f50-b313-bcab8f0e6c00 | ostest-6tf5m-worker-trunk-0 | f5f614cb-098c-40e9-9ca0-eb37b00b5e15 |             |
| b563e48c-5a66-43ff-87d5-96fa661201f0 | ostest-6tf5m-master-trunk-1 | f604d46d-c0a0-4546-9bb9-29c91c00aa10 |             |
| dc2eaf3f-98e8-4bb8-b618-c9275e402a81 | ostest-6tf5m-master-trunk-2 | 786eef48-7dab-41ac-8d1f-85345758fe98 |             |
+--------------------------------------+-----------------------------+--------------------------------------+-------------+
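When triaging the attached controller logs, it can help to pull the port and trunk IDs out of each 409 conflict message and group the stuck subports by trunk. The following is only a sketch for log triage, not part of Kuryr: the function names parse_subport_conflict and subports_by_trunk are hypothetical, and the regular expression assumes the message wording matches the error quoted in the description.

```python
import re
from collections import defaultdict

# Matches the Neutron conflict text quoted in the description.
# Assumption: other log lines use the same wording and 36-character UUIDs.
SUBPORT_CONFLICT = re.compile(
    r"Port (?P<port>[0-9a-f-]{36}) is currently a subport "
    r"for trunk (?P<trunk>[0-9a-f-]{36})"
)


def parse_subport_conflict(line):
    """Return (port_id, trunk_id) for a conflict line, or None if no match."""
    match = SUBPORT_CONFLICT.search(line)
    if match is None:
        return None
    return match.group("port"), match.group("trunk")


def subports_by_trunk(log_lines):
    """Group the port IDs that failed to delete by the trunk holding them."""
    grouped = defaultdict(set)
    for line in log_lines:
        parsed = parse_subport_conflict(line)
        if parsed is not None:
            port, trunk = parsed
            grouped[trunk].add(port)
    return dict(grouped)


# The error line from the description, used here as sample input.
SAMPLE = (
    "ERROR kuryr_kubernetes.controller.drivers.vif_pool [-] Error removing "
    "the port 01a994ca-5286-45c1-b6f6-ba6cb663837a: "
    "openstack.exceptions.ConflictException: ConflictException: 409: "
    "Client Error for url: "
    "https://10.46.22.24:13696/v2.0/ports/01a994ca-5286-45c1-b6f6-ba6cb663837a, "
    "Port 01a994ca-5286-45c1-b6f6-ba6cb663837a is currently a subport "
    "for trunk 1fb6802c-69d5-469a-95ee-9b157b0d608d."
)

if __name__ == "__main__":
    print(subports_by_trunk([SAMPLE]))
```

Cross-referencing the resulting trunk IDs with the trunk list above shows which nodes hold the ports; the trunk in the sample line is ostest-6tf5m-worker-trunk-1.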
OCP master ports are tagged as:
openshiftClusterID=ostest-6tf5m

while OCP worker ports are tagged as:
[openshiftClusterID=ostest-6tf5m]
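To illustrate why the extra brackets matter, the sketch below checks a set of ports for the expected cluster tag using exact string comparison. This is an assumption made for illustration: the report does not say how Kuryr matches tags, and the port names and helper functions here are hypothetical. Only the two tag strings come from this report.

```python
EXPECTED_TAG = "openshiftClusterID=ostest-6tf5m"

# Hypothetical port names mapped to the tag values reported above.
PORTS = {
    "master-port": ["openshiftClusterID=ostest-6tf5m"],
    "worker-port": ["[openshiftClusterID=ostest-6tf5m]"],
}


def is_bracket_wrapped(tag):
    """True if the tag value is enclosed in literal square brackets."""
    return tag.startswith("[") and tag.endswith("]")


def normalize_tag(tag):
    """Strip one pair of enclosing brackets, if present."""
    return tag[1:-1] if is_bracket_wrapped(tag) else tag


def ports_without_tag(ports, expected):
    """Names of ports whose tag list lacks an exact match for `expected`."""
    return [name for name, tags in ports.items() if expected not in tags]


if __name__ == "__main__":
    # Under exact matching, the bracketed worker tag is not recognized.
    print(ports_without_tag(PORTS, EXPECTED_TAG))
    # After normalization the worker tag equals the expected value.
    print(normalize_tag(PORTS["worker-port"][0]) == EXPECTED_TAG)
```

Under that assumption, any lookup that filters ports by the exact tag would skip the worker ports, even though their tag differs from the master tag only by the brackets.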
Verified in:
4.6.0-0.nightly-2020-09-10-100526
OSP 13 2020-09-03.2

The installer works and the worker ports are now correctly tagged:

| tags | openshiftClusterID=ostest-4w85r

so namespaces are deleted correctly.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196