Bug 1884655
Summary: | KeyError on self._existing_vifs[port_id] | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | rlobillo | ||||||
Component: | Networking | Assignee: | Luis Tomas Bolivar <ltomasbo> | ||||||
Networking sub component: | kuryr | QA Contact: | GenadiC <gcheresh> | ||||||
Status: | CLOSED ERRATA | Docs Contact: | |||||||
Severity: | high | ||||||||
Priority: | high | CC: | bbennett, mdemaced | ||||||
Version: | 4.6 | ||||||||
Target Milestone: | --- | ||||||||
Target Release: | 4.7.0 | ||||||||
Hardware: | Unspecified | ||||||||
OS: | Unspecified | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | Doc Type: | No Doc Update | |||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2021-02-24 15:22:23 UTC | Type: | Bug | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Bug Depends On: | |||||||||
Bug Blocks: | 1886696 | ||||||||
Attachments: |
|
Verified on 4.7.0-0.nightly-2020-10-22-044506 over OSP16.1 with OVN-octavia (RHOS-16.1-RHEL-8-20201007.n.0). Conformance tests run without kuryr-controller restarts and with expected results (Attached): $ cat conformance_results/kuryr_pods.txt # Kuryr pods before running conformance tests - ANSIBLE MANAGED BLOCK NAME READY STATUS RESTARTS AGE kuryr-cni-chrzk 1/1 Running 0 143m kuryr-cni-drhbk 1/1 Running 0 143m kuryr-cni-pclf8 1/1 Running 0 140m kuryr-cni-w5tf9 1/1 Running 0 142m kuryr-cni-zfgzt 1/1 Running 0 140m kuryr-cni-zl6wv 1/1 Running 0 141m kuryr-controller-74584966c5-689jm 1/1 Running 1 118m # END ANSIBLE MANAGED BLOCK # Kuryr pods after running conformance tests - ANSIBLE MANAGED BLOCK NAME READY STATUS RESTARTS AGE kuryr-cni-chrzk 1/1 Running 0 3h18m kuryr-cni-drhbk 1/1 Running 2 3h19m kuryr-cni-pclf8 1/1 Running 0 3h15m kuryr-cni-w5tf9 1/1 Running 2 3h17m kuryr-cni-zfgzt 1/1 Running 2 3h16m kuryr-cni-zl6wv 1/1 Running 0 3h17m kuryr-controller-74584966c5-689jm 1/1 Running 1 174m # END ANSIBLE MANAGED BLOCK Furthermore, downscaling workers is deleting properly the subports for the already deleted worker: (shiftstack) [stack@undercloud-0 ~]$ openstack port list --device-owner="" --tags="openshiftClusterID=ostest-rmrft" | grep -v ostest +--------------------------------------+---------------------------+-------------------+-------------------------------------------------------------------------------+--------+ | ID | Name | MAC Address | Fixed IP Addresses | Status | +--------------------------------------+---------------------------+-------------------+-------------------------------------------------------------------------------+--------+ | 035c6cef-5aa0-4104-a049-3c5b0ba73444 | | fa:16:3e:ab:15:43 | ip_address='10.128.106.136', subnet_id='c7dcf66b-31e5-4d81-b10b-9073c56710b7' | DOWN | | 07c8b907-5e1f-4d9c-b920-b0154a01ed6f | | fa:16:3e:9b:cb:cb | ip_address='10.128.93.30', subnet_id='0c38384b-15e0-4505-938a-56b5c8d91ce5' | DOWN | | 0c892901-b58b-4bcf-9232-1e403edff30e | | fa:16:3e:47:6f:8b | ip_address='10.128.92.79', subnet_id='0c38384b-15e0-4505-938a-56b5c8d91ce5' | DOWN | | 150d2f75-0125-4a6b-86de-81887d0c6d18 | | fa:16:3e:61:dc:a3 | ip_address='10.128.10.104', subnet_id='12a657ee-189a-4e56-822e-d507836e3830' | DOWN | | 1802138a-8d14-4882-98c4-f204a5ba0a92 | | fa:16:3e:78:f2:17 | ip_address='10.128.35.224', subnet_id='0e71d23e-689f-4289-b103-39c357dfa989' | DOWN | | 1f3850fd-73e9-4cf6-b065-ae6ec7df86fe | | fa:16:3e:9d:b8:48 | ip_address='10.128.8.188', subnet_id='d886942e-a82d-4629-974f-ececa12a6aa0' | DOWN | | 378f221f-88f9-4b84-943a-cf0344e32029 | | fa:16:3e:29:89:fa | ip_address='10.128.9.217', subnet_id='d886942e-a82d-4629-974f-ececa12a6aa0' | DOWN | | 39bb6b06-997a-48f0-919b-988ba25ca89e | | fa:16:3e:53:40:8a | ip_address='10.128.8.148', subnet_id='d886942e-a82d-4629-974f-ececa12a6aa0' | DOWN | | 3bad459b-e9ff-41ce-b6b4-60a499bbe33a | | fa:16:3e:8f:13:bf | ip_address='10.128.106.21', subnet_id='c7dcf66b-31e5-4d81-b10b-9073c56710b7' | DOWN | | 4b601c09-106b-4589-8045-8d90c8301274 | | fa:16:3e:0b:aa:3b | ip_address='10.128.10.68', subnet_id='12a657ee-189a-4e56-822e-d507836e3830' | DOWN | | 5e749451-ae31-438d-963c-8095e45c843d | | fa:16:3e:5d:02:fa | ip_address='10.128.34.36', subnet_id='0e71d23e-689f-4289-b103-39c357dfa989' | DOWN | | 6653e09b-2cf5-457a-9144-9cb6ed091aca | | fa:16:3e:53:4c:a2 | ip_address='10.128.66.105', subnet_id='fa481858-2961-430a-973a-df53c36c1517' | DOWN | | 7b6f020d-8284-4809-aa37-f46553b88c3c | | fa:16:3e:f7:40:15 | ip_address='10.128.35.160', subnet_id='0e71d23e-689f-4289-b103-39c357dfa989' | DOWN | | 85493a5f-4a1d-4023-920d-6ce617094a9c | | fa:16:3e:3a:2e:17 | ip_address='10.128.66.12', subnet_id='fa481858-2961-430a-973a-df53c36c1517' | DOWN | | 860aee23-9397-472d-ac12-433227532e58 | | fa:16:3e:f5:c2:d3 | ip_address='10.128.10.186', subnet_id='12a657ee-189a-4e56-822e-d507836e3830' | DOWN | | 91a36e95-2233-4bca-bca0-92676a350be4 | | fa:16:3e:21:57:78 | ip_address='10.128.8.6', subnet_id='d886942e-a82d-4629-974f-ececa12a6aa0' | DOWN | | 976cc808-e52e-4b98-a2c6-94743a51823f | | fa:16:3e:d8:55:51 | ip_address='10.128.107.229', subnet_id='c7dcf66b-31e5-4d81-b10b-9073c56710b7' | DOWN | | 98e80a49-83df-409c-b384-2b8e807528b2 | | fa:16:3e:02:ff:25 | ip_address='10.128.92.121', subnet_id='0c38384b-15e0-4505-938a-56b5c8d91ce5' | DOWN | | 9ea6f35c-4efa-44fe-9dec-ad6882401273 | | fa:16:3e:6a:62:09 | ip_address='10.128.66.115', subnet_id='fa481858-2961-430a-973a-df53c36c1517' | DOWN | | b144491c-1717-4fd8-b0db-6711a4753519 | | fa:16:3e:02:50:29 | ip_address='10.128.31.186', subnet_id='f78f8a86-295b-470a-ac80-9388a2c3bfdb' | DOWN | | b2c8ba93-8550-4b5a-973c-5275af1f8eb3 | | fa:16:3e:96:e7:ff | ip_address='10.128.66.107', subnet_id='fa481858-2961-430a-973a-df53c36c1517' | DOWN | | b2e4ccbf-bc71-4d19-9025-293019683175 | | fa:16:3e:4e:f6:c9 | ip_address='10.128.92.217', subnet_id='0c38384b-15e0-4505-938a-56b5c8d91ce5' | DOWN | | b3b4aafb-3129-46c9-8b66-bddd3dc8e3e3 | | fa:16:3e:77:6f:ce | ip_address='10.128.67.139', subnet_id='fa481858-2961-430a-973a-df53c36c1517' | DOWN | | bb8b5b25-334d-4947-bde4-da1744797251 | | fa:16:3e:52:a5:ff | ip_address='10.128.8.240', subnet_id='d886942e-a82d-4629-974f-ececa12a6aa0' | DOWN | | bcc7861d-4e05-41a2-b224-36e03b8d4239 | | fa:16:3e:8c:eb:fc | ip_address='10.128.66.248', subnet_id='fa481858-2961-430a-973a-df53c36c1517' | DOWN | | cb873486-ff6b-4f3a-bf2b-60b50df16fb8 | | fa:16:3e:f1:86:56 | ip_address='10.128.92.195', subnet_id='0c38384b-15e0-4505-938a-56b5c8d91ce5' | DOWN | | d3d75cd4-e816-45d3-9bbe-14ac0f80cf5b | | fa:16:3e:93:32:0c | ip_address='10.128.31.32', subnet_id='f78f8a86-295b-470a-ac80-9388a2c3bfdb' | DOWN | | dbcdaeb4-b6fc-473d-b0f6-7a6a21e3236f | | fa:16:3e:cc:31:eb | ip_address='10.128.31.152', subnet_id='f78f8a86-295b-470a-ac80-9388a2c3bfdb' | DOWN | | e697f816-a90a-44fa-89a7-e5db90abb77e | | fa:16:3e:1e:25:27 | ip_address='10.128.8.11', subnet_id='d886942e-a82d-4629-974f-ececa12a6aa0' | DOWN | | fa3f335a-fee4-4911-bed9-427c44567a2a | | fa:16:3e:0f:09:d6 | ip_address='10.128.93.67', subnet_id='0c38384b-15e0-4505-938a-56b5c8d91ce5' | DOWN | +--------------------------------------+---------------------------+-------------------+-------------------------------------------------------------------------------+--------+ >> Wait 20 minutes. (shiftstack) [stack@undercloud-0 ~]$ openstack port list --device-owner="" --tags="openshiftClusterID=ostest-rmrft" | grep -v ostest +--------------------------------------+---------------------------+-------------------+---------------------------------------------------------------------------+--------+ | ID | Name | MAC Address | Fixed IP Addresses | Status | +--------------------------------------+---------------------------+-------------------+---------------------------------------------------------------------------+--------+ +--------------------------------------+---------------------------+-------------------+---------------------------------------------------------------------------+--------+ (shiftstack) [stack@undercloud-0 ~]$ >> Keeping the stability of kuryr pods: (shiftstack) [stack@undercloud-0 ~]$ oc get pods -n openshift-kuryr NAME READY STATUS RESTARTS AGE kuryr-cni-chrzk 1/1 Running 0 19h kuryr-cni-pclf8 1/1 Running 0 19h kuryr-cni-w5tf9 1/1 Running 2 19h kuryr-cni-zfgzt 1/1 Running 2 19h kuryr-cni-zl6wv 1/1 Running 0 19h kuryr-controller-74584966c5-689jm 1/1 Running 1 19h Created attachment 1723754 [details]
NP+Conformance results with the fix
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633 |
Created attachment 1718465 [details] kuryr-controller logs Description of problem: While running conformance tests, Kuryr-controller got restarted with below exception: 2020-10-02 12:16:16.302 1 ERROR kuryr_kubernetes.handlers.retry [-] Report handler unhealthy KuryrPortHandler: KeyError: 'd7cc9b11-13e8-481b-b494-6be3eb0ed57a' 2020-10-02 12:16:16.302 1 ERROR kuryr_kubernetes.handlers.retry Traceback (most recent call last): 2020-10-02 12:16:16.302 1 ERROR kuryr_kubernetes.handlers.retry File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/retry.py", line 81, in __call__ 2020-10-02 12:16:16.302 1 ERROR kuryr_kubernetes.handlers.retry self._handler(event, *args, **kwargs) 2020-10-02 12:16:16.302 1 ERROR kuryr_kubernetes.handlers.retry File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/handlers/k8s_base.py", line 90, in __call__ 2020-10-02 12:16:16.302 1 ERROR kuryr_kubernetes.handlers.retry self.on_present(obj) 2020-10-02 12:16:16.302 1 ERROR kuryr_kubernetes.handlers.retry File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/handlers/kuryrport.py", line 66, in on_present 2020-10-02 12:16:16.302 1 ERROR kuryr_kubernetes.handlers.retry if not self.get_vifs(kuryrport_crd): 2020-10-02 12:16:16.302 1 ERROR kuryr_kubernetes.handlers.retry File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/handlers/kuryrport.py", line 231, in get_vifs 2020-10-02 12:16:16.302 1 ERROR kuryr_kubernetes.handlers.retry pod, project_id, subnets, security_groups) 2020-10-02 12:16:16.302 1 ERROR kuryr_kubernetes.handlers.retry File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/drivers/vif_pool.py", line 1181, in request_vif 2020-10-02 12:16:16.302 1 ERROR kuryr_kubernetes.handlers.retry pod, project_id, subnets, security_groups) 2020-10-02 12:16:16.302 1 ERROR kuryr_kubernetes.handlers.retry File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/drivers/vif_pool.py", line 210, in request_vif 2020-10-02 12:16:16.302 1 ERROR kuryr_kubernetes.handlers.retry tuple(sorted(security_groups))) 2020-10-02 12:16:16.302 1 ERROR kuryr_kubernetes.handlers.retry File "/usr/lib/python3.6/site-packages/kuryr_kubernetes/controller/drivers/vif_pool.py", line 867, in _get_port_from_pool 2020-10-02 12:16:16.302 1 ERROR kuryr_kubernetes.handlers.retry return self._existing_vifs[port_id] 2020-10-02 12:16:16.302 1 ERROR kuryr_kubernetes.handlers.retry KeyError: 'd7cc9b11-13e8-481b-b494-6be3eb0ed57a' 2020-10-02 12:16:16.302 1 ERROR kuryr_kubernetes.handlers.retry [00m Version-Release number of selected component (if applicable): $ cat core_puddle_version RHOS-16.1-RHEL-8-20200917.n.3 $ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.6.0-0.nightly-2020-10-02-001427 True False 6h31m Cluster version is 4.6.0-0.nightly-2020-10-02-001427 How reproducible: Steps to Reproduce: 1. Install latest OCP on OSP16.1+OVN-Octavia 2. Run Conformance tests 3. Actual results: kuryr-controller is restarted. Expected results: No restarts. Additional info: kuryr-controller logs attached