1759984 – Races between retries and deletion actions

Summary: Races between retries and deletion actions

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.2.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	4.3.0
Assignee:	Luis Tomas Bolivar
QA Contact:	Jon Uriarte
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1759986

Reported:	2019-10-09 14:39 UTC by Luis Tomas Bolivar
Modified:	2020-05-13 21:27 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1759986
Environment:
Last Closed:	2020-05-13 21:27:27 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	openshift kuryr-kubernetes pull 59	'None'	'closed'	'Bug 1759984: Fix race conditions between handling ResourceNotReady and deletions'	2019-11-20 11:29:31 UTC
Launchpad	1847441	None	None	None	2019-10-09 14:39:55 UTC
Launchpad	1847446	None	None	None	2019-10-09 14:39:55 UTC
Launchpad	1847453	None	None	None	2019-10-09 14:39:55 UTC
OpenStack gerrit	687504	'None'	'MERGED'	'Avoid race between activating vif and pod deletion'	2019-11-20 11:29:27 UTC
OpenStack gerrit	687514	'None'	'MERGED'	'Avoid race between pod creation retry and namespace deletion'	2019-11-20 11:29:28 UTC
OpenStack gerrit	687520	'None'	'MERGED'	'Avoid namespace deletion error if processing a duplicated event'	2019-11-20 11:29:29 UTC
OpenStack gerrit	688109	'None'	'MERGED'	'Avoid race between Retries and Deletion actions'	2019-11-20 11:29:30 UTC
Red Hat Product Errata	RHBA-2020:0062	None	None	None	2020-01-23 11:07:03 UTC

Description Luis Tomas Bolivar 2019-10-09 14:39:55 UTC

There are several races between retry actions (VIF activation, getting the VIF for a pod, ...) and deletion actions. A retry action may be postponed until after the resource has already been deleted, leading to kuryr-controller errors.
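
To illustrate the kind of race described above, here is a minimal sketch of a retry loop that re-checks whether the resource still exists before retrying. It is not the kuryr-kubernetes implementation: `retry_unless_deleted`, `still_exists`, and the local `ResourceNotReady` class are hypothetical names used only for this example.

```python
import time


class ResourceNotReady(Exception):
    """Hypothetical stand-in for a 'not ready yet, retry later' error."""


def retry_unless_deleted(action, still_exists, attempts=3, delay=0.0):
    """Run action(), retrying on ResourceNotReady while the resource exists.

    Returns the action's result, or None if the resource was deleted
    while waiting, so the caller can skip it instead of failing.
    """
    for attempt in range(attempts):
        try:
            return action()
        except ResourceNotReady:
            # Re-check before retrying: a deletion may have been processed
            # since the action was first attempted.
            if not still_exists():
                return None
            if attempt == attempts - 1:
                raise
            time.sleep(delay)


# Usage: the pod is deleted while the first attempt is in flight.
pods = {"pod-a"}


def activate_vif():
    pods.discard("pod-a")  # simulate a concurrent deletion
    raise ResourceNotReady("pod-a")


result = retry_unless_deleted(activate_vif, lambda: "pod-a" in pods)
# result is None: the retry is dropped instead of raising.
```

The point of the sketch is only the ordering: the existence check happens after the failure and before the next attempt, so a postponed retry does not act on a resource that is already gone.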

Comment 2 Jon Uriarte 2019-10-18 16:08:22 UTC

Verified on OCP 4.3.0-0.nightly-2019-10-18-051534 build on top of OSP 13 2019-10-01.1 puddle.

release image: registry.svc.ci.openshift.org/ocp/release@sha256:2a8f99a817784b303bd76706e14b23cffd98fca1e96b672dfb0b534a79ec5a86

Before this BZ was fixed these errors were shown in kuryr-controller logs when running openshift-tests:

· ERROR kuryr_kubernetes.handlers.retry [-] Report handler unhealthy VIFHandler: PortNotFoundClient: Port d3b2d608-19cd-4ef4-b726-b98119ef0cae could not be found.
· ERROR kuryr_kubernetes.handlers.logging NotFound: Subnet 039d7edf-3942-40cc-af46-0ed867e2a18c could not be found.
· ERROR kuryr_kubernetes.handlers.logging self._drv_vif_pool.delete_network_pools(net_crd['spec']['netId'])
  ERROR kuryr_kubernetes.handlers.logging TypeError: 'NoneType' object has no attribute '__getitem__'

After executing the openshift/origin e2e kubernetes/conformance tests, none of those messages were found, and the kuryr-controller pod was not restarted because of them.
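
The TypeError in the log above comes from subscripting a `net_crd` that is `None`. A minimal sketch of the kind of guard that avoids it follows. It is an assumption about the shape of a fix, not the merged patch: `delete_namespace_pools` is a hypothetical helper, and the pool-deletion callable is passed in so the example stays self-contained.

```python
def delete_namespace_pools(net_crd, delete_network_pools):
    """Delete the pools for a namespace network, tolerating a missing CRD.

    Returns True if pools were deleted, False if there was nothing to do.
    """
    if net_crd is None:
        # The CRD is already gone, for example because a duplicated
        # deletion event is being processed: skip instead of failing.
        return False
    delete_network_pools(net_crd['spec']['netId'])
    return True


# Usage: the second call simulates a duplicated deletion event.
deleted = []
delete_namespace_pools({'spec': {'netId': 'net-1'}}, deleted.append)
delete_namespace_pools(None, deleted.append)
# deleted == ['net-1']
```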

Comment 4 errata-xmlrpc 2020-05-13 21:27:27 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062