Bug 2074236

Summary:	TALO not keeping up with large scale SNO ZTP deployment case
Product:	OpenShift Container Platform	Reporter:	jun
Component:	Telco Edge	Assignee:	jun
Telco Edge sub component:	RAN	QA Contact:	yliu1
Status:	CLOSED ERRATA	Docs Contact:
Severity:	high
Priority:	unspecified	CC:	akrzos, imiller
Version:	4.10
Target Milestone:	---
Target Release:	4.11.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	No Doc Update
Doc Text:		Story Points:	---
Clone Of:
Clones:	2075134		Environment:
Last Closed:	2022-08-26 16:43:57 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	2075134

Description jun 2022-04-11 20:12:23 UTC

Description of problem:
TALO could not keep up with the volume of reconcile events in large-scale ZTP SNO tests. As the number of installed SNOs increases, TALO takes longer and longer to return to a particular SNO and advance its upgrade sequence. Eventually the delay becomes too large, and TALO starts giving up on SNOs that have exceeded the 4-hour limit.
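
The scaling behaviour described above can be illustrated with a rough back-of-the-envelope model. This is only a sketch, not TALO's actual scheduling logic: it assumes clusters are visited round-robin, that each reconcile takes a fixed time, and that a cluster needs a fixed number of visits to finish. The function names, the per-reconcile cost, the visit count, and the worker count are all hypothetical; only the 4-hour limit comes from this report.

```python
# Toy model (assumption): round-robin reconciles with a fixed per-reconcile cost.
# None of these names or numbers come from TALO itself.

FOUR_HOURS_S = 4 * 3600  # the 4-hour limit mentioned in this report


def revisit_interval_s(num_clusters, reconcile_cost_s, workers=1):
    """Seconds between two consecutive visits to the same cluster."""
    return num_clusters * reconcile_cost_s / workers


def time_to_finish_s(num_clusters, reconcile_cost_s, visits_needed, workers=1):
    """Seconds for one cluster to receive all the visits it needs."""
    return visits_needed * revisit_interval_s(num_clusters, reconcile_cost_s, workers)


def max_clusters_within_limit(reconcile_cost_s, visits_needed,
                              limit_s=FOUR_HOURS_S, workers=1):
    """Largest cluster count for which a cluster still finishes within limit_s."""
    return int(limit_s * workers // (reconcile_cost_s * visits_needed))


if __name__ == "__main__":
    # Hypothetical inputs: 2 s per reconcile, 10 visits needed per cluster.
    for n in (100, 500, 1000):
        total = time_to_finish_s(n, reconcile_cost_s=2.0, visits_needed=10)
        status = "exceeds" if total > FOUR_HOURS_S else "within"
        print(f"{n} clusters: {total:.0f} s per cluster ({status} the 4-hour limit)")
```

Under these assumptions the time per cluster grows linearly with the number of clusters, so a fixed timeout is eventually exceeded once the fleet is large enough, which matches the symptom reported here.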

Version-Release number of selected component (if applicable):


How reproducible:
100%


Steps to Reproduce:
1. Run a ZTP SNO deployment test at 50 clusters per hour or 100 clusters per hour

Actual results:
See description

Expected results:
TALO should handle at least 50 clusters per hour, and ideally 100 or more.

Additional info:

Comment 2 jun 2022-04-13 17:22:49 UTC

Changed status to VERIFIED to unblock backporting.