Bug 2074236

Summary: TALO not keeping up with large scale SNO ZTP deployment case
Product: OpenShift Container Platform
Reporter: jun
Component: Telco Edge
Assignee: jun
Telco Edge sub component: RAN
QA Contact: yliu1
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: unspecified
CC: akrzos, imiller
Version: 4.10
Target Milestone: ---
Target Release: 4.11.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 2075134
Environment:
Last Closed: 2022-08-26 16:43:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2075134

Description jun 2022-04-11 20:12:23 UTC
Description of problem:
TALO (Topology Aware Lifecycle Operator) cannot keep up with the number of reconcile events in large-scale ZTP SNO tests. As the number of installed SNOs increases, TALO takes longer and longer to get back to a particular SNO and advance its upgrade sequence. Eventually the backlog grows too large, and TALO starts to give up on SNOs that have exceeded the 4-hour limit.
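
The scaling effect described above can be illustrated with a rough back-of-the-envelope model. This is only a sketch under stated assumptions, not TALO's actual scheduling logic: it assumes every cluster is visited once per round, that each visit costs a fixed `reconcile_s` seconds, and that a cluster needs `passes_needed` visits to finish. All function names and all numbers other than the 4-hour limit are hypothetical.

```python
# Hypothetical round-robin model of a reconcile backlog; not TALO's real code.
FOUR_HOURS_S = 4 * 60 * 60  # the 4-hour limit mentioned in the description


def revisit_interval_s(num_clusters, reconcile_s, workers=1):
    """Seconds between two consecutive visits to the same cluster,
    assuming one visit per cluster per round, split evenly over workers."""
    return num_clusters * reconcile_s / workers


def total_upgrade_time_s(num_clusters, reconcile_s, passes_needed, workers=1):
    """Time for one cluster to receive all the visits it needs."""
    return passes_needed * revisit_interval_s(num_clusters, reconcile_s, workers)


def exceeds_timeout(num_clusters, reconcile_s, passes_needed,
                    workers=1, timeout_s=FOUR_HOURS_S):
    """True if a cluster cannot finish before the timeout in this model."""
    total = total_upgrade_time_s(num_clusters, reconcile_s, passes_needed, workers)
    return total > timeout_s


def max_clusters_within_timeout(reconcile_s, passes_needed,
                                workers=1, timeout_s=FOUR_HOURS_S):
    """Largest cluster count for which every cluster still finishes in time."""
    return int(timeout_s * workers // (reconcile_s * passes_needed))


if __name__ == "__main__":
    # Assumed values: 2 s per reconcile, 10 passes per cluster, 1 worker.
    for n in (500, 1000):
        print(n, total_upgrade_time_s(n, 2.0, 10), exceeds_timeout(n, 2.0, 10))
    print(max_clusters_within_timeout(2.0, 10))
```

In this model the revisit interval grows linearly with the number of clusters, so past some cluster count every SNO hits the timeout. Cheaper reconciles, fewer passes, or more concurrent workers each raise that ceiling.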

Version-Release number of selected component (if applicable):


How reproducible:
100%


Steps to Reproduce:
1. Run a ZTP SNO deployment test at 50 clusters per hour or 100 clusters per hour.

Actual results:
See description

Expected results:
TALO should keep up with at least 50 clusters per hour, and ideally 100 clusters per hour or more.

Additional info:

Comment 2 jun 2022-04-13 17:22:49 UTC
Changed status to VERIFIED to unblock backporting.