Description of problem:
After scaling the baremetal environment to 500 worker nodes, operator installation via OperatorHub or the CLI fails with an `unknown failure` message; the InstallPlan is never created and the installation stays stuck indefinitely. The same installation works on a smaller cluster (3 masters + 10 workers) running the same OCP release, so adding more workers seems to be a problem for OLM.

Version-Release number of selected component (if applicable):
4.10.0-0.nightly-2021-10-21-105053 - Gets into 'Unknown failure' without an install plan
4.10.0-0.nightly-2021-12-12-184227 - Gets into 'Upgrade Pending' without an install plan

How reproducible:
Always with 400+ nodes

Steps to Reproduce:
1. Deploy a cluster with a 4.10 nightly and at least 10 workers.
2. Install operators; this succeeds.
3. Add more workers (at least 400) and try to reinstall any operator or install a new operator.
4. The installation gets stuck with 'Unknown failure'.

Actual results:
Installation gets stuck forever.

Expected results:
Operator installation should go through as normal.

Additional info:
Seems to be related to bz1860185.
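For anyone triaging a similar report, here is a minimal sketch of how the symptom described above (a Subscription that never gets an InstallPlan) could be spotted in bulk. It assumes the input is the JSON produced by `oc get subscriptions.operators.coreos.com -A -o json`, and that the `status.installPlanRef` and `status.state` fields are populated as in the OLM Subscription API. The helper name `stuck_subscriptions` and the sample data are hypothetical and were not taken from the affected cluster.

```python
def stuck_subscriptions(subscription_list):
    """Return (namespace, name, state) for Subscriptions lacking an InstallPlan reference.

    `subscription_list` is assumed to be the parsed JSON of a Subscription list,
    e.g. json.load() of `oc get subscriptions.operators.coreos.com -A -o json`.
    """
    stuck = []
    for item in subscription_list.get("items", []):
        meta = item.get("metadata", {})
        status = item.get("status", {})
        # Assumption: a healthy Subscription carries status.installPlanRef.
        if not status.get("installPlanRef"):
            stuck.append(
                (meta.get("namespace"), meta.get("name"), status.get("state", ""))
            )
    return stuck


# Hypothetical sample data for illustration only.
sample = {
    "items": [
        {
            "metadata": {"namespace": "openshift-operators", "name": "example-operator"},
            "status": {"state": "UpgradePending"},
        },
        {
            "metadata": {"namespace": "openshift-operators", "name": "healthy-operator"},
            "status": {
                "state": "AtLatestKnown",
                "installPlanRef": {"name": "install-abcde"},
            },
        },
    ]
}

print(stuck_subscriptions(sample))
# [('openshift-operators', 'example-operator', 'UpgradePending')]
```

A Subscription listed here is only a candidate; the catalog-operator logs in the must-gather would still be needed to see why no InstallPlan was generated.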
Hi Murali, Sorry for the delay in getting back to you. The end of the year was a hectic rush to the finish line. I wanted to ask whether you still have the cluster up and running. If you could still share the provisioner host details, it would be super appreciated. In the meantime, we'll look into the must-gather to see if there's anything we can learn. Cheers, Per
Hey Per, Nope, we don't have the environment currently; the machines were temporarily allocated from scalelab. I will get a smaller (120-node) lab allocation starting next week and will let you know if this is reproducible again. Thanks, Murali
Hey Murali, Just wanted to touch base with you again on this issue. Any developments on your side? Cheers, Per
Bumping this down to medium/medium. Doesn't seem like a blocker atm.
Hey Per, I have a 120-node cluster, but I'm currently testing something else on 4.9 GA. I will deploy a 4.10 nightly build after this and try to reproduce the problem. I will DM you once I find something useful. A 500-node cluster with 4.9.12 was working fine and was able to install operators throughout. Thanks, Murali
You're a legend, Murali! Thank you ^^
Hey Per, I tried it on a 120-node cluster running the 4.10.0-fc.2 build, and I am able to install operators without any issue. Not sure if the problem was specific to that particular nightly release or something else. Anyway, I will open a fresh bz if I see it again. Thanks, Murali
Awesome, thank you, Murali! I'll close as NOTABUG and hopefully won't hear back from you on this matter XD