Description of problem:
While creating 5000 empty projects on Azure, the kube-apiserver container starts continuously exiting and restarting. The API is unreachable (not even intermittently), and oc adm must-gather is not possible. Will add a private comment with the location of the master journals and a tarball of all master pod logs.

The cluster has 3 masters and 3 computes of size Standard_D4s_v3 (4 vCPU/16Gi memory). The same test on equivalently sized AWS instances (m4.xlarge) is successful.

Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-09-22-073212

How reproducible:
Mostly. 1 successful run creating 5K projects, 3 failed.

Steps to Reproduce:
1. Create an Azure cluster with Standard_D4s_v3 instances: 3 masters, 3 computes.
2. for i in {0..4999}; do echo $i; oc new-project --skip-config-write test$i; done

Actual results:
The cluster becomes unresponsive. No oc commands work. The kube-apiserver container on all masters continuously exits and restarts.
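The reproduction loop in step 2 can be wrapped in a function that stops at the first failed `oc new-project` call and prints how long each creation took, which makes it easier to see at which project count the API became unavailable. This is a sketch, not the loop used in the original runs: the function name `create_projects`, the early exit, and the per-call timing are hypothetical additions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original test): creates <count> empty
# projects named test0..test<count-1>, like the loop in step 2, but stops at
# the first failure so the failing index is recorded.
create_projects() {
  local count=$1
  local i start end
  for ((i = 0; i < count; i++)); do
    start=$(date +%s)
    # Same command as the original loop; output is discarded to keep the log short.
    if ! oc new-project --skip-config-write "test$i" >/dev/null 2>&1; then
      echo "failed at project test$i"
      return 1
    fi
    end=$(date +%s)
    # Whole-second resolution only; slow creations stand out as the API degrades.
    echo "test$i created in $((end - start))s"
  done
  echo "created $count projects"
}
```

For example, `create_projects 5000` mirrors the original run; the last line printed is either the total or the name of the first project that could not be created.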
The API servers time out creating RBAC objects. This is most likely due to slow etcd on Azure. Moving to etcd so they can look at the metrics.
IPI for Azure requires 8 vCPU and has specific hardware requirements, outlined below[1]. If the test is conducted on hardware below these thresholds, then I would say a performance failure is expected. We should be testing based on what we ship.
Minimum Azure requirements summary:
- at least Standard_D8s_v3 (8 vCPU, 32GiB memory)
- 1 TiB Premium SSD (P30)
- host caching set to ReadOnly
Closing as not a bug. If you can rerun the test with the minimum hardware requirements and still hit the same failures, we can explore it at that time.
[1]https://docs.google.com/document/d/1yPpakMC1OSOWeeM4m_bHDuLECLPXrpq5ow2C9HEcs1A/edit#heading=h.lvwt62wax7yu
If those are the minimum requirements, they need to be in the documentation.
Or better yet, the default installation configuration for IPI on Azure.
Sounds great, thank you!
Mike, I am planning to update these requirements in the docs here: https://docs.openshift.com/container-platform/4.8/installing/installing_azure/installing-azure-account.html#installation-azure-limits_installing-azure-account
The vCPU requirements are already mentioned there. However, we need to add:
- 1 TiB Premium SSD (P30)
- host caching set to ReadOnly
Based on my analysis, these seem to apply to the OS disk component. Can you confirm which component they map to?
Confirming that this is relevant to OS Disk.
PR - https://github.com/openshift/openshift-docs/pull/35887
Confirmed we now have minimum disk size and performance requirements in the docs.
@sbatsche - Please respond to the query above. Since this bug is specific to Azure, and any similar fixes would need to be handled as a separate scenario, I am marking this bug as closed.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days