Description of problem:
There is a Kubernetes issue, https://github.com/kubernetes/kubernetes/issues/79365, that affects HPAs in OpenShift. This issue appears to trigger the KubeHpaReplicasMismatch alert after OCS is installed.

Version-Release number of selected component (if applicable):
OCP: 4.6.0-0.nightly-2020-10-03-051134
OCS: ocs-operator.v4.6.0-108.ci

How reproducible:
100%

Steps to Reproduce:
1. Install OCP.
2. Install OCS.
3. Navigate to Monitoring -> Alerting in the OCP UI.

Actual results:
The KubeHpaReplicasMismatch alert fires: HPA openshift-storage/noobaa-endpoint has not matched the desired number of replicas for longer than 15 minutes.

Expected results:
There should be no KubeHpaReplicasMismatch alert.

Additional info:
OCS BZ 1885313
Reassigning to the Node component, which handles HPA.
Hi, I investigated the original issue with @Filip and would like to add the following information:
1. The KubeHpaReplicasMismatch event on the HPA is persistent.
2. From my observation, the cause of the event is that the HPA decided that desiredReplicas should be 0 (we can see it in the status section of the HPA resource) while the actual replica count is 1.
3. The desired replica count should never be 0, because the spec specifies a minReplicas of 1 (and a maxReplicas of 2).
4. The current CPU metric observed by the HPA controller is "Unknown/80%"; this can be seen when using the describe command on the HPA resource.
5. We are aware of a known issue regarding pods with initContainers (https://bugzilla.redhat.com/show_bug.cgi?id=1867477), but our pods' spec does not define any initContainers or sidecar containers.
6. This was tested on top of OCP 4.6; we had a different issue when we tested on top of OCP 4.5.
I hope this information is helpful.
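The inconsistency described in points 2 and 3 can be checked mechanically. The following is only a minimal sketch, not something taken from the cluster: the function name and the sample data are hypothetical, and the sample simply mirrors the values reported above. The field names (spec.minReplicas, spec.maxReplicas, status.currentReplicas, status.desiredReplicas) are assumed to match what oc get hpa -o json returns for a HorizontalPodAutoscaler.

```python
from typing import List


def hpa_replica_findings(hpa: dict) -> List[str]:
    """Return human-readable findings about an HPA's replica counts.

    `hpa` is assumed to be the parsed JSON of a HorizontalPodAutoscaler.
    """
    spec = hpa.get("spec", {})
    status = hpa.get("status", {})

    # minReplicas is optional in the API and defaults to 1.
    min_replicas = spec.get("minReplicas", 1)
    max_replicas = spec["maxReplicas"]
    desired = status.get("desiredReplicas", 0)
    current = status.get("currentReplicas", 0)

    findings = []
    if desired < min_replicas:
        findings.append(
            f"desiredReplicas={desired} is below minReplicas={min_replicas}"
        )
    if desired > max_replicas:
        findings.append(
            f"desiredReplicas={desired} is above maxReplicas={max_replicas}"
        )
    if desired != current:
        findings.append(
            f"desiredReplicas={desired} differs from currentReplicas={current}"
        )
    return findings


# Hypothetical sample mirroring the values reported in this comment.
sample_hpa = {
    "spec": {"minReplicas": 1, "maxReplicas": 2},
    "status": {"currentReplicas": 1, "desiredReplicas": 0},
}

findings = hpa_replica_findings(sample_hpa)
for finding in findings:
    print(finding)
```

For the sample above, the sketch reports both that desiredReplicas is below minReplicas and that it differs from currentReplicas, which is the combination we observe on openshift-storage/noobaa-endpoint.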
Can you run oc describe on the HPA resource? Does the spec for openshift-storage/noobaa-endpoint contain a zero for the replicas count? Setting it to zero will disable autoscaling for that resource. Can you add the logs from the HPA as well?
> Can you oc describe the hpa resource?
> Can you add the logs from the HPA as well?
@Filip Can you please provide them? Maybe from the OCS bug?
@Ryan The noobaa-endpoint deployment is created with a replica count of 1 and we do not change it at all; we leave this responsibility to the HPA.
Are you ok if we target a fix for this bug in 4.6.z?
One more thing to keep in mind: after step 1 (Install OCP), monitoring is usually still installing even though the OCP cluster is 'up'. Does the ocs-operator make sure that all its dependent components are running?
> Does the ocs-operator make sure that all its dependent components are running?
This is a very good point, and I am not sure what the answer is. I will have to check and come back with an answer. But even if this is correct, why does it not resolve itself later in the lifetime of the cluster?