1836299 – NooBaa Operator deploys with HPA that fires maxreplicas alerts by default

Summary: NooBaa Operator deploys with HPA that fires maxreplicas alerts by default

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenShift Container Storage
Classification:	Red Hat Storage
Component:	Multi-Cloud Object Gateway
Sub Component:
Version:	4.3
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	low
Target Milestone:	---
Target Release:	OCS 4.6.0
Assignee:	Ohad
QA Contact:	Filip Balák
Docs Contact:	Erin Donnelly
URL:
Whiteboard:
Duplicates (2):	1800599 1840102
Depends On:
Blocks:	1826482 1859307 1882359

Reported:	2020-05-15 15:08 UTC by Caden Marchese
Modified:	2024-03-25 15:56 UTC
CC List:	21 users
Fixed In Version:	v4.6.0-56.ci
Doc Type:	Bug Fix
Doc Text:	.`MAX HPA` value exceeding `1` no longer triggers an alert
	In previous versions of Red Hat OpenShift Container Storage, the autoscaling feature for pods was not available. Therefore, the `MAX HPA` value could not be greater than `1`, or an alert was triggered. With this update, this feature is enabled and the alert is no longer triggered.
Clone Of:
Environment:
Last Closed:	2020-12-17 06:22:30 UTC
Embargoed:
Dependent Products:

Links
System	ID	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	1885313	unspecified	CLOSED	noobaa-endpoint HPA fires KubeHpaReplicasMismatch alert after installation	2021-02-22 00:41:40 UTC
Red Hat Knowledge Base (Solution)	5116241	None	None	None	2020-06-03 07:48:50 UTC
Red Hat Product Errata	RHSA-2020:5605	None	None	None	2020-12-17 06:22:47 UTC

Internal Links: 1885313

Description Caden Marchese 2020-05-15 15:08:28 UTC

Description of problem:
Deploying OCS rolls out a NooBaa operator that creates the following HPA:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
...
spec:
  maxReplicas: 1
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: noobaa-endpoint
  targetCPUUtilizationPercentage: 80
status:
  currentCPUUtilizationPercentage: 1
  currentReplicas: 1
  desiredReplicas: 1

This HPA results in the below alert:

May 6, 8:56 am
HPA openshift-storage/noobaa-endpoint has been running at max replicas for longer than 15 minutes.

Since maxReplicas and minReplicas are both 1, the HPA is always at max replicas. Any attempt to disable the HPA or raise maxReplicas is overwritten, so this looks like a potential defect.
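
The reasoning above can be sketched in a few lines of Python. This is only a minimal illustration, assuming the alert condition is simply "current replicas equal max replicas", as the alert message suggests. The `HpaState` class and both function names are hypothetical and are not part of NooBaa, OpenShift, or the monitoring stack.

```python
from dataclasses import dataclass


@dataclass
class HpaState:
    """Hypothetical snapshot of the HPA fields quoted in this report."""
    min_replicas: int
    max_replicas: int
    current_replicas: int


def is_at_max(hpa: HpaState) -> bool:
    # Assumed alert condition: the HPA is running at its configured maximum.
    return hpa.current_replicas == hpa.max_replicas


def can_leave_max(hpa: HpaState) -> bool:
    # The HPA can only drop below the maximum if the range is wider than one value.
    return hpa.min_replicas < hpa.max_replicas


# Values taken from the HPA shown in this report.
reported = HpaState(min_replicas=1, max_replicas=1, current_replicas=1)
print(is_at_max(reported))      # True: the condition holds
print(can_leave_max(reported))  # False: it can never stop holding
```

With a range of 1 to 1 the condition holds permanently, which matches the alert firing after every deployment.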

Version-Release number of selected component (if applicable):
4.3.18

Steps to Reproduce:
1. Deploy OCS on top of 4.3.
2. Wait for NooBaa operator to deploy.
3. Observe the HPA alerts.

Actual results:
Alert fires

Expected results:
I assume that the HPA is there for some reason, but if there's no ability to change it without editing the operator and there is no scalable range, it could just be removed from the operator.

I can provide must-gather logs on request, if they are needed.

Comment 1 Jan Safranek 2020-05-19 14:11:41 UTC

Moving to OCS (and guessing OCS version and component name).

Comment 3 Nimrod Becker 2020-05-19 14:21:26 UTC

@raz I guess we can push this out of 4.5, since autoscaling won't be in?

Comment 5 Nimrod Becker 2020-05-24 06:51:09 UTC

*** Bug 1800599 has been marked as a duplicate of this bug. ***

Comment 6 Ashish Singh 2020-05-26 11:48:27 UTC

*** Bug 1840102 has been marked as a duplicate of this bug. ***

Comment 19 Ashish Singh 2020-06-01 10:41:51 UTC

@Bipin & @Nimrod,

Is this the epic for the auto-scale of noobaa-endpoint?
   https://issues.redhat.com/browse/KNIP-1422

Regards,
Ashish Singh

Comment 20 Nimrod Becker 2020-06-07 05:47:25 UTC

Yes.

Comment 21 Nimrod Becker 2020-06-17 16:24:39 UTC

As mentioned before, as well as another investigation on a similar bug (See https://bugzilla.redhat.com/show_bug.cgi?id=1788126#c23)
These alerts cannot be suppressed.

Pushing to 4.6, when autoscaling might be delivered; those alerts would then stop.

Comment 22 Martin Bukatovic 2020-08-27 17:42:09 UTC

(In reply to Nimrod Becker from comment #21)
> As mentioned before, as well as another investigation on a similar bug (See
> https://bugzilla.redhat.com/show_bug.cgi?id=1788126#c23)
> These alerts cannot be suppressed.

BZ 1788126 could be fixed via inhibition rules, as noted in comment
https://bugzilla.redhat.com/show_bug.cgi?id=1788126#c29

Please reevaluate.

If you want to claim that this can't be fixed, I expect the MCG dev team to
find someone from the OpenShift team to validate that opinion.

Comment 23 Nimrod Becker 2020-08-31 07:18:01 UTC

This can't be fixed; we need to change the HPA scale range to 1-2, which will happen in 4.6.

Comment 24 Nimrod Becker 2020-09-01 09:00:18 UTC

As part of the endpoint HPA TP in 4.6, the default range was set to 1-2, so the alerts won't fire.
Closing this.

Comment 28 Filip Balák 2020-10-05 13:39:41 UTC

It seems that the fix was not complete. 

I don't see the alert mentioned in the bug description:
"HPA openshift-storage/noobaa-endpoint has been running at max replicas for longer than 15 minutes."

but I still see the following alert after installation:
"KubeHpaReplicasMismatch: HPA openshift-storage/noobaa-endpoint has not matched the desired number of replicas for longer than 15 minutes."

-> ASSIGNED

Tested with:
ocs-operator.v4.6.0-108.ci


$ oc get deployment noobaa-endpoint -n openshift-storage -o yaml
kind: Deployment
apiVersion: apps/v1
(...)
spec:
  replicas: 1
(...)
status:
  observedGeneration: 1
  replicas: 1
  updatedReplicas: 1
  readyReplicas: 1
  availableReplicas: 1

$ oc get HorizontalPodAutoscaler noobaa-endpoint -n openshift-storage -o yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
(...)
spec:
  maxReplicas: 2
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: noobaa-endpoint
  targetCPUUtilizationPercentage: 80
status:
  currentReplicas: 1
  desiredReplicas: 0
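
To illustrate why the status above keeps the KubeHpaReplicasMismatch alert firing, here is a minimal Python sketch. It assumes the alert condition reduces to "desired replicas differ from current replicas" for the whole 15-minute window, as the alert text suggests; the real rule may include further conditions. The function name and the sample format are hypothetical.

```python
from typing import List, Tuple


def replicas_mismatch(samples: List[Tuple[int, int]]) -> bool:
    """Return True if every (current, desired) sample in the window disagrees.

    An empty window never counts as a mismatch.
    """
    return bool(samples) and all(current != desired for current, desired in samples)


# Fifteen one-minute samples with the values from the status above:
# currentReplicas: 1, desiredReplicas: 0.
window = [(1, 0)] * 15
print(replicas_mismatch(window))  # prints True
```

Because desiredReplicas stays at 0 while the deployment keeps 1 replica, the mismatch never clears on its own.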

Comment 29 Ohad 2020-10-05 14:50:04 UTC

@Filip, I can tell from the last message that this is not the same issue. 
Can we please close this one and open a new one for the "desiredReplicas: 0" problem?

Comment 30 Filip Balák 2020-10-05 15:32:19 UTC

Ok, based on comments 28 and 29 I VERIFY that the original HPA alert is gone. The remaining HPA problems were reported as BZ 1885313 and BZ 1885320.

Comment 31 Mudit Agarwal 2020-11-18 05:59:13 UTC

Nimrod/Ohad, I have changed the doc type to 'Bug Fix' because the BZ is fixed now; please provide the doc text accordingly.

Comment 33 errata-xmlrpc 2020-12-17 06:22:30 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.6.0 security, bug fix, enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5605
