Bug 1896916

Summary: Changes on spec.logStore.elasticsearch.nodeCount not reflected when decreasing the number of nodes
Product: OpenShift Container Platform
Component: Documentation
Version: 4.5
Target Release: 4.7.0
Reporter: Michael Burke <mburke>
Assignee: Michael Burke <mburke>
QA Contact: Xiaoli Tian <xtian>
Docs Contact: Vikram Goyal <vigoyal>
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
CC: aos-bugs, jokerman
Target Milestone: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2020-12-03 01:59:11 UTC

Description Michael Burke 2020-11-11 20:12:20 UTC
This bug was initially created as a copy of Bug #1879150

I am copying this bug because: 


> would you be alright with just capturing this in the docs for now and we can
> create a jira card to track work for the operator preventing this scale down
> with appropriate status conditions presented to the user?
As highlighted previously, I'm fine with documenting the risk of potential data loss when scaling down. That should be sufficient, as this bug is about scale-down behavior that did not work as expected. The other issue should be captured in the feature request so it is at least on the radar, tracked, and eventually implemented.



Description of problem:

On OpenShift 4.5.8 with Cluster Logging clusterlogging.4.5.0-202009041228.p0, decreasing the number of Elasticsearch nodes is not applied correctly.

> Spec:
>   Collection:
>     Logs:
>       Fluentd:
>       Type:  fluentd
>   Curation:
>     Curator:
>       Schedule:  30 3 * * *
>     Type:        curator
>   Log Store:
>     Elasticsearch:
>       Node Count:         5
>       Redundancy Policy:  FullRedundancy
>       Resources:
>         Limits:
>           Memory:  4Gi
>         Requests:
>           Cpu:     500m
>           Memory:  2Gi
>       Storage:
>         Size:                200G
>         Storage Class Name:  gp2
>     Retention Policy:
>       Application:
>         Max Age:  1d
>       Audit:
>         Max Age:  7d
>       Infra:
>         Max Age:     7d
>     Type:            elasticsearch
>   Management State:  Managed
>   Visualization:
>     Kibana:
>       Replicas:  1
>     Type:        kibana

With this spec (node count 5) applied, the cluster shows five Elasticsearch nodes:

> $ oc get pod -l component=elasticsearch
> NAME                                            READY   STATUS    RESTARTS   AGE
> elasticsearch-cd-hh1vvavv-1-db447f8c4-797hz     2/2     Running   0          50m
> elasticsearch-cd-hh1vvavv-2-8c6fb9f45-8zgsr     2/2     Running   0          50m
> elasticsearch-cdm-gbgfqisu-1-75b49786b6-m72qt   2/2     Running   0          72m
> elasticsearch-cdm-gbgfqisu-2-7f77c4947f-vmx7t   2/2     Running   0          72m
> elasticsearch-cdm-gbgfqisu-3-6d5955bd8d-vnz9h   2/2     Running   0          72m
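
For reference, the pod names above reflect the node roles: the first three nodes are master-eligible (`cdm` = client/data/master) and any additional nodes are data-only (`cd` = client/data). A minimal sketch of that role assignment, inferred from the listing above (illustrative only, not operator code):

```python
def node_role_prefixes(node_count: int, max_masters: int = 3) -> list[str]:
    """Return the deployment name prefix for each Elasticsearch node.

    The first `max_masters` nodes are master-eligible ("cdm");
    the rest are data-only ("cd"). Inferred from the pod listing
    above; an illustrative sketch, not the operator's actual code.
    """
    masters = min(node_count, max_masters)
    return ["cdm"] * masters + ["cd"] * (node_count - masters)

# With a node count of 5, as in the cluster above:
print(node_role_prefixes(5))  # ['cdm', 'cdm', 'cdm', 'cd', 'cd']
```

This is why scaling from 3 to 5 adds `elasticsearch-cd-*` deployments alongside the original `elasticsearch-cdm-*` ones.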

After updating the ClusterLogging resource "instance" to decrease the node count to 3, five Elasticsearch nodes are still running:

> Spec:
>   Collection:
>     Logs:
>       Fluentd:
>       Type:  fluentd
>   Curation:
>     Curator:
>       Schedule:  30 3 * * *
>     Type:        curator
>   Log Store:
>     Elasticsearch:
>       Node Count:         3
>       Redundancy Policy:  FullRedundancy
>       Resources:
>         Limits:
>           Memory:  4Gi
>         Requests:
>           Cpu:     500m
>           Memory:  2Gi
>       Storage:
>         Size:                200G
>         Storage Class Name:  gp2
>     Retention Policy:
>       Application:
>         Max Age:  1d
>       Audit:
>         Max Age:  7d
>       Infra:
>         Max Age:     7d
>     Type:            elasticsearch
>   Management State:  Managed
>   Visualization:
>     Kibana:
>       Replicas:  1
>     Type:        kibana

> $ oc get pod -l component=elasticsearch
> NAME                                            READY   STATUS    RESTARTS   AGE
> elasticsearch-cd-hh1vvavv-1-db447f8c4-797hz     2/2     Running   0          50m
> elasticsearch-cd-hh1vvavv-2-8c6fb9f45-8zgsr     2/2     Running   0          50m
> elasticsearch-cdm-gbgfqisu-1-75b49786b6-m72qt   2/2     Running   0          72m
> elasticsearch-cdm-gbgfqisu-2-7f77c4947f-vmx7t   2/2     Running   0          72m
> elasticsearch-cdm-gbgfqisu-3-6d5955bd8d-vnz9h   2/2     Running   0          72m

Even when an Elasticsearch pod is deleted, it is re-created immediately. Likewise, changing the redundancy policy from "FullRedundancy" to "SingleRedundancy" does not take effect.

Version-Release number of selected component (if applicable):

 - clusterlogging.4.5.0-202009041228.p0

How reproducible:

 - Always

Steps to Reproduce:
1. Install OpenShift Logging according to https://docs.openshift.com/container-platform/4.5/logging/cluster-logging-deploying.html
2. Increase the number of Elasticsearch Nodes from 3 to 5
3. Decrease the number of Elasticsearch Nodes from 5 to 3
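
The scale-down in step 3 amounts to editing the field named in the summary. A minimal fragment of the ClusterLogging CR (illustrative; assumes the default instance name "instance" in the openshift-logging namespace, with other fields as in the spec dumps above):

```yaml
apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  name: instance
  namespace: openshift-logging
spec:
  logStore:
    type: elasticsearch
    elasticsearch:
      nodeCount: 3                    # decreased from 5; change is not applied
      redundancyPolicy: FullRedundancy
```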

Actual results:

All 5 Elasticsearch nodes keep running and no attempt is made to reduce the number of Elasticsearch nodes. Changes to the redundancy policy are also not reflected, whether made at the same time or separately.

Expected results:

The number of Elasticsearch nodes is reflected correctly at all times, and the Operator takes action when spec.logStore.elasticsearch.nodeCount is modified.

Additional info:

Comment 1 Michael Burke 2020-11-16 19:26:58 UTC
*** Bug 1898310 has been marked as a duplicate of this bug. ***