Bug 1879150
**Summary:** Changes on `spec.logStore.elasticsearch.nodeCount` not reflected when decreasing the number of nodes

| Product: | OpenShift Container Platform | Reporter: | Simon Reber <sreber> |
|---|---|---|---|
| Component: | Logging | Assignee: | ewolinet |
| Status: | CLOSED ERRATA | QA Contact: | Qiaoling Tang <qitang> |
| Severity: | medium | Docs Contact: | Rolfe Dlugy-Hegwer <rdlugyhe> |
| Priority: | medium | | |
| Version: | 4.5 | CC: | anli, aos-bugs, ewolinet, jcantril, periklis, qitang, rdlugyhe, rsandu |
| Target Milestone: | --- | | |
| Target Release: | 4.7.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | logging-exploration | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Cloned To: | 1898310 (view as bug list) | | |
| Last Closed: | 2021-02-24 11:21:18 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1890801 | | |

**Doc Text:**

* Previously, when the Cluster Logging Operator (CLO) scaled down the number of Elasticsearch nodes in the clusterlogging custom resource (CR) to three nodes, it omitted previously created nodes that had unique IDs. The Elasticsearch Operator (EO) rejected the update because it has safeguards that prevent nodes with unique IDs from being removed. The current release fixes this issue. Now, when the CLO scales down the number of nodes and updates the Elasticsearch CR, the CLO does not omit nodes that have unique IDs. Instead, the CLO marks those nodes as `count 0`. As a result, users can scale down their cluster to three nodes using the clusterlogging CR.
(link:https://bugzilla.redhat.com/show_bug.cgi?id=1879150[*BZ#1879150*])
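For context, a minimal sketch of the user-facing resource involved: scaling is requested through `spec.logStore.elasticsearch.nodeCount` in the clusterlogging CR, which the CLO renders into per-node entries in the Elasticsearch CR (with the fix, removed nodes keep an entry with a count of 0 rather than being omitted). The field names follow the ClusterLogging CRD; the values are illustrative, not taken from this bug:

```yaml
apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  name: instance
  namespace: openshift-logging
spec:
  logStore:
    type: elasticsearch
    elasticsearch:
      # Decreasing this value is the operation this bug tracks; with the
      # fix, the CLO propagates the decrease to the Elasticsearch CR
      # instead of omitting the removed nodes.
      nodeCount: 3
      redundancyPolicy: SingleRedundancy
```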
Description
Simon Reber
2020-09-15 14:28:27 UTC
Moving to 4.7 as this is not a 4.6 blocker.

@Simon Please collect a full must-gather for cluster-logging to get a full picture of the stack: https://github.com/openshift/cluster-logging-operator/tree/master/must-gather

Marking UpcomingSprint as this will not be merged or addressed by EOD.

Verified with elasticsearch-operator.4.7.0-202011030448.p0: in 10 scale-down attempts, the ES cluster went into red status 50% of the time. Moving back to ASSIGNED to continue investigating. When the replica shards had not been created, ES could go red.

```
## Before scale down:
+ oc exec -c elasticsearch elasticsearch-cdm-znu3x9e7-1-78b488bcf6-zq22z -- es_util --query=_cat/shards
.security    0 p STARTED    5 29.6kb 10.128.2.22 elasticsearch-cdm-znu3x9e7-1
.security    0 r STARTED    5 29.6kb 10.129.2.25 elasticsearch-cdm-znu3x9e7-2
audit-000001 1 p STARTED            10.131.0.27 elasticsearch-cdm-znu3x9e7-3
audit-000001 2 p STARTED    0  230b 10.128.2.22 elasticsearch-cdm-znu3x9e7-1
audit-000001 0 p STARTED    0  230b 10.129.2.25 elasticsearch-cdm-znu3x9e7-2
app-000001   1 p STARTED    0  230b 10.128.2.22 elasticsearch-cdm-znu3x9e7-1
app-000001   2 p STARTED    0  230b 10.129.2.25 elasticsearch-cdm-znu3x9e7-2
app-000001   0 p STARTED            10.131.0.27 elasticsearch-cdm-znu3x9e7-3
infra-000001 1 p STARTED            10.131.0.27 elasticsearch-cdm-znu3x9e7-3
infra-000001 2 p STARTED 7917 4.3mb 10.128.2.22 elasticsearch-cdm-znu3x9e7-1
infra-000001 0 p STARTED 7191   4mb 10.129.2.25 elasticsearch-cdm-znu3x9e7-2
.kibana_1    0 r STARTED    0  230b 10.128.2.22 elasticsearch-cdm-znu3x9e7-1
.kibana_1    0 p STARTED            10.131.0.27 elasticsearch-cdm-znu3x9e7-3
```

```
## After scale down:
+ oc exec -c elasticsearch elasticsearch-cdm-znu3x9e7-1-78b488bcf6-zq22z -- es_cluster_health
{
  "cluster_name" : "elasticsearch",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 8,
  "active_shards" : 9,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 4,
  "delayed_unassigned_shards" : 4,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 69.23076923076923
}
+ oc exec -c elasticsearch elasticsearch-cdm-znu3x9e7-1-78b488bcf6-zq22z -- es_util --query=_cat/shards
.security    0 p STARTED      5  29.6kb 10.128.2.22 elasticsearch-cdm-znu3x9e7-1
.security    0 r STARTED      5  29.6kb 10.129.2.25 elasticsearch-cdm-znu3x9e7-2
audit-000001 1 p UNASSIGNED
audit-000001 2 p STARTED      0    230b 10.128.2.22 elasticsearch-cdm-znu3x9e7-1
audit-000001 0 p STARTED      0    230b 10.129.2.25 elasticsearch-cdm-znu3x9e7-2
app-000001   1 p STARTED      0 127.6kb 10.128.2.22 elasticsearch-cdm-znu3x9e7-1
app-000001   2 p STARTED      0   136kb 10.129.2.25 elasticsearch-cdm-znu3x9e7-2
app-000001   0 p UNASSIGNED
infra-000001 1 p UNASSIGNED
infra-000001 2 p STARTED   7917   4.3mb 10.128.2.22 elasticsearch-cdm-znu3x9e7-1
infra-000001 0 p STARTED   7191     4mb 10.129.2.25 elasticsearch-cdm-znu3x9e7-2
.kibana_1    0 p STARTED      0    230b 10.128.2.22 elasticsearch-cdm-znu3x9e7-1
```
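One way to see why the remaining shards are UNASSIGNED (a sketch reusing the `es_util` helper and pod name from the output above, and assuming `es_util` issues a GET by default as the `_cat` queries suggest; `_cluster/allocation/explain` is a standard Elasticsearch API that, with no body, reports on an unassigned shard):

```
# Ask Elasticsearch to explain why an unassigned shard cannot be allocated.
oc exec -c elasticsearch elasticsearch-cdm-znu3x9e7-1-78b488bcf6-zq22z -- \
  es_util --query="_cluster/allocation/explain?pretty"
```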
To scale down the ES cluster, I think ES must meet certain conditions, depending on the redundancy policy (a manual version of this check is sketched below, after the comments):

* ZeroRedundancy: do not allow scale down at all.
* SingleRedundancy: even if the replica shards have been created, the ES nodes can only be scaled down one at a time.
* MultipleRedundancy: even if all replica shards have been created, since we don't know where the replica shards are located, the ES nodes should still be scaled down one at a time.
* FullRedundancy: if all replicas have been created, 1 to n-1 nodes can be scaled down at once.

The EO should check the replica shard status and block new index generation.

@anli I think that would be a further feature. If a user is going to be scaling down their ES cluster, they should understand the risk of data loss if there is no replication.

Docs BZ for this issue: https://bugzilla.redhat.com/show_bug.cgi?id=1896916

Thanks Michael. Move to verified.

Created https://github.com/openshift/openshift-docs/pull/27404 to document the warnings about scaling down and the node minimums as listed in https://issues.redhat.com/browse/LOG-981.
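As a manual stand-in for the replica check suggested in the comments above (a sketch reusing the `es_util` helper from the verification output; the pod name is a placeholder), one could confirm that every index has at least one replica before removing a node:

```
# Placeholder pod name; list each index with its replica count and health.
# An index showing rep=0 has no copy to promote if its node is removed.
oc exec -c elasticsearch <elasticsearch-pod> -- \
  es_util --query="_cat/indices?h=index,rep,health"
```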
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Errata Advisory for Openshift Logging 5.0.0), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0652