Bug 1942609
Summary: | Setting Kibana and Elasticsearch replicas to 0, Kibana pods are still created and indexmanagement jobs are still created | | |
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Oscar Casal Sanchez <ocasalsa> |
Component: | Logging | Assignee: | ewolinet |
Status: | CLOSED ERRATA | QA Contact: | Anping Li <anli> |
Severity: | low | Docs Contact: | |
Priority: | low | | |
Version: | 4.6 | CC: | aos-bugs, ewolinet, gkarager, qitang |
Target Milestone: | --- | | |
Target Release: | 4.6.z | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | logging-exploration | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Story Points: | --- | | |
Clone Of: | | Environment: | |
Last Closed: | 2021-06-29 06:30:39 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |

Doc Text:

> Cause: CLO updated the number of Kibana replicas based on assumptions about the CL/instance object.
> Consequence: The number of Kibana replicas was set incorrectly when creating the kibana CR object.
> Fix: CLO now correctly evaluates whether the number of replicas is specified for Kibana in the CL/instance object, and defaults to 0 if it is not provided.
> Result: Replicas can be set to 0 for Kibana in the CL/instance object, and that value is passed on to the kibana CR object instead of being overridden to 1.
Description
Oscar Casal Sanchez
2021-03-24 15:55:15 UTC
The index management cronjobs are going to schedule pods; setting the number of ES nodes to 0 should have no impact on that. The fact that the number of Kibana replicas doesn't match what is specified there is a bug, though. It's possible CLO is overwriting this value when creating the Kibana CR.

@Oscar, can you provide the YAML output of the kibana CR for the cluster?

---

@ewolinet,

- About the issue with Kibana

  I was having some issues today getting a clean, running lab environment. Give me until tomorrow to reproduce it and provide the Kibana CR.

- About the issue with the indexmanagement jobs

  I agree that it shouldn't have an impact, but if ES is scaled down to 0, the operator should have enough logic not to run the indexmanagement jobs. It doesn't make sense to have a job running every 15 minutes when you know it won't work, and the same goes for the curator jobs. The operator should take into account that ES no longer exists, since that is how it is defined, and not create the curator and indexmanagement jobs.

  If you prefer, we could manage this in a different bug and I could split this issue out into a separate one, but from my perspective it's clearly a bug: something is being executed against another thing that is defined not to exist.

---

(In reply to Oscar Casal Sanchez from comment #3)
> @ewolinet,
>
> - About the issue with Kibana
>
> I was having some issues today getting a clean, running lab environment.
> Give me until tomorrow to reproduce it and provide the Kibana CR.

I was trying to recreate this locally and am unable to see this happen as well. If I define both an ES and a Kibana section for my cl/instance object, I see the kibana and elasticsearch CRs get created, but with 0 replicas.
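The guard Oscar asks for above could be sketched as follows, in Go since that is the operator's implementation language. This is a hypothetical illustration, not the operator's actual code; the type and field names (`clusterLoggingSpec`, `logStoreSpec`, `NodeCount`) are simplified stand-ins for the real API:

```go
package main

import "fmt"

// logStoreSpec is a simplified, illustrative view of the logStore stanza.
type logStoreSpec struct {
	NodeCount int32
}

// clusterLoggingSpec is a simplified view of the CL/instance spec.
type clusterLoggingSpec struct {
	LogStore *logStoreSpec // nil when no logStore stanza is defined
}

// shouldCreateIndexManagementJobs sketches the suggested guard: skip
// creating the indexmanagement (and curator) cronjobs when Elasticsearch
// is absent from the spec or scaled to zero nodes.
func shouldCreateIndexManagementJobs(cl clusterLoggingSpec) bool {
	return cl.LogStore != nil && cl.LogStore.NodeCount > 0
}

func main() {
	noStore := clusterLoggingSpec{}
	zeroNodes := clusterLoggingSpec{LogStore: &logStoreSpec{NodeCount: 0}}
	oneNode := clusterLoggingSpec{LogStore: &logStoreSpec{NodeCount: 1}}
	fmt.Println(shouldCreateIndexManagementJobs(noStore))   // false
	fmt.Println(shouldCreateIndexManagementJobs(zeroNodes)) // false
	fmt.Println(shouldCreateIndexManagementJobs(oneNode))   // true
}
```

As the thread below notes, the operator already achieves the first case (omitting the logStore stanza prevents the cronjobs); the `NodeCount > 0` clause is the additional behavior being requested.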
> - About the issue with the indexmanagement jobs
>
> I agree that it shouldn't have an impact, but if ES is scaled down to 0,
> the operator should have enough logic not to run the indexmanagement jobs.
> It doesn't make sense to have a job running every 15 minutes when you know
> it won't work, and the same goes for the curator jobs.
>
> If you prefer, we could manage this in a different bug and I could split
> this issue out into a separate one, but from my perspective it's clearly a
> bug: something is being executed against another thing that is defined not
> to exist.

While I agree it could be handled better, I would argue that this is not a normal use case. If you were going to specify 0 ES nodes, you shouldn't define a logStore section in your cl/instance object (this prevents the indexmanagement cronjobs from being created). Please do not open a bug; this is working as expected. If you feel it should be different, I would request that it instead be filed as an RFE so it can be prioritized like any other feature.

---

Hello,

I was able to reproduce it by following the steps below (note that something similar is happening with ES, as you can see below).

Environment:

~~~
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.22    True        False         15h     Cluster version is 4.6.22

$ oc get csv
NAME                                           DISPLAY                            VERSION                 REPLACES   PHASE
clusterlogging.4.6.0-202103130248.p0           Cluster Logging                    4.6.0-202103130248.p0              Succeeded
elasticsearch-operator.4.6.0-202103130248.p0   OpenShift Elasticsearch Operator   4.6.0-202103130248.p0              Succeeded
~~~

### 1. Create the CLO instance as below, without a kibana definition in the spec section.
The Kibana pod is not created, as expected.

~~~
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
  namespace: "openshift-logging"
spec:
  managementState: "Managed"
  logStore:
    type: "elasticsearch"
    elasticsearch:
      nodeCount: 1
      resources:
        limits:
          memory: 2Gi
        requests:
          cpu: 200m
          memory: 2Gi
      storage: {}
      redundancyPolicy: "ZeroRedundancy"
  curation:
    type: "curator"
    curator:
      resources:
        limits:
          memory: 200Mi
        requests:
          cpu: 200m
          memory: 200Mi
      schedule: "*/5 * * * *"
  collection:
    logs:
      type: "fluentd"
      fluentd:
        resources: {}
~~~

### 2. Add the Kibana definition with `replicas: 1` to the CLO instance; the Kibana pod is created as expected.

~~~
$ oc edit clusterlogging
...
  visualization:
    kibana:
      replicas: 1
      resources:
        limits:
          memory: 512Mi
        requests:
          cpu: 500m
          memory: 512Mi
    type: kibana
...

$ oc get clusterlogging instance -o jsonpath='{.spec.visualization}'
{"kibana":{"replicas":1,"resources":{"limits":{"memory":"512Mi"},"requests":{"cpu":"500m","memory":"512Mi"}}},"type":"kibana"}

$ oc get pods -l component=kibana
NAME                      READY   STATUS    RESTARTS   AGE
kibana-6cb7cbbb97-sgfz2   2/2     Running   0          87s

$ oc get kibana
NAME     MANAGEMENT STATE   REPLICAS
kibana   Managed            1
~~~

### 3. Set the Kibana replicas to 0; the Kibana pod is not deleted as it should be.

~~~
$ oc edit clusterlogging instance
...
spec:
...
  visualization:
    kibana:
      replicas: 0
...

$ oc get pods -l component=kibana
NAME                     READY   STATUS    RESTARTS   AGE
kibana-b66b87b58-5s5m6   2/2     Running   0          32m

### Verify that in the CLO instance the Kibana replicas value is 0
$ oc get clusterlogging instance -o jsonpath='{.spec.visualization}'
{"kibana":{"replicas":0,"resources":{"limits":{"memory":"512Mi"},"requests":{"cpu":"500m","memory":"512Mi"}}},"type":"kibana"}

### The Kibana CR keeps Replicas at 1 instead of 0
$ oc get kibana
NAME     MANAGEMENT STATE   REPLICAS
kibana   Managed            1
~~~

### 4. Delete the kibana configuration from the CLO instance; the Kibana pod remains running.

~~~
$ oc get clusterlogging instance -o jsonpath='{.spec.visualization}'

$ oc get pods -l component=kibana
NAME                     READY   STATUS    RESTARTS   AGE
kibana-b66b87b58-5s5m6   2/2     Running   0          41m

$ oc get kibana kibana
NAME     MANAGEMENT STATE   REPLICAS
kibana   Managed            1
~~~

The same happens for Elasticsearch if the CLO is modified to set `nodeCount: 0` in the logStore stanza. The fluentd pods are restarted (so they are detecting a change), but the ES pod continues running.

~~~
$ oc get clusterlogging instance -o jsonpath='{.spec.logStore}'
{"elasticsearch":{"nodeCount":0,"redundancyPolicy":"ZeroRedundancy","resources":{"limits":{"memory":"2Gi"},"requests":{"cpu":"200m","memory":"2Gi"}},"storage":{}},"type":"elasticsearch"}

$ oc get pods -l component=elasticsearch
NAME                                            READY   STATUS    RESTARTS   AGE
elasticsearch-cdm-pyyw8xl2-1-7687d9484c-r5pnn   2/2     Running   0          46m

$ oc get elasticsearch
NAME            MANAGEMENT STATE   HEALTH   NODES   DATA NODES   SHARD ALLOCATION   INDEX MANAGEMENT
elasticsearch   Managed            green    1       1            all

$ oc get pods -l component=fluentd
NAME            READY   STATUS              RESTARTS   AGE
fluentd-qmhmd   1/1     Running             0          44m
fluentd-r72mj   1/1     Running             0          29s
fluentd-rgq87   1/1     Running             0          44m
fluentd-tspm2   1/1     Running             0          11s
fluentd-tvq6g   1/1     Running             0          41s
fluentd-w465b   0/1     ContainerCreating   0          2s
~~~

Now, if we delete the Elasticsearch stanza from the CLO, the pod is deleted (this is one difference from Kibana, whose pod is not deleted):

~~~
$ oc edit clusterlogging

$ oc get clusterlogging instance -o jsonpath='{.spec.logStore}'

$ oc get pods -l component=elasticsearch
No resources found in openshift-logging namespace.
~~~

---

(In reply to Oscar Casal Sanchez from comment #5)
> ### 3. Set the Kibana replicas to 0; the Kibana pod is not deleted as it should be.
>
> ### The Kibana CR keeps Replicas at 1 instead of 0
> $ oc get kibana
> NAME     MANAGEMENT STATE   REPLICAS
> kibana   Managed            1
>
> ### 4. Delete the kibana configuration from the CLO instance; the Kibana
> pod remains running.
>
> ~~~
> $ oc get clusterlogging instance -o jsonpath='{.spec.visualization}'
>
> $ oc get pods -l component=kibana
> NAME                     READY   STATUS    RESTARTS   AGE
> kibana-b66b87b58-5s5m6   2/2     Running   0          41m
>
> $ oc get kibana kibana
> NAME     MANAGEMENT STATE   REPLICAS
> kibana   Managed            1
> ~~~

Let me try to recreate this using your steps above; it seems the issue stems from how the Cluster Logging Operator is gating/controlling the kibana CR.

> The same happens for Elasticsearch if the CLO is modified to set
> `nodeCount: 0` in the logStore stanza. The fluentd pods are restarted
> (so they are detecting a change), but the ES pod continues running.

This is because the Elasticsearch Operator prevents a total scale-down (it does this to protect the cluster; a minimum of one master node and one data node is required for a cluster). This works as expected, and there should be a status message in the elasticsearch CR that explains it.

---

Thank you, Oscar. I'm able to recreate this using your steps above, and I can see where in the code it prevents this. It looks like this is a continuation of https://bugzilla.redhat.com/show_bug.cgi?id=1901424

---

Hello ewolinet,

Thank you so much for your update. I'm glad that you are now able to reproduce it and can see where it's failing. My apologies for perhaps missing some steps in the case description that kept you from reproducing it.

I'll create a solution related to this issue. Although I don't feel it has high priority at this moment, since it's a corner case, it would be good to fix it in the future.

Best regards,
Oscar

---

The fix for this was merged back on May 6th.
Not sure why our bot didn't link the fix here nor move this card along: https://github.com/openshift/cluster-logging-operator/pull/998

---

Verified this issue on clusterlogging.4.6.0-202106021513 and elasticsearch-operator.4.6.0-202106100456. The issue is fixed: when the Kibana replica count is set to 0, Kibana pods are not created.

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Enterprise security and bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2500
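The defaulting behavior described in the Doc Text can be sketched as below. This is an illustrative Go sketch, not the operator's actual code (see the linked PR for that); the key idea is to model the replica count as a pointer so that an explicit 0 is distinguishable from "not specified":

```go
package main

import "fmt"

// kibanaReplicas sketches the fix: treat the replica count as unset only
// when the field is truly absent (nil), so an explicit 0 in the
// CL/instance object is passed through to the kibana CR instead of being
// overridden to 1. Per the Doc Text, an unspecified value defaults to 0.
func kibanaReplicas(specified *int32) int32 {
	if specified == nil {
		return 0 // replicas not provided in the CL/instance object
	}
	return *specified // pass the explicit value through, including 0
}

func main() {
	zero, one := int32(0), int32(1)
	fmt.Println(kibanaReplicas(nil))   // unset: defaults to 0
	fmt.Println(kibanaReplicas(&zero)) // explicit 0 stays 0 (the bug fix)
	fmt.Println(kibanaReplicas(&one))  // explicit 1 stays 1
}
```

The buggy behavior reported above is what you get when the spec uses a plain `int32` instead: the zero value is indistinguishable from "unset", so a defaulting step overwrites an intended 0 with 1.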