[Description of problem]

When the CLO instance is defined with the Kibana replicas set to 0 like this:

~~~
$ oc -n openshift-logging get clusterlogging instance -o yaml
...
spec:
  collection:
    logs:
      fluentd:
        resources: {}
      type: fluentd
  logStore:
    elasticsearch:
      nodeCount: 0
      redundancyPolicy: ZeroRedundancy
      resources:
        limits:
          memory: 2Gi
        requests:
          cpu: 200m
          memory: 2Gi
      storage: {}
    type: elasticsearch
  managementState: Managed
  visualization:
    kibana:
      replicas: 0
      resources:
        limits:
          memory: 512Mi
        requests:
          cpu: 500m
          memory: 512Mi
    type: kibana
~~~

checking the pods created afterwards shows that a Kibana pod and the indexmanagement jobs are running:

~~~
$ oc get pods
NAME                                        READY   STATUS    RESTARTS   AGE
cluster-logging-operator-565d4fd5bc-czg8h   1/1     Running   0          11m
elasticsearch-im-app-1616600700-zhncg       0/1     Error     0          3m39s
elasticsearch-im-audit-1616600700-44wzc     0/1     Error     0          3m37s
elasticsearch-im-infra-1616600700-jbk65     0/1     Error     0          3m30s
fluentd-4bj6g                               1/1     Running   0          10m
fluentd-82zd5                               1/1     Running   0          10m
fluentd-h2dbk                               1/1     Running   0          10m
fluentd-lwcbw                               1/1     Running   0          10m
fluentd-mtwx4                               1/1     Running   0          10m
fluentd-sz7gh                               1/1     Running   0          10m
kibana-84cd8747c5-s9mct                     2/2     Running   0          9m32s
~~~

[Version-Release number of selected component (if applicable)]

OCP 4.6

~~~
$ oc get csv
NAME                                           DISPLAY                            VERSION                 REPLACES   PHASE
clusterlogging.4.6.0-202103060451.p0           Cluster Logging                    4.6.0-202103060451.p0              Succeeded
elasticsearch-operator.4.6.0-202103060018.p0   OpenShift Elasticsearch Operator   4.6.0-202103060018.p0              Succeeded
~~~

[How reproducible]

Always

[Steps to Reproduce]

1. Define the CLO instance as:

~~~
$ oc -n openshift-logging get clusterlogging instance -o yaml
...
spec:
  collection:
    logs:
      fluentd:
        resources: {}
      type: fluentd
  logStore:
    elasticsearch:
      nodeCount: 0
      redundancyPolicy: ZeroRedundancy
      resources:
        limits:
          memory: 2Gi
        requests:
          cpu: 200m
          memory: 2Gi
      storage: {}
    type: elasticsearch
  managementState: Managed
  visualization:
    kibana:
      replicas: 0
      resources:
        limits:
          memory: 512Mi
        requests:
          cpu: 500m
          memory: 512Mi
    type: kibana
~~~

2. Check that the Kibana pod and the indexmanagement jobs are created:

~~~
$ oc get pods
NAME                                        READY   STATUS    RESTARTS   AGE
cluster-logging-operator-565d4fd5bc-czg8h   1/1     Running   0          11m
elasticsearch-im-app-1616600700-zhncg       0/1     Error     0          3m39s
elasticsearch-im-audit-1616600700-44wzc     0/1     Error     0          3m37s
elasticsearch-im-infra-1616600700-jbk65     0/1     Error     0          3m30s
fluentd-4bj6g                               1/1     Running   0          10m
fluentd-82zd5                               1/1     Running   0          10m
fluentd-h2dbk                               1/1     Running   0          10m
fluentd-lwcbw                               1/1     Running   0          10m
fluentd-mtwx4                               1/1     Running   0          10m
fluentd-sz7gh                               1/1     Running   0          10m
kibana-84cd8747c5-s9mct                     2/2     Running   0          9m32s
~~~

[Actual results]
- A Kibana pod is created even when replicas is 0
- Indexmanagement pods are running even when the logStore nodeCount is set to 0

[Expected results]
- No Kibana pod
- No indexmanagement jobs running
The index management cronjobs are going to schedule pods; setting the number of ES nodes to 0 should have no impact on that. The fact that the number of Kibana replicas doesn't match what is specified there is a bug, though.
It's possible CLO is overwriting this value when creating the Kibana CR. @Oscar, can you provide the YAML output of the Kibana CR for the cluster?
@ewolinet,

- About the issue with Kibana

I was having some issues today getting a clean, running lab environment. Give me until tomorrow to reproduce it and provide the Kibana CR.

- About the issue with the indexmanagement jobs

I agree that it shouldn't have an impact, but if ES is scaled down to 0, I would expect the operator to have enough logic not to run the indexmanagement jobs. It doesn't make sense to have a job running every 15 minutes when you know it won't work; the same goes for the curator jobs. The operator should take into account that ES no longer exists, since it's defined that way, and not create the curator and indexmanagement jobs.

If you prefer, we could handle this in a different bug and I could split this issue off into a separate one, but from my perspective it's clearly a bug: something is trying to execute against another thing that is defined not to exist.
(In reply to Oscar Casal Sanchez from comment #3)
> @ewolinet,
>
> - About the issue with Kibana
>
> I was having some issues today with the labs for getting one clean and
> running. Let me until tomorrow for being able to reproduce it and providing
> the Kibana CR.

I was trying to recreate this locally and am unable to see this happen as well. If I define both an ES and a Kibana section for my cl/instance object, I see the kibana and elasticsearch CRs get created, but with 0 replicas.

> - About the issue with the indexmanagement jobs
>
> I agree that it shouldn't have an impact, but if ES is scale down to 0, I
> can understand that we should have the enough logic in the operator to don't
> run the indexmanagement jobs. It doesn't make sense to have a job running
> each 15 minutes when you know that it won't work, the same with the curator
> jobs. Then, the operator should have the logic implemented for having in
> consideration that ES doesn't exist more since it's defined like that and
> doesn't create the jobs for curator and indexmanagement.
>
> If you prefer, we could manage this in a different bug and I could split
> it and move this issue to one different, but from my perspective, it's clear
> that it's a bug: something that it's trying to execute against another thing
> that it's defined that it shouldn't exist.

While I agree it could be handled better, I would argue that this is not a normal use case. If you are going to specify 0 ES nodes, you shouldn't define a logStore section in your cl/instance object (this prevents the indexmanagement cronjobs from being created). Please do not open a bug; this is working as expected. If you feel it should be different, I would request that it instead be filed as an RFE so it can be prioritized like any other feature.
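Following that suggestion, a collector-only cl/instance with no logStore stanza might look like the sketch below (illustrative only; resource values are examples, not recommendations). With logStore omitted, no elasticsearch-im-* cronjobs are created:

```yaml
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
  namespace: "openshift-logging"
spec:
  managementState: "Managed"
  # No logStore stanza: the operator does not create the
  # indexmanagement cronjobs or an Elasticsearch cluster.
  collection:
    logs:
      type: "fluentd"
      fluentd:
        resources: {}
```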
Hello,

I was able to reproduce it following the steps below (note that something similar is happening with ES, as you can see below).

Environment:

~~~
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.22    True        False         15h     Cluster version is 4.6.22

$ oc get csv
NAME                                           DISPLAY                            VERSION                 REPLACES   PHASE
clusterlogging.4.6.0-202103130248.p0           Cluster Logging                    4.6.0-202103130248.p0              Succeeded
elasticsearch-operator.4.6.0-202103130248.p0   OpenShift Elasticsearch Operator   4.6.0-202103130248.p0              Succeeded
~~~

### 1. Create the CLO instance as below, without a kibana definition in the spec section. The Kibana pod is not created, as expected

~~~
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
  namespace: "openshift-logging"
spec:
  managementState: "Managed"
  logStore:
    type: "elasticsearch"
    elasticsearch:
      nodeCount: 1
      resources:
        limits:
          memory: 2Gi
        requests:
          cpu: 200m
          memory: 2Gi
      storage: {}
      redundancyPolicy: "ZeroRedundancy"
  curation:
    type: "curator"
    curator:
      resources:
        limits:
          memory: 200Mi
        requests:
          cpu: 200m
          memory: 200Mi
      schedule: "*/5 * * * *"
  collection:
    logs:
      type: "fluentd"
      fluentd:
        resources: {}
~~~

### 2. Add the Kibana definition to the CLO instance with "replicas: 1"; the Kibana pod is created as expected

~~~
$ oc edit clusterlogging
...
  visualization:
    kibana:
      replicas: 1
      resources:
        limits:
          memory: 512Mi
        requests:
          cpu: 500m
          memory: 512Mi
    type: kibana
...

$ oc get clusterlogging instance -o jsonpath='{.spec.visualization}'
{"kibana":{"replicas":1,"resources":{"limits":{"memory":"512Mi"},"requests":{"cpu":"500m","memory":"512Mi"}}},"type":"kibana"}

$ oc get pods -l component=kibana
NAME                      READY   STATUS    RESTARTS   AGE
kibana-6cb7cbbb97-sgfz2   2/2     Running   0          87s

$ oc get kibana
NAME     MANAGEMENT STATE   REPLICAS
kibana   Managed            1
~~~

### 3. Set the kibana replicas to 0; the Kibana pod is not deleted, as it should be

~~~
$ oc edit clusterlogging instance
...
spec:
...
  visualization:
    kibana:
      replicas: 0
...

$ oc get pods -l component=kibana
NAME                     READY   STATUS    RESTARTS   AGE
kibana-b66b87b58-5s5m6   2/2     Running   0          32m

### Verify that in the CLO instance the Kibana replicas is 0
$ oc get clusterlogging instance -o jsonpath='{.spec.visualization}'
{"kibana":{"replicas":0,"resources":{"limits":{"memory":"512Mi"},"requests":{"cpu":"500m","memory":"512Mi"}}},"type":"kibana"}

### The Kibana CR keeps Replicas at 1 instead of 0
$ oc get kibana
NAME     MANAGEMENT STATE   REPLICAS
kibana   Managed            1
~~~

### 4. Delete the kibana configuration from the CLO instance; the Kibana pod remains running

~~~
$ oc get clusterlogging instance -o jsonpath='{.spec.visualization}'

$ oc get pods -l component=kibana
NAME                     READY   STATUS    RESTARTS   AGE
kibana-b66b87b58-5s5m6   2/2     Running   0          41m

$ oc get kibana kibana
NAME     MANAGEMENT STATE   REPLICAS
kibana   Managed            1
~~~

The same happens for Elasticsearch if the CLO is modified to set `nodeCount: 0` in the logStore stanza. The fluentd pods are restarted (so they are detecting a change), but the ES pod continues running.
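The behavior in steps 3 and 4 is consistent with a common operator pitfall in Go: a plain integer field cannot distinguish `replicas: 0` from "replicas not set", so a defaulting rule can silently turn an explicit 0 back into 1. A hypothetical sketch of the pattern (this is not the actual CLO code; the function names are invented for illustration):

```go
package main

import "fmt"

// desiredReplicasBuggy shows the pitfall: with a plain int32, an explicit
// "replicas: 0" is indistinguishable from an omitted field, so the
// defaulting branch silently overwrites the user's scale-down request.
func desiredReplicasBuggy(replicas int32) int32 {
	if replicas == 0 { // cannot tell "0 requested" from "not set"
		return 1
	}
	return replicas
}

// desiredReplicasFixed uses a pointer so "unset" (nil) and "0" differ,
// which lets the operator honor an explicit scale-down to zero.
func desiredReplicasFixed(replicas *int32) int32 {
	if replicas == nil { // field omitted: apply the default
		return 1
	}
	return *replicas // explicit value, including 0, is honored
}

func main() {
	zero := int32(0)
	fmt.Println(desiredReplicasBuggy(0))     // 1: the user's 0 is lost
	fmt.Println(desiredReplicasFixed(&zero)) // 0: scale-down honored
	fmt.Println(desiredReplicasFixed(nil))   // 1: default when unset
}
```

Whatever the exact mechanism in the operator, the observed symptom matches the first function: the Kibana CR stays at 1 replica no matter what the cl/instance spec says.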
~~~
$ oc get clusterlogging instance -o jsonpath='{.spec.logStore}'
{"elasticsearch":{"nodeCount":0,"redundancyPolicy":"ZeroRedundancy","resources":{"limits":{"memory":"2Gi"},"requests":{"cpu":"200m","memory":"2Gi"}},"storage":{}},"type":"elasticsearch"}

$ oc get pods -l component=elasticsearch
NAME                                            READY   STATUS    RESTARTS   AGE
elasticsearch-cdm-pyyw8xl2-1-7687d9484c-r5pnn   2/2     Running   0          46m

$ oc get elasticsearch
NAME            MANAGEMENT STATE   HEALTH   NODES   DATA NODES   SHARD ALLOCATION   INDEX MANAGEMENT
elasticsearch   Managed            green    1       1            all

$ oc get pods -l component=fluentd
NAME            READY   STATUS              RESTARTS   AGE
fluentd-qmhmd   1/1     Running             0          44m
fluentd-r72mj   1/1     Running             0          29s
fluentd-rgq87   1/1     Running             0          44m
fluentd-tspm2   1/1     Running             0          11s
fluentd-tvq6g   1/1     Running             0          41s
fluentd-w465b   0/1     ContainerCreating   0          2s
~~~

Now, deleting the Elasticsearch stanza from the CLO does delete the pod (this is one difference from Kibana, which is not deleted):

~~~
$ oc edit clusterlogging

$ oc get clusterlogging instance -o jsonpath='{.spec.logStore}'

$ oc get pods -l component=elasticsearch
No resources found in openshift-logging namespace.
~~~
(In reply to Oscar Casal Sanchez from comment #5)
> ###3. Move kibana replicas to 0 and Kibana pod is not deleted as it should be
> ### The Kibana CR maintains the Replicas to 1 instead of 0
> $ oc get kibana
> NAME     MANAGEMENT STATE   REPLICAS
> kibana   Managed            1
> ~~~
>
> 4. Tried to delete from CLO instance the kibana configuration and the Kibana
> pod remains running
>
> ~~~
> $ oc get clusterlogging instance -o jsonpath='{.spec.visualization}'
>
> $ oc get pods -l component=kibana
> NAME                     READY   STATUS    RESTARTS   AGE
> kibana-b66b87b58-5s5m6   2/2     Running   0          41m
>
> $ oc get kibana kibana
> NAME     MANAGEMENT STATE   REPLICAS
> kibana   Managed            1
> ~~~

Let me try to recreate this using your steps above; it seems the issue stems from how the Cluster Logging Operator is gating/controlling the kibana CR.

> The same happens for Elasticsearch if it's modified the CLO for setting in
> the LogStore stanza `nodeCount: 0`. The fluentd pods are restarted (then,
> they are detecting a change), but the ES pod continues running.

This is because the Elasticsearch Operator is preventing a total scale-down (it does this to protect the cluster; a minimum of one master and one data node is required for a cluster). This works as expected, and there should be a status message in the elasticsearch CR that explains this.
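The Elasticsearch scale-down protection described above can be illustrated with a minimal sketch. This is not the actual Elasticsearch Operator code; the function name and message text are invented to show the shape of the guard:

```go
package main

import "fmt"

// clampNodeCount mimics the kind of guard an operator can apply to
// protect a cluster from a total scale-down: at least one node must
// remain so the cluster keeps a master and a data node. Hypothetical
// sketch, not the real Elasticsearch Operator implementation.
func clampNodeCount(requested int) (effective int, warning string) {
	const minNodes = 1 // master and data roles can share a single node
	if requested < minNodes {
		return minNodes, fmt.Sprintf(
			"requested %d nodes; keeping %d to protect the cluster",
			requested, minNodes)
	}
	return requested, ""
}

func main() {
	// nodeCount: 0 in the spec, but the ES pod keeps running and the
	// operator would surface a status message instead of scaling down.
	n, warn := clampNodeCount(0)
	fmt.Println(n, warn)
}
```

This matches the observed behavior: with `nodeCount: 0` in the spec, `oc get elasticsearch` still reports one node and one data node.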
Thank you Oscar. I'm able to recreate this using your above steps and I can see where in the code it prevents this. It looks like this is a continuation from https://bugzilla.redhat.com/show_bug.cgi?id=1901424
Hello ewolinet,

Thank you so much for your update. I'm glad to see that you are now able to reproduce it and can see where it's failing. My apologies for perhaps missing some steps in the case description that prevented you from reproducing it earlier.

I'll create a solution article related to this issue. Although I don't feel it has high priority at this moment, since it's a corner case, it would be good to fix it in the future.

Best regards,
Oscar
The fix for this was merged back on May 6th. Not sure why our bot didn't link the fix here nor move this card along: https://github.com/openshift/cluster-logging-operator/pull/998
Verified this issue on clusterlogging.4.6.0-202106021513 and elasticsearch-operator.4.6.0-202106100456. The issue is fixed: when the kibana replicas is set to 0, Kibana pods are not created.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat OpenShift Enterprise security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2500