Description of problem:
Deploy logging with 3 ES nodes and wait until all pods are running, then change the ES node count to 4 in the clusterlogging CRD instance. After waiting for about 10 minutes, the ES node count is still 3 in the elasticsearch CRD instance, and there are no related logs in the cluster-logging-operator pod.

By contrast, when scaling the ES node count from 2 to 4 in the clusterlogging CRD instance, the node count is changed to 4 in the elasticsearch CRD instance, the ES pods are scaled up, and the log `level=info msg="Elasticsearch node configuration change found, updating elasticsearch"` appears in the cluster-logging-operator pod.

Version-Release number of selected component (if applicable):
image-registry.openshift-image-registry.svc:5000/openshift/ose-cluster-logging-operator:v4.1.0-201905191700

How reproducible:
Always

Steps to Reproduce:
1. Deploy logging via OLM, setting the ES node count to 3 in the clusterlogging CRD instance.
2. Wait until all logging pods are running, then change the ES node count to 4 in the clusterlogging CRD instance.
3. Check the pods in the `openshift-logging` namespace, and check the ES node count in the elasticsearch CRD instance and the clusterlogging CRD instance.
Actual results:
The ES nodeCount in the elasticsearch CRD instance is not changed after changing the ES nodeCount from 3 to n (n > 3) in the clusterlogging CRD instance.

Expected results:
The ES node count should match the value set in the clusterlogging CRD instance.

Additional info:
Scaling up ES nodes from 1 or 2 to n (n >= 3): no issue.
Scaling up ES nodes from 4 or 5 to 6: no issue.
The issue only happens when scaling up from 3 nodes to n (n > 3) nodes.

The workaround is (see the sketch below):
1. Change the ES nodeCount in the clusterlogging CRD instance.
2. Run `oc delete elasticsearch elasticsearch -n openshift-logging` to delete the elasticsearch CRD instance; the elasticsearch is then recreated with the nodeCount set in the clusterlogging CRD instance.
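A minimal sketch of the scale-up step and the workaround, assuming the ClusterLogging CR instance exposes the ES node count at spec.logStore.elasticsearch.nodeCount (that field path is not shown in this report, so treat it as an assumption):

# Assumed field path; adjust if the ClusterLogging schema differs.
$ oc -n openshift-logging patch clusterlogging instance --type merge \
    -p '{"spec":{"logStore":{"elasticsearch":{"nodeCount":4}}}}'

# Workaround: delete the Elasticsearch CR so it is recreated with the
# nodeCount taken from the ClusterLogging CR instance.
$ oc -n openshift-logging delete elasticsearch elasticsearch

# The recreated CR should report the new node count.
$ oc -n openshift-logging get elasticsearch elasticsearch -o jsonpath='{.spec.nodes[*].nodeCount}{"\n"}'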
This should likely be cloned and backported to 4.1.z.
The issue isn't fixed. Got an error message in the EO pod after changing the ES node count from 3 to 4 in the clusterlogging CR instance:

{"level":"error","ts":1561442637.229709,"logger":"kubebuilder.controller","msg":"Reconciler error","controller":"elasticsearch-controller","request":"openshift-logging/elasticsearch","error":"Failed to reconcile Elasticsearch deployment spec: Unsupported change to UUIDs made: Previously used GenUUID \"jw91ctq6\" is no longer found in Spec.Nodes","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:217\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/k8s.io/apimachinery/pkg/util/wait/wait.go:134\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}

$ oc get elasticsearch -oyaml
apiVersion: v1
items:
- apiVersion: logging.openshift.io/v1
  kind: Elasticsearch
  metadata:
    creationTimestamp: "2019-06-25T03:39:34Z"
    generation: 37
    name: elasticsearch
    namespace: openshift-logging
    ownerReferences:
    - apiVersion: logging.openshift.io/v1
      controller: true
      kind: ClusterLogging
      name: instance
      uid: d67cda8e-96fa-11e9-a275-06e6146aca30
    resourceVersion: "431049"
    selfLink: /apis/logging.openshift.io/v1/namespaces/openshift-logging/elasticsearches/elasticsearch
    uid: d9521b52-96fa-11e9-a275-06e6146aca30
  spec:
    managementState: Managed
    nodeSpec:
      image: image-registry.openshift-image-registry.svc:5000/openshift/ose-logging-elasticsearch5:latest
      resources:
        limits:
          cpu: "1"
          memory: 2Gi
        requests:
          cpu: 200m
          memory: 1Gi
    nodes:
    - nodeCount: 3
      resources: {}
      roles:
      - client
      - data
      - master
      storage:
        size: 10Gi
        storageClassName: gp2
    - nodeCount: 1
      resources: {}
      roles:
      - client
      - data
      storage:
        size: 10Gi
        storageClassName: gp2
    redundancyPolicy: FullRedundancy
  status:
    cluster:
      activePrimaryShards: 17
      activeShards: 23
      initializingShards: 0
      numDataNodes: 3
      numNodes: 3
      pendingTasks: 0
      relocatingShards: 0
      status: green
      unassignedShards: 0
    clusterHealth: ""
    conditions:
    - lastTransitionTime: "2019-06-25T06:03:57Z"
      message: Previously used GenUUID "jw91ctq6" is no longer found in Spec.Nodes
      reason: Invalid Spec
      status: "True"
      type: InvalidUUID
    nodes:
    - deploymentName: elasticsearch-cdm-jw91ctq6-1
      upgradeStatus: {}
    - deploymentName: elasticsearch-cdm-jw91ctq6-2
      upgradeStatus: {}
    - deploymentName: elasticsearch-cdm-jw91ctq6-3
      upgradeStatus: {}
    pods:
      client:
        failed: []
        notReady: []
        ready:
        - elasticsearch-cdm-jw91ctq6-1-fbbd7bfc-nglll
        - elasticsearch-cdm-jw91ctq6-2-564f89f647-bhtvm
        - elasticsearch-cdm-jw91ctq6-3-86dbf67c7-bhwvg
      data:
        failed: []
        notReady: []
        ready:
        - elasticsearch-cdm-jw91ctq6-1-fbbd7bfc-nglll
        - elasticsearch-cdm-jw91ctq6-2-564f89f647-bhtvm
        - elasticsearch-cdm-jw91ctq6-3-86dbf67c7-bhwvg
      master:
        failed: []
        notReady: []
        ready:
        - elasticsearch-cdm-jw91ctq6-1-fbbd7bfc-nglll
        - elasticsearch-cdm-jw91ctq6-2-564f89f647-bhtvm
        - elasticsearch-cdm-jw91ctq6-3-86dbf67c7-bhwvg
    shardAllocationEnabled: all
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
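For quicker triage than dumping the whole CR, something like the following should surface the failing condition and the reconcile error; the jsonpath filter and the operator namespace are assumptions, not taken from this report:

# Show the InvalidUUID condition message from the Elasticsearch CR status.
$ oc -n openshift-logging get elasticsearch elasticsearch \
    -o jsonpath='{.status.conditions[?(@.type=="InvalidUUID")].message}{"\n"}'

# Grep the elasticsearch-operator logs for the reconcile failure; the EO
# namespace may differ depending on how the operator was installed.
$ oc -n openshift-operators logs deployment/elasticsearch-operator | grep "Unsupported change to UUIDs"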
ose-elasticsearch-operator-v4.2.0-201906241432
ose-cluster-logging-operator-v4.2.0-201906241832
Out of curiosity, how long did you wait after the initial creation of the clusterlogging object before updating the elasticsearch node count?
In comment 4, it was several hours. The ES cluster was in green status before I updated the elasticsearch node count. It can also be reproduced by creating the clusterlogging instance, waiting for the ES cluster to be in green status, and then updating the ES node count.
I can recreate this; looking into why CLO is dropping the UUIDs.
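One way to confirm that CLO drops the UUID on update (the genUUID field name is inferred from the EO error message above, so treat it as an assumption): compare the genUUID values in the Elasticsearch CR spec against the UUID embedded in the existing deployment names before and after bumping the node count.

# Print the genUUID of each node entry in the Elasticsearch CR spec; after the
# node-count change it should still contain "jw91ctq6" to match the existing
# elasticsearch-cdm-jw91ctq6-* deployments.
$ oc -n openshift-logging get elasticsearch elasticsearch \
    -o jsonpath='{range .spec.nodes[*]}{.genUUID}{"\n"}{end}'

# Existing deployments that encode the previously generated UUID.
$ oc -n openshift-logging get deployments -o name | grep elasticsearch-cdm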
https://github.com/openshift/cluster-logging-operator/pull/205
Verified in ose-cluster-logging-operator-v4.2.0-201907222219
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922