Bug 1712721
| Summary: | Can't scale up ES nodes from 3 to N (N>3) in clusterlogging CRD instance. | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Qiaoling Tang <qitang> | |
| Component: | Logging | Assignee: | ewolinet | |
| Status: | CLOSED ERRATA | QA Contact: | Anping Li <anli> | |
| Severity: | medium | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | 4.1.0 | CC: | aos-bugs, ewolinet, pweil, rmeggins | |
| Target Milestone: | --- | |||
| Target Release: | 4.2.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | | Doc Type: | No Doc Update | |
| Doc Text: | | Story Points: | --- | |
| Clone Of: | ||||
| : | 1712955 (view as bug list) | Environment: | ||
| Last Closed: | 2019-10-16 06:29:13 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1712955 | |||
Description
Qiaoling Tang
2019-05-22 06:53:35 UTC
Actual results:
The ES nodeCount in the elasticsearch CRD instance is not changed after changing the ES nodeCount from 3 to n (n>3) in the clusterlogging CRD instance.

Expected results:
The ES node count should be the same as the one set in the clusterlogging CRD instance.

Additional info:
Scaling up ES nodes from 1 or 2 to n (n>=3): no issue.
Scaling up ES nodes from 4 or 5 to 6: no issue.
The issue only happens when scaling up from 3 nodes to n (n>3) nodes.

The workaround is:
1. Change the ES nodeCount in the clusterlogging CRD instance.
2. Run `oc delete elasticsearch elasticsearch -n openshift-logging` to delete the elasticsearch CRD instance. The elasticsearch resource is then recreated, and its nodeCount matches what is set in the clusterlogging CRD instance.

This should likely be cloned+backported to 4.1.z.

The issue isn't fixed.
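A minimal sketch of that workaround, assuming the clusterlogging instance is named `instance` and the node count lives at `spec.logStore.elasticsearch.nodeCount` (the usual 4.x ClusterLogging layout; adjust if your CR differs):

```
# Step 1: bump the ES node count in the clusterlogging CR (3 -> 4 here)
oc -n openshift-logging patch clusterlogging instance --type merge \
  -p '{"spec":{"logStore":{"elasticsearch":{"nodeCount":4}}}}'

# Step 2: delete the elasticsearch CR; the cluster-logging-operator recreates it
# with the node count taken from the clusterlogging CR
oc -n openshift-logging delete elasticsearch elasticsearch
```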
Got an error message in the EO pod after changing the ES node count from 3 to 4 in the clusterlogging CR instance.
{"level":"error","ts":1561442637.229709,"logger":"kubebuilder.controller","msg":"Reconciler error","controller":"elasticsearch-controller","request":"openshift-logging/elasticsearch","error":"Failed to reconcile Elasticsearch deployment spec: Unsupported change to UUIDs made: Previously used GenUUID \"jw91ctq6\" is no longer found in Spec.Nodes","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:217\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/k8s.io/apimachinery/pkg/util/wait/wait.go:134\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/openshift/elasticsearch-operator/_output/src/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
$ oc get elasticsearch -oyaml
apiVersion: v1
items:
- apiVersion: logging.openshift.io/v1
  kind: Elasticsearch
  metadata:
    creationTimestamp: "2019-06-25T03:39:34Z"
    generation: 37
    name: elasticsearch
    namespace: openshift-logging
    ownerReferences:
    - apiVersion: logging.openshift.io/v1
      controller: true
      kind: ClusterLogging
      name: instance
      uid: d67cda8e-96fa-11e9-a275-06e6146aca30
    resourceVersion: "431049"
    selfLink: /apis/logging.openshift.io/v1/namespaces/openshift-logging/elasticsearches/elasticsearch
    uid: d9521b52-96fa-11e9-a275-06e6146aca30
  spec:
    managementState: Managed
    nodeSpec:
      image: image-registry.openshift-image-registry.svc:5000/openshift/ose-logging-elasticsearch5:latest
      resources:
        limits:
          cpu: "1"
          memory: 2Gi
        requests:
          cpu: 200m
          memory: 1Gi
    nodes:
    - nodeCount: 3
      resources: {}
      roles:
      - client
      - data
      - master
      storage:
        size: 10Gi
        storageClassName: gp2
    - nodeCount: 1
      resources: {}
      roles:
      - client
      - data
      storage:
        size: 10Gi
        storageClassName: gp2
    redundancyPolicy: FullRedundancy
  status:
    cluster:
      activePrimaryShards: 17
      activeShards: 23
      initializingShards: 0
      numDataNodes: 3
      numNodes: 3
      pendingTasks: 0
      relocatingShards: 0
      status: green
      unassignedShards: 0
    clusterHealth: ""
    conditions:
    - lastTransitionTime: "2019-06-25T06:03:57Z"
      message: Previously used GenUUID "jw91ctq6" is no longer found in Spec.Nodes
      reason: Invalid Spec
      status: "True"
      type: InvalidUUID
    nodes:
    - deploymentName: elasticsearch-cdm-jw91ctq6-1
      upgradeStatus: {}
    - deploymentName: elasticsearch-cdm-jw91ctq6-2
      upgradeStatus: {}
    - deploymentName: elasticsearch-cdm-jw91ctq6-3
      upgradeStatus: {}
    pods:
      client:
        failed: []
        notReady: []
        ready:
        - elasticsearch-cdm-jw91ctq6-1-fbbd7bfc-nglll
        - elasticsearch-cdm-jw91ctq6-2-564f89f647-bhtvm
        - elasticsearch-cdm-jw91ctq6-3-86dbf67c7-bhwvg
      data:
        failed: []
        notReady: []
        ready:
        - elasticsearch-cdm-jw91ctq6-1-fbbd7bfc-nglll
        - elasticsearch-cdm-jw91ctq6-2-564f89f647-bhtvm
        - elasticsearch-cdm-jw91ctq6-3-86dbf67c7-bhwvg
      master:
        failed: []
        notReady: []
        ready:
        - elasticsearch-cdm-jw91ctq6-1-fbbd7bfc-nglll
        - elasticsearch-cdm-jw91ctq6-2-564f89f647-bhtvm
        - elasticsearch-cdm-jw91ctq6-3-86dbf67c7-bhwvg
    shardAllocationEnabled: all
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
ose-elasticsearch-operator-v4.2.0-201906241432
ose-cluster-logging-operator-v4.2.0-201906241832

Out of curiosity, how much time did you wait after the initial creation of the clusterlogging object before updating the elasticsearch node count?

In c4, it was about several hours. The ES cluster was in green status before I updated the elasticsearch node count. It can also be reproduced by: creating a clusterlogging instance, waiting for the ES cluster to be in green status, then updating the ES node count.

I can recreate this; looking into why CLO is dropping the UUIDs.

Verified in ose-cluster-logging-operator-v4.2.0-201907222219.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922
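For completeness, a hedged sketch of the scale-up check implied by the verification above, i.e. confirming that the node count set in the clusterlogging CR propagates to the elasticsearch CR and its deployments (field paths are assumptions based on the 4.x CRDs):

```
# Node count requested in the clusterlogging CR
oc -n openshift-logging get clusterlogging instance \
  -o jsonpath='{.spec.logStore.elasticsearch.nodeCount}{"\n"}'

# Node counts currently recorded in the elasticsearch CR spec
oc -n openshift-logging get elasticsearch elasticsearch \
  -o jsonpath='{range .spec.nodes[*]}{.nodeCount}{"\n"}{end}'

# Deployments actually created for the ES cluster
oc -n openshift-logging get deployments | grep elasticsearch-cdm
```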