Bug 1887357
| Summary: | ES pods can't recover from `Pending` status. | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Qiaoling Tang <qitang> |
| Component: | Logging | Assignee: | ewolinet |
| Status: | CLOSED ERRATA | QA Contact: | Qiaoling Tang <qitang> |
| Severity: | high | Docs Contact: | Rolfe Dlugy-Hegwer <rdlugyhe> |
| Priority: | unspecified | | |
| Version: | 4.6 | CC: | aos-bugs, ewolinet, periklis, rdlugyhe |
| Target Milestone: | --- | | |
| Target Release: | 4.7.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | logging-exploration | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | * Previously, nodes did not recover from "Pending" status because a software bug did not correctly update their status in the Elasticsearch custom resource (CR). The current release fixes this issue, so the nodes can recover when their status is "Pending." (link:https://bugzilla.redhat.com/show_bug.cgi?id=1887357[*BZ#1887357*]) | | |
| Story Points: | --- | | |
| Clone Of: | | | |
| : | 1887943 (view as bug list) | Environment: | |
| Last Closed: | 2021-02-24 11:21:19 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1880926, 1887943 | | |
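A quick way to observe the behavior the doc text describes is to read the per-node conditions the operator writes into the Elasticsearch CR status. The command below is a suggestion only; the CR name `elasticsearch` and the `openshift-logging` namespace are taken from the operator log later in this report, and the `status.nodes` fields match the snippets quoted in the comments:

    # List each ES node's deployment and any condition types reported in the CR status.
    # With the fix, an unschedulable pod should surface an "Unschedulable" condition here
    # instead of leaving the node entry empty.
    oc -n openshift-logging get elasticsearch elasticsearch \
      -o jsonpath='{range .status.nodes[*]}{.deploymentName}{"\t"}{.conditions[*].type}{"\n"}{end}'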
@qitang Was the reason the pods were pending initially due to the memory request being too large?

(In reply to ewolinet from comment #1)
> @qitang
> Was the reason the pods were pending initially due to the memory request
> being too large?

Yes. When I described the ES pods, the output was:

  Events:
    Type     Reason            Age   From   Message
    ----     ------            ----  ----   -------
    Warning  FailedScheduling  26s          0/6 nodes are available: 6 Insufficient memory.
    Warning  FailedScheduling  26s          0/6 nodes are available: 6 Insufficient memory.

Tested with quay.io/openshift/origin-elasticsearch-operator@sha256:c59349755eeefe446a5c39a2caf9dce1320a462530e8ac7b9f73fa38bc10a468: the status was updated and the ES pods could be redeployed. The node status in the ES CR now reports the scheduling problem:
nodes:
- conditions:
- lastTransitionTime: "2020-10-14T00:41:23Z"
message: '0/6 nodes are available: 6 Insufficient memory.'
reason: Unschedulable
status: "True"
type: Unschedulable
deploymentName: elasticsearch-cdm-n4txturr-1
upgradeStatus:
scheduledUpgrade: "True"
underUpgrade: "True"
upgradePhase: nodeRestarting
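One way to confirm the recovery is to compare the memory request recorded in the Elasticsearch CR with the one actually carried by a node deployment; before the fix the two could stay out of sync while the pods were `Pending`. A minimal sketch, assuming the ES container in the deployment is named `elasticsearch` (the deployment name is taken from the node status snippet above):

    # Memory request in the Elasticsearch CR spec
    oc -n openshift-logging get elasticsearch elasticsearch \
      -o jsonpath='{.spec.nodeSpec.resources.requests.memory}{"\n"}'

    # Memory request in the corresponding node deployment (container name is an assumption)
    oc -n openshift-logging get deployment elasticsearch-cdm-n4txturr-1 \
      -o jsonpath='{.spec.template.spec.containers[?(@.name=="elasticsearch")].resources.requests.memory}{"\n"}'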
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Errata Advisory for OpenShift Logging 5.0.0), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0652
Description of problem:

1. Deploy EO and CLO.
2. Create a CL instance that requests more resources than the cluster has (e.g. memory).
3. The ES pods are `Pending`. Check the ES instance; it doesn't show why the pods are pending:

managementState: Managed
nodeSpec:
  proxyResources:
    limits:
      memory: 64Mi
    requests:
      cpu: 100m
      memory: 64Mi
  resources:
    requests:
      memory: 16Gi
nodes:
- genUUID: 5bx1e4fy
  nodeCount: 3
  proxyResources: {}
  resources: {}
  roles:
  - client
  - data
  - master
  storage:
    size: 20Gi
    storageClassName: gp2
redundancyPolicy: SingleRedundancy
status:
  cluster:
    activePrimaryShards: 0
    activeShards: 0
    initializingShards: 0
    numDataNodes: 0
    numNodes: 0
    pendingTasks: 0
    relocatingShards: 0
    status: cluster health unknown
    unassignedShards: 0
  nodes:
  - deploymentName: elasticsearch-cdm-5bx1e4fy-1
    upgradeStatus: {}
  - deploymentName: elasticsearch-cdm-5bx1e4fy-2
    upgradeStatus: {}
  - deploymentName: elasticsearch-cdm-5bx1e4fy-3
    upgradeStatus: {}
  pods:
    client:
      failed: []
      notReady:
      - elasticsearch-cdm-5bx1e4fy-3-6f7d45fd7-fmwlf
      - elasticsearch-cdm-5bx1e4fy-1-79f4cddd4f-8dc7b
      - elasticsearch-cdm-5bx1e4fy-2-8c55476bd-dj56j
      ready: []
    data:
      failed: []
      notReady:
      - elasticsearch-cdm-5bx1e4fy-3-6f7d45fd7-fmwlf
      - elasticsearch-cdm-5bx1e4fy-1-79f4cddd4f-8dc7b
      - elasticsearch-cdm-5bx1e4fy-2-8c55476bd-dj56j
      ready: []
    master:
      failed: []
      notReady:
      - elasticsearch-cdm-5bx1e4fy-1-79f4cddd4f-8dc7b
      - elasticsearch-cdm-5bx1e4fy-2-8c55476bd-dj56j
      - elasticsearch-cdm-5bx1e4fy-3-6f7d45fd7-fmwlf
      ready: []
  shardAllocationEnabled: shard allocation unknown

$ oc get pod
NAME                                            READY   STATUS    RESTARTS   AGE
cluster-logging-operator-5bf4bc5d44-4tpqm       1/1     Running   0          11m
elasticsearch-cdm-5bx1e4fy-1-79f4cddd4f-8dc7b   0/2     Pending   0          10m
elasticsearch-cdm-5bx1e4fy-2-8c55476bd-dj56j    0/2     Pending   0          10m
elasticsearch-cdm-5bx1e4fy-3-6f7d45fd7-fmwlf    0/2     Pending   0          10m

4. Adjust the requested memory in the CL instance to make the ES pods schedulable (a sketch of the relevant ClusterLogging fields is included at the end of this report).
5. Wait a few minutes and check the ES pods: all the pods are still `Pending`. The memory request is changed in the ES CR, but the memory request in the ES deployments is not.
The ES CR after lowering the memory request:

managementState: Managed
nodeSpec:
  proxyResources:
    limits:
      memory: 64Mi
    requests:
      cpu: 100m
      memory: 64Mi
  resources:
    requests:
      memory: 2Gi
nodes:
- genUUID: 5bx1e4fy
  nodeCount: 3
  proxyResources: {}
  resources: {}
  roles:
  - client
  - data
  - master
  storage:
    size: 20Gi
    storageClassName: gp2
redundancyPolicy: SingleRedundancy
status:
  cluster:
    activePrimaryShards: 0
    activeShards: 0
    initializingShards: 0
    numDataNodes: 0
    numNodes: 0
    pendingTasks: 0
    relocatingShards: 0
    status: cluster health unknown
    unassignedShards: 0
  nodes:
  - deploymentName: elasticsearch-cdm-5bx1e4fy-1
    upgradeStatus: {}
  - deploymentName: elasticsearch-cdm-5bx1e4fy-2
    upgradeStatus: {}
  - deploymentName: elasticsearch-cdm-5bx1e4fy-3
    upgradeStatus: {}
  pods:
    client:
      failed: []
      notReady:
      - elasticsearch-cdm-5bx1e4fy-3-6f7d45fd7-fmwlf
      - elasticsearch-cdm-5bx1e4fy-1-79f4cddd4f-8dc7b
      - elasticsearch-cdm-5bx1e4fy-2-8c55476bd-dj56j
      ready: []
    data:
      failed: []
      notReady:
      - elasticsearch-cdm-5bx1e4fy-3-6f7d45fd7-fmwlf
      - elasticsearch-cdm-5bx1e4fy-1-79f4cddd4f-8dc7b
      - elasticsearch-cdm-5bx1e4fy-2-8c55476bd-dj56j
      ready: []
    master:
      failed: []
      notReady:
      - elasticsearch-cdm-5bx1e4fy-1-79f4cddd4f-8dc7b
      - elasticsearch-cdm-5bx1e4fy-2-8c55476bd-dj56j
      - elasticsearch-cdm-5bx1e4fy-3-6f7d45fd7-fmwlf
      ready: []
  shardAllocationEnabled: shard allocation unknown

EO log:

{"level":"error","ts":1602489488.6414049,"logger":"elasticsearch-operator","caller":"k8shandler/reconciler.go:65","msg":"failed to get LowestClusterVersion","cluster":"elasticsearch","namespace":"openshift-logging","error":"Get \"https://elasticsearch.openshift-logging.svc:9200/_cluster/stats/nodes/_all\": dial tcp 172.30.58.115:9200: connect: connection refused"}

Version-Release number of selected component (if applicable):
elasticsearch-operator.4.6.0-202010030042.p0

How reproducible:
Always
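For reference, the memory request in steps 2 and 4 is set on the ClusterLogging instance rather than on the Elasticsearch CR directly. A minimal sketch of the relevant fields, assuming the standard `logging.openshift.io/v1` ClusterLogging schema; the values mirror the ES CR dumps above (16Gi for step 2, lowered to 2Gi in step 4):

    apiVersion: logging.openshift.io/v1
    kind: ClusterLogging
    metadata:
      name: instance
      namespace: openshift-logging
    spec:
      managementState: Managed
      logStore:
        type: elasticsearch
        elasticsearch:
          nodeCount: 3
          redundancyPolicy: SingleRedundancy
          storage:
            storageClassName: gp2
            size: 20Gi
          resources:
            requests:
              memory: 16Gi   # step 2: more than any worker can satisfy; lower to 2Gi for step 4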