Bug 1741350 - Failed to upgrade ES pods because cpu limit is set to zero in the deployment
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 4.1.z
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.1.z
Assignee: Jeff Cantrill
QA Contact: Anping Li
URL:
Whiteboard:
Duplicates: 1740957 (view as bug list)
Depends On: 1740447
Blocks: 1740957
 
Reported: 2019-08-14 20:55 UTC by Jeff Cantrill
Modified: 2020-01-27 13:56 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1740447
Environment:
Last Closed: 2019-08-28 19:55:01 UTC
Target Upstream Version:




Links
- GitHub: openshift/elasticsearch-operator pull 182 (closed) — "Bug 1741350: Set resource limits if they are greater then 0" — last updated 2020-06-08 07:37:18 UTC
- Red Hat Product Errata: RHBA-2019:2547 — last updated 2019-08-28 19:55:07 UTC

Description Jeff Cantrill 2019-08-14 20:55:33 UTC
+++ This bug was initially created as a clone of Bug #1740447 +++

Description of problem:
Deploy logging 4.1.4, then try to upgrade logging to 4.1.11; the ES pods cannot be upgraded successfully due to `Error occurred while updating node elasticsearch-cdm-pkyex0nr-1: Deployment.apps \"elasticsearch-cdm-pkyex0nr-1\" is invalid: spec.template.spec.containers[0].resources.requests: Invalid value: \"600m\": must be less than or equal to cpu limit`

$ oc logs -n openshift-operators-redhat elasticsearch-operator-8d644c48-j4d2v
time="2019-08-13T02:56:13Z" level=info msg="Go Version: go1.10.8"
time="2019-08-13T02:56:13Z" level=info msg="Go OS/Arch: linux/amd64"
time="2019-08-13T02:56:13Z" level=info msg="operator-sdk Version: 0.0.7"
time="2019-08-13T02:56:13Z" level=info msg="Watching logging.openshift.io/v1, Elasticsearch, , 5000000000"
time="2019-08-13T02:56:42Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 5 shards in preparation for cluster restart"
time="2019-08-13T02:56:42Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-pkyex0nr-1: Deployment.apps \"elasticsearch-cdm-pkyex0nr-1\" is invalid: spec.template.spec.containers[0].resources.requests: Invalid value: \"600m\": must be less than or equal to cpu limit"
time="2019-08-13T02:56:44Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-pkyex0nr-2: Deployment.apps \"elasticsearch-cdm-pkyex0nr-2\" is invalid: spec.template.spec.containers[0].resources.requests: Invalid value: \"600m\": must be less than or equal to cpu limit"
time="2019-08-13T02:56:47Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 2 shards in preparation for cluster restart"
time="2019-08-13T02:56:47Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-pkyex0nr-3: Deployment.apps \"elasticsearch-cdm-pkyex0nr-3\" is invalid: spec.template.spec.containers[0].resources.requests: Invalid value: \"600m\": must be less than or equal to cpu limit"
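For context, the Kubernetes API server rejects any container whose resource request exceeds its limit for the same resource. The stanza below is a hypothetical reconstruction of what the 4.1.11 operator appears to have generated (an unset cpu limit serialized as zero); it is not output captured from this cluster:

```yaml
# Hypothetical reconstruction -- not taken from the cluster above.
resources:
  limits:
    cpu: "0"        # unset limit rendered as zero by the operator
    memory: 4Gi
  requests:
    cpu: 600m       # 600m > 0, so the API server rejects the deployment
    memory: 4Gi
```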
time="2019-08-13T02:56:58Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2019-08-13T02:57:10Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 1 shards in preparation for cluster restart"
time="2019-08-13T02:57:34Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2019-08-13T02:57:47Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 2 shards in preparation for cluster restart"
time="2019-08-13T02:57:59Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 3 shards in preparation for cluster restart"
time="2019-08-13T02:58:10Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 4 shards in preparation for cluster restart"

$ oc get clusterlogging instance -oyaml
  logStore:
    elasticsearch:
      nodeCount: 3
      redundancyPolicy: SingleRedundancy
      resources:
        requests:
          cpu: 600m
          memory: 4Gi
      storage:
        size: 10Gi
        storageClassName: gp2
    type: elasticsearch


$ oc get elasticsearch elasticsearch -oyaml
spec:
  managementState: Managed
  nodeSpec:
    image: image-registry.openshift-image-registry.svc:5000/openshift/ose-logging-elasticsearch5:v4.1.11-201908122027
    resources:
      requests:
        cpu: 600m
        memory: 4Gi
  nodes:
  - genUUID: pkyex0nr
    nodeCount: 3
    resources: {}
    roles:
    - client
    - data
    - master
    storage:
      size: 10Gi
      storageClassName: gp2
  redundancyPolicy: SingleRedundancy


$ oc get deploy elasticsearch-cdm-pkyex0nr-1 -oyaml 

        image: registry.redhat.io/openshift4/ose-logging-elasticsearch5:v4.1.4-201906271212
        imagePullPolicy: IfNotPresent
        name: elasticsearch
        ports:
        - containerPort: 9300
          name: cluster
          protocol: TCP
        - containerPort: 9200
          name: restapi
          protocol: TCP
        readinessProbe:
          exec:
            command:
            - /usr/share/elasticsearch/probe/readiness.sh
          failureThreshold: 3
          initialDelaySeconds: 10
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 30
        resources:
          limits:
            cpu: 600m
            memory: 4Gi
          requests:
            cpu: 600m
            memory: 4Gi
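The attached PR (openshift/elasticsearch-operator pull 182, "Set resource limits if they are greater then 0") suggests the fix is to copy a limit into the deployment only when it is greater than zero. A minimal sketch of that idea, using a simplified millicore/mebibyte map instead of the operator's actual corev1.ResourceList types (all names below are hypothetical, not the operator's real code):

```go
package main

import "fmt"

// resourceList maps a resource name to a quantity (e.g. millicores
// for cpu); a simplified stand-in for corev1.ResourceList.
type resourceList map[string]int64

// buildLimits keeps only limits greater than zero, so an unset
// (zero) cpu limit is omitted entirely instead of being written
// into the deployment where it would sit below the cpu request.
func buildLimits(desired resourceList) resourceList {
	limits := resourceList{}
	for name, qty := range desired {
		if qty > 0 {
			limits[name] = qty
		}
	}
	return limits
}

func main() {
	// cpu limit left unset (zero): only the memory limit survives,
	// so requests.cpu=600m no longer conflicts with limits.cpu=0.
	fmt.Println(buildLimits(resourceList{"cpu": 0, "memory": 4096}))
}
```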

Version-Release number of selected component (if applicable):
from:
ose-cluster-logging-operator:v4.1.4-201906271212
ose-elasticsearch-operator:v4.1.4-201906271212

to:
ose-elasticsearch-operator:v4.1.11-201908122027
ose-cluster-logging-operator:v4.1.11-201908122027


How reproducible:
Always

Steps to Reproduce:
1. Deploy logging 4.1.4
2. Upgrade it to 4.1.11

Actual results:
The ES deployments are not updated; for each node the operator logs `Deployment.apps ... is invalid: spec.template.spec.containers[0].resources.requests: Invalid value: \"600m\": must be less than or equal to cpu limit`.

Expected results:
The ES pods upgrade successfully.

Additional info:

Comment 1 Jeff Cantrill 2019-08-14 20:57:37 UTC
*** Bug 1740957 has been marked as a duplicate of this bug. ***

Comment 2 Anping Li 2019-08-19 12:23:29 UTC
The ES pods weren't upgraded, but Elasticsearch was still working at this phase.
[anli@preserve-anli-slave 41b]$ oc logs elasticsearch-operator-b75f66bdc-8t5qz
time="2019-08-19T12:02:04Z" level=info msg="Go Version: go1.10.8"
time="2019-08-19T12:02:04Z" level=info msg="Go OS/Arch: linux/amd64"
time="2019-08-19T12:02:04Z" level=info msg="operator-sdk Version: 0.0.7"
time="2019-08-19T12:02:04Z" level=info msg="Watching logging.openshift.io/v1, Elasticsearch, , 5000000000"
time="2019-08-19T12:02:43Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 7 shards in preparation for cluster restart"
time="2019-08-19T12:02:43Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-sjnug6wx-1: Deployment.apps \"elasticsearch-cdm-sjnug6wx-1\" is invalid: spec.template.spec.containers[0].resources.requests: Invalid value: \"200m\": must be less than or equal to cpu limit"
time="2019-08-19T12:02:54Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 7 shards in preparation for cluster restart"
time="2019-08-19T12:02:54Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-sjnug6wx-2: Deployment.apps \"elasticsearch-cdm-sjnug6wx-2\" is invalid: spec.template.spec.containers[0].resources.requests: Invalid value: \"200m\": must be less than or equal to cpu limit"
time="2019-08-19T12:03:05Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2019-08-19T12:03:05Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-sjnug6wx-3: Deployment.apps \"elasticsearch-cdm-sjnug6wx-3\" is invalid: spec.template.spec.containers[0].resources.requests: Invalid value: \"200m\": must be less than or equal to cpu limit"
time="2019-08-19T12:03:17Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2019-08-19T12:03:33Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2019-08-19T12:03:46Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 5 shards in preparation for cluster restart"
time="2019-08-19T12:04:06Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2019-08-19T12:04:19Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2019-08-19T12:04:39Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2019-08-19T12:04:54Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2019-08-19T12:05:10Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2019-08-19T12:05:23Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2019-08-19T12:05:43Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 5 shards in preparation for cluster restart"
time="2019-08-19T12:06:07Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 5 shards in preparation for cluster restart"
time="2019-08-19T12:06:22Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 7 shards in preparation for cluster restart"
time="2019-08-19T12:06:37Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2019-08-19T12:06:50Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 8 shards in preparation for cluster restart"
time="2019-08-19T12:07:06Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2019-08-19T12:07:18Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 5 shards in preparation for cluster restart"
time="2019-08-19T12:07:33Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2019-08-19T12:07:48Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2019-08-19T12:08:01Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 4 shards in preparation for cluster restart"
time="2019-08-19T12:08:17Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2019-08-19T12:08:31Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2019-08-19T12:08:43Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 5 shards in preparation for cluster restart"
time="2019-08-19T12:09:07Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 4 shards in preparation for cluster restart"
time="2019-08-19T12:09:18Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"

Comment 3 Ben Parees 2019-08-19 21:22:11 UTC
Can someone explain how this bug got introduced into 4.1.11 in the first place? Nothing should have gone into 4.1.11 that wasn't first verified as working in 4.2. Can you link to the 4.1.z PR that introduced it so we can see how it got merged?

Comment 7 Anping Li 2019-08-23 06:59:39 UTC
Verified: clusterlogging can be upgraded from 4.1.4 to 4.1.13.

Comment 9 errata-xmlrpc 2019-08-28 19:55:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2547

