Bug 1741350

Summary: Failed to upgrade ES pods because cpu limit is set to zero in the deployment
Product: OpenShift Container Platform
Reporter: Jeff Cantrill <jcantril>
Component: Logging
Assignee: Jeff Cantrill <jcantril>
Status: CLOSED ERRATA
QA Contact: Anping Li <anli>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.1.z
CC: anli, aos-bugs, bparees, qitang, rmeggins
Target Milestone: ---
Keywords: Regression
Target Release: 4.1.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1740447
Environment:
Last Closed: 2019-08-28 19:55:01 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1740447
Bug Blocks: 1740957

Description Jeff Cantrill 2019-08-14 20:55:33 UTC
+++ This bug was initially created as a clone of Bug #1740447 +++

Description of problem:
Deploy logging 4.1.4, then try to upgrade logging to 4.1.11. The ES pods cannot be upgraded successfully due to `Error occurred while updating node elasticsearch-cdm-pkyex0nr-1: Deployment.apps \"elasticsearch-cdm-pkyex0nr-1\" is invalid: spec.template.spec.containers[0].resources.requests: Invalid value: \"600m\": must be less than or equal to cpu limit`
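For context, the Kubernetes API server validates that a container's resource requests do not exceed its limits, so a Deployment rendered with a zero cpu limit next to a 600m cpu request is rejected with this kind of message. A minimal sketch of a resources stanza that fails that validation (values are illustrative, not copied from this cluster):

        resources:
          limits:
            cpu: "0"        # zero cpu limit, as described in the summary (illustrative)
            memory: 4Gi
          requests:
            cpu: 600m       # request greater than limit => apiserver rejects the Deployment
            memory: 4Gi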

$ oc logs -n openshift-operators-redhat elasticsearch-operator-8d644c48-j4d2v
time="2019-08-13T02:56:13Z" level=info msg="Go Version: go1.10.8"
time="2019-08-13T02:56:13Z" level=info msg="Go OS/Arch: linux/amd64"
time="2019-08-13T02:56:13Z" level=info msg="operator-sdk Version: 0.0.7"
time="2019-08-13T02:56:13Z" level=info msg="Watching logging.openshift.io/v1, Elasticsearch, , 5000000000"
time="2019-08-13T02:56:42Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 5 shards in preparation for cluster restart"
time="2019-08-13T02:56:42Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-pkyex0nr-1: Deployment.apps \"elasticsearch-cdm-pkyex0nr-1\" is invalid: spec.template.spec.containers[0].resources.requests: Invalid value: \"600m\": must be less than or equal to cpu limit"
time="2019-08-13T02:56:44Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-pkyex0nr-2: Deployment.apps \"elasticsearch-cdm-pkyex0nr-2\" is invalid: spec.template.spec.containers[0].resources.requests: Invalid value: \"600m\": must be less than or equal to cpu limit"
time="2019-08-13T02:56:47Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 2 shards in preparation for cluster restart"
time="2019-08-13T02:56:47Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-pkyex0nr-3: Deployment.apps \"elasticsearch-cdm-pkyex0nr-3\" is invalid: spec.template.spec.containers[0].resources.requests: Invalid value: \"600m\": must be less than or equal to cpu limit"
time="2019-08-13T02:56:58Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2019-08-13T02:57:10Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 1 shards in preparation for cluster restart"
time="2019-08-13T02:57:34Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2019-08-13T02:57:47Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 2 shards in preparation for cluster restart"
time="2019-08-13T02:57:59Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 3 shards in preparation for cluster restart"
time="2019-08-13T02:58:10Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 4 shards in preparation for cluster restart"

$ oc get clusterlogging instance -oyaml
  logStore:
    elasticsearch:
      nodeCount: 3
      redundancyPolicy: SingleRedundancy
      resources:
        requests:
          cpu: 600m
          memory: 4Gi
      storage:
        size: 10Gi
        storageClassName: gp2
    type: elasticsearch
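Note that the CR above only sets requests; the `resources` field follows the standard Kubernetes ResourceRequirements shape, so limits can be set explicitly as well. As a workaround sketch only (not confirmed here as the shipped fix; limit values assumed), pinning matching limits in the clusterlogging CR keeps the rendered request at or below its limit:

  logStore:
    elasticsearch:
      resources:
        limits:             # explicit limits (workaround sketch, values assumed)
          cpu: 600m
          memory: 4Gi
        requests:
          cpu: 600m
          memory: 4Gi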


$ oc get elasticsearch elasticsearch -oyaml
spec:
  managementState: Managed
  nodeSpec:
    image: image-registry.openshift-image-registry.svc:5000/openshift/ose-logging-elasticsearch5:v4.1.11-201908122027
    resources:
      requests:
        cpu: 600m
        memory: 4Gi
  nodes:
  - genUUID: pkyex0nr
    nodeCount: 3
    resources: {}
    roles:
    - client
    - data
    - master
    storage:
      size: 10Gi
      storageClassName: gp2
  redundancyPolicy: SingleRedundancy


$ oc get deploy elasticsearch-cdm-pkyex0nr-1 -oyaml 

        image: registry.redhat.io/openshift4/ose-logging-elasticsearch5:v4.1.4-201906271212
        imagePullPolicy: IfNotPresent
        name: elasticsearch
        ports:
        - containerPort: 9300
          name: cluster
          protocol: TCP
        - containerPort: 9200
          name: restapi
          protocol: TCP
        readinessProbe:
          exec:
            command:
            - /usr/share/elasticsearch/probe/readiness.sh
          failureThreshold: 3
          initialDelaySeconds: 10
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 30
        resources:
          limits:
            cpu: 600m
            memory: 4Gi
          requests:
            cpu: 600m
            memory: 4Gi
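A quick way to compare the live Deployment with what the operator is trying to apply is to pull only the resources stanza (command sketch; the openshift-logging namespace is assumed):

$ oc -n openshift-logging get deploy elasticsearch-cdm-pkyex0nr-1 \
    -o jsonpath='{.spec.template.spec.containers[0].resources}'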

Version-Release number of selected component (if applicable):
from:
ose-cluster-logging-operator:v4.1.4-201906271212
ose-elasticsearch-operator:v4.1.4-201906271212

to:
ose-elasticsearch-operator:v4.1.11-201908122027
ose-cluster-logging-operator:v4.1.11-201908122027


How reproducible:
Always

Steps to Reproduce:
1. Deploy logging 4.1.4
2. Upgrade it to 4.1.11

Actual results:
The ES deployments are not updated; the elasticsearch-operator keeps logging `must be less than or equal to cpu limit` errors and the ES pods stay on the 4.1.4 image.

Expected results:
The ES pods are upgraded to the 4.1.11 image without validation errors.

Additional info:

Comment 1 Jeff Cantrill 2019-08-14 20:57:37 UTC
*** Bug 1740957 has been marked as a duplicate of this bug. ***

Comment 2 Anping Li 2019-08-19 12:23:29 UTC
The ES deployments weren't upgraded, but ES was still working at this stage.
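To confirm the cluster keeps serving while the rollout is blocked, it is enough to check that the existing ES pods are still Running and which image the Deployments reference (sketch; namespace and component label are assumed):

$ oc -n openshift-logging get pods -l component=elasticsearch -o wide
$ oc -n openshift-logging get deploy -l component=elasticsearch \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.template.spec.containers[0].image}{"\n"}{end}'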
[anli@preserve-anli-slave 41b]$ oc logs elasticsearch-operator-b75f66bdc-8t5qz
time="2019-08-19T12:02:04Z" level=info msg="Go Version: go1.10.8"
time="2019-08-19T12:02:04Z" level=info msg="Go OS/Arch: linux/amd64"
time="2019-08-19T12:02:04Z" level=info msg="operator-sdk Version: 0.0.7"
time="2019-08-19T12:02:04Z" level=info msg="Watching logging.openshift.io/v1, Elasticsearch, , 5000000000"
time="2019-08-19T12:02:43Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 7 shards in preparation for cluster restart"
time="2019-08-19T12:02:43Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-sjnug6wx-1: Deployment.apps \"elasticsearch-cdm-sjnug6wx-1\" is invalid: spec.template.spec.containers[0].resources.requests: Invalid value: \"200m\": must be less than or equal to cpu limit"
time="2019-08-19T12:02:54Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 7 shards in preparation for cluster restart"
time="2019-08-19T12:02:54Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-sjnug6wx-2: Deployment.apps \"elasticsearch-cdm-sjnug6wx-2\" is invalid: spec.template.spec.containers[0].resources.requests: Invalid value: \"200m\": must be less than or equal to cpu limit"
time="2019-08-19T12:03:05Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2019-08-19T12:03:05Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-sjnug6wx-3: Deployment.apps \"elasticsearch-cdm-sjnug6wx-3\" is invalid: spec.template.spec.containers[0].resources.requests: Invalid value: \"200m\": must be less than or equal to cpu limit"
time="2019-08-19T12:03:17Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2019-08-19T12:03:33Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2019-08-19T12:03:46Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 5 shards in preparation for cluster restart"
time="2019-08-19T12:04:06Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2019-08-19T12:04:19Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2019-08-19T12:04:39Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2019-08-19T12:04:54Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2019-08-19T12:05:10Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2019-08-19T12:05:23Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2019-08-19T12:05:43Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 5 shards in preparation for cluster restart"
time="2019-08-19T12:06:07Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 5 shards in preparation for cluster restart"
time="2019-08-19T12:06:22Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 7 shards in preparation for cluster restart"
time="2019-08-19T12:06:37Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2019-08-19T12:06:50Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 8 shards in preparation for cluster restart"
time="2019-08-19T12:07:06Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2019-08-19T12:07:18Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 5 shards in preparation for cluster restart"
time="2019-08-19T12:07:33Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2019-08-19T12:07:48Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2019-08-19T12:08:01Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 4 shards in preparation for cluster restart"
time="2019-08-19T12:08:17Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2019-08-19T12:08:31Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2019-08-19T12:08:43Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 5 shards in preparation for cluster restart"
time="2019-08-19T12:09:07Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 4 shards in preparation for cluster restart"
time="2019-08-19T12:09:18Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"

Comment 3 Ben Parees 2019-08-19 21:22:11 UTC
Can someone explain how this bug got introduced into 4.1.11 in the first place? Nothing should have gone into 4.1.11 that wasn't first verified as working in 4.2. Can you link to the 4.1.z PR that introduced it so we can see how it got merged?

Comment 7 Anping Li 2019-08-23 06:59:39 UTC
The clusterlogging instance could be upgraded from 4.1.4 to 4.1.13.
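A simple spot check for this verification is to list the installed CSV versions for both operators (sketch; the standard cluster logging namespaces are assumed):

$ oc -n openshift-logging get csv
$ oc -n openshift-operators-redhat get csv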

Comment 9 errata-xmlrpc 2019-08-28 19:55:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2547