Bug 1740447 - Failed to upgrade ES pods because cpu limit is set to zero in the deployment

Product: OpenShift Container Platform
Component: Logging
Version: 4.1.z
Target Release: 4.2.0
Hardware: Unspecified
OS: Unspecified
Severity: high
Priority: unspecified
Status: CLOSED ERRATA
Keywords: Regression
Reporter: Qiaoling Tang <qitang>
Assignee: Jeff Cantrill <jcantril>
QA Contact: Qiaoling Tang <qitang>
CC: aos-bugs, rmeggins
Type: Bug
Last Closed: 2019-10-16 06:35:39 UTC
Bug Blocks: 1740957, 1741350

Description Qiaoling Tang 2019-08-13 03:18:13 UTC
Description of problem:
After deploying logging 4.1.4 and then upgrading to 4.1.11, the ES pods fail to upgrade due to `Error occurred while updating node elasticsearch-cdm-pkyex0nr-1: Deployment.apps \"elasticsearch-cdm-pkyex0nr-1\" is invalid: spec.template.spec.containers[0].resources.requests: Invalid value: \"600m\": must be less than or equal to cpu limit`

$ oc logs -n openshift-operators-redhat elasticsearch-operator-8d644c48-j4d2v
time="2019-08-13T02:56:13Z" level=info msg="Go Version: go1.10.8"
time="2019-08-13T02:56:13Z" level=info msg="Go OS/Arch: linux/amd64"
time="2019-08-13T02:56:13Z" level=info msg="operator-sdk Version: 0.0.7"
time="2019-08-13T02:56:13Z" level=info msg="Watching logging.openshift.io/v1, Elasticsearch, , 5000000000"
time="2019-08-13T02:56:42Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 5 shards in preparation for cluster restart"
time="2019-08-13T02:56:42Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-pkyex0nr-1: Deployment.apps \"elasticsearch-cdm-pkyex0nr-1\" is invalid: spec.template.spec.containers[0].resources.requests: Invalid value: \"600m\": must be less than or equal to cpu limit"
time="2019-08-13T02:56:44Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-pkyex0nr-2: Deployment.apps \"elasticsearch-cdm-pkyex0nr-2\" is invalid: spec.template.spec.containers[0].resources.requests: Invalid value: \"600m\": must be less than or equal to cpu limit"
time="2019-08-13T02:56:47Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 2 shards in preparation for cluster restart"
time="2019-08-13T02:56:47Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-pkyex0nr-3: Deployment.apps \"elasticsearch-cdm-pkyex0nr-3\" is invalid: spec.template.spec.containers[0].resources.requests: Invalid value: \"600m\": must be less than or equal to cpu limit"
time="2019-08-13T02:56:58Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2019-08-13T02:57:10Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 1 shards in preparation for cluster restart"
time="2019-08-13T02:57:34Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 6 shards in preparation for cluster restart"
time="2019-08-13T02:57:47Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 2 shards in preparation for cluster restart"
time="2019-08-13T02:57:59Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 3 shards in preparation for cluster restart"
time="2019-08-13T02:58:10Z" level=warning msg="Unable to perform synchronized flush: Failed to flush 4 shards in preparation for cluster restart"

$ oc get clusterlogging instance -oyaml
  logStore:
    elasticsearch:
      nodeCount: 3
      redundancyPolicy: SingleRedundancy
      resources:
        requests:
          cpu: 600m
          memory: 4Gi
      storage:
        size: 10Gi
        storageClassName: gp2
    type: elasticsearch


$ oc get elasticsearch elasticsearch -oyaml
spec:
  managementState: Managed
  nodeSpec:
    image: image-registry.openshift-image-registry.svc:5000/openshift/ose-logging-elasticsearch5:v4.1.11-201908122027
    resources:
      requests:
        cpu: 600m
        memory: 4Gi
  nodes:
  - genUUID: pkyex0nr
    nodeCount: 3
    resources: {}
    roles:
    - client
    - data
    - master
    storage:
      size: 10Gi
      storageClassName: gp2
  redundancyPolicy: SingleRedundancy
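
Note that both the clusterlogging and elasticsearch CRs above specify only requests and no limits. A hedged sketch (hypothetical helper, not the actual elasticsearch-operator code) of one way to derive the container resources from such a spec so that the generated limit never falls below the request:

package esresources // hypothetical package, for illustration only

import (
	corev1 "k8s.io/api/core/v1"
)

// buildResources copies the requests from the CR spec and only takes a limit
// that is actually set and non-zero; otherwise it falls back to the request,
// so the result always satisfies request <= limit.
func buildResources(spec corev1.ResourceRequirements) corev1.ResourceRequirements {
	out := corev1.ResourceRequirements{
		Requests: spec.Requests,
		Limits:   corev1.ResourceList{},
	}
	for name, req := range spec.Requests {
		if lim, ok := spec.Limits[name]; ok && !lim.IsZero() {
			out.Limits[name] = lim
		} else {
			out.Limits[name] = req // e.g. cpu request 600m -> cpu limit 600m
		}
	}
	return out
}

Whether the real fix mirrors requests into limits or omits the limit entirely is an operator design choice; the point is only that an explicit zero limit next to a 600m request can never pass validation.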


$ oc get deploy elasticsearch-cdm-pkyex0nr-1 -oyaml 

        image: registry.redhat.io/openshift4/ose-logging-elasticsearch5:v4.1.4-201906271212
        imagePullPolicy: IfNotPresent
        name: elasticsearch
        ports:
        - containerPort: 9300
          name: cluster
          protocol: TCP
        - containerPort: 9200
          name: restapi
          protocol: TCP
        readinessProbe:
          exec:
            command:
            - /usr/share/elasticsearch/probe/readiness.sh
          failureThreshold: 3
          initialDelaySeconds: 10
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 30
        resources:
          limits:
            cpu: 600m
            memory: 4Gi
          requests:
            cpu: 600m
            memory: 4Gi
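
The 4.1.4 deployment above already carries matching 600m CPU and 4Gi memory requests and limits; the upgrade fails when the new operator rewrites these resources and the CPU limit ends up at zero while the request stays at 600m. A hedged sketch (illustrative only, not the shipped fix) of an upgrade-safe merge that keeps the limits already present on the live deployment whenever the desired spec leaves them unset or zero:

package esresources // hypothetical package, for illustration only

import (
	corev1 "k8s.io/api/core/v1"
)

// mergeLimits keeps every limit present on the current (live) container
// resources unless the desired spec already provides a non-zero replacement.
// This prevents an update from swapping the existing cpu: 600m limit for a
// zero value while the 600m request is left in place.
func mergeLimits(current, desired corev1.ResourceRequirements) corev1.ResourceRequirements {
	merged := desired
	merged.Limits = corev1.ResourceList{}
	for name, lim := range desired.Limits {
		merged.Limits[name] = lim
	}
	for name, lim := range current.Limits {
		if existing, ok := merged.Limits[name]; !ok || existing.IsZero() {
			merged.Limits[name] = lim
		}
	}
	return merged
}

With either approach the generated deployment would satisfy the request <= limit check and the rollout could proceed.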

Version-Release number of selected component (if applicable):
from:
ose-cluster-logging-operator:v4.1.4-201906271212
ose-elasticsearch-operator:v4.1.4-201906271212

to:
ose-elasticsearch-operator:v4.1.11-201908122027
ose-cluster-logging-operator:v4.1.11-201908122027


How reproducible:
Always

Steps to Reproduce:
1. Deploy logging 4.1.4
2. upgrade it to 4.1.11
3. Check the status of the ES pods and the elasticsearch-operator logs

Actual results:
The ES pods are not upgraded: the elasticsearch-operator repeatedly logs the "must be less than or equal to cpu limit" validation error for each elasticsearch-cdm deployment, and the deployments keep running the 4.1.4 image.

Expected results:
The ES pods upgrade successfully, and the updated deployments carry a CPU limit that is greater than or equal to the configured 600m CPU request.

Additional info:

Comment 2 Qiaoling Tang 2019-08-16 03:17:29 UTC
Verified with ose-elasticsearch-operator-v4.2.0-201908151419

Comment 3 errata-xmlrpc 2019-10-16 06:35:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922