Bug 1664497 - ES pod isn't upgraded when the image tag changed in CLO env vars.
Summary: ES pod isn't upgraded when the image tag changed in CLO env vars.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 4.1.0
Assignee: ewolinet
QA Contact: Anping Li
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-01-09 02:19 UTC by Qiaoling Tang
Modified: 2019-06-04 10:41 UTC
CC: 5 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-04 10:41:38 UTC
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:0758 None None None 2019-06-04 10:41:43 UTC

Description Qiaoling Tang 2019-01-09 02:19:46 UTC
Description of problem:
Deploy logging via OLM and wait until all pods are running. Make sure the `managementState` is "Managed" in the CRs, then check the env vars of the CLO:
$ oc exec cluster-logging-operator-846bbdbf46-9sz7w env 
RSYSLOG_IMAGE=docker.io/viaq/rsyslog:latest
WATCH_NAMESPACE=openshift-operators
OPERATOR_NAME=cluster-logging-operator
ELASTICSEARCH_IMAGE=docker.io/openshift/origin-logging-elasticsearch5:latest
FLUENTD_IMAGE=docker.io/openshift/origin-logging-fluentd:latest
KIBANA_IMAGE=docker.io/openshift/origin-logging-kibana5:latest
CURATOR_IMAGE=docker.io/openshift/origin-logging-curator5:latest
OAUTH_PROXY_IMAGE=docker.io/openshift/oauth-proxy:latest

Change the env vars of the CLO to trigger an upgrade:
$ oc set env deploy/cluster-logging-operator KIBANA_IMAGE=docker.io/openshift/origin-logging-kibana5:v4.0 ELASTICSEARCH_IMAGE=docker.io/openshift/origin-logging-elasticsearch5:v4.0 FLUENTD_IMAGE=docker.io/openshift/origin-logging-fluentd:v4.0 CURATOR_IMAGE=docker.io/openshift/origin-logging-curator5:v4.0 RSYSLOG_IMAGE=docker.io/viaq/rsyslog:8.38.0 OAUTH_PROXY_IMAGE=docker.io/openshift/oauth-proxy:v1.1.0

After a while, the CLO pod is redeployed. Checking the CLO env vars shows they have changed:
$ oc exec cluster-logging-operator-9d796994f-ct7nw env |grep IMAGE
ELASTICSEARCH_IMAGE=docker.io/openshift/origin-logging-elasticsearch5:v4.0
FLUENTD_IMAGE=docker.io/openshift/origin-logging-fluentd:v4.0
KIBANA_IMAGE=docker.io/openshift/origin-logging-kibana5:v4.0
CURATOR_IMAGE=docker.io/openshift/origin-logging-curator5:v4.0
OAUTH_PROXY_IMAGE=docker.io/openshift/oauth-proxy:v1.1.0
RSYSLOG_IMAGE=docker.io/viaq/rsyslog:8.38.0

The Kibana and Fluentd pods have been upgraded and the ES pod has been redeployed, but the ES pod still uses the default image:
$ oc get pod elasticsearch-clientdatamaster-0-1-84d764899d-5zlml -o yaml |grep image
    image: docker.io/openshift/origin-logging-elasticsearch5:latest
    imagePullPolicy: IfNotPresent
  imagePullSecrets:
    image: docker.io/openshift/origin-logging-elasticsearch5:latest
    imageID: docker.io/openshift/origin-logging-elasticsearch5@sha256:8204af37eb27ff08a2b091f4019d7fa2617a8f5a1449a1b02c4c3dd750f7444e

The image tag in the ES deployment and in the elasticsearch CR has changed:
$ oc get deploy elasticsearch-clientdatamaster-0-1 -o yaml|grep image
        image: docker.io/openshift/origin-logging-elasticsearch5:v4.0
        imagePullPolicy: IfNotPresent
$ oc get elasticsearches.logging.openshift.io elasticsearch -o yaml|grep image
    image: docker.io/openshift/origin-logging-elasticsearch5:v4.0

No new ReplicaSet is generated for the ES pod after changing the env vars for the CLO:
$ oc get rs
NAME                                            DESIRED   CURRENT   READY     AGE
cluster-logging-operator-846bbdbf46             0         0         0         40m
cluster-logging-operator-9d796994f              1         1         1         33m
elasticsearch-clientdatamaster-0-1-84d764899d   1         1         1         38m
elasticsearch-operator-5c467bd96f               1         1         1         40m
kibana-675b587dfd                               0         0         0         38m
kibana-797c89966d                               1         1         1         33m

Logs from the EO:
$ oc logs elasticsearch-operator-5c467bd96f-5c2dd
time="2019-01-09T01:16:25Z" level=info msg="Go Version: go1.10.3"
time="2019-01-09T01:16:25Z" level=info msg="Go OS/Arch: linux/amd64"
time="2019-01-09T01:16:25Z" level=info msg="operator-sdk Version: 0.0.7"
time="2019-01-09T01:16:25Z" level=info msg="Metrics service elasticsearch-operator created"
time="2019-01-09T01:16:25Z" level=info msg="Watching logging.openshift.io/v1alpha1, Elasticsearch, openshift-operators, 5000000000"
time="2019-01-09T01:17:36Z" level=info msg="Constructing new resource elasticsearch-clientdatamaster-0-1"
time="2019-01-09T01:17:36Z" level=info msg="Updating node resource elasticsearch-clientdatamaster-0-1"
time="2019-01-09T01:27:53Z" level=info msg="Updating node resource elasticsearch-clientdatamaster-0-1"
time="2019-01-09T01:27:53Z" level=info msg="Rolling upgrade: began upgrading node: elasticsearch-clientdatamaster-0-1"
time="2019-01-09T01:27:53Z" level=info msg="Rolling upgrade: waiting for node 'elasticsearch-clientdatamaster-0-1-84d764899d-bt89r' to rejoin the cluster..."

Version-Release number of selected component (if applicable):
$ oc get clusterversion
NAME      VERSION                           AVAILABLE   PROGRESSING   SINCE     STATUS
version   4.0.0-0.alpha-2019-01-08-224750   True        False         1h        Cluster version is 4.0.0-0.alpha-2019-01-08-224750


How reproducible:
Always

Steps to Reproduce:
1. Deploy logging via OLM.
2. Set the env vars for the CLO.
3. Check the pod statuses and images.

Actual results:
The ES pod is redeployed but still uses the default `latest` image, even though the ES deployment and the elasticsearch CR show the new tag.

Expected results:
The ES pod is upgraded to the image tag set in the CLO env vars.

Additional info:

Comment 2 ewolinet 2019-01-09 21:17:29 UTC
I was able to recreate this. To resolve it, I needed to remove the `paused: true` field from the Deployment so that it rolled out the pod with the updated image value.
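For context, the field in question lives in the Deployment spec. A minimal illustrative fragment (only the resource name, image tag, and container image repo are taken from the output above; the rest is a sketch, not this cluster's actual manifest):

```yaml
# While spec.paused is true, changes to the pod template (such as a new
# image tag) are recorded on the Deployment but no new ReplicaSet is
# created. Removing the field, or setting it to false, lets the
# Deployment controller roll the pod out with the updated image.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: elasticsearch-clientdatamaster-0-1
spec:
  paused: true          # remove (or set to false) to resume rollouts
  template:
    spec:
      containers:
      - name: elasticsearch   # container name assumed for illustration
        image: docker.io/openshift/origin-logging-elasticsearch5:v4.0
```

Equivalently, `oc rollout resume deployment/elasticsearch-clientdatamaster-0-1` clears the paused flag from the command line.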

Comment 4 Qiaoling Tang 2019-01-22 03:25:11 UTC
The ES pod isn't upgraded. In addition, the image tag in the ES deployment doesn't change after changing the CLO env vars, although the ES image tag in the elasticsearch CR did change.

$ oc exec cluster-logging-operator-5666d54945-hc99z env |grep IMAGE
ELASTICSEARCH_IMAGE=docker.io/openshift/origin-logging-elasticsearch5:v4.0
FLUENTD_IMAGE=docker.io/openshift/origin-logging-fluentd:v4.0
KIBANA_IMAGE=docker.io/openshift/origin-logging-kibana5:v4.0
CURATOR_IMAGE=docker.io/openshift/origin-logging-curator5:v4.0
OAUTH_PROXY_IMAGE=docker.io/openshift/oauth-proxy:v1.1.0
RSYSLOG_IMAGE=docker.io/viaq/rsyslog:8.38.0

$ oc get pod elasticsearch-clientdatamaster-0-1-84d764899d-mjvrt -o yaml |grep image:
    image: docker.io/openshift/origin-logging-elasticsearch5:latest
    image: docker.io/openshift/origin-logging-elasticsearch5:latest

$ oc get deploy elasticsearch-clientdatamaster-0-1 -o yaml |grep image
        image: docker.io/openshift/origin-logging-elasticsearch5:latest
        imagePullPolicy: IfNotPresent

$ oc get elasticsearch -o yaml |grep image
      image: docker.io/openshift/origin-logging-elasticsearch5:v4.0

$ oc get pod
NAME                                                  READY     STATUS    RESTARTS   AGE
cluster-logging-operator-5666d54945-hc99z             1/1       Running   0          3m
elasticsearch-clientdatamaster-0-1-84d764899d-mjvrt   1/1       Running   0          8m
elasticsearch-operator-86599f8849-pvpj5               1/1       Running   0          9m
fluentd-5d8nx                                         1/1       Running   0          2m
fluentd-7658g                                         1/1       Running   0          2m
fluentd-8jq5b                                         1/1       Running   0          2m
fluentd-cwmnq                                         1/1       Running   0          2m
fluentd-ndzch                                         1/1       Running   0          2m
fluentd-ng7n6                                         1/1       Running   0          2m
kibana-797c89966d-7xq6b                               2/2       Running   0          2m
$ oc logs elasticsearch-operator-86599f8849-pvpj5
time="2019-01-22T03:04:16Z" level=info msg="Go Version: go1.10.3"
time="2019-01-22T03:04:16Z" level=info msg="Go OS/Arch: linux/amd64"
time="2019-01-22T03:04:16Z" level=info msg="operator-sdk Version: 0.0.7"
time="2019-01-22T03:04:16Z" level=info msg="Metrics service elasticsearch-operator created"
time="2019-01-22T03:04:16Z" level=info msg="Watching logging.openshift.io/v1alpha1, Elasticsearch, openshift-logging, 5000000000"
time="2019-01-22T03:04:48Z" level=info msg="Constructing new resource elasticsearch-clientdatamaster-0-1"
time="2019-01-22T03:04:53Z" level=info msg="Updating node resource to be paused again elasticsearch-clientdatamaster-0-1"
time="2019-01-22T03:10:45Z" level=warning msg="Cluster Rolling Restart requested but cluster isn't ready."
time="2019-01-22T03:10:49Z" level=warning msg="Cluster Rolling Restart requested but cluster isn't ready."
time="2019-01-22T03:10:54Z" level=warning msg="Cluster Rolling Restart requested but cluster isn't ready."
time="2019-01-22T03:10:58Z" level=warning msg="Cluster Rolling Restart requested but cluster isn't ready."
time="2019-01-22T03:11:03Z" level=warning msg="Cluster Rolling Restart requested but cluster isn't ready."
time="2019-01-22T03:11:08Z" level=warning msg="Cluster Rolling Restart requested but cluster isn't ready."
time="2019-01-22T03:11:12Z" level=warning msg="Cluster Rolling Restart requested but cluster isn't ready."
time="2019-01-22T03:11:17Z" level=warning msg="Cluster Rolling Restart requested but cluster isn't ready."
time="2019-01-22T03:11:21Z" level=warning msg="Cluster Rolling Restart requested but cluster isn't ready."
time="2019-01-22T03:11:26Z" level=warning msg="Cluster Rolling Restart requested but cluster isn't ready."
time="2019-01-22T03:11:31Z" level=warning msg="Cluster Rolling Restart requested but cluster isn't ready."
----snip----
time="2019-01-22T03:24:19Z" level=warning msg="Cluster Rolling Restart requested but cluster isn't ready."
time="2019-01-22T03:24:23Z" level=warning msg="Cluster Rolling Restart requested but cluster isn't ready."

$ oc get pod elasticsearch-operator-86599f8849-pvpj5 -o yaml |grep image
    image: openshift/origin-elasticsearch-operator:latest
    imagePullPolicy: IfNotPresent
  imagePullSecrets:
    image: docker.io/openshift/origin-elasticsearch-operator:latest
    imageID: docker.io/openshift/origin-elasticsearch-operator@sha256:28138a39f8b3db638fc44eff0b43713cfa24f1e0373f1fc7858dd3deae7a53fa

Comment 6 Qiaoling Tang 2019-01-23 01:29:50 UTC
It was SingleRedundancy. 

I tried setting the redundancy policy to ZeroRedundancy with nodeCount=1 and to FullRedundancy with nodeCount=3; all of the tests passed.

Thanks for the correction.

Moving this bug to VERIFIED.
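For reference, the redundancy settings mentioned above are configured on the log store in the ClusterLogging CR. A minimal sketch with assumed field names (the CRs in this report's release were still at `v1alpha1`, per the operator's watch log; later releases use `v1`):

```yaml
# Illustrative fragment only. The chosen redundancyPolicy must be
# achievable with the given nodeCount: FullRedundancy replicates each
# index shard to every data node, so it needs more than one node,
# while ZeroRedundancy keeps no replicas and works with a single node.
apiVersion: logging.openshift.io/v1alpha1
kind: ClusterLogging
metadata:
  name: instance          # name assumed for illustration
spec:
  logStore:
    type: elasticsearch
    elasticsearch:
      nodeCount: 3
      redundancyPolicy: FullRedundancy
```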

Comment 9 errata-xmlrpc 2019-06-04 10:41:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

