Bug 1697728 - The elasticsearch pods weren't updated when redundancyPolicy changed
Summary: The elasticsearch pods weren't updated when redundancyPolicy changed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.2.0
Assignee: ewolinet
QA Contact: Anping Li
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-04-09 04:33 UTC by Qiaoling Tang
Modified: 2019-10-16 06:28 UTC (History)
5 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-16 06:28:05 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Github openshift elasticsearch-operator pull 171 0 None closed Bug 1697728: updating the index template and existing indices replica count 2020-10-28 21:21:41 UTC
Github openshift elasticsearch-operator pull 173 0 None closed Bug 1697728: Fix template replica update 2020-10-28 21:21:42 UTC
Red Hat Product Errata RHBA-2019:2922 0 None None None 2019-10-16 06:28:22 UTC

Description Qiaoling Tang 2019-04-09 04:33:40 UTC
Description of problem:
Change redundancyPolicy in the clusterlogging CR; the index_settings in cm/elasticsearch is updated accordingly, but the elasticsearch pods aren't redeployed.

Version-Release number of selected component (if applicable):
quay.io/openshift/origin-logging-elasticsearch5@sha256:aa20dce94aeb394ec3e8305539fb004ba54855269b85eeb6129b067fce027453
quay.io/openshift/origin-cluster-logging-operator@sha256:949ee74661a3bac7d08084d01ce1375ff51a04f97d28ff59d7e35f49e5065a15
quay.io/openshift/origin-elasticsearch-operator@sha256:094754d814bc586f7d365f675ca7d005318ad8fe66278e467215abd3bdd94760

How reproducible:
Always

Steps to Reproduce:
1. Deploy logging via OLM with 3 ES nodes and the FullRedundancy redundancyPolicy

2. Check the indices in the ES pod and the index settings in the elasticsearch configmap; the number of replica shards is 2
$ oc exec elasticsearch-cdm-bg2gr79r-1-78c754c47f-n7gv8 -- indices
Defaulting container name to elasticsearch.
Use 'oc describe pod/elasticsearch-cdm-bg2gr79r-1-78c754c47f-n7gv8 -n openshift-logging' to see all of the containers in this pod.
Tue Apr  9 03:19:54 UTC 2019
health status index                                                         uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .kibana                                                       -iTNBd4DSt-APZcAfCSYTQ   1   2          1            0          0              0
green  open   .operations.2019.04.09                                        JQ9zYJmKQFK7kjEjjRARXA   3   2      37782            0        104             34
green  open   .kibana.647a750f1787408bf50088234ec0edd5a6a9b2ac              1eomWYJdQpS83BgLs81uTQ   1   0          2            0          0              0
green  open   .searchguard                                                  LyrBy99XReqPvE88UVU7qg   1   2          5            1          0              0
green  open   project.test1.17c2b9af-5a73-11e9-8644-0aa491d4dce0.2019.04.09 qF5YUSi2TjSgXz-0SPL_UQ   3   2       1079            0          3              1
$ oc get cm elasticsearch -o yaml |grep index_settings -A 3
  index_settings: |2

    PRIMARY_SHARDS=3
    REPLICA_SHARDS=2

3. Change redundancyPolicy to "SingleRedundancy" in the clusterlogging CR, then check redundancyPolicy in the elasticsearch CR and the index settings in the elasticsearch configmap; the number of replica shards is 1
$ oc get clusterlogging -o yaml |grep Policy
        redundancyPolicy: SingleRedundancy
$ oc get elasticsearch -o yaml |grep Policy
    redundancyPolicy: SingleRedundancy
$ oc get cm elasticsearch -o yaml |grep index_settings -A 3
  index_settings: |2

    PRIMARY_SHARDS=3
    REPLICA_SHARDS=1

4. Create a new index in ES, then check the indices in the ES pod; the number of replica shards in the new index is still 2 when it should be 1, see the index "project.test2.xxxxxxxx"
$ oc exec elasticsearch-cdm-bg2gr79r-1-78c754c47f-n7gv8 -- indices
Defaulting container name to elasticsearch.
Use 'oc describe pod/elasticsearch-cdm-bg2gr79r-1-78c754c47f-n7gv8 -n openshift-logging' to see all of the containers in this pod.
Tue Apr  9 03:27:22 UTC 2019
health status index                                                         uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .kibana                                                       -iTNBd4DSt-APZcAfCSYTQ   1   2          1            0          0              0
green  open   project.test2.22e02269-5a77-11e9-8644-0aa491d4dce0.2019.04.09 uQOMz71uRMKrNyopjyyNHA   3   2         45            0          0              0
green  open   .operations.2019.04.09                                        JQ9zYJmKQFK7kjEjjRARXA   3   2      52270            0        146             48
green  open   .kibana.647a750f1787408bf50088234ec0edd5a6a9b2ac              1eomWYJdQpS83BgLs81uTQ   1   0          3            0          0              0
green  open   .searchguard                                                  LyrBy99XReqPvE88UVU7qg   1   2          5            1          0              0
green  open   project.test1.17c2b9af-5a73-11e9-8644-0aa491d4dce0.2019.04.09 qF5YUSi2TjSgXz-0SPL_UQ   3   2       1526            0          3              1
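For reference, the REPLICA_SHARDS value implied by each redundancyPolicy can be sketched as a function of the data-node count. This is assumed semantics inferred from the values observed in steps 2 and 3, not the operator's actual code:

```shell
# Sketch of the replica count implied by each redundancyPolicy for a
# given number of data nodes (assumed semantics, not the operator code).
replicas_for_policy() {
  local policy=$1 nodes=$2
  case "$policy" in
    FullRedundancy)     echo $((nodes - 1)) ;;         # a copy on every node
    MultipleRedundancy) echo $(( (nodes - 1) / 2 )) ;; # copies on half the nodes
    SingleRedundancy)   echo 1 ;;
    ZeroRedundancy)     echo 0 ;;
    *)                  echo "unknown policy" >&2; return 1 ;;
  esac
}

replicas_for_policy FullRedundancy 3    # prints 2, matching step 2
replicas_for_policy SingleRedundancy 3  # prints 1, matching step 3
```

With the 3-node cluster above, this reproduces the 2 -> 1 change seen in the configmap but not yet applied to pre-existing indices.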


Actual results:


Expected results:


Additional info:

Comment 2 Jeff Cantrill 2019-04-16 16:27:55 UTC
Moving to 4.2

Workaround is to wait for the CLO to update the Elasticsearch CR and manually run the init script:

"oc exec -c elasticsearch $espod -- bash -c '$HOME/init.sh'"

Comment 3 Qiaoling Tang 2019-04-17 02:24:38 UTC
(In reply to Jeff Cantrill from comment #2)
> Moving to 4.2
> 
> Workaround is to wait for the CLO to update the Elasticsearch CR and
> manually run the init script:
> 
> "oc exec -c elasticsearch $espod -- bash -c '$HOME/init.sh'"

The workaround doesn't work; the index.number_of_replicas in the template isn't changed after running the init script.

Comment 4 ewolinet 2019-04-17 16:58:20 UTC
(In reply to Qiaoling Tang from comment #3)
> (In reply to Jeff Cantrill from comment #2)
> > Moving to 4.2
> > 
> > Workaround is to wait for the CLO to update the Elasticsearch CR and
> > manually run the init script:
> > 
> > "oc exec -c elasticsearch $espod -- bash -c '$HOME/init.sh'"
> 
> The workaround doesn't work, the index.number_of_replicas in template isn't
> changed after running the init script.

I'm able to recreate this, but I think this issue actually stems from the fact that we run `sed -i` on the template files, so they will always keep the value from the first run (not from the operator):

$ oc exec example-elasticsearch-cdm-vy14npva-1-55b6667ff5-5vxxv -c elasticsearch -- bash -c 'source /usr/share/java/elasticsearch/config/index_settings; echo $PRIMARY_SHARDS; echo $REPLICA_SHARDS'
2
1
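That non-idempotence can be reproduced locally with a stand-in template file (a minimal sketch, not the actual init script): once `sed -i` has replaced the placeholder, a later run carrying a new value finds nothing to match.

```shell
# First run replaces the placeholder in place; the second run, carrying
# the new replica count, is a no-op because the placeholder is gone.
tmpl=$(mktemp)
echo '"index.number_of_replicas": REPLICAS,' > "$tmpl"

sed -i 's/REPLICAS/2/' "$tmpl"   # initial deploy: 2 replicas
sed -i 's/REPLICAS/1/' "$tmpl"   # after policy change: no effect

cat "$tmpl"   # still shows 2, not 1
rm -f "$tmpl"
```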

$ oc exec example-elasticsearch-cdm-vy14npva-1-55b6667ff5-5vxxv -c elasticsearch -- bash -c 'cat $ES_HOME/index_templates/common.settings.project.template.json'
{
  "order": 5,
  "settings": {
    "index.refresh_interval": "5s",
    "index.number_of_replicas": 0,
    "index.number_of_shards": 1,
    "index.translog.flush_threshold_size": "256mb",
    "index.unassigned.node_left.delayed_timeout": "2m"
  },
  "template": "project*"
}

Comment 5 ewolinet 2019-04-17 17:38:14 UTC
https://github.com/openshift/origin-aggregated-logging/pull/1604
to fix the workaround

Comment 8 ewolinet 2019-04-25 20:53:35 UTC
Workaround fix merged; we will need to wait for the latest elasticsearch image to be published before it is available

Comment 12 Qiaoling Tang 2019-07-09 01:32:16 UTC
Thanks for the comment.

The templates on the cluster are not updated after changing the redundancy policy from ZeroRedundancy to SingleRedundancy.

$ oc exec elasticsearch-cdm-y65z0agi-2-795b95c7fb-pq5rh -- es_util --query=_template/common.* |jq
Defaulting container name to elasticsearch.
Use 'oc describe pod/elasticsearch-cdm-y65z0agi-2-795b95c7fb-pq5rh -n openshift-logging' to see all of the containers in this pod.
{
  "common.settings.operations.orphaned.json": {
    "order": 5,
    "template": ".orphaned*",
    "settings": {
      "index": {
        "refresh_interval": "5s",
        "unassigned": {
          "node_left": {
            "delayed_timeout": "2m"
          }
        },
        "number_of_shards": "3",
        "translog": {
          "flush_threshold_size": "256mb"
        },
        "number_of_replicas": "0"
      }
    },
    "mappings": {},
    "aliases": {}
  },
  "common.settings.operations.template.json": {
    "order": 5,
    "template": ".operations*",
    "settings": {
      "index": {
        "refresh_interval": "5s",
        "unassigned": {
          "node_left": {
            "delayed_timeout": "2m"
          }
        },
        "number_of_shards": "3",
        "translog": {
          "flush_threshold_size": "256mb"
        },
        "number_of_replicas": "0"
      }
    },
    "mappings": {},
    "aliases": {}
  },
  "common.settings.project.template.json": {
    "order": 5,
    "template": "project*",
    "settings": {
      "index": {
        "refresh_interval": "5s",
        "unassigned": {
          "node_left": {
            "delayed_timeout": "2m"
          }
        },
        "number_of_shards": "3",
        "translog": {
          "flush_threshold_size": "256mb"
        },
        "number_of_replicas": "0"
      }
    },
    "mappings": {},
    "aliases": {}
  },
  "common.settings.kibana.template.json": {
    "order": 0,
    "template": ".kibana*",
    "settings": {
      "index": {
        "number_of_shards": "3",
        "number_of_replicas": "0"
      }
    },
    "mappings": {},
    "aliases": {}
  }
}
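The stale values are easier to spot by filtering just the replica counts out of the template dump. A local sketch against a saved sample of the output above (`grep` stands in for a `jq` filter; the file path is hypothetical):

```shell
# Saved sample of two templates from the _template dump above; on the
# cluster, pipe the es_util query output into the same grep instead.
cat <<'EOF' > /tmp/templates-sample.json
{
  "common.settings.project.template.json": {
    "settings": { "index": { "number_of_replicas": "0" } }
  },
  "common.settings.operations.template.json": {
    "settings": { "index": { "number_of_replicas": "0" } }
  }
}
EOF

# After the policy change these should all read "1"; any "0" is stale.
grep -o '"number_of_replicas": "[0-9]*"' /tmp/templates-sample.json
```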

$ oc get cm elasticsearch -oyaml |grep index_settings -A 3
  index_settings: |2

    PRIMARY_SHARDS=3
    REPLICA_SHARDS=1

Images:
ose-logging-elasticsearch5-v4.2.0-201907071316
ose-cluster-logging-operator-v4.2.0-201907071316
ose-elasticsearch-operator-v4.2.0-201907071316

Per comment 11, moving this bug to ASSIGNED.

Comment 13 ewolinet 2019-07-09 14:58:08 UTC
I can recreate this with an image built from the master branch and see this fixed with https://github.com/openshift/elasticsearch-operator/pull/173

Comment 15 Anping Li 2019-09-03 09:16:53 UTC
Both the configmap and the templates are updated when I change the redundancyPolicy.

Comment 16 errata-xmlrpc 2019-10-16 06:28:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

