Description of problem:
Deploy logging 4.4 on a 4.4 OCP cluster, then upgrade the cluster to 4.5. After the upgrade finishes, checking the indices in the ES pods shows that all the indices are in `yellow` status:

$ oc get pod
NAME                                           READY   STATUS      RESTARTS   AGE
cluster-logging-operator-c4464f-fszv7          1/1     Running     0          12m
curator-1589333400-7vl4w                       0/1     Completed   0          6m31s
elasticsearch-cdm-m2j2lxw9-1-5cd8fb7c9-9n26s   2/2     Running     0          19m
elasticsearch-cdm-m2j2lxw9-2-f9f58676-cbtnn    2/2     Running     0          12m
elasticsearch-cdm-m2j2lxw9-3-78df69dcf-wjg7l   2/2     Running     0          16m
fluentd-8kgwd                                  1/1     Running     0          34m
fluentd-8qn7x                                  1/1     Running     0          33m
fluentd-c95xq                                  1/1     Running     0          33m
fluentd-csxjs                                  1/1     Running     0          34m
fluentd-h585s                                  1/1     Running     2          34m
fluentd-pg7h7                                  1/1     Running     3          33m
kibana-6ff5c8d8f-s5tgq                         2/2     Running     0          16m

$ oc exec elasticsearch-cdm-m2j2lxw9-1-5cd8fb7c9-9n26s -- indices
Defaulting container name to elasticsearch.
Use 'oc describe pod/elasticsearch-cdm-m2j2lxw9-1-5cd8fb7c9-9n26s -n openshift-logging' to see all of the containers in this pod.
Wed May 13 01:36:29 UTC 2020
health status index                                                           uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   project.qitang.5682515f-a751-4bb1-a98c-a1ca0b381376.2020.05.13  LmyeqjBQTuOVO6-HEr03YA  3   1   2081       0            2          2
yellow open   .kibana.647a750f1787408bf50088234ec0edd5a6a9b2ac                CvSIX5rNRVSCMFEIH9hyeQ  1   1   2          0            0          0
yellow open   .operations.2020.05.13                                          Ob9V0iMiTQ6uTnuu5qOSGg  3   1   1179013    0            1540       1540
yellow open   .searchguard                                                    UC4ukmZvTc61M5HNPLiqyg  1   1   5          0            0          0
yellow open   .kibana                                                         1LLeP9ZRROKU0WyFaGfQfw  1   1   1          0            0          0

$ oc exec elasticsearch-cdm-m2j2lxw9-1-5cd8fb7c9-9n26s -- es_util --query=_cat/nodes?v
Defaulting container name to elasticsearch.
Use 'oc describe pod/elasticsearch-cdm-m2j2lxw9-1-5cd8fb7c9-9n26s -n openshift-logging' to see all of the containers in this pod.
ip          heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.128.2.18 34           71          14  0.39    0.83    1.07     mdi       *      elasticsearch-cdm-m2j2lxw9-1
10.131.0.20 55           69          18  0.33    0.54    0.71     mdi       -      elasticsearch-cdm-m2j2lxw9-3
10.129.2.5  34           42          18  0.75    0.77    0.68     mdi       -      elasticsearch-cdm-m2j2lxw9-2

$ oc exec elasticsearch-cdm-m2j2lxw9-1-5cd8fb7c9-9n26s -- es_util --query=_cat/shards?v
Defaulting container name to elasticsearch.
Use 'oc describe pod/elasticsearch-cdm-m2j2lxw9-1-5cd8fb7c9-9n26s -n openshift-logging' to see all of the containers in this pod.
index                                                           shard prirep state      docs    store   ip          node
project.qitang.5682515f-a751-4bb1-a98c-a1ca0b381376.2020.05.13  2     p      STARTED    767     1.5mb   10.131.0.20 elasticsearch-cdm-m2j2lxw9-3
project.qitang.5682515f-a751-4bb1-a98c-a1ca0b381376.2020.05.13  2     r      UNASSIGNED
project.qitang.5682515f-a751-4bb1-a98c-a1ca0b381376.2020.05.13  1     p      STARTED    707     1.4mb   10.129.2.5  elasticsearch-cdm-m2j2lxw9-2
project.qitang.5682515f-a751-4bb1-a98c-a1ca0b381376.2020.05.13  1     r      UNASSIGNED
project.qitang.5682515f-a751-4bb1-a98c-a1ca0b381376.2020.05.13  0     p      STARTED    763     988.5kb 10.131.0.20 elasticsearch-cdm-m2j2lxw9-3
project.qitang.5682515f-a751-4bb1-a98c-a1ca0b381376.2020.05.13  0     r      UNASSIGNED
.kibana                                                         0     p      STARTED    1       3.2kb   10.128.2.18 elasticsearch-cdm-m2j2lxw9-1
.kibana                                                         0     r      UNASSIGNED
.searchguard                                                    0     p      STARTED    5       168.1kb 10.131.0.20 elasticsearch-cdm-m2j2lxw9-3
.searchguard                                                    0     r      UNASSIGNED
.kibana.647a750f1787408bf50088234ec0edd5a6a9b2ac                0     p      STARTED    2       27.2kb  10.129.2.5  elasticsearch-cdm-m2j2lxw9-2
.kibana.647a750f1787408bf50088234ec0edd5a6a9b2ac                0     r      UNASSIGNED
.operations.2020.05.13                                          2     p      STARTED    470875  559.8mb 10.131.0.20 elasticsearch-cdm-m2j2lxw9-3
.operations.2020.05.13                                          2     r      UNASSIGNED
.operations.2020.05.13                                          1     p      STARTED    469501  548.9mb 10.129.2.5  elasticsearch-cdm-m2j2lxw9-2
.operations.2020.05.13                                          1     r      UNASSIGNED
.operations.2020.05.13                                          0     p      STARTED    469184  536.1mb 10.129.2.5  elasticsearch-cdm-m2j2lxw9-2
.operations.2020.05.13                                          0     r      UNASSIGNED

$ oc logs -c elasticsearch elasticsearch-cdm-m2j2lxw9-2-f9f58676-cbtnn
[2020-05-13 01:27:42,830][INFO ][container.run ] Begin Elasticsearch startup script
[2020-05-13 01:27:42,843][INFO ][container.run ] Comparing the specified RAM to the maximum recommended for Elasticsearch...
[2020-05-13 01:27:42,845][INFO ][container.run ] Inspecting the maximum RAM available...
[2020-05-13 01:27:42,848][INFO ][container.run ] ES_JAVA_OPTS: ' -Xms2048m -Xmx2048m'
[2020-05-13 01:27:42,850][INFO ][container.run ] Copying certs from /etc/openshift/elasticsearch/secret to /etc/elasticsearch/secret
[2020-05-13 01:27:42,863][INFO ][container.run ] Building required jks files and truststore
Importing keystore /etc/elasticsearch/secret/admin.p12 to /etc/elasticsearch/secret/admin.jks...
Entry for alias 1 successfully imported.
Import command completed: 1 entries successfully imported, 0 entries failed or cancelled
Warning: The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/admin.jks -destkeystore /etc/elasticsearch/secret/admin.jks -deststoretype pkcs12".
Warning: The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/admin.jks -destkeystore /etc/elasticsearch/secret/admin.jks -deststoretype pkcs12".
Certificate was added to keystore
Warning: The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/admin.jks -destkeystore /etc/elasticsearch/secret/admin.jks -deststoretype pkcs12".
Importing keystore /etc/elasticsearch/secret/elasticsearch.p12 to /etc/elasticsearch/secret/elasticsearch.jks...
Entry for alias 1 successfully imported.
Import command completed: 1 entries successfully imported, 0 entries failed or cancelled
Warning: The JKS keystore uses a proprietary format.
It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/elasticsearch.jks -destkeystore /etc/elasticsearch/secret/elasticsearch.jks -deststoretype pkcs12". Warning: The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/elasticsearch.jks -destkeystore /etc/elasticsearch/secret/elasticsearch.jks -deststoretype pkcs12". Certificate was added to keystore Warning: The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/elasticsearch.jks -destkeystore /etc/elasticsearch/secret/elasticsearch.jks -deststoretype pkcs12". Importing keystore /etc/elasticsearch/secret/logging-es.p12 to /etc/elasticsearch/secret/logging-es.jks... Entry for alias 1 successfully imported. Import command completed: 1 entries successfully imported, 0 entries failed or cancelled Warning: The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/logging-es.jks -destkeystore /etc/elasticsearch/secret/logging-es.jks -deststoretype pkcs12". Warning: The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/logging-es.jks -destkeystore /etc/elasticsearch/secret/logging-es.jks -deststoretype pkcs12". Certificate was added to keystore Warning: The JKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/elasticsearch/secret/logging-es.jks -destkeystore /etc/elasticsearch/secret/logging-es.jks -deststoretype pkcs12". 
Certificate was added to keystore Certificate was added to keystore [2020-05-13 01:27:45,114][INFO ][container.run ] Setting heap dump location /elasticsearch/persistent/heapdump.hprof [2020-05-13 01:27:45,115][INFO ][container.run ] ES_JAVA_OPTS: ' -Xms2048m -Xmx2048m -XX:HeapDumpPath=/elasticsearch/persistent/heapdump.hprof -Dsg.display_lic_none=false -Dio.netty.recycler.maxCapacityPerThread=0 -Dio.netty.allocator.type=unpooled' [2020-05-13 01:27:45,164][INFO ][container.run ] Checking if Elasticsearch is ready ### LICENSE NOTICE Search Guard ### If you use one or more of the following features in production make sure you have a valid Search Guard license (See https://floragunn.com/searchguard-validate-license) * Kibana Multitenancy * LDAP authentication/authorization * Active Directory authentication/authorization * REST Management API * JSON Web Token (JWT) authentication/authorization * Kerberos authentication/authorization * Document- and Fieldlevel Security (DLS/FLS) * Auditlogging In case of any doubt mail to <sales> ################################### ### LICENSE NOTICE Search Guard ### If you use one or more of the following features in production make sure you have a valid Search Guard license (See https://floragunn.com/searchguard-validate-license) * Kibana Multitenancy * LDAP authentication/authorization * Active Directory authentication/authorization * REST Management API * JSON Web Token (JWT) authentication/authorization * Kerberos authentication/authorization * Document- and Fieldlevel Security (DLS/FLS) * Auditlogging In case of any doubt mail to <sales> ################################### Consider setting -Djdk.tls.rejectClientInitiatedRenegotiation=true to prevent DoS attacks through client side initiated TLS renegotiation. Consider setting -Djdk.tls.rejectClientInitiatedRenegotiation=true to prevent DoS attacks through client side initiated TLS renegotiation. SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. [2020-05-13 01:27:57,880][INFO ][container.run ] Elasticsearch is ready and listening /usr/share/elasticsearch/init ~ [2020-05-13 01:27:57,925][INFO ][container.run ] Starting init script: 0001-jaeger [2020-05-13 01:27:57,942][INFO ][container.run ] Completed init script: 0001-jaeger [2020-05-13 01:27:58,230][INFO ][container.run ] Forcing the seeding of ACL documents [2020-05-13 01:27:58,240][INFO ][container.run ] Seeding the searchguard ACL index. Will wait up to 604800 seconds. [2020-05-13 01:27:58,345][INFO ][container.run ] Seeding the searchguard ACL index. Will wait up to 604800 seconds. /etc/elasticsearch /usr/share/elasticsearch/init Search Guard Admin v5 Will connect to localhost:9300 ... done ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal initialization logging. See https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions on how to configure Log4j 2 Elasticsearch Version: 5.6.16 Search Guard Version: <unknown> Contacting elasticsearch cluster 'elasticsearch' ... Clustername: elasticsearch Clusterstate: RED Number of nodes: 3 Number of data nodes: 3 .searchguard index already exists, so we do not need to create one. 
INFO: .searchguard index state is YELLOW, it seems you miss some replicas Populate config from /opt/app-root/src/sgconfig/ Will update 'config' with /opt/app-root/src/sgconfig/sg_config.yml SUCC: Configuration for 'config' created or updated Will update 'roles' with /opt/app-root/src/sgconfig/sg_roles.yml SUCC: Configuration for 'roles' created or updated Will update 'rolesmapping' with /opt/app-root/src/sgconfig/sg_roles_mapping.yml SUCC: Configuration for 'rolesmapping' created or updated Will update 'internalusers' with /opt/app-root/src/sgconfig/sg_internal_users.yml SUCC: Configuration for 'internalusers' created or updated Will update 'actiongroups' with /opt/app-root/src/sgconfig/sg_action_groups.yml SUCC: Configuration for 'actiongroups' created or updated Done with success /usr/share/elasticsearch/init [2020-05-13 01:28:04,919][INFO ][container.run ] Seeded the searchguard ACL index [2020-05-13 01:28:04,924][INFO ][container.run ] Disabling auto replication /etc/elasticsearch /usr/share/elasticsearch/init Search Guard Admin v5 Will connect to localhost:9300 ... done ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal initialization logging. See https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions on how to configure Log4j 2 Elasticsearch Version: 5.6.16 Search Guard Version: <unknown> Reload config on all nodes Auto-expand replicas disabled /usr/share/elasticsearch/init [2020-05-13 01:28:09,844][INFO ][container.run ] Updating replica count to 1 /etc/elasticsearch /usr/share/elasticsearch/init Search Guard Admin v5 Will connect to localhost:9300 ... done ERROR StatusLogger No Log4j 2 configuration file found. Using default configuration (logging only errors to the console), or user programmatically provided configurations. Set system property 'log4j2.debug' to show Log4j 2 internal initialization logging. 
See https://logging.apache.org/log4j/2.x/manual/configuration.html for instructions on how to configure Log4j 2 Elasticsearch Version: 5.6.16 Search Guard Version: <unknown> Reload config on all nodes Update number of replicas to 1 with result: true /usr/share/elasticsearch/init [2020-05-13 01:28:16,438][INFO ][container.run ] Adding index templates [2020-05-13 01:28:16,861][INFO ][container.run ] Index template 'com.redhat.viaq-openshift-operations.template.json' found in the cluster, overriding it {"acknowledged":true}[2020-05-13 01:28:17,670][INFO ][container.run ] Index template 'com.redhat.viaq-openshift-orphaned.template.json' found in the cluster, overriding it {"acknowledged":true}[2020-05-13 01:28:18,286][INFO ][container.run ] Index template 'com.redhat.viaq-openshift-project.template.json' found in the cluster, overriding it {"acknowledged":true}[2020-05-13 01:28:19,058][INFO ][container.run ] Index template 'common.settings.kibana.template.json' found in the cluster, overriding it {"acknowledged":true}[2020-05-13 01:28:19,574][INFO ][container.run ] Index template 'common.settings.operations.orphaned.json' found in the cluster, overriding it {"acknowledged":true}[2020-05-13 01:28:20,292][INFO ][container.run ] Index template 'common.settings.operations.template.json' found in the cluster, overriding it {"acknowledged":true}[2020-05-13 01:28:20,921][INFO ][container.run ] Index template 'common.settings.project.template.json' found in the cluster, overriding it {"acknowledged":true}[2020-05-13 01:28:21,417][INFO ][container.run ] Index template 'jaeger-service.json' found in the cluster, overriding it {"acknowledged":true}[2020-05-13 01:28:22,135][INFO ][container.run ] Index template 'jaeger-span.json' found in the cluster, overriding it {"acknowledged":true}[2020-05-13 01:28:22,841][INFO ][container.run ] Index template 'org.ovirt.viaq-collectd.template.json' found in the cluster, overriding it {"acknowledged":true}[2020-05-13 01:28:23,068][INFO ][container.run ] Finished adding index templates [2020-05-13 01:28:23,102][INFO ][container.run ] Starting init script: 0500-remove-index-patterns-without-uid [2020-05-13 01:28:23,450][INFO ][container.run ] Found 1 index-patterns to evaluate for removal [2020-05-13 01:28:23,954][INFO ][container.run ] Completed init script: 0500-remove-index-patterns-without-uid with 0 successful and 0 failed bulk requests [2020-05-13 01:28:23,969][INFO ][container.run ] Starting init script: 0510-bz1656086-remove-index-patterns-with-bad-title [2020-05-13 01:28:24,207][INFO ][container.run ] Found 0 index-patterns to remove [2020-05-13 01:28:24,502][INFO ][container.run ] Completed init script: 0510-bz1656086-remove-index-patterns-with-bad-title [2020-05-13 01:28:24,562][INFO ][container.run ] Starting init script: 0520-bz1658632-remove-old-sg-indices [2020-05-13 01:28:25,162][WARN ][container.run ] Found .searchguard setting 'index.routing.allocation.include._name' to be null [2020-05-13 01:28:25,179][INFO ][container.run ] Updating .searchguard setting 'index.routing.allocation.include._name' to be null [2020-05-13 01:28:25,554][INFO ][container.run ] Completed init script: 0520-bz1658632-remove-old-sg-indices [2020-05-13 01:28:25,589][INFO ][container.run ] Starting init script: 0530-bz1667801-fix-kibana-replica-shards [2020-05-13 01:28:25,977][INFO ][container.run ] Found 0 Kibana indices with replica count not equal to 1 [2020-05-13 01:28:26,016][INFO ][container.run ] Completed init script: 0530-bz1667801-fix-kibana-replica-shards ~ May 13, 
2020 1:33:02 AM okhttp3.internal.platform.Platform log WARNING: A connection to https://kubernetes.default.svc/ was leaked. Did you forget to close a response body? To see where this was allocated, set the OkHttpClient logger level to FINE: Logger.getLogger(OkHttpClient.class.getName()).setLevel(Level.FINE); May 13, 2020 1:33:03 AM okhttp3.internal.platform.Platform log WARNING: A connection to https://kubernetes.default.svc/ was leaked. Did you forget to close a response body? To see where this was allocated, set the OkHttpClient logger level to FINE: Logger.getLogger(OkHttpClient.class.getName()).setLevel(Level.FINE); May 13, 2020 1:33:03 AM okhttp3.internal.platform.Platform log WARNING: A connection to https://kubernetes.default.svc/ was leaked. Did you forget to close a response body? To see where this was allocated, set the OkHttpClient logger level to FINE: Logger.getLogger(OkHttpClient.class.getName()).setLevel(Level.FINE); May 13, 2020 1:33:03 AM okhttp3.internal.platform.Platform log WARNING: A connection to https://kubernetes.default.svc/ was leaked. Did you forget to close a response body? To see where this was allocated, set the OkHttpClient logger level to FINE: Logger.getLogger(OkHttpClient.class.getName()).setLevel(Level.FINE); May 13, 2020 1:35:04 AM okhttp3.internal.platform.Platform log WARNING: A connection to https://kubernetes.default.svc/ was leaked. Did you forget to close a response body? To see where this was allocated, set the OkHttpClient logger level to FINE: Logger.getLogger(OkHttpClient.class.getName()).setLevel(Level.FINE); May 13, 2020 1:35:04 AM okhttp3.internal.platform.Platform log WARNING: A connection to https://kubernetes.default.svc/ was leaked. Did you forget to close a response body? To see where this was allocated, set the OkHttpClient logger level to FINE: Logger.getLogger(OkHttpClient.class.getName()).setLevel(Level.FINE); May 13, 2020 1:35:04 AM okhttp3.internal.platform.Platform log WARNING: A connection to https://kubernetes.default.svc/ was leaked. Did you forget to close a response body? To see where this was allocated, set the OkHttpClient logger level to FINE: Logger.getLogger(OkHttpClient.class.getName()).setLevel(Level.FINE); May 13, 2020 1:35:04 AM okhttp3.internal.platform.Platform log WARNING: A connection to https://kubernetes.default.svc/ was leaked. Did you forget to close a response body? To see where this was allocated, set the OkHttpClient logger level to FINE: Logger.getLogger(OkHttpClient.class.getName()).setLevel(Level.FINE); May 13, 2020 1:37:30 AM okhttp3.internal.platform.Platform log WARNING: A connection to https://kubernetes.default.svc/ was leaked. Did you forget to close a response body? To see where this was allocated, set the OkHttpClient logger level to FINE: Logger.getLogger(OkHttpClient.class.getName()).setLevel(Level.FINE); May 13, 2020 1:37:30 AM okhttp3.internal.platform.Platform log WARNING: A connection to https://kubernetes.default.svc/ was leaked. Did you forget to close a response body? To see where this was allocated, set the OkHttpClient logger level to FINE: Logger.getLogger(OkHttpClient.class.getName()).setLevel(Level.FINE); May 13, 2020 1:37:30 AM okhttp3.internal.platform.Platform log WARNING: A connection to https://kubernetes.default.svc/ was leaked. Did you forget to close a response body? 
To see where this was allocated, set the OkHttpClient logger level to FINE: Logger.getLogger(OkHttpClient.class.getName()).setLevel(Level.FINE);
May 13, 2020 1:37:30 AM okhttp3.internal.platform.Platform log
WARNING: A connection to https://kubernetes.default.svc/ was leaked. Did you forget to close a response body? To see where this was allocated, set the OkHttpClient logger level to FINE: Logger.getLogger(OkHttpClient.class.getName()).setLevel(Level.FINE);
May 13, 2020 1:39:34 AM okhttp3.internal.platform.Platform log
WARNING: A connection to https://kubernetes.default.svc/ was leaked. Did you forget to close a response body? To see where this was allocated, set the OkHttpClient logger level to FINE: Logger.getLogger(OkHttpClient.class.getName()).setLevel(Level.FINE);
May 13, 2020 1:39:34 AM okhttp3.internal.platform.Platform log

shardAllocationEnabled is none in the elasticsearch CR instance:

spec:
  managementState: Managed
  nodeSpec:
    image: quay.io/openshift-qe-optional-operators/ose-logging-elasticsearch5@sha256:52ff8ea1971f59351876ed59d55413e6911848b3578d4254f813fd9e5f53d203
    resources:
      requests:
        cpu: "1"
        memory: 4Gi
  nodes:
  - genUUID: m2j2lxw9
    nodeCount: 3
    resources: {}
    roles:
    - client
    - data
    - master
    storage:
      size: 20Gi
      storageClassName: gp2
  redundancyPolicy: SingleRedundancy
status:
  cluster:
    activePrimaryShards: 9
    activeShards: 9
    initializingShards: 0
    numDataNodes: 3
    numNodes: 3
    pendingTasks: 0
    relocatingShards: 0
    status: yellow
    unassignedShards: 9
  clusterHealth: ""
  conditions: []
  nodes:
  - deploymentName: elasticsearch-cdm-m2j2lxw9-1
    upgradeStatus:
      scheduledUpgrade: "True"
      upgradePhase: controllerUpdated
  - deploymentName: elasticsearch-cdm-m2j2lxw9-2
    upgradeStatus:
      upgradePhase: controllerUpdated
  - deploymentName: elasticsearch-cdm-m2j2lxw9-3
    upgradeStatus:
      scheduledUpgrade: "True"
      upgradePhase: controllerUpdated
  pods:
    client:
      failed: []
      notReady: []
      ready:
      - elasticsearch-cdm-m2j2lxw9-1-5cd8fb7c9-9n26s
      - elasticsearch-cdm-m2j2lxw9-2-f9f58676-cbtnn
      - elasticsearch-cdm-m2j2lxw9-3-78df69dcf-wjg7l
    data:
      failed: []
      notReady: []
      ready:
      - elasticsearch-cdm-m2j2lxw9-1-5cd8fb7c9-9n26s
      - elasticsearch-cdm-m2j2lxw9-2-f9f58676-cbtnn
      - elasticsearch-cdm-m2j2lxw9-3-78df69dcf-wjg7l
    master:
      failed: []
      notReady: []
      ready:
      - elasticsearch-cdm-m2j2lxw9-1-5cd8fb7c9-9n26s
      - elasticsearch-cdm-m2j2lxw9-2-f9f58676-cbtnn
      - elasticsearch-cdm-m2j2lxw9-3-78df69dcf-wjg7l
  shardAllocationEnabled: none

$ oc logs -n openshift-operators-redhat elasticsearch-operator-549f7dcfbc-f7w7z
time="2020-05-13T01:24:55Z" level=warning msg="Unable to parse loglevel \"\""
{"level":"info","ts":1589333095.497479,"logger":"cmd","msg":"Go Version: go1.13.4"}
{"level":"info","ts":1589333095.497508,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1589333095.4975128,"logger":"cmd","msg":"Version of operator-sdk: v0.8.2"}
{"level":"info","ts":1589333095.498378,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1589333096.0980136,"logger":"leader","msg":"No pre-existing lock was found."}
{"level":"info","ts":1589333096.1228838,"logger":"leader","msg":"Became the leader."}
{"level":"info","ts":1589333096.404264,"logger":"cmd","msg":"Registering Components."}
{"level":"info","ts":1589333096.4047568,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"elasticsearch-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1589333096.6361701,"logger":"cmd","msg":"failed to create or get service for metrics: services \"elasticsearch-operator\" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>"}
{"level":"info","ts":1589333096.636197,"logger":"cmd","msg":"Starting the Cmd."}
{"level":"info","ts":1589333096.738808,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"elasticsearch-controller"}
{"level":"info","ts":1589333096.859998,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"elasticsearch-controller","worker count":1}
time="2020-05-13T01:24:59Z" level=info msg="Requested to update node 'elasticsearch-cdm-m2j2lxw9-3', which is unschedulable. Skipping rolling restart scenario and performing redeploy now"
time="2020-05-13T01:25:32Z" level=warning msg="GetClusterHealthStatus error: Get https://elasticsearch.openshift-logging.svc:9200/_cluster/health: dial tcp 172.30.202.177:9200: i/o timeout\n"
time="2020-05-13T01:25:32Z" level=info msg="Waiting for cluster to be fully recovered before upgrading elasticsearch-cdm-m2j2lxw9-2: / green"
time="2020-05-13T01:25:32Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-m2j2lxw9-2: Cluster not in green state before beginning upgrade: "
time="2020-05-13T01:25:36Z" level=info msg="Requested to update node 'elasticsearch-cdm-m2j2lxw9-2', which is unschedulable. Skipping rolling restart scenario and performing redeploy now"
time="2020-05-13T01:25:44Z" level=info msg="Waiting for cluster to be fully recovered before upgrading elasticsearch-cdm-m2j2lxw9-2: red / green"
time="2020-05-13T01:25:44Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-m2j2lxw9-2: Cluster not in green state before beginning upgrade: red"
time="2020-05-13T01:26:09Z" level=info msg="Requested to update node 'elasticsearch-cdm-m2j2lxw9-2', which is unschedulable. Skipping rolling restart scenario and performing redeploy now"
time="2020-05-13T01:26:39Z" level=info msg="Timed out waiting for node elasticsearch-cdm-m2j2lxw9-2 to rollout"
time="2020-05-13T01:26:40Z" level=warning msg="Failed to progress update of unschedulable node 'elasticsearch-cdm-m2j2lxw9-2': timed out waiting for the condition"
time="2020-05-13T01:26:42Z" level=info msg="Requested to update node 'elasticsearch-cdm-m2j2lxw9-2', which is unschedulable. Skipping rolling restart scenario and performing redeploy now"
time="2020-05-13T01:27:12Z" level=info msg="Timed out waiting for node elasticsearch-cdm-m2j2lxw9-2 to rollout"
time="2020-05-13T01:27:12Z" level=warning msg="Failed to progress update of unschedulable node 'elasticsearch-cdm-m2j2lxw9-2': timed out waiting for the condition"
time="2020-05-13T01:27:12Z" level=info msg="Waiting for cluster to be fully recovered before upgrading elasticsearch-cdm-m2j2lxw9-1: red / green"
time="2020-05-13T01:27:12Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-m2j2lxw9-1: Cluster not in green state before beginning upgrade: red"
time="2020-05-13T01:27:12Z" level=info msg="Waiting for cluster to be fully recovered before upgrading elasticsearch-cdm-m2j2lxw9-3: red / green"
time="2020-05-13T01:27:12Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-m2j2lxw9-3: Cluster not in green state before beginning upgrade: red"
time="2020-05-13T01:27:13Z" level=info msg="Requested to update node 'elasticsearch-cdm-m2j2lxw9-2', which is unschedulable. Skipping rolling restart scenario and performing redeploy now"
E0513 01:27:16.385547       1 reflector.go:251] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:196: Failed to watch *v1.Service: Get https://172.30.0.1:443/api/v1/services?resourceVersion=82093&timeoutSeconds=420&watch=true: dial tcp 172.30.0.1:443: connect: connection refused
E0513 01:27:16.387918       1 reflector.go:251] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:196: Failed to watch *v1.ClusterRoleBinding: Get https://172.30.0.1:443/apis/rbac.authorization.k8s.io/v1/clusterrolebindings?resourceVersion=82675&timeoutSeconds=305&watch=true: dial tcp 172.30.0.1:443: connect: connection refused
E0513 01:27:16.393165       1 reflector.go:251] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:196: Failed to watch *v1.ConfigMap: Get https://172.30.0.1:443/api/v1/configmaps?resourceVersion=82686&timeoutSeconds=493&watch=true: dial tcp 172.30.0.1:443: connect: connection refused
W0513 01:27:16.623106       1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:196: watch of *v1.ClusterRole ended with: too old resource version: 76975 (80136)
W0513 01:27:16.623251       1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:196: watch of *v1.PersistentVolumeClaim ended with: too old resource version: 77503 (80119)
W0513 01:27:17.507113       1 reflector.go:270] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: watch of *v1.Elasticsearch ended with: too old resource version: 82214 (82692)
time="2020-05-13T01:27:43Z" level=info msg="Timed out waiting for node elasticsearch-cdm-m2j2lxw9-2 to rollout"
time="2020-05-13T01:27:43Z" level=warning msg="Failed to progress update of unschedulable node 'elasticsearch-cdm-m2j2lxw9-2': timed out waiting for the condition"
time="2020-05-13T01:27:43Z" level=info msg="Waiting for cluster to be fully recovered before upgrading elasticsearch-cdm-m2j2lxw9-1: red / green"
time="2020-05-13T01:27:43Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-m2j2lxw9-1: Cluster not in green state before beginning upgrade: red"
time="2020-05-13T01:27:48Z" level=info msg="Waiting for cluster to be fully recovered before upgrading elasticsearch-cdm-m2j2lxw9-3: red / green"
time="2020-05-13T01:27:48Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-m2j2lxw9-3: Cluster not in green state before beginning upgrade: red"
time="2020-05-13T01:27:59Z" level=info msg="Waiting for cluster to be fully recovered before upgrading elasticsearch-cdm-m2j2lxw9-1: red / green"
time="2020-05-13T01:27:59Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-m2j2lxw9-1: Cluster not in green state before beginning upgrade: red"
time="2020-05-13T01:27:59Z" level=info msg="Waiting for cluster to be fully recovered before upgrading elasticsearch-cdm-m2j2lxw9-3: red / green"
time="2020-05-13T01:27:59Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-m2j2lxw9-3: Cluster not in green state before beginning upgrade: red"
time="2020-05-13T01:28:14Z" level=info msg="Waiting for cluster to be fully recovered before upgrading elasticsearch-cdm-m2j2lxw9-1: yellow / green"
time="2020-05-13T01:28:14Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-m2j2lxw9-1: Cluster not in green state before beginning upgrade: yellow"
time="2020-05-13T01:28:14Z" level=info msg="Waiting for cluster to be fully recovered before upgrading elasticsearch-cdm-m2j2lxw9-3: yellow / green"
time="2020-05-13T01:28:14Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-m2j2lxw9-3: Cluster not in green state before beginning upgrade: yellow"
time="2020-05-13T01:28:15Z" level=info msg="Waiting for cluster to be fully recovered before upgrading elasticsearch-cdm-m2j2lxw9-1: yellow / green"
time="2020-05-13T01:28:15Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-m2j2lxw9-1: Cluster not in green state before beginning upgrade: yellow"
time="2020-05-13T01:28:15Z" level=info msg="Waiting for cluster to be fully recovered before upgrading elasticsearch-cdm-m2j2lxw9-3: yellow / green"
time="2020-05-13T01:28:15Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-m2j2lxw9-3: Cluster not in green state before beginning upgrade: yellow"
time="2020-05-13T01:28:29Z" level=info msg="Waiting for cluster to be fully recovered before upgrading elasticsearch-cdm-m2j2lxw9-1: yellow / green"
time="2020-05-13T01:28:29Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-m2j2lxw9-1: Cluster not in green state before beginning upgrade: yellow"
time="2020-05-13T01:28:29Z" level=info msg="Waiting for cluster to be fully recovered before upgrading elasticsearch-cdm-m2j2lxw9-3: yellow / green"
time="2020-05-13T01:28:29Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-m2j2lxw9-3: Cluster not in green state before beginning upgrade: yellow"
time="2020-05-13T01:29:00Z" level=info msg="Waiting for cluster to be fully recovered before upgrading elasticsearch-cdm-m2j2lxw9-1: yellow / green"
time="2020-05-13T01:29:00Z" level=warning msg="Error occurred while updating node elasticsearch-cdm-m2j2lxw9-1: Cluster not in green state before beginning upgrade: yellow"
time="2020-05-13T01:29:00Z" level=info msg="Waiting for cluster to be fully recovered before upgrading elasticsearch-cdm-m2j2lxw9-3: yellow / green"

Version-Release number of selected component (if applicable):
clusterlogging.4.4.0-202005120551
elasticsearch-operator.4.4.0-202005120551
cluster version: from 4.4.3 to 4.5.0-0.nightly-2020-05-12-035058

How reproducible:
2/3

Steps to Reproduce:
1. Deploy logging 4.4 on a 4.4 cluster
2. Upgrade the cluster to 4.5
3. Check the ES status

Actual results:

Expected results:

Additional info:
Workaround: manually set shardAllocationEnabled to all after the cluster upgrade has finished:
es_util --query=_cluster/settings -XPUT -d '{ "transient" : { "cluster.routing.allocation.enable" : "all" } }'
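For reference, a minimal verification sketch after applying the workaround, run from inside an elasticsearch container with the same es_util wrapper used above (a suggested check only, not part of an official procedure):

# Confirm the transient setting took effect
es_util --query=_cluster/settings?pretty

# Watch cluster health; unassigned_shards should drop to 0 and status should go green
es_util --query=_cluster/health?pretty

# No shard should remain UNASSIGNED once the replicas are re-allocated
es_util --query=_cat/shards?v | grep UNASSIGNED || echo "all shards assigned"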
@Periklis, I think that is a different issue. For example, the status is yellow even when cluster.routing.allocation.enable=all (https://bugzilla.redhat.com/show_bug.cgi?id=1838153):

# oc rsh elasticsearch-cdm-xv9zo8gz-1-cbbd47549-5ksk4
sh-4.2$ es_cluster_health
{
  "cluster_name" : "elasticsearch",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 16,
  "active_shards" : 16,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 10,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 61.53846153846154
}
sh-4.2$ es_util --query=_cluster/settings
{"persistent":{"discovery":{"zen":{"minimum_master_nodes":"1"}}},"transient":{"cluster":{"routing":{"allocation":{"enable":"all"}}}}}
sh-4.2$ es_util --query=_cat/shards
.kibana                                                          0 p STARTED     1   3.2kb 10.129.2.58 elasticsearch-cdm-xv9zo8gz-1
project.logjsonx.9b993d19-0818-4232-8940-cf06a750e965.2020.05.23 0 p STARTED   746 485.2kb 10.129.2.58 elasticsearch-cdm-xv9zo8gz-1
infra-write                                                      1 p STARTED     0    162b 10.129.2.58 elasticsearch-cdm-xv9zo8gz-1
infra-write                                                      1 r UNASSIGNED
infra-write                                                      4 p STARTED     0    162b 10.129.2.58 elasticsearch-cdm-xv9zo8gz-1
infra-write                                                      4 r UNASSIGNED
infra-write                                                      2 p STARTED     0    162b 10.129.2.58 elasticsearch-cdm-xv9zo8gz-1
infra-write                                                      2 r UNASSIGNED
infra-write                                                      3 p STARTED     0    162b 10.129.2.58 elasticsearch-cdm-xv9zo8gz-1
infra-write                                                      3 r UNASSIGNED
infra-write                                                      0 p STARTED     0    162b 10.129.2.58 elasticsearch-cdm-xv9zo8gz-1
infra-write                                                      0 r UNASSIGNED
app-write                                                        1 p STARTED     0    162b 10.129.2.58 elasticsearch-cdm-xv9zo8gz-1
app-write                                                        1 r UNASSIGNED
app-write                                                        4 p STARTED     0    162b 10.129.2.58 elasticsearch-cdm-xv9zo8gz-1
app-write                                                        4 r UNASSIGNED
app-write                                                        2 p STARTED     0    162b 10.129.2.58 elasticsearch-cdm-xv9zo8gz-1
app-write                                                        2 r UNASSIGNED
app-write                                                        3 p STARTED     0    162b 10.129.2.58 elasticsearch-cdm-xv9zo8gz-1
app-write                                                        3 r UNASSIGNED
app-write                                                        0 p STARTED     0    162b 10.129.2.58 elasticsearch-cdm-xv9zo8gz-1
app-write                                                        0 r UNASSIGNED
project.logflatx.b1c056f0-4405-45bd-8cea-76338862d9ed.2020.05.23 0 p STARTED   746 542.9kb 10.129.2.58 elasticsearch-cdm-xv9zo8gz-1
.searchguard                                                     0 p STARTED     5 145.3kb 10.129.2.58 elasticsearch-cdm-xv9zo8gz-1
.kibana.647a750f1787408bf50088234ec0edd5a6a9b2ac                 0 p STARTED     4  64.1kb 10.129.2.58 elasticsearch-cdm-xv9zo8gz-1
.operations.2020.05.23                                           0 p STARTED 31341  30.9mb 10.129.2.58 elasticsearch-cdm-xv9zo8gz-1
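To see why a particular replica stays unassigned, the ES 5.x allocation explain API can help; a hedged example using the same es_util wrapper from inside an elasticsearch container:

# With no request body, _cluster/allocation/explain reports on the first unassigned shard it finds
es_util --query=_cluster/allocation/explain?pretty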
Based on your EO logs, it seems the pods originally could not be scheduled, so two of them bypassed the normal upgrade path:

level=info msg="Requested to update node 'elasticsearch-cdm-m2j2lxw9-3', which is unschedulable. Skipping rolling restart scenario and performing redeploy now"
level=info msg="Requested to update node 'elasticsearch-cdm-m2j2lxw9-2', which is unschedulable. Skipping rolling restart scenario and performing redeploy now"

Then we see that we timed out waiting for the second one to roll out, but it eventually succeeded and we moved on to a normal upgrade of the last node:

level=info msg="Timed out waiting for node elasticsearch-cdm-m2j2lxw9-2 to rollout"
level=warning msg="Failed to progress update of unschedulable node 'elasticsearch-cdm-m2j2lxw9-2': timed out waiting for the condition"
level=info msg="Waiting for cluster to be fully recovered before upgrading elasticsearch-cdm-m2j2lxw9-1: red / green"

The odd thing is that the bypass logic doesn't change the shard allocation for the cluster at all, so it's possible the "none" setting was left over from a prior upgrade. Also, looking at the elasticsearch CR, only one of the nodes is noted in the status as having been upgraded... I'll see if I can recreate this.
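For context, a rough sketch of the shard-allocation toggling that a normal rolling upgrade is expected to perform (illustrative only, not the operator's actual code; it reuses the es_util wrapper from inside an elasticsearch container and mirrors the "none"/"all" values seen in this report):

# 1. Disable shard allocation before restarting a node
es_util --query=_cluster/settings -XPUT -d '{ "transient" : { "cluster.routing.allocation.enable" : "none" } }'

# 2. Restart/redeploy the node (handled by the operator via the deployment rollout)

# 3. Re-enable shard allocation once the node rejoins
es_util --query=_cluster/settings -XPUT -d '{ "transient" : { "cluster.routing.allocation.enable" : "all" } }'

# 4. Wait for the cluster to return to green before moving to the next node
es_util --query=_cluster/health?pretty

In this report, allocation was left at "none" after the upgrade, which matches the yellow indices and the workaround above.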
Per my understanding, the pods were originally unable to be deployed because the 4.4 to 4.5 cluster upgrade updates every node, and each node is marked unschedulable while it is being upgraded. I checked the `shardAllocationEnabled` status before the cluster upgrade: it was `all`, and everything seemed to work well.
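A quick way to capture that state before and after the OCP upgrade (the CR name "elasticsearch" and the openshift-logging namespace are assumed defaults):

# shardAllocationEnabled as reported in the Elasticsearch CR status
oc get elasticsearch elasticsearch -n openshift-logging -o jsonpath='{.status.shardAllocationEnabled}{"\n"}'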
Can you retest this? We have since updated the way we do our upgrades so that they no longer set shard allocation to "none", per https://github.com/openshift/elasticsearch-operator/pull/355
It seems transient.cluster.routing.allocation.enable is none by default. If there are new indices, the CLO changes it to all momentarily.
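A rough way to observe that behaviour (the pod name below is just the example from this report; this simply polls the live transient setting):

# Poll the live transient allocation setting; it should flip to "all" when new indices are created
while true; do
  oc exec -c elasticsearch elasticsearch-cdm-m2j2lxw9-1-5cd8fb7c9-9n26s -n openshift-logging -- es_util --query=_cluster/settings | grep -o '"enable":"[a-z]*"'
  sleep 10
done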
Moving to VERIFIED. Logging can be upgraded even when transient.cluster.routing.allocation.enable is none, and on 4.5 transient.cluster.routing.allocation.enable=all.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409