Bug 1845118 - The ES pods couldn't be READY during upgrade.
Summary: The ES pods couldn't be READY during upgrade.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 4.5.0
Assignee: ewolinet
QA Contact: Anping Li
URL:
Whiteboard:
Depends On: 1844097 1845964
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-06-08 13:51 UTC by OpenShift BugZilla Robot
Modified: 2024-03-25 16:01 UTC
CC: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-13 17:43:31 UTC
Target Upstream Version:
Embargoed:


Attachments
Elasticsearch pod logs (33.10 KB, text/plain)
2020-06-10 06:14 UTC, Anping Li


Links
System ID Private Priority Status Summary Last Updated
Github openshift origin-aggregated-logging pull 1922 0 None closed [release-4.5] Bug 1845118: Removing check that keeps only elected master seeding 2021-02-15 18:52:20 UTC
Red Hat Product Errata RHBA-2020:2409 0 None None None 2020-07-13 17:43:46 UTC

Comment 3 Anping Li 2020-06-10 06:03:38 UTC
The PR has been merged, but the ES pods are still not ready. The ES logs show '[elasticsearch-cdm-gz6llair-1] Not yet initialized'.
The EO logs show 'time="2020-06-10T05:59:10Z" level=error msg="Error creating index template for mapping app: Put https://elasticsearch.openshift-logging.svc:9200/_template/ocp-gen-app: dial tcp 172.30.32.237:9200: i/o timeout'



#oc get pods
NAME                                            READY   STATUS      RESTARTS   AGE
cluster-logging-operator-75549d88c6-lwtl2       1/1     Running     0          13m
curator-1591765200-2bqxh                        0/1     Completed   0          60m
curator-1591765800-zdqlc                        1/1     Running     0          50m
curator-1591766400-hth4p                        1/1     Running     0          40m
curator-1591767000-mfgw8                        1/1     Running     0          31m
curator-1591767600-xkztn                        1/1     Running     0          21m
curator-1591768200-dxlmb                        1/1     Running     0          10m
curator-1591768800-xbxwv                        1/1     Running     0          57s
elasticsearch-cdm-gz6llair-1-678d697457-pjwdv   1/2     Running     0          59m
elasticsearch-cdm-gz6llair-2-c89c57df6-mfvg5    1/2     Running     0          59m
elasticsearch-cdm-gz6llair-3-5669858467-txtjk   1/2     Running     0          60m
fluentd-5w7k7                                   1/1     Running     0          12m
fluentd-849h7                                   1/1     Running     0          10m
fluentd-dq9bv                                   1/1     Running     0          11m
fluentd-f4hkp                                   1/1     Running     0          12m
fluentd-snxxs                                   1/1     Running     0          10m
fluentd-wsdgb                                   1/1     Running     0          10m
kibana-6dd868cdf9-9n8dw                         2/2     Running     0          12m



[2020-06-10T05:56:57,518][ERROR][c.a.o.s.a.BackendRegistry] [elasticsearch-cdm-gz6llair-1] Not yet initialized 
[2020-06-10T05:57:00,160][ERROR][c.a.o.s.a.BackendRegistry] [elasticsearch-cdm-gz6llair-1] Not yet initialized 
[2020-06-10T05:57:01,693][WARN ][r.suppressed             ] [elasticsearch-cdm-gz6llair-1] path: /.security/security/roles, params: {index=.security, id=roles, type=security}
org.elasticsearch.action.NoShardAvailableActionException: No shard available for [get [.security][security][roles]: routing [null]]
	at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction.perform(TransportSingleShardAction.java:230) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction.start(TransportSingleShardAction.java:209) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.action.support.single.shard.TransportSingleShardAction.doExecute(TransportSingleShardAction.java:100) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.action.support.single.shard.TransportSingleShardAction.doExecute(TransportSingleShardAction.java:62) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]


time="2020-06-10T05:59:10Z" level=error msg="Error creating index template for mapping app: Put https://elasticsearch.openshift-logging.svc:9200/_template/ocp-gen-app: dial tcp 172.30.32.237:9200: i/o timeout"
{"level":"error","ts":1591768750.254178,"logger":"kubebuilder.controller","msg":"Reconciler error","controller":"elasticsearch-controller","request":"openshift-logging/elasticsearch","error":"Failed to reconcile IndexMangement for Elasticsearch cluster: Put https://elasticsearch.openshift-logging.svc:9200/_template/ocp-gen-app: dial tcp 172.30.32.237:9200: i/o timeout","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/openshift/elasticsearch-operator/vendor/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/openshift/elasticsearch-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:217\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/src/github.com/openshift/elasticsearch-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/openshift/elasticsearch-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
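The i/o timeouts above indicate the operator could not reach the Elasticsearch service while the pods were stuck unready. A diagnostic sketch (not part of the original report) for checking cluster state from inside an ES pod, reusing the pod name from this cluster; substitute your own:

```shell
# Diagnostic sketch only: requires a live cluster with the logging stack
# installed. The pod name below is the one from this bug.
oc -n openshift-logging exec -c elasticsearch \
  elasticsearch-cdm-gz6llair-1-678d697457-pjwdv -- \
  es_util --query=_cluster/health?pretty
```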

Comment 4 Anping Li 2020-06-10 06:14:05 UTC
Created attachment 1696434 [details]
Elasticsearch pod logs

[2020-06-10T05:02:25,233][WARN ][o.e.n.Node               ] [elasticsearch-cdm-gz6llair-1] timed out while waiting for initial discovery state - timeout: 30s
[2020-06-10T05:02:25,249][INFO ][o.e.h.n.Netty4HttpServerTransport] [elasticsearch-cdm-gz6llair-1] publish_address {10.129.2.26:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}, {10.129.2.26:9200}
[2020-06-10T05:02:25,249][INFO ][o.e.n.Node               ] [elasticsearch-cdm-gz6llair-1] started
[2020-06-10T05:02:25,250][INFO ][c.a.o.s.OpenDistroSecurityPlugin] [elasticsearch-cdm-gz6llair-1] 4 Open Distro Security modules loaded so far: [Module [type=DLSFLS, implementing class=com.amazon.opendistroforelasticsearch.security.configuration.OpenDistroSecurityFlsDlsIndexSearcherWrapper], Module [type=MULTITENANCY, implementing class=com.amazon.opendistroforelasticsearch.security.configuration.PrivilegesInterceptorImpl], Module [type=AUDITLOG, implementing class=com.amazon.opendistroforelasticsearch.security.auditlog.impl.AuditLogImpl], Module [type=REST_MANAGEMENT_API, implementing class=com.amazon.opendistroforelasticsearch.security.dlic.rest.api.OpenDistroSecurityRestApiActions]]
[2020-06-10T05:02:27,358][WARN ][o.e.c.NodeConnectionsService] [elasticsearch-cdm-gz6llair-1] failed to connect to node {elasticsearch-cdm-gz6llair-2}{aSdD6kvwSoKMhItLEt59QQ}{ocfiaXSUScWV-CsTn56_Kg}{10.131.0.25}{10.131.0.25:9300} (tried [1] times)
org.elasticsearch.transport.ConnectTransportException: [elasticsearch-cdm-gz6llair-2][10.131.0.25:9300] connect_exception
	at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onFailure(TcpTransport.java:1309) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.action.ActionListener.lambda$toBiConsumer$2(ActionListener.java:100) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.common.concurrent.CompletableContext.lambda$addListener$0(CompletableContext.java:42) ~[elasticsearch-core-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_252]
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_252]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_252]
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_252]
	at org.elasticsearch.common.concurrent.CompletableContext.completeExceptionally(CompletableContext.java:57) ~[elasticsearch-core-6.8.1.redhat-6.jar:6.8.1.redhat-6]
	at org.elasticsearch.transport.netty4.Netty4TcpChannel.lambda$new$1(Netty4TcpChannel.java:72) ~[?:?]
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511) ~[?:?]
	at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504) ~[?:?]
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483) ~[?:?]
	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424) ~[?:?]
	at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121) ~[?:?]
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:327) ~[?:?]
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:343) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:556) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:510) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) ~[?:?]
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) ~[?:?]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
Caused by: io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: No route to host: 10.131.0.25/10.131.0.25:9300
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:714) ~[?:?]
	at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327) ~[?:?]
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
	... 6 more
Caused by: java.net.NoRouteToHostException: No route to host
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:714) ~[?:?]
	at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327) ~[?:?]
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
	... 6 more
[2020-06-10T05:02:27,371][INFO ][o.e.c.s.ClusterSettings  ] [elasticsearch-cdm-gz6llair-1] updating [cluster.routing.allocation.enable] from [all] to [none]
[2020-06-10T05:02:27,439][INFO ][c.a.o.s.c.IndexBaseConfigurationRepository] [elasticsearch-cdm-gz6llair-1] .security index does not exist yet, use either securityadmin to initialize cluster or wait until cluster is fully formed and up
[2020-06-10T05:02:27,536][ERROR][c.a.o.s.a.BackendRegistry] [elasticsearch-cdm-gz6llair-1] Not yet initialized 
[2020-06-10 05:02:28,145][INFO ][container.run            ] Elasticsearch is ready and listening

Comment 5 Anping Li 2020-06-10 06:20:20 UTC
#oc exec -c elasticsearch elasticsearch-cdm-gz6llair-1-678d697457-pjwdv -- es_util --query=_cat/shards
.kibana.647a750f1787408bf50088234ec0edd5a6a9b2ac 0 p STARTED         2  52.6kb 10.129.2.26 elasticsearch-cdm-gz6llair-1
.kibana.647a750f1787408bf50088234ec0edd5a6a9b2ac 0 r UNASSIGNED                            
.security                                        0 p UNASSIGNED                            
.security                                        0 r UNASSIGNED                            
.searchguard                                     0 p STARTED         5  82.8kb 10.131.0.28 elasticsearch-cdm-gz6llair-2
.searchguard                                     0 r UNASSIGNED                            
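The `_cat/shards` output above can be filtered down to the unassigned shards with a one-liner. As a sketch (not part of the original report), the here-doc replays the captured output; against a live cluster you would pipe `es_util --query=_cat/shards` in instead:

```shell
# Print index name and shard type (p=primary, r=replica) for every
# UNASSIGNED shard in the captured _cat/shards output.
awk '$4 == "UNASSIGNED" {print $1, $3}' <<'EOF'
.kibana.647a750f1787408bf50088234ec0edd5a6a9b2ac 0 p STARTED 2 52.6kb 10.129.2.26 elasticsearch-cdm-gz6llair-1
.kibana.647a750f1787408bf50088234ec0edd5a6a9b2ac 0 r UNASSIGNED
.security 0 p UNASSIGNED
.security 0 r UNASSIGNED
.searchguard 0 p STARTED 5 82.8kb 10.131.0.28 elasticsearch-cdm-gz6llair-2
.searchguard 0 r UNASSIGNED
EOF
# Prints, for this sample:
# .kibana.647a750f1787408bf50088234ec0edd5a6a9b2ac r
# .security p
# .security r
# .searchguard r
```

Note that both the primary and replica of `.security` are unassigned, which is why the security plugin keeps reporting "Not yet initialized".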

+ oc exec -c elasticsearch elasticsearch-cdm-gz6llair-1-678d697457-pjwdv -- es_util '--query=_cluster/settings?pretty'
{
  "persistent" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "enable" : "primaries"
        }
      }
    },
    "discovery" : {
      "zen" : {
        "minimum_master_nodes" : "2"
      }
    }
  },
  "transient" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "enable" : "none"
        }
      }
    }
  }
}



The .security shards are UNASSIGNED and transient.cluster.routing.allocation.enable=none. After I set transient.cluster.routing.allocation.enable=all, the cluster became ready.
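The workaround described above can be sketched as a single command. This is a sketch rather than the exact command that was run, and it assumes es_util forwards extra arguments through to curl, as in the queries earlier in this comment:

```shell
# Workaround sketch (requires a live cluster): re-enable shard allocation
# by overwriting the transient "none" setting with "all".
oc -n openshift-logging exec -c elasticsearch \
  elasticsearch-cdm-gz6llair-1-678d697457-pjwdv -- \
  es_util --query=_cluster/settings -X PUT -d \
  '{"transient": {"cluster.routing.allocation.enable": "all"}}'
```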

Comment 6 Jeff Cantrill 2020-06-11 17:11:53 UTC
Moving to MODIFIED. All dependent PRs are merged.

Comment 8 Anping Li 2020-06-16 04:59:53 UTC
Verified on elasticsearch-operator.4.5.0-202006101717

Comment 9 errata-xmlrpc 2020-07-13 17:43:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

