Bug 1845118
| Summary: | The ES pods couldn't be READY during upgrade. | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | OpenShift BugZilla Robot <openshift-bugzilla-robot> |
| Component: | Logging | Assignee: | ewolinet |
| Status: | CLOSED ERRATA | QA Contact: | Anping Li <anli> |
| Severity: | high | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 4.5 | CC: | aos-bugs, cruhm, jcantril, lvlcek, scuppett |
| Target Milestone: | --- | Keywords: | Upgrades |
| Target Release: | 4.5.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-07-13 17:43:31 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1844097, 1845964 | | |
| Bug Blocks: | | | |
| Attachments: | | | |
Comment 3
Anping Li
2020-06-10 06:03:38 UTC
Created attachment 1696434
Elasticsearch pod logs
[2020-06-10T05:02:25,233][WARN ][o.e.n.Node ] [elasticsearch-cdm-gz6llair-1] timed out while waiting for initial discovery state - timeout: 30s
[2020-06-10T05:02:25,249][INFO ][o.e.h.n.Netty4HttpServerTransport] [elasticsearch-cdm-gz6llair-1] publish_address {10.129.2.26:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}, {10.129.2.26:9200}
[2020-06-10T05:02:25,249][INFO ][o.e.n.Node ] [elasticsearch-cdm-gz6llair-1] started
[2020-06-10T05:02:25,250][INFO ][c.a.o.s.OpenDistroSecurityPlugin] [elasticsearch-cdm-gz6llair-1] 4 Open Distro Security modules loaded so far: [Module [type=DLSFLS, implementing class=com.amazon.opendistroforelasticsearch.security.configuration.OpenDistroSecurityFlsDlsIndexSearcherWrapper], Module [type=MULTITENANCY, implementing class=com.amazon.opendistroforelasticsearch.security.configuration.PrivilegesInterceptorImpl], Module [type=AUDITLOG, implementing class=com.amazon.opendistroforelasticsearch.security.auditlog.impl.AuditLogImpl], Module [type=REST_MANAGEMENT_API, implementing class=com.amazon.opendistroforelasticsearch.security.dlic.rest.api.OpenDistroSecurityRestApiActions]]
[2020-06-10T05:02:27,358][WARN ][o.e.c.NodeConnectionsService] [elasticsearch-cdm-gz6llair-1] failed to connect to node {elasticsearch-cdm-gz6llair-2}{aSdD6kvwSoKMhItLEt59QQ}{ocfiaXSUScWV-CsTn56_Kg}{10.131.0.25}{10.131.0.25:9300} (tried [1] times)
org.elasticsearch.transport.ConnectTransportException: [elasticsearch-cdm-gz6llair-2][10.131.0.25:9300] connect_exception
at org.elasticsearch.transport.TcpTransport$ChannelsConnectedListener.onFailure(TcpTransport.java:1309) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
at org.elasticsearch.action.ActionListener.lambda$toBiConsumer$2(ActionListener.java:100) ~[elasticsearch-6.8.1.redhat-6.jar:6.8.1.redhat-6]
at org.elasticsearch.common.concurrent.CompletableContext.lambda$addListener$0(CompletableContext.java:42) ~[elasticsearch-core-6.8.1.redhat-6.jar:6.8.1.redhat-6]
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_252]
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_252]
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_252]
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_252]
at org.elasticsearch.common.concurrent.CompletableContext.completeExceptionally(CompletableContext.java:57) ~[elasticsearch-core-6.8.1.redhat-6.jar:6.8.1.redhat-6]
at org.elasticsearch.transport.netty4.Netty4TcpChannel.lambda$new$1(Netty4TcpChannel.java:72) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424) ~[?:?]
at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121) ~[?:?]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:327) ~[?:?]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:343) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:556) ~[?:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:510) ~[?:?]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) ~[?:?]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) ~[?:?]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
Caused by: io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: No route to host: 10.131.0.25/10.131.0.25:9300
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:714) ~[?:?]
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327) ~[?:?]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
... 6 more
Caused by: java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:714) ~[?:?]
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327) ~[?:?]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340) ~[?:?]
... 6 more
[2020-06-10T05:02:27,371][INFO ][o.e.c.s.ClusterSettings ] [elasticsearch-cdm-gz6llair-1] updating [cluster.routing.allocation.enable] from [all] to [none]
[2020-06-10T05:02:27,439][INFO ][c.a.o.s.c.IndexBaseConfigurationRepository] [elasticsearch-cdm-gz6llair-1] .security index does not exist yet, use either securityadmin to initialize cluster or wait until cluster is fully formed and up
[2020-06-10T05:02:27,536][ERROR][c.a.o.s.a.BackendRegistry] [elasticsearch-cdm-gz6llair-1] Not yet initialized
[2020-06-10 05:02:28,145][INFO ][container.run ] Elasticsearch is ready and listening
#oc exec -c elasticsearch elasticsearch-cdm-gz6llair-1-678d697457-pjwdv -- es_util --query=_cat/shards
.kibana.647a750f1787408bf50088234ec0edd5a6a9b2ac 0 p STARTED 2 52.6kb 10.129.2.26 elasticsearch-cdm-gz6llair-1
.kibana.647a750f1787408bf50088234ec0edd5a6a9b2ac 0 r UNASSIGNED
.security 0 p UNASSIGNED
.security 0 r UNASSIGNED
.searchguard 0 p STARTED 5 82.8kb 10.131.0.28 elasticsearch-cdm-gz6llair-2
.searchguard 0 r UNASSIGNED
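The `_cat/shards` listing above can be filtered mechanically to show which shards are unassigned. The following is a minimal sketch, assuming the whitespace-separated column order seen in that output (index, shard number, `p`/`r`, state); the function name `unassigned_shards` and the `SAMPLE` constant are illustrative and not part of any OpenShift or Elasticsearch tooling.

```python
def unassigned_shards(cat_shards_output):
    """Return (index, shard, kind) tuples for shards whose state is UNASSIGNED."""
    result = []
    for line in cat_shards_output.splitlines():
        fields = line.split()
        # Assumed columns: index, shard number, p (primary) or r (replica), state, ...
        if len(fields) >= 4 and fields[3] == "UNASSIGNED":
            kind = "primary" if fields[2] == "p" else "replica"
            result.append((fields[0], int(fields[1]), kind))
    return result


# Output copied from the es_util call in this comment.
SAMPLE = """\
.kibana.647a750f1787408bf50088234ec0edd5a6a9b2ac 0 p STARTED 2 52.6kb 10.129.2.26 elasticsearch-cdm-gz6llair-1
.kibana.647a750f1787408bf50088234ec0edd5a6a9b2ac 0 r UNASSIGNED
.security 0 p UNASSIGNED
.security 0 r UNASSIGNED
.searchguard 0 p STARTED 5 82.8kb 10.131.0.28 elasticsearch-cdm-gz6llair-2
.searchguard 0 r UNASSIGNED
"""

for index, shard, kind in unassigned_shards(SAMPLE):
    print(f"{index} shard {shard} ({kind}) is unassigned")
```

In this sample, `.security` is the only index whose primary shard is unassigned.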
+ oc exec -c elasticsearch elasticsearch-cdm-gz6llair-1-678d697457-pjwdv -- es_util '--query=_cluster/settings?pretty'
{
  "persistent" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "enable" : "primaries"
        }
      }
    },
    "discovery" : {
      "zen" : {
        "minimum_master_nodes" : "2"
      }
    }
  },
  "transient" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "enable" : "none"
        }
      }
    }
  }
}
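In Elasticsearch, a transient cluster setting takes precedence over a persistent setting with the same key, so the value in effect here is `none`, not `primaries`. The sketch below illustrates that lookup order on the JSON returned above; `lookup` and `effective_setting` are hypothetical helper names, and the `default` argument is an assumption standing in for whatever the node would use when neither scope sets the key.

```python
import json


def lookup(settings, dotted_key):
    """Walk a nested settings dict along a dotted key; return None if absent."""
    node = settings
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def effective_setting(cluster_settings, dotted_key, default=None):
    """Return (value, scope), checking transient before persistent."""
    for scope in ("transient", "persistent"):
        value = lookup(cluster_settings.get(scope, {}), dotted_key)
        if value is not None:
            return value, scope
    return default, "default"


# Response copied from the _cluster/settings call in this comment.
SETTINGS = json.loads("""
{
  "persistent": {
    "cluster": {"routing": {"allocation": {"enable": "primaries"}}},
    "discovery": {"zen": {"minimum_master_nodes": "2"}}
  },
  "transient": {
    "cluster": {"routing": {"allocation": {"enable": "none"}}}
  }
}
""")

print(effective_setting(SETTINGS, "cluster.routing.allocation.enable", "all"))
print(effective_setting(SETTINGS, "discovery.zen.minimum_master_nodes"))
```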
The .security shards are UNASSIGNED, and transient.cluster.routing.allocation.enable is set to none. After I set transient.cluster.routing.allocation.enable=all, the cluster became ready.
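The exact command used for this workaround is not recorded in the comment. As a rough sketch, the request could be assembled as below, assuming `es_util` forwards `-XPUT` and `-d` to the underlying HTTP call in the same way as the `--query` invocations shown earlier. The helper name `allocation_update_command` is hypothetical, and the script only prints the command; it does not run it.

```python
import json
import shlex

# Values accepted by cluster.routing.allocation.enable.
VALID_VALUES = ("all", "primaries", "new_primaries", "none")


def allocation_update_command(pod, value="all", scope="transient"):
    """Build an argv list that would update cluster.routing.allocation.enable."""
    if value not in VALID_VALUES:
        raise ValueError(f"unsupported allocation value: {value!r}")
    if scope not in ("transient", "persistent"):
        raise ValueError(f"unsupported scope: {scope!r}")
    body = json.dumps({scope: {"cluster.routing.allocation.enable": value}})
    return [
        "oc", "exec", "-c", "elasticsearch", pod, "--",
        "es_util", "--query=_cluster/settings",
        "-XPUT", "-d", body,  # assumption: passed through to the HTTP request
    ]


# Pod name taken from the commands in this comment.
argv = allocation_update_command("elasticsearch-cdm-gz6llair-1-678d697457-pjwdv")
print(" ".join(shlex.quote(arg) for arg in argv))
```

Running the same `_cluster/settings?pretty` query afterwards would show whether the transient value changed.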
Moving to MODIFIED. All dependent PRs are merged.
Verified on elasticsearch-operator.4.5.0-202006101717.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:2409