Description of problem:
Deployed logging with the journald log driver enabled and encountered the following exceptions during the ES initialization phase:

[2016-11-08 00:48:46,849][WARN ][io.fabric8.elasticsearch.discovery.kubernetes.KubernetesUnicastHostsProvider] [Fashima] Exception caught during discovery: Failure executing: GET at: https://kubernetes.default.svc/api/v1/namespaces/xiazhao/endpoints/logging-es-cluster. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked..
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://kubernetes.default.svc/api/v1/namespaces/xiazhao/endpoints/logging-es-cluster. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked..
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:290)
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:241)
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:212)
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:205)
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:510)
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:118)
	at io.fabric8.elasticsearch.cloud.kubernetes.KubernetesAPIServiceImpl.endpoints(KubernetesAPIServiceImpl.java:35)
	at io.fabric8.elasticsearch.discovery.kubernetes.KubernetesUnicastHostsProvider.readNodes(KubernetesUnicastHostsProvider.java:112)
	at io.fabric8.elasticsearch.discovery.kubernetes.KubernetesUnicastHostsProvider.lambda$buildDynamicNodes$0(KubernetesUnicastHostsProvider.java:80)
	at java.security.AccessController.doPrivileged(Native Method)
	at io.fabric8.elasticsearch.discovery.kubernetes.KubernetesUnicastHostsProvider.buildDynamicNodes(KubernetesUnicastHostsProvider.java:79)
	at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.sendPings(UnicastZenPing.java:335)
	at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing$2$1.doRun(UnicastZenPing.java:253)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
[2016-11-08 00:48:48,983][INFO ][cluster.routing.allocation] [Fashima] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[.searchguard.logging-es-574e4loc-1-mv9vx][0]] ...]).
...
[2016-11-08 00:48:49,520][INFO ][cluster.metadata ] [Fashima] [.searchguard.logging-es-574e4loc-1-mv9vx] create_mapping [actiongroups]
[2016-11-08 00:48:49,580][WARN ][com.floragunn.searchguard.ssl.transport.SearchGuardSSLNettyTransport] [Proteus] exception caught on transport layer [[id: 0x508bfe10, /127.0.0.1:57584 => localhost/127.0.0.1:9300]], closing connection
java.lang.IllegalStateException: Message not fully read (response) for requestId [9], handler [org.elasticsearch.action.TransportActionNodeProxy$1@575cf0b8], error [false]; resetting
	at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:145)
	at com.floragunn.searchguard.ssl.transport.SearchGuardMessageChannelHandler.messageReceived(SearchGuardMessageChannelHandler.java:50)
	at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
	at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)

Version-Release number of selected component (if applicable):
logging images on brew registry:
openshift3/logging-deployer      5f4ddb8cf40d
openshift3/logging-elasticsearch 9b9452c0f8c2
openshift3/logging-kibana        7fc9916eea4d
openshift3/logging-auth-proxy    0d8e3202af61
openshift3/logging-fluentd       22aea57e7576
openshift3/logging-curator       9af78fc06248

# openshift version
openshift v3.4.0.22+5c56720
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

How reproducible:
Always

Steps to Reproduce:
1. Deploy logging 3.4.0 with the journald log driver enabled
2. Check the status of the logging stack:
# oc get po
NAME                          READY     STATUS             RESTARTS   AGE
logging-curator-1-o8s4q       1/1       Running            0          2h
logging-deployer-as85s        0/1       Completed          0          2h
logging-es-574e4loc-1-mv9vx   1/1       Running            0          2h
logging-fluentd-pbwys         1/1       Running            0          2h
logging-kibana-1-1kn2u        1/2       CrashLoopBackOff   31         2h
3. Check the ES logs for exceptions during the ES initialization phase

Actual results:
The ES log contains the exceptions "Configured service account doesn't have access" and "java.lang.IllegalStateException: Message not fully read"

Expected results:
The ES logs should not contain these exceptions

Additional info:
EFK logs attached
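For anyone triaging the Forbidden error: the fabric8 discovery plugin is simply issuing a GET against the endpoints resource with the pod's service account token, so the failure can be checked without Elasticsearch. A minimal diagnostic sketch, assuming it is run from inside the ES pod (the token and CA paths are the standard in-pod service account mounts):

# Repeat the endpoints lookup the discovery plugin performs; an HTTP 403
# here confirms the RBAC problem independently of Elasticsearch.
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
curl -s --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
     -H "Authorization: Bearer ${TOKEN}" \
     https://kubernetes.default.svc/api/v1/namespaces/xiazhao/endpoints/logging-es-cluster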
Created attachment 1218438 [details] es_log
Created attachment 1218439 [details] fluentd_log
Created attachment 1218440 [details] kibana log(container: kibana-proxy)
Note: The problems with the fluentd and kibana-proxy containers are tracked separately in https://bugzilla.redhat.com/show_bug.cgi?id=1392104 and https://bugzilla.redhat.com/show_bug.cgi?id=1392104
Xia,

The "Configured service account doesn't have access" exception is most likely caused by a missed step when setting up the ES service account.

Can you confirm that the following role binding for the ES service account has been created? `oc get rolebindings`

Within template/logging-deployer-account-template there should be the following:

apiVersion: v1
kind: RoleBinding
metadata:
  name: logging-elasticsearch-view-role
roleRef:
  kind: ClusterRole
  name: view
subjects:
- kind: ServiceAccount
  name: aggregated-logging-elasticsearch

The other 'error' comes from Searchguard: it looks like its connection was closed and it printed a stacktrace. It later reopens the connection, so I don't believe this to be a blocker. It can be addressed either as tech debt, or the logging level for Searchguard can be bumped from WARN to ERROR.
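If the binding turns out to be missing, a minimal sketch of one way to recreate the grant (assumptions: the logging project is named `logging`, and note this creates a binding with a default name rather than the deployer template's logging-elasticsearch-view-role):

# Grant the view cluster role to the ES service account so the
# discovery plugin can read the logging-es-cluster endpoints.
oc policy add-role-to-user view -z aggregated-logging-elasticsearch -n logging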
Using image "openshift3/logging-fluentd 125ed17f96bc" to deploy logging 3.4 with the json-file log driver, the pods start up successfully, but after checking the es pod logs, "java.lang.IllegalStateException: Message not fully read" still exists; the stack trace is the same as the one xiazhao provided before.

# oc get po
NAME                              READY     STATUS      RESTARTS   AGE
log4j2-ex-1-fwrbu                 1/1       Running     5          1h
logging-curator-1-9s204           1/1       Running     0          1h
logging-curator-ops-1-13bie       1/1       Running     0          1h
logging-deployer-8153d            0/1       Completed   0          1h
logging-es-ops-5h0r7mx8-1-imba0   1/1       Running     0          1h
logging-es-s07ikxrt-1-ebixj       1/1       Running     0          1h
logging-fluentd-phgk5             1/1       Running     0          1h
logging-kibana-4-r4vcq            2/2       Running     0          1h
logging-kibana-ops-4-9y872        2/2       Running     0          1h
(In reply to ewolinet from comment #5)
> Can you confirm that the following role binding for the ES service account
> had been created? `oc get rolebindings`.

Checked that the role binding logging-elasticsearch-view-role is in place in my env.
So somehow using the journald log driver is causing a stacktrace in elasticsearch? This doesn't make much sense. If you do a fresh, clean deploy of everything with the json-file log driver, do you still see the es stacktrace?
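For the retest, a minimal sketch of switching a node back to the json-file log driver (assumption: a RHEL host whose docker daemon is configured through /etc/sysconfig/docker; fluentd reads from the matching log source, so it needs to be redeployed after the change):

# Replace the journald log driver with json-file in the docker daemon
# options, then restart docker so new containers pick up the change.
sed -i 's/--log-driver=journald/--log-driver=json-file/' /etc/sysconfig/docker
systemctl restart docker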
I'm unable to recreate this on my environment.

What is the output from:
$ oc get rolebindings logging-elasticsearch-view-role -o yaml

What is the login information for the environment you are seeing this at?

I see the following as output on my environment:

$ oc get rolebindings logging-elasticsearch-view-role -o yaml
apiVersion: v1
groupNames: null
kind: RoleBinding
metadata:
  annotations:
    openshift.io/generated-by: OpenShiftNewApp
  creationTimestamp: 2016-11-09T16:32:08Z
  labels:
    app: logging-deployer-account-template
  name: logging-elasticsearch-view-role
  namespace: logging
  resourceVersion: "655"
  selfLink: /oapi/v1/namespaces/logging/rolebindings/logging-elasticsearch-view-role
  uid: 0ed9a976-a69a-11e6-9c30-0e780dd603fe
roleRef:
  name: view
subjects:
- kind: ServiceAccount
  name: aggregated-logging-elasticsearch
  namespace: logging
userNames:
- system:serviceaccount:logging:aggregated-logging-elasticsearch
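Besides inspecting the binding object, the effective permission can be cross-checked directly; a hedged sketch, assuming the `oc policy who-can` subcommand available in OpenShift 3.x:

# List the subjects allowed to GET endpoints in the logging namespace;
# system:serviceaccount:logging:aggregated-logging-elasticsearch should appear.
oc policy who-can get endpoints -n logging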
(In reply to ewolinet from comment #14)
> Per comment #12, can we confirm if this is reproducible with the latest
> images? It doesn't appear that this issue has been recreated within other
> BZs/tests.

Hi Eric,

With the journald log driver, I can still reproduce the issue in the es and es-ops logs with the latest images on the ops registry; the logging-elasticsearch-view-role exists in my env. The issue seems to happen at the very beginning, before .kibana exists in the cluster metadata, and ES becomes stable and changes status to GREEN in the end. This issue has no impact on my testing; please let me know if you think this is expected, thanks:

# oc get rolebindings logging-elasticsearch-view-role -o yaml
apiVersion: v1
groupNames: null
kind: RoleBinding
metadata:
  annotations:
    openshift.io/generated-by: OpenShiftNewApp
  creationTimestamp: 2016-11-15T06:56:26Z
  labels:
    app: logging-deployer-account-template
  name: logging-elasticsearch-view-role
  namespace: logging
  resourceVersion: "14301"
  selfLink: /oapi/v1/namespaces/logging/rolebindings/logging-elasticsearch-view-role
  uid: a0ca2c9e-ab00-11e6-b7dc-0eae142e41ae
roleRef:
  name: view
subjects:
- kind: ServiceAccount
  name: aggregated-logging-elasticsearch
  namespace: logging
userNames:
- system:serviceaccount:logging:aggregated-logging-elasticsearch

# oc logs -f logging-es-05pwlpqw-1-k3jz1
...
[2016-11-15 02:00:29,546][INFO ][cluster.metadata ] [Arthur Parks] [.searchguard.logging-es-05pwlpqw-1-k3jz1] create_mapping [rolesmapping]
[2016-11-15 02:00:29,923][INFO ][cluster.metadata ] [Arthur Parks] [.searchguard.logging-es-05pwlpqw-1-k3jz1] create_mapping [actiongroups]
[2016-11-15 02:00:30,421][WARN ][com.floragunn.searchguard.ssl.transport.SearchGuardSSLNettyTransport] [Mad Jack] exception caught on transport layer [[id: 0x4ca0ff3f, /127.0.0.1:60038 => localhost/127.0.0.1:9300]], closing connection
java.lang.IllegalStateException: Message not fully read (response) for requestId [13], handler [org.elasticsearch.action.TransportActionNodeProxy$1@43e01214], error [false]; resetting
	at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:145)
	at com.floragunn.searchguard.ssl.transport.SearchGuardMessageChannelHandler.messageReceived(SearchGuardMessageChannelHandler.java:50)
	at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
	at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
	at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
	at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
	at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
	at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
	at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
	at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
	at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
	at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
	at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
	at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
	at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
	at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
	at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
	at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
[2016-11-15 02:00:30,869][WARN ][io.fabric8.elasticsearch.plugin.acl.DynamicACLFilter] [Arthur Parks] Trying to seed ACL when ES has not not yet started: 'None of the configured nodes were available: [{Arthur Parks}{v4BuyZyDTq21tFucjZ2A8Q}{172.30.238.162}{localhost/127.0.0.1:9300}{master=true}]'
Create index template 'com.redhat.viaq-openshift-operations.template.json'
{"acknowledged":true}
Create index template 'com.redhat.viaq-openshift-project.template.json'
...
[2016-11-15 02:00:53,225][INFO ][cluster.routing.allocation] [Arthur Parks] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[project.logging.99da9038-ab00-11e6-b7dc-0eae142e41ae.2016.11.15][0]] ...]).
[2016-11-15 02:00:53,656][INFO ][cluster.metadata ] [Arthur Parks] [project.logging.99da9038-ab00-11e6-b7dc-0eae142e41ae.2016.11.15] update_mapping [com.redhat.viaq.common]
[2016-11-15 02:03:36,255][INFO ][cluster.metadata ] [Arthur Parks] [project.logging.99da9038-ab00-11e6-b7dc-0eae142e41ae.2016.11.15] update_mapping [com.redhat.viaq.common]
Xia,

The stack trace you are seeing is a non-issue. It is something in Searchguard that we will investigate as technical debt; we noticed it when we switched to a later version of Searchguard.

My apologies: since comment #5 I had thought we were no longer looking at the "Message not fully read" stack trace and were only focusing on "Configured service account doesn't have access".

If we are no longer receiving that second message, I think this can be marked as verified.
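For reference on the WARN-to-ERROR suggestion from comment #5, a minimal sketch of raising the Searchguard logger threshold (assumption: the image uses an ES 2.x style logging.yml in the Elasticsearch config directory):

# logging.yml fragment: suppress the transport-layer WARN noise
# by only logging Searchguard messages at ERROR and above.
logger:
  com.floragunn.searchguard: ERROR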
@Eric, thanks for the info. Yes, I no longer see the second message "Configured service account doesn't have access" with the latest images these days. Closing this as verified.
Prerelease issue, no docs needed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:0066