Bug 1392767 - Encountered exception in ES: "Configured service account doesn't have access"
Summary: Encountered exception in ES: "Configured service account doesn't have access"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.4.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: ewolinet
QA Contact: Xia Zhao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-11-08 08:25 UTC by Xia Zhao
Modified: 2017-03-08 18:43 UTC
CC List: 7 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-01-18 12:50:45 UTC
Target Upstream Version:
Embargoed:


Attachments
es_log (16.30 KB, text/plain)
2016-11-08 08:31 UTC, Xia Zhao
no flags
fluentd_log (10.59 KB, text/plain)
2016-11-08 08:31 UTC, Xia Zhao
no flags
kibana log(container: kibana-proxy) (94 bytes, text/plain)
2016-11-08 08:32 UTC, Xia Zhao
no flags


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:0066 0 normal SHIPPED_LIVE Red Hat OpenShift Container Platform 3.4 RPM Release Advisory 2017-01-18 17:23:26 UTC

Description Xia Zhao 2016-11-08 08:25:12 UTC
Description of problem:
Deployed logging with the journald log driver enabled and encountered the following exceptions during the ES initialization phase:
[2016-11-08 00:48:46,849][WARN ][io.fabric8.elasticsearch.discovery.kubernetes.KubernetesUnicastHostsProvider] [Fashima] Exception caught during discovery: Failure executing: GET at: https://kubernetes.default.svc/api/v1/namespaces/xiazhao/endpoints/logging-es-cluster. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked..
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://kubernetes.default.svc/api/v1/namespaces/xiazhao/endpoints/logging-es-cluster. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked..
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:290)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:241)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:212)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:205)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:510)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:118)
    at io.fabric8.elasticsearch.cloud.kubernetes.KubernetesAPIServiceImpl.endpoints(KubernetesAPIServiceImpl.java:35)
    at io.fabric8.elasticsearch.discovery.kubernetes.KubernetesUnicastHostsProvider.readNodes(KubernetesUnicastHostsProvider.java:112)
    at io.fabric8.elasticsearch.discovery.kubernetes.KubernetesUnicastHostsProvider.lambda$buildDynamicNodes$0(KubernetesUnicastHostsProvider.java:80)
    at java.security.AccessController.doPrivileged(Native Method)
    at io.fabric8.elasticsearch.discovery.kubernetes.KubernetesUnicastHostsProvider.buildDynamicNodes(KubernetesUnicastHostsProvider.java:79)
    at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.sendPings(UnicastZenPing.java:335)
    at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing$2$1.doRun(UnicastZenPing.java:253)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)


[2016-11-08 00:48:48,983][INFO ][cluster.routing.allocation] [Fashima] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[.searchguard.logging-es-574e4loc-1-mv9vx][0]] ...]).
...
[2016-11-08 00:48:49,520][INFO ][cluster.metadata         ] [Fashima] [.searchguard.logging-es-574e4loc-1-mv9vx] create_mapping [actiongroups]
[2016-11-08 00:48:49,580][WARN ][com.floragunn.searchguard.ssl.transport.SearchGuardSSLNettyTransport] [Proteus] exception caught on transport layer [[id: 0x508bfe10, /127.0.0.1:57584 => localhost/127.0.0.1:9300]], closing connection
java.lang.IllegalStateException: Message not fully read (response) for requestId [9], handler [org.elasticsearch.action.TransportActionNodeProxy$1@575cf0b8], error [false]; resetting
    at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:145)
    at com.floragunn.searchguard.ssl.transport.SearchGuardMessageChannelHandler.messageReceived(SearchGuardMessageChannelHandler.java:50)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
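
To test directly whether the service account can reach the endpoint the discovery plugin queries, one minimal probe is to replay the GET with the pod's own token (a sketch; the token path is the standard Kubernetes service account mount, and the URL is copied from the exception above):

    # run from inside the ES pod, e.g. via `oc rsh logging-es-574e4loc-1-mv9vx`
    TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
    curl -sk -H "Authorization: Bearer $TOKEN" \
      https://kubernetes.default.svc/api/v1/namespaces/xiazhao/endpoints/logging-es-cluster
    # a 403 Forbidden here reproduces the "Configured service account doesn't have access" error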

Version-Release number of selected component (if applicable):
logging images on brew registry:
openshift3/logging-deployer    5f4ddb8cf40d
openshift3/logging-elasticsearch    9b9452c0f8c2
openshift3/logging-kibana    7fc9916eea4d
openshift3/logging-auth-proxy    0d8e3202af61
openshift3/logging-fluentd    22aea57e7576
openshift3/logging-curator    9af78fc06248

# openshift version
openshift v3.4.0.22+5c56720
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0


How reproducible:
Always

Steps to Reproduce:
1. Deploy logging 3.4.0 with the journald log driver enabled (see the driver sketch after these steps)
2. Check the logging stack status:
# oc get po
NAME                          READY     STATUS             RESTARTS   AGE
logging-curator-1-o8s4q       1/1       Running            0          2h
logging-deployer-as85s        0/1       Completed          0          2h
logging-es-574e4loc-1-mv9vx   1/1       Running            0          2h
logging-fluentd-pbwys         1/1       Running            0          2h
logging-kibana-1-1kn2u        1/2       CrashLoopBackOff   31         2h
3. Check for exceptions in the ES logs during the initialization phase
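
For reference, enabling the journald log driver means pointing the Docker daemon at journald before deploying (a sketch for RHEL-based nodes; the /etc/sysconfig/docker path and OPTIONS variable are assumptions about the node setup):

    # /etc/sysconfig/docker on each node: add the driver to the OPTIONS line, e.g.
    #   OPTIONS='--selinux-enabled --log-driver=journald'
    # then restart Docker so containers come back up with the new driver
    systemctl restart docker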

Actual results:
Encountered exception in ES: "Configured service account doesn't have access" and "java.lang.IllegalStateException: Message not fully read"

Expected results:
The ES logs should not contain exceptions.

Additional info:
EFK logs attached

Comment 1 Xia Zhao 2016-11-08 08:31:22 UTC
Created attachment 1218438 [details]
es_log

Comment 2 Xia Zhao 2016-11-08 08:31:49 UTC
Created attachment 1218439 [details]
fluentd_log

Comment 3 Xia Zhao 2016-11-08 08:32:27 UTC
Created attachment 1218440 [details]
kibana log(container: kibana-proxy)

Comment 4 Xia Zhao 2016-11-08 08:34:56 UTC
Note: The fluentd and kibana-proxy container problems are tracked separately in https://bugzilla.redhat.com/show_bug.cgi?id=1392104 and https://bugzilla.redhat.com/show_bug.cgi?id=1392104

Comment 5 ewolinet 2016-11-08 14:43:50 UTC
Xia,

The "Configured service account doesn't have access" exception is most likely caused by a missed step with setting up the ES service account.

Can you confirm that the following role binding for the ES service account has been created? Check with `oc get rolebindings`.

Within template/logging-deployer-account-template there should be the following.

    apiVersion: v1
    kind: RoleBinding
    metadata:
      name: logging-elasticsearch-view-role
    roleRef:
      kind: ClusterRole
      name: view
    subjects:
    - kind: ServiceAccount
      name: aggregated-logging-elasticsearch
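
If the binding turns out to be missing, an equivalent one can be created directly (a sketch, assuming the logging project and the service account named above; note that oc picks its own name for the resulting binding rather than logging-elasticsearch-view-role):

    oc policy add-role-to-user view \
      system:serviceaccount:logging:aggregated-logging-elasticsearch \
      -n logging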


The other 'error' comes from Searchguard: it looks like its connection closed and it printed a stack trace. It later reopens the connection, so I don't believe this is a blocker. It can be addressed either as tech debt, or the Searchguard logging level can be raised from WARN to ERROR.
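
One way to apply that bump without rebuilding the image is ES's dynamic logger settings via the cluster settings API (a sketch; it assumes the admin client certs the logging image mounts under /etc/elasticsearch/secret/, since Searchguard requires client-cert auth on port 9200):

    oc exec logging-es-574e4loc-1-mv9vx -- curl -s -XPUT \
      --key /etc/elasticsearch/secret/admin-key \
      --cert /etc/elasticsearch/secret/admin-cert \
      --cacert /etc/elasticsearch/secret/admin-ca \
      "https://localhost:9200/_cluster/settings" \
      -d '{"transient":{"logger.com.floragunn.searchguard":"ERROR"}}'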

Comment 7 Junqi Zhao 2016-11-10 07:13:47 UTC
Using image "openshift3/logging-fluentd    125ed17f96bc" to deploy logging 3.4 with the json-file log driver, the pods start up successfully, but after checking the es pod logs, "java.lang.IllegalStateException: Message not fully read" still exists; the stack trace is the same as the one xiazhao provided before.

oc get po
NAME                              READY     STATUS      RESTARTS   AGE
log4j2-ex-1-fwrbu                 1/1       Running     5          1h
logging-curator-1-9s204           1/1       Running     0          1h
logging-curator-ops-1-13bie       1/1       Running     0          1h
logging-deployer-8153d            0/1       Completed   0          1h
logging-es-ops-5h0r7mx8-1-imba0   1/1       Running     0          1h
logging-es-s07ikxrt-1-ebixj       1/1       Running     0          1h
logging-fluentd-phgk5             1/1       Running     0          1h
logging-kibana-4-r4vcq            2/2       Running     0          1h
logging-kibana-ops-4-9y872        2/2       Running     0          1h

Comment 8 Xia Zhao 2016-11-10 08:16:07 UTC
(In reply to ewolinet from comment #5)
> Xia,
> 
> The "Configured service account doesn't have access" exception is most
> likely caused by a missed step with setting up the ES service account.
> 
> Can you confirm that the following role binding for the ES service account
> has been created? Check with `oc get rolebindings`.
> 
> Within template/logging-deployer-account-template there should be the
> following.
> 
>     apiVersion: v1
>     kind: RoleBinding
>     metadata:
>       name: logging-elasticsearch-view-role
>     roleRef:
>       kind: ClusterRole
>       name: view
>     subjects:
>     - kind: ServiceAccount
>       name: aggregated-logging-elasticsearch
> 
> 
> The other 'error' comes from Searchguard: it looks like its connection
> closed and it printed a stack trace. It later reopens the connection, so
> I don't believe this is a blocker. It can be addressed either as tech
> debt, or the Searchguard logging level can be raised from WARN to ERROR.
I checked that the role binding logging-elasticsearch-view-role is in place in my env.

Comment 9 Rich Megginson 2016-11-10 15:26:36 UTC
So somehow using the journald log driver is causing a stacktrace in elasticsearch?  This doesn't make much sense.  If you do a fresh, clean deploy of everything with the json-file log driver, do you still see the es stacktrace?

Comment 11 ewolinet 2016-11-10 16:14:46 UTC
I'm unable to recreate this in my environment. What is the output from:
  $ oc get rolebindings logging-elasticsearch-view-role -o yaml


What is the login information for the environment where you are seeing this?


I see the following as output on my environment:

$ oc get rolebindings logging-elasticsearch-view-role -o yaml
apiVersion: v1
groupNames: null
kind: RoleBinding
metadata:
  annotations:
    openshift.io/generated-by: OpenShiftNewApp
  creationTimestamp: 2016-11-09T16:32:08Z
  labels:
    app: logging-deployer-account-template
  name: logging-elasticsearch-view-role
  namespace: logging
  resourceVersion: "655"
  selfLink: /oapi/v1/namespaces/logging/rolebindings/logging-elasticsearch-view-role
  uid: 0ed9a976-a69a-11e6-9c30-0e780dd603fe
roleRef:
  name: view
subjects:
- kind: ServiceAccount
  name: aggregated-logging-elasticsearch
  namespace: logging
userNames:
- system:serviceaccount:logging:aggregated-logging-elasticsearch
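
For a quick diff against a working environment, the subjects and userNames fields are the ones that matter; they can be pulled in isolation (a sketch, assuming the logging namespace):

    oc get rolebinding logging-elasticsearch-view-role -n logging \
      -o jsonpath='{.userNames}{"\n"}{.subjects}'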

Comment 15 Xia Zhao 2016-11-15 07:14:29 UTC
(In reply to ewolinet from comment #14)
> Per comment #12, can we confirm if this is reproducible with the latest
> images? It doesn't appear that this issue has been recreated within other
> BZs/tests.

Hi Eric,

With the journald log driver, I can still reproduce the issue in the es and es-ops logs with the latest images on the ops registry; the logging-elasticsearch-view-role exists in my env. The issue seems to happen at the very beginning, before the .kibana index exists in the cluster metadata, and ES becomes stable, with the status finally changing to GREEN. This issue has no impact on my testing; please let me know if you think this is expected, thanks:

# oc get rolebindings logging-elasticsearch-view-role -o yaml
apiVersion: v1
groupNames: null
kind: RoleBinding
metadata:
  annotations:
    openshift.io/generated-by: OpenShiftNewApp
  creationTimestamp: 2016-11-15T06:56:26Z
  labels:
    app: logging-deployer-account-template
  name: logging-elasticsearch-view-role
  namespace: logging
  resourceVersion: "14301"
  selfLink: /oapi/v1/namespaces/logging/rolebindings/logging-elasticsearch-view-role
  uid: a0ca2c9e-ab00-11e6-b7dc-0eae142e41ae
roleRef:
  name: view
subjects:
- kind: ServiceAccount
  name: aggregated-logging-elasticsearch
  namespace: logging
userNames:
- system:serviceaccount:logging:aggregated-logging-elasticsearch



# oc logs -f logging-es-05pwlpqw-1-k3jz1

...
[2016-11-15 02:00:29,546][INFO ][cluster.metadata         ] [Arthur Parks] [.searchguard.logging-es-05pwlpqw-1-k3jz1] create_mapping [rolesmapping]
[2016-11-15 02:00:29,923][INFO ][cluster.metadata         ] [Arthur Parks] [.searchguard.logging-es-05pwlpqw-1-k3jz1] create_mapping [actiongroups]
[2016-11-15 02:00:30,421][WARN ][com.floragunn.searchguard.ssl.transport.SearchGuardSSLNettyTransport] [Mad Jack] exception caught on transport layer [[id: 0x4ca0ff3f, /127.0.0.1:60038 => localhost/127.0.0.1:9300]], closing connection
java.lang.IllegalStateException: Message not fully read (response) for requestId [13], handler [org.elasticsearch.action.TransportActionNodeProxy$1@43e01214], error [false]; resetting
	at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:145)
	at com.floragunn.searchguard.ssl.transport.SearchGuardMessageChannelHandler.messageReceived(SearchGuardMessageChannelHandler.java:50)
	at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
	at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
	at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
	at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
	at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
	at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
	at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
	at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
	at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
	at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
	at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
	at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
	at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
	at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
	at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
	at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
[2016-11-15 02:00:30,869][WARN ][io.fabric8.elasticsearch.plugin.acl.DynamicACLFilter] [Arthur Parks] Trying to seed ACL when ES has not not yet started: 'None of the configured nodes were available: [{Arthur Parks}{v4BuyZyDTq21tFucjZ2A8Q}{172.30.238.162}{localhost/127.0.0.1:9300}{master=true}]'
Create index template 'com.redhat.viaq-openshift-operations.template.json'
{"acknowledged":true}Create index template 'com.redhat.viaq-openshift-project.template.json'



....


[2016-11-15 02:00:53,225][INFO ][cluster.routing.allocation] [Arthur Parks] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[project.logging.99da9038-ab00-11e6-b7dc-0eae142e41ae.2016.11.15][0]] ...]).
[2016-11-15 02:00:53,656][INFO ][cluster.metadata         ] [Arthur Parks] [project.logging.99da9038-ab00-11e6-b7dc-0eae142e41ae.2016.11.15] update_mapping [com.redhat.viaq.common]
[2016-11-15 02:03:36,255][INFO ][cluster.metadata         ] [Arthur Parks] [project.logging.99da9038-ab00-11e6-b7dc-0eae142e41ae.2016.11.15] update_mapping [com.redhat.viaq.common]
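
Since the trace is transient and the cluster ends up GREEN, the final state can be confirmed directly against the health endpoint (a sketch; the /etc/elasticsearch/secret cert paths are an assumption about the logging image, not taken from this report):

    oc exec logging-es-05pwlpqw-1-k3jz1 -- curl -s \
      --key /etc/elasticsearch/secret/admin-key \
      --cert /etc/elasticsearch/secret/admin-cert \
      --cacert /etc/elasticsearch/secret/admin-ca \
      "https://localhost:9200/_cluster/health?pretty"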

Comment 16 ewolinet 2016-11-15 14:13:14 UTC
Xia,

The stack trace you are seeing is a non-issue. It is something in Searchguard we will investigate as technical debt. We noticed it when we switched to a later version of Searchguard.

My apologies; since comment #5 I had thought we were no longer looking at the "Message not fully read" stack trace and were focusing only on "Configured service account doesn't have access".

If we are no longer receiving the second message, I think this can be marked as verified.

Comment 17 Xia Zhao 2016-11-16 01:33:43 UTC
@Eric, thanks for the info. Yes, I no longer receive the second message, "Configured service account doesn't have access", with the latest images these days. Closing this as verified.

Comment 18 ewolinet 2016-12-12 15:46:46 UTC
Prerelease issue, no docs needed.

Comment 20 errata-xmlrpc 2017-01-18 12:50:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0066

