Bug 1859004 - Sometimes the eventrouter couldn't gather event logs.
Summary: Sometimes the eventrouter couldn't gather event logs.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.6.z
Assignee: Vitalii Parfonov
QA Contact: Anping Li
URL:
Whiteboard: logging-core
Depends On:
Blocks:
 
Reported: 2020-07-21 04:05 UTC by Qiaoling Tang
Modified: 2021-05-19 13:55 UTC
CC: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-04-27 08:58:26 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift eventrouter pull 13 0 None open Bug 1859004: Sometimes the eventrouter couldn't gather event logs: switch to go mod and update k8s apimatcher version 2021-03-10 08:56:03 UTC
Red Hat Knowledge Base (Solution) 6058431 0 None None None 2021-05-19 13:55:21 UTC

Description Qiaoling Tang 2020-07-21 04:05:02 UTC
Description of problem:
After deploying logging and the eventrouter, the eventrouter can't gather event logs; the eventrouter pod logs contain many error messages like the following:

I0721 00:23:52.890323       1 reflector.go:240] Listing and watching *v1.Event from github.com/openshift/eventrouter/vendor/k8s.io/client-go/informers/factory.go:73
E0721 00:23:52.904347       1 reflector.go:205] github.com/openshift/eventrouter/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1.Event: v1.EventList: Items: []v1.Event: v1.Event: ObjectMeta: v1.ObjectMeta: readObjectFieldAsBytes: expect : after object field, but found u, error found in #10 byte of ...|:{},"k:{\"uid\":\"30|..., bigger context ...|},"f:metadata":{"f:ownerReferences":{".":{},"k:{\"uid\":\"303e9c69-80bf-4001-9ccf-25c8f1f4c14e\"}":{|...
I0721 00:23:53.904461       1 reflector.go:240] Listing and watching *v1.Event from github.com/openshift/eventrouter/vendor/k8s.io/client-go/informers/factory.go:73
E0721 00:23:53.922332       1 reflector.go:205] github.com/openshift/eventrouter/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1.Event: v1.EventList: Items: []v1.Event: v1.Event: ObjectMeta: v1.ObjectMeta: readObjectFieldAsBytes: expect : after object field, but found u, error found in #10 byte of ...|:{},"k:{\"uid\":\"30|..., bigger context ...|},"f:metadata":{"f:ownerReferences":{".":{},"k:{\"uid\":\"303e9c69-80bf-4001-9ccf-25c8f1f4c14e\"}":{|...
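
For context, here is a minimal client-go sketch (not the eventrouter's actual source; the handler and resync interval are illustrative) of the List/Watch path these errors come from: a shared informer factory whose reflector lists and watches v1 Events, and whose initial List response is what fails to decode above.

package main

import (
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	// In-cluster config, as used when running as a pod under a service account.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// The reflector behind this factory performs the List and Watch calls that
	// show up in the log lines above (informers/factory.go, reflector.go).
	factory := informers.NewSharedInformerFactory(client, 30*time.Minute)
	eventInformer := factory.Core().V1().Events().Informer()

	eventInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			ev := obj.(*corev1.Event)
			// The real eventrouter forwards events to its configured sink;
			// here we just print them.
			fmt.Printf("%s/%s: %s\n", ev.Namespace, ev.Name, ev.Message)
		},
	})

	stop := make(chan struct{})
	factory.Start(stop)
	cache.WaitForCacheSync(stop, eventInformer.HasSynced)
	select {} // block forever; the informer keeps watching in the background
}

The decode failure above ("expect : after object field") appears consistent with an older vendored apimachinery/json-iterator that cannot parse the managedFields keys (the k:{\"uid\":...} entries) returned by newer API servers; updating the vendored k8s libraries, as the linked PR does, is the kind of change that avoids it.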



Version-Release number of selected component (if applicable):
ose-logging-eventrouter-v4.5.0-202007172106.p0
cluster version: 4.5.0-0.nightly-2020-07-20-152128 


How reproducible:
In some clusters it is 100% reproducible; in others, the issue does not occur.

Steps to Reproduce:
1. Deploy logging.
2. Deploy the eventrouter in the openshift-logging namespace with the following template:
kind: Template
apiVersion: v1
metadata:
  name: eventrouter-template
  annotations:
    description: "A pod forwarding kubernetes events to cluster logging stack."
    tags: "events,EFK,logging, cluster-logging"
objects:
  - kind: ServiceAccount 
    apiVersion: v1
    metadata:
      name: cluster-logging-eventrouter
      namespace: ${NAMESPACE}
  - kind: ClusterRole 
    apiVersion: v1
    metadata:
      name: event-reader
    rules:             
    - apiGroups: [""]
      resources: ["events"]
      verbs: ["get", "watch", "list"]
  - kind: ClusterRoleBinding  
    apiVersion: v1
    metadata:
      name: event-reader-binding
    subjects:
    - kind: ServiceAccount
      name: cluster-logging-eventrouter
      namespace: ${NAMESPACE}
    roleRef:
      kind: ClusterRole
      name: event-reader
  - kind: ConfigMap
    apiVersion: v1
    metadata:
      name: cluster-logging-eventrouter
      namespace: ${NAMESPACE}
    data:
      config.json: |-
        {
          "sink": "stdout"
        }
  - kind: Deployment
    apiVersion: apps/v1
    metadata:
      name: cluster-logging-eventrouter
      namespace: ${NAMESPACE}
      labels:
        component: eventrouter
        logging-infra: eventrouter
        provider: openshift
    spec:
      selector:
        matchLabels:
          component: eventrouter
          logging-infra: eventrouter
          provider: openshift
      replicas: 1
      template:
        metadata:
          labels:
            component: eventrouter
            logging-infra: eventrouter
            provider: openshift
          name: cluster-logging-eventrouter
        spec:
          serviceAccount: cluster-logging-eventrouter
          containers:
            - name: kube-eventrouter
              image: ${IMAGE}
              imagePullPolicy: IfNotPresent
              resources:
                limits:
                  memory: ${MEMORY}
                requests:
                  cpu: ${CPU}
                  memory: ${MEMORY}
              volumeMounts:
              - name: config-volume
                mountPath: /etc/eventrouter
          volumes:
            - name: config-volume
              configMap:
                name: cluster-logging-eventrouter
parameters:
  - name: IMAGE  
    displayName: Image
    value: "image-registry.openshift-image-registry.svc:5000/openshift/ose-logging-eventrouter:latest"
  - name: MEMORY 
    displayName: Memory
    value: "128Mi"
  - name: CPU  
    displayName: CPU
    value: "100m"
  - name: NAMESPACE  
    displayName: Namespace
    value: "openshift-logging"
3. Check the eventrouter pod logs.

Actual results:


Expected results:


Additional info:
must-gather: http://file.apac.redhat.com/~qitang/must-gather.tar.gz

Comment 1 Anping Li 2020-07-21 06:16:53 UTC
The issue couldn't be reproduced.

Comment 2 Periklis Tsirakidis 2020-07-22 12:35:26 UTC
I cannot reproduce this issue. This is not really a bug; I suspect something went wrong on the transfer side and the JSON payload was incomplete. I will close this for now until someone can reproduce it.

Comment 8 David Hernández Fernández 2020-11-10 09:12:50 UTC
For reference, this is also happening in 4.6; I see the same logs after trying the template.

Comment 9 Vitalii Parfonov 2020-11-24 21:52:44 UTC
I can't reproduce this bug on 4.7 (current master).

Comment 13 Jeff Cantrill 2021-01-29 22:05:37 UTC
I am unable to reproduce this issue after deploying the pod as described in [1]:

Using 4.6 dev cluster:

$ oc version
Client Version: 4.5.0-0.nightly-2020-04-21-103613
Server Version: 4.6.0-0.nightly-2020-09-28-110510
Kubernetes Version: v1.19.0+e465e66

Image info:

    Image:          registry.redhat.io/openshift4/ose-logging-eventrouter:latest
    Image ID:       registry.redhat.io/openshift4/ose-logging-eventrouter@sha256:40433a3b3eaf34126c81d62ca7755675a5e25bc4489793dff7924abe447005ca




[1] https://docs.openshift.com/container-platform/4.6/logging/cluster-logging-eventrouter.html

Comment 14 David Hernández Fernández 2021-02-03 08:23:32 UTC
It happens intermittently, so I expect it won't be easy to reproduce. Is there any other information about the use case that might be causing the issue? I'm not sure we can help with a reproducer, but if you need any additional logs, let us know.

Comment 15 Jeff Cantrill 2021-02-03 21:57:22 UTC
(In reply to David Hernández Fernández from comment #14)
> It's happening sometimes, 

Can you clarify this statement? Does it run for several days or hours and then stop producing events?
Is it transient, i.e. a short blip after which the pod recovers and continues to collect events?
Is it possible that the watch expires and is not cleaned up properly?

Comment 17 Andreas Bleischwitz 2021-02-26 16:17:51 UTC
It works for a few hours, but then it stops working and the following messages appear in the log:

E0226 07:09:06.235292       1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=295, ErrCode=NO_ERROR, debug=""
E0226 07:09:06.235562       1 reflector.go:315] github.com/openshift/eventrouter/vendor/k8s.io/client-go/informers/factory.go:73: Failed to watch *v1.Event: Get "https://10.83.0.1:443/api/v1/events?resourceVersion=351123363&timeoutSeconds=578&watch=true": http2: no cached connection was available
I0226 07:09:07.235684       1 reflector.go:240] Listing and watching *v1.Event from github.com/openshift/eventrouter/vendor/k8s.io/client-go/informers/factory.go:73
E0226 07:09:07.235894       1 reflector.go:205] github.com/openshift/eventrouter/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1.Event: Get "https://10.83.0.1:443/api/v1/events?resourceVersion=0": http2: no cached connection was available
I0226 07:09:08.236123       1 reflector.go:240] Listing and watching *v1.Event from github.com/openshift/eventrouter/vendor/k8s.io/client-go/informers/factory.go:73
E0226 07:09:08.236424       1 reflector.go:205] github.com/openshift/eventrouter/vendor/k8s.io/client-go/informers/factory.go:73: Failed to list *v1.Event: Get "https://10.83.0.1:443/api/v1/events?resourceVersion=0": http2: no cached connection was available

So it looks like the connection to the API server does not get re-established once it is dropped.
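
For reference, here is a minimal sketch (again not the eventrouter's actual code; names are illustrative) of the reflector loop behind the repeating "Listing and watching" / "Failed to list" pairs above. The reflector itself retries after every failure, but each retry goes through the same client transport, which matches the behaviour described here: the "http2: no cached connection was available" errors suggest the client keeps reusing a connection pool that never recovers after the GOAWAY, something older vendored golang.org/x/net/http2 versions were known to do and that updating the dependencies typically resolves.

package main

import (
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// ListWatch + Reflector: the loop that logs "Listing and watching *v1.Event"
	// and retries after each List/Watch failure. All requests go through the
	// same underlying transport, so a transport that never recovers makes
	// every retry fail.
	lw := cache.NewListWatchFromClient(
		client.CoreV1().RESTClient(), "events", metav1.NamespaceAll, fields.Everything())
	store := cache.NewStore(cache.MetaNamespaceKeyFunc)
	r := cache.NewReflector(lw, &corev1.Event{}, store, 30*time.Minute)

	stop := make(chan struct{})
	r.Run(stop) // blocks until stop is closed
}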

Hope it helps to resolve that issue.

/Andreas

Comment 21 benedikt.wagener 2021-03-08 10:34:44 UTC
We are facing the exact same issue, including the
E0306 01:39:01.630417       1 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=297, ErrCode=NO_ERROR, debug=""
error.

Hoping this problem will receive some attention soon.

Comment 22 Vitalii Parfonov 2021-03-09 08:04:53 UTC
Thanks, this information helps us. We are already working on a solution.

Comment 25 Jeff Cantrill 2021-03-22 14:02:15 UTC
Moved to ASSIGNED to resolve this for 4.6. Created https://issues.redhat.com/browse/LOG-1230 to address it in 5.x.

Comment 34 Anping Li 2021-03-31 07:36:39 UTC
No such issue with the image provided by Vitalii Parfonov. Either the PR wasn't merged, or some packages were missing when the downstream image was built.

Comment 41 Anping Li 2021-04-22 02:17:37 UTC
Verified and passed using the image v4.6.26.

Comment 43 errata-xmlrpc 2021-04-27 08:58:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.6.26 security and extras update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:1230

