Bug 1907370 - Log Forwarding to Kafka Fails with error error_class=Kafka::DeliveryFailed error="Failed to send messages to flux-openshift-v4/1"
Summary: Log Forwarding to Kafka Fails with error error_class=Kafka::DeliveryFailed error="Failed to send messages to flux-openshift-v4/1"
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Documentation
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.7.0
Assignee: Rolfe Dlugy-Hegwer
QA Contact: Xiaoli Tian
Docs Contact: Vikram Goyal
URL:
Whiteboard: logging-core
Depends On:
Blocks:
 
Reported: 2020-12-14 11:11 UTC by puraut
Modified: 2024-03-25 17:31 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Known Issue
Doc Text:
* Fluentd pods with the ruby-kafka-1.1.0 and fluent-plugin-kafka-0.13.1 gems are not compatible with Apache Kafka version 0.10.1.0.
+
As a result, log forwarding to Kafka fails with a message: error_class=Kafka::DeliveryFailed error="Failed to send messages to flux-openshift-v4/1"
+
The ruby-kafka-0.7 gem dropped support for Kafka 0.10 in favor of native support for Kafka 0.11. The ruby-kafka-1.0.0 gem added support for Kafka 2.3 and 2.4. The current version of OpenShift Logging tests and therefore supports Kafka version 2.4.1.
+
To work around this issue, upgrade to a supported version of Apache Kafka.
+
(link:https://bugzilla.redhat.com/show_bug.cgi?id=1907370[*BZ#1907370*])
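
For anyone verifying this known issue on a live cluster, a minimal Ruby sketch such as the following can confirm which gem versions the collector image actually ships. The pod name and script path are placeholders; run it with something like `oc exec <fluentd-pod> -- ruby /tmp/check_gems.rb`:

~~~
# check_gems.rb: print the versions of the two gems named in the Doc Text above.
require "rubygems"

%w[ruby-kafka fluent-plugin-kafka].each do |name|
  begin
    spec = Gem::Specification.find_by_name(name)
    # On the affected images this prints ruby-kafka 1.1.0 and fluent-plugin-kafka 0.13.1.
    puts "#{name} #{spec.version}"
  rescue Gem::LoadError
    puts "#{name} not installed"
  end
end
~~~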
Clone Of:
Environment:
Last Closed: 2021-02-09 19:52:20 UTC
Target Upstream Version:
Embargoed:
rdlugyhe: needinfo-




Links
- Red Hat Knowledge Base (Solution) 5820051 (last updated 2021-02-19 19:17:01 UTC)

Comment 1 Jeff Cantrill 2020-12-14 17:09:58 UTC
(In reply to puraut from comment #0)
> Description of problem:
> The customer is trying to forward the logs to kafka broker using an official
> document.[1] 

Please provide the link to the documentation you are referencing.
 
> Steps to Reproduce:
> 1. Install Kafka 0.10 and try to forward logs from ES to Kafka

OpenShift logging does not support forwarding logs from ES to Kafka. Please provide more information on how this error is produced.

> Expected results:
> It should be able to forward logs to Kafka.
> 
> Additional info:
> 1] I tried to reproduce the issue with Kafka 2.5.0 with the help of an AWS
> AMI. At the start I faced the same error, but after changing the
> configuration on the Kafka side I am able to see the logs.
> 2] The customer has tried configuring a fluentd agent on a legacy server.
> This fluentd agent sends logs to Kafka.

Please clarify what is meant by "fluentd agent" and "legacy server". Do you mean fluentd provided by the customer or provided by OpenShift? The OpenShift fluentd image with the Kafka plugin is a relatively recent addition, and full support via log forwarding has only been GA since the 4.6 release.

> Whereas it fails to do the same with RHOCP 4.6.
> 3] I tried to reproduce the same with Kafka 0.10
> [bitnami-kafka-0.10.1.1-0-linux-ubuntu-14.04.3-x86_64-hvm-ebs] with an AWS
> community AMI but got the same error.

Comment 4 Jeff Cantrill 2020-12-18 12:03:26 UTC
(In reply to Yogiraj Karajagi from comment #3)

> --- 
> To be more clear, on my openshift v3.11 cluster, Fluentd pods send logs to a
> fluent td-agent installed on a redhat vmware server. This td-agent push log
> to kafka.
> The td-agent version is td-agent-3.0.1-0.el7.x86_64.rpm.
> It can be downloaded on fluentd site at
> https://td-agent-package-browser.herokuapp.com/3/redhat/7/x86_64.
>

The OpenShift Logging team does not support the configuration you describe. You are referencing issues with a component and deployment that are both administered by the customer and delivered by at least a different engineering group, and possibly a different company. 'td-agent' is the commercial product name Treasure Data gives to fluentd. I recommend opening an issue directly with Treasure Data.

Closing this issue as CANTFIX.

Comment 9 Oscar Casal Sanchez 2021-01-12 11:14:16 UTC
Hello, 

I was able to reproduce this issue in our labs. To do so:


[Steps to Reproduce]

1. Deploy an OCP 4.6 cluster
2. Deploy the Cluster Logging stack following the documentation [1]
3. Configure a Kafka broker using version 0.10.1.1. This version is supported, as indicated here [2], which says:

    Ruby 2.1 or later
    Input plugins work with kafka v0.9 or later
    Output plugins work with kafka v0.8 or later

4. Configure the cluster logging stack to forward logs to the Kafka broker following the documentation [3].
5. Check the fluentd logs, which show the error when sending to the external Kafka broker:

~~~
2020-12-10 15:24:06 +0000 [warn]: Send exception occurred: Failed to send messages to flux-openshift-v4/1
2020-12-10 15:24:06 +0000 [warn]: Exception Backtrace : /usr/local/share/gems/gems/fluent-plugin-kafka-0.13.1/lib/fluent/plugin/kafka_producer_ext.rb:240:in `deliver_messages_with_retries'
/usr/local/share/gems/gems/fluent-plugin-kafka-0.13.1/lib/fluent/plugin/kafka_producer_ext.rb:126:in `deliver_messages'
/usr/local/share/gems/gems/fluent-plugin-kafka-0.13.1/lib/fluent/plugin/out_kafka2.rb:270:in `write'
/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/output.rb:1125:in `try_flush'
/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/output.rb:1431:in `flush_thread_run'
/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/output.rb:461:in `block (2 levels) in start'
/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2020-12-10 15:24:06 +0000 [warn]: failed to flush the buffer. retry_time=13 next_retry_seconds=2020-12-10 15:29:26 +0000 chunk="5b5907447d28804082b18b56af32c054" error_class=Kafka::DeliveryFailed error="Failed to send messages to flux-openshift-v4/1"
  2020-12-10 15:24:06 +0000 [warn]: suppressed same stacktrace
~~~
6. Check the Kafka broker logs, which show the version error (a standalone repro sketch follows this log):

~~~
[2020-12-30 15:48:05,250] ERROR Closing socket for 172.31.29.210:9092-209.132.189.136:2153 because of error (kafka.network.Processor)
org.apache.kafka.common.errors.InvalidRequestException: Error getting request for apiKey: 0 and apiVersion: 3
Caused by: java.lang.IllegalArgumentException: Invalid version for API key 0: 3
        at org.apache.kafka.common.protocol.ProtoUtils.schemaFor(ProtoUtils.java:31)
        at org.apache.kafka.common.protocol.ProtoUtils.requestSchema(ProtoUtils.java:44)
        at org.apache.kafka.common.protocol.ProtoUtils.parseRequest(ProtoUtils.java:60)
        at org.apache.kafka.common.requests.ProduceRequest.parse(ProduceRequest.java:134)
        at org.apache.kafka.common.requests.AbstractRequest.getRequest(AbstractRequest.java:42)
        at kafka.network.RequestChannel$Request.liftedTree2$1(RequestChannel.scala:96)
        at kafka.network.RequestChannel$Request.<init>(RequestChannel.scala:91)
        at kafka.network.Processor$$anonfun$processCompletedReceives$1.apply(SocketServer.scala:492)
        at kafka.network.Processor$$anonfun$processCompletedReceives$1.apply(SocketServer.scala:487)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
        at kafka.network.Processor.processCompletedReceives(SocketServer.scala:487)
        at kafka.network.Processor.run(SocketServer.scala:417)
        at java.lang.Thread.run(Thread.java:745)
[2020-12-30 15:48:07,695] ERROR Closing socket for 172.31.29.210:9092-209.132.189.136:21620 because of error (kafka.network.Processor)
~~~
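
The broker-side stack trace above is the clearest diagnostic: the client sent a ProduceRequest (apiKey 0) at version 3, a wire format the broker only understands from Kafka 0.11 onward, so the 0.10.1 broker rejects it. As a cross-check outside fluentd, a minimal sketch using the same ruby-kafka gem reproduces the failure; the broker address is a placeholder, and the topic matches the one in the logs:

~~~
# repro.rb: attempt a single delivery with ruby-kafka, the client library
# that fluent-plugin-kafka wraps. Against a Kafka 0.10.1 broker this raises
# Kafka::DeliveryFailed, matching the fluentd error in step 5.
require "kafka"

kafka = Kafka.new(["broker.example.com:9092"], client_id: "bz1907370-repro")
producer = kafka.producer

begin
  producer.produce("hello from ruby-kafka", topic: "flux-openshift-v4")
  producer.deliver_messages
  puts "delivered"
rescue Kafka::DeliveryFailed => e
  puts "delivery failed: #{e.message}"
ensure
  producer.shutdown
end
~~~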

This is always reproducible with Kafka broker version 0.10.1.1, and it appears to be a version incompatibility between components. Searching for the error, we can find these references to the same issue:

- https://github.com/fluent/fluent-plugin-kafka/issues/232
- https://github.com/zendesk/ruby-kafka/issues/731

Then, as indicated before, it seems to be a version incompatibility between some components. Could you confirm this? If this is the case, perhaps we should indicate it in the requirements for:

- the fluent-plugin-kafka plugin here [2]
- the OpenShift Logging documentation [3]

[1] https://docs.openshift.com/container-platform/4.6/logging/cluster-logging-deploying.html
[2] https://github.com/fluent/fluent-plugin-kafka/tree/v0.13.1#requirements
[3] https://docs.openshift.com/container-platform/4.6/logging/cluster-logging-external.html#cluster-logging-collector-log-forward-kafka_cluster-logging-external

Comment 11 Jeff Cantrill 2021-01-21 22:33:16 UTC
Moving this to a documentation issue so we can properly expose compatibility. Based on [1], one of the dependencies supports only Kafka 0.11.0 and later. We test [2] against Kafka 2.4.1. What should we document? (A compatibility pre-flight sketch follows the links below.)

[1] https://github.com/zendesk/ruby-kafka/blob/master/CHANGELOG.md#070
[2] https://github.com/openshift/cluster-logging-operator/blob/master/test/helpers/kafka/broker.go#L141
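
If we document a compatibility requirement, a pre-flight check against the target broker may be worth mentioning. A sketch, assuming ruby-kafka's `Kafka::Client#apis` is available in the shipped gem version (it issues an ApiVersions request, which brokers older than Kafka 0.10.0 do not answer at all); the broker address is a placeholder:

~~~
# preflight.rb: list the API versions the target broker advertises.
# A 0.10.1 broker caps Produce (api_key 0) below the v3 wire format this
# client sends, which is the incompatibility behind this bug; a 2.4.x
# broker advertises much higher maximum versions.
require "kafka"

kafka = Kafka.new(["broker.example.com:9092"], client_id: "compat-check")
kafka.apis.each { |api| puts api }
~~~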

