Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1907370

Summary: Log Forwarding to kafka Fails with error_class=Kafka::DeliveryFailed error="Failed to send messages to flux-openshift-v4/1"
Product: OpenShift Container Platform
Reporter: puraut
Component: Documentation
Assignee: Rolfe Dlugy-Hegwer <rdlugyhe>
Status: CLOSED CURRENTRELEASE
QA Contact: Xiaoli Tian <xtian>
Severity: medium
Docs Contact: Vikram Goyal <vigoyal>
Priority: medium
Version: 4.6
CC: aos-bugs, jcantril, jokerman, ocasalsa, periklis, rdlugyhe, ykarajag
Target Milestone: ---
Keywords: Reopened
Target Release: 4.7.0
Flags: rdlugyhe: needinfo-
Hardware: Unspecified
OS: Unspecified
Whiteboard: logging-core
Fixed In Version:
Doc Type: Known Issue
Doc Text:
* Fluentd pods with the ruby-kafka-1.1.0 and fluent-plugin-kafka-0.13.1 gems are not compatible with Apache Kafka version 0.10.1.0.
+
As a result, log forwarding to Kafka fails with the message: error_class=Kafka::DeliveryFailed error="Failed to send messages to flux-openshift-v4/1"
+
The ruby-kafka-0.7 gem dropped support for Kafka 0.10 in favor of native support for Kafka 0.11. The ruby-kafka-1.0.0 gem added support for Kafka 2.3 and 2.4. The current version of OpenShift Logging tests and therefore supports Kafka version 2.4.1.
+
To work around this issue, upgrade to a supported version of Apache Kafka.
+
(link:https://bugzilla.redhat.com/show_bug.cgi?id=1907370[*BZ#1907370*])
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-02-09 19:52:20 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Comment 1 Jeff Cantrill 2020-12-14 17:09:58 UTC
(In reply to puraut from comment #0)
> Description of problem:
> The customer is trying to forward the logs to kafka broker using an official
> document.[1] 

Please provide the link to the documentation you are referencing.
 
> Steps to Reproduce:
> 1.Install Kafka 0.10  and try to forward logs  from ES to kafka

OpenShift logging does not support forwarding logs from ES to Kafka. Please provide more information on how this error is produced.

> Expected results:
> It should able to forward  logs to Kafka 
> 
> Additional info:
> 1] I  tried to reproduce the issue with kafka 2.5.0 with help of AWS AMI  at
> the start I faced the same error but after changing the configuration on
> Kafka side I am able to see the logs.
> 2]customer has tried to configuring fluentd agent on a legacy server. This
> fluentd agent sends logs to Kafka.

Please clarify what is meant by "fluentd" agent and "legacy" server. Do you mean fluentd provided by the customer or provided by OpenShift? The OpenShift fluentd image with the Kafka plugin is a relatively recent addition, and full support via log forwarding has only been GA since the 4.6 release.

> Whereas it fails to do the same with
> RHOCP4.6.
> 3] I tried to reproduce the same with Kafka
> 0.10[bitnami-kafka-0.10.1.1-0-linux-ubuntu-14.04.3-x86_64-hvm-ebs] with aws
> community AMI but got the same error

Comment 4 Jeff Cantrill 2020-12-18 12:03:26 UTC
(In reply to Yogiraj Karajagi from comment #3)

> --- 
> To be more clear, on my openshift v3.11 cluster, Fluentd pods send logs to a
> fluent td-agent installed on a redhat vmware server. This td-agent push log
> to kafka.
> The td-agent version is td-agent-3.0.1-0.el7.x86_64.rpm.
> It can be downloaded on fluentd site at
> https://td-agent-package-browser.herokuapp.com/3/redhat/7/x86_64.
>

The OpenShift logging team does not support the configuration you describe. You are referencing issues with a component and deployment that are both administered by the customer and delivered by at least a different engineering group, and possibly a different company. 'td-agent' is the commercial product name Treasure Data gives to fluentd. I recommend opening an issue directly with Treasure Data.

Closing this issue as CANTFIX.

Comment 9 Oscar Casal Sanchez 2021-01-12 11:14:16 UTC
Hello, 

I was able to reproduce this issue in our labs. To do so:

[Steps to Reproduce]

1. Deploy an OCP 4.6 cluster
2. Deploy the Cluster Logging stack following the documentation [1]
3. Configure a Kafka broker using version 0.10.1.1. This version is supported, as indicated here [2], which says:

    Ruby 2.1 or later
    Input plugins work with kafka v0.9 or later
    Output plugins work with kafka v0.8 or later

4. Configure the cluster logging stack to forward logs to a Kafka broker following the documentation [3]
5. Check the fluentd logs, which show the error sending to the external Kafka broker:

~~~
2020-12-10 15:24:06 +0000 [warn]: Send exception occurred: Failed to send messages to flux-openshift-v4/1
2020-12-10 15:24:06 +0000 [warn]: Exception Backtrace : /usr/local/share/gems/gems/fluent-plugin-kafka-0.13.1/lib/fluent/plugin/kafka_producer_ext.rb:240:in `deliver_messages_with_retries'
/usr/local/share/gems/gems/fluent-plugin-kafka-0.13.1/lib/fluent/plugin/kafka_producer_ext.rb:126:in `deliver_messages'
/usr/local/share/gems/gems/fluent-plugin-kafka-0.13.1/lib/fluent/plugin/out_kafka2.rb:270:in `write'
/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/output.rb:1125:in `try_flush'
/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/output.rb:1431:in `flush_thread_run'
/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/output.rb:461:in `block (2 levels) in start'
/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2020-12-10 15:24:06 +0000 [warn]: failed to flush the buffer. retry_time=13 next_retry_seconds=2020-12-10 15:29:26 +0000 chunk="5b5907447d28804082b18b56af32c054" error_class=Kafka::DeliveryFailed error="Failed to send messages to flux-openshift-v4/1"
  2020-12-10 15:24:06 +0000 [warn]: suppressed same stacktrace
~~~
6. Check the Kafka broker logs, which show the error:

~~~
[2020-12-30 15:48:05,250] ERROR Closing socket for 172.31.29.210:9092-209.132.189.136:2153 because of error (kafka.network.Processor)
org.apache.kafka.common.errors.InvalidRequestException: Error getting request for apiKey: 0 and apiVersion: 3
Caused by: java.lang.IllegalArgumentException: Invalid version for API key 0: 3
        at org.apache.kafka.common.protocol.ProtoUtils.schemaFor(ProtoUtils.java:31)
        at org.apache.kafka.common.protocol.ProtoUtils.requestSchema(ProtoUtils.java:44)
        at org.apache.kafka.common.protocol.ProtoUtils.parseRequest(ProtoUtils.java:60)
        at org.apache.kafka.common.requests.ProduceRequest.parse(ProduceRequest.java:134)
        at org.apache.kafka.common.requests.AbstractRequest.getRequest(AbstractRequest.java:42)
        at kafka.network.RequestChannel$Request.liftedTree2$1(RequestChannel.scala:96)
        at kafka.network.RequestChannel$Request.<init>(RequestChannel.scala:91)
        at kafka.network.Processor$$anonfun$processCompletedReceives$1.apply(SocketServer.scala:492)
        at kafka.network.Processor$$anonfun$processCompletedReceives$1.apply(SocketServer.scala:487)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
        at kafka.network.Processor.processCompletedReceives(SocketServer.scala:487)
        at kafka.network.Processor.run(SocketServer.scala:417)
        at java.lang.Thread.run(Thread.java:745)
[2020-12-30 15:48:07,695] ERROR Closing socket for 172.31.29.210:9092-209.132.189.136:21620 because of error (kafka.network.Processor)
~~~

This is always reproducible with version 0.10.1.1 in the Kafka broker, and it seems to be a version incompatibility. If we search for the error, we can find these references to the same issue:

- https://github.com/fluent/fluent-plugin-kafka/issues/232
- https://github.com/zendesk/ruby-kafka/issues/731
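
The broker-side error above ("Invalid version for API key 0: 3") is consistent with a protocol-version mismatch: the client sends a Produce request (apiKey 0) at a version newer than the broker understands. The following Ruby sketch is illustrative only; the max-version table and the client request version are assumptions drawn from the Kafka protocol documentation, not values confirmed in this bug:

```ruby
# Illustrative sketch of the version mismatch; the max Produce (apiKey 0)
# request version each broker release accepts is an assumption based on
# the Kafka protocol docs, not taken from this bug report.
PRODUCE_MAX_VERSION = {
  "0.10.1" => 2, # broker version used in this report
  "0.11.0" => 3, # first broker release assumed to accept Produce v3
  "2.4.1"  => 8, # version tested by OpenShift Logging
}

def broker_accepts?(broker_version, request_version)
  request_version <= PRODUCE_MAX_VERSION.fetch(broker_version)
end

client_request_version = 3 # assumed for a recent ruby-kafka client
puts broker_accepts?("0.10.1", client_request_version) # prints "false"
puts broker_accepts?("2.4.1", client_request_version)  # prints "true"
```

When the check fails on the broker side, the broker closes the socket and the client surfaces it as Kafka::DeliveryFailed, matching the fluentd and broker logs above.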

Then, as indicated before, it seems to be a version incompatibility between some components. Could you confirm this? If that is the case, perhaps we should indicate it in the requirements for:

- the fluent-plugin-kafka plugin here [2]
- the OpenShift Logging documentation [3]

[1] https://docs.openshift.com/container-platform/4.6/logging/cluster-logging-deploying.html
[2] https://github.com/fluent/fluent-plugin-kafka/tree/v0.13.1#requirements
[3] https://docs.openshift.com/container-platform/4.6/logging/cluster-logging-external.html#cluster-logging-collector-log-forward-kafka_cluster-logging-external
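
For reference, step 4 above corresponds to a ClusterLogForwarder resource along these lines; this is a minimal sketch, and the output name, broker URL, and topic are placeholders rather than values taken from this bug:

```yaml
apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  outputs:
    - name: kafka-out
      type: kafka
      # Placeholder broker address and topic; replace with your own.
      url: tls://kafka.example.com:9093/app-topic
  pipelines:
    - name: to-kafka
      inputRefs:
        - application
      outputRefs:
        - kafka-out
```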

Comment 11 Jeff Cantrill 2021-01-21 22:33:16 UTC
Moving this to a documentation issue so we can properly expose compatibility. This [1] indicates that one of the dependencies supports Kafka 0.11.0 and later. We test [2] against 2.4.1. What should we document?

[1] https://github.com/zendesk/ruby-kafka/blob/master/CHANGELOG.md#070
[2] https://github.com/openshift/cluster-logging-operator/blob/master/test/helpers/kafka/broker.go#L141