Bug 1907370
| Summary: | Log Forwarding to Kafka Fails with error error_class=Kafka::DeliveryFailed error="Failed to send messages to flux-openshift-v4/1" | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | puraut |
| Component: | Documentation | Assignee: | Rolfe Dlugy-Hegwer <rdlugyhe> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Xiaoli Tian <xtian> |
| Severity: | medium | Docs Contact: | Vikram Goyal <vigoyal> |
| Priority: | medium | ||
| Version: | 4.6 | CC: | aos-bugs, jcantril, jokerman, ocasalsa, periklis, rdlugyhe, ykarajag |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | 4.7.0 | Flags: | rdlugyhe: needinfo-, rdlugyhe: needinfo- |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | logging-core | | |
| Fixed In Version: | | Doc Type: | Known Issue |
| Doc Text: |
* Fluentd pods with the ruby-kafka-1.1.0 and fluent-plugin-kafka-0.13.1 gems are not compatible with Apache Kafka version 0.10.1.0.
+
As a result, log forwarding to Kafka fails with a message: error_class=Kafka::DeliveryFailed error="Failed to send messages to flux-openshift-v4/1"
+
The ruby-kafka-0.7 gem dropped support for Kafka 0.10 in favor of native support for Kafka 0.11. The ruby-kafka-1.0.0 gem added support for Kafka 2.3 and 2.4. The current version of OpenShift Logging tests and therefore supports Kafka version 2.4.1.
+
To work around this issue, upgrade to a supported version of Apache Kafka.
+
(link:https://bugzilla.redhat.com/show_bug.cgi?id=1907370[*BZ#1907370*])
|
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-02-09 19:52:20 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Comment 1
Jeff Cantrill
2020-12-14 17:09:58 UTC
(In reply to Yogiraj Karajagi from comment #3)

> To be more clear, on my OpenShift v3.11 cluster, Fluentd pods send logs to a
> fluent td-agent installed on a Red Hat VMware server. This td-agent pushes the logs
> to Kafka. The td-agent version is td-agent-3.0.1-0.el7.x86_64.rpm. It can be
> downloaded from the fluentd site at
> https://td-agent-package-browser.herokuapp.com/3/redhat/7/x86_64.

The OpenShift logging team does not support the configuration you describe. You are referencing issues with a component and deployment that are both administered by the customer and delivered by at least a different engineering group, and possibly a different company. 'td-agent' is the commercial product name Treasure Data gives to fluentd. I recommend opening an issue directly with Treasure Data. Closing this issue CANTFIX.

Hello,

I was able to reproduce this issue in our labs. To do so:
[Steps to Reproduce]
1. Deploy an OCP 4.6 cluster
2. Deploy the Cluster Logging stack following the documentation [1]
3. Configure a Kafka broker using version 0.10.1.1. This version is supported according to the plugin requirements [2], which say:
Ruby 2.1 or later
Input plugins work with kafka v0.9 or later
Output plugins work with kafka v0.8 or later
4. Configure the cluster logging stack to forward logs to the Kafka broker, following the documentation [3]
5. Check the fluentd logs, which show the error when sending to the external Kafka broker:
~~~
2020-12-10 15:24:06 +0000 [warn]: Send exception occurred: Failed to send messages to flux-openshift-v4/1
2020-12-10 15:24:06 +0000 [warn]: Exception Backtrace : /usr/local/share/gems/gems/fluent-plugin-kafka-0.13.1/lib/fluent/plugin/kafka_producer_ext.rb:240:in `deliver_messages_with_retries'
/usr/local/share/gems/gems/fluent-plugin-kafka-0.13.1/lib/fluent/plugin/kafka_producer_ext.rb:126:in `deliver_messages'
/usr/local/share/gems/gems/fluent-plugin-kafka-0.13.1/lib/fluent/plugin/out_kafka2.rb:270:in `write'
/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/output.rb:1125:in `try_flush'
/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/output.rb:1431:in `flush_thread_run'
/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin/output.rb:461:in `block (2 levels) in start'
/usr/local/share/gems/gems/fluentd-1.7.4/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2020-12-10 15:24:06 +0000 [warn]: failed to flush the buffer. retry_time=13 next_retry_seconds=2020-12-10 15:29:26 +0000 chunk="5b5907447d28804082b18b56af32c054" error_class=Kafka::DeliveryFailed error="Failed to send messages to flux-openshift-v4/1"
2020-12-10 15:24:06 +0000 [warn]: suppressed same stacktrace
~~~
6. Check the Kafka broker logs, which show the error:
~~~
[2020-12-30 15:48:05,250] ERROR Closing socket for 172.31.29.210:9092-209.132.189.136:2153 because of error (kafka.network.Processor)
org.apache.kafka.common.errors.InvalidRequestException: Error getting request for apiKey: 0 and apiVersion: 3
Caused by: java.lang.IllegalArgumentException: Invalid version for API key 0: 3
at org.apache.kafka.common.protocol.ProtoUtils.schemaFor(ProtoUtils.java:31)
at org.apache.kafka.common.protocol.ProtoUtils.requestSchema(ProtoUtils.java:44)
at org.apache.kafka.common.protocol.ProtoUtils.parseRequest(ProtoUtils.java:60)
at org.apache.kafka.common.requests.ProduceRequest.parse(ProduceRequest.java:134)
at org.apache.kafka.common.requests.AbstractRequest.getRequest(AbstractRequest.java:42)
at kafka.network.RequestChannel$Request.liftedTree2$1(RequestChannel.scala:96)
at kafka.network.RequestChannel$Request.<init>(RequestChannel.scala:91)
at kafka.network.Processor$$anonfun$processCompletedReceives$1.apply(SocketServer.scala:492)
at kafka.network.Processor$$anonfun$processCompletedReceives$1.apply(SocketServer.scala:487)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at kafka.network.Processor.processCompletedReceives(SocketServer.scala:487)
at kafka.network.Processor.run(SocketServer.scala:417)
at java.lang.Thread.run(Thread.java:745)
[2020-12-30 15:48:07,695] ERROR Closing socket for 172.31.29.210:9092-209.132.189.136:21620 because of error (kafka.network.Processor)
~~~
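The broker-side stack trace above ("Invalid version for API key 0: 3") is a protocol-level version mismatch: api key 0 is ProduceRequest, and the newer ruby-kafka client sends it at version 3, which a 0.10.1 broker does not implement. A minimal Ruby sketch of the check the broker is effectively performing (the api key and requested version come from the error above; the broker's maximum supported version is an assumption based on the Kafka 0.10.1 protocol):

```ruby
# Sketch: every Kafka request header starts with api_key (int16)
# and api_version (int16), both big-endian.
API_KEY_PRODUCE = 0     # "apiKey: 0" in the broker error above
requested_version = 3   # "apiVersion: 3" in the broker error above
broker_max_version = 2  # assumption: Kafka 0.10.1 tops out at ProduceRequest v2

header = [API_KEY_PRODUCE, requested_version].pack("s>s>")
p header.bytes  # => [0, 0, 0, 3]

# The broker rejects any api_version above what it implements, which
# surfaces on the client side as Kafka::DeliveryFailed.
compatible = requested_version <= broker_max_version
p compatible  # => false
```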
This is always reproducible using Kafka broker version 0.10.1.1, and it appears to be a version incompatibility issue. Searching for the error turns up these references to the same problem:
- https://github.com/fluent/fluent-plugin-kafka/issues/232
- https://github.com/zendesk/ruby-kafka/issues/731
So, as indicated above, this seems to be a version incompatibility between some of the components. Could you confirm this? If that is the case, perhaps we should note it in:
- the requirements for the fluent-plugin-kafka plugin [2]
- the OpenShift logging documentation [3]
[1] https://docs.openshift.com/container-platform/4.6/logging/cluster-logging-deploying.html
[2] https://github.com/fluent/fluent-plugin-kafka/tree/v0.13.1#requirements
[3] https://docs.openshift.com/container-platform/4.6/logging/cluster-logging-external.html#cluster-logging-collector-log-forward-kafka_cluster-logging-external
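For reference, step 4 above corresponds to a `ClusterLogForwarder` resource along these lines (a sketch: the broker host, port, topic, and output/pipeline names are placeholders, not values from this bug):

```yaml
apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  outputs:
    - name: kafka-external          # placeholder name
      type: kafka
      url: tls://kafka.example.com:9093/app-topic   # placeholder broker and topic
  pipelines:
    - name: app-to-kafka            # placeholder name
      inputRefs:
        - application
      outputRefs:
        - kafka-external
```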
Moving this to a documentation issue so we can properly expose compatibility. This [1] looks like one of the dependencies will support 0.11.0 and later. We test [2] 2.4.1. What should we document?

[1] https://github.com/zendesk/ruby-kafka/blob/master/CHANGELOG.md#070
[2] https://github.com/openshift/cluster-logging-operator/blob/master/test/helpers/kafka/broker.go#L141
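The compatibility cutoff in [1] (ruby-kafka 0.7 dropped Kafka 0.10 in favor of 0.11) can be encoded as a quick Ruby version check; a sketch, with the minimum broker version taken from the changelog cited above:

```ruby
require "rubygems"  # provides Gem::Version

# ruby-kafka >= 0.7 requires Kafka 0.11.0 or later, per the changelog above.
MIN_BROKER_VERSION = Gem::Version.new("0.11.0")

def broker_supported?(version_string)
  Gem::Version.new(version_string) >= MIN_BROKER_VERSION
end

p broker_supported?("0.10.1.1")  # => false, the broker version from this bug
p broker_supported?("2.4.1")     # => true, the version OpenShift Logging tests
```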