Bug 1413147
| Summary: | Size of the emitted data exceeds buffer_chunk_limit |
|---|---|
| Product: | OpenShift Container Platform |
| Component: | Logging |
| Version: | 3.4.0 |
| Hardware: | x86_64 |
| OS: | Linux |
| Reporter: | Ricardo Lourenco <rlourenc> |
| Assignee: | Noriko Hosoi <nhosoi> |
| QA Contact: | Mike Fiedler <mifiedle> |
| Status: | CLOSED ERRATA |
| Severity: | low |
| Priority: | high |
| CC: | aos-bugs, ewolinet, jcantril, jeder, lizhou, lvlcek, mifiedle, misalunk, nkinder, pportant, rlourenc, rmeggins, tstclair, vlaad, wsun, xiazhao |
| Target Release: | 3.7.0 |
| Whiteboard: | aos-scalability-34 |
| Doc Type: | Enhancement |
| Cloned To: | 1534588 (view as bug list) |
| Last Closed: | 2017-11-28 21:52:23 UTC |
| Type: | Bug |
| Bug Depends On: | 1469859 |
| Bug Blocks: | 1534588 |

Doc Text:

> Feature: The value of buffer_chunk_limit is now configurable (default: 1m). To configure buffer_chunk_limit, set the environment variable BUFFER_SIZE_LIMIT, or set openshift_logging_fluentd_buffer_size_limit in the Ansible inventory file.
>
> Reason: To cover various types of input, we need the ability to make buffer_chunk_limit configurable.
>
> Result: "Size of the emitted data exceeds buffer_chunk_limit" can be fixed by configuring buffer_chunk_limit.
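As the Doc Text field notes, the limit can be set either through the Ansible inventory or through the fluentd daemonset's environment. A minimal sketch of both routes; the `8m` value and the `logging-fluentd` daemonset name are illustrative assumptions, not taken from this bug:

```shell
# Route 1: Ansible inventory ([OSEv3:vars] section); value is an example only:
#   openshift_logging_fluentd_buffer_size_limit=8m
#
# Route 2: set the environment variable directly on the fluentd daemonset
# (assumed here to have the default name logging-fluentd):
oc set env daemonset/logging-fluentd BUFFER_SIZE_LIMIT=8m
```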
Description (Ricardo Lourenco, 2017-01-13 18:12:23 UTC)
Looks like the enrichment pods need a bigger chunk size to match what the dispatcher pods are emitting. See https://groups.google.com/d/msg/fluentd/tb_kiN80cNk/_-OnzhGEcykJ

Created https://trello.com/c/pHnZPpoE to make alterations.

---

Modifying priority/severity based on conversation with @jerms.

---

Thanks Jeff -- the end of that thread was that we're not aware of this causing a loss of log data. If someone can prove that, then the priority of this would become urgent. If/until that's proven, this BZ is about correctness, optimization, and cleaning up logs, and can be dealt with as low severity. Noriko is working on this.

---

(In reply to Peter Portante from comment #2)
> Looks like the enrichment pods need a bigger chunk size to match what the dispatcher pods are emitting.

I agree with Peter. Could you find out the version(s) of fluentd? E.g., 0.12.29. Can we also have the buffer_chunk_size and buffer_queue_size values on the enrichment pods as well as on the dispatcher pods, if they are configured?

```shell
oc rsh $ENRICHMENT_FLUENTD_POD   # as well as $DISPATCHER_FLUENTD_POD
grep buffer_chunk_limit /etc/fluent/configs.d/*/*.conf
```

If you don't see any output, the default value for that version should be in use.

Note: in the upstream origin-aggregated-logging, we made the parameters configurable, that is, we can pass them via environment variables. Thanks.

---

Could you rerun the test using these container images? Thanks.
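For context, this is a sketch of the kind of output stanza that grep would match under /etc/fluent/configs.d; the plugin and values shown are illustrative, not pulled from these pods:

```
<match **>
  @type elasticsearch
  buffer_chunk_limit 1m    # the setting the grep above looks for
  buffer_queue_limit 32
</match>
```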
repositories:

```
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:rhaos-3.6-rhel-7-docker-candidate-20170525163637
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:v3.6.84-2
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:v3.6.84
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:latest
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:v3.6
```

koji_builds: https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=560368

---

Is it ok to set "Target Release" to 3.6.0?

---

*** Bug 1442951 has been marked as a duplicate of this bug. ***

---

This bug is targeted for 3.6.0, but verification is blocked by bug 1469859, which is currently targeted for 3.6.1.

---

tl;dr: This message means that the size of a single log record is bigger than the internal fluentd chunk size. Fluentd will still accept the record. When this happens, it is really important to find out where this large record came from, exactly how large it is, and, ideally, to get a copy of the record, so we can see whether this is a fluke or a case we need to handle regularly.

Both the elasticsearch and secure_forward plugins are ObjectBufferedOutput plugins. emit method: https://github.com/fluent/fluentd/blob/v0.12.37/lib/fluent/output.rb#L463

```ruby
if @buffer.emit(key, data, chain)
  submit_flush
end
```

If the @buffer.emit call returns true, the plugin tries to flush immediately. The buffer.emit call returns true if the current chunk is full (buffer_chunk_limit reached) _and_ the queue is empty: https://github.com/fluent/fluentd/blob/v0.12.37/lib/fluent/buffer.rb#L190

```ruby
# chunk unique id is generated in #new_chunk
chunk = (@map[key] ||= new_chunk(key))
if storable?(chunk, data)
  chain.next
  chunk << data
  return false
```

@map[key] is the current chunk being filled. Note that it _is not queued_ - it is not yet in the queue.
If the new data fits in the current chunk, fluentd just adds it and continues - return false - no flush yet.

```ruby
elsif @queue.size >= @buffer_queue_limit
  raise BufferQueueLimitError, "queue size exceeds limit"
end
```

If the data is not storable, we need to add the current chunk to the queue and start a new chunk. However, if we are already at the queue size limit, an exception is raised.

```ruby
rescue BufferQueueLimitError => e
  case @buffer_queue_full_action
  when :exception
    raise e
  when :block
    # This is rough implementation. New Buffer API should improve this routine by using wait/signal
    $log.debug "buffer queue is full. Wait 1 second to re-emit events"
    sleep 1
    retry
  when :drop_oldest_chunk
    $log.debug "buffer queue is full. Dropping oldest chunk"
    pop(nil)
  end
end
```

The default case is :exception, which means the input plugin must be able to handle it.

```ruby
if data.bytesize > @buffer_chunk_limit
  $log.warn "Size of the emitted data exceeds buffer_chunk_limit."
  $log.warn "This may occur problems in the output plugins ``at this server.``"
  $log.warn "To avoid problems, set a smaller number to the buffer_chunk_limit"
  $log.warn "in the forward output ``at the log forwarding server.``"
  ### TODO
  # raise BufferChunkLimitError, "received data too large"
end
```

This is the case where the size of a single record exceeds the buffer_chunk_limit - which so far has been very rare - and yet fluentd will accept it with only the warning.

---

Verified on 3.7.0-0.126.4. The buffer_chunk_limit message is no longer thrown. As logging-mux pods reach the limit of scalability (primarily driven by message rates), they exhibit high CPU utilization but no longer throw errors or lose messages.

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188
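The control flow walked through above can be condensed into a toy Ruby model. This is an invented sketch (ToyBuffer and its fields are not fluentd classes) of the v0.12 behavior described: records accumulate in the current chunk; a full chunk is queued and triggers an immediate flush only when the queue was empty; a full queue raises BufferQueueLimitError; and an oversized single record draws a warning but is still accepted:

```ruby
# Toy model of the fluentd v0.12 buffer decision logic described above.
# All names and limits here are invented for illustration.
class ToyBuffer
  BufferQueueLimitError = Class.new(StandardError)

  attr_reader :queue, :warnings

  def initialize(chunk_limit:, queue_limit:)
    @chunk_limit = chunk_limit  # models buffer_chunk_limit (bytes)
    @queue_limit = queue_limit  # models buffer_queue_limit (chunks)
    @chunk = ""                 # current chunk being filled (like @map[key])
    @queue = []                 # queued chunks awaiting flush
    @warnings = 0               # count of "exceeds buffer_chunk_limit" warnings
  end

  # Returns true when the caller should flush now (a full chunk was just
  # queued and the queue had been empty), mirroring Buffer#emit's contract.
  def emit(data)
    # Oversized single record: warn but accept, per the TODO in buffer.rb.
    @warnings += 1 if data.bytesize > @chunk_limit

    if @chunk.empty? || @chunk.bytesize + data.bytesize <= @chunk_limit
      @chunk << data
      return false              # record fits: no flush yet
    end

    raise BufferQueueLimitError, "queue size exceeds limit" if @queue.size >= @queue_limit

    queue_was_empty = @queue.empty?
    @queue << @chunk            # enqueue the full chunk
    @chunk = data.dup           # start a new chunk with this record
    queue_was_empty             # flush immediately only if the queue was empty
  end
end

buf = ToyBuffer.new(chunk_limit: 10, queue_limit: 2)
buf.emit("12345")   # => false (fits in the current chunk)
buf.emit("67890")   # => false (chunk now exactly at the limit)
buf.emit("abcde")   # => true  (chunk queued; queue was empty, so flush now)
```

Note the oversized-record path: a record larger than chunk_limit still lands in a (fresh) chunk, matching the observation that fluentd accepts such records with only a warning.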