Bug 1448951

Summary:	Fluentd stack traces complaining about undefined method 'status' for nil:NilClass
Product:	OpenShift Container Platform	Reporter:	Peter Portante <pportant>
Component:	Logging	Assignee:	Rich Megginson <rmeggins>
Status:	CLOSED CURRENTRELEASE	QA Contact:	Anping Li <anli>
Severity:	high	Docs Contact:
Priority:	high
Version:	3.4.1	CC:	aos-bugs, jcantril, pportant, pweil, rmeggins
Target Milestone:	---	Keywords:	Reopened
Target Release:	3.5.z
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2018-05-14 19:11:28 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Peter Portante 2017-05-08 18:56:13 UTC

Filed issue https://github.com/elastic/elasticsearch-ruby/issues/428 upstream.

This is probably hiding connection issues with Elasticsearch.

Comment 1 Rich Megginson 2017-06-16 17:04:17 UTC

I think this is either related to, or a dup of, https://bugzilla.redhat.com/show_bug.cgi?id=1399388

Comment 2 Jeff Cantrill 2017-09-13 17:54:44 UTC

Rich,

Is this possibly resolved by our 3.6.1 changes that addressed some performance issues and dropped messages?

Comment 3 Rich Megginson 2017-09-13 18:06:06 UTC

(In reply to Jeff Cantrill from comment #2)
> Rich,
> 
> Is this possibly resolved by our 3.6.1 changes that addressed some
> performance issues and dropped messages?

I don't know because it is incredibly difficult to reproduce this problem.

Comment 4 Jeff Cantrill 2017-10-06 15:07:18 UTC


*** This bug has been marked as a duplicate of bug 1489533 ***

Comment 5 Peter Portante 2017-10-06 23:59:10 UTC

(In reply to Jeff Cantrill from comment #4)
> 
> *** This bug has been marked as a duplicate of bug 1489533 ***

Earlier, in comment 1, it was suggested that this might be a duplicate of, or at least related to, a different bug:

  https://bugzilla.redhat.com/show_bug.cgi?id=1399388

    Failed to ship logs by "Cannot get new connection from pool." to
    AWS Elasticsearch after start logging-fluentd pod for a while

  Resolution: change fluentd config to use:

    reload_connections false
    reload_on_failure false

This bug has been closed as a duplicate of:

  https://bugzilla.redhat.com/show_bug.cgi?id=1489533

  logging-fluentd needs to periodically reconnect to logging-mux
  or elasticsearch to help balance sessions
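The idea behind bug 1489533, periodically dropping a connection so that new sessions can be spread across backends, can be illustrated with a small time-based reconnect wrapper. This is a minimal sketch only: `ReconnectingClient`, its `connect` callable, and the injectable `clock` are hypothetical names, and this is not how fluent-plugin-elasticsearch or the eventual fix implements reconnection.

```ruby
# Hypothetical wrapper, NOT fluent-plugin-elasticsearch code: hands out a
# connection and replaces it once it is older than max_age seconds.
class ReconnectingClient
  attr_reader :connects # number of connections opened so far

  # connect: callable returning a new connection object (assumed).
  # max_age: seconds a connection may live before it is replaced.
  # clock:   callable returning the current time in seconds; injectable so
  #          the behavior can be tested without sleeping.
  def initialize(max_age:, connect:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @max_age = max_age
    @connect = connect
    @clock = clock
    @conn = nil
    @opened_at = nil
    @connects = 0
  end

  # Returns the current connection, opening a fresh one if there is none
  # yet or if the current one has exceeded max_age.
  def connection
    now = @clock.call
    if @conn.nil? || now - @opened_at >= @max_age
      @conn.close if @conn.respond_to?(:close) # release the old session
      @conn = @connect.call
      @opened_at = now
      @connects += 1
    end
    @conn
  end
end
```

Because the reconnect is driven by connection age rather than by failures, a long-lived collector would periodically open new sessions even when nothing goes wrong, which is what gives a load balancer a chance to redistribute them.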

However, this bug does not appear to be related to either of the bugs described above.  Instead, it is likely related to using the correct Ruby gem stack to talk to Elasticsearch 2.x.  We might be able to close this bug as resolved by the work that switched to the proper Ruby gems.

Comment 6 Rich Megginson 2017-10-07 01:30:07 UTC

I think if we fix https://bugzilla.redhat.com/show_bug.cgi?id=1489533 by making the reload behavior work for our case, then this bug goes away.

Then, there may still be some underlying bug in elasticsearch-ruby, or in fluent-plugin-elasticsearch, related to connection reload, but we won't hit it because we won't be using that mechanism.

Otherwise, what is our resolution to this bug?  Are we going to submit a PR to fix https://github.com/elastic/elasticsearch-ruby/issues/428?

Comment 7 Peter Portante 2017-10-09 14:24:01 UTC

(In reply to Rich Megginson from comment #6)
> I think if we fix https://bugzilla.redhat.com/show_bug.cgi?id=1489533 by
> making the reload behavior work for our case, then this bug goes away.

Not sure I follow.  This BZ references an upstream issue that involves the following gems:

  fluentd-0.12.29
  fluent-plugin-elasticsearch-1.9.1
  elasticsearch-api-1.0.18
  elasticsearch-transport-1.0.18

I believe this stack works for our 3.2 and 3.3 product versions, since they are based on Elasticsearch 1.5.x, but it won't work properly for 3.4 and later, which are based on Elasticsearch 2.x.
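One way to see whether a given fluentd image still ships the gem stack listed above is to compare its installed gems against that list. This is a minimal sketch, assuming RubyGems is available; `PINNED_STACK`, `installed_versions`, and `stack_mismatches` are hypothetical names, and the check only reports version differences. It does not decide which stack is compatible with which Elasticsearch release.

```ruby
require "rubygems"

# The gem versions quoted in the upstream issue (used here as a pin list).
PINNED_STACK = {
  "fluentd" => "0.12.29",
  "fluent-plugin-elasticsearch" => "1.9.1",
  "elasticsearch-api" => "1.0.18",
  "elasticsearch-transport" => "1.0.18"
}.freeze

# Hypothetical helper: collect the highest installed version of each gem
# from the local RubyGems registry (gems that are not installed are omitted).
def installed_versions(names = PINNED_STACK.keys)
  names.each_with_object({}) do |name, acc|
    spec = Gem::Specification.find_all_by_name(name).max_by(&:version)
    acc[name] = spec.version.to_s if spec
  end
end

# Hypothetical helper: returns { gem => [pinned, installed] } for every gem
# whose installed version differs from the pin (installed is nil if absent).
def stack_mismatches(installed, pinned = PINNED_STACK)
  pinned.each_with_object({}) do |(name, want), diffs|
    have = installed[name]
    next if have && Gem::Version.new(have) == Gem::Version.new(want)

    diffs[name] = [want, have]
  end
end

if $PROGRAM_NAME == __FILE__
  stack_mismatches(installed_versions).each do |name, (want, have)|
    puts "#{name}: pinned #{want}, installed #{have || 'none'}"
  end
end
```

Comparing through `Gem::Version` rather than raw strings keeps the check consistent with how RubyGems itself orders versions.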

Since we have already corrected this stack to use the proper versions intended for Elasticsearch 2.x with releases 3.4.z and later, I am guessing this bug can be closed as a duplicate of whatever BZ drove those original changes.

> Then, there may still be some underlying bug in elasticsearch-ruby, or in
> fluent-plugin-elasticsearch, related to connection reload, but we won't hit
> it because we won't be using that mechanism.

FWIW, the code path where we hit this bug does not appear to be a reload path.  This looks like a simple bug in the error handling logic: the handler expects a "response" object to be attached when a "ServerError" is raised, but no such object is present.
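The failure mode described above can be reproduced in plain Ruby: calling `status` on an error whose `response` is `nil` raises `NoMethodError` (the "undefined method 'status' for nil:NilClass" in the summary), which replaces the original error and hides what actually went wrong. This is a minimal sketch only; `FakeServerError`, `FakeResponse`, and the helper methods are hypothetical stand-ins, not the real elasticsearch-transport classes or code.

```ruby
# Hypothetical stand-in for a transport error that may or may not carry an
# HTTP response; NOT the real elasticsearch-transport ServerError class.
class FakeServerError < StandardError
  attr_reader :response

  def initialize(message, response = nil)
    super(message)
    @response = response
  end
end

# Hypothetical minimal response object.
FakeResponse = Struct.new(:status, :body)

# Pattern described in the comment: the handler assumes a response is
# always attached, so a nil response raises NoMethodError here.
def status_unguarded(error)
  error.response.status
end

# Defensive variant: return nil when no response is attached.
def status_guarded(error)
  return error.response.status if error.response

  nil
end

# Keeps the original error message visible (e.g. a connection problem)
# instead of letting the handler itself crash.
def describe_failure(error)
  status = status_guarded(error)
  status ? "HTTP #{status}: #{error.message}" : "no response: #{error.message}"
end
```

With the guarded variant, an error raised without a response still surfaces its own message, which is what would expose the underlying Elasticsearch connection issues suspected in the description.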