Bug 1365422 - fluentd main process died unexpectedly on logs from deleted project
Summary: fluentd main process died unexpectedly on logs from deleted project
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: ewolinet
QA Contact: chunchen
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-08-09 08:47 UTC by Xia Zhao
Modified: 2017-03-08 18:26 UTC
CC: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: When a project was deleted, the Fluentd metadata plugin did not properly handle fetching metadata for it and would exit. Consequence: The Fluentd pod would restart. Fix: Update the kubeclient and rest-client gems used by Fluentd. Result: Fluentd properly handles the case where a project has been deleted while its logs are still being processed.
Clone Of:
Environment:
Last Closed: 2016-09-27 09:42:57 UTC
Target Upstream Version:
Embargoed:


Attachments
fluentd_pod_log (29.36 KB, text/plain)
2016-08-09 08:47 UTC, Xia Zhao


Links
Red Hat Product Errata RHBA-2016:1933 (normal, SHIPPED_LIVE): Red Hat OpenShift Container Platform 3.3 Release Advisory, last updated 2016-09-27 13:24:36 UTC

Description Xia Zhao 2016-08-09 08:47:53 UTC
Created attachment 1189146: fluentd_pod_log

Problem description:
The following errors appear in the fluentd log for some deleted namespaces:
#<Kubeclient::Common::WatchNotice type="MODIFIED", object={"kind"=>"Namespace", "apiVersion"=>"v1", "metadata"=>{"name"=>"tobedeleted", "selfLink"=>"/api/v1/namespaces/tobedeleted", "uid"=>"98635a65-5dfd-11e6-a1f0-0e05cb0c5c85", "resourceVersion"=>"15740", "creationTimestamp"=>"2016-08-09T06:50:44Z", "annotations"=>{"openshift.io/description"=>"", "openshift.io/display-name"=>"", "openshift.io/requester"=>"xiazhao2015", "openshift.io/sa.scc.mcs"=>"s0:c13,c7", "openshift.io/sa.scc.supplemental-groups"=>"1000170000/10000", "openshift.io/sa.scc.uid-range"=>"1000170000/10000"}}, "spec"=>{"finalizers"=>["openshift.io/origin", "kubernetes"]}, "status"=>{"phase"=>"Active"}}>
  2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/net/http.rb:878:in `initialize'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/net/http.rb:878:in `open'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/net/http.rb:878:in `block in connect'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/timeout.rb:52:in `timeout'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/net/http.rb:877:in `connect'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/net/http.rb:1445:in `begin_transport'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/net/http.rb:1402:in `transport_request'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/net/http.rb:1376:in `request'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/gems/gems/rest-client-1.6.7/lib/restclient/net_http_ext.rb:51:in `request'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/gems/gems/kubeclient-0.7.0/lib/kubeclient/watch_stream.rb:21:in `each'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/gems/gems/fluent-plugin-kubernetes_metadata_filter-0.24.0/lib/fluent/plugin/filter_kubernetes_metadata.rb:362:in `start_watch'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/gems/gems/fluent-plugin-kubernetes_metadata_filter-0.24.0/lib/fluent/plugin/filter_kubernetes_metadata.rb:173:in `block in configure'
2016-08-09 02:57:37 -0400 [error]: fluentd main process died unexpectedly. restarting.
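The trace shows a connection error from net/http escaping, uncaught, through the kubeclient watch stream and the namespace watch the plugin starts from configure; with nothing rescuing it, the whole fluentd process dies. A minimal sketch of a guarded watch loop that would survive this, assuming hypothetical helper names (make_watch_client and handle are illustrative, not the plugin's actual code):

  require 'kubeclient'

  # Hypothetical stand-ins for the plugin's client setup and notice handling.
  def make_watch_client
    Kubeclient::Client.new('https://kubernetes.default.svc', 'v1')
  end

  def handle(notice)
    puts "#{notice.type} #{notice.object.metadata.name}"
  end

  # Guarded sketch: rescue, log, back off, and re-open the watch so a dropped
  # connection (or a namespace deleted mid-watch) cannot escape the thread
  # and kill the fluentd main process, as happens in the trace above.
  watcher = Thread.new do
    loop do
      begin
        make_watch_client.watch_namespaces.each { |notice| handle(notice) }
      rescue StandardError => e
        warn "namespace watch failed: #{e.class}: #{e.message}; retrying in 5s"
        sleep 5
      end
    end
  end
  watcher.join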


Version-Release number of selected component (if applicable):
registry.ops.../logging-fluentd         3.3.0               80847240fa91

# openshift version
openshift v3.3.0.17
kubernetes v1.3.0+507d3a7
etcd 2.3.0+git

How reproducible:
Always

Steps to Reproduce:
1. Create a new namespace named "tobedeleted" on OpenShift, generate some app data inside it, and confirm the logs are visible in the Kibana UI.
2. Delete the namespace "tobedeleted":
 $ oc delete project tobedeleted
--> Confirm that the logs from step 1 are still visible in the .all index.
3. Wait a while, then check the fluentd pod log:
$ oc logs -f logging-fluentd-hnltg

Actual Result:
The fluentd main process died unexpectedly when attempting to pick up logs for /api/v1/namespaces/tobedeleted.

Expected Result:
fluentd should notice that the namespace has been deleted and stop collecting logs for it.

Additional info:
1. Fluentd is still able to collect logs for existing namespaces.
2. Full pod log attached.

Comment 1 Luke Meyer 2016-08-09 13:27:21 UTC
Good catch. We need to make the fluentd plugin resilient to namespace deletion.
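One way to get that resilience, sketched with a hypothetical helper (fetch_namespace_metadata is illustrative, not the plugin's real method; the actual fix shipped as gem updates, per the Doc Text): treat a 404 on the metadata lookup as "project already deleted" and fall back to partial metadata instead of raising.

  require 'kubeclient'

  # Illustrative only: not the plugin's actual code path. The point is the rescue.
  def fetch_namespace_metadata(client, name)
    ns = client.get_namespace(name)
    { 'namespace_name' => name, 'namespace_id' => ns.metadata.uid }
  rescue KubeException => e
    # A 404 here just means the project was deleted after its logs were
    # written; keep processing the record with what we know, don't crash.
    raise unless e.error_code == 404
    { 'namespace_name' => name }
  end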

Comment 3 Rich Megginson 2016-08-14 19:34:51 UTC
There are still some missing packages. We need a puddle rebuild to pick up the latest versions of kubeclient, rest-client, http-cookie, domain_name, unf, and unf_ext, and then rebuild the logging-fluentd-docker image (3.3.0, Release 14).
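For context, the extra packages are rest-client's transitive dependency chain (rest-client -> http-cookie -> domain_name -> unf -> unf_ext), so they have to move together. A hypothetical Gemfile fragment for the rebuilt image; the exact versions live in the Brew build, not in this bug:

  source 'https://rubygems.org'

  gem 'kubeclient'    # newer than the 0.7.0 in the failing image
  gem 'rest-client'   # newer than 1.6.7
  gem 'http-cookie'   # dependency of rest-client
  gem 'domain_name'   # dependency of http-cookie
  gem 'unf'           # dependency of domain_name
  gem 'unf_ext'       # native extension backing unf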

Comment 4 Luke Meyer 2016-08-17 18:51:33 UTC
@xzhao has this been observed with any version prior to 3.3? Just wondering whether we need to note this bug as fixed relative to a prior version. If not, we can just close it when it's fixed.

Comment 5 Xia Zhao 2016-08-18 02:15:16 UTC
@lmeyer I did not encounter it on OSE versions prior to 3.3, but will test again to confirm.

The verification work towards 3.3.0 is blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1366137

Comment 6 Xia Zhao 2016-08-18 08:38:36 UTC
@lmeyer I have tested this scenario on 3.2.1 logging and could not reproduce it.

Comment 7 Rich Megginson 2016-08-18 23:06:34 UTC
https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=509895

This should no longer be blocked.

Comment 8 Xia Zhao 2016-08-19 03:31:35 UTC
The verification work is blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1368301

Comment 11 errata-xmlrpc 2016-09-27 09:42:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1933

