Bug 1365422 - fluentd main process died unexpectedly on logs from deleted project
Summary: fluentd main process died unexpectedly on logs from deleted project
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: ewolinet
QA Contact: chunchen
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-08-09 08:47 UTC by Xia Zhao
Modified: 2017-03-08 18:26 UTC
CC: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: When a project was deleted, the Fluentd metadata plugin did not properly handle fetching metadata for it and would exit. Consequence: The Fluentd pod would restart. Fix: Update the kubeclient and rest-client gems used by Fluentd. Result: Fluentd properly handles the case where a project has been deleted while its logs are still being processed.
Clone Of:
Environment:
Last Closed: 2016-09-27 09:42:57 UTC
Target Upstream Version:
Embargoed:


Attachments
fluentd_pod_log (29.36 KB, text/plain)
2016-08-09 08:47 UTC, Xia Zhao


Links
Red Hat Product Errata RHBA-2016:1933 (normal, SHIPPED_LIVE): Red Hat OpenShift Container Platform 3.3 Release Advisory, last updated 2016-09-27 13:24:36 UTC

Description Xia Zhao 2016-08-09 08:47:53 UTC
Created attachment 1189146: fluentd_pod_log

Problem description:
The following errors appear in the fluentd log for some deleted namespaces:
#<Kubeclient::Common::WatchNotice type="MODIFIED", object={"kind"=>"Namespace", "apiVersion"=>"v1", "metadata"=>{"name"=>"tobedeleted", "selfLink"=>"/api/v1/namespaces/tobedeleted", "uid"=>"98635a65-5dfd-11e6-a1f0-0e05cb0c5c85", "resourceVersion"=>"15740", "creationTimestamp"=>"2016-08-09T06:50:44Z", "annotations"=>{"openshift.io/description"=>"", "openshift.io/display-name"=>"", "openshift.io/requester"=>"xiazhao2015", "openshift.io/sa.scc.mcs"=>"s0:c13,c7", "openshift.io/sa.scc.supplemental-groups"=>"1000170000/10000", "openshift.io/sa.scc.uid-range"=>"1000170000/10000"}}, "spec"=>{"finalizers"=>["openshift.io/origin", "kubernetes"]}, "status"=>{"phase"=>"Active"}}>
  2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/net/http.rb:878:in `initialize'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/net/http.rb:878:in `open'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/net/http.rb:878:in `block in connect'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/timeout.rb:52:in `timeout'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/net/http.rb:877:in `connect'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/net/http.rb:1445:in `begin_transport'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/net/http.rb:1402:in `transport_request'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/net/http.rb:1376:in `request'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/gems/gems/rest-client-1.6.7/lib/restclient/net_http_ext.rb:51:in `request'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/gems/gems/kubeclient-0.7.0/lib/kubeclient/watch_stream.rb:21:in `each'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/gems/gems/fluent-plugin-kubernetes_metadata_filter-0.24.0/lib/fluent/plugin/filter_kubernetes_metadata.rb:362:in `start_watch'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/gems/gems/fluent-plugin-kubernetes_metadata_filter-0.24.0/lib/fluent/plugin/filter_kubernetes_metadata.rb:173:in `block in configure'
2016-08-09 02:57:37 -0400 [error]: fluentd main process died unexpectedly. restarting.
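The trace shows a connection error from net/http escaping, uncaught, through the kubeclient watch stream and the namespace watch the plugin starts from configure; with nothing rescuing it, the whole fluentd process dies. A minimal sketch of a guarded watch loop that would survive this, assuming hypothetical helper names (make_watch_client and handle are illustrative, not the plugin's actual code):

  require 'kubeclient'

  # Hypothetical stand-ins for the plugin's client setup and notice handling.
  def make_watch_client
    Kubeclient::Client.new('https://kubernetes.default.svc', 'v1')
  end

  def handle(notice)
    puts "#{notice.type} #{notice.object.metadata.name}"
  end

  # Guarded sketch: rescue, log, back off, and re-open the watch so a dropped
  # connection (or a namespace deleted mid-watch) cannot escape the thread
  # and kill the fluentd main process, as happens in the trace above.
  watcher = Thread.new do
    loop do
      begin
        make_watch_client.watch_namespaces.each { |notice| handle(notice) }
      rescue StandardError => e
        warn "namespace watch failed: #{e.class}: #{e.message}; retrying in 5s"
        sleep 5
      end
    end
  end
  watcher.join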


Version-Release number of selected component (if applicable):
registry.ops.../logging-fluentd         3.3.0               80847240fa91

# openshift version
openshift v3.3.0.17
kubernetes v1.3.0+507d3a7
etcd 2.3.0+git

How reproducible:
Always

Steps to Reproduce:
1. Create a new namespace named "tobedeleted" on OpenShift, generate some app data inside it, and confirm the logs are visible in the Kibana UI.
2. Delete the namespace "tobedeleted":
 $ oc delete project tobedeleted
--> Confirm that the logs from step 1 are still visible in the .all index.
3. Wait a while, then check the fluentd pod log:
$ oc logs -f logging-fluentd-hnltg

Actual Result:
The fluentd main process died unexpectedly when attempting to pick up logs for /api/v1/namespaces/tobedeleted.

Expected Result:
fluentd should notice that the namespace has been deleted and stop collecting logs for it.

Additional info:
1. Fluentd is still able to collect logs for existing namespaces.
2. Full pod log attached.

Comment 1 Luke Meyer 2016-08-09 13:27:21 UTC
Good catch. We need to make the fluentd plugin resilient to namespace deletion.
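One way to get that resilience, sketched with a hypothetical helper (fetch_namespace_metadata is illustrative, not the plugin's real method; the actual fix shipped as gem updates, per the Doc Text): treat a 404 on the metadata lookup as "project already deleted" and fall back to partial metadata instead of raising.

  require 'kubeclient'

  # Illustrative only: not the plugin's actual code path. The point is the rescue.
  def fetch_namespace_metadata(client, name)
    ns = client.get_namespace(name)
    { 'namespace_name' => name, 'namespace_id' => ns.metadata.uid }
  rescue KubeException => e
    # A 404 here just means the project was deleted after its logs were
    # written; keep processing the record with what we know, don't crash.
    raise unless e.error_code == 404
    { 'namespace_name' => name }
  end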

Comment 3 Rich Megginson 2016-08-14 19:34:51 UTC
There are still some missing packages. We need a puddle rebuild to pick up the latest versions of kubeclient, rest-client, http-cookie, domain_name, unf, and unf_ext, and then rebuild the logging-fluentd-docker image (3.3.0, Release 14).
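For context, the extra packages are rest-client's transitive dependency chain (rest-client -> http-cookie -> domain_name -> unf -> unf_ext), so they have to move together. A hypothetical Gemfile fragment for the rebuilt image; the exact versions live in the Brew build, not in this bug:

  source 'https://rubygems.org'

  gem 'kubeclient'    # newer than the 0.7.0 in the failing image
  gem 'rest-client'   # newer than 1.6.7
  gem 'http-cookie'   # dependency of rest-client
  gem 'domain_name'   # dependency of http-cookie
  gem 'unf'           # dependency of domain_name
  gem 'unf_ext'       # native extension backing unf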

Comment 4 Luke Meyer 2016-08-17 18:51:33 UTC
@xzhao has this been observed with any version prior to 3.3? Just wondering whether we need to note this bug as fixed relative to a prior version. If not, we can just close it when it's fixed.

Comment 5 Xia Zhao 2016-08-18 02:15:16 UTC
@lmeyer I did not encounter it on OSE versions prior to 3.3, but will test again to confirm.

The verification work towards 3.3.0 is blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1366137

Comment 6 Xia Zhao 2016-08-18 08:38:36 UTC
@lmeyer I have tested this scenario on 3.2.1 logging and could not reproduce it.

Comment 7 Rich Megginson 2016-08-18 23:06:34 UTC
https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=509895

This should no longer be blocked.

Comment 8 Xia Zhao 2016-08-19 03:31:35 UTC
The verification work is blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1368301

Comment 11 errata-xmlrpc 2016-09-27 09:42:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1933

