Bug 1365422

Summary: fluentd main process died unexpectedly on logs from deleted project
Product: OpenShift Container Platform
Reporter: Xia Zhao <xiazhao>
Component: Logging
Assignee: ewolinet
Status: CLOSED ERRATA
QA Contact: chunchen <chunchen>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.3.0
CC: aos-bugs, lmeyer, rmeggins, tdawson, wsun
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: When a project was deleted, the Fluentd plugin did not handle the metadata fetch properly and exited.
Consequence: The Fluentd pod restarted.
Fix: The kubeclient and rest-client gems for Fluentd were updated.
Result: Fluentd properly handles the case where the project for the logs it is processing has been deleted.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-09-27 09:42:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description Flags
fluentd_pod_log none

Description Xia Zhao 2016-08-09 08:47:53 UTC
Created attachment 1189146: fluentd_pod_log

Problem description:
The fluentd log shows the following errors for deleted namespaces:
#<Kubeclient::Common::WatchNotice type="MODIFIED", object={"kind"=>"Namespace", "apiVersion"=>"v1", "metadata"=>{"name"=>"tobedeleted", "selfLink"=>"/api/v1/namespaces/tobedeleted", "uid"=>"98635a65-5dfd-11e6-a1f0-0e05cb0c5c85", "resourceVersion"=>"15740", "creationTimestamp"=>"2016-08-09T06:50:44Z", "annotations"=>{"openshift.io/description"=>"", "openshift.io/display-name"=>"", "openshift.io/requester"=>"xiazhao2015", "openshift.io/sa.scc.mcs"=>"s0:c13,c7", "openshift.io/sa.scc.supplemental-groups"=>"1000170000/10000", "openshift.io/sa.scc.uid-range"=>"1000170000/10000"}}, "spec"=>{"finalizers"=>["openshift.io/origin", "kubernetes"]}, "status"=>{"phase"=>"Active"}}>
  2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/net/http.rb:878:in `initialize'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/net/http.rb:878:in `open'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/net/http.rb:878:in `block in connect'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/timeout.rb:52:in `timeout'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/net/http.rb:877:in `connect'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/net/http.rb:1445:in `begin_transport'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/net/http.rb:1402:in `transport_request'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/net/http.rb:1376:in `request'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/gems/gems/rest-client-1.6.7/lib/restclient/net_http_ext.rb:51:in `request'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/gems/gems/kubeclient-0.7.0/lib/kubeclient/watch_stream.rb:21:in `each'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/gems/gems/fluent-plugin-kubernetes_metadata_filter-0.24.0/lib/fluent/plugin/filter_kubernetes_metadata.rb:362:in `start_watch'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/gems/gems/fluent-plugin-kubernetes_metadata_filter-0.24.0/lib/fluent/plugin/filter_kubernetes_metadata.rb:173:in `block in configure'
2016-08-09 02:57:37 -0400 [error]: fluentd main process died unexpectedly. restarting.


Version-Release number of selected component (if applicable):
registry.ops.../logging-fluentd         3.3.0               80847240fa91

# openshift version
openshift v3.3.0.17
kubernetes v1.3.0+507d3a7
etcd 2.3.0+git

How reproducible:
Always

Steps to Reproduce:
1. Create a new namespace on OpenShift named "tobedeleted", generate some application data in it, and check that the logs are visible in the Kibana UI.
2. Delete the namespace "tobedeleted":
$ oc delete project tobedeleted
--> The logs from step 1 are still visible in the .all index.
3. Wait for a while, then check the fluentd pod log:
$ oc logs -f logging-fluentd-hnltg

Actual Result:
The fluentd main process died unexpectedly when attempting to pick up logs for /api/v1/namespaces/tobedeleted.

Expected Result:
fluentd should notice that the deleted namespace no longer exists and stop collecting logs for it.

Additional info:
1. Fluentd is still able to collect logs for existing namespaces.
2. The full pod log is attached.

Comment 1 Luke Meyer 2016-08-09 13:27:21 UTC
Good catch. We need to make the fluentd plugin resilient to namespace deletion.
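
One way to picture the resilience asked for here is a wrapper that restarts the watch after a connection-level failure instead of letting the exception reach the Fluentd main process. The following is only a minimal sketch under stated assumptions: `resilient_watch`, `TRANSIENT_ERRORS`, and the retry parameters are hypothetical names, not part of fluent-plugin-kubernetes_metadata_filter or kubeclient. The trace above ends in the Net::HTTP connect path, so the sketch assumes the failure is a socket or timeout exception; the excerpt does not show the exception class. The fix that actually shipped for this bug was a gem update, not this code.

```ruby
require 'logger'
require 'socket'
require 'timeout'

# Assumption: these are the kinds of exceptions a failed Net::HTTP connect
# can raise. The bug report does not show the actual exception class.
TRANSIENT_ERRORS = [
  Errno::ECONNREFUSED,
  Errno::ECONNRESET,
  Timeout::Error,
  SocketError
].freeze

# Hypothetical helper: runs the given block (for example, a watch loop) and
# re-runs it with exponential backoff after a transient connection error.
# Returns true if the block finished normally, false once retries run out.
# The sleeper parameter exists so the delay can be replaced in tests.
def resilient_watch(max_retries: 5, base_delay: 1.0,
                    logger: Logger.new($stderr),
                    sleeper: ->(seconds) { sleep(seconds) })
  attempts = 0
  begin
    yield
  rescue *TRANSIENT_ERRORS => e
    attempts += 1
    if attempts > max_retries
      logger.error("watch failed after #{max_retries} retries: #{e.class}: #{e.message}")
      return false
    end
    delay = base_delay * (2**(attempts - 1))
    logger.warn("watch interrupted (#{e.class}), retry #{attempts}/#{max_retries} in #{delay}s")
    sleeper.call(delay)
    retry
  end
  true
end
```

A caller would wrap its watch loop in the block, for example `resilient_watch { watcher.each { |notice| handle(notice) } }`, where `watcher` and `handle` stand in for whatever the plugin uses. Returning false rather than raising leaves it to the caller to decide whether to continue without namespace metadata.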

Comment 3 Rich Megginson 2016-08-14 19:34:51 UTC
There are still some missing packages. We need a puddle rebuild to pick up the latest versions of kubeclient, rest-client, http-cookie, domain_name, unf, and unf_ext. Then we need to rebuild the image logging-fluentd-docker 3.3.0 Release 14.
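
To check which versions of these gems a rebuilt image actually contains, something like the following could be run with the image's Ruby. This is a hedged sketch: `installed_versions` and `GEMS` are illustrative names, the gem list is copied from this comment, and no minimum versions are assumed because the report does not state any. It relies only on `Gem::Specification.find_all_by_name` from RubyGems.

```ruby
# Gem names taken from the comment above; no minimum versions are assumed.
GEMS = %w[kubeclient rest-client http-cookie domain_name unf unf_ext].freeze

# Returns a hash mapping each gem name to the sorted list of installed
# version strings (an empty list if the gem is not installed).
def installed_versions(names)
  names.each_with_object({}) do |name, result|
    specs = Gem::Specification.find_all_by_name(name)
    result[name] = specs.map(&:version).sort.map(&:to_s)
  end
end

installed_versions(GEMS).each do |name, versions|
  shown = versions.empty? ? '(not installed)' : versions.join(', ')
  puts format('%-12s %s', name, shown)
end
```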

Comment 4 Luke Meyer 2016-08-17 18:51:33 UTC
@xzhao Has this been observed with any version prior to 3.3? I am wondering whether we need to note this as a bug fix relative to a prior version. If not, we can just close it when it's fixed.

Comment 5 Xia Zhao 2016-08-18 02:15:16 UTC
@lmeyer I don't recall encountering it on OSE versions prior to 3.3, but I will test again to confirm.

The verification work towards 3.3.0 is blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1366137

Comment 6 Xia Zhao 2016-08-18 08:38:36 UTC
@lmeyer I tested this scenario on 3.2.1 logging and could not reproduce it.

Comment 7 Rich Megginson 2016-08-18 23:06:34 UTC
https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=509895

This should no longer be blocked.

Comment 8 Xia Zhao 2016-08-19 03:31:35 UTC
The verification work is blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1368301

Comment 11 errata-xmlrpc 2016-09-27 09:42:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1933