Bug 1365422 - fluentd main process died unexpectedly on logs from deleted project
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Assigned To: ewolinet
QA Contact: chunchen
 
Reported: 2016-08-09 04:47 EDT by Xia Zhao
Modified: 2017-03-08 13 EST
Doc Type: Bug Fix
Doc Text:
Cause: When a project was deleted, the Fluentd plugin did not properly handle fetching metadata and would exit.
Consequence: The Fluentd pod would restart.
Fix: The kubeclient and rest-client gems for Fluentd were updated.
Result: Fluentd properly handles the case where a project is deleted while it is processing that project's logs.
Last Closed: 2016-09-27 05:42:57 EDT
Type: Bug


Attachments
fluentd_pod_log (29.36 KB, text/plain), added 2016-08-09 04:47 EDT by Xia Zhao


External Trackers
Tracker: Red Hat Product Errata
ID: RHBA-2016:1933
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat OpenShift Container Platform 3.3 Release Advisory
Last Updated: 2016-09-27 09:24:36 EDT

Description Xia Zhao 2016-08-09 04:47:53 EDT
Created attachment 1189146
fluentd_pod_log

Problem description:
The following errors appear in the fluentd log for some deleted namespaces:
#<Kubeclient::Common::WatchNotice type="MODIFIED", object={"kind"=>"Namespace", "apiVersion"=>"v1", "metadata"=>{"name"=>"tobedeleted", "selfLink"=>"/api/v1/namespaces/tobedeleted", "uid"=>"98635a65-5dfd-11e6-a1f0-0e05cb0c5c85", "resourceVersion"=>"15740", "creationTimestamp"=>"2016-08-09T06:50:44Z", "annotations"=>{"openshift.io/description"=>"", "openshift.io/display-name"=>"", "openshift.io/requester"=>"xiazhao2015", "openshift.io/sa.scc.mcs"=>"s0:c13,c7", "openshift.io/sa.scc.supplemental-groups"=>"1000170000/10000", "openshift.io/sa.scc.uid-range"=>"1000170000/10000"}}, "spec"=>{"finalizers"=>["openshift.io/origin", "kubernetes"]}, "status"=>{"phase"=>"Active"}}>
  2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/net/http.rb:878:in `initialize'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/net/http.rb:878:in `open'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/net/http.rb:878:in `block in connect'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/timeout.rb:52:in `timeout'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/net/http.rb:877:in `connect'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/net/http.rb:1445:in `begin_transport'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/net/http.rb:1402:in `transport_request'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/net/http.rb:1376:in `request'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/gems/gems/rest-client-1.6.7/lib/restclient/net_http_ext.rb:51:in `request'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/gems/gems/kubeclient-0.7.0/lib/kubeclient/watch_stream.rb:21:in `each'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/gems/gems/fluent-plugin-kubernetes_metadata_filter-0.24.0/lib/fluent/plugin/filter_kubernetes_metadata.rb:362:in `start_watch'
  2016-08-09 02:57:28 -0400 [error]: /usr/share/gems/gems/fluent-plugin-kubernetes_metadata_filter-0.24.0/lib/fluent/plugin/filter_kubernetes_metadata.rb:173:in `block in configure'
2016-08-09 02:57:37 -0400 [error]: fluentd main process died unexpectedly. restarting.
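
The trace above shows an error raised inside the metadata plugin's `start_watch` call surfacing all the way up, after which the main process restarts. As a minimal sketch of how a watch loop can contain that kind of failure, the Ruby below retries a failing watch with a bounded backoff instead of letting the exception escape. It is not the plugin's actual code: `run_watch`, its parameters, and the injected `sleeper` are hypothetical names used only for illustration, and the fix recorded in the Doc Text for this bug was a gem update, not this pattern.

```ruby
# Hypothetical helper (not part of fluentd or kubeclient): runs a watch block
# and retries when it raises, instead of letting the error propagate and
# terminate the calling process.
# `max_retries`, `base_delay`, and `sleeper` are illustrative parameters.
def run_watch(max_retries: 3, base_delay: 1, sleeper: ->(seconds) { sleep(seconds) })
  attempts = 0
  begin
    yield
    :finished
  rescue StandardError => e
    attempts += 1
    if attempts > max_retries
      warn "watch failed permanently: #{e.class}: #{e.message}"
      return :gave_up
    end
    # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
    sleeper.call(base_delay * (2**(attempts - 1)))
    retry
  end
end

# Usage: simulate two refused connections followed by a successful watch.
calls = 0
outcome = run_watch(sleeper: ->(_seconds) {}) do
  calls += 1
  raise Errno::ECONNREFUSED if calls < 3
end
puts outcome
```

Passing the sleep function in as a parameter keeps the sketch testable without real delays; a production loop would also need to decide which error classes are worth retrying.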


Version-Release number of selected component (if applicable):
registry.ops.../logging-fluentd         3.3.0               80847240fa91

# openshift version
openshift v3.3.0.17
kubernetes v1.3.0+507d3a7
etcd 2.3.0+git

How reproducible:
Always

Steps to Reproduce:
1. Create a new namespace on OpenShift named "tobedeleted", create some application data in it, and check that its logs are visible in the Kibana UI.
2. Delete the namespace "tobedeleted":
$ oc delete project tobedeleted
--> Checked that the logs from step 1 are still visible in the .all index.
3. Wait for a while, then check the fluentd pod log:
$ oc logs -f logging-fluentd-hnltg
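
The check in step 3 can also be done without reading the whole log by eye. The following Ruby sketch scans log text for the restart message quoted in this report and returns the timestamps at which it occurred. The method name `restart_events` is hypothetical, and the pattern is an assumption based only on the single log line format shown in this report.

```ruby
# Matches lines such as:
#   2016-08-09 02:57:37 -0400 [error]: fluentd main process died unexpectedly. restarting.
# The first capture group is the timestamp.
RESTART_PATTERN =
  /\A\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [+-]\d{4}) \[error\]: fluentd main process died unexpectedly/

# Hypothetical helper: returns the timestamp of every restart message in log_text.
def restart_events(log_text)
  log_text.each_line.map { |line| line[RESTART_PATTERN, 1] }.compact
end

# Usage (assumes the pod log was saved to a file first, for example with
# `oc logs logging-fluentd-hnltg > fluentd.log`):
#   puts restart_events(File.read("fluentd.log"))
```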

Actual Result:
The fluentd main process died unexpectedly when attempting to pick up logs for /api/v1/namespaces/tobedeleted

Expected Result:
fluentd should notice the absence of the deleted namespace and stop collecting logs from it
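
One way to picture the expected behaviour: when a metadata lookup reports that the namespace no longer exists, the caller drops any cached entry and continues with empty metadata rather than raising. The Ruby below is a self-contained sketch under that assumption. `NamespaceGone`, `MetadataCache`, and the `fetcher` callable are hypothetical stand-ins, not the API of kubeclient or of fluent-plugin-kubernetes_metadata_filter.

```ruby
# Hypothetical error that a lookup raises when the namespace has been deleted.
class NamespaceGone < StandardError; end

# Hypothetical cache in front of a metadata lookup. `fetcher` is any callable
# that takes a namespace name and returns a Hash, or raises NamespaceGone.
class MetadataCache
  def initialize(fetcher)
    @fetcher = fetcher
    @entries = {}
  end

  # Returns metadata for `name`, or an empty Hash if the namespace is gone.
  def lookup(name)
    @entries[name] ||= @fetcher.call(name)
  rescue NamespaceGone
    @entries.delete(name) # forget the deleted namespace
    {}
  end

  # Applies a watch event. Kubernetes watch event types include
  # ADDED, MODIFIED, and DELETED; only DELETED evicts the entry here.
  def handle_notice(type, name)
    @entries.delete(name) if type == "DELETED"
  end

  def cached?(name)
    @entries.key?(name)
  end
end
```

The design choice illustrated is that a missing namespace is treated as an ordinary outcome of the lookup, so log records from a deleted project can still be processed without metadata.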

Additional info:
1. Fluentd is still able to collect logs for existing namespaces.
2. The full pod log is attached.
Comment 1 Luke Meyer 2016-08-09 09:27:21 EDT
Good catch. We need to make the fluentd plugin resilient to namespace deletion.
Comment 3 Rich Megginson 2016-08-14 15:34:51 EDT
There are still some missing packages. We need a puddle rebuild to pick up the latest versions of kubeclient, rest-client, http-cookie, domain_name, unf, and unf_ext. Then we need to rebuild the image logging-fluentd-docker 3.3.0 Release 14.
Comment 4 Luke Meyer 2016-08-17 14:51:33 EDT
@xzhao Has this been observed with any version prior to 3.3? Just wondering whether we need to note this bug as being fixed relative to a prior version. If not, we can just close it when it's fixed.
Comment 5 Xia Zhao 2016-08-17 22:15:16 EDT
@lmeyer I should not encounter it on OSE versions prior to 3.3, but will test again to confirm.

The verification work towards 3.3.0 is blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1366137
Comment 6 Xia Zhao 2016-08-18 04:38:36 EDT
@lmeyer I tested this scenario on 3.2.1 logging and could not reproduce it.
Comment 7 Rich Megginson 2016-08-18 19:06:34 EDT
https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=509895

Should not be blocked any more
Comment 8 Xia Zhao 2016-08-18 23:31:35 EDT
The verification work is blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1368301
Comment 11 errata-xmlrpc 2016-09-27 05:42:57 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1933
