Created attachment 1189146 fluentd_pod_log

Problem description:
The following errors appear in the fluentd log for some deleted namespaces:

#<Kubeclient::Common::WatchNotice type="MODIFIED", object={"kind"=>"Namespace", "apiVersion"=>"v1", "metadata"=>{"name"=>"tobedeleted", "selfLink"=>"/api/v1/namespaces/tobedeleted", "uid"=>"98635a65-5dfd-11e6-a1f0-0e05cb0c5c85", "resourceVersion"=>"15740", "creationTimestamp"=>"2016-08-09T06:50:44Z", "annotations"=>{"openshift.io/description"=>"", "openshift.io/display-name"=>"", "openshift.io/requester"=>"xiazhao2015", "openshift.io/sa.scc.mcs"=>"s0:c13,c7", "openshift.io/sa.scc.supplemental-groups"=>"1000170000/10000", "openshift.io/sa.scc.uid-range"=>"1000170000/10000"}}, "spec"=>{"finalizers"=>["openshift.io/origin", "kubernetes"]}, "status"=>{"phase"=>"Active"}}>
2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/net/http.rb:878:in `initialize'
2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/net/http.rb:878:in `open'
2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/net/http.rb:878:in `block in connect'
2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/timeout.rb:52:in `timeout'
2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/net/http.rb:877:in `connect'
2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/net/http.rb:1445:in `begin_transport'
2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/net/http.rb:1402:in `transport_request'
2016-08-09 02:57:28 -0400 [error]: /usr/share/ruby/net/http.rb:1376:in `request'
2016-08-09 02:57:28 -0400 [error]: /usr/share/gems/gems/rest-client-1.6.7/lib/restclient/net_http_ext.rb:51:in `request'
2016-08-09 02:57:28 -0400 [error]: /usr/share/gems/gems/kubeclient-0.7.0/lib/kubeclient/watch_stream.rb:21:in `each'
2016-08-09 02:57:28 -0400 [error]: /usr/share/gems/gems/fluent-plugin-kubernetes_metadata_filter-0.24.0/lib/fluent/plugin/filter_kubernetes_metadata.rb:362:in `start_watch'
2016-08-09 02:57:28 -0400 [error]: /usr/share/gems/gems/fluent-plugin-kubernetes_metadata_filter-0.24.0/lib/fluent/plugin/filter_kubernetes_metadata.rb:173:in `block in configure'
2016-08-09 02:57:37 -0400 [error]: fluentd main process died unexpectedly. restarting.

Version-Release number of selected component (if applicable):
registry.ops.../logging-fluentd 3.3.0 80847240fa91

# openshift version
openshift v3.3.0.17
kubernetes v1.3.0+507d3a7
etcd 2.3.0+git

How reproducible:
Always

Steps to Reproduce:
1. Create a new namespace on OpenShift named "tobedeleted", generate some application data in it, and check that the logs are visible in the Kibana UI.
2. Delete namespace "tobedeleted":
$ oc delete project tobedeleted
--> The logs from step 1 are still visible in the .all index.
3. Wait for a while, then check the fluentd pod log:
$ oc logs -f logging-fluentd-hnltg

Actual Result:
The fluentd main process died unexpectedly when attempting to pick up logs for /api/v1/namespaces/tobedeleted.

Expected Result:
fluentd should notice that the namespace has been deleted and stop collecting logs for it.

Additional info:
1. Fluentd is still able to collect logs for existing namespaces.
2. Full pod log attached.
Good catch. We need to make the fluentd plugin resilient to namespace deletion.
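As a rough illustration of what that resilience could look like, the sketch below wraps a watch loop so that an exception is logged and the watch is retried instead of propagating. It does not use kubeclient or the plugin's real code: `resilient_watch`, its parameters, and the simulated failure are all hypothetical names chosen for this example.

```ruby
# Hypothetical helper, NOT the plugin's actual code: run a watch block and
# retry it on failure so the exception never escapes to the caller.
def resilient_watch(max_retries: 3, base_delay: 0.0, log: [])
  attempts = 0
  begin
    yield
  rescue StandardError => e
    attempts += 1
    log << "watch failed (attempt #{attempts}): #{e.class}: #{e.message}"
    if attempts <= max_retries
      sleep(base_delay * attempts) # linear backoff; 0.0 disables the wait
      retry
    end
    log << "giving up on watch after #{attempts} attempts"
    :gave_up
  end
end

# Simulated watch that fails twice (standing in for an API error) before
# it completes normally.
calls = 0
log = []
result = resilient_watch(max_retries: 2, log: log) do
  calls += 1
  raise Errno::ECONNREFUSED, "simulated API failure" if calls < 3
  :watch_finished
end
puts result
puts log
```

In a real plugin the block would hold the namespace watch itself, and the retry limit and backoff would need tuning. Whether retrying or dropping the watch is the right reaction to a deleted namespace is a separate design decision.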
There are still some missing packages. We need a puddle rebuild to pick up the latest versions of kubeclient, rest-client, http-cookie, domain_name, unf, and unf_ext. After that, we need to rebuild the image logging-fluentd-docker 3.3.0 Release 14.
@xzhao has this been observed with any version prior to 3.3? Just wondering whether we need to note this as a fix for a bug in a prior version. If not, we can just close it when it's fixed.
@lmeyer I do not expect to encounter it on OSE versions prior to 3.3, but will test again to confirm. Verification on 3.3.0 is blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1366137
@lmeyer I tested this scenario on 3.2.1 logging and could not reproduce it.
https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=509895 Verification should not be blocked any more.
The verification work is blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1368301
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1933