Bug 1531157 - logging-fluentd v3.9.0-0.16.0.2 immediately starts flooding "missing namespace" errors on startup
Summary: logging-fluentd v3.9.0-0.16.0.2 immediately starts flooding "missing namespace" errors on startup
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.9.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.9.0
Assignee: Jeff Cantrill
QA Contact: Mike Fiedler
URL:
Whiteboard:
Depends On:
Blocks: 1502764
 
Reported: 2018-01-04 17:01 UTC by Mike Fiedler
Modified: 2018-03-28 14:17 UTC
CC List: 5 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-03-28 14:17:25 UTC
Target Upstream Version:
Embargoed:


Attachments
logging-fluentd log (4.49 MB, text/plain)
2018-01-04 17:01 UTC, Mike Fiedler


Links
System ID Private Priority Status Summary Last Updated
Github openshift origin-aggregated-logging pull 883 0 None None None 2018-01-05 03:05:28 UTC
Github openshift origin-aggregated-logging pull 898 0 None None None 2018-01-17 23:42:53 UTC
Red Hat Product Errata RHBA-2018:0489 0 None None None 2018-03-28 14:17:57 UTC

Description Mike Fiedler 2018-01-04 17:01:54 UTC
Created attachment 1377024 [details]
logging-fluentd log

Description of problem:

The latest (as of 4 Jan) logging-fluentd image (v3.9.0-0.16.0.2) appears to be broken. Immediately on startup, the fluentd pod starts flooding error messages complaining about a missing namespace ID, with garbled message content. A partial message is below; the full log is attached.

There are no pods running on the system.  Docker is configured for json-file.
This issue was not seen with logging-fluentd v3.9.0-0.9.0.

2018-01-04 16:43:04 +0000 [error]: record cannot use elasticsearch index name type project_full: record is missing kubernetes.namespace_id field: {"docker"=>{"container_id"=>"cdad990c2155e438df453d8caf4808424539ded32bba674536ff69df06b1e25e"}, "kubernetes"=>{"container_name"=>"fluentd-elasticsearch", "namespace_name"=>"logging", "pod_name"=>"logging-fluentd-7mcsn", "pod_id"=>"38f5d4c7-f16e-11e7-b343-024338e41dd2", "labels"=>{"component"=>"fluentd", "controller-revision-hash"=>"2355984793", "logging-infra"=>"fluentd", "pod-template-generation"=>"1", "provider"=>"openshift"}, "host"=>"ip-172-31-15-26.us-west-2.compute.internal", "master_url"=>"https://kubernetes.default.svc.cluster.local"}, "message"=>"\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\

<snip - see attached for full messages>
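
For context, the "project_full" index name type builds the target Elasticsearch index from both the namespace name and the namespace UUID, so every record whose kubernetes metadata lacks namespace_id is rejected with one error line, which is what produces the flood above. The Ruby sketch below only illustrates that general shape; the method names and the exact index format here are assumptions, not the actual origin-aggregated-logging plugin code.

# Illustrative sketch only; not the actual elasticsearch index name plugin code.
# build_project_full_index and log_error are hypothetical names.
def log_error(msg)
  # Stand-in for fluentd's plugin logger (a real plugin would use log.error).
  warn msg
end

def build_project_full_index(record, time = Time.now.utc)
  k8s  = record['kubernetes'] || {}
  name = k8s['namespace_name']
  uuid = k8s['namespace_id']

  if name.nil? || uuid.nil?
    # This is the class of guard that emits one error line per record,
    # which is why a busy node floods the fluentd log within minutes.
    log_error("record cannot use elasticsearch index name type project_full: " \
              "record is missing kubernetes.namespace_id field: #{record.inspect}")
    return nil
  end

  # Assumed project_full naming: project.<namespace_name>.<namespace_uuid>.YYYY.MM.DD
  "project.#{name}.#{uuid}.#{time.strftime('%Y.%m.%d')}"
end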

Version-Release number of selected component (if applicable): logging-fluentd v3.9.0-0.16.0.2


How reproducible: Always when starting logging-fluentd


Steps to Reproduce:
1. Deploy logging v3.9.0-0.16.0.2 normally using openshift-ansible (docker configured on all nodes for json-file)
2. Verify elasticsearch starts correctly
3. oc logs <fluentd pod>  for a system where no other pods are running

Actual results:

See the attached errors. Additionally, no pod logs appear in the Elasticsearch project indices, while operations logs are created.
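
This split (operations indices populated, project indices empty) is consistent with every container record failing the namespace_id check before it can be routed: operations records are indexed under a date-only name and do not need the namespace UUID. A minimal sketch of that distinction, reusing the hypothetical helper sketched in the description above:

# Illustrative only: operations records do not depend on namespace_id,
# which is why .operations.* indices keep filling while project.* stay empty.
def index_for(record, operations, time = Time.now.utc)
  return ".operations.#{time.strftime('%Y.%m.%d')}" if operations
  build_project_full_index(record, time)  # nil for every record missing namespace_id
end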


Expected results:

Normal fluentd startup with no error flood.


Additional info:

Comment 1 Anping Li 2018-01-17 07:06:32 UTC
323 MB of fluentd logs in 10 minutes. There are pods in the Evicted state.

docker-registry-1-4nqhg       1/1       Running   0          20h
docker-registry-1-4sngb       0/1       Evicted   0          22h
docker-registry-1-78mz5       0/1       Evicted   0          22h
docker-registry-1-stnqn       0/1       Evicted   0          21h
docker-registry-1-tqsjk       0/1       Evicted   0          22h

Comment 3 Mike Fiedler 2018-01-22 19:18:30 UTC
Problem still occurs on registry.reg-aws.openshift.com:443/openshift3/logging-fluentd:v3.9.0-0.22.0.0

registry.reg-aws.openshift.com:443/openshift3/logging-fluentd             v3.9.0-0.22.0.0     35b4c7263b16        2 days ago          275.5 MB

Comment 4 Jeff Cantrill 2018-01-22 21:10:25 UTC
I don't see where [1] is in the latest puddles [2], which is the only way this issue will be resolved. Can you help us out? The gem [1] should be available in the 3.6->3.9 puddles.

[1] https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=646322
[2] http://download-node-02.eng.bos.redhat.com/rcm-guest/puddles/RHAOS/AtomicOpenShift/3.9/latest/x86_64/os/Packages/

Comment 5 Rich Megginson 2018-01-23 17:15:31 UTC
(In reply to Jeff Cantrill from comment #4)
> I don't see where [1] is in the latest puddles [2] which is the only way
> this issue will be resolved.  Can you help us out.  The gem [1] should be
> available in 3.6->3.9 puddles
> 
> [1] https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=646322
> [2]
> http://download-node-02.eng.bos.redhat.com/rcm-guest/puddles/RHAOS/
> AtomicOpenShift/3.9/latest/x86_64/os/Packages/

1.0.1 was tagged into 3.9, 3.8, 3.7, 3.6, and those puddles were rebuilt.  You should be good to go for rebuilding the fluentd images for those releases.

Comment 6 Anping Li 2018-01-24 04:35:49 UTC
The fix isn't in logging-fluentd/images/v3.9.0-0.23.0.0.

Comment 8 Mike Fiedler 2018-01-29 18:37:52 UTC
Verified on 3.9.0-0.31.0.  logging-fluentd is working normally in this puddle.

Comment 11 errata-xmlrpc 2018-03-28 14:17:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489

