Created attachment 1377024 [details]
logging-fluentd log

Description of problem:

The latest (as of 4 Jan) logging-fluentd image (v3.9.0-0.16.0.2) appears to be broken. Immediately on startup, the fluentd pod floods its log with errors complaining that records are missing the kubernetes.namespace_id field, and the offending records carry bad message content. A partial message is below; the full log is attached. There are no application pods running on the system, and Docker is configured for the json-file log driver. This issue was not seen with logging-fluentd v3.9.0-0.9.0.

2018-01-04 16:43:04 +0000 [error]: record cannot use elasticsearch index name type project_full: record is missing kubernetes.namespace_id field: {"docker"=>{"container_id"=>"cdad990c2155e438df453d8caf4808424539ded32bba674536ff69df06b1e25e"}, "kubernetes"=>{"container_name"=>"fluentd-elasticsearch", "namespace_name"=>"logging", "pod_name"=>"logging-fluentd-7mcsn", "pod_id"=>"38f5d4c7-f16e-11e7-b343-024338e41dd2", "labels"=>{"component"=>"fluentd", "controller-revision-hash"=>"2355984793", "logging-infra"=>"fluentd", "pod-template-generation"=>"1", "provider"=>"openshift"}, "host"=>"ip-172-31-15-26.us-west-2.compute.internal", "master_url"=>"https://kubernetes.default.svc.cluster.local"}, "message"=>"\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\..." <snip - see attached for full messages>

Version-Release number of selected component (if applicable):
logging-fluentd v3.9.0-0.16.2

How reproducible:
Always when starting logging-fluentd

Steps to Reproduce:
1. Deploy logging v3.9.0-0.16.2 normally using openshift-ansible (docker configured on all nodes for json-file)
2. Verify elasticsearch starts correctly
3. oc logs <fluentd pod> on a system where no other pods are running

Actual results:
See attached errors. Additionally, no pod logs appear in the Elasticsearch indices; operations logs are created.

Expected results:
Normal fluentd startup

Additional info:
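The error comes from the step that maps each record to an Elasticsearch index name: the project_full name type requires a kubernetes.namespace_id field on the record, which is normally added by the Kubernetes metadata filter. A minimal triage sketch (the label selector comes from the record above; the config path and gem name patterns are assumptions about the 3.9 image layout):

  # Pick one fluentd pod in the logging namespace.
  pod=$(oc get pods -n logging -l component=fluentd -o jsonpath='{.items[0].metadata.name}')

  # Which viaq / kubernetes-metadata plugin gems does the image ship?
  oc exec -n logging "$pod" -- gem list | grep -Ei 'viaq|kubernetes_metadata'

  # Where is the project_full index name type wired into the generated config?
  oc exec -n logging "$pod" -- grep -rn 'project_full' /etc/fluent/configs.d/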
323M of fluentd logs in 10 minutes. There are pods in the Evicted state:

NAME                      READY     STATUS    RESTARTS   AGE
docker-registry-1-4nqhg   1/1       Running   0          20h
docker-registry-1-4sngb   0/1       Evicted   0          22h
docker-registry-1-78mz5   0/1       Evicted   0          22h
docker-registry-1-stnqn   0/1       Evicted   0          21h
docker-registry-1-tqsjk   0/1       Evicted   0          22h
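Fluentd may keep re-reading the container log files that evicted pods leave on disk, so cleaning them up is worthwhile while debugging. A minimal cleanup sketch, assuming the registry pods live in the default namespace:

  # Delete every pod whose STATUS column reads "Evicted".
  oc get pods -n default --no-headers \
    | awk '$3 == "Evicted" {print $1}' \
    | xargs -r oc delete pod -n default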
Problem still occurs on registry.reg-aws.openshift.com:443/openshift3/logging-fluentd:v3.9.0-0.22.0.0:

REPOSITORY                                                      TAG               IMAGE ID       CREATED      SIZE
registry.reg-aws.openshift.com:443/openshift3/logging-fluentd   v3.9.0-0.22.0.0   35b4c7263b16   2 days ago   275.5 MB
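To pin down exactly which build that image is, the labels baked in at build time can be read back (label names assumed from the usual Red Hat image metadata):

  # Print the version-release labels of the pulled image.
  docker inspect \
    --format '{{index .Config.Labels "version"}}-{{index .Config.Labels "release"}}' \
    registry.reg-aws.openshift.com:443/openshift3/logging-fluentd:v3.9.0-0.22.0.0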
I don't see where [1] is in the latest puddles [2], which is the only way this issue will be resolved. Can you help us out? The gem [1] should be available in the 3.6->3.9 puddles.

[1] https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=646322
[2] http://download-node-02.eng.bos.redhat.com/rcm-guest/puddles/RHAOS/AtomicOpenShift/3.9/latest/x86_64/os/Packages/
(In reply to Jeff Cantrill from comment #4)
> I don't see where [1] is in the latest puddles [2], which is the only way
> this issue will be resolved. Can you help us out? The gem [1] should be
> available in the 3.6->3.9 puddles.
>
> [1] https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=646322
> [2] http://download-node-02.eng.bos.redhat.com/rcm-guest/puddles/RHAOS/AtomicOpenShift/3.9/latest/x86_64/os/Packages/

1.0.1 was tagged into 3.9, 3.8, 3.7, and 3.6, and those puddles were rebuilt. You should be good to go for rebuilding the fluentd images for those releases.
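One way to confirm the rebuilt gem actually landed in a puddle before triggering an image rebuild is to scrape the package listing (the exact rpm name is behind the brew link above, so the grep pattern here is only a guess):

  # List fluentd plugin rpms published in the 3.9 puddle.
  curl -s http://download-node-02.eng.bos.redhat.com/rcm-guest/puddles/RHAOS/AtomicOpenShift/3.9/latest/x86_64/os/Packages/ \
    | grep -oiE 'rubygem-fluent-plugin[^"<]*' \
    | sort -u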
The fix isn't in the logging-fluentd image v3.9.0-0.23.0.0.
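A quick way to tell whether a given image picked up the fix is to list the plugin gems it ships and compare against the brew build from comment 4 (gem names here are illustrative, not confirmed):

  # Run a throwaway container and list its installed fluentd plugin gems.
  docker run --rm --entrypoint gem \
    registry.reg-aws.openshift.com:443/openshift3/logging-fluentd:v3.9.0-0.23.0.0 \
    list | grep -i fluent-plugin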
https://github.com/openshift/origin-aggregated-logging/pull/898
Verified on 3.9.0-0.31.0. logging-fluentd is working normally in this puddle.
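For reference, this verification can be reproduced by checking that project indices are being created again (label selector and secret mount paths assumed from the 3.x logging deployment):

  # Query Elasticsearch from inside its own pod using the admin client certs.
  es=$(oc get pods -n logging -l component=es -o jsonpath='{.items[0].metadata.name}')
  oc exec -n logging "$es" -- curl -s -k \
    --cert /etc/elasticsearch/secret/admin-cert \
    --key /etc/elasticsearch/secret/admin-key \
    https://localhost:9200/_cat/indices | grep 'project\.'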
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0489