Created attachment 1377024 [details]
logging-fluentd log

Description of problem:

The latest (as of 4 Jan) logging-fluentd image (v3.9.0-0.16.0.2) appears to be broken. Immediately on startup, the fluentd pod floods its log with errors complaining that records are missing the kubernetes.namespace_id field, and the offending records carry bad message content. A partial message is below; the full log is attached. There are no application pods running on the system, and Docker is configured for the json-file log driver. This issue was not seen with logging-fluentd v3.9.0-0.9.0.

2018-01-04 16:43:04 +0000 [error]: record cannot use elasticsearch index name type project_full: record is missing kubernetes.namespace_id field: {"docker"=>{"container_id"=>"cdad990c2155e438df453d8caf4808424539ded32bba674536ff69df06b1e25e"}, "kubernetes"=>{"container_name"=>"fluentd-elasticsearch", "namespace_name"=>"logging", "pod_name"=>"logging-fluentd-7mcsn", "pod_id"=>"38f5d4c7-f16e-11e7-b343-024338e41dd2", "labels"=>{"component"=>"fluentd", "controller-revision-hash"=>"2355984793", "logging-infra"=>"fluentd", "pod-template-generation"=>"1", "provider"=>"openshift"}, "host"=>"ip-172-31-15-26.us-west-2.compute.internal", "master_url"=>"https://kubernetes.default.svc.cluster.local"}, "message"=>"\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\..." <snip - see attached for full messages>

Version-Release number of selected component (if applicable):
logging-fluentd v3.9.0-0.16.2

How reproducible:
Always when starting logging-fluentd

Steps to Reproduce:
1. Deploy logging v3.9.0-0.16.2 normally using openshift-ansible (docker configured on all nodes for json-file)
2. Verify elasticsearch starts correctly
3. oc logs <fluentd pod> on a system where no other pods are running

Actual results:
See attached errors. Additionally, no pod logs appear in the Elasticsearch indices; operations logs are created.

Expected results:
Normal fluentd startup

Additional info:
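The error comes from the step that maps each record to an Elasticsearch index name: the project_full name type requires a kubernetes.namespace_id field on the record, which is normally added by the Kubernetes metadata filter. A minimal triage sketch (the label selector comes from the record above; the config path and gem name patterns are assumptions about the 3.9 image layout):

  # Pick one fluentd pod in the logging namespace.
  pod=$(oc get pods -n logging -l component=fluentd -o jsonpath='{.items[0].metadata.name}')

  # Which viaq / kubernetes-metadata plugin gems does the image ship?
  oc exec -n logging "$pod" -- gem list | grep -Ei 'viaq|kubernetes_metadata'

  # Where is the project_full index name type wired into the generated config?
  oc exec -n logging "$pod" -- grep -rn 'project_full' /etc/fluent/configs.d/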
323M of fluentd logs in 10 minutes. There are pods in the Evicted state:

NAME                      READY     STATUS    RESTARTS   AGE
docker-registry-1-4nqhg   1/1       Running   0          20h
docker-registry-1-4sngb   0/1       Evicted   0          22h
docker-registry-1-78mz5   0/1       Evicted   0          22h
docker-registry-1-stnqn   0/1       Evicted   0          21h
docker-registry-1-tqsjk   0/1       Evicted   0          22h
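Fluentd may keep re-reading the container log files that evicted pods leave on disk, so cleaning them up is worthwhile while debugging. A minimal cleanup sketch, assuming the registry pods live in the default namespace:

  # Delete every pod whose STATUS column reads "Evicted".
  oc get pods -n default --no-headers \
    | awk '$3 == "Evicted" {print $1}' \
    | xargs -r oc delete pod -n default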
Problem still occurs on registry.reg-aws.openshift.com:443/openshift3/logging-fluentd:v3.9.0-0.22.0.0:

REPOSITORY                                                      TAG               IMAGE ID       CREATED      SIZE
registry.reg-aws.openshift.com:443/openshift3/logging-fluentd   v3.9.0-0.22.0.0   35b4c7263b16   2 days ago   275.5 MB
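To pin down exactly which build that image is, the labels baked in at build time can be read back (label names assumed from the usual Red Hat image metadata):

  # Print the version-release labels of the pulled image.
  docker inspect \
    --format '{{index .Config.Labels "version"}}-{{index .Config.Labels "release"}}' \
    registry.reg-aws.openshift.com:443/openshift3/logging-fluentd:v3.9.0-0.22.0.0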
I don't see where [1] is in the latest puddles [2], which is the only way this issue will be resolved. Can you help us out? The gem [1] should be available in the 3.6->3.9 puddles.

[1] https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=646322
[2] http://download-node-02.eng.bos.redhat.com/rcm-guest/puddles/RHAOS/AtomicOpenShift/3.9/latest/x86_64/os/Packages/
(In reply to Jeff Cantrill from comment #4)
> I don't see where [1] is in the latest puddles [2], which is the only way
> this issue will be resolved. Can you help us out? The gem [1] should be
> available in the 3.6->3.9 puddles.
>
> [1] https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=646322
> [2] http://download-node-02.eng.bos.redhat.com/rcm-guest/puddles/RHAOS/AtomicOpenShift/3.9/latest/x86_64/os/Packages/

1.0.1 was tagged into 3.9, 3.8, 3.7, and 3.6, and those puddles were rebuilt. You should be good to go for rebuilding the fluentd images for those releases.
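One way to confirm the rebuilt gem actually landed in a puddle before triggering an image rebuild is to scrape the package listing (the exact rpm name is behind the brew link above, so the grep pattern here is only a guess):

  # List fluentd plugin rpms published in the 3.9 puddle.
  curl -s http://download-node-02.eng.bos.redhat.com/rcm-guest/puddles/RHAOS/AtomicOpenShift/3.9/latest/x86_64/os/Packages/ \
    | grep -oiE 'rubygem-fluent-plugin[^"<]*' \
    | sort -u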
The fix isn't in the logging-fluentd image v3.9.0-0.23.0.0.
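A quick way to tell whether a given image picked up the fix is to list the plugin gems it ships and compare against the brew build from comment 4 (gem names here are illustrative, not confirmed):

  # Run a throwaway container and list its installed fluentd plugin gems.
  docker run --rm --entrypoint gem \
    registry.reg-aws.openshift.com:443/openshift3/logging-fluentd:v3.9.0-0.23.0.0 \
    list | grep -i fluent-plugin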
https://github.com/openshift/origin-aggregated-logging/pull/898
Verified on 3.9.0-0.31.0. logging-fluentd is working normally in this puddle.
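For reference, this verification can be reproduced by checking that project indices are being created again (label selector and secret mount paths assumed from the 3.x logging deployment):

  # Query Elasticsearch from inside its own pod using the admin client certs.
  es=$(oc get pods -n logging -l component=es -o jsonpath='{.items[0].metadata.name}')
  oc exec -n logging "$es" -- curl -s -k \
    --cert /etc/elasticsearch/secret/admin-cert \
    --key /etc/elasticsearch/secret/admin-key \
    https://localhost:9200/_cat/indices | grep 'project\.'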
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0489