Bug 1512028

Summary: operations logs lost if no ops cluster and no @OUTPUT label
Product: OpenShift Container Platform Reporter: Rich Megginson <rmeggins>
Component: Logging Assignee: Rich Megginson <rmeggins>
Status: CLOSED ERRATA QA Contact: Anping Li <anli>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 3.6.1 CC: anli, aos-bugs, jcantril, nhosoi, pportant, pweil, rmeggins, tkatarki
Target Milestone: ---   
Target Release: 3.6.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: When logging is upgraded to a new version, the deployment does not use a separate OPS logging cluster, and fluent.conf has no @OUTPUT label, fluentd does not know how to route the operations logs.
Consequence: The operations logs are not stored. Users see no new operations (.operations.*) logs after the upgrade.
Fix: The fluentd configuration now correctly handles the cases with and without the @OUTPUT label, for both OPS and non-OPS deployments.
Result: Operations logs flow uninterrupted after the upgrade.
Story Points: ---
Clone Of: 1511719 Environment:
Last Closed: 2017-12-07 06:49:06 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1511719    
Bug Blocks:    

Description Rich Megginson 2017-11-10 17:21:52 UTC
+++ This bug was initially created as a clone of Bug #1511719 +++

Description of problem:
Deploy logging with no ops cluster, based on 3.6 or an earlier 3.7 pre-release that does not have the @OUTPUT label in fluent.conf.

Update to a recent fluentd image.

No operations logs are stored.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

--- Additional comment from Rich Megginson on 2017-11-10 12:20:45 EST ---

https://github.com/openshift/origin-aggregated-logging/commit/61d1196eeab4ac02b360b20f9338d3ab2b5d5eef

Comment 1 Rich Megginson 2017-11-11 00:34:28 UTC
koji_builds:
  https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=623860
repositories:
  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:rhaos-3.6-rhel-7-docker-candidate-56826-20171111002326
  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:latest
  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:v3.6.173.0.73
  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:v3.6.173.0.73-5
  brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/logging-fluentd:v3.6

Comment 3 Anping Li 2017-11-14 07:56:28 UTC
I can reproduce this by deploying logging:3.6.173.0.49 using openshift-ansible-3.6.173.0.5.

After updating to 3.6.173.0.63, the issue no longer occurs.

logging-kibana/images/3.6.173.0.63-10
logging-elasticsearch/images/3.6.173.0.63-10
logging-auth-proxy/images/3.6.173.0.63-10
logging-fluentd/images/3.6.173.0.63-10
logging-curator/images/3.6.173.0.63-10

oc exec -c elasticsearch logging-es-data-master-foodc7mt-2-3bw56 -- curl -s -XGET --cacert /etc/elasticsearch/secret/admin-ca --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key https://localhost:9200/.operations*/_count
{"count":421,"_shards":{"total":1,"successful":1,"failed":0}}
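
The check above can be automated. The following is a minimal sketch, assuming the JSON body returned by the _count request has already been captured as a string (for example from the oc exec command above). The helper names operations_count and logs_are_flowing are hypothetical and are not part of any OpenShift or Elasticsearch tooling.

```python
import json


def operations_count(response_body: str) -> int:
    """Return the document count from an Elasticsearch _count response body.

    Hypothetical helper: raises if any shard reported a failure, since the
    count would then be unreliable.
    """
    payload = json.loads(response_body)
    failed = payload.get("_shards", {}).get("failed", 0)
    if failed > 0:
        raise RuntimeError(f"{failed} shard(s) failed; count is unreliable")
    return int(payload["count"])


def logs_are_flowing(before: str, after: str) -> bool:
    """Return True when the count grew between two samples taken some time apart."""
    return operations_count(after) > operations_count(before)


# Response body copied from the comment above.
sample = '{"count":421,"_shards":{"total":1,"successful":1,"failed":0}}'
print(operations_count(sample))
```

A single non-zero count only shows that some operations logs exist. Taking two samples a few minutes apart, one before and one after the upgrade, and passing them to logs_are_flowing is closer to confirming that new .operations.* records keep arriving.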

Comment 6 errata-xmlrpc 2017-12-07 06:49:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3390