Bug 1481364

Summary: [online-int] fluentd cannot read docker json-file logs
Product: OpenShift Online
Reporter: Mike Fiedler <mifiedle>
Component: Logging
Assignee: Stefanie Forrester <dakini>
Status: CLOSED CURRENTRELEASE
QA Contact: Mike Fiedler <mifiedle>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.x
CC: abhgupta, aos-bugs, dakini, juzhao, mifiedle, rmeggins, xtian
Target Milestone: ---
Keywords: OnlinePro, TestBlocker
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-11-09 18:49:19 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1478821
Bug Blocks:

Description Mike Fiedler 2017-08-14 18:11:53 UTC
Description of problem:

In online-int, the fluentd logs are full of the following warning:

2017-08-14 18:07:07 +0000 [warn]: /var/log/containers/centos-logtest-v7504_logtest0_centos-logtest-303a533752b67c4999c650d3efa3f85d7bb7463c57de004a14251b895b525c68.log unreadable. It is excluded and would be examined next time.
2017-08-14 18:07:07 +0000 [warn]: /var/log/containers/centos-logtest-k2800_logtest0_centos-logtest-93ba027e20c38b149f136bb444d47f5b20445639189ce519dcf1cdc69666b150.log unreadable. It is excluded and would be examined next time.
2017-08-14 18:08:07 +0000 [warn]: /var/log/containers/logging-fluentd-jt352_logging_fluentd-elasticsearch-ebf5743bb0260e0d44af8d05471b2952df1b4c4dfea1de2a6d0664f89ebb4214.log unreadable. It is excluded and would be examined next time.


Version-Release number of selected component (if applicable): v3.6.171

registry.reg-aws.openshift.com:443/online/logging-fluentd           v3.6.171            95b0d3814f38        2 weeks ago         231.7 MB


How reproducible:  Always


Steps to Reproduce:
1. Run oc logs <fluentd pod> in online-int


Actual results:

A warning is logged every minute stating that the JSON log for each running pod is unreadable.
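
A minimal triage sketch, not taken from this bug: fluentd's tail input logs this message when a matched path fails its readability check. It can therefore help to pull the affected paths out of the pod log and test each one from inside the fluentd container. The helper names unreadable_paths and check_path are hypothetical. One possible cause, an assumption that is not confirmed here, is that the /var/log/containers symlinks point to a location that is not visible inside the pod.

```shell
#!/bin/sh
# Hypothetical helpers for triaging fluentd "unreadable" warnings.

# Read fluentd log text on stdin and print each distinct path
# that was reported as unreadable.
unreadable_paths() {
    sed -n 's/^.*\[warn]: \(.*\) unreadable\. It is excluded.*$/\1/p' | sort -u
}

# Classify one path: dangling symlink, missing, not readable, or readable.
# The result reflects the user and mount namespace the check runs in.
check_path() {
    p=$1
    if [ -L "$p" ] && [ ! -e "$p" ]; then
        echo "$p: dangling symlink"
    elif [ ! -e "$p" ]; then
        echo "$p: missing"
    elif [ ! -r "$p" ]; then
        echo "$p: not readable"
    else
        echo "$p: readable"
    fi
}
```

For example, oc logs <fluentd pod> | unreadable_paths lists the affected files, and calling check_path on one of them in a shell inside the fluentd container shows whether the path is missing, dangling, or unreadable there.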

Comment 2 Mike Fiedler 2017-08-14 18:46:48 UTC
No pod logs are reaching Elasticsearch in this environment; only the .operations indices and internal indices exist:

[root@online-int-master-05114 ~]# oc exec $POD -- curl --connect-timeout 2 -s -k --cert /etc/elasticsearch/secret/admin-cert --key /etc/elasticsearch/secret/admin-key https://logging-es:9200/_cat/indices?v
health status index                                        pri rep docs.count docs.deleted store.size pri.store.size 
green  open   .operations.2017.08.13                         1   0     746874            0      162mb          162mb 
green  open   .searchguard.logging-es-data-master-2hh0ejvf   1   1          5            0     56.1kb           28kb 
green  open   .operations.2017.08.14                         1   0     599386            0    130.6mb        130.6mb 
green  open   .operations.2017.08.11                         1   0     750537            0    162.5mb        162.5mb 
green  open   .operations.2017.08.12                         1   0     745054            0    161.5mb        161.5mb 
green  open   .kibana                                        1   0          1            0      3.1kb          3.1kb 
green  open   .searchguard.logging-es-data-master-a2v15tbo   1   1          5            0     50.8kb           28kb 
green  open   .operations.2017.08.10                         1   0     726725            0    207.1mb        207.1mb
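
The listing above can also be checked mechanically. This is a small sketch that assumes indices whose names start with a dot (.operations.*, .searchguard.*, .kibana) are operations or internal indices, and that any index holding pod logs would have a name without a leading dot. non_system_indices is a hypothetical helper name.

```shell
#!/bin/sh
# Read `_cat/indices?v` output on stdin (header line included) and
# print the names of indices that do not start with a dot.
# Assumes the index name is the third column, as in the listing above.
non_system_indices() {
    awk 'NR > 1 && $3 !~ /^\./ { print $3 }'
}
```

Piping the curl output from this comment into non_system_indices prints nothing, which matches the observation that only operations and internal indices are present.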

Comment 3 Mike Fiedler 2017-08-17 15:37:01 UTC
This blocks logging testing for Online. @abhgupta @jcantril: IMO this is a configuration issue. Is Logging the right component?

Comment 4 Rich Megginson 2017-08-22 17:41:04 UTC
Are there any AVCs (SELinux denials) in /var/log/audit/audit.log?
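
One way to answer this is sketched below, on the assumption that the audit log uses the usual one-record-per-line format in which AVC records begin with type=AVC. avc_denials is a hypothetical helper, and the optional filter argument is only an illustration.

```shell
#!/bin/sh
# Print SELinux AVC records from an audit log.
# $1: audit log path (defaults to /var/log/audit/audit.log)
# $2: optional extended regex to narrow the results, e.g. a process name
avc_denials() {
    log=${1:-/var/log/audit/audit.log}
    pattern=${2:-.}
    grep 'type=AVC' "$log" | grep -E "$pattern"
}
```

Running avc_denials with no arguments on the node prints every AVC record in the default log. A second argument such as fluentd narrows the output to records mentioning that string.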

Comment 5 Mike Fiedler 2017-08-22 18:24:01 UTC
Audit and pod logs are attached. From a quick look, I see no related AVCs.

Comment 9 Junqi Zhao 2017-08-24 02:30:15 UTC
The same issue occurs in online-stg. It is the same issue as
https://bugzilla.redhat.com/show_bug.cgi?id=1481934

Comment 10 Abhishek Gupta 2017-08-25 19:34:49 UTC
Moving this bug to ON_QA since https://bugzilla.redhat.com/show_bug.cgi?id=1481934 was fixed/verified.

Comment 11 Mike Fiedler 2017-09-07 11:44:26 UTC
Verified that fluentd is indexing pod logs correctly in the online-int environment:
v3.6.173.0.7 (online version 3.6.0.14)