Description of problem:
It seems the logging stack does not support CRI-O. There are mainly three issues with CRI-O:

1) Container logs are written to /var/log/containers/. For the docker daemon the format is JSON (json-file driver), but CRI-O uses an rsyslog-like plain-text format. For example:

2017-11-26T20:29:53.149616443-05:00 stderr I1127 01:29:53.149602 1 round_trippers.go:436] POST https://172.30.0.1:443/apis/authorization.k8s.io/v1beta1/subjectaccessreviews 201 Created in 0 milliseconds
2017-11-26T20:29:53.149668334-05:00 stderr I1127 01:29:53.149661 1 round_trippers.go:442] Response Headers:
2017-11-26T20:29:53.149694943-05:00 stderr I1127 01:29:53.149688 1 round_trippers.go:445] Cache-Control: no-store
2017-11-26T20:29:53.149722042-05:00 stderr I1127 01:29:53.149710 1 round_trippers.go:445] Content-Type: application/json
2017-11-26T20:29:53.149749783-05:00 stderr I1127 01:29:53.149743 1 round_trippers.go:445] Content-Length: 538
2017-11-26T20:29:53.149770970-05:00 stderr I1127 01:29:53.149765 1 round_trippers.go:445] Date: Mon, 27 Nov 2017 01:29:53 GMT
2017-11-26T20:29:53.149804101-05:00 stderr I1127 01:29:53.149796 1 request.go:836] Response Body: {"kind":"SubjectAccessReview","apiVersion":"authorization.k8s.io/v1beta1","metadata":{"creationTimestamp":null},"spec":{"resourceAttributes":{"verb":"update","group":"servicecatalog.k8s.io","version":"v1beta1","resource":"clusterserviceclasses","name":"5247e02c-d30c-11e7-aaad-fa163e4d160c"},"user":"system:serviceaccount:kube-service-catalog:service-catalog-controller","group":["system:serviceaccounts","system:serviceaccounts:kube-service-catalog","system:authenticated"]},"status":{"allowed":true,"reason":"allowed by cluster rule"}}

2) Kibana fails to connect to Elasticsearch; it reports "Unable to connect to Elasticsearch at https://localhost:9200."

3) Curator was restarted many times; I think it cannot connect to Elasticsearch either.

Version-Release number of selected component (if applicable):
openshift-ansible-3.7.9-1.git.4.d445616.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. Install OCP 3.7 with CRI-O:
   openshift_use_crio=true
   openshift_crio_systemcontainer_image_override=registry.access.xxx.redhat.com/openshift3/cri-o:v3.7
2. Deploy logging.
3. Check fluentd, Elasticsearch and Kibana.

Actual results:
1) fluentd prints noisy messages
2) fluentd uses the docker configuration file
3) container logs cannot be collected
4) Kibana cannot connect to Elasticsearch
5) Curator is restarted many times

Expected results:
Both system and container logs can be collected.

Additional info:
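For comparison (illustrative, not captured from this cluster): with the docker json-file driver, fluentd's default configuration expects one JSON object per line under /var/log/containers/, roughly like this, whereas CRI-O writes the plain "<timestamp> <stream> <message>" lines shown above:

    {"log":"I1127 01:29:53.149661 1 round_trippers.go:442] Response Headers:\n","stream":"stderr","time":"2017-11-27T01:29:53.149668334Z"}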
I am able to reproduce this and did a bit more troubleshooting. There appear to be two issues, as correctly observed by QE:

1) Environment variables from kubernetes do not override the image defaults
- the default "ES_HOST=localhost" is set as an environment variable in the image [1]
- the kubernetes DC overrides the value [2]
- the override is not correctly propagated to the environment of the container
When I removed the default "ES_HOST" from the image, kubernetes was able to set the env variable correctly through the DC.

2) The default logging format is 'text' and the cri-o system container does not make it configurable
- https://www.mankier.com/8/crio#--log-format allows setting the log format to 'json'
- but it appears to apply only to the 'crio' daemon logs, not to the container logs

I think both are potentially issues with cri-o rather than with logging alone, as they go beyond logging.

[1] https://github.com/openshift/origin-aggregated-logging/blob/master/curator/Dockerfile.centos7#L8
[2] https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_logging_curator/templates/curator.j2#L68-L69
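For issue 1), this is roughly the shape of the two competing definitions (a simplified sketch; the exact lines are in the linked Dockerfile and curator.j2 template, and the hostname below is only a placeholder). The DC value should win, but under cri-o the container still sees the image default:

    # Image default (simplified from [1]):
    ENV ES_HOST=localhost

    # Override rendered into the curator DeploymentConfig (simplified from [2]):
    env:
    - name: ES_HOST
      value: "logging-es"   # placeholder; the template fills in the real Elasticsearch host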
By grepping the cri-o code in RHEL 7.4, it does not seem to pass the --log-format flag down to runc. The only code touching log-format is:

    switch c.GlobalString("log-format") {
    case "text":
        // retain logrus's default.
    case "json":
        logrus.SetFormatter(new(logrus.JSONFormatter))
    default:
        return fmt.Errorf("unknown log-format %q", c.GlobalString("log-format"))
    }

In other words, the flag only switches the formatter of the crio daemon's own logrus logger; the per-container log files are not affected.
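To illustrate the distinction (illustrative output, not captured from this cluster): with --log-format json only the daemon's own log lines change shape, while the files under /var/log/containers/ keep the plain "<timestamp> <stream> <message>" layout shown in the description.

    # crio daemon log, --log-format text (logrus default formatter, illustrative):
    time="2017-11-27T01:29:53Z" level=info msg="Got pod network ..."

    # crio daemon log, --log-format json (logrus JSONFormatter, illustrative):
    {"level":"info","msg":"Got pod network ...","time":"2017-11-27T01:29:53Z"}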
There is a way to make our fluentd pipeline parse cri-o logs. A workaround, until the cri-o container logs respect the --log-format command line option, is described in:
https://trello.com/c/ktGIxQGf/585-5-online-crio-fluentd-understands-the-cri-log-format-loggingepic-crio
https://github.com/kubernetes/kubernetes/pull/54777
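As a rough illustration of what that parsing looks like (a minimal sketch assuming a plain fluentd in_tail source; the path, pos_file and tag are placeholders, the real implementation is in the origin-aggregated-logging PR referenced later in this bug, and the optional P/F field is the partial/full tag from the CRI log format in kubernetes#54777):

    <source>
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/es-containers.log.pos
      tag kubernetes.*
      # matches "<timestamp> <stream> [P|F] <message>" as written by CRI-O
      format /^(?<time>.+) (?<stream>stdout|stderr)( (?<logtag>[FP]))? (?<log>.*)$/
      time_format %Y-%m-%dT%H:%M:%S.%N%:z
    </source>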
The cri-o team is investigating the non-propagation of env variables: https://github.com/kubernetes-incubator/cri-o/issues/1293
Ref changes from upstream: https://github.com/kubernetes/kubernetes/commit/70a0cdfa8e05ac47d7dd04b032ceb79bead3fb5f
Support for the cri-o log format in fluentd:
https://github.com/openshift/origin-aggregated-logging/pull/949
https://github.com/openshift/openshift-ansible/pull/7102
CRI-O works with logging-fluentd/images/v3.9.1.
Moved to VERIFIED. The cri-o container logs can be collected, so logging is no longer a blocker for our test/release.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0489