Description of problem: A number of clients currently deploying OpenShift environments are requesting integration with Splunk as their log aggregation mechanism, because they have already made an enterprise-wide decision to use Splunk. Using EFK therefore represents a fragmentation of tooling, and a potential barrier to adoption if we can't integrate. From an implementation point of view, a templated solution (in much the same manner as the current analytics stacks) would be ideal; however, KB articles describing the approach would suffice in most cases.
Added to backlog: https://trello.com/c/j9BAcanp
Can the customer just install splunk on the nodes and tell splunk to read from /var/log/containers/*.log?
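Something along the lines of this inputs.conf stanza on each node's Splunk forwarder is what I have in mind - a sketch only, I have not tried it myself, and the index/sourcetype names are made-up examples:

# inputs.conf sketch on the node (hypothetical index and sourcetype names)
[monitor:///var/log/containers/*.log]
index = openshift-containers
sourcetype = docker:json
disabled = false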
(In reply to Rich Megginson from comment #5)
> Can the customer just install splunk on the nodes and tell splunk to read
> from /var/log/containers/*.log?

Is this something we know works, at least to some extent? I am not super familiar with how splunk works and while I want to offer them this alternative, I want to make sure we've seen this work before I offer something that won't do anything!
This works; the problem is that without the k8s tagging that happens later in the stack that OSE provides, the data you get is not very useful. You have no idea what is coming from where and what is tied to what. So you get data... it's just not very useful data in a large cluster.
(In reply to Boris Kurktchiev from comment #9)
> This works; the problem is that without the k8s tagging that happens later
> in the stack that OSE provides, the data you get is not very useful. You
> have no idea what is coming from where and what is tied to what. So you get
> data... it's just not very useful data in a large cluster.

Can https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter be ported to splunk? That would give the splunk collector the ability to annotate the logs with all of the k8s metadata.

We were planning to support the fluentd secure_forward output plugin in openshift 3.4 - can splunk accept input from this?
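For reference, the sending side of secure_forward would look roughly like the sketch below; the hostname, port, shared_key and CA path are placeholders, not values we ship:

# Sketch of a fluent-plugin-secure-forward output (fluentd 0.12 syntax).
# client.example.com, receiver.example.com, 24284, the shared_key and the
# CA path are all placeholder values.
<match **>
  @type secure_forward
  self_hostname client.example.com
  shared_key our_shared_key
  secure yes
  ca_cert_path /etc/fluent/keys/ca.crt
  <server>
    host receiver.example.com
    port 24284
  </server>
</match>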
From my reading, the generally accepted/best way to do this is to have FluentD forward to a syslog server, which splunk is very happy to work with. All the FluentD plugins I have done reading on seem to be either half-baked or do not truly provide all that you need. Again, from my reading, the easiest way to get this to work with most outside collectors would be FluentD->tag->external syslog, at least at the moment.
(In reply to Boris Kurktchiev from comment #12)
> From my reading, the generally accepted/best way to do this is to have
> FluentD forward to a syslog server, which splunk is very happy to work with.
> All the FluentD plugins I have done reading on seem to be either half-baked
> or do not truly provide all that you need. Again, from my reading, the
> easiest way to get this to work with most outside collectors would be
> FluentD->tag->external syslog, at least at the moment.

Which fluentd output plugin is that? Would we use http://docs.fluentd.org/articles/out_forward or http://docs.fluentd.org/articles/out_secure_forward ?
Sorry, I was looking at this list http://www.fluentd.org/plugins with splunk-specific plugins.
(In reply to Boris Kurktchiev from comment #14)
> Sorry, I was looking at this list http://www.fluentd.org/plugins with
> splunk-specific plugins.

That is also an option, but we don't have any experience with them.

You said "the generally accepted/best way to do this is to have FluentD forward to a syslog server which splunk is very happy to work with" - how do we configure FluentD to do that? Using http://docs.fluentd.org/articles/out_forward or http://docs.fluentd.org/articles/out_secure_forward ?
I am trying to dig up the splunk KB article I ended up finding... a while back. I will post it once I manage to get to it again.
Ah, here is what I was reading (note that I now remember what my thought process was: I assumed that the data would come from elastic, as that's the closest I managed to get to seeing it with fully k8s-tagged data):
https://answers.splunk.com/answers/233105/forward-data-from-logstash-forwarder-to-splunk-ind.html

and the article in the same KB:
http://www.georgestarcher.com/splunk-success-with-syslog/

Either way, I am just brainstorming, as our ISO office has a requirement that we push the data to our Splunk cluster :)
(In reply to Boris Kurktchiev from comment #17)
> Ah, here is what I was reading (note that I now remember what my thought
> process was: I assumed that the data would come from elastic, as that's the
> closest I managed to get to seeing it with fully k8s-tagged data):
> https://answers.splunk.com/answers/233105/forward-data-from-logstash-forwarder-to-splunk-ind.html
>
> and the article in the same KB:
> http://www.georgestarcher.com/splunk-success-with-syslog/
>
> Either way, I am just brainstorming, as our ISO office has a requirement
> that we push the data to our Splunk cluster :)

Do you require all of the kubernetes metadata from the container logs?

When using docker with --log-driver=json-file (the default), the container logs are written to log files whose names match the following format:

/var/log/containers/(?<pod_name>[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace>[^_]+)_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\.log$

https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/blob/master/lib/fluent/plugin/filter_kubernetes_metadata.rb#L37

That will get you the pod_name, the namespace name, the docker container_name, and the docker container_id. If you don't need the pod_uuid and the namespace_uuid, then you can get all the information you need with the splunk collector.

If docker is configured to use --log-driver=journald, and if splunk can use the systemd journal as the log input, you can get the same sort of metadata for container logs, which will have a field CONTAINER_NAME in the following format:

^k8s_(?<container_name>[^\.]+)\.[^_]+_(?<pod_name>[^_]+)_(?<namespace>[^_]+)_[^_]+_[a-f0-9]{8}$

That will get you the docker container name, pod name, and namespace name. The docker container id will be in the field CONTAINER_ID_FULL. For example, from the output of journalctl -o export:

_SOURCE_REALTIME_TIMESTAMP=1470684673007128
__REALTIME_TIMESTAMP=1470684673007128
_BOOT_ID=0937011437e44850b3cb5a615345b50f
...
CONTAINER_NAME=k8s_this-is-container-01.deadbeef_this-is-pod-01_this-is-project-01_253be207-0d87-4f25-bf84-50e1e7f0c081_abcdef01
CONTAINER_ID_FULL=4355a46b19d348dc2f57c046f8ef63d4538ebb936000f3c9ee954a27460dd865
CONTAINER_ID=4355a46b19d3

https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/blob/master/lib/fluent/plugin/filter_kubernetes_metadata.rb#L55

Is this enough k8s metadata, or do you require the pod uuid and namespace/project uuid?
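If splunk can apply regex field extractions against the source path, something like this props.conf sketch might cover the json-file case - I have not verified it end to end, and the sourcetype name is just an example:

# props.conf sketch: recover k8s fields from the /var/log/containers file name.
# "docker:json" is a hypothetical sourcetype assigned by the monitor input.
[docker:json]
EXTRACT-k8s_meta = (?<pod_name>[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace>[^_]+)_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\.log$ in source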
Those fields may not be required but are definitely needed at some point, as the whole thing started from an auditability standpoint. How would that affect developer access to their own logs, though? The "have cake and eat it too" scenario would be: developers get Kibana, while system/ISO get splunk, all at the same time.

Also, as it stands right now I don't think I have touched the log-driver options, so I don't even know what it fires up with out of the box.
(In reply to Boris Kurktchiev from comment #19)
> Those fields may not be required but are definitely needed at some point, as
> the whole thing started from an auditability standpoint. How would that
> affect developer access to their own logs, though? The "have cake and eat it
> too" scenario would be: developers get Kibana, while system/ISO get splunk,
> all at the same time.

You want to have the system (e.g. /var/log/messages) logs go to splunk only, and have the container/application logs go both to splunk, and to the OpenShift aggregated Elasticsearch/Kibana?

> Also, as it stands right now I don't think I have touched the log-driver
> options, so I don't even know what it fires up with out of the box.

It uses json-file by default.
Essentially yes. I will point out that the other big thing the data in elastic has over raw docker is the automatic ability to filter/field the data. That is still achievable with what you are suggesting; it will just require work on the splunk end to get it to do the field extractions, etc.
(In reply to Boris Kurktchiev from comment #21)
> Essentially yes. I will point out that the other big thing the data in
> elastic has over raw docker is the automatic ability to filter/field the
> data. That is still achievable with what you are suggesting; it will just
> require work on the splunk end to get it to do the field extractions, etc.

Right - you'll have to implement the fluentd filter configuration in splunk. I don't know how hard that will be to do in splunk. We rely on the ability to execute arbitrary ruby code in the fluentd configuration to properly format the data for elasticsearch. For example:
https://github.com/openshift/origin-aggregated-logging/blob/v1.3.0-alpha.3/fluentd/configs.d/openshift/filter-syslog-record-transform.conf#L7

If we are going to have the OpenShift fluentd send/copy data to splunk, we need to know what protocol to use, and what format to use. For example, if we can output data from fluentd to splunk using the rfc5424 (the new "syslog") protocol, we'll also need to convert the data from the elasticsearch json output format to the rfc5424 format. Maybe there is a way to have the fluentd output plugin just shove the json blob into the CEE field.
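As an untested sketch of that last idea, a record_transformer filter could serialize the whole record into an @cee:-prefixed message before it reaches whatever syslog output we pick; whether the output plugin then forwards the "message" field as-is is an assumption on my part:

# Untested sketch: wrap the record in a CEE-style JSON payload.
# enable_ruby is needed for the to_json call; downstream handling of the
# "message" field depends on the syslog output plugin chosen.
<filter **>
  @type record_transformer
  enable_ruby true
  <record>
    message @cee: ${record.to_json}
  </record>
</filter>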
Yeah, so that's why I think the "best" solution here is to have fluentd format the data and then send it along to a syslog server as well as to elastic, which is what I was trying to say in my previous comment. Splunk can accept syslog data like a champ (many network devices can't push to splunk directly, so an intermediate syslog server is used). So docker -> fluentd -> tag -> elastic + syslog will get me what I want, which is kibana for devs and splunk for audit and archive purposes.
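In fluentd terms I imagine that ends up being a copy output with two stores, roughly like the sketch below; the plugin names, hosts, ports and match pattern are placeholders, and fluent-plugin-remote_syslog (or something similar) would have to be installed alongside the elasticsearch plugin:

# Sketch: fan the tagged stream out to Elasticsearch and to a syslog relay
# that splunk reads from. Hosts, ports and the match pattern are examples.
<match kubernetes.**>
  @type copy
  <store>
    @type elasticsearch
    host logging-es
    port 9200
  </store>
  <store>
    @type remote_syslog
    host syslog-relay.example.com
    port 514
    severity info
    program openshift
  </store>
</match>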
@Boris - Thanks - this is very useful information.
Whatever moves this along :), we are more than willing to test if it's needed
Hello all, I managed to solve it [to be further fine-tuned, but works already] by:

1] docker logging - updated on the global level in /etc/sysconfig/docker:

OPTIONS=' --selinux-enabled --insecure-registry=172.30.0.0/16 --log-driver=splunk --log-opt splunk-url=http://our_splunk:8088 --log-opt splunk-token=our_splunk_token --log-opt tag="{{.Name}};{{.ImageName}}" --log-opt splunk-index=docker'

2] openshift / system logs to Splunk:

[root@guerilla gto-splunk-rsyslog]# pwd
/root/ansible/roles/gto-splunk-rsyslog
[root@guerilla gto-splunk-rsyslog]# ll
total 0
drwxr-xr-x. 2 root root 21 Sep  7 21:51 defaults
drwxr-xr-x. 2 root root 78 Sep  7 21:11 files
drwxr-xr-x. 2 root root 21 Sep  7 21:58 tasks
drwxr-xr-x. 2 root root 21 Sep  7 21:34 vars

[root@guerilla gto-splunk-rsyslog]# cat defaults/main.yml
---
# forward all system messages to splunk rsyslog
# can be overridden from ../vars or from master playbook
splunk_string: '*.* @@our_splunk:1514'
# no need to change below, just in case defaults are changed:
rsyslog_config: /etc/rsyslog.conf
systemd_lib: /usr/lib/systemd
systemd_services: /lib/systemd/system

[root@guerilla gto-splunk-rsyslog]# cat files/enable-rsyslog-remote-log
#!/bin/bash
case $1 in
  start) setsebool -P allow_ypbind=1 ;;
  stop)  setsebool -P allow_ypbind=0 ;;
esac

[root@guerilla gto-splunk-rsyslog]# cat files/enable-rsyslog-remote-log.service
[Unit]
Description=Enable rsyslog remote logging
After=network.target
Before=rsyslog.service

[Service]
Type=oneshot
ExecStart=/usr/lib/systemd/enable-rsyslog-remote-log start
ExecStop=/usr/lib/systemd/enable-rsyslog-remote-log stop
RemainAfterExit=true
User=root

[Install]
WantedBy=multi-user.target

[root@guerilla gto-splunk-rsyslog]# cat tasks/main.yml
---
- name: copy enable-rsyslog-remote-log
  copy: src=enable-rsyslog-remote-log dest={{ systemd_lib }} owner=root group=root mode=0744

- name: copy enable-rsyslog-remote-log.service
  copy: src=enable-rsyslog-remote-log.service dest={{ systemd_services }} owner=root group=root mode=0644

- name: update "{{ rsyslog_config }}" to forward all messages to Splunk
  lineinfile: dest={{ rsyslog_config }} line={{ splunk_string }}

- name: reload systemd daemon
  shell: systemctl daemon-reload

- name: start and enable enable-rsyslog-remote-log.service
  service: name=enable-rsyslog-remote-log enabled=yes state=started

- name: restart and enable rsyslog service to load new config
  service: name=rsyslog enabled=yes state=restarted

.. so then we get system logs [including openshift, but not sure if all] in Splunk.

Q1: Can you please confirm that whatever is being logged to /var/log/messages or directly to local rsyslog contains all logs from OpenShift?
Q2: Is there any way to update the log level of OpenShift [to e.g. WARN]?
> 1] docker logging - updated on the global level in /etc/sysconfig/docker:
> OPTIONS=' --selinux-enabled --insecure-registry=172.30.0.0/16 --log-driver=splunk --log-opt splunk-url=http://our_splunk:8088 --log-opt splunk-token=our_splunk_token --log-opt tag="{{.Name}};{{.ImageName}}" --log-opt splunk-index=docker'

If you do this, does `docker logs` work? What about `oc logs`? What about viewing container logs in the OpenShift console?

> Q1: Can you please confirm that whatever is being logged to /var/log/messages
> or directly to local rsyslog contains all logs from OpenShift?

Container logs? No. Everything else? Yes - everything else goes through rsyslog, which it looks as though you are gathering, before it gets to /var/log/messages.
Reassigning to Eric as he has provided the changes. The PR to provide a mechanism for users to forward logs to alternate aggregation mechanisms: https://github.com/openshift/origin-aggregated-logging/pull/245
I am confused by the 3.1 version tag; does that mean you guys are going to release the integration before OCP 3.4?
dashing my hopes in an instant :)
Tested the fluentd secure forward function on the latest OCP 3.4.0, passed verification.

Image tested with (ops registry):
openshift3/logging-auth-proxy      e96b37a99960
openshift3/logging-kibana          27f978fc2946
openshift3/logging-fluentd         c493f8b4553b
openshift3/logging-elasticsearch   3ca95a8a9433
openshift3/logging-curator         e39988877cd9
openshift3/logging-deployer        1033ccb0557b

openshift version:
openshift v3.4.0.38
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:0066
FYI: our company developed a solution for monitoring OpenShift clusters in Splunk. This solution forwards logs (from containers, OpenShift components and hosts) and sends metrics (CPU, memory, IO, etc.). https://www.outcoldsolutions.com/#monitoring-openshift