Bug 1327101 - [RFE] Provide KB / Tooling to integrate OpenShift Logging into Splunk rather than the supplied EFK stack
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RFE
Version: 3.1.0
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: ewolinet
QA Contact: Xia Zhao
URL:
Whiteboard:
Depends On:
Blocks: 1510276
 
Reported: 2016-04-14 09:45 UTC by Benjamin Holmes
Modified: 2021-08-30 13:02 UTC
CC: 20 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1510276 (view as bug list)
Environment:
Last Closed: 2017-01-18 12:39:57 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 2849321 0 None None None 2018-02-09 03:18:34 UTC
Red Hat Product Errata RHBA-2017:0066 0 normal SHIPPED_LIVE Red Hat OpenShift Container Platform 3.4 RPM Release Advisory 2017-01-18 17:23:26 UTC

Description Benjamin Holmes 2016-04-14 09:45:56 UTC
Description of problem:

A number of clients who are currently deploying OpenShift environments are requesting integration with Splunk as their log aggregation mechanism, because they have already made an enterprise-wide decision to use Splunk. Using EFK therefore represents a fragmentation of tooling, and a potential barrier to adoption if we can't integrate.

From an implementation point of view, a templated solution (in much the same manner as the current analytics stacks) would be ideal; however, KB articles explaining the approach would suffice in most cases.

Comment 1 Jeff Cantrill 2016-04-14 13:23:52 UTC
Added to backlog: https://trello.com/c/j9BAcanp

Comment 5 Rich Megginson 2016-08-09 22:29:50 UTC
Can the customer just install splunk on the nodes and tell splunk to read from /var/log/containers/*.log?
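
For illustration, a minimal Splunk forwarder inputs.conf stanza along the following lines would pick those files up; the index and sourcetype values below are placeholder assumptions, not anything shipped or recommended in this bug:

# Sketch only: monitor the container log files on each node.
# The index and sourcetype are illustrative placeholders.
[monitor:///var/log/containers/*.log]
disabled = false
index = openshift_containers
sourcetype = _json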

Comment 8 Steven Walter 2016-08-10 13:45:56 UTC
(In reply to Rich Megginson from comment #5)
> Can the customer just install splunk on the nodes and tell splunk to read
> from /var/log/containers/*.log?

Is this something we know works, at least to some extent? I am not super familiar with how splunk works and while I want to offer them this alternative, I want to make sure we've seen this work before I offer something that won't do anything!

Comment 9 Boris Kurktchiev 2016-08-10 13:52:12 UTC
This works; the problem is that without the k8s tagging that OSE applies later in the stack, the data you get is not very useful. You have no idea what is coming from where or what is tied to what. So you get data... it's just not very useful data in a large cluster.

Comment 11 Rich Megginson 2016-08-10 14:00:30 UTC
(In reply to Boris Kurktchiev from comment #9)
> This works; the problem is that without the k8s tagging that OSE applies
> later in the stack, the data you get is not very useful. You have no idea
> what is coming from where or what is tied to what. So you get data... it's
> just not very useful data in a large cluster.

Can https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter be ported to splunk?  That would give the splunk collector the ability to annotate the logs with all of the k8s metadata.

We were planning to support the fluentd secure_forward output plugin in openshift 3.4 - can splunk accept input from this?
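
As a point of reference, a secure_forward output is configured with a <match> block roughly like the sketch below; the hostname, shared key, certificate path, and port are placeholders rather than anything we ship:

# Sketch of a fluent-plugin-secure-forward output (illustrative values only).
<match **>
  @type secure_forward
  self_hostname fluentd-client.example.com
  shared_key example_shared_key
  secure true
  ca_cert_path /etc/fluent/keys/ca.crt
  <server>
    host receiver.example.com
    port 24284
  </server>
</match>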

Comment 12 Boris Kurktchiev 2016-08-10 14:05:53 UTC
From my reading, the generally accepted/best way to do this is to have FluentD forward to a syslog server, which Splunk is very happy to work with. All the FluentD plugins I have read up on seem to be either half-baked or do not truly provide everything you need. Again, from my reading, the easiest way to get this working with most outside collectors would be FluentD -> tag -> external syslog, at least at the moment.

Comment 13 Rich Megginson 2016-08-10 14:21:18 UTC
(In reply to Boris Kurktchiev from comment #12)
> From my reading, the generally accepted/best way to do this is to have
> FluentD forward to a syslog server, which Splunk is very happy to work
> with. All the FluentD plugins I have read up on seem to be either
> half-baked or do not truly provide everything you need. Again, from my
> reading, the easiest way to get this working with most outside collectors
> would be FluentD -> tag -> external syslog, at least at the moment.

Which fluentd output plugin is that?  Would we use http://docs.fluentd.org/articles/out_forward or http://docs.fluentd.org/articles/out_secure_forward ?

Comment 14 Boris Kurktchiev 2016-08-10 14:28:33 UTC
Sorry, I was looking at this list, http://www.fluentd.org/plugins, which has Splunk-specific plugins.

Comment 15 Rich Megginson 2016-08-10 14:31:07 UTC
(In reply to Boris Kurktchiev from comment #14)
> Sorry, I was looking at this list, http://www.fluentd.org/plugins, which
> has Splunk-specific plugins.

That is also an option, but we don't have any experience with them.

You said "the generally accepted/best way to do this is to have FluentD forward to a syslog server which splunk is very happy to work with" - how do we configure FluentD to do that?  Using http://docs.fluentd.org/articles/out_forward or http://docs.fluentd.org/articles/out_secure_forward ?

Comment 16 Boris Kurktchiev 2016-08-10 14:37:07 UTC
I am trying to dig up the splunk KB article I ended up finding... a while back. I will post it once I manage to get to it again.

Comment 17 Boris Kurktchiev 2016-08-10 14:45:57 UTC
Ah, here is what I was reading (note that I now remember what my thought process was: I assumed that the data would come from elastic, as that's the closest I managed to get to seeing it with fully k8s-tagged data):
https://answers.splunk.com/answers/233105/forward-data-from-logstash-forwarder-to-splunk-ind.html

and the article in the same KB http://www.georgestarcher.com/splunk-success-with-syslog/

Either way, I am just brainstorming, as our ISO office has a requirement that we push the data to our Splunk cluster :)

Comment 18 Rich Megginson 2016-08-10 19:00:56 UTC
(In reply to Boris Kurktchiev from comment #17)
> Ah, here is what I was reading (note that I now remember what my thought
> process was: I assumed that the data would come from elastic, as that's
> the closest I managed to get to seeing it with fully k8s-tagged data):
> https://answers.splunk.com/answers/233105/forward-data-from-logstash-
> forwarder-to-splunk-ind.html
> 
> and the article in the same KB
> http://www.georgestarcher.com/splunk-success-with-syslog/
> 
> Either way, I am just brainstorming, as our ISO office has a requirement
> that we push the data to our Splunk cluster :)

Do you require all of the kubernetes metadata from the container logs?  When using docker with --log-driver=json-file (the default), the container logs are written to files whose names match the following pattern:

/var/log/containers/(?<pod_name>[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace>[^_]+)_(?<container_name>.+)-(?<docker_id>[a-z0-9]{64})\.log$

https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/blob/master/lib/fluent/plugin/filter_kubernetes_metadata.rb#L37

That will get you the pod_name, the namespace name, the docker container_name, and the docker container_id.  If you don't need the pod_uuid and the namespace_uuid, then you can get all the information you need with the splunk collector.
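
As a made-up illustration of how that pattern decomposes a file name (all names below are placeholders):

/var/log/containers/my-app-1-x2x9p_myproject_my-app-<64-hex-docker-id>.log
    pod_name       = my-app-1-x2x9p
    namespace      = myproject
    container_name = my-app
    docker_id      = <64-hex-docker-id>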

If docker is configured to use --log-driver=journald, and if splunk can use the systemd journal as the log input, you can get the same sort of metadata for container logs, which will have a field CONTAINER_NAME in the following format:

^k8s_(?<container_name>[^\.]+)\.[^_]+_(?<pod_name>[^_]+)_(?<namespace>[^_]+)_[^_]+_[a-f0-9]{8}$

That will get you the docker container name, pod name, and namespace name.  The docker container id will be in the field CONTAINER_ID_FULL.  For example, from the output of journalctl -o export:

_SOURCE_REALTIME_TIMESTAMP=1470684673007128
__REALTIME_TIMESTAMP=1470684673007128
_BOOT_ID=0937011437e44850b3cb5a615345b50f
...
CONTAINER_NAME=k8s_this-is-container-01.deadbeef_this-is-pod-01_this-is-project-01_253be207-0d87-4f25-bf84-50e1e7f0c081_abcdef01
CONTAINER_ID_FULL=4355a46b19d348dc2f57c046f8ef63d4538ebb936000f3c9ee954a27460dd865
CONTAINER_ID=4355a46b19d3

https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/blob/master/lib/fluent/plugin/filter_kubernetes_metadata.rb#L55

Is this enough k8s metadata, or do you require the pod uuid and namespace/project uuid?

Comment 19 Boris Kurktchiev 2016-08-10 19:19:00 UTC
Those fields may not be required now, but they will definitely be needed at some point, as the whole thing started from an auditability standpoint. How would that affect developer access to their own logs, though? The "have cake and eat it too" scenario would be: developers get Kibana, while system/ISO get Splunk, all at the same time.

Also, as it stands right now I don't think I have touched the log-driver options, so I don't even know what it fires up with out of the box.

Comment 20 Rich Megginson 2016-08-10 20:15:50 UTC
(In reply to Boris Kurktchiev from comment #19)
> Those fields may not be required now, but they will definitely be needed
> at some point, as the whole thing started from an auditability standpoint.
> How would that affect developer access to their own logs, though? The
> "have cake and eat it too" scenario would be: developers get Kibana, while
> system/ISO get Splunk, all at the same time.

You want to have the system logs (e.g. /var/log/messages) go to Splunk only, and have the container/application logs go both to Splunk and to the OpenShift aggregated Elasticsearch/Kibana?

> 
> Also, as it stands right now I don't think I have touched the log-driver
> options, so I don't even know what it fires up with out of the box.

It uses json-file by default.

Comment 21 Boris Kurktchiev 2016-08-10 20:22:29 UTC
Essentially, yes. I will point out that the other big thing the data in elastic has over raw docker is the automatic ability to filter/field the data. It is still achievable with what you are suggesting; it will just require work on the Splunk end to get it to do the field extractions, etc.

Comment 22 Rich Megginson 2016-08-10 20:38:17 UTC
(In reply to Boris Kurktchiev from comment #21)
> Essentially, yes. I will point out that the other big thing the data in
> elastic has over raw docker is the automatic ability to filter/field the
> data. It is still achievable with what you are suggesting; it will just
> require work on the Splunk end to get it to do the field extractions, etc.

Right - you'll have to implement the equivalent of the fluentd filter configuration in Splunk, and I don't know how hard that will be.  We rely on the ability to execute arbitrary ruby code in the fluentd configuration to properly format the data for elasticsearch.  For example: https://github.com/openshift/origin-aggregated-logging/blob/v1.3.0-alpha.3/fluentd/configs.d/openshift/filter-syslog-record-transform.conf#L7

If we are going to have the OpenShift fluentd send/copy data to splunk, we need to know what protocol to use, and what format to use.  For example, if we can output data from fluentd to splunk using the rfc5424 (the new "syslog") protocol, we'll also need to convert the data from the elasticsearch json output format to the rfc5424 format.  Maybe there is a way to have the fluentd output plugin just shove the json blob into the CEE field.

Comment 23 Boris Kurktchiev 2016-08-10 20:42:58 UTC
Yeah, so I think the "best" solution here is to have fluentd format the data and then send it along to a syslog server in addition to elastic, which is what I was trying to say in my previous comment. Splunk can accept syslog data like a champ (most network devices can't push to Splunk directly, so an intermediate syslog server is used). So docker -> fluentd -> tag -> elastic + syslog will get me what I want: Kibana for devs and Splunk for audit and archive purposes.
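
A rough sketch of that fan-out in fluentd terms, using the copy output to write the same tagged records to both Elasticsearch and a remote syslog host that Splunk indexes; hosts, ports, and the match pattern are illustrative placeholders, and fluent-plugin-remote_syslog is a third-party plugin rather than part of the OpenShift stack:

# Illustrative only: duplicate each container record to Elasticsearch
# (for Kibana) and to a syslog relay in front of Splunk.
<match kubernetes.**>
  @type copy
  <store>
    @type elasticsearch
    host logging-es.example.svc
    port 9200
  </store>
  <store>
    @type remote_syslog
    host syslog-relay.example.com
    port 514
    severity info
    program openshift-fluentd
  </store>
</match>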

Comment 24 Rich Megginson 2016-08-10 20:53:41 UTC
@Boris - Thanks - this is very useful information.

Comment 25 Boris Kurktchiev 2016-08-10 21:00:36 UTC
Whatever moves this along :), we are more than willing to test if it's needed

Comment 31 Ondrej Kunc 2016-09-08 09:23:09 UTC
Hello all,

I managed to solve it [to be further fine-tuned, but it already works] by:

1] docker logging - updated at the global level in /etc/sysconfig/docker:

OPTIONS=' --selinux-enabled --insecure-registry=172.30.0.0/16 --log-driver=splunk --log-opt splunk-url=http://our_splunk:8088 --log-opt splunk-token=our_splunk_token --log-opt tag="{{.Name}};{{.ImageName}}" --log-opt splunk-index=docker'

2] openshift / system logs to Splunk:

[root@guerilla gto-splunk-rsyslog]# pwd
/root/ansible/roles/gto-splunk-rsyslog
[root@guerilla gto-splunk-rsyslog]# ll
total 0
drwxr-xr-x. 2 root root 21 Sep  7 21:51 defaults
drwxr-xr-x. 2 root root 78 Sep  7 21:11 files
drwxr-xr-x. 2 root root 21 Sep  7 21:58 tasks
drwxr-xr-x. 2 root root 21 Sep  7 21:34 vars
[root@guerilla gto-splunk-rsyslog]# cat defaults/main.yml 
---
# forward all system messages to splunk rsyslog
# can be overridden from ../vars or from master playbook
splunk_string: '*.* @@our_splunk:1514'

# no need to change below, just in case defaults are changed:
rsyslog_config: /etc/rsyslog.conf
systemd_lib: /usr/lib/systemd
systemd_services: /lib/systemd/system
[root@guerilla gto-splunk-rsyslog]# cat files/enable-rsyslog-remote-log
#!/bin/bash

case $1 in
	start) setsebool -P allow_ypbind=1 ;;
	stop) setsebool -P allow_ypbind=0 ;;
esac
[root@guerilla gto-splunk-rsyslog]# cat files/enable-rsyslog-remote-log.service 
[Unit]
Description=Enable rsyslog remote logging
After=network.target
Before=rsyslog.service

[Service]
Type=oneshot
ExecStart=/usr/lib/systemd/enable-rsyslog-remote-log start
ExecStop=/usr/lib/systemd/enable-rsyslog-remote-log stop
RemainAfterExit=true
User=root

[Install]
WantedBy=multi-user.target
[root@guerilla gto-splunk-rsyslog]# cat tasks/main.yml 
---
- name: copy enable-rsyslog-remote-log
  copy: src=enable-rsyslog-remote-log dest={{ systemd_lib }} owner=root group=root mode=0744

- name: copy enable-rsyslog-remote-log.service
  copy: src=enable-rsyslog-remote-log.service dest={{ systemd_services }} owner=root group=root mode=0644

- name: update "{{ rsyslog_config }}" to forward all messages to Splunk
  lineinfile: dest={{ rsyslog_config }} line={{ splunk_string }}

- name: reload systemd daemon
  shell: systemctl daemon-reload

- name: start and enable enable-rsyslog-remote-log.service
  service: name=enable-rsyslog-remote-log enabled=yes state=started

- name: restart and enable rsyslog service to load new config
  service: name=rsyslog enabled=yes state=restarted

.. so then we get system logs [including openshift, but not sure if all] in Splunk.

Q1: Can you please confirm whether whatever is logged to /var/log/messages, or directly to the local rsyslog, contains all the logs from OpenShift?
Q2: Is there any way to change the log level of OpenShift [to e.g. WARN]?

Comment 32 Rich Megginson 2016-09-08 15:29:40 UTC
> 1] docker logging - updated on global level in /etc/sysconfig/docker:

> OPTIONS=' --selinux-enabled --insecure-registry=172.30.0.0/16 --log-driver=splunk --log-opt splunk-url=http://our_splunk:8088 --log-opt splunk-token=our_splunk_token --log-opt tag="{{.Name}};{{.ImageName}}" --log-opt splunk-index=docker'

If you do this, does `docker logs` work?  What about `oc logs`?  What about viewing container logs in the OpenShift console?

> Q1: Can you please confirm whether whatever is logged to /var/log/messages, or directly to the local rsyslog, contains all the logs from OpenShift?

Container logs?  No.  Everything else?  Yes - everything else goes through rsyslog, which it looks as though you are gathering, before it gets to /var/log/messages.

Comment 33 Jeff Cantrill 2016-09-26 12:52:47 UTC
Reassigning to Eric as he has provided the changes.  The PR to provide a mechanism for users to forward logs to alternate aggregation mechanisms: https://github.com/openshift/origin-aggregated-logging/pull/245

Comment 41 Boris Kurktchiev 2016-10-28 13:47:18 UTC
I am confused by the 3.1 version tag; does that mean you guys are going to release the integration before OCP 3.4?

Comment 42 Boris Kurktchiev 2016-10-28 14:08:54 UTC
dashing my hopes in an instant :)

Comment 46 Xia Zhao 2016-12-26 06:18:32 UTC
Tested the fluentd secure_forward function on the latest OCP 3.4.0; it passed verification.

Image tested with (ops registry):
openshift3/logging-auth-proxy    e96b37a99960
openshift3/logging-kibana    27f978fc2946
openshift3/logging-fluentd    c493f8b4553b
openshift3/logging-elasticsearch    3ca95a8a9433
openshift3/logging-curator    e39988877cd9
openshift3/logging-deployer    1033ccb0557b

openshift version:
openshift v3.4.0.38
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

Comment 48 errata-xmlrpc 2017-01-18 12:39:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0066

Comment 49 Denis Gladkikh 2018-01-06 05:58:09 UTC
FYI: our company has developed a solution for monitoring OpenShift clusters in Splunk. It forwards logs (from containers, OpenShift components, and hosts) and sends metrics (CPU, memory, IO, etc.): https://www.outcoldsolutions.com/#monitoring-openshift

