Description of problem:
After successful deployment of engine metrics, nothing is being sent from fluentd. Tried with a custom metrics store as well as the ViaQ setup. I am not sure which logs you would like to see, as neither these nor syslog shows any hint of an error. I have the environment and can provide anything you need.

Version-Release number of selected component (if applicable):
ovirt-engine-metrics-1.0.4.3-1.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create config.yml with your metrics store
2. Run /usr/share/ovirt-engine-metrics/setup/ansible/configure_ovirt_machines_for_metrics.sh
3. Check that fluentd and collectd are running successfully

Actual results:
No incoming packets to the metrics store:

# tcpdump -n dst port 24284
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes

Additional info:
[root@ls-engine1 ~]# date && systemctl status collectd fluentd
Fri Jun 23 12:36:50 CEST 2017
● collectd.service - Collectd statistics daemon
   Loaded: loaded (/usr/lib/systemd/system/collectd.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/collectd.service.d
           └─postgresql.conf
   Active: active (running) since Fri 2017-06-23 11:15:11 CEST; 1h 21min ago
     Docs: man:collectd(1)
           man:collectd.conf(5)
 Main PID: 709 (collectd)
   CGroup: /system.slice/collectd.service
           └─709 /usr/sbin/collectd

Jun 23 11:15:06 ls-engine1.example.com collectd[709]: plugin_load: plugin "swap" successfully loaded.
Jun 23 11:15:06 ls-engine1.example.com collectd[709]: plugin_load: plugin "df" successfully loaded.
Jun 23 11:15:06 ls-engine1.example.com collectd[709]: plugin_load: plugin "aggregation" successfully loaded.
Jun 23 11:15:06 ls-engine1.example.com collectd[709]: plugin_load: plugin "processes" successfully loaded.
Jun 23 11:15:06 ls-engine1.example.com collectd[709]: plugin_load: plugin "postgresql" successfully loaded.
Jun 23 11:15:06 ls-engine1.example.com collectd[709]: plugin_load: plugin "write_http" successfully loaded.
Jun 23 11:15:11 ls-engine1.example.com collectd[709]: Systemd detected, trying to signal readyness.
Jun 23 11:15:11 ls-engine1.example.com systemd[1]: Started Collectd statistics daemon.
Jun 23 11:15:11 ls-engine1.example.com collectd[709]: Initialization complete, entering read-loop.
Jun 23 11:15:11 ls-engine1.example.com collectd[709]: Successfully connected to database engine (user engine) at server localhost:5432 (server version: 9.2.18, protocol version: 3, pid: 739)

● fluentd.service - Fluentd
   Loaded: loaded (/usr/lib/systemd/system/fluentd.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2017-06-23 11:14:58 CEST; 1h 21min ago
     Docs: http://www.fluentd.org/
 Main PID: 649 (fluentd)
   CGroup: /system.slice/fluentd.service
           ├─649 /usr/bin/ruby /usr/bin/fluentd -c /etc/fluentd/fluent.conf
           └─710 /usr/bin/ruby /usr/bin/fluentd -c /etc/fluentd/fluent.conf

Jun 23 11:15:09 ls-engine1.example.com fluentd[649]: </match>
Jun 23 11:15:09 ls-engine1.example.com fluentd[649]: </ROOT>
Jun 23 11:15:09 ls-engine1.example.com fluentd[649]: 2017-06-23 11:15:09 +0200 [debug]: listening http on localhost:9880
Jun 23 11:15:09 ls-engine1.example.com fluentd[649]: 2017-06-23 11:15:09 +0200 [info]: following tail of /var/log/ovirt-engine/engine.log
Jun 23 11:15:14 ls-engine1.example.com fluentd[649]: 2017-06-23 11:15:14 +0200 [warn]: dead connection found: lsvaty-vm1.example.com, reconnecting...
Jun 23 11:15:14 ls-engine1.example.com fluentd[649]: 2017-06-23 11:15:14 +0200 fluent.warn: {"message":"dead connection found: lsvaty-vm1.example.com, reconnecting..."}
Jun 23 11:15:14 ls-engine1.example.com fluentd[649]: 2017-06-23 11:15:14 +0200 [info]: connection established to lsvaty-vm1.example.com
Jun 23 11:15:14 ls-engine1.example.com fluentd[649]: 2017-06-23 11:15:14 +0200 fluent.info: {"message":"connection established to lsvaty-vm1.example.com"}
Jun 23 11:15:19 ls-engine1.example.com fluentd[649]: 2017-06-23 11:15:19 +0200 [warn]: recovered connection to dead node: lsvaty-vm1.example.com
Jun 23 11:15:19 ls-engine1.example.com fluentd[649]: 2017-06-23 11:15:19 +0200 fluent.warn: {"message":"recovered connection to dead node: lsvaty-vm1.example.com"}
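Since both collectd and fluentd report as running while nothing arrives at the metrics store, a few connectivity checks from the engine side can narrow down where records stop. This is only a sketch: the hostname lsvaty-vm1.example.com and port 24284 are taken from the logs above, and the unit name assumes the fluentd service shown here; adjust for your environment.

# getent hosts lsvaty-vm1.example.com
# nc -vz lsvaty-vm1.example.com 24284
# tcpdump -n -i eth0 dst port 24284
# journalctl -u fluentd --since "1 hour ago" | grep -iE "warn|error|buffer"

If name resolution or the TCP connect fails, the forwarding output cannot deliver and records simply accumulate in the local fluentd buffer, which would match the "dead connection found" warnings above.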
This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
These errors occur when fluentd fails to connect to the remote fluentd. I don't believe this is a blocker for the other bug. Rich should check why the remote fluentd is reporting these errors: "2017-06-26 11:15:48 +0200 [warn]: emit transaction failed: error_class=Fluent::BufferQueueLimitError error=\"queue size exceeds limit\" tag=\"project.ovirt-metrics-lsvaty_test-ovirt\""
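For context, Fluent::BufferQueueLimitError is raised when an output plugin's buffer queue on that fluentd reaches its buffer_queue_limit, which typically means its own downstream (Elasticsearch in the ViaQ case) is unreachable or not keeping up, so new emits start being rejected. A rough way to confirm this on the remote fluentd, as a sketch only; the config and buffer paths below are assumptions, use whatever that node's fluentd configuration actually points at:

# journalctl -u fluentd -n 200 | grep -iE "buffer|retry|error"
# grep -riE "buffer_queue_limit|buffer_chunk_limit|buffer_path" /etc/fluentd/
# ls -l /var/lib/fluentd/

A growing pile of buffer chunks on disk plus repeated retry warnings would point at the downstream store rather than at the engine-side sender.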
So in the end there were two errors, at least that I am aware of:
1. Misconfiguration of the ViaQ setup: not all hostnames were resolvable from all the machines.
2. Misconfiguration of the non-ViaQ setup: fluentd was not able to establish a connection due to an outdated certificate.
Due to these, closing this issue.
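For anyone hitting the same symptoms, both root causes can be verified up front. This is a sketch: the certificate path is a placeholder, and the s_client check only applies if the collector terminates TLS on that port.

# getent hosts ls-engine1.example.com lsvaty-vm1.example.com
# openssl x509 -noout -dates -in <path-to-the-fluentd-certificate>
# echo | openssl s_client -connect lsvaty-vm1.example.com:24284 2>/dev/null | openssl x509 -noout -dates

Name resolution has to work on every machine involved, and the certificate used for the fluentd connection must not be expired.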