1475795 – [RFE] Parameterize fluentd Verbosity logs Level

Bug 1475795 - [RFE] Parameterize fluentd Verbosity logs Level

Summary: [RFE] Parameterize fluentd Verbosity logs Level

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	ovirt-engine-metrics
Classification:	oVirt
Component:	RFEs
Sub Component:
Version:	1.0.5
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	ovirt-4.2.0
Target Release:	1.0.7.1
Assignee:	Shirly Radco
QA Contact:	Lukas Svaty
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	oVirt-Metrics-and-Logs 1475135
TreeView+	depends on / blocked

Reported:	2017-07-27 11:40 UTC by Shirly Radco
Modified:	2017-12-20 11:09 UTC (History)
CC List:	6 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2017-12-20 11:09:15 UTC
oVirt Team:	Metrics
Embargoed:
Flags:	rule-engine: ovirt-4.2? lsvaty: testing_plan_complete+ rule-engine: planning_ack? sradco: devel_ack+ lsvaty: testing_ack+

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
oVirt gerrit	81551	0	master	MERGED	fluentd: add role to set system configurations	2017-09-28 20:08:41 UTC

Description Shirly Radco 2017-07-27 11:40:36 UTC

Description of problem:
The default log level is 'info', and Fluentd outputs 'info', 'warn', 'error' and 'fatal' logs by default.
We want to decrease verbosity level to log only 'error' messages.
We will make this value as a parameter that can be updated for debugging proposes.

Comment 1 Yaniv Lavi 2017-07-31 08:54:00 UTC

Please check with the viaq team why this doesn't come out of the box with the log level you are suggesting.

Comment 2 Yaniv Lavi 2017-07-31 08:54:39 UTC

Also, add the use case to why we need to change this.

Comment 3 Shirly Radco 2017-07-31 09:38:31 UTC

Description of problem:
Currently when the client fluentd fails to connect to the remote fluentd(mux) it starts logging warning messages every retry.

Suggested solution is to decrease verbosity level to log only 'error' messages.

Comment 4 Yaniv Kaul 2017-09-05 08:32:08 UTC

Shirly - status? devel-ack?

Comment 5 Lukas Svaty 2017-09-25 10:21:52 UTC

Don't think this is an RFE. Please consider removing FutureFeature keyword.

Comment 6 Shirly Radco 2017-09-28 20:13:18 UTC

I left the default log level as info and added the log_level as an ansible parameter.

Now user can set it in the config.yml as needed.

Comment 7 Lukas Svaty 2017-10-04 16:12:40 UTC

2 issues found with this RFE that needs fixing: ovirt-engine-metrics-1.1.1-0.0.master.20171001113530.el7.centos.noarch

1. Missing documentation in /etc/ovirt-engine-metrics/config.yml.example
Please add it so we have all options mentioned here consistently.

2. This option needs data validation, at the moment you can add any value ('wrong-value') and break fluentd in all your machines.

[root@10-37-138-17 ~]# grep log_level /etc/ovirt-engine-metrics/config.yml
fluentd_log_level: wrong-value

... run configure-playbook....

[root@10-37-138-17 ~]# grep -R log_level /etc/fluentd/config.d/
/etc/fluentd/config.d/05-ovirt-fluentd-system-configurations.conf:  log_level wrong-value
/etc/fluentd/config.d/10-http-input.conf:  @log_level debug

Comment 8 Shirly Radco 2017-10-16 08:18:59 UTC

(In reply to Lukas Svaty from comment #7)
> 2 issues found with this RFE that needs fixing:
> ovirt-engine-metrics-1.1.1-0.0.master.20171001113530.el7.centos.noarch
> 
> 1. Missing documentation in /etc/ovirt-engine-metrics/config.yml.example
> Please add it so we have all options mentioned here consistently.

Added a patch with README files for all metrics roles.

> 
> 2. This option needs data validation, at the moment you can add any value
> ('wrong-value') and break fluentd in all your machines.
> 
> [root@10-37-138-17 ~]# grep log_level /etc/ovirt-engine-metrics/config.yml
> fluentd_log_level: wrong-value
> 
> ... run configure-playbook....
> 
> [root@10-37-138-17 ~]# grep -R log_level /etc/fluentd/config.d/
> /etc/fluentd/config.d/05-ovirt-fluentd-system-configurations.conf: 
> log_level wrong-value
> /etc/fluentd/config.d/10-http-input.conf:  @log_level debug

We do not validate every parameter.

I don't think this is required.

This would mean the user will see that the fluentd failed to start.

According to what I see, the log message is very clear 
"unexpected error error="Unknown log level: level = wrong-value""

Comment 9 Shirly Radco 2017-10-16 08:21:26 UTC

In my opinion, We can move this back to ON_QA and add an RFE if needed to validate the values.

But please note that there are many parameters that we don't validate and can break fluentd in all your machines.

Comment 10 Lukas Svaty 2017-10-16 08:35:22 UTC

For QA part I believe verification is required if we add a way to set these parameters, otherwise, admin can just change it in a fluentd configuration file with the same effect and outcome. Don't see a reason to check some parameters (such as env_name) and not to check the others.

If you believe this is out of scope 4.2 we can add RFE as you suggested otherwise from the QA perspective I would suggest fixing this.

Comment 11 Shirly Radco 2017-10-16 09:31:54 UTC

env_name is a parameter unrelated to fluentd.
It has much more complexity since it requires to be a valid openshift namespace.

Didi, what is your take on this?

Comment 12 Yedidyah Bar David 2017-10-18 07:25:02 UTC

Re validation:

So far, we have these in config.yml.example - and please note that you can add there many other things, these are only the documented ones:

file/host names:
================
- fluentd_fluentd_host
- local_fluentd_ca_cert_path

It will be nice to verify them, but IMO bad values should be quite obvious
to understand/debug, so not that important. Can still open future RFEs if you want.

Arbitrary data:
===============
- fluentd_shared_key

We can't do any syntax validation, as it has no syntax.

Only validation we can do is to try to connect to the remote fluentd and see if we succeed. This is not trivial, but might still be useful. So can also open RFE if you want.

- ovirt_env_name

Does have syntax, already handled. Not sure we can do more than that, but if we can, might be useful, so perhaps consider opening an RFE. Example: Might be possible to somehow connect to remote fluentd/elastic and ask them what they think about the name - is it in use (so avoid collisions), is it valid (so handle cases where our validation is wrong/not-up-to-date), etc.

enum:
=====
(Not sure this is the best title, but I think you get what I mean)

- fluentd_log_level

I agree it should be in config.yml.example. Perhaps we should consider trying to create the file automatically, no idea how. If it's possible to attach some kind  of "tag" to ansible vars, we can tag the vars that should be there, and try to write code that will loop over all vars with the tag and create a file with the associated doc comment. Might be cool as a kind of ansible-hacking-experience, not that important...

I agree it makes sense to validate it, I'd open another RFE for this. If we add it to the example file with a proper comment, I'd say such an RFE can nicely wait till 4.3.

Comment 13 Yedidyah Bar David 2017-10-18 07:28:18 UTC

I'd also like to comment that IMO setting fluentd_log_level to error is an ugly hack, not a solution.

If fluentd does not behave the way we want it to, it should be fixed. If it can't currently (didn't check) throttle warnings etc. to make the log less verbose (e.g. have a parameter skip_warnings_time or whatever that makes it not repeat the same warning for the configured time duration, or whatever), then we should open a bug on it.

Comment 14 Shirly Radco 2017-10-19 11:45:00 UTC

Please open an FRE for data validation  for this variable.

Comment 15 Lukas Svaty 2017-10-19 11:54:31 UTC

Missing documentation in config.yml.example

Tested in ovirt-engine-metrics-1.1.1-0.0.master.20171017080230.el7.centos.noarch

Comment 16 Lukas Svaty 2017-10-19 11:56:50 UTC

Validation part moved here -> BZ#1504051

Comment 17 Lukas Svaty 2017-12-01 13:10:14 UTC

verified in ovirt-engine-metrics-1.1.1-0.3.beta2.20171114114644.el7ev.noarch


[root@1-1-1-1ansible]# grep -R log_level .
./roles/ovirt_fluentd/http_input/templates/http-input.conf:  @log_level debug
./roles/ovirt_fluentd/set_fluentd_system_configurations/README.md:- `fluentd_log_level:` (default: `"info"`)
./roles/ovirt_fluentd/set_fluentd_system_configurations/README.md:    configure_ovirt_machines_for_metrics.sh -e "fluentd_log_level=debug"
./roles/ovirt_fluentd/set_fluentd_system_configurations/README.md:    fluentd_log_level: debug
./roles/ovirt_fluentd/set_fluentd_system_configurations/defaults/main.yml:fluentd_log_level: info
./roles/ovirt_fluentd/set_fluentd_system_configurations/templates/ovirt-fluentd-system-configurations.conf:  log_level {{ fluentd_log_level }}

Comment 18 Sandro Bonazzola 2017-12-20 11:09:15 UTC

This bugzilla is included in oVirt 4.2.0 release, published on Dec 20th 2017.

Since the problem described in this bug report should be
resolved in oVirt 4.2.0 release, published on Dec 20th 2017, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

Note You need to log in before you can comment on or make changes to this bug.