Bug 1475795 - [RFE] Parameterize fluentd Verbosity logs Level
[RFE] Parameterize fluentd Verbosity logs Level
Status: CLOSED CURRENTRELEASE
Product: ovirt-engine-metrics
Classification: oVirt
Component: RFEs (Show other bugs)
1.0.5
Unspecified Unspecified
medium Severity medium
: ovirt-4.2.0
: 1.0.7.1
Assigned To: Shirly Radco
Lukas Svaty
: FutureFeature
Depends On:
Blocks: oVirt-Metrics-and-Logs 1475135
  Show dependency treegraph
 
Reported: 2017-07-27 07:40 EDT by Shirly Radco
Modified: 2017-12-20 06:09 EST (History)
6 users (show)

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-12-20 06:09:15 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Metrics
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt‑4.2?
lsvaty: testing_plan_complete+
rule-engine: planning_ack?
sradco: devel_ack+
lsvaty: testing_ack+


Attachments (Terms of Use)


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 81551 master MERGED fluentd: add role to set system configurations 2017-09-28 16:08 EDT

  None (edit)
Description Shirly Radco 2017-07-27 07:40:36 EDT
Description of problem:
The default log level is 'info', and Fluentd outputs 'info', 'warn', 'error' and 'fatal' logs by default.
We want to decrease verbosity level to log only 'error' messages.
We will make this value as a parameter that can be updated for debugging proposes.
Comment 1 Yaniv Lavi 2017-07-31 04:54:00 EDT
Please check with the viaq team why this doesn't come out of the box with the log level you are suggesting.
Comment 2 Yaniv Lavi 2017-07-31 04:54:39 EDT
Also, add the use case to why we need to change this.
Comment 3 Shirly Radco 2017-07-31 05:38:31 EDT
Description of problem:
Currently when the client fluentd fails to connect to the remote fluentd(mux) it starts logging warning messages every retry.

Suggested solution is to decrease verbosity level to log only 'error' messages.
Comment 4 Yaniv Kaul 2017-09-05 04:32:08 EDT
Shirly - status? devel-ack?
Comment 5 Lukas Svaty 2017-09-25 06:21:52 EDT
Don't think this is an RFE. Please consider removing FutureFeature keyword.
Comment 6 Shirly Radco 2017-09-28 16:13:18 EDT
I left the default log level as info and added the log_level as an ansible parameter.

Now user can set it in the config.yml as needed.
Comment 7 Lukas Svaty 2017-10-04 12:12:40 EDT
2 issues found with this RFE that needs fixing: ovirt-engine-metrics-1.1.1-0.0.master.20171001113530.el7.centos.noarch

1. Missing documentation in /etc/ovirt-engine-metrics/config.yml.example
Please add it so we have all options mentioned here consistently.

2. This option needs data validation, at the moment you can add any value ('wrong-value') and break fluentd in all your machines.

[root@10-37-138-17 ~]# grep log_level /etc/ovirt-engine-metrics/config.yml
fluentd_log_level: wrong-value

... run configure-playbook....

[root@10-37-138-17 ~]# grep -R log_level /etc/fluentd/config.d/
/etc/fluentd/config.d/05-ovirt-fluentd-system-configurations.conf:  log_level wrong-value
/etc/fluentd/config.d/10-http-input.conf:  @log_level debug
Comment 8 Shirly Radco 2017-10-16 04:18:59 EDT
(In reply to Lukas Svaty from comment #7)
> 2 issues found with this RFE that needs fixing:
> ovirt-engine-metrics-1.1.1-0.0.master.20171001113530.el7.centos.noarch
> 
> 1. Missing documentation in /etc/ovirt-engine-metrics/config.yml.example
> Please add it so we have all options mentioned here consistently.

Added a patch with README files for all metrics roles.

> 
> 2. This option needs data validation, at the moment you can add any value
> ('wrong-value') and break fluentd in all your machines.
> 
> [root@10-37-138-17 ~]# grep log_level /etc/ovirt-engine-metrics/config.yml
> fluentd_log_level: wrong-value
> 
> ... run configure-playbook....
> 
> [root@10-37-138-17 ~]# grep -R log_level /etc/fluentd/config.d/
> /etc/fluentd/config.d/05-ovirt-fluentd-system-configurations.conf: 
> log_level wrong-value
> /etc/fluentd/config.d/10-http-input.conf:  @log_level debug

We do not validate every parameter.

I don't think this is required.

This would mean the user will see that the fluentd failed to start.

According to what I see, the log message is very clear 
"unexpected error error="Unknown log level: level = wrong-value""
Comment 9 Shirly Radco 2017-10-16 04:21:26 EDT
In my opinion, We can move this back to ON_QA and add an RFE if needed to validate the values.

But please note that there are many parameters that we don't validate and can break fluentd in all your machines.
Comment 10 Lukas Svaty 2017-10-16 04:35:22 EDT
For QA part I believe verification is required if we add a way to set these parameters, otherwise, admin can just change it in a fluentd configuration file with the same effect and outcome. Don't see a reason to check some parameters (such as env_name) and not to check the others.

If you believe this is out of scope 4.2 we can add RFE as you suggested otherwise from the QA perspective I would suggest fixing this.
Comment 11 Shirly Radco 2017-10-16 05:31:54 EDT
env_name is a parameter unrelated to fluentd.
It has much more complexity since it requires to be a valid openshift namespace.

Didi, what is your take on this?
Comment 12 Yedidyah Bar David 2017-10-18 03:25:02 EDT
Re validation:

So far, we have these in config.yml.example - and please note that you can add there many other things, these are only the documented ones:

file/host names:
================
- fluentd_fluentd_host
- local_fluentd_ca_cert_path

It will be nice to verify them, but IMO bad values should be quite obvious
to understand/debug, so not that important. Can still open future RFEs if you want.

Arbitrary data:
===============
- fluentd_shared_key

We can't do any syntax validation, as it has no syntax.

Only validation we can do is to try to connect to the remote fluentd and see if we succeed. This is not trivial, but might still be useful. So can also open RFE if you want.

- ovirt_env_name

Does have syntax, already handled. Not sure we can do more than that, but if we can, might be useful, so perhaps consider opening an RFE. Example: Might be possible to somehow connect to remote fluentd/elastic and ask them what they think about the name - is it in use (so avoid collisions), is it valid (so handle cases where our validation is wrong/not-up-to-date), etc.

enum:
=====
(Not sure this is the best title, but I think you get what I mean)

- fluentd_log_level

I agree it should be in config.yml.example. Perhaps we should consider trying to create the file automatically, no idea how. If it's possible to attach some kind  of "tag" to ansible vars, we can tag the vars that should be there, and try to write code that will loop over all vars with the tag and create a file with the associated doc comment. Might be cool as a kind of ansible-hacking-experience, not that important...

I agree it makes sense to validate it, I'd open another RFE for this. If we add it to the example file with a proper comment, I'd say such an RFE can nicely wait till 4.3.
Comment 13 Yedidyah Bar David 2017-10-18 03:28:18 EDT
I'd also like to comment that IMO setting fluentd_log_level to error is an ugly hack, not a solution.

If fluentd does not behave the way we want it to, it should be fixed. If it can't currently (didn't check) throttle warnings etc. to make the log less verbose (e.g. have a parameter skip_warnings_time or whatever that makes it not repeat the same warning for the configured time duration, or whatever), then we should open a bug on it.
Comment 14 Shirly Radco 2017-10-19 07:45:00 EDT
Please open an FRE for data validation  for this variable.
Comment 15 Lukas Svaty 2017-10-19 07:54:31 EDT
Missing documentation in config.yml.example

Tested in ovirt-engine-metrics-1.1.1-0.0.master.20171017080230.el7.centos.noarch
Comment 16 Lukas Svaty 2017-10-19 07:56:50 EDT
Validation part moved here -> BZ#1504051
Comment 17 Lukas Svaty 2017-12-01 08:10:14 EST
verified in ovirt-engine-metrics-1.1.1-0.3.beta2.20171114114644.el7ev.noarch


[root@1-1-1-1ansible]# grep -R log_level .
./roles/ovirt_fluentd/http_input/templates/http-input.conf:  @log_level debug
./roles/ovirt_fluentd/set_fluentd_system_configurations/README.md:- `fluentd_log_level:` (default: `"info"`)
./roles/ovirt_fluentd/set_fluentd_system_configurations/README.md:    configure_ovirt_machines_for_metrics.sh -e "fluentd_log_level=debug"
./roles/ovirt_fluentd/set_fluentd_system_configurations/README.md:    fluentd_log_level: debug
./roles/ovirt_fluentd/set_fluentd_system_configurations/defaults/main.yml:fluentd_log_level: info
./roles/ovirt_fluentd/set_fluentd_system_configurations/templates/ovirt-fluentd-system-configurations.conf:  log_level {{ fluentd_log_level }}
Comment 18 Sandro Bonazzola 2017-12-20 06:09:15 EST
This bugzilla is included in oVirt 4.2.0 release, published on Dec 20th 2017.

Since the problem described in this bug report should be
resolved in oVirt 4.2.0 release, published on Dec 20th 2017, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

Note You need to log in before you can comment on or make changes to this bug.