Bug 2214880

Summary: [NetworkManager-cloud-setup] update the default NM_CLOUD_SETUP_LOG level to INFO or call _LOGW|E when error happens
Product: Red Hat Enterprise Linux 9 Reporter: Frank Liang <xiliang>
Component: NetworkManagerAssignee: Thomas Haller <thaller>
Status: CLOSED ERRATA QA Contact: Matej Berezny <mberezny>
Severity: medium Docs Contact:
Priority: medium    
Version: 9.3CC: bgalvani, fpokryvk, libhe, linl, lrintel, qzhang, rkhan, sfaye, sukulkar, thaller, till, vbenes, wenliang, ymao
Target Milestone: rcKeywords: Triaged
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: NetworkManager-1.43.10-1.el9 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-11-07 08:38:26 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Frank Liang 2023-06-14 02:35:01 UTC
Description of problem:
From the man page[1] of nm-cloud-setup, the default NM_CLOUD_SETUP_LOG level is WARN.
But _LOGW() is not called in nm-cloud-setup and a few places called _LOGE().

I am suggesting the default NM_CLOUD_SETUP_LOG level to INFO as NetworkManager or print as warn or error when error occurs.

It is not giving much information when "no provider detected".
# journalctl -u nm-cloud-setup
Jun 14 02:16:50 ip-10-22-1-89.us-west-2.compute.internal systemd[1]: Starting Automatically configure NetworkManager in cloud...
Jun 14 02:16:51 ip-10-22-1-89.us-west-2.compute.internal systemd[1]: nm-cloud-setup.service: Deactivated successfully.
Jun 14 02:16:51 ip-10-22-1-89.us-west-2.compute.internal systemd[1]: Finished Automatically configure NetworkManager in cloud.

It becomes better after changing LOG level to INFO.
Jun 14 02:31:48 ip-10-22-1-13.us-west-2.compute.internal systemd[1]: Started Automatically configure NetworkManager in cloud.
Jun 14 02:37:41 ip-10-22-1-13.us-west-2.compute.internal systemd[1]: Starting Automatically configure NetworkManager in cloud...
Jun 14 02:37:41 ip-10-22-1-13.us-west-2.compute.internal nm-cloud-setup[2527]: [3056.09025] <info>  provider GCP not detected: provider is disabled
Jun 14 02:37:41 ip-10-22-1-13.us-west-2.compute.internal nm-cloud-setup[2527]: [3056.09026] <info>  provider azure not detected: provider is disabled
Jun 14 02:37:41 ip-10-22-1-13.us-west-2.compute.internal nm-cloud-setup[2527]: [3056.09026] <info>  provider aliyun not detected: provider is disabled
Jun 14 02:37:48 ip-10-22-1-13.us-west-2.compute.internal nm-cloud-setup[2527]: [3063.09035] <info>  provider ec2 not detected: failure to get EC2 metadata: timeout expired
Jun 14 02:37:48 ip-10-22-1-13.us-west-2.compute.internal nm-cloud-setup[2527]: [3063.09035] <info>  no provider detected
Jun 14 02:37:48 ip-10-22-1-13.us-west-2.compute.internal systemd[1]: nm-cloud-setup.service: Succeeded.
Jun 14 02:37:48 ip-10-22-1-13.us-west-2.compute.internal systemd[1]: Started Automatically configure NetworkManager in cloud.

If we do not want to print much information to journal, please print the log as warn or error while handling errors.
Here is an example:
    if (error) {
        _LOGI("provider %s not detected: %s", nmcs_provider_get_name(provider), error->message);
        goto out;
    }

[1]:NM_CLOUD_SETUP_LOG: control the logging verbosity. Set it to one of TRACE, DEBUG, INFO, WARN, ERR or OFF. The program will print message on stdout and the default level is WARN.

RHEL Version:
RHEL-9.3(5.14.0-325.el9.x86_64)

How reproducible:
100%

Steps to Reproduce:
1. Check nm-cloud-setup journal output 

Actual results:
No information to indicate what is the problem when no provider detected

Expected results:
Give useful information when the application runs into exceptions.

Additional info:
- N/A

Comment 1 Thomas Haller 2023-06-14 09:42:20 UTC
nm-cloud-setup usually runs via a timer and a dispatcher script. So it runs a *lot*, and by default it must not generate too much logging.


> But _LOGW() is not called in nm-cloud-setup and a few places called _LOGE().
> 
> I am suggesting the default NM_CLOUD_SETUP_LOG level to INFO as NetworkManager or print as warn or error when error occurs.

That nm-cloud-setup currently logs nothing with level WARN, merely means that no information so far was deemed to have that priority.

It does not mean, that the default level should change. What we currently log at INFO level, seems too verbose to enable by default.


> Actual results:
> No information to indicate what is the problem when no provider detected
> 
> Expected results:
> Give useful information when the application runs into exceptions.

Maybe we could log a warning, if any providers are enabled but none detected.

That still doesn't tell you, "what" is the problem, which would require TRACE logs.

Comment 2 Frank Liang 2023-06-14 09:59:33 UTC
(In reply to Thomas Haller from comment #1)
> nm-cloud-setup usually runs via a timer and a dispatcher script. So it runs
> a *lot*, and by default it must not generate too much logging.
> 
> 
> > But _LOGW() is not called in nm-cloud-setup and a few places called _LOGE().
> > 
> > I am suggesting the default NM_CLOUD_SETUP_LOG level to INFO as NetworkManager or print as warn or error when error occurs.
> 
> That nm-cloud-setup currently logs nothing with level WARN, merely means
> that no information so far was deemed to have that priority.
> 
> It does not mean, that the default level should change. What we currently
> log at INFO level, seems too verbose to enable by default.
> 
It make sense. Thanks
> 
> > Actual results:
> > No information to indicate what is the problem when no provider detected
> > 
> > Expected results:
> > Give useful information when the application runs into exceptions.
> 
> Maybe we could log a warning, if any providers are enabled but none detected.
Agree, with the proper warn/error message, the user will realize the service has problems.
As the example in this case, we just know the nm-cloud-service started, but it does not work actually because error hit.

> 
> That still doesn't tell you, "what" is the problem, which would require
> TRACE logs.

Yes, TRACE log is helpful for further investigation.

Comment 3 Thomas Haller 2023-06-14 11:34:42 UTC
I still think, it makes sense that the default level is WARN. While we log some messages at INFO level, they seem too verbose as a default. If the default level would change to INFO, most of those messages should be downgraded to DEBUG. That is not obviously better. I am thus hesitant to change the default level from WARN to INFO.

Logging is anyway subject to constant tweaking. There are conflicting goals of providing enough information to understand what's relevant, vs. not being too verbose. So we can any the time reason whether a log line should be added/modified/dropped (or it's severity changed).

In this particular case, it would seem that logging a warning when no provider was detected would be sufficient. That would be done by 

  https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1659

Indeed, when NM enables any providers but fails to detect any, that seems a condition worthy to warn. A user probably shouldn't enable nm-cloud-setup in such an non-cloud environment or there is an actual problem. Both cases warrants a warning.


Otherwise, I think we should discuss very specific circumstances, where individual logging lines should be tweaked.

Comment 5 Thomas Haller 2023-06-29 09:01:16 UTC
I think what's done with https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1659 is sufficient.

What we log is in general not well-defined, and there are conflicting requirements.

In this case, it seems the user was surprised why the plugin was not detected, while being enabled.
The change would log a warning in that case, so I think the problem is solved.


It's in general better to look at a very specific scenario, where the logging was confusing/incomplete, and try to address that. Instead of questioning the whole logging system (that the default is WARN instead of INFO). While we can discuss to default to INFO level instead of WARN, the current default level is chosen intentionally, and it's not clearly wrong. Meaning, I think it's right and should not change.


Moving forward to ON_QA.

Comment 9 errata-xmlrpc 2023-11-07 08:38:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (NetworkManager bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:6585