Bug 1779708

Summary:	no clear indication of profile activation error
Product:	Red Hat Enterprise Linux 8	Reporter:	Jaroslav Škarvada <jskarvad>
Component:	tuned	Assignee:	Jaroslav Škarvada <jskarvad>
Status:	CLOSED WONTFIX	QA Contact:	qe-baseos-daemons
Severity:	medium	Docs Contact:
Priority:	medium
Version:	8.2	CC:	aaron_wilk, atheurer, jeder, jmencak, jskarvad, krister, lcapitulino, lmanasko, nilal, pezhang, riehecky, tcerna
Target Milestone:	rc	Keywords:	Triaged
Target Release:	8.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:	1385838	Environment:
Last Closed:	2021-06-04 07:29:48 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1385838
Bug Blocks:	1400961, 1472751, 1485946, 1932086

Comment 2 Jaroslav Škarvada 2021-02-15 10:51:25 UTC

In the current code, there is limited support for this feature. The current D-Bus API supports 'profile_changed' signal with the signature (profile_name:string, result:boolean, error:string). The 'profile_name' is the name of the profile loaded, this supports multiple profiles, in such case the profile names are separated by spaces. The 'result' is either 'True' or 'False' meaning whether the profile/s was/were successfully loaded. The 'error' is the string describing the error. It seems 'tuned_adm' catches this signal and outputs the 'error' on the stdout (it should be changed to stderr). At the moment it catches 'InvalidProfileException' and I suspect that there are places in the plugins from which this exception cannot be thrown (this should be checked and probably fixed, but this is not important for this RFE).

At the moment the 'InvalidProfileException' is thrown by the two internal functions which can be (and actually are) used in the profile definition, there are 'assertion' and 'assertion_non_equal' functions and there are used in 'realtime*' and 'cpu-partitioning' profiles.

I propose the following:
- Store the error status in the daemon (it's currently thrown away after the D-Bus signal is sent).
- Extend the D-Bus API and add method which will report the error status back anytime it's called (currently the status is received only once through the signal). Probably the least intrusive way is to add 'active_profile2' D-Bus method which will work the same way as the 'active_profile' method, i.e. return the profile/s name/s and in addition also return the errors.
- Switch 'tuned-adm' tool to use the 'active_profile2' D-Bus method and return the error information with the 'tuned-adm active'. This could either show the status and few lines of the error (there could be some additional argument, e.g. '--all', to display all errors or we could add some new option e.g. 'profile_status').
- Fix 'tuned-adm' to report the error on 'stderr' not on 'stdout'.
- Extend the 'log.error()' function which is used in the plugins implementation to store the error string and if there is some error throw the 'InvalidProfileException' after the profile is processed.

The questions are:
- Should we extend the 'log.error()' which is used in TuneD plugins or propose new call, e.g. 'log.fail()'? 
I think the current logging API supports more than enough methods ('log.error()', 'log.info()', 'log.warn()', 'log.debug()'), so personally I wouldn't add new log method. Let's say the error is always fatal and if there are some places in the plugins we do not want to cause switch to the 'failing state', change them to the 'log.warn()'.
- Should we a) fail early when the first 'log.error()' is called or b) process the whole profile log all errors and fail afterwards?
I would go with the b) approach, it's more backward compatible and it could be useful to load the profile even if some errors occurred.

Comment 3 Nitesh Narayan Lal 2021-02-15 15:13:23 UTC

(In reply to Jaroslav Škarvada from comment #2)
> In the current code, there is limited support for this feature. The current

<snip>

> 
> I propose the following:
> - Store the error status in the daemon (it's currently thrown away after the
> D-Bus signal is sent).
> - Extend the D-Bus API and add method which will report the error status
> back anytime it's called (currently the status is received only once through
> the signal). Probably the least intrusive way is to add 'active_profile2'
> D-Bus method which will work the same way as the 'active_profile' method,
> i.e. return the profile/s name/s and in addition also return the errors.

If I am not mistaken here then we will have something like 'tuned-adm
active-debug/status' in parallel to 'tuned-adm active'?

If so, then IMHO it may cause some confusion as to which one a user should
run, yes we will have --help to read about it but someone who was already
using tuned-adm active may not care to explore this new option.

Considering my understanding of your suggestion is correct? How difficult
will it be to extend the existing tuned-adm active to report the errors
with output such as "Current active profile (with errors): xyz"?


> - Switch 'tuned-adm' tool to use the 'active_profile2' D-Bus method and
> return the error information with the 'tuned-adm active'. This could either
> show the status and few lines of the error (there could be some additional
> argument, e.g. '--all', to display all errors or we could add some new
> option e.g. 'profile_status').

I agree with having the 'profile_status' option to report the errors explicitly
as long as we have some sort of indication in the tuned-adm active that the
profile is applied with 'x' number of errors.

> - Fix 'tuned-adm' to report the error on 'stderr' not on 'stdout'.
> - Extend the 'log.error()' function which is used in the plugins
> implementation to store the error string and if there is some error throw
> the 'InvalidProfileException' after the profile is processed.
> 
> The questions are:
> - Should we extend the 'log.error()' which is used in TuneD plugins or
> propose new call, e.g. 'log.fail()'? 
> I think the current logging API supports more than enough methods
> ('log.error()', 'log.info()', 'log.warn()', 'log.debug()'), so personally I
> wouldn't add new log method. Let's say the error is always fatal and if
> there are some places in the plugins we do not want to cause switch to the
> 'failing state', change them to the 'log.warn()'.

Totally agree, instead of adding new options we should leverage the existing one
that is log.error().
For the places where the error is not fatal, we should replace log.error() with
log.warn() instead.

> - Should we a) fail early when the first 'log.error()' is called or b)
> process the whole profile log all errors and fail afterwards?
> I would go with the b) approach, it's more backward compatible and it could
> be useful to load the profile even if some errors occurred.

Although, I am not strictly against this but for example in a Real-time environment
we receive "-ENOSPC" from the kernel then we should definitely terminate and in no
way give an impression that the profile is correctly applied.

For situations where we would want to continue applying the profile even when
we are getting errors from the plugins we should not have log.error() in the first
place, we should replace those occurrences with log.warn().

Having said that, I agree that modifying these occurrences of non-fatal log.error()
with log.warn() may be difficult because there are several profiles.

Thanks

Comment 4 Jaroslav Škarvada 2021-02-15 20:47:39 UTC

(In reply to Nitesh Narayan Lal from comment #3)
> (In reply to Jaroslav Škarvada from comment #2)
> > In the current code, there is limited support for this feature. The current
> 
> <snip>
> 
> > 
> > I propose the following:
> > - Store the error status in the daemon (it's currently thrown away after the
> > D-Bus signal is sent).
> > - Extend the D-Bus API and add method which will report the error status
> > back anytime it's called (currently the status is received only once through
> > the signal). Probably the least intrusive way is to add 'active_profile2'
> > D-Bus method which will work the same way as the 'active_profile' method,
> > i.e. return the profile/s name/s and in addition also return the errors.
> 
> If I am not mistaken here then we will have something like 'tuned-adm
> active-debug/status' in parallel to 'tuned-adm active'?
>
> If so, then IMHO it may cause some confusion as to which one a user should
> run, yes we will have --help to read about it but someone who was already
> using tuned-adm active may not care to explore this new option.
> 
> Considering my understanding of your suggestion is correct? How difficult
> will it be to extend the existing tuned-adm active to report the errors
> with output such as "Current active profile (with errors): xyz"?
>
No regarding the 'tuned-adm' tool I just thought about 'tuned-adm active' to show something
like:

$ tuned-adm active
Current active profile: balanced
Error(s) when applying profile:
ERROR_LINE1
ERROR_LINE2
ERROR_LINE3
....

To show all error lines:
$ tuned-adm active --all

I didn't think about the warn/debug, just about the errors.
 
> 
> > - Switch 'tuned-adm' tool to use the 'active_profile2' D-Bus method and
> > return the error information with the 'tuned-adm active'. This could either
> > show the status and few lines of the error (there could be some additional
> > argument, e.g. '--all', to display all errors or we could add some new
> > option e.g. 'profile_status').
> 
> I agree with having the 'profile_status' option to report the errors
> explicitly
> as long as we have some sort of indication in the tuned-adm active that the
> profile is applied with 'x' number of errors.
> 
> > - Fix 'tuned-adm' to report the error on 'stderr' not on 'stdout'.
> > - Extend the 'log.error()' function which is used in the plugins
> > implementation to store the error string and if there is some error throw
> > the 'InvalidProfileException' after the profile is processed.
> > 
> > The questions are:
> > - Should we extend the 'log.error()' which is used in TuneD plugins or
> > propose new call, e.g. 'log.fail()'? 
> > I think the current logging API supports more than enough methods
> > ('log.error()', 'log.info()', 'log.warn()', 'log.debug()'), so personally I
> > wouldn't add new log method. Let's say the error is always fatal and if
> > there are some places in the plugins we do not want to cause switch to the
> > 'failing state', change them to the 'log.warn()'.
> 
> Totally agree, instead of adding new options we should leverage the existing
> one
> that is log.error().
> For the places where the error is not fatal, we should replace log.error()
> with
> log.warn() instead.
> 
> > - Should we a) fail early when the first 'log.error()' is called or b)
> > process the whole profile log all errors and fail afterwards?
> > I would go with the b) approach, it's more backward compatible and it could
> > be useful to load the profile even if some errors occurred.
> 
> Although, I am not strictly against this but for example in a Real-time
> environment
> we receive "-ENOSPC" from the kernel then we should definitely terminate and
> in no
> way give an impression that the profile is correctly applied.
> 
> For situations where we would want to continue applying the profile even when
> we are getting errors from the plugins we should not have log.error() in the
> first
> place, we should replace those occurrences with log.warn().
> 
> Having said that, I agree that modifying these occurrences of non-fatal
> log.error()
> with log.warn() may be difficult because there are several profiles.
> 
If you want to terminate profile loading you can still throw exception 'InvalidProfileException' or we could add some wrapper for it. In the most cases (taking into account the current profiles) the errors are not fatal. E.g. the plugin is providing FEATURE1, FEATURE2, FEATURE3. When e.g. the FEATURE2 fails, the rest of features can be still usable. In such case we are using 'error' - some feature failed, we are sure there will be troubles. If all features are applied, but e.g. FEATURE2 was applied with some troubles, we use 'warn' (e.g. your bootloader doesn't seem to support multiple initrds and the initrd will be probably ignored) - there can be or needn't be troubles, we cannot judge. There are places in the code where the error/warn aren't used consistently, but we are trying to fix this.

Comment 5 Jaroslav Škarvada 2021-02-15 20:54:19 UTC

It seems OCP doesn't use D-Bus. TuneD uses D-Bus for CLI/GUI/frontend <-> daemon communication. It has fallback mode to support basic functionality without D-Bus. In this degraded mode just the configuration files are used (to e.g. change the profile). More complex fallback communication method hasn't exist yet. If this functionality from the D-Bus API has to be used without the D-Bus, we will have to implement some fallback communication method (e.g. by using sockets).

Comment 6 Nitesh Narayan Lal 2021-02-15 21:21:48 UTC

(In reply to Jaroslav Škarvada from comment #4)
> (In reply to Nitesh Narayan Lal from comment #3)
> > (In reply to Jaroslav Škarvada from comment #2)

<snip>

> >
> No regarding the 'tuned-adm' tool I just thought about 'tuned-adm active' to
> show something
> like:
> 
> $ tuned-adm active
> Current active profile: balanced
> Error(s) when applying profile:
> ERROR_LINE1
> ERROR_LINE2
> ERROR_LINE3
> ....
> 
> To show all error lines:
> $ tuned-adm active --all
> 
> I didn't think about the warn/debug, just about the errors.
>  

Ok, that makes sense to me.

> > 
> > > - Switch 'tuned-adm' tool to use the 'active_profile2' D-Bus method and

<snip>

> > 
> > Having said that, I agree that modifying these occurrences of non-fatal
> > log.error()
> > with log.warn() may be difficult because there are several profiles.
> > 
> If you want to terminate profile loading you can still throw exception
> 'InvalidProfileException' or we could add some wrapper for it. 

That's true, this need to be applied around the plugin itself?
So that if the plugin fails an exception is thrown.

> In the most
> cases (taking into account the current profiles) the errors are not fatal.
> E.g. the plugin is providing FEATURE1, FEATURE2, FEATURE3. When e.g. the
> FEATURE2 fails, the rest of features can be still usable. In such case we
> are using 'error' - some feature failed, we are sure there will be troubles.
> If all features are applied, but e.g. FEATURE2 was applied with some
> troubles, we use 'warn' (e.g. your bootloader doesn't seem to support
> multiple initrds and the initrd will be probably ignored) - there can be or
> needn't be troubles, we cannot judge. There are places in the code where the
> error/warn aren't used consistently, but we are trying to fix this.

I agree where the errors are not-fatal we should be fine in continuing the
profile application.

But atleast for the RT profiles, there are certain errors that can be
considered fatal and in those cases, it is better to rollback the changes and
notify the user to fix those issues and re-try.

I think once we fix all these places where log.error() should not be used
it should be ok to make a decision to rollback and terminate based on it,
am I wrong?

Comment 9 RHEL Program Management 2021-06-04 07:29:48 UTC

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.