Bug 1576029

Summary: [RFE] Limit Started/Finished notify messages for engine-backup --scope=all
Product: [oVirt] ovirt-engine Reporter: Shon Stephens <sstephen>
Component: Backup-Restore.EngineAssignee: Yedidyah Bar David <didi>
Status: CLOSED DEFERRED QA Contact: Lucie Leistnerova <lleistne>
Severity: medium Docs Contact:
Priority: low    
Version: futureCC: bugs, lleistne, mperina, sstephen
Target Milestone: ---Keywords: FutureFeature, Reopened
Target Release: ---Flags: pm-rhel: ovirt-4.4?
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-11-20 11:17:03 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Integration RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Shon Stephens 2018-05-08 15:31:40 UTC
Proprosed title of this Feature Request: Reduce notify_engine() messages for --scope=all

Currently when --scope=all is set the engine-backup script will generate a separate call to do_notify() for files, db, dwdb, reports db. The end result is up to 8 separate notifications for a single configured backup. One each for "Backup Starting" and "Backup Finishing" if all stages are successful.

Some users find this behavior unexpected and do not want the notifications for every stage. A small change is suggested to allow --scope=all to only send a single "Backup Starting" and "Backup Finishing" notification for all stages, but also allow "Backup Failed" messages for any failed stages.

I suggest a small change to engine-backup.sh

--- engine-backup.sh	2018-03-27 17:39:43.065669775 +0000
+++ new-engine-backup.sh	2018-03-27 18:19:30.874669775 +0000
@@ -387,6 +387,7 @@
 MODE=
 DEFAULT_SCOPE=all
 SCOPE=
+SCOPE_ALL=
 SCOPE_FILES=
 SCOPE_ENGINE_DB=
 SCOPE_DWH_DB=
@@ -508,6 +509,7 @@
 			SCOPE_ENGINE_DB=1
 			SCOPE_DWH_DB=1
 			SCOPE_REPORTS_DB=1
+			SCOPE_ALL=1
 			;;
 		files)
 			SCOPE_FILES=1
@@ -1881,10 +1883,11 @@
 	}
 
 	output "Notifying engine"
-	[ -n "${SCOPE_FILES}" ] && do_notify 'files'
-	[ -n "${SCOPE_ENGINE_DB}" ] && do_notify 'db'
-	[ -n "${SCOPE_DWH_DB}" ] && do_notify 'dwhdb'
-	[ -n "${SCOPE_REPORTS_DB}" ] && do_notify 'reportsdb'
+        [ -n "${SCOPE_ALL}" ] && do_notify 'all'
+	[ -n "${SCOPE_FILES}" ] && [ -z "${SCOPE_ALL}" ] && do_notify 'files'
+	[ -n "${SCOPE_ENGINE_DB}" ] && [ -z "${SCOPE_ALL}" ] && do_notify 'db'
+	[ -n "${SCOPE_DWH_DB}" ] && [ -z "${SCOPE_ALL}" ] && do_notify 'dwhdb'
+	[ -n "${SCOPE_REPORTS_DB}" ] && [ -z "${SCOPE_ALL}" ] && do_notify 'reportsdb'
 
 	unset -f do_notify
 }

Comment 1 Yedidyah Bar David 2018-05-09 06:50:29 UTC
I agree this makes sense, but:

1. If you look at the log and see 'all', it's not clear what was actually included.

2. We have in the engine some logic that parses that table, check [1]. We'll need to adapt it as well.

So it might make sense to keep the existing code in engine-backup, and just make the UI nicer/more compact - e.g. if scope included 'db' and 'files', show just a single line. Actually I thought we'll do something similar also originally, that's why the log content is so terse - it was meant for machine parsing, not directly showing to humans. See also bug 1365808.

BTW, Reports is not included since 4.0 (also see bug 1351622), so you can get max 6 lines.

[1] backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/EngineBackupAwarenessManager.java

Comment 2 Shon Stephens 2018-05-15 14:09:52 UTC
I thought I'd clarify that these aren't being seen in the logs by customers. While the engine_notify() is "logging" the message, if Email Event Notifier is configured, then 6 emails show up. Clearly that's more visible than logs that aren't intended for direct parsing.

-Shon

Comment 3 Yedidyah Bar David 2018-05-16 07:13:19 UTC
Thanks for the clarification. Suppose that we ignore code complexity and similar issues, what would the ideal solution be, for you?

Perhaps something like this:

1. There should be means for the engine to know which notifications belong to a specific run of 'engine-backup'. This can be done by making relevant tables in the engine database include a column like 'backup_id' or 'backup_run_id', which will include a randomly-generated unique id created by engine-backup per run.

2. Notifications from engine-backup to the engine, that are currently forwarded mostly as-is to the user, should be optional.

3. The engine should monitor runs of engine-backup. If it sees that a run started and finished, after up to some configurable amount of time - say 'max_expected_backup_time', it should send to the user a single notification, including what was backed up.

4. A failure, even partial, should have a different format, to allow easy filtering on the user side. In the current implementation, a partial failure is very unlikely, though, and the reason for the different messages was so that the engine can know the scope of backup.

5. A successful backup that did not include at least both 'files' and 'db', should have a different format as well.

6. Only a successful backup that backed up both 'files' and 'db' should send a "normal" "backup finished successfully" message.

7. If backup did not finish after 'max_expected_backup_time', the engine should notify the user as well. It should continue monitoring this run and continue notifying the user periodically, until either the run eventually finishes, or we give up - after a different certain amount of time, or the user manually told the engine to give up monitoring this run, or until a next run started (with the same scope). Each of these cases should send a different notification, again to allow easy filtering.

Comment 4 Yaniv Lavi 2018-06-12 11:25:15 UTC
Please reopen if you can provide the needed info.
For now closing.

Comment 5 Shon Stephens 2018-07-20 15:14:28 UTC
Dear Yaniv,
     I hope I am reopening this correctly.

While I agree with the concept you've sketched out for monitoring and notifying of backups, it's not clear to me how that would change SNMP event notifications of backup events. 

I think that feature-wise, these are good ideas, but might be outside the scope of what would meet the clients needs. Specifically just not to receive notifications for every stage of the backup when "all" is specified as the type.
These notifications are being sent by ovirt-engine-notifier based on these SNMP event types

ENGINE_BACKUP_COMPLETED	9025	Engine backup completed successfully.
ENGINE_BACKUP_FAILED	9026	Engine backup failed.
ENGINE_BACKUP_STARTED	9024	Engine backup started.

So if the engine-backup is using "--scope=all" it sends SNMP notifications (log messages?? not clear to me the chain of messaging) but these get picked up by Event Notifier and send email. My simple change modified the bash to not send event notifier for every stage if --scope=all

I realize I explained this before, but wanted to re-empasize that I think its a simple fix. However, you have more experience and knowledge here and I'd like to defer to your wisdom, with the general guidance that we just want to suppress multiple messaging for --scope=all backups

Thanks,
Shon

Comment 6 Yedidyah Bar David 2018-07-22 11:27:39 UTC
(In reply to Shon Stephens from comment #5)
> Dear Yaniv,
>      I hope I am reopening this correctly.

(Moving to NEW for now)

> 
> While I agree with the concept you've sketched out

You refer to comment 3? It was me, not Yaniv.

> for monitoring and
> notifying of backups, it's not clear to me how that would change SNMP event
> notifications of backup events. 

I think it would do what you asked for, but perhaps I didn't fully understand. 

> 
> I think that feature-wise, these are good ideas, but might be outside the
> scope of what would meet the clients needs. Specifically just not to receive
> notifications for every stage of the backup when "all" is specified as the
> type.

Understood.

> These notifications are being sent by ovirt-engine-notifier based on these
> SNMP event types
> 
> ENGINE_BACKUP_COMPLETED	9025	Engine backup completed successfully.
> ENGINE_BACKUP_FAILED	9026	Engine backup failed.
> ENGINE_BACKUP_STARTED	9024	Engine backup started.
> 
> So if the engine-backup is using "--scope=all" it sends SNMP notifications
> (log messages?? not clear to me the chain of messaging) but these get picked
> up by Event Notifier and send email. My simple change modified the bash to
> not send event notifier for every stage if --scope=all
> 
> I realize I explained this before, but wanted to re-empasize that I think
> its a simple fix.

Ok, and I explained in comment 1 why we can't use it as-is.

> However, you have more experience and knowledge here and
> I'd like to defer to your wisdom, with the general guidance that we just
> want to suppress multiple messaging for --scope=all backups

I do not think engine-backup should be changed much. I sketched in comment 3 how the notifications from engine-backup to the engine can be filtered out/summarized/etc before being sent further on.

If you do not want to do this in the engine, or think it will require too much work, you can do this filtering elsewhere. A simple example: You can send notifications from the engine only via email, and have your mail server run some script for you that analyzes the input and sends snmp traps as needed.

In any case, engine-backup should IMO be simple, provide whatever relevant information that's needed, and not be too smart about how this information is used. Other components that use this information can be smart as needed, and if they need more information from engine-backup in order to be smart, we can add that as needed.

Comment 7 Shon Stephens 2018-07-23 13:49:04 UTC
(In reply to Yedidyah Bar David from comment #6)
> (In reply to Shon Stephens from comment #5)
> > Dear Yaniv,
> >      I hope I am reopening this correctly.
> 
> (Moving to NEW for now)
> 
> > 
> > While I agree with the concept you've sketched out
> 
> You refer to comment 3? It was me, not Yaniv.

My apologies Yedidyah, I realized that after posting that Yaniv has just closed and commented.

> 
> > for monitoring and
> > notifying of backups, it's not clear to me how that would change SNMP event
> > notifications of backup events. 
> 
> I think it would do what you asked for, but perhaps I didn't fully
> understand. 
> 
> > 
> > I think that feature-wise, these are good ideas, but might be outside the
> > scope of what would meet the clients needs. Specifically just not to receive
> > notifications for every stage of the backup when "all" is specified as the
> > type.
> 
> Understood.
> 
> > These notifications are being sent by ovirt-engine-notifier based on these
> > SNMP event types
> > 
> > ENGINE_BACKUP_COMPLETED	9025	Engine backup completed successfully.
> > ENGINE_BACKUP_FAILED	9026	Engine backup failed.
> > ENGINE_BACKUP_STARTED	9024	Engine backup started.
> > 
> > So if the engine-backup is using "--scope=all" it sends SNMP notifications
> > (log messages?? not clear to me the chain of messaging) but these get picked
> > up by Event Notifier and send email. My simple change modified the bash to
> > not send event notifier for every stage if --scope=all
> > 
> > I realize I explained this before, but wanted to re-empasize that I think
> > its a simple fix.
> 
> Ok, and I explained in comment 1 why we can't use it as-is.

I  agree that my "fix" is overly simplistic. I'm not sure the engine-backup code can be kept completely as is though because that code is what is generating the messages to notify-engine at each stage via do_notify(). 

> 
> > However, you have more experience and knowledge here and
> > I'd like to defer to your wisdom, with the general guidance that we just
> > want to suppress multiple messaging for --scope=all backups
> 
> I do not think engine-backup should be changed much. I sketched in comment 3
> how the notifications from engine-backup to the engine can be filtered
> out/summarized/etc before being sent further on.
> 
> If you do not want to do this in the engine, or think it will require too
> much work, you can do this filtering elsewhere. A simple example: You can
> send notifications from the engine only via email, and have your mail server
> run some script for you that analyzes the input and sends snmp traps as
> needed.
> 
> In any case, engine-backup should IMO be simple, provide whatever relevant
> information that's needed, and not be too smart about how this information
> is used. Other components that use this information can be smart as needed,
> and if they need more information from engine-backup in order to be smart,
> we can add that as needed.

In brief, I think as long as the engine itself knows what to do with, and how to correlate the messages it gets from engine-backup via do_notify() and we can provide an enhancement in the UI via Event Notifier(that's what makes sense to me atm) to more precisely filter those, then that's that win.

Thanks Yedidyah
Yours,
Shon

Comment 8 Yedidyah Bar David 2018-07-24 06:10:34 UTC
(In reply to Shon Stephens from comment #7)
> (In reply to Yedidyah Bar David from comment #6)
> I  agree that my "fix" is overly simplistic. I'm not sure the engine-backup
> code can be kept completely as is though because that code is what is
> generating the messages to notify-engine at each stage via do_notify(). 

I am not saying engine-backup must not be changed, but that it should remain simple.

engine-backup's "notification" only notifies the _engine_, not anything external. The engine currently passes that on, and I suggest (in comment 3) to change that to be smarter.

> In brief, I think as long as the engine itself knows what to do with, and
> how to correlate the messages it gets from engine-backup via do_notify() and
> we can provide an enhancement in the UI via Event Notifier(that's what makes
> sense to me atm) to more precisely filter those, then that's that win.

Thanks for the summary.

So, if my plan in comment 3 is accepted, we should have two patches, more-or-less:

1. engine-backup - should be patched to add such a backup_id
2. engine - should implement most of the logic detailed there

(1.) will likely be done by me, (2.) I guess by infra team, not sure.

Yaniv - please pick one:

1. Ack above
2. Ack something simpler, although not sure what exactly is considerably simpler and still fulfills what we wanted to achieve in the "Backup Awareness" project
3. Nack
4. Something else? Not sure what...

Comment 9 Yaniv Lavi 2018-08-01 08:55:55 UTC
Why are you not using filtering for the notifications/SNMP?
Please review:
https://www.ovirt.org/documentation/admin-guide/chap-Event_Notifications/


I think that address you ask.

Comment 10 Shon Stephens 2018-08-01 13:21:41 UTC
(In reply to Yaniv Lavi from comment #9)
> Why are you not using filtering for the notifications/SNMP?

From what I can tell, the FILTER functionality is not up to the specific task. Namely preventing individual (db, dwdb, files) Start/Stop emails being sent when only "--scope=all" is specified. Six emails for a what the customer perceives as one job.

AFAICT from the examples provided, FILTER would only allow me to exclude all the events of a specific type, not allow me to be specific about excluding start/stop messages from a job with --scope=all when it's running each section. Instead it would suppress all the messages, like so

FILTER="exclude:ENGINE_BACKUP_* include:*(snmp:) ${FILTER}"
or
FILTER="exclude:ENGINE_BACKUP_COMPLETED include:*(snmp:) ${FILTER}"
FILTER="exclude:ENGINE_BACKUP_STARTED include:*(snmp:) ${FILTER}"
FILTER="exclude:ENGINE_BACKUP_FAILED include:*(snmp:) ${FILTER}"


> Please review:
> https://www.ovirt.org/documentation/admin-guide/chap-Event_Notifications/
> 
> 
> I think that address you ask.

I've reviewed the documentation, but the examples that I can find don't indicate any further advanced filtering available that would prevent --scope=all from sending 6 emails but still allow the proper notifications (1 started, 1 completed, and any failures that occur) to still happen.

If there is a more advanced way to configure the filter that I'm not understanding, please provide an example.

Yours,
Shon

Comment 11 Yaniv Lavi 2018-08-01 13:31:48 UTC
Martin, can you help with this?

Comment 12 Sandro Bonazzola 2019-11-20 11:17:03 UTC
Missed 4.4 feature freeze, low priority. Closing deferred due to capacity. We can consider re-opening when we'll have capacity for this.