1188144 – [RFE] - Operational errors that arise during engine-backup should prompt alerts to check if they are backup related.

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	oVirt
Classification:	Retired
Component:	ovirt-engine-installer
Sub Component:
Version:	3.5
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	3.6.0
Assignee:	Doron Fediuck
QA Contact:	Pavel Stehlik
Docs Contact:
URL:
Whiteboard:	integration
Depends On:	1188143
Blocks:	1188119

Reported:	2015-02-02 07:30 UTC by Yaniv Lavi
Modified:	2015-03-24 19:42 UTC
CC List:	8 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2015-03-24 19:42:55 UTC
oVirt Team:	---
Embargoed:
Dependent Products:

Description Yaniv Lavi 2015-02-02 07:30:17 UTC

Description of problem:
Backup can clog up the network and use a lot of I/O. When these resources are short, we want the user to be aware that the shortage may be caused by the backup.

Errors will mention that recurring issues may be caused by a running backup, but that they could also be unrelated.

Comment 1 Doron Fediuck 2015-03-22 11:54:28 UTC

(In reply to Yaniv Dary from comment #0)

I suggest we review this carefully.
I do not see how a simple SQL sequence can know something about the
peripherals. It is the responsibility of a monitoring system to handle
these issues as it has all the information.

Comment 2 Yaniv Lavi 2015-03-24 10:00:25 UTC

(In reply to Doron Fediuck from comment #1)

What monitoring software? Third-party software would probably not be aware of virt-related effects on network and I/O hogging.
We should have events on backup start/completion/failure so that we know a backup is running.

Comment 3 Doron Fediuck 2015-03-24 12:51:16 UTC

(In reply to Yaniv Dary from comment #2)

We will have start/end events, but you cannot deduce that any error that occurs during the backup is related to the backup. For example, if a host fails or the CPU spikes during a backup, it has nothing to do with the backup.
Bottom line: other than start/stop events (with the status and stack trace the backup process gets), we cannot guarantee anything else, since we do not have the knowledge.
I suggest closing this RFE and focusing on Bug 1188143.

Comment 4 Yaniv Lavi 2015-03-24 19:42:55 UTC

(In reply to Doron Fediuck from comment #3)

Since we will not run the tool automatically, I will close this one.
I hope that if someone tries to automate backup, they will not start running into issues without knowing the cause.
