Bug 1188144

Summary:	[RFE] - Operational errors that arise during engine-backup should prompt alerts to check if they are backup related.
Product:	[Retired] oVirt	Reporter:	Yaniv Lavi <ylavi>
Component:	ovirt-engine-installer	Assignee:	Doron Fediuck <dfediuck>
Status:	CLOSED WONTFIX	QA Contact:	Pavel Stehlik <pstehlik>
Severity:	high	Docs Contact:
Priority:	high
Version:	3.5	CC:	bugs, ecohen, gklein, iheim, lsurette, rbalakri, yeylon, ylavi
Target Milestone:	---	Keywords:	FutureFeature
Target Release:	3.6.0
Hardware:	x86_64
OS:	Linux
Whiteboard:	integration
Fixed In Version:		Doc Type:	Enhancement
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2015-03-24 19:42:55 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1188143
Bug Blocks:	1188119

Description Yaniv Lavi 2015-02-02 07:30:17 UTC

Description of problem:
This is needed because a backup can clog up the network and use a lot of I/O. When these resources are short, we want the user to be aware that the backup may be the cause.

Error messages will mention that repeating issues may be caused by a running backup, but they could also be unrelated.

Comment 1 Doron Fediuck 2015-03-22 11:54:28 UTC

(In reply to Yaniv Dary from comment #0)
> Description of problem:
> This is needed because a backup can clog up the network and use a lot of I/O.
> When these resources are short, we want the user to be aware that the backup
> may be the cause.
> 
> Error messages will mention that repeating issues may be caused by a running
> backup, but they could also be unrelated.

I suggest we review this carefully.
I do not see how a simple SQL sequence can know something about the
peripherals. It is the responsibility of a monitoring system to handle
these issues as it has all the information.

Comment 2 Yaniv Lavi 2015-03-24 10:00:25 UTC

(In reply to Doron Fediuck from comment #1)
> I suggest we review this carefully.
> I do not see how a simple SQL sequence can know something about the
> peripherals. It is the responsibility of a monitoring system to handle
> these issues as it has all the information.

What monitoring software? A 3rd-party tool would probably not be aware of virt-related effects on network/IO hogging.
We should have events on backup start/completion/failure so that we know when a backup is running.

Comment 3 Doron Fediuck 2015-03-24 12:51:16 UTC

(In reply to Yaniv Dary from comment #2)
> What monitoring software? A 3rd-party tool would probably not be aware of
> virt-related effects on network/IO hogging.
> We should have events on backup start/completion/failure so that we know when
> a backup is running.

We will have start/end events, but you cannot deduce that any error which occurs during the backup is related to the backup. For example, if a host fails during backup or the CPU spikes, it has nothing to do with the backup.
Bottom line, other than start/stop events (with the status and stack trace the backup process gets), we cannot guarantee anything else since we do not have the knowledge.
I suggest closing this RFE and focusing on Bug 1188143.
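
For illustration only, the start/end events described above could be sketched roughly as follows. This is not oVirt code: BackupEvent, run_with_events and overlaps_backup are hypothetical names, and the backup callable is a placeholder for whatever actually performs the backup. The overlap check only shows that an error happened while a backup was running; it says nothing about the cause.

```python
import traceback
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, List, Optional


@dataclass
class BackupEvent:
    """Hypothetical event record; not an oVirt type."""
    kind: str                      # "start" or "end"
    time: datetime
    status: Optional[str] = None   # "ok" or "failed", set on end events only
    trace: Optional[str] = None    # stack trace text, set on failure only


def run_with_events(
    backup: Callable[[], None],
    emit: Callable[[BackupEvent], None],
    now: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
) -> bool:
    """Run the backup callable, emitting a start event and an end event."""
    emit(BackupEvent("start", now()))
    try:
        backup()
    except Exception:
        # Report the failure together with the stack trace the process got.
        emit(BackupEvent("end", now(), status="failed",
                         trace=traceback.format_exc()))
        return False
    emit(BackupEvent("end", now(), status="ok"))
    return True


def overlaps_backup(error_time: datetime, events: List[BackupEvent]) -> bool:
    """Return True if error_time falls inside a backup window.

    Events are assumed to be in chronological order. A True result means
    the error coincided with a backup, not that the backup caused it.
    """
    start = None
    for ev in events:
        if ev.kind == "start":
            start = ev.time
        elif ev.kind == "end" and start is not None:
            if start <= error_time <= ev.time:
                return True
            start = None
    # A start event without a matching end means a backup is still running.
    return start is not None and error_time >= start
```

An alert built on overlaps_backup could at most say "a backup was running at that time", which matches the limitation described above.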

Comment 4 Yaniv Lavi 2015-03-24 19:42:55 UTC

(In reply to Doron Fediuck from comment #3)
> We will have start/end events, but you cannot deduce that any error which
> occurs during the backup is related to the backup. For example, if a host
> fails during backup or the CPU spikes, it has nothing to do with the backup.
> Bottom line, other than start/stop events (with the status and stack trace
> the backup process gets), we cannot guarantee anything else since we do not
> have the knowledge.
> I suggest closing this RFE and focusing on Bug 1188143.

Since we will not run the tool automatically, I will close this one.
I hope that if someone tries to automate backup, they will not start getting issues without knowing the cause.