Bug 1493408

Summary:	abort regression jobs after a period of inactivity rather than a hard timeout of 360 minutes
Product:	[Community] GlusterFS	Reporter:	Milind Changire <mchangir>
Component:	project-infrastructure	Assignee:	bugs <bugs>
Status:	CLOSED CURRENTRELEASE	QA Contact:
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	mainline	CC:	bugs, gluster-infra, nigelb
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2018-04-12 15:00:10 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Milind Changire 2017-09-20 06:50:44 UTC

Description of problem:
Sometimes regression jobs hang well before the 360-minute timeout is reached.
Sometimes the regression runs are simply slow.

Expected results:
If a regression job has produced no output on STDOUT for 15 minutes, it would help to abort it before the 360-minute limit and reboot the node to make way for other jobs.

This would abort hung jobs early instead of wasting time waiting for the hard timeout.
For slow-running regression jobs, it would let the run continue and complete instead of being aborted and then rerun for another 360 minutes.

Comment 1 Nigel Babu 2017-09-20 07:09:17 UTC

We currently use type: absolute for the timeout (see http://git.gluster.org/cgit/build-jobs.git/tree/build-gluster-org/jobs/centos6-regression.yml#n60).

There's also a type: no-activity, which aborts the job once there has been no activity for the defined timeout. If we set that to 900 (15 mins), it should probably work.
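
The difference between the two timeout types can be illustrated outside Jenkins with a small watchdog. This is only a sketch of the idea, not how the Jenkins timeout wrapper is implemented: the function name run_with_inactivity_timeout and its parameters idle_seconds and hard_seconds are hypothetical, and the defaults simply mirror the 15-minute and 360-minute figures discussed in this bug.

```python
import queue
import subprocess
import threading
import time


def run_with_inactivity_timeout(cmd, idle_seconds=900, hard_seconds=360 * 60):
    """Run cmd and abort it when it goes quiet or exceeds the hard limit.

    Hypothetical helper for illustration. Returns (status, lines), where
    status is "completed", "no-activity" or "absolute".
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    lines = queue.Queue()

    def pump():
        # Forward each output line to the queue; None marks end of output.
        for line in proc.stdout:
            lines.put(line)
        lines.put(None)

    threading.Thread(target=pump, daemon=True).start()

    seen = []
    start = time.monotonic()
    while True:
        remaining = hard_seconds - (time.monotonic() - start)
        if remaining <= 0:
            # Hard limit reached even though the job was still producing output.
            proc.kill()
            proc.wait()
            return "absolute", seen
        try:
            # Wait for the next line, but never past the hard limit.
            line = lines.get(timeout=min(idle_seconds, remaining))
        except queue.Empty:
            # No output arrived in time: decide which limit expired first.
            status = "no-activity" if idle_seconds <= remaining else "absolute"
            proc.kill()
            proc.wait()
            return status, seen
        if line is None:
            proc.wait()
            return "completed", seen
        seen.append(line.rstrip("\n"))
```

With this shape, a hung job is killed after idle_seconds of silence, while a slow job that keeps printing is left alone until it finishes or hits hard_seconds.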

Comment 2 Nigel Babu 2018-04-12 15:00:10 UTC

This is now fixed thanks to Amar's per-patch timeout.