Bug 1493408

Summary: abort regression jobs after a period of inactivity rather than a hard timeout of 360 minutes
Product: [Community] GlusterFS
Reporter: Milind Changire <mchangir>
Component: project-infrastructure
Assignee: bugs <bugs>
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Version: mainline
CC: bugs, gluster-infra, nigelb
Last Closed: 2018-04-12 15:00:10 UTC
Type: Bug

Description Milind Changire 2017-09-20 06:50:44 UTC
Description of problem:
Sometimes regression jobs hang well before the 360-minute hard timeout is reached.
Sometimes the regressions just run slower than usual.

Expected results:
If there has been 15 minutes of inactivity on STDOUT, it would help to abort the regression job before the 360-minute limit and reboot the node to make way for other jobs.

This would abort hung jobs early instead of wasting time until the hard timeout.
For slow-running regression jobs, it would let the run continue to completion rather than aborting the job at the hard timeout and running it again for up to 360 minutes.

Comment 1 Nigel Babu 2017-09-20 07:09:17 UTC
We currently use type: absolute for timeout (see http://git.gluster.org/cgit/build-jobs.git/tree/build-gluster-org/jobs/centos6-regression.yml#n60)

There's also a type: no-activity, which aborts the job once there has been no activity for the defined timeout. If we set that to 900 (15 mins), it should work.
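To illustrate the difference between the two timeout types, here is a minimal Python sketch of a watchdog that kills a command either after a period of silence on its output ("no-activity") or after a fixed wall-clock limit ("absolute"). This is not how Jenkins implements it; the function name `run_with_inactivity_timeout` and its parameters are hypothetical and exist only for this example. Both limits are expressed in seconds here, so 15 minutes would be 900.

```python
import subprocess
import sys
import threading
import time


def run_with_inactivity_timeout(cmd, inactivity_secs, hard_secs=None):
    """Run cmd and watch its combined stdout/stderr (hypothetical helper).

    The process is killed if it prints nothing for inactivity_secs, or if
    hard_secs is set and the total runtime exceeds it.
    Returns (status, lines), where status is one of
    "finished", "no-activity" or "absolute".
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    lines = []
    start = time.monotonic()
    last_output = [start]  # one-element list so the reader thread can update it

    def reader():
        # Every line of output counts as activity and resets the silence clock.
        for line in proc.stdout:
            lines.append(line.rstrip("\n"))
            last_output[0] = time.monotonic()

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()

    status = "finished"
    while proc.poll() is None:
        now = time.monotonic()
        if now - last_output[0] > inactivity_secs:
            status = "no-activity"
        elif hard_secs is not None and now - start > hard_secs:
            status = "absolute"
        if status != "finished":
            print(f"aborting job: {status} timeout", file=sys.stderr)
            proc.kill()
            break
        time.sleep(0.05)

    proc.wait()
    thread.join(timeout=5)
    return status, lines
```

With only the inactivity limit set, a slow job that keeps printing is allowed to finish, while a hung job is aborted shortly after it goes silent. Note that the child must not buffer its output indefinitely, or a healthy job will look idle.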

Comment 2 Nigel Babu 2018-04-12 15:00:10 UTC
This is now fixed thanks to Amar's per-patch timeout.