Bug 688339

Summary: Cannot schedule job on machine, cannot release machine. Infinite loop.
Product: [Retired] Beaker
Component: scheduler
Version: 0.7
Status: CLOSED INSUFFICIENT_DATA
Severity: medium
Priority: medium
Reporter: Michael Boisvert <mboisver>
Assignee: Bill Peck <bpeck>
QA Contact:
Docs Contact:
CC: bpeck, dcallagh, mcsontos, rmancy, stl
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: All   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-03-16 21:18:27 UTC
Type: ---

Description Michael Boisvert 2011-03-16 20:18:47 UTC
Description of problem:

When I schedule a job on a machine for which I am the current user, the job stays stuck in "queued." This is apparently a known problem. However, when I try to return the machine while the job is still queued, the return fails with a message saying that the job I scheduled is preventing me from returning the machine.

Thanks.

Comment 1 Bill Peck 2011-03-16 20:26:27 UTC
Could you provide more details please?

Job Id and System would be a good start.

Comment 2 Michael Boisvert 2011-03-16 20:36:42 UTC
Failed to return frankenstein-02.wlan.rhts.eng.bos.redhat.com: u'System has active recipe 129312'

I deleted the job and created a new one.

Comment 3 Bill Peck 2011-03-16 20:50:59 UTC
(In reply to comment #2)
> Failed to return frankenstein-02.wlan.rhts.eng.bos.redhat.com: u'System has
> active recipe 129312'
> 
> I deleted the job and created a new one.

Did you cancel recipe 129312?

Here is what I think happened:

You already had recipe 129312 running on that system (that's why you were the current user).

You queued another recipe which also wanted that system, but it was already in use, so it stayed queued.

You tried to return the system and it rightfully told you that you had an active recipe.

Comment 4 Michael Boisvert 2011-03-16 21:01:56 UTC
I wanted to create a job on Frankenstein 2. Being new to Beaker, I did not know that I was not supposed to take ownership of the system before creating the job.

When I created the job, its status remained "queued."

I should have been able to remove my ownership of the machine so that the job starts, instead of having to cancel the job and then give the machine back. Right?

Comment 5 Bill Peck 2011-03-16 21:18:27 UTC
We support two ways of taking a system.

- Manual, where you use the take link.
- Automated, where you schedule your request.

Both use system.user to know whether the system is currently in use. You wouldn't want an automated job to take your system while you are using it manually.
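
To illustrate the shared system.user check described above, here is a minimal sketch. It assumes a hypothetical System class with hypothetical reserve, finish_recipe and manual_return methods; none of these names are taken from Beaker's actual code, and only the "System has active recipe" message comes from this report.

```python
class SystemBusy(Exception):
    """Raised when a reserve or return request cannot be honoured."""


class System:
    # Hypothetical model for illustration only; not Beaker's real schema.
    def __init__(self, fqdn):
        self.fqdn = fqdn
        self.user = None           # current user, shared by both paths
        self.active_recipe = None  # id of the running recipe, if any

    def reserve(self, user, recipe_id=None):
        """Manual take (recipe_id is None) or scheduler reservation."""
        if self.user is not None:
            # Either path refuses a system that already has a user,
            # so a queued recipe waits while the system is held manually.
            raise SystemBusy("System is in use by %s" % self.user)
        self.user = user
        self.active_recipe = recipe_id

    def finish_recipe(self):
        """Scheduler path: the recipe ended, so release the system."""
        self.active_recipe = None
        self.user = None

    def manual_return(self, user):
        """Manual return is refused while a recipe is running."""
        if self.active_recipe is not None:
            raise SystemBusy(
                "System has active recipe %s" % self.active_recipe)
        if self.user != user:
            raise SystemBusy("System is not reserved by %s" % user)
        self.user = None
```

Under these assumptions, a system taken manually blocks a scheduled recipe until it is returned, and a system running a recipe cannot be returned manually until the recipe finishes or is cancelled.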


You are right: you should have been able to return the system if it wasn't running a job. I've never seen what you describe happen, but that doesn't mean it can't. Since you have already returned the system and cancelled the job, it's hard to figure out now what happened.

And according to the history log it looks like everything worked as expected:

mboisver  Scheduler  2011-03-16 20:53:21  User    Returned   mboisver
mboisver  Scheduler  2011-03-16 19:54:43  Distro  Provision  RHEL6.1-20110315.0
mboisver  Scheduler  2011-03-16 19:51:02  User    Reserved   mboisver
mboisver  Scheduler  2011-03-16 19:47:39  User    Returned   mboisver
mboisver  Scheduler  2011-03-16 19:46:51  Distro  Provision  RHEL6.1-20110315.0
mboisver  Scheduler  2011-03-16 19:46:38  User    Reserved   mboisver
mboisver  WEBUI      2011-03-16 19:46:13  User    Returned   mboisver
mboisver  WEBUI      2011-03-16 18:11:47  Power   on         Success
mboisver  WEBUI      2011-03-16 18:00:01  User    Reserved   mboisver

The WEBUI entries are when you manually used the system, and the Scheduler entries are the automated ones.