It looks like there is a valid watchdog record for the system in question, with the correct kill time (which is now well past). But it was never triggered. Currently investigating why that is so.
The beaker-watchdog daemon on the lab controller in question was stuck reading from a dead HTTP connection. Apparently the system-wide default TCP timeout for established connections is 5 days(!), at least on that box, and we never set any stricter timeouts in the beaker-watchdog daemon itself. I think that is probably the real bug we should be fixing...
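For reference, something like the following is the kind of client-side timeout that would have bounded that read instead of leaving it to the kernel's multi-day retransmission window. This is only a sketch: the URL, the timeout value, and the RPC method name are illustrative, not Beaker's actual code.

    import socket
    import xmlrpc.client

    # Without an explicit timeout, a blocking read on a dead connection only
    # fails when the kernel finally gives up on the peer, which can take days.
    # A process-wide default bounds every new socket, including the ones
    # xmlrpc.client opens under the hood.
    socket.setdefaulttimeout(120)  # seconds; illustrative value

    server = xmlrpc.client.ServerProxy("http://lab.example.com/RPC2")
    try:
        server.get_expired_watchdogs()  # hypothetical RPC, for illustration only
    except socket.timeout:
        # The daemon now notices the dead connection and can reconnect,
        # instead of sitting in recv() until the TCP stack gives up.
        pass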
I was wrong; it seems we *do* set a timeout on the kobo hub transport for all the lab controller processes. So the question is: why, in this case, did the timeout not kick in and prevent beaker-watchdog from getting stuck for 19 hours?
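For context, a transport-level timeout is usually applied along these lines. This is a generic xmlrpc sketch, not kobo's actual transport class, and the names and values are assumptions. The important property is that only proxies built through this particular transport object are covered; any connection made some other way gets no timeout at all.

    import xmlrpc.client

    class TimeoutTransport(xmlrpc.client.Transport):
        # Applies a socket timeout to every HTTP connection this transport
        # opens. Generic sketch of the technique, not kobo's transport class.
        def __init__(self, timeout=120):
            super().__init__()
            self._timeout = timeout

        def make_connection(self, host):
            conn = super().make_connection(host)
            conn.timeout = self._timeout  # honoured by http.client on connect()
            return conn

    # Only proxies that go through this transport instance get the timeout;
    # anything that builds its own connection separately is unprotected.
    hub = xmlrpc.client.ServerProxy(
        "http://lab.example.com/RPC2",
        transport=TimeoutTransport(timeout=120),
    )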
Hmm, okay, I thought I wrote another comment about this yesterday, but perhaps I never hit save... I think the problem is that although the Watchdog object itself has a timeout set, it creates Monitor objects which do not have the timeout set. I think it was one of those which was stuck in a read yesterday. (That explains why there were two connections open to the server, and it was the second one which was stuck.) I think the best fix is to move the timeout setting into ProxyHelper, which is the parent class for all the objects that talk to the server. That way it will apply to Monitor as well as any other classes we have missed (or add in the future).
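To make the proposed shape concrete, here is a simplified sketch of what that refactoring could look like. The class names follow the comment above, but the hub setup, the timeout value, and the method names are assumptions rather than the real Beaker/kobo code.

    import xmlrpc.client

    class _TimeoutTransport(xmlrpc.client.Transport):
        # Minimal transport that puts a socket timeout on each connection it opens.
        def __init__(self, timeout):
            super().__init__()
            self._timeout = timeout

        def make_connection(self, host):
            conn = super().make_connection(host)
            conn.timeout = self._timeout
            return conn

    class ProxyHelper:
        # Shared base for every object that talks to the server. The timeout is
        # applied here, in the one place all client objects pass through,
        # instead of only in Watchdog.
        def __init__(self, hub_url, timeout=120):
            self.hub_url = hub_url
            self.hub = xmlrpc.client.ServerProxy(
                hub_url, transport=_TimeoutTransport(timeout))

    class Watchdog(ProxyHelper):
        # Previously the only class that set a timeout on its own connection.
        def spawn_monitor(self):
            # Monitor opens its own hub connection, but because it also
            # inherits from ProxyHelper, that second connection now carries
            # the timeout too -- that was the one stuck in a read.
            return Monitor(self.hub_url)

    class Monitor(ProxyHelper):
        pass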