Bug 717500
| Summary: | reserved guest doesn't return after timeout | ||
|---|---|---|---|
| Product: | [Retired] Beaker | Reporter: | Han Pingtian <phan> |
| Component: | lab controller | Assignee: | Dan Callaghan <dcallagh> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 0.6 | CC: | bpeck, dcallagh, mcsontos, rmancy, stl |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2011-07-14 02:07:17 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Comment 1
Dan Callaghan
2011-06-29 03:33:03 UTC
The beaker-watchdog daemon on the lab controller in question was stuck reading from a dead HTTP connection. Apparently the system-wide default TCP timeout for established connections is 5 days(!), at least on that box, and we never set any stricter timeouts in the beaker-watchdog daemon itself. I think that is probably the real bug we should be fixing...

I was wrong, it seems we *do* set a timeout on the kobo hub transport for all the lab controller processes. So the question is, why in this case did the timeout not kick in and prevent beaker-watchdog from getting stuck for 19 hours?

Hmm, okay, I thought I wrote another comment about this yesterday but perhaps I never hit save... I think the problem is that although the Watchdog object itself has a timeout set, it creates Monitor objects which do not have the timeout set. I think it was one of those which was stuck in a read yesterday. (That explains why there were two connections open to the server, and it was the second one which was stuck.)

I think the best fix is to move the timeout setting into ProxyHelper, which is a parent class for all the objects which talk to the server. That way it will apply to Monitor as well as any other classes we have missed (or may add in the future).
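
A minimal sketch of that idea, not the actual Beaker patch. The class names ProxyHelper, Watchdog and Monitor come from the comment above; everything else (`TimeoutTransport`, `DEFAULT_TIMEOUT`, `hub_url`, `make_monitor`) is hypothetical, and it uses the Python 3 standard library `xmlrpc.client` as a stand-in for the kobo hub transport. The point is only that the timeout is set once in the parent class, so every subclass picks it up.

```python
import xmlrpc.client


class TimeoutTransport(xmlrpc.client.Transport):
    """XML-RPC transport whose HTTP connections carry a socket timeout."""

    def __init__(self, timeout):
        super().__init__()
        self.timeout = timeout

    def make_connection(self, host):
        conn = super().make_connection(host)
        # HTTPConnection hands this value to the socket when it connects,
        # so a read on a dead connection raises a timeout error instead of
        # blocking indefinitely.
        conn.timeout = self.timeout
        return conn


class ProxyHelper:
    """Hypothetical parent for every object that talks to the server."""

    DEFAULT_TIMEOUT = 120  # seconds; illustrative value only

    def __init__(self, hub_url, timeout=None):
        self.hub_url = hub_url
        self.timeout = self.DEFAULT_TIMEOUT if timeout is None else timeout
        self.transport = TimeoutTransport(self.timeout)
        # Constructing the proxy does not open a connection.
        self.hub = xmlrpc.client.ServerProxy(hub_url, transport=self.transport)


class Monitor(ProxyHelper):
    pass


class Watchdog(ProxyHelper):
    def make_monitor(self):
        # No timeout handling here: Monitor inherits it from ProxyHelper.
        return Monitor(self.hub_url)
```

With this layout, a Monitor created by the Watchdog gets its own connection with the same timeout, without the Watchdog having to remember to pass one along.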