Bug 1779033
| Summary: | job running very slow and "socket i/o timed out" | ||
|---|---|---|---|
| Product: | [Retired] Beaker | Reporter: | LiLiang <liali> |
| Component: | lab controller | Assignee: | beaker-dev-list |
| Status: | CLOSED DUPLICATE | QA Contact: | tools-bugs <tools-bugs> |
| Severity: | urgent | Docs Contact: | |
| Priority: | medium | ||
| Version: | 26 | CC: | bpeck, cbouchar, jiabwang, kzhang, mastyk, tklohna, xiawu |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2019-12-05 08:12:37 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
LiLiang
2019-12-03 05:54:57 UTC
The slowness is severely affecting us: many of our jobs are failing because of timeouts. Please help as soon as possible. Thanks.

*** This bug has been marked as a duplicate of bug 1749316 ***

Let's make this clear. Providing a patch for socket I/O in restraint will not help you. Tests will still take longer to execute, and timeouts are still possible. The proper solution is not to raise this here. Instead, create a ticket with PnT to migrate the PEK2 lab controller to a different location, with a more reliable host and network (traffic).

Hi jiab, please see comment #4, thanks.

Hi Martin,

Below is my investigation.

An old task finished in 26 minutes: https://beaker.engineering.redhat.com/recipes/7434252#task100349631
In this task's log (http://beaker-archive.host.prod.eng.bos.redhat.com/beaker-logs/2019/10/38251/3825148/7434252/100349631/taskout.log), rlPhase "check if 'ip link show' work for bonding and slave" ended at "16:11:33" and rlPhase "bond mode=0 test shutdown/no shutdown bond0" started at "16:11:37", so the interval between them is 4s.

A new task finished in 90 minutes: https://beaker.engineering.redhat.com/recipes/7675477#task103435088
In this task's log (http://lab-02.rhts.eng.pek2.redhat.com/beaker/logs/tasks/103435+/103435088/taskout.log), the same rlPhase "check if 'ip link show' work for bonding and slave" ended at "23:21:11" and rlPhase "bond mode=0 test shutdown/no shutdown bond0" started at "23:21:56", so the interval between them is 45s.

So I guess the increased interval between rlPhases is the root cause of the slower job execution. Each interval grew by roughly 40s (from 4s to 45s), so if my task has 30 rlPhases, the total extra delay between rlPhases is about 30 × 40 = 1200s (20 minutes).

Could you give some suggestions on how to resolve this issue? Do you know whether this is a lab controller issue or a Beaker issue?

Regards,
Liang.

I'll take a look. rlPhase is part of beakerlib. I have to take a look at its implementation and how exactly it does things.
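As a rough check on the arithmetic in the investigation above, the sketch below recomputes the gap between one rlPhase ending and the next one starting from the quoted timestamps, then scales the per-phase increase by the number of phases. It assumes plain HH:MM:SS timestamps on the same day (with a simple wrap past midnight); the helper names `gap_seconds` and `extra_delay` are hypothetical and are not part of beakerlib or Beaker.

```python
from datetime import datetime

FMT = "%H:%M:%S"  # assumed timestamp format, as quoted from taskout.log
SECONDS_PER_DAY = 24 * 60 * 60


def gap_seconds(phase_end: str, next_start: str) -> int:
    """Hypothetical helper: seconds between a phase ending and the next starting."""
    delta = datetime.strptime(next_start, FMT) - datetime.strptime(phase_end, FMT)
    seconds = int(delta.total_seconds())
    # If the next phase started after midnight, the raw difference is negative.
    if seconds < 0:
        seconds += SECONDS_PER_DAY
    return seconds


def extra_delay(old_gap: int, new_gap: int, phases: int) -> int:
    """Hypothetical helper: added time if every inter-phase gap grows by the same amount."""
    return (new_gap - old_gap) * phases


if __name__ == "__main__":
    old = gap_seconds("16:11:33", "16:11:37")  # old task on recipe 7434252
    new = gap_seconds("23:21:11", "23:21:56")  # new task on recipe 7675477
    print(old, new, extra_delay(old, new, 30))
```

With the exact 41s increase per gap, 30 phases add 1230s, which matches the rounded 30 × 40 = 1200s estimate in the comment.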
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.