Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1779033

Summary: job running very slow and "socket i/o timed out"
Product: [Retired] Beaker
Reporter: LiLiang <liali>
Component: lab controller
Assignee: beaker-dev-list
Status: CLOSED DUPLICATE
QA Contact: tools-bugs <tools-bugs>
Severity: urgent
Docs Contact:
Priority: medium
Version: 26
CC: bpeck, cbouchar, jiabwang, kzhang, mastyk, tklohna, xiawu
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-12-05 08:12:37 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description LiLiang 2019-12-03 05:54:57 UTC
Description of problem:

I am using pek2 beaker.

1. I have recently been seeing more "Socket I/O timed out" failures in my Beaker jobs. This has a big impact on our testing.
   https://beaker.engineering.redhat.com/recipes/7638839#task103030698
   https://beaker.engineering.redhat.com/recipes/7635660#task102992060
   https://beaker.engineering.redhat.com/recipes/7643243#task103078536,task103078537

2. Jobs are running slower than before. A task that used to finish in 1h now needs 2h.
   https://beaker.engineering.redhat.com/recipes/7638839#task103030694
   https://beaker.engineering.redhat.com/recipes/7643243#task103078537

   For comparison, this is an old job that ran very fast:
   https://beaker.engineering.redhat.com/recipes/7177550#task96995914

Version-Release number of selected component (if applicable):


How reproducible:
"Socket I/O timed out" occurs sometimes.
"Job running slower than before" occurs every time.

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Zhang Kexin 2019-12-05 07:52:27 UTC
The slowness issue is severely affecting us; many of our jobs are failing because of timeouts. Please help as soon as possible. Thanks.

Comment 3 Martin Styk 2019-12-05 08:12:37 UTC

*** This bug has been marked as a duplicate of bug 1749316 ***

Comment 4 Martin Styk 2019-12-05 08:15:13 UTC
Let's make this clear.
Providing a patch for socket I/O in restraint will not help you.
Test execution will still take longer, with possible timeouts.

The proper solution to this problem is not to raise it here, but to create a ticket with PnT to migrate the PEK2 lab controller to a different location with a more reliable host and network (traffic).

Comment 5 LiLiang 2019-12-06 01:21:14 UTC
Hi jiab,

Please see comment #4, thanks.

Comment 6 LiLiang 2019-12-10 11:09:20 UTC
Hi Martin,

Below is my investigation:

An old task finished in 26 minutes:
  https://beaker.engineering.redhat.com/recipes/7434252#task100349631

  In this task's log (http://beaker-archive.host.prod.eng.bos.redhat.com/beaker-logs/2019/10/38251/3825148/7434252/100349631/taskout.log),
  you can see,
  rlPhase "check if 'ip link show' work for bonding and slave" ended at "16:11:33",
  rlPhase "bond mode=0 test shutdown/no shutdown bond0" started at "16:11:37",
  the interval between them is 4s.


A new task finished in 90 minutes:
  https://beaker.engineering.redhat.com/recipes/7675477#task103435088

  In this task's log (http://lab-02.rhts.eng.pek2.redhat.com/beaker/logs/tasks/103435+/103435088/taskout.log),
  you can see,
  rlPhase "check if 'ip link show' work for bonding and slave" ended at "23:21:11",
  rlPhase "bond mode=0 test shutdown/no shutdown bond0" started at "23:21:56",
  the interval between them is 45s.

So I guess the increased interval between rlPhases is the root cause of the slower job execution.
As an estimate, if I have 30 rlPhases in my task, the extra time spent between rlPhases will be about 30*40=1200s.
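
The estimate above can be reproduced with a short sketch. It uses only the four timestamps quoted in this comment and the assumed count of 30 rlPhases; the helper name gap_seconds is hypothetical, and it assumes both timestamps fall on the same day (it does not handle a phase boundary that crosses midnight).

```python
from datetime import datetime

TIME_FORMAT = "%H:%M:%S"


def gap_seconds(phase_ended: str, next_phase_started: str) -> int:
    """Seconds between one phase ending and the next one starting.

    Hypothetical helper; assumes both times are on the same day.
    """
    ended = datetime.strptime(phase_ended, TIME_FORMAT)
    started = datetime.strptime(next_phase_started, TIME_FORMAT)
    return int((started - ended).total_seconds())


# Timestamps copied from the two taskout.log excerpts quoted above.
old_gap = gap_seconds("16:11:33", "16:11:37")
new_gap = gap_seconds("23:21:11", "23:21:56")

# Assumption from the comment: a task with 30 rlPhases.
phase_count = 30
extra_per_phase = new_gap - old_gap
extra_total = extra_per_phase * phase_count

print(f"old gap: {old_gap}s, new gap: {new_gap}s")
print(f"extra per phase: {extra_per_phase}s, extra for {phase_count} phases: {extra_total}s")
```

With the measured gaps the extra time per phase comes out slightly above the rounded 40s used in the estimate, so the total is of the same order as 1200s.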

Could you suggest how to resolve this issue?
Do you know whether this is a lab controller issue or a Beaker issue?


Regards,
Liang.

Comment 7 Martin Styk 2019-12-10 11:47:17 UTC
I'll take a look.
rlPhase is part of beakerlib. I have to look into its implementation and how exactly it does things.

Comment 8 Red Hat Bugzilla 2023-09-14 05:47:56 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days