Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1792421

Summary: rstrnt-report-result hanging
Product: [Retired] Restraint
Reporter: Martin Styk <mastyk>
Component: general
Assignee: beaker-dev-list
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: high
Docs Contact:
Priority: urgent
Version: 0.1.43
CC: asavkov, asosedki, azelinka, bpeck, breilly, cbeer, coli, hewang, hkario, jblazek, kzhang, liali, mkyral, pkotvan, szidek, todoleza, vkadlcik, zhguo
Target Milestone: 0.3.0
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-08-19 15:16:38 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Martin Styk 2020-01-17 16:07:45 UTC
Description of problem:
[root@ibm-p8-kvm-11-guest-07 ~]# systemctl status restraintd 
● restraintd.service - The restraint harness.
   Loaded: loaded (/usr/lib/systemd/system/restraintd.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2020-01-17 10:40:55 EST; 24min ago
  Process: 1748 ExecStartPre=/usr/bin/check_beaker (code=exited, status=0/SUCCESS)
 Main PID: 1757 (restraintd)
   CGroup: /system.slice/restraintd.service
           ├─ 1757 /usr/bin/restraintd
           ├─ 1865 make run
           ├─ 1878 /bin/bash ./runtest.sh
           ├─11359 /usr/bin/rstrnt-report-result --rhts Test PASS /tmp/tmp.7nwdD9Q9TF 0
           ├─11420 /bin/bash -l /usr/share/restraint/plugins/task_run.d/10_bash_login /usr/share/restraint/plugins/task_run.d/15_beakerlib /usr/share/restraint/plugins/task_run.d/20_unconf...
           ├─11442 /bin/sh /usr/share/restraint/plugins/run_plugins
           ├─11463 /bin/sh ./10_localwatchdog
           └─11465 rstrnt-report-result --no-plugins /10_localwatchdog WARN 0

Jan 17 10:56:33 ibm-p8-kvm-11-guest-07.virt.pnr.lab.eng.rdu2.redhat.com restraintd[1757]: gpg: Signature made Fri 17 Jan 2020 10:51:25 AM EST using DSA key ID AA7488BA
Jan 17 10:56:33 ibm-p8-kvm-11-guest-07.virt.pnr.lab.eng.rdu2.redhat.com restraintd[1757]: gpg: Good signature from "bob-dsa-4096 <bob-dsa-4096>"
Jan 17 10:56:33 ibm-p8-kvm-11-guest-07.virt.pnr.lab.eng.rdu2.redhat.com restraintd[1757]: :: [ 10:51:26 ] :: [   PASS   ] :: Veryfing all Bob's dsa 4096 bit key signs (except detac..., got 0)
Jan 17 10:56:33 ibm-p8-kvm-11-guest-07.virt.pnr.lab.eng.rdu2.redhat.com restraintd[1757]: :: [ 10:51:26 ] :: [  BEGIN   ] :: Removing signatures :: actually running 'rm text.msg-si...-signed'
Jan 17 10:56:33 ibm-p8-kvm-11-guest-07.virt.pnr.lab.eng.rdu2.redhat.com restraintd[1757]: :: [ 10:51:26 ] :: [   PASS   ] :: Removing signatures (Expected 0, got 0)
Jan 17 10:56:33 ibm-p8-kvm-11-guest-07.virt.pnr.lab.eng.rdu2.redhat.com restraintd[1757]: ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
Jan 17 10:56:33 ibm-p8-kvm-11-guest-07.virt.pnr.lab.eng.rdu2.redhat.com restraintd[1757]: ::   Duration: 544s
Jan 17 10:56:33 ibm-p8-kvm-11-guest-07.virt.pnr.lab.eng.rdu2.redhat.com restraintd[1757]: ::   Assertions: 2157 good, 0 bad
Jan 17 10:56:33 ibm-p8-kvm-11-guest-07.virt.pnr.lab.eng.rdu2.redhat.com restraintd[1757]: ::   RESULT: PASS (Test)
Jan 17 10:56:33 ibm-p8-kvm-11-guest-07.virt.pnr.lab.eng.rdu2.redhat.com restraintd[1757]: ** Test PASS Score:0
Hint: Some lines were ellipsized, use -l to show in full.

[root@ibm-p8-kvm-11-guest-07 ~]# ps aux | grep rstrnt
root     11359  0.0  0.0 157760  5888 ?        Sl   10:51   0:00 /usr/bin/rstrnt-report-result --rhts Test PASS /tmp/tmp.7nwdD9Q9TF 0
root     11465  0.0  0.0 157760  5760 ?        Sl   10:56   0:00 rstrnt-report-result --no-plugins /10_localwatchdog WARN 0
root     11709  0.0  0.0 111552  3200 pts/1    S+   11:06   0:00 grep --color=auto rstrnt

[root@ibm-p8-kvm-11-guest-07 ~]# date
Fri Jan 17 11:06:53 EST 2020



Version-Release number of selected component (if applicable):


How reproducible:
Unsure

Actual results:
None. Restraint just hangs there indefinitely.

Expected results:
Killed/Retried/Reported

Additional info:

Comment 1 Martin Styk 2020-02-14 09:51:59 UTC
So let's elaborate on this further.
The issue is still appearing in 0.1.45.

Restraint uses a single queue to send all data out, which can be dangerous, and now we are hitting it.
Let's say that restraintd has to install a huge task (hello /distribution/install), so lots of logs appear on the screen. Those are saved in harness.log. Now, here is what happens.

harness.log is queued for upload to the lab controller (LC).
SOUP chunks it into small pieces and transfers them to the LC. Add a bad network to this and the upload takes forever.
Meanwhile, while harness.log is still being transferred to the LC, the task fires rstrnt-report-result. This creates a blocking operation: the program waits for restraintd to return the result.
Restraintd adds this request to the same queue that holds harness.log (which has already been transferring for what feels like two decades because of the network), so rstrnt-report-result hangs until the upload finishes.
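
The effect described above is head-of-line blocking. Below is a minimal simulation of it, not restraint's code: the message names, chunk counts, and the round-robin alternative are all assumptions made for illustration, and the round-robin variant is not meant to describe how restraint actually addressed the problem.

```python
from collections import deque


def drain_fifo(messages):
    """Send messages strictly in order; return the tick at which each one completes.

    Each message is (name, chunks) and every chunk costs one tick.
    """
    clock = 0
    finished = {}
    queue = deque(messages)
    while queue:
        name, chunks = queue.popleft()
        clock += chunks          # the whole message must go out before the next one starts
        finished[name] = clock
    return finished


def drain_round_robin(messages):
    """Hypothetical alternative: send one chunk per message in turn."""
    clock = 0
    finished = {}
    pending = deque([name, chunks] for name, chunks in messages)
    while pending:
        item = pending.popleft()
        item[1] -= 1             # send a single chunk of this message
        clock += 1
        if item[1] == 0:
            finished[item[0]] = clock
        else:
            pending.append(item)  # more chunks left, go to the back of the line
    return finished


# Assumed workload: a large log upload queued just before a tiny result report.
WORKLOAD = [("harness.log", 1000), ("report-result", 1)]

fifo = drain_fifo(WORKLOAD)
rr = drain_round_robin(WORKLOAD)
print("single FIFO queue:", fifo)
print("round-robin:      ", rr)
```

With a single FIFO queue the one-chunk report completes only after all 1000 log chunks have gone out, which is the hang seen in the process listing. With interleaving the report completes almost immediately, while the log upload finishes at essentially the same time as before.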

Comment 2 Martin Styk 2020-02-19 17:07:12 UTC
*** Bug 1804555 has been marked as a duplicate of this bug. ***

Comment 3 Martin Styk 2020-08-19 15:16:38 UTC
Fixed in 0.3.0, which is already released upstream.

Please read more about the change in the changelog:
https://restraint.readthedocs.io/en/latest/release-notes.html#whats-new