Bug 1823545 - extendtesttime.sh does not work after another reboot of test system
Summary: extendtesttime.sh does not work after another reboot of test system
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Restraint
Classification: Retired
Component: general
Version: 0.2.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: 0.2.1
Assignee: Daniel Rodríguez
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-04-13 21:35 UTC by Lenny Szubowicz
Modified: 2020-05-20 07:20 UTC

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-20 07:20:47 UTC
Embargoed:


Description Lenny Szubowicz 2020-04-13 21:35:25 UTC
Description of problem:

extendtesttime.sh does not work after a reboot of the test system.

[root@qualcomm-amberwing-rep-14 ~]# extendtesttime.sh
How many hours would you like to extend the reservation.
             Must be between 1 and 99                   
90
Extending reservation time 90

** (rstrnt-adjust-watchdog:2364): WARNING **: 16:37:11.908: Failed to adjust watchdog, status: 4 Message: Could not connect: Connection refused


It worked correctly after the system was provisioned and rebooted the first
time into the newly configured system:

[root@qualcomm-amberwing-rep-14 ~]# extendtesttime.sh
How many hours would you like to extend the reservation.
             Must be between 1 and 99                   
50
Extending reservation time 50

I then rebooted the system without making any additional changes.
On reboot, it looked like restraintd knew it was running an existing task.
From the serial console:

Red Hat Enterprise Linux 8.2 (Ootpa)
Kernel 4.18.0-193.6.el8.aarch64 on an aarch64

qualcomm-amberwing-rep-14 login: [   27.417551] restraintd[1873]: * Fetching recipe: http://lab-02.rhts.eng.bos.redhat.com:8000//recipes/8128032/
[   28.364005] restraintd[1873]: * Parsing recipe
[   28.380715] restraintd[1873]: * Running recipe
[   28.381085] restraintd[1873]: ** Continuing task: 108888082 [/mnt/tests/distribution/reservesys]
[   28.408794] restraintd[1873]: ** Preparing metadata
[   28.422963] restraintd[1873]: ** Refreshing peer role hostnames: Retries 0
[   28.577546] restraintd[1873]: ** Updating env vars
[   28.680793] restraintd[1873]: ** Running task: 108888082 [/distribution/reservesys]



How reproducible:

The problem is 100% reproducible on qualcomm-amberwing-rep-14.khw4.lab.eng.bos.redhat.com when provisioning any recent RHEL-8.3 nightly build for aarch64.

I have not tested whether this problem also occurs with other RHEL releases or other architectures. I have been able to do this in the past, so this is a regression.

Steps to Reproduce:
1. Reserve and provision qualcomm-amberwing-rep-14.khw4.lab.eng.bos.redhat.com
   with a RHEL-8.3 nightly build
2. Log in and see that extendtesttime.sh works
3. rhts-reboot
4. Log in and see that extendtesttime.sh fails with a "Connection refused" error



https://beaker.engineering.redhat.com/recipes/8128032#task108888082

Comment 1 Daniel Rodríguez 2020-04-14 11:20:54 UTC
This issue is caused by the removal of the libssh library in restraint 0.2.0.

/usr/bin/extendtesttime.sh is created by /distribution/reservesys, which runs once, on first boot.

When the script is created, RSTRNT_RECIPE_URL is hard-coded. This variable contains the hostname and port for restraintd, and it is used by rstrnt-adjust-watchdog:

 export RSTRNT_RECIPE_URL=http://localhost:39673/recipes/42

In 0.2.0, the port for restraintd is dynamic. A new port is randomly chosen for each run. So after the reboot, the value in RSTRNT_RECIPE_URL is no longer valid, because restraintd is not listening on that port.
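
To confirm the mismatch on an affected system, the port baked into extendtesttime.sh can be compared with the port restraintd reports at startup. The following is a minimal sketch, not part of restraint: url_port is a hypothetical helper name, and it assumes the URL has the http://host:port/path shape shown above.

```shell
# Hypothetical helper (not shipped with restraint): print the port of a URL
# shaped like http://host:port/path, or fail if the URL has no explicit port.
url_port() {
    rest=${1#*://}          # drop the scheme, e.g. "http://"
    hostport=${rest%%/*}    # drop everything from the first "/" onward
    case "$hostport" in
        *:*) echo "${hostport##*:}" ;;   # keep what follows the last ":"
        *)   return 1 ;;                 # no port in the URL
    esac
}
```

For example, url_port http://localhost:39673/recipes/42 prints 39673. If that number differs from the port in the current "Listening on" journal line, the URL in the script is stale.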

We can solve this by making the restraintd port persist across reboots. Meanwhile, as a workaround, the current restraintd port can be found in the journal:

 journalctl -u restraintd

 Apr 14 05:29:24 hostname restraintd[4242]: Listening on http://localhost:34507
 Apr 14 05:29:24 hostname restraintd[4242]: * Fetching recipe: http://labcontroller:8000//recipes/30363/

Then rstrnt-adjust-watchdog can be run with the RECIPE_URL variable set to the correct port and recipe, for example:

 RECIPE_URL=http://localhost:34507/recipes/30363 rstrnt-adjust-watchdog 50h
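
The lookup can also be scripted. This is a rough sketch under the assumption that the journal lines always have the "Listening on http://localhost:PORT" and "Fetching recipe: .../recipes/ID/" form shown above; recipe_url_from_journal is a hypothetical name, not something restraint provides.

```shell
# Hypothetical helper: read restraintd journal text on stdin and print the
# URL to pass as RECIPE_URL. Assumes the log format shown in this comment.
recipe_url_from_journal() {
    awk '
        # Remember the port from the most recent "Listening on" line.
        match($0, /Listening on http:\/\/localhost:[0-9]+/) {
            s = substr($0, RSTART, RLENGTH)
            sub(/.*:/, "", s)        # keep only the digits after the last ":"
            port = s
        }
        # Remember the recipe id that follows "/recipes/" in the fetch line.
        match($0, /Fetching recipe: .*\/recipes\/[0-9]+/) {
            s = substr($0, RSTART, RLENGTH)
            sub(/.*\//, "", s)       # keep only the digits after the last "/"
            recipe = s
        }
        END {
            # Fail rather than print a partial URL.
            if (port == "" || recipe == "") exit 1
            printf "http://localhost:%s/recipes/%s\n", port, recipe
        }
    '
}
```

It could then be used like this (-b limits the journal to the current boot):

 RECIPE_URL=$(journalctl -b -u restraintd | recipe_url_from_journal) rstrnt-adjust-watchdog 50h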

Comment 2 Carol Bouchard 2020-04-14 11:36:49 UTC
Daniel:

Perhaps you'd like to take advantage of

rstrnt-adjust-watchdog -c --pid <pid of restraint service> 50h

Carol

Comment 3 Lenny Szubowicz 2020-04-14 19:41:05 UTC

(In reply to Daniel Rodríguez from comment #1)

Thank you for the quick response with an effective work-around.

                       -Lenny.

Comment 4 Lenny Szubowicz 2020-04-14 19:56:40 UTC
(In reply to Carol Bouchard from comment #2)
> Daniel:
> 
> Perhaps you'd like to take advantage of
> 
> rstrnt-adjust-watchdog -c --pid <pid of restraint service> 50h
> 
> Carol

 rstrnt-adjust-watchdog from restraint-0.2.0-1.el8bkr.aarch64 has neither the -c nor the --pid option.

Is this something available in a newer version?

                       -Lenny.

Comment 5 Carol Bouchard 2020-04-15 11:00:01 UTC
Lenny:

My comment on the arguments for rstrnt-adjust-watchdog pertains to newer code that has not been released yet.  These arguments are useful when working outside a job.  More details will come with the next release.

Carol

Comment 6 Jeff Bastian 2020-04-15 20:36:48 UTC
Another workaround is to use the bkr tool from an external system, e.g., your laptop.

I was able to add 48 hours to this /distribution/reservesys task:
  https://beaker.engineering.redhat.com/recipes/8137732#task108995117

By running from my laptop:
  $ bkr watchdog-extend --by=$((60 * 60 * 48)) T:108995117
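
The --by value above is given in seconds, hence the $((60 * 60 * 48)). A small sketch of that conversion, with hours_to_seconds being a hypothetical helper name and plain decimal input assumed:

```shell
# Hypothetical helper: convert whole hours to seconds for --by.
# Rejects empty input, non-digits, and leading zeros (which shell
# arithmetic would otherwise read as octal).
hours_to_seconds() {
    case "$1" in
        ''|*[!0-9]*|0?*)
            echo "hours must be a plain non-negative integer" >&2
            return 1 ;;
    esac
    echo $(( $1 * 60 * 60 ))
}
```

With it, the command above could be written as:

  $ bkr watchdog-extend --by="$(hours_to_seconds 48)" T:108995117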

Comment 7 Daniel Rodríguez 2020-04-28 10:00:02 UTC
This issue is fixed in https://github.com/beaker-project/restraint/pull/31

In 0.2.1, the port for restraintd running as a system service will be static by default, set to 8081. The variables in extendtesttime.sh will therefore remain valid after a reboot.

