Description of problem:

extendtesttime.sh does not work after a reboot of the test system.

[root@qualcomm-amberwing-rep-14 ~]# extendtesttime.sh
How many hours would you like to extend the reservation.
Must be between 1 and 99
90
Extending reservation time 90

** (rstrnt-adjust-watchdog:2364): WARNING **: 16:37:11.908: Failed to adjust watchdog, status: 4 Message: Could not connect: Connection refused

It worked correctly after the system was provisioned and rebooted the first time into the newly configured system:

[root@qualcomm-amberwing-rep-14 ~]# extendtesttime.sh
How many hours would you like to extend the reservation.
Must be between 1 and 99
50
Extending reservation time 50

I then rebooted the system without making any additional changes. On reboot, it looked like restraintd knew it was running an existing task. From the serial console:

Red Hat Enterprise Linux 8.2 (Ootpa)
Kernel 4.18.0-193.6.el8.aarch64 on an aarch64

qualcomm-amberwing-rep-14 login: [ 27.417551] restraintd[1873]: * Fetching recipe: http://lab-02.rhts.eng.bos.redhat.com:8000//recipes/8128032/
[ 28.364005] restraintd[1873]: * Parsing recipe
[ 28.380715] restraintd[1873]: * Running recipe
[ 28.381085] restraintd[1873]: ** Continuing task: 108888082 [/mnt/tests/distribution/reservesys]
[ 28.408794] restraintd[1873]: ** Preparing metadata
[ 28.422963] restraintd[1873]: ** Refreshing peer role hostnames: Retries 0
[ 28.577546] restraintd[1873]: ** Updating env vars
[ 28.680793] restraintd[1873]: ** Running task: 108888082 [/distribution/reservesys]

How reproducible:

The problem is 100% reproducible with qualcomm-amberwing-rep-14.khw4.lab.eng.bos.redhat.com and provisioning any recent RHEL-8.3 nightly build for aarch64. I have not tested if this problem also occurs with other RHEL releases or other architectures. I know I have been able to do this in the past. So this is a regression.

Steps to Reproduce:
1. Reserve and provision qualcomm-amberwing-rep-14.khw4.lab.eng.bos.redhat.com with a RHEL-8.3 nightly build
2. Log in and see that extendtesttime.sh works
3. rhts-reboot
4. Log in and see that extendtesttime.sh fails with "connection refused" error

https://beaker.engineering.redhat.com/recipes/8128032#task108888082
This issue is caused by the removal of the libssh library in restraint 0.2.0.

/usr/bin/extendtesttime.sh is created by /distribution/reservesys, which runs once, on first boot.

When the script is created, RSTRNT_RECIPE_URL is hard coded. This variable contains hostname and port for restraintd, and it's used by rstrnt-adjust-watchdog,

export RSTRNT_RECIPE_URL=http://localhost:39673/recipes/42

In 0.2.0, the port for restraintd is dynamic. A new port is randomly chosen for each run. So after the reboot, the value in RSTRNT_RECIPE_URL is no longer valid, because restraintd is not listening on that port.

We can solve this by making restraintd port persist after reboot. Meanwhile, as a workaround, current restraintd port can be checked in journal,

journalctl -u restraintd

Apr 14 05:29:24 hostname restraintd[4242]: Listening on http://localhost:34507
Apr 14 05:29:24 hostname restraintd[4242]: * Fetching recipe: http://labcontroller:8000//recipes/30363/

Then rstrnt-adjust-watchdog can be used passing RECIPE_URL variable with correct port and recipe, like,

RECIPE_URL=http://localhost:34507/recipes/30363 rstrnt-adjust-watchdog 50h
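The journal lookup can be scripted. The following is only a sketch: the function name recipe_url_from_journal is made up for illustration, and it assumes restraintd logs exactly the two line formats quoted above ("Listening on ..." and "* Fetching recipe: ..."). Other restraint versions may log differently.

```shell
#!/bin/sh
# Hypothetical helper, not part of restraint.
# Reads restraintd journal output on stdin and prints a RECIPE_URL built from
# the last "Listening on" line and the last "Fetching recipe" line it sees.
recipe_url_from_journal() {
    awk '
        /restraintd\[[0-9]+\]: Listening on / {
            base = $NF                   # e.g. http://localhost:34507
        }
        /Fetching recipe: / {
            url = $NF                    # e.g. http://labcontroller:8000//recipes/30363/
            sub(/\/+$/, "", url)         # drop any trailing slash
            n = split(url, parts, "/")
            id = parts[n]                # last path component is the recipe id
        }
        END {
            if (base == "" || id == "") exit 1
            printf "%s/recipes/%s\n", base, id
        }
    '
}

# Possible usage on the test system (-b limits the journal to the current boot):
#   url=$(journalctl -b -u restraintd | recipe_url_from_journal) &&
#       RECIPE_URL="$url" rstrnt-adjust-watchdog 50h
```

Taking the last matching lines means that, if restraintd was restarted during the current boot, the most recent port is the one used.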
Daniel:

Perhaps you'd like to take advantage of

rstrnt-adjust-watchdog -c --pid <pid of restraint service> 50h

Carol
(In reply to Daniel Rodríguez from comment #1)
> This issue is caused by the removal of the libssh library in restraint 0.2.0.
>
> /usr/bin/extendtesttime.sh is created by /distribution/reservesys, which
> runs once, on first boot.
>
> When the script is created, RSTRNT_RECIPE_URL is hard coded. This variable
> contains hostname and port for restraintd, and it's used by
> rstrnt-adjust-watchdog,
>
> export RSTRNT_RECIPE_URL=http://localhost:39673/recipes/42
>
> In 0.2.0, the port for restraintd is dynamic. A new port is randomly chosen
> for each run. So after the reboot, the value in RSTRNT_RECIPE_URL is no
> longer valid, because restraintd is not listening on that port.
>
> We can solve this by making restraintd port persist after reboot. Meanwhile,
> as a workaround, current restraintd port can be checked in journal,
>
> journalctl -u restraintd
>
> Apr 14 05:29:24 hostname restraintd[4242]: Listening on
> http://localhost:34507
> Apr 14 05:29:24 hostname restraintd[4242]: * Fetching recipe:
> http://labcontroller:8000//recipes/30363/
>
> Then rstrnt-adjust-watchdog can be used passing RECIPE_URL variable with
> correct port and recipe, like,
>
> RECIPE_URL=http://localhost:34507/recipes/30363 rstrnt-adjust-watchdog 50h

Thank you for the quick response with an effective workaround.

-Lenny.
(In reply to Carol Bouchard from comment #2)
> Daniel:
>
> Perhaps you'd like to take advantage of
>
> rstrnt-adjust-watchdog -c --pid <pid of restraint service> 50h
>
> Carol

rstrnt-adjust-watchdog from restraint-0.2.0-1.el8bkr.aarch64 has neither the -c nor the --pid option. Is this something available in a newer version?

-Lenny.
Lenny:

My comment on arguments for rstrnt-adjust-watchdog pertains to newer code not yet released. These are arguments which are useful when working outside a job. More details coming for next release.

Carol
Another workaround is to use the bkr tool from an external system, e.g., your laptop. I was able to add 48 hours to this /distribution/reservesys task:

https://beaker.engineering.redhat.com/recipes/8137732#task108995117

By running from my laptop:

$ bkr watchdog-extend --by=$((60 * 60 * 48)) T:108995117
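The --by value in that command is in seconds, which is why 48 hours is written as 60 * 60 * 48. A small sketch of the conversion, in case it helps avoid arithmetic slips; hours_to_seconds is a made-up helper name, not part of bkr:

```shell
#!/bin/sh
# Hypothetical helper: convert whole hours to seconds for "bkr watchdog-extend --by".
# Rejects empty input, non-digits, and leading zeros (which shell arithmetic
# would otherwise try to read as octal).
hours_to_seconds() {
    case $1 in
        ''|*[!0-9]*|0?*)
            echo "hours must be a non-negative integer without leading zeros" >&2
            return 1
            ;;
    esac
    echo $(( $1 * 3600 ))
}

# Possible usage from an external system, with the task id from the comment above:
#   bkr watchdog-extend --by="$(hours_to_seconds 48)" T:108995117
```

For 48 hours this yields 172800, the same value as $((60 * 60 * 48)).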
This issue is fixed in https://github.com/beaker-project/restraint/pull/31

In 0.2.1, the port for restraintd running as a service on the system will be static by default, set to 8081. Therefore, the RSTRNT_RECIPE_URL value hard coded in extendtesttime.sh will remain valid after reboot.
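To check whether an already generated extendtesttime.sh points at the static port, the port can be pulled out of the hard-coded URL. This is a sketch only: hardcoded_restraint_port is a made-up name, and it assumes the script contains a line of the form "export RSTRNT_RECIPE_URL=http://host:port/recipes/id" as shown in comment #1.

```shell
#!/bin/sh
# Hypothetical helper: print the port from the first RSTRNT_RECIPE_URL
# assignment found in a script read on stdin. Prints nothing if no such
# assignment with an explicit port is present.
hardcoded_restraint_port() {
    sed -n 's|.*RSTRNT_RECIPE_URL=.*://[^:/]*:\([0-9][0-9]*\)/.*|\1|p' | head -n 1
}

# Possible usage on the test system:
#   port=$(hardcoded_restraint_port < /usr/bin/extendtesttime.sh)
#   [ "$port" = 8081 ] || echo "extendtesttime.sh uses port ${port:-unknown}, not 8081"
```

A script generated under 0.2.0 would report whatever random port was in use at first boot, so it would still need the workaround or to be regenerated.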