Description of problem:
If /distribution/virt/install fails for some reason, the guest system installation waits for 24 hours and then fails. Could the guest systems, or the whole job, be made to fail immediately instead? This causes problems when some other automation is waiting for the job to complete.

Version-Release number of selected component (if applicable):

How reproducible:
Sometimes (whenever virt install fails).

Steps to Reproduce:
1.
2.
3.

Actual results:
Time Remaining: -1 day

Expected results:
Aborted immediately, if possible.

Additional info:
Instead of implementing an AI solution to kill only the right jobs (as that would sometimes be wrong), this should be made part of the task's logic. Adding the following line to the task should do the job: rhts-abort -t RECIPESET. Anyway, I am not sure it is the right thing to do. What about multihost jobs, e.g. virtual-machine migration?
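A minimal sketch of what that could look like in a task's runtest.sh, assuming rhts-abort accepts "-t recipeset" (bug 618960 below suggests it may not); do_virt_install is a placeholder for the task's real work:

    #!/bin/bash
    # sketch only: do_virt_install stands in for the task's actual install logic
    if ! do_virt_install; then
        report_result $TEST FAIL 1
        # abort the whole recipe set so the guests do not sit on their watchdogs
        rhts-abort -t recipeset
    else
        report_result $TEST PASS 0
    fi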
Hello Marian, is some magic like this possible? We have a job:
0. Beaker machine reservation
1. my first test/task
2. my second test/task
If my first test fails, could Beaker automagically close the whole job, reserve and install another machine, and then try 'my first test' again, followed by 'my second test'? Is there some way to do that?
Bug 618960: rhts-abort -t recipeset not working
Re: Comment 2: No. However, I would like to add some job-control... See Bug 619018
*** Bug 772907 has been marked as a duplicate of this bug. ***
One possible solution would be to add a parameter to the virt-(install|start) tasks so that, on failure, the task aborts the recipe set if the parameter is (or is not) set. Gurhan, does this make sense?
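A rough sketch of how such a parameter might be honoured inside the task; ABORT_RECIPESET_ON_FAILURE is a hypothetical parameter name used for illustration, not an existing option:

    # hypothetical parameter, e.g. passed as <param name="ABORT_RECIPESET_ON_FAILURE" value="1"/>
    if [ "$RESULT" = "FAIL" ] && [ "${ABORT_RECIPESET_ON_FAILURE:-0}" = "1" ]; then
        # only abort the whole recipe set when the job submitter asked for it
        rhts-abort -t recipeset
    fi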
From the duplicate bug:
> That's a feature, not a bug.
I don't agree. Having to wait another day for a system is a big bug IMHO. Bumping priority/severity because this issue impacts testing on ia64, where we don't have that many systems.
(In reply to comment #6)
> One possible solution would be to add a parameter to the virt-(install|start)
> tasks so that, on failure, the task aborts the recipe set if the parameter is
> (or is not) set. Gurhan, does this make sense?

It kind of makes sense, but it's a workaround. Also, I don't know how to do this; you said in previous comments that "rhts-abort -t RECIPESET" didn't work. Just to make sure I understand this correctly: you are asking to have the virt/install and virt/start programs abort the recipe set if they fail, right?

BTW, that still won't work properly if there are multiple guests in the recipe set; it could be that one guest doesn't install/start while the others still do. If there is absolutely no other alternative, I can put a workaround in, but it won't be the best solution.

Beaker does understand which recipe is the machine and which recipes are guests, right? Is it possible to make it smart enough to finish the job if the host/dom0 is at 100% and all the guests are at 0%? Note that even with this solution, you'll have to make sure that ALL guests are at 0%.
(In reply to comment #8)
> Beaker does understand which recipe is the machine and which recipes are
> guests, right? Is it possible to make it smart enough to finish the job if the
> host/dom0 is at 100% and all the guests are at 0%? Note that even with this
> solution, you'll have to make sure that ALL guests are at 0%.

This will likely not be true. In the job where I saw the issue, the host was at 100%, one of the guests was at 94%, and the other at 0%. Despite the fact that the host FAILED, the 94% guest managed to complete successfully after a while.
(In reply to comment #9)
> (In reply to comment #8)
> > Beaker does understand which recipe is the machine and which recipes are
> > guests, right? Is it possible to make it smart enough to finish the job if
> > the host/dom0 is at 100% and all the guests are at 0%? Note that even with
> > this solution, you'll have to make sure that ALL guests are at 0%.
>
> This will likely not be true. In the job where I saw the issue, the host was
> at 100%, one of the guests was at 94%, and the other at 0%. Despite the fact
> that the host FAILED, the 94% guest managed to complete successfully after a
> while.

See, that's a valid case. What happens is this:
-- dom0/host installs the guests.
-- If you just use /distribution/virt/start, it starts the guests and the test is done.
-- However, after the guests are started, the tests inside the guests start running.

So while all the tests on dom0/host might have finished (because as far as they are concerned, installing and starting up the guests is all they do), the guests might have a bunch of tests that take a while to complete.

This is not an easy thing to solve. I think the best way to solve it would be to somehow trigger the /distribution/install test inside the guest from dom0 after the guest is installed, so that if the guest installation goes awry, or the guest is installed but just doesn't boot for whatever reason, the guest's /distribution/install test times out and gets aborted.

Marian, is it possible to somehow tell Beaker that the /distribution/install test inside the guest has started, without it actually being executed inside the guest? What I want to do is: when the guest is started, tell Beaker that the /distribution/install test inside the guest has started. Then, if the guest doesn't boot for whatever reason, the install test inside the guest times out and the whole guest recipe gets aborted. Is this possible?
It does tell Beaker when the guest's task gets started. The problem is that the watchdog for the guests is set to a high value: when a job is scheduled, all first tasks in its recipes are considered to be in the "Running" state, with watchdogs assigned. What could help is to reset the guests' watchdogs to a reasonable value after the virt-install task, or better, in virt-start right before the VM is started.
(In reply to comment #11)
> What could help is to reset the guests' watchdogs to a reasonable value after
> the virt-install task, or better, in virt-start right before the VM is
> started.

I'm assuming we don't want tests talking to the lab controller directly, so some new rhts- command would be needed in the harness, for example:

rhts-guest-started <guest hostname>

or

rhts-recipe-tasks <hostname>

to list the tasks with their IDs, then take the ID from there and call rhts-extend.
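Something along these lines, assuming the proposed commands existed. rhts-guest-started and rhts-recipe-tasks are hypothetical, their output format is invented for the sketch, and the rhts-extend invocation is likewise only illustrative:

    # sketch only: none of this is implemented yet
    GUEST=guest1.example.com                    # illustrative guest hostname
    rhts-guest-started "$GUEST"                 # proposed: mark the guest recipe as started
    TASKID=$(rhts-recipe-tasks "$GUEST" | awk 'NR==1 {print $1}')   # proposed: ID of the guest's first task
    rhts-extend "$TASKID" 3600                  # illustrative: give that task a one-hour watchdog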
But still, if there are two guest recipes and one fails while the second gets to a long-running task (like reservesys), it will block for another 24 hours until all EWDs (external watchdogs) expire.

The simplest solution to this problem is an extension to virt-start which would wait until all[1] guests are up, and abort the recipe set if they are not (see the sketch below). This could be desirable for other multihost tests as well, perhaps as a separate task included right after /distribution/install.

[1]: All is a good default, but we could use a quantity smaller than all. At the moment, though, there is no use in running with just a part of the VMs provisioned in the case of a multihost test, as we first need a way to reconfigure roles in the harness according to which machines are available. That would be a useful extension, especially if Beaker allowed returning single machines.
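A rough sketch of the wait-and-abort loop in virt-start, assuming the guest hostnames are available in $GUESTS and using an arbitrary one-hour deadline; ping is only an illustrative liveness check (as noted later in this thread, the Beaker guest hostname may not be reachable directly, so a real implementation would need a harness-level check):

    # sketch only: wait for every guest to come up, abort the recipe set otherwise
    deadline=$(( $(date +%s) + 3600 ))
    for guest in $GUESTS; do
        until ping -c 1 -W 2 "$guest" >/dev/null 2>&1; do
            if [ "$(date +%s)" -ge "$deadline" ]; then
                echo "Guest $guest did not come up in time, aborting recipe set" >&2
                rhts-abort -t recipeset
                exit 1
            fi
            sleep 30
        done
    done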
(In reply to comment #13)
> But still, if there are two guest recipes and one fails while the second gets
> to a long-running task (like reservesys), it will block for another 24 hours
> until all EWDs (external watchdogs) expire.
>
> The simplest solution to this problem is an extension to virt-start which
> would wait until all[1] guests are up, and abort the recipe set if they are
> not.

This looks better. It would also be nice to have a parameter controlling whether the whole recipe set should be aborted, or just the guest recipes which failed to check in.

> This could be desirable for other multihost tests as well, perhaps as a
> separate task included right after /distribution/install.
>
> [1]: All is a good default, but we could use a quantity smaller than all. At
> the moment, though, there is no use in running with just a part of the VMs
> provisioned in the case of a multihost test, as we first need a way to
> reconfigure roles in the harness according to which machines are available.
> That would be a useful extension, especially if Beaker allowed returning
> single machines.

I think we don't have multihost tests in guests right now, because the guest hostname Beaker gives you is different from the one you get at runtime from DNS/DHCP. So I think this also means we can't use rhts-sync-* between host and guest, because you can't reach the guest using the Beaker guest hostname.
I think this can be solved pretty easily once bug 655009 is in place. /distribution/virt/install can just tell Beaker to start the recipe for each guest right before it starts the installation of the guest, like Marian suggested in comment 11.
Bulk reassignment of issues as Bill has moved to another team.
As noted in the comments on bug 655009, this issue should have been resolved in 0.10. Feel free to reopen if the problem still occurs.