Bug 737540 - Harness could not run the task: Task 'XXX' finished, exit code 0. rc=2
Summary: Harness could not run the task: Task 'XXX' finished, exit code 0. rc=2
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Beaker
Classification: Community
Component: beah
Version: 0.7
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Bill Peck
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-09-12 13:28 UTC by Marian Csontos
Modified: 2019-05-22 13:41 UTC (History)
CC List: 9 users

Fixed In Version: beah-0.6.33-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-09-28 15:22:41 UTC



Description Marian Csontos 2011-09-12 13:28:09 UTC
Description of problem:
Harness does not restart the task after reboot.

Version-Release number of selected component (if applicable):
beah-0.6.32-1

How reproducible:
Likely low[1][2]

Steps to Reproduce:
1. Run rhts-reboot in a loop
  
Actual results:
Task ends with Fail result "harness/run" with message:

> Harness could not run the task: Task 'XXX' finished, exit code 0. rc=2

Expected results:
Task should cycle forever (until killed by EWD)

Additional info:
[1] I ran a 24-hour test which rebooted the machine (virtual, of course!) in a cycle and did not see the issue in 400+ runs.

[2] Even a 1/1000 failure ratio would be too much: with 100 tasks in a job, roughly 10% of recipes would be broken, and that is too high a failure rate. Considering this a high priority.
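
The estimate in [2] can be checked with a short calculation. This is only a sketch: it assumes every task fails independently with the same probability, which the report does not state, and the function name is made up for illustration.

```python
def recipe_failure_probability(per_task_failure, tasks):
    """Chance that at least one of `tasks` independent tasks fails."""
    return 1.0 - (1.0 - per_task_failure) ** tasks


# Assumed inputs taken from [2]: a 1/1000 per-task failure ratio, 100 tasks.
probability = recipe_failure_probability(1 / 1000, 100)
print(f"{probability:.1%}")
```

Under those assumptions the result comes out slightly below 10%, in line with the figure quoted above.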

Comment 1 Marian Csontos 2011-09-13 11:19:13 UTC
The fix for Bug 683184 unmasked a problem introduced by the fix for Bug 711270.

Because the services use non-standard lock files, they are not stopped when changing runlevels: /etc/rc requires the subsystem's lock file to be present, and no lock file is found.

Services are then stopped in random order by S01reboot calling killall5. When the task gets killed before the server, this event is caught and the task is considered finished.
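
The lock-file dependency described above can be modelled roughly as follows. This is a simplified Python sketch of the check, not the actual /etc/rc script; the function name, the service names and the directory layout used in the demo are all hypothetical.

```python
import os
import re
import tempfile


def services_stopped_by_rc(rc_dir, lock_dir):
    """Return services whose kill link would be acted on.

    A KNNname link is honoured only if lock_dir/name exists, so a
    service that keeps its lock file under another name is skipped.
    """
    stopped = []
    for entry in sorted(os.listdir(rc_dir)):
        match = re.fullmatch(r"K\d\d(.+)", entry)
        if match is None:
            continue  # not a kill link
        name = match.group(1)
        if os.path.exists(os.path.join(lock_dir, name)):
            stopped.append(name)
    return stopped


# Demo in a throwaway directory tree (no real system paths are touched).
with tempfile.TemporaryDirectory() as root:
    rc_dir = os.path.join(root, "rc6.d")
    lock_dir = os.path.join(root, "subsys")
    os.mkdir(rc_dir)
    os.mkdir(lock_dir)
    for link in ("K10standard-svc", "K20odd-lock-svc"):
        open(os.path.join(rc_dir, link), "w").close()
    # One service uses the expected lock name, the other a non-standard one.
    open(os.path.join(lock_dir, "standard-svc"), "w").close()
    open(os.path.join(lock_dir, "odd-lock-svc.custom"), "w").close()
    stopped = services_stopped_by_rc(rc_dir, lock_dir)

print(stopped)
```

In this model the service with the non-standard lock name is never stopped in an orderly way, which leaves it to be killed later along with everything else.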

I plan to revert the fix for Bug 711270, as it is just a workaround for a seriously broken system. We should not be held responsible for that.

Comment 2 Marian Csontos 2011-09-13 11:45:04 UTC
Pushed to gerrit.

Comment 3 Marian Csontos 2011-09-14 08:26:06 UTC
Tested in a VM on RHEL{4,5,6}.
Updated package is now on beaker-stage.

Comment 11 PaulB 2011-09-21 21:05:09 UTC
All,
The following issue still exists:
  ./harness/run
  Harness could not run the task: Task u'ccd30709-7ecc-427f-a3ad-7057088be08d' finished, exit code 0. rc=2 

See here:
https://beaker.engineering.redhat.com/tasks/executed?arch_id=7&task=%2Fkernel%2Fdrivers%2F3rd-party&result_id=4&job_id=134006&whiteboard=2.6.9-102.EL

Best,
-pbunyan

Comment 12 Marian Csontos 2011-09-22 06:38:54 UTC
Looks like a RHEL4-specific issue: services are still not stopped properly on reboot. I will look into it.

If you see this on other releases, please let me know.

Comment 13 Bill Peck 2011-09-27 13:35:50 UTC
ping - any news here?  Why is this happening on rhel4?

Comment 14 Marian Csontos 2011-09-27 15:03:36 UTC
Pushed to gerrit for review.
    
> chkconfig on RHEL4 works in a slightly different manner than newer
> releases: chkconfig --level 345 service on creates only SNNservice
> links and no KNNservice links.

'chkconfig service reset' before 'chkconfig ... service on' seems to do the job.

Comment 15 Dan Callaghan 2011-09-28 00:41:46 UTC
Beah already has chkconfig --add in its %post scriptlet. Does that not create the necessary K* symlinks? The (current) man page suggests that it should, although maybe on RHEL4 it was different...

> --add name

>    This  option adds a new service for management by chkconfig.  When a new service
>    is added, chkconfig ensures that the service has either a start or a kill  entry
>    in  every  runlevel. If any runlevel is missing such an entry, chkconfig creates
>    the appropriate entry as specified by the default values  in  the  init  script.
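
One way to see whether the start and kill links discussed in comments 14 and 15 are in place is to scan the runlevel directories for a service. The following is a sketch only: the function name is hypothetical, the /etc/rc.d/rcN.d layout is an assumption, and the demo builds a throwaway directory tree rather than inspecting a real system.

```python
import os
import re
import tempfile


def runlevels_without_entry(service, rc_root="/etc/rc.d"):
    """List runlevels 0-6 that have neither an SNN nor a KNN link."""
    pattern = re.compile(r"[SK]\d\d" + re.escape(service))
    missing = []
    for level in range(7):
        rc_dir = os.path.join(rc_root, f"rc{level}.d")
        entries = os.listdir(rc_dir) if os.path.isdir(rc_dir) else []
        if not any(pattern.fullmatch(entry) for entry in entries):
            missing.append(level)
    return missing


# Demo: mimic the state from comment 14, with S links in runlevels
# 3, 4 and 5 and no K links anywhere ("example-svc" is a made-up name).
with tempfile.TemporaryDirectory() as root:
    for level in range(7):
        os.mkdir(os.path.join(root, f"rc{level}.d"))
    for level in (3, 4, 5):
        open(os.path.join(root, f"rc{level}.d", "S99example-svc"), "w").close()
    missing = runlevels_without_entry("example-svc", rc_root=root)

print(missing)
```

Any runlevel reported here lacks the entry that, per the man page excerpt above, chkconfig --add is supposed to guarantee.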

Comment 16 Marian Csontos 2011-09-29 08:05:33 UTC
The cause was an error in the spec file, which used $ instead of %.

Comment 17 Marian Csontos 2011-09-29 08:11:40 UTC
Thanks, Dan, for pointing me to the error! After you gave me the hint it was easy.

I apologise to the chkconfig component, which is completely innocent.

Comment 18 Bill Peck 2011-09-30 15:54:21 UTC
Hello, 
Your ticket is ready for testing and is currently running on
https://beaker-stage.app.eng.bos.redhat.com

Please ensure your request for beaker has been adequately addressed by testing
it on the above machine. 

Testing will be available up until COB on the 5th October.

Thank you
Beaker development team

Comment 20 Marian Csontos 2011-11-11 21:23:11 UTC
Only el4 suffers from the issue.
As a workaround, please insert this task into your jobs:

  /distribution/beaker/beah/misc/chk-services

I will test and submit a proper patch next week.

Comment 21 PaulB 2011-11-13 22:38:02 UTC
Marian,
Thank you for the workaround task.
I added task /distribution/beaker/beah/misc/chk-services to the jobs, and we have results for RHEL4u9 KT1 Testing :)
https://beaker.engineering.redhat.com/jobs/157186

The issue was still hit on the following recipes, in case you're interested:
https://beaker.engineering.redhat.com/recipes/328255
https://beaker.engineering.redhat.com/recipes/328279
https://beaker.engineering.redhat.com/recipes/328280
https://beaker.engineering.redhat.com/recipes/328263
https://beaker.engineering.redhat.com/recipes/328267
(I have cloned the recipes, in hopes of getting past the issue)

Thank you for the workaround, Marian :)

Best,
-pbunyan

Comment 22 Marian Csontos 2011-11-18 12:05:28 UTC
I need more time: the patch does the same thing as the workaround, just at install time, and that apparently does not fix the issue.

The patch was submitted to gerrit for review anyway and will be pushed with the next or next+1 release.

Comment 23 PaulB 2011-12-16 13:36:13 UTC
Marian,
In regard to Comment#22...
Is there a fix expected for this issue with RHEL4 testing?

Best,
-pbunyan

Comment 25 Marian Csontos 2012-04-03 12:10:29 UTC
The patch mentioned in comment 22 is included in beah-0.6.36.
beah-0.6.38 is deployed.

