Bug 658360

Summary:	Batch run of test cases fails in Beaker, although the same test cases pass in RHTS and pass when run separately in Beaker
Product:	[Retired] Beaker	Reporter:	yanfu,wang <yanwang>
Component:	beah	Assignee:	Marian Csontos <mcsontos>
Status:	CLOSED CURRENTRELEASE	QA Contact:	yanfu,wang <yanwang>
Severity:	medium	Docs Contact:
Priority:	high
Version:	0.5	CC:	bpeck, dcallagh, dkovalsk, mcsontos, rmancy
Target Milestone:	---	Keywords:	Regression
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2011-01-28 09:30:22 UTC	Type:	---
Bug Depends On:	666980
Bug Blocks:	593365

Description yanfu,wang 2010-11-30 06:11:38 UTC

Description of problem:
A batch of test cases that passes in RHTS fails when run in Beaker. The failed test cases pass when run separately in Beaker.

Version-Release number of selected component (if applicable):
# uname -a
Linux dhcp-65-195.nay.redhat.com 2.6.31.9-174.fc12.i686.PAE #1 SMP Mon Dec 21 06:04:56 UTC 2009 i686 i686 i386 GNU/Linux
# rpm -qa|grep beaker
beaker-client-0.5.61-2.el6.noarch
beaker-redhat-0.1.18-1.el6.noarch
beakerlib-redhat-1-2.el6.noarch
beaker-0.5.61-2.el6.noarch
beakerlib-1.3-3.el6.noarch
python-beaker-1.3.1-6.fc12.noarch


How reproducible:
always

Steps to Reproduce:
1. Run a batch of test cases in RHTS; they pass on all arches:
http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=180959
http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=180960
http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=180961
http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=180962
http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=180963

2. Run the same batch of test cases in Beaker; the run fails:
https://beaker.engineering.redhat.com/jobs/34704
https://beaker.engineering.redhat.com/jobs/35196
https://beaker.engineering.redhat.com/jobs/35200

3. Select the failed test cases and run them one by one in Beaker; they pass:
https://beaker.engineering.redhat.com/jobs/35209
https://beaker.engineering.redhat.com/jobs/35210
https://beaker.engineering.redhat.com/jobs/35211

  
Actual results:
The tests fail.

Expected results:
The test results should be the same as in RHTS: pass.

Additional info:
The tests seem to fail often when run as a batch.

Comment 1 Marian Csontos 2010-11-30 11:03:17 UTC

Have you considered this could be a problem with the test?

If it runs fine on its own, it is likely interference from previous test(s), some of which did not clean up properly. This is supported by the fact that the nfs service refuses to start. Have you checked the logs?

Did the tests in RHTS run in the same order and with the same parameters?

Comment 2 yanfu,wang 2010-12-01 16:56:03 UTC

(In reply to comment #1)
> Have you considered this could be a problem with the test?
> 
> If it runs fine on its own, it is likely interference from previous test(s),
> some of which did not clean up properly. This is supported by the fact that
> the nfs service refuses to start. Have you checked the logs?
> 
> Did the test in RHTS run in the same order and with same parameters?

Hi Marian,
In my opinion, if these test cases didn't clean up properly, they should also fail in RHTS, but the truth is that they all passed in RHTS in a batch run; please check the RHTS job links.
Could it be that a different or extra mechanism is needed to create test cases that run in Beaker, as opposed to RHTS?

Correct me if I'm wrong, thanks.

Comment 3 David Kovalsky 2010-12-10 07:44:31 UTC

Hi guys, 

since we're actively pushing to kill off legacy RHTS, I'd like to get status on this bug. Either work it as a blocker or let's move it into qe-hotbeakerbugs tracker.

Is this really a regression? Should we block RHTS decommission for this? Is QE going to be able to continue testing when RHTS is dead?

Comment 4 yanfu,wang 2010-12-10 09:31:27 UTC

(In reply to comment #3)
> Hi guys, 
> 
> since we're actively pushing to kill off legacy RHTS, I'd like to get status on
> this bug. Either work it as a blocker or let's move it into qe-hotbeakerbugs
> tracker.
> 
> Is this really a regression? Should we block RHTS decommission for this? Is QE
> going to be able to continue testing when RHTS is dead?

I always run regression tests against the nfs-utils package with this batch of test cases, and they passed in RHTS before. So it will be a big problem for me if I can't run these regression test cases in Beaker to handle nfs-utils errata once RHTS is dead.
These test cases all have a cleanup phase that works well in RHTS. Do I need to spend time re-checking them? In fact, I don't know what in the test cases causes the problem, and most of them were created and are owned by different engineers.

So please investigate whether it's a Beaker bug. If not, please give me detailed feedback on why they passed in RHTS before and fail in Beaker now, and on how to modify the test cases accordingly.

Comment 5 David Kovalsky 2010-12-10 09:38:18 UTC

Understood. I'm taking this to beaker-dev-list.

Comment 6 Marian Csontos 2010-12-10 22:18:08 UTC

The problem is caused by the nfs daemon, which cannot be killed except with kill -9. The service is not restarted and the tests fail. And once started, it cannot be killed again.

I have asked steved for help, but even he could not get anything meaningful from the machine. He indicated the problem may lie with the kernel:

  <steved> mcsontos: Its not clear... It appeared the kernel process got into some funky state where they were no longer accepting signals....
  <steved> mcsontos: the fact that things are now working normally makes think is something in the kernel...

I will try to isolate the problem.
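
For reference, the "only kill -9 works" behaviour described above can be reproduced in miniature. The following is a minimal sketch in Python, not code from the harness or from the nfs init script: the helper names `spawn_stubborn_child` and `stop_process` are hypothetical, and the child process simply ignores SIGTERM to stand in for a daemon that does not react to ordinary signals.

```python
import signal
import subprocess
import sys

# Stand-in for a daemon that ignores SIGTERM (hypothetical, for illustration only).
STUBBORN_CHILD = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)


def spawn_stubborn_child():
    """Start the stand-in process and wait until it has installed its handler."""
    proc = subprocess.Popen(
        [sys.executable, "-c", STUBBORN_CHILD], stdout=subprocess.PIPE
    )
    proc.stdout.readline()  # blocks until the child prints 'ready'
    return proc


def stop_process(proc, grace_seconds=2.0):
    """Send SIGTERM, wait, then escalate to SIGKILL if the process survives.

    Returns the name of the signal that ended the process,
    or "exited" if it terminated normally.
    """
    proc.terminate()  # SIGTERM
    try:
        proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL, the equivalent of kill -9
        proc.wait()
    rc = proc.returncode
    return signal.Signals(-rc).name if rc < 0 else "exited"


if __name__ == "__main__":
    child = spawn_stubborn_child()
    print(stop_process(child, grace_seconds=0.5))
    child.stdout.close()
```

A cleanup phase that only sends the default signal would leave such a process behind, which is one way a later test in the same batch could find the service in an unexpected state.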

Comment 7 Bill Peck 2010-12-10 22:36:35 UTC

Hi Marian,

Thanks for looking into this.  From your description it sounds like we should have the same issue under legacy RHTS right?

Comment 8 Marian Csontos 2010-12-10 23:12:28 UTC

Not necessarily. It must be triggered by either the harness or the configuration, and something in the configuration may come, e.g., from the system installation.

Comment 9 David Kovalsky 2010-12-20 16:21:25 UTC

So what are the next steps here?

Comment 10 Marian Csontos 2011-01-13 13:59:24 UTC

In the end it boiled down to problems with the TTY: 'service nfs <OP>' does not work very well without one.

Testing with a modified harness (Bug 666980).
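
To illustrate the TTY dependency mentioned above, here is a minimal sketch in Python showing how the same command sees a different environment depending on whether it is started with or without a terminal. It is not the change made in Bug 666980; the function names are hypothetical, and the probe command only reports whether its stdin is a terminal instead of running 'service nfs <OP>'.

```python
import os
import pty
import subprocess
import sys

# Probe that reports whether its stdin is attached to a terminal.
PROBE = "import sys; print(sys.stdin.isatty())"


def stdin_is_tty_without_terminal():
    """Run the probe with no terminal attached, as a daemonized harness might."""
    result = subprocess.run(
        [sys.executable, "-c", PROBE],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode().strip() == "True"


def stdin_is_tty_with_pty():
    """Run the same probe with a pseudo-terminal as its stdin."""
    master_fd, slave_fd = pty.openpty()
    try:
        result = subprocess.run(
            [sys.executable, "-c", PROBE],
            stdin=slave_fd,
            stdout=subprocess.PIPE,
            check=True,
        )
    finally:
        os.close(slave_fd)
        os.close(master_fd)
    return result.stdout.decode().strip() == "True"


if __name__ == "__main__":
    print("without terminal:", stdin_is_tty_without_terminal())
    print("with pty:", stdin_is_tty_with_pty())
```

A test that passes when launched from an interactive shell but fails under a harness is a candidate for this kind of difference, so comparing the two environments is a cheap first check.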

Comment 11 Marian Csontos 2011-01-17 18:59:46 UTC

Running the test in compatible mode helps:

https://beaker.engineering.redhat.com/recipes/94996

This will be delivered within the next update window.