Bug 1031456 - info always failed and caused machine halt [NEEDINFO]
Status: CLOSED CANTFIX
Product: Red Hat Hardware Certification Program
Classification: Red Hat
Component: Test Suite (tests)
Version: 1.6.4
Hardware: x86_64 Linux
Priority: unspecified
Severity: unspecified
Assigned To: Greg Nichols
QA Contact: Red Hat Kernel QE team
Depends On:
Blocks:
Reported: 2013-11-17 21:57 EST by chengjianjun
Modified: 2013-12-01 22:14 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-12-01 22:14:30 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
chengjianj: needinfo? (gnichols)


Attachments: None
Description chengjianjun 2013-11-17 21:57:07 EST
Description of problem:

The info test cannot be completed, no matter which test I run.
When the screen says "Running plugins. Please wait ...", the system halts.
I have to force a shutdown of the machine.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info: When I ran the hwcert-backend plan for the first time on my machine, it reported a different machine type, NF5280, and I had to change it to the machine currently under test, SA5212H2.
Comment 1 chengjianjun 2013-11-17 22:44:43 EST
I have reinstalled RHEL 6.4 and the test suite, but the test still does not pass and the system still halts.
Comment 2 Greg Nichols 2013-11-18 11:24:16 EST
What versions of hwcert-client and sos are installed?
Comment 3 chengjianjun 2013-11-18 21:13:24 EST
(In reply to Greg Nichols from comment #2)
> What versions of hwcert-client and sos are installed?

hwcert-client version is 1.6.4-57.el6 
sos version?

I installed these packages:

dt-15.14-2.EL6.x86_64 
hwcert-client-1.6.4-57.el6.noarch 
hwcert-client-info-1.6.4-57.el6.noarch 
kernel-debuginfo-2.6.32-358.el6.x86_64 
kernel-debuginfo-common-x86_64-2.6.32-358.el6.x86_64 
lmbench-3.0a7-7a.EL6.x86_64 
stress-0.18.8-1.3.EL6.x86_64

These are the same packages I install on other machines.
Comment 4 chengjianjun 2013-11-19 01:04:51 EST
The sosreport version is 2.2
Comment 5 chengjianjun 2013-11-20 00:28:46 EST
(In reply to Greg Nichols from comment #2)

Does the test suite read machine information such as vendor, make and model from the BIOS?

I suspect that the BIOS version is not appropriate.
Comment 6 Rob Landry 2013-11-20 17:19:12 EST
Does sosreport run correctly on this box when called outside of the testsuite or is the halt reproducible there as well?
Comment 7 chengjianjun 2013-11-24 19:36:25 EST
(In reply to Rob Landry from comment #6)
> Does sosreport run correctly on this box when called outside of the
> testsuite or is the halt reproducible there as well?

The halt is reproducible as well.
Comment 8 chengjianjun 2013-11-24 19:56:04 EST
(In reply to Rob Landry from comment #6)
> Does sosreport run correctly on this box when called outside of the
> testsuite or is the halt reproducible there as well?

When I ran sosreport on its own, without the hwcert client, the halt occurred.

The screen showed:

"Running plugins. Please wait ...

completed  [19/72] ..."

and then the system halted.
Comment 9 Rob Landry 2013-11-25 13:26:40 EST
So the good and the bad news is you're not fighting with the hwcert test suite as it's reproducible with sosreport alone.  This means the halt is caused by something called inside of sosreport.  

Sosreport is a requirement of hwcert as it is used by RH support to understand the customer environment, and it is a certification blocker as it would be a bad customer experience if their call to support about one issue led to a system halt.

The next step is to figure out what caused the halt and determine a plan from there.  Unfortunately, [19/72] isn't specific enough, because it depends on which of the available plugins happened to be number 19 in that run.

Running sosreport with the -v option should provide additional context for the run and help identify which plugin ran last.  You can then use the -n option to disable a suspected plugin and check whether sosreport completes, followed by the -o option to enable only that plugin and reproduce the halt.  Together these should let you narrow down which plugin is causing the issue.
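
The three invocations described above can be sketched as follows. This is a minimal illustration, not part of the original comment: the helper name isolation_commands is hypothetical, and only the -v, -n and -o options mentioned here are used. The commands are printed rather than executed, since running them on the affected machine may halt it. The plugin name "hardware" is used as the example because it is the plugin identified in comment 10.

```python
def isolation_commands(suspect):
    """Build the sosreport command lines for isolating one suspected plugin.

    Hypothetical helper: it only constructs argument lists and runs nothing.
    """
    return [
        ["sosreport", "-v"],           # verbose run, to see which plugin ran last
        ["sosreport", "-n", suspect],  # skip the suspect: does the run complete?
        ["sosreport", "-o", suspect],  # run only the suspect: does the halt recur?
    ]


if __name__ == "__main__":
    # "hardware" is the example plugin name; substitute the one under suspicion.
    for command in isolation_commands("hardware"):
        print(" ".join(command))
```

Each argument list could be handed to a process launcher once the operator is ready for the machine to possibly halt again.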

Once we know which plugin, that plugin can be inspected to see if we can determine a specific cause and a resolution plan from there.  Most likely a BIOS and/or kernel change would be required.
Comment 10 chengjianjun 2013-11-26 06:32:47 EST
I ran 'sosreport -l' and found 72 enabled plugins. The 19th plugin is hardware.

Then I ran 'sosreport -n hardware', and it completed successfully.
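
The lookup of the 19th plugin can be sketched as below. This is only an illustration under stated assumptions: it assumes that 'sosreport -l' prints a header line containing "currently enabled" followed by one plugin per line with the plugin name in the first column, which may not match every sos version. It also assumes, as this comment does, that the progress counter follows the listing order. The function names and the sample listing are hypothetical and are not output captured from this machine.

```python
def enabled_plugins(listing):
    """Return plugin names from the 'currently enabled' section of a listing.

    Assumed layout: section headers end with ':' and each plugin line starts
    with the plugin name followed by a description.
    """
    names = []
    in_enabled = False
    for line in listing.splitlines():
        text = line.strip()
        if text.endswith(":"):
            # A header line starts a new section; track only the enabled one.
            in_enabled = "currently enabled" in text
        elif in_enabled and text:
            names.append(text.split()[0])
    return names


def nth_plugin(listing, position):
    """Return the plugin at a 1-based position, e.g. 19 for "[19/72]"."""
    return enabled_plugins(listing)[position - 1]


# Made-up sample in the assumed layout, for illustration only.
SAMPLE = """\
The following plugins are currently enabled:

 alpha    made-up description
 beta     made-up description
 gamma    made-up description

The following plugins are currently disabled:

 delta    made-up description
"""

if __name__ == "__main__":
    print(nth_plugin(SAMPLE, 2))
```

On a real system the listing text would come from the saved output of 'sosreport -l', and the position would be 19.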

I will test my machine without the hardware plugin to see whether the results are acceptable to the RH cert team.

Otherwise, I will have to change the BIOS version or the kernel, as you said.

I'll report the result later.

Thanks
Comment 11 chengjianjun 2013-11-26 07:06:38 EST
(In reply to chengjianjun from comment #10)

> I will test my machine without the hardware plugin to see if the results are
> acceptable by RH cert team.


In fact, the sosreport options used in the step that runs right after the passed video test do not seem to be changeable.

I can't skip the hardware plugin while the test is running.

Can the test pause at the start of sosreport so that the options can be changed, or can an option be added before running hwcert-backend? I ask because there is a notice saying "Usage: sosreport [options]".
Comment 12 chengjianjun 2013-12-01 22:14:30 EST
I have solved this problem.
I just deleted the hardware plugin from the directory /usr/lib/python2.6/site-packages/sos/plugins/
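
A more easily reversible variant of this workaround would be to move the plugin file aside instead of deleting it. The following is a minimal sketch, assuming the plugin is stored as hardware.py (possibly with compiled .pyc/.pyo files) in the directory above. The file name, the function name set_aside_plugin and the backup directory are assumptions for illustration, not details from this report.

```python
import os
import shutil

# Plugin directory as given in this comment.
PLUGIN_DIR = "/usr/lib/python2.6/site-packages/sos/plugins/"
# Hypothetical backup location; any directory outside PLUGIN_DIR would do.
BACKUP_DIR = "/root/sos-plugins-disabled"


def set_aside_plugin(name, plugin_dir=PLUGIN_DIR, backup_dir=BACKUP_DIR):
    """Move <name>.py and any compiled copies out of plugin_dir.

    Returns the list of file names that were moved (empty if none existed).
    """
    if not os.path.isdir(backup_dir):
        os.makedirs(backup_dir)
    moved = []
    for suffix in (".py", ".pyc", ".pyo"):
        source = os.path.join(plugin_dir, name + suffix)
        if os.path.exists(source):
            shutil.move(source, os.path.join(backup_dir, name + suffix))
            moved.append(name + suffix)
    return moved


# Example call (needs root on the test machine):
#     set_aside_plugin("hardware")
```

Moving the files back into the plugin directory would restore the original state.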
