Bug 296931

Summary: IBM QS21 system with no swap fails memory test
Product: [Retired] Red Hat Hardware Certification Program
Reporter: Monza Lui <mlui>
Component: Test Suite (tests)
Assignee: Greg Nichols <gnichols>
Status: CLOSED CURRENTRELEASE
QA Contact: Joseph Kachuck <jkachuck>
Severity: medium
Docs Contact:
Priority: medium
Version: 5
CC: bxu, dmfaria, hannsj_uhl, rlandry
Target Milestone: ---
Target Release: ---
Hardware: ppc64
OS: Linux
URL: http://www-03.ibm.com/systems/bladecenter/qs21/
Whiteboard: IBM
Fixed In Version: 5.1-11
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2007-11-06 21:55:28 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description                                              Flags
Result with "tempered hts                                none
Result with "tempered hts                                none
hts results from three runs with a nfs aware test suite  none
tmp.a25339 - memory test make log                        none
Results of: ./threaded_memtest -qpv -m90% -t10           none
HTS results                                              none

Description Monza Lui 2007-09-19 20:15:53 UTC
Red Hat Hardware Certification Submitted

Product:	Red Hat Enterprise Linux
Version:	5
Make:		IBM
Model:		QS21
Vendor: 	IBM
Category:	Server
Reporter:	mlui.com

Comment 1 Monza Lui 2007-09-19 21:11:36 UTC
Created attachment 200101 [details]
Result with "tempered hts

We 

diff /usr/share/hts/tests/network/network.py

<     self.downAllInterfaces()
<     self.setSignalHandler(self.restoreAllInterfaces)
<
<     if not self.bringUpInterface():
---
>     #self.downAllInterfaces()
>     #self.setSignalHandler(self.restoreAllInterfaces)
>
>     if not 1: #self.bringUpInterface():

Comment 2 Monza Lui 2007-09-19 21:25:03 UTC
Created attachment 200121 [details]
Result with "tempered hts

Due to the limitations of NFS root, we ran the hts test suite:
1) with the part that resets the network commented out (see the following for
how we edited the code),
2) with SELinux disabled,
3) and the memory test failed because no swap partition is available.

diff /usr/share/hts/tests/network/network.py

<     self.downAllInterfaces()
<     self.setSignalHandler(self.restoreAllInterfaces)
<
<     if not self.bringUpInterface():
---
>     #self.downAllInterfaces()
>     #self.setSignalHandler(self.restoreAllInterfaces)
>
>     if not 1: #self.bringUpInterface():
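Instead of commenting the calls out unconditionally, the network reset could be skipped only when the root filesystem is on NFS. The sketch below is a hypothetical helper, not part of hts: it assumes the standard /proc/mounts layout (device, mount point, filesystem type, options, dump, pass) and reports whether "/" is NFS-mounted. The last "/" entry wins because later mounts shadow earlier ones, such as the initial "rootfs" entry an NFS-root system typically shows.

```python
def root_fstype(mounts_text):
    """Return the filesystem type mounted at "/" in /proc/mounts-style text."""
    fstype = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 3 and fields[1] == "/":
            fstype = fields[2]  # a later "/" entry shadows an earlier one
    return fstype


def root_is_nfs(mounts_text):
    """True if the effective root filesystem is nfs, nfs4, etc."""
    fstype = root_fstype(mounts_text)
    return fstype is not None and fstype.startswith("nfs")


def read_proc_mounts(path="/proc/mounts"):
    """Read the kernel's mount table (Linux only)."""
    with open(path) as f:
        return f.read()
```

In network.py, the original `self.downAllInterfaces()` call could then be guarded with something like `if not root_is_nfs(read_proc_mounts()):`; where exactly that guard belongs in the hts code is an assumption.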

Comment 3 Xu Bo 2007-09-20 01:47:55 UTC
Monza,
RHEL 5.1 is going to be launched in Q4 2007; the current version is Alpha or Beta.
We don't accept test results on Alpha or Beta versions. Would you please run the
test and upload the RPMs after we launch the GA version?

Thanks,
/

Comment 4 Monza Lui 2007-10-02 22:09:48 UTC
Hi Xu, yes, we are planning to upload a successful run after GA. However,
what we need right now is a version of hts adapted to the diskless environment.

Ron, we did not get the hts for the diskless environment last week. Will we have
it this week? Thank you.

Comment 5 Monza Lui 2007-10-02 22:16:09 UTC
Sorry, I meant Rob.

Comment 6 YangKun 2007-10-24 07:13:06 UTC
Hi Monza,

I think you can contact your TAM about getting the test suite and related software.

Regards
-YK

Comment 9 Monza Lui 2007-10-25 17:59:11 UTC
We got the new hts and ran the test again. This time we were able to control the
test so that it does not restart the network, which could cause problems with
our diskless environment on Cell.

However, we still have one old and one new problem with the
test suite:
1) Swap space requirement (old problem)
   The test failed all memory tests because our QS21 is diskless and therefore
does not have swap space.
2) Unmount USB drives (new problem)
   In the QS21 environment, three USB devices always show up because they are
present in the chassis. However, they cannot be unmounted, and now hts
always fails, complaining that it cannot unmount them.
[root@cell21c ~]# lsusb
Bus 003 Device 001: ID 0000:0000
Bus 002 Device 001: ID 0000:0000
Bus 001 Device 001: ID 0000:0000

Your prompt response is highly appreciated as our GA dates are drawing really 
close.  Thank you.
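The three entries in the lsusb listing above are consistent with the root hubs of the blade's three USB host controllers rather than removable drives: Linux assigns device number 001 to each bus's root hub, and the hts log later in this bug reports each one as a "USB Hub Interface" on port 0. The sketch below uses hypothetical helpers (not part of hts or lsusb) that assume lsusb's default one-line-per-device format; they parse the listing and drop device 001 on each bus, which leaves nothing to plug or unplug.

```python
import re

# One line of lsusb's default output, e.g. "Bus 003 Device 001: ID 0000:0000".
LSUSB_LINE = re.compile(
    r"^Bus (?P<bus>\d+) Device (?P<dev>\d+): "
    r"ID (?P<vid>[0-9a-fA-F]{4}):(?P<pid>[0-9a-fA-F]{4})"
)


def parse_lsusb(text):
    """Return one dict per device line in lsusb's default output."""
    devices = []
    for line in text.splitlines():
        match = LSUSB_LINE.match(line.strip())
        if match:
            devices.append({
                "bus": int(match.group("bus")),
                "device": int(match.group("dev")),
                "id": f"{match.group('vid')}:{match.group('pid')}".lower(),
            })
    return devices


def attached_devices(devices):
    """Drop root hubs: Linux gives each bus's root hub device number 1."""
    return [d for d in devices if d["device"] != 1]
```

For the QS21 listing above, `attached_devices(parse_lsusb(...))` returns an empty list, which agrees with answering 0 to the "unused USB sockets" prompt.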

Comment 10 Daniel Machado de Faria 2007-10-25 18:49:13 UTC
Created attachment 237751 [details]
hts results from three runs with a nfs aware test suite

Comment 11 Greg Nichols 2007-10-25 19:48:06 UTC
From the results attached above, the USB test in all three runs has the following log:

USB test:
USB Hub Interface appears to be plugged into bus 1 port 0
USB Hub Interface appears to be plugged into bus 2 port 0
USB Hub Interface appears to be plugged into bus 3 port 0
How many unused USB sockets are there? response: 0
No USB sockets to test
...finished running ./usb.py, exit code=0

These tests PASS.   What indication do you have that the test fails?

Comment 12 Greg Nichols 2007-10-25 19:58:03 UTC
Also, please run the threaded memory test directly, à la:

$ cd /usr/share/hts/tests/memory
$ make
$ ./threaded_memtest -qpv -m100% -t10

And attach the output.

Thanks!

Comment 13 Daniel Machado de Faria 2007-10-26 13:01:04 UTC
USB test: I answered '0' to "How many unused USB sockets are there?" because if
I answer 3, it asks me to unplug the device, but, as Monza said above, these USB
devices cannot be unmounted, so the test repeats indefinitely
until I answer 'no':
----
USB test:
USB Hub Interface appears to be plugged into bus 1 port 0
USB Hub Interface appears to be plugged into bus 2 port 0
USB Hub Interface appears to be plugged into bus 3 port 0
How many unused USB sockets are there? 3
response: 3
testing socket 1 of 3...
Please plug in a USB device - continue? (yes|no) yes
response: yes
found device at /sys/devices/pci0005:00/0005:00:01.0/usb1/1-0:1.0
Please unplug the device and hit the Enter key: 
Did not confirm the device - repeating test.
testing socket 1 of 3...
Please plug in a USB device - continue? (yes|no) no
response: no
...finished running ./usb.py, exit code=1
recovered exit code=1
hts-report-result /HTS/hts/usb FAIL /var/log/hts/runs/1/usb/output.log 
----


###########################################################################
Memory test - We do not have any enabled swap space as we are working over NFS.
1 - I am posting the make result as tmp.a25339 
2 - The test result is:
----
[root@cell21c memory]# ./threaded_memtest -qpv -m100% -t10
Warning: memsize > free_mem. You will probably hit swap.
Detected 4 processors.
RAM: 89.1% free (1784M/2003M)
Testing 2003M RAM for 10 seconds using 8 threads:
thread 0: mapping 250M RAM
thread 1: mapping 250M RAM
thread 2: mapping 250M RAM
thread 3: mapping 250M RAM
thread 4: mapping 250M RAM
thread 5: mapping 250M RAM
thread 6: mapping 250M RAM
thread 7: mapping 250M RAM
thread 1: mapping complete
thread 0: mapping complete
thread 5: mapping complete
thread 4: mapping complete
thread 2: mapping complete
Killed
----

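The failure above matches simple arithmetic. With -m100% the test requested all 2003M of RAM, while only 1784M was free and there was no swap to absorb the 219M difference. The bare "Killed" line is what the shell prints when a process dies from SIGKILL, which is how the Linux OOM killer terminates processes, so an out-of-memory kill is the likely cause. The sketch below reproduces that arithmetic. It assumes the percentage applies to total RAM, as the -m100% output ("Testing 2003M RAM") suggests, and it takes the thread count (8) from the output rather than deriving it. memtest_budget and describe_exit are hypothetical helpers, not part of threaded_memtest, and the signal handling assumes a POSIX system.

```python
import signal


def memtest_budget(total_mb, free_mb, swap_mb, percent, threads):
    """Return (requested_mb, per_thread_mb, exceeds_free_plus_swap).

    Assumes the -m percentage is taken of total RAM (inferred from the
    -m100% output above, which tested 2003M of 2003M).
    """
    requested_mb = total_mb * percent // 100
    per_thread_mb = requested_mb // threads
    exceeds = requested_mb > free_mb + swap_mb
    return requested_mb, per_thread_mb, exceeds


def describe_exit(returncode):
    """Interpret a subprocess return code; negative means death by signal."""
    if returncode < 0:
        signum = -returncode
        if signum == signal.SIGKILL:
            return "killed by SIGKILL (consistent with the Linux OOM killer)"
        return f"killed by signal {signum}"
    return f"exited with status {returncode}"
```

With the figures from the run above, `memtest_budget(2003, 1784, 0, 100, 8)` gives `(2003, 250, True)`, which matches the 250M per-thread mappings in the log. Passing `subprocess.run(["./threaded_memtest", "-qpv", "-m100%", "-t10"]).returncode` to describe_exit would classify the same kill without relying on the shell's message.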

Comment 14 Daniel Machado de Faria 2007-10-26 13:02:14 UTC
Created attachment 239061 [details]
tmp.a25339 - memory test make log

Comment 15 Greg Nichols 2007-10-26 13:32:58 UTC
usb test: if there are no unused sockets (meaning physical sockets into which
physical USB devices can be plugged), then 0 is the correct answer, and
the usb test passes.

So the memory issue is all that remains open.

Comment 16 Greg Nichols 2007-10-29 12:33:28 UTC
Please supply /var/hts/results.xml

Comment 17 Greg Nichols 2007-10-29 13:37:37 UTC
Also, please try running threaded_memtest at 90% of free memory, via:

./threaded_memtest -qpv -m90% -t10

Thanks!

Comment 18 Daniel Machado de Faria 2007-10-29 16:43:37 UTC
Created attachment 242061 [details]
Results of: ./threaded_memtest -qpv -m90% -t10

There is no /var/hts/results.xml. The test is attached.

[root@cell21c memory]# ls /var/hts
config.xml  hts-IBM-Cell_QS21-Tikanga_ppc64_results-1.noarch.rpm  plan.xml

Comment 19 Greg Nichols 2007-10-30 00:30:16 UTC
Please try the revised test suite:

http://people.redhat.com/rlandry/hts/hts-5.1-8.el5.noarch.rpm

Note that you can run the memory test via: 

hts certify --test memory

Comment 20 Daniel Machado de Faria 2007-10-30 19:24:53 UTC
Hi, the memory test now passes when running "hts certify --test memory".
However, the test suite brings the network down without asking me, so the whole
system stops, because it is NFS-based. The previous version I used does not have
this issue (hts-5.1-3.el5.noarch_10_18.rpm).


Comment 21 Rob Landry 2007-10-30 20:14:27 UTC
Can the logs for this failed run be attached? Presumably they're at least
partial, up until the network interface went down. Also, was this done as a clean
run or with an existing plan after an hts upgrade?

Comment 22 Monza Lui 2007-10-30 20:18:58 UTC
Rob, not quite sure about your second question.  What do you mean by "with 
existing plan after an hts upgrade"?  Thanks.

Comment 23 Rob Landry 2007-10-30 21:16:51 UTC
To run hts you have to have a plan file. What gets planned can change between
releases; however, I believe hts will attempt to use an existing plan file if it
finds one. So if hts was just updated and a new plan wasn't created, it's
possible that the old plan is the cause of the network shutdown. Unfortunately
I'm not the developer on this, so I can't say for certain that it's the
cause, but given the timing I figured I'd at least offer the suggestion to see
if it makes any difference, since it should be pretty quick to tell.
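A quick way to spot the situation described above is to check whether the plan file predates the installed hts package. The sketch below is a hypothetical pre-flight check, not an hts feature. It assumes the plan lives at /var/hts/plan.xml (the path "hts print" reports later in this bug) and that the package is named hts. It reads the install time with rpm's --queryformat and the INSTALLTIME header tag, which is given in seconds since the epoch.

```python
import os
import subprocess

PLAN_PATH = "/var/hts/plan.xml"  # path reported by "hts print" in this bug


def plan_predates_install(plan_mtime, install_time):
    """True if the plan file was last written before the package was installed."""
    return plan_mtime < install_time


def hts_install_time(package="hts"):
    """Newest install time of the package, via rpm's INSTALLTIME tag.

    Raises subprocess.CalledProcessError if the package is not installed.
    """
    out = subprocess.run(
        ["rpm", "-q", "--queryformat", "%{INSTALLTIME}\n", package],
        check=True, capture_output=True, text=True,
    ).stdout
    return max(int(t) for t in out.split())


def plan_is_stale(plan_path=PLAN_PATH):
    """True if an existing plan predates the installed hts package."""
    if not os.path.exists(plan_path):
        return False  # no plan file, so nothing stale to reuse
    return plan_predates_install(os.path.getmtime(plan_path), hts_install_time())
```

If plan_is_stale() returns True, running "hts clean" and creating a new plan, as the comments below suggest, avoids reusing the old one.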

Comment 24 Greg Nichols 2007-10-30 23:36:18 UTC
"hts clean" will remove all old test runs, and even the old test plan.

I don't think that's the issue here, but it's worth a try.

The results containing the network failure would be a help, as well as 
the output of the "mount" command.



Comment 25 Daniel Machado de Faria 2007-10-31 19:24:21 UTC
Sorry, my mistake: I removed the previous version of hts, installed the new one,
and ran the test without creating a new plan. I created a plan, and now it asks
before bringing the network down.

Another issue: I started the hts server on a QS21 (NFS environment) by running
"hts server start", but hts cannot mount the directory that is supposed to be
exported. Do you know whether I can run the server on a QS21?

[root@cell21c ~]# mount -o rw,intr,rsize=12288,wsize=12288,udp cell21e.ltc.austin.ibm.com:/var/hts/export /tmp/mounttest/
mount: cell21e.ltc.austin.ibm.com:/var/hts/export failed, reason given by server: Permission denied

[root@cell21c ~]# mount cell21e.ltc.austin.ibm.com:/var/hts/export /tmp/mounttest/
mount: cell21e.ltc.austin.ibm.com:/var/hts/export failed, reason given by server: Permission denied


Comment 26 Greg Nichols 2007-10-31 19:47:09 UTC
No, you can't run the hts server on an nfs root system.

Moving this to MODIFIED, as the memory fix will be released with R10.

Comment 27 Monza Lui 2007-10-31 19:54:32 UTC
Greg, QS21 is an NFS-root (diskless) system. We have requested that this be
supported in hts so that we can run hardware certification. Please check with
Rob. Reopening the bug.

Comment 28 Greg Nichols 2007-10-31 20:47:00 UTC

As I understand it, your question in comment #25 is, can the HTS server be run
on the System Under Test.  The answer is no.   The HTS server must be run on a
separate system so that the network interfaces may be tested.

For the related question, can the HTS server be an nfs-root system, the answer
is also no.

Do I understand the situation correctly?


Comment 29 Monza Lui 2007-10-31 23:06:55 UTC
Greg, thank you for the clarification :)

Comment 30 Monza Lui 2007-11-05 15:32:37 UTC
We are testing the latest hts with Greg's suggestions.  Will post result ASAP.

Comment 31 Daniel Machado de Faria 2007-11-06 15:29:20 UTC
Created attachment 249341 [details]
HTS results

Comment 32 Daniel Machado de Faria 2007-11-06 15:31:51 UTC
All tests passed:
[root@cell21c daniel]# hts print
loaded configuration /var/hts/config.xml
loaded plan /var/hts/plan.xml
loaded results /var/hts/results.xml

Red Hat Hardware Certification test
--------------------------------------------
Test Suite:    5.1    Release: 8
Plan Created:  2007-10-25 20:25:26
Test Server:   cell8.ltc.austin.ibm.com
--------------------------------------------

Run: 1 on 2007-10-26 13:26:51
--------------------------------------------
Tests: 5 planned,  5 run, 5 passed, 0 failed
--------------------------------------------


Test Run 1
----------------------------------------------------------------
usb                                                  - PASS
network eth1    net_00_1a_64_0e_03_03                - PASS
memory                                               - PASS
core                                                 - PASS
info                                                 - PASS

Combined Results for 1 Runs:
--------------------------------------------
   5 tests planned
   5 tests run
   0 tests always failed
   5 tests always passed


Comment 33 Monza Lui 2007-11-06 20:16:54 UTC
Hi Greg, anything else we need to do to have QS21 on the hardware list 
(https://hardware.redhat.com/)?

Comment 34 Rob Landry 2007-11-06 21:55:28 UTC
Monza, 

To have the hardware listed, you'll need to open a certification request on
hardware.redhat.com, providing the specs for the model separately from this bug.
 From the above it looks like this bz is resolved by the latest hts package so
I'm going to go ahead and close this bug as well.  Presuming the package #'s
line up you may re-use the same results from above as part of the hwcert.

-Rob

Comment 35 Monza Lui 2007-11-07 14:43:12 UTC
Thank you for the pointer, Rob :)  Opened the following cert request -
https://hardware.redhat.com/show.cgi?id=369641