245887 – FEAT RHEL5: Hardware Test Suite for diskless systems

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Hardware Certification Program
Classification:	Retired
Component:	Test Suite (harness)
Sub Component:
Version:	5
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Greg Nichols
QA Contact:	Chris Williams
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	245603

Reported:	2007-06-27 07:35 UTC by George Beshers
Modified:	2008-05-01 15:39 UTC
CC List:	8 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2008-03-07 02:08:11 UTC
Embargoed:

Attachments
Panic when running HTS info test on flipper. (117.41 KB, text/plain) 2007-10-19 02:01 UTC, Jonathan Lim	no flags
HTS passed on dopple. (1.12 MB, text/plain) 2007-10-19 02:03 UTC, Jonathan Lim	no flags
dmidecode output for flipper (24.12 KB, text/plain) 2007-10-19 02:07 UTC, Jonathan Lim	no flags
dmidecode output for dopple (12.90 KB, text/plain) 2007-10-19 02:07 UTC, Jonathan Lim	no flags
HTS results for x86_64 with NFSroot and using iSCSI for swap (505.06 KB, application/octet-stream) 2007-11-16 21:40 UTC, Jonathan Lim	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2008:0493	0	normal	SHIPPED_LIVE	hts bug fix and enhancement update	2008-06-24 18:19:00 UTC

Description George Beshers 2007-06-27 07:35:50 UTC

Description of problem:
  Carlsbad is the new SGI high density cluster product.
  The major new feature as far as hardware certification
  is concerned is that the nodes are diskless.

  This BZ is to track the effort to hwcert the diskless nodes
  in RHEL5.1.

  The nodes are x86_64 CPUs on a SuperMicro motherboard,
  identical to the XE 310.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 George Beshers 2007-06-28 15:15:19 UTC

1. If there are no disks, do the disk tests need to be run?

2. With NFSroot, how do we avoid cycling the network connection
   and thereby crashing the root file system?
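
A test harness can only protect the NFSroot connection if it first knows
that the root file system is on NFS. The sketch below is illustrative only
and is not taken from HTS: the function name nfs_root_source is hypothetical,
and it assumes the usual /proc/mounts layout of "device mountpoint fstype
options ...".

```python
def nfs_root_source(mounts_text):
    """Return the 'server:/export' source if / is mounted over NFS, else None.

    mounts_text is the content of /proc/mounts (assumed layout:
    device, mount point, file system type, options, ...).
    """
    source = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        device, mount_point, fs_type = fields[0], fields[1], fields[2]
        if mount_point == "/":
            # A later entry for "/" overrides an earlier one (e.g. rootfs).
            source = device if fs_type.startswith("nfs") else None
    return source
```

Passing the text read from /proc/mounts would give the NFS export backing
the root, which a harness could use to decide which network tests to skip.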

Comment 2 George Beshers 2007-06-28 15:19:01 UTC

3. Will Xen certification be required on the diskless nodes of a cluster?

Comment 4 Jay Lan 2007-07-17 21:49:18 UTC

4. Hardware Test Suite seems to require SELinux to be set to "enforcing".
   In that mode, a diskless system with nfsroot would not allow remote
   login because SELinux prevents /usr/sbin/sshd "entrypoint" access to
   /bin/bash (nfs_t). Setting SELinux to "permissive" fixed that problem.
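
To confirm which mode a node is configured to boot into, the SELINUX=
setting can be read from the SELinux configuration file. This is a minimal
sketch, assuming the file lives at /etc/selinux/config and uses KEY=value
lines; the function name is hypothetical. It reports the configured mode
only, which can differ from the running mode shown by getenforce.

```python
def configured_selinux_mode(config_text):
    """Return the SELINUX= value (lowercased) from config text, or None."""
    mode = None
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        # Match SELINUX exactly so that SELINUXTYPE is not picked up.
        if sep and key.strip() == "SELINUX":
            mode = value.strip().lower()
    return mode
```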

Comment 5 Greg Nichols 2007-09-17 17:53:23 UTC

HTS 5.1 Release 1 includes support for NFSRoot systems.

Comment 6 Jonathan Lim 2007-10-19 02:01:41 UTC

Created attachment 231831
Panic when running HTS info test on flipper.

When running HTS with flipper as the client and dopple as the
NFSroot server, flipper panicked while undergoing the "info" test.

The same problem occurred when running just that test alone on
flipper without NFSroot.

Comment 7 Jonathan Lim 2007-10-19 02:03:45 UTC

Created attachment 231841
HTS passed on dopple.

There were no errors when running HTS on dopple (without NFSroot).

Comment 8 Jonathan Lim 2007-10-19 02:07:08 UTC

Created attachment 231861
dmidecode output for flipper

Comment 9 Jonathan Lim 2007-10-19 02:07:48 UTC

Created attachment 231871
dmidecode output for dopple

Comment 10 Rob Landry 2007-10-19 18:58:31 UTC

Can this be tried again with the -53 kernel.  -52 had a bug in it where cat'ing
/proc/scsi could cause the system to panic; this was addressed in the -53
kernel.  hts doesn't itself actually cat that, but sysreport/sos do, which hts
does call in the info test.  Supposedly that bug should have only impacted the
megaraid-sas driver.

- [scsi] megaraid_sas: kabi fix for /proc entries (Chip Coldwell ) [323231]

...in either case, nothing in hts should be capable of causing a kernel panic, I
wouldn't think.

Comment 11 Jonathan Lim 2007-10-19 19:08:20 UTC

> Can this be tried again with the -53 kernel.

The machine I borrowed to do the test on has been returned.  I'll have to check
if I can borrow it again, but it won't be soon.

Comment 12 Jonathan Lim 2007-10-26 00:21:45 UTC

> Can this be tried again with the -53 kernel.

I've run HTS with the -53 kernel (info test only) and it passes.

However, I ran into a new problem. The NFS root has the following entry in
/etc/fstab:

  /dev/VolGroup00/LogVol01 swap swap defaults 0 0

but that swap space doesn't show up when the system is up.  Subsequently,
the threaded memory test fails and the system hangs.  Also, the swapon man
page has a note saying that swap over NFS may not work.  So how do I get
around this?
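
A quick way to spot this condition is to compare the swap entries in
/etc/fstab with what the kernel reports in /proc/swaps. The following is a
sketch, not part of HTS: the function name is hypothetical, and device names
are compared literally, so an LVM volume that /proc/swaps lists under a
different path (for example a /dev/mapper name) would be reported as
inactive.

```python
def inactive_swap_devices(fstab_text, proc_swaps_text):
    """List fstab swap devices that do not appear in /proc/swaps."""
    configured = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        # fstab fields: device, mount point, type, options, dump, pass
        if len(fields) >= 3 and fields[2] == "swap":
            configured.append(fields[0])

    active = set()
    # The first line of /proc/swaps is a column header.
    for line in proc_swaps_text.splitlines()[1:]:
        fields = line.split()
        if fields:
            active.add(fields[0])

    return [dev for dev in configured if dev not in active]
```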

Comment 13 Jonathan Lim 2007-11-16 21:40:37 UTC

Created attachment 261851
HTS results for x86_64 with NFSroot and using iSCSI for swap

Attached are the results from running HTS on an x86_64 system
with NFSroot and using iSCSI for swap:

  1. NFSroot and swap (512MB) are provided from another x86_64
     system.  Both are running 2.6.18-53.el5 (RHEL5.1-GA).

  2. The network tests for eth0 and eth1 are disabled because
     any interruption to NFS causes the system to hang.

  3. After running HTS, rpm fails to run with the following
     error:

       rpmdb: PANIC: fatal region error detected; run recovery
       error: db4 error(-30977) from dbenv->open: DB_RUNRECOVERY: Fatal error, run database recovery
       error: cannot open Packages index using db3 -  (-30977)
       error: cannot open Packages database in /var/lib/rpm

Comment 14 Greg Nichols 2008-03-07 02:07:36 UTC

Created bug 436419 for the above issue.  Closing this FEAT bug as it has been incorporated.