Description of problem: Carlsbad is SGI's new high-density cluster product. As far as hardware certification is concerned, its major new feature is that the nodes are diskless. This BZ tracks the effort to hardware-certify (hwcert) the diskless nodes on RHEL5.1. The nodes have x86_64 CPUs on a SuperMicro motherboard, identical to the XE 310.
1. If there are no disks, do the disk tests need to be run?
2. With NFSroot, how do we avoid cycling the network connection and thereby crashing the root file system?
3. Will Xen certification be required on the diskless nodes of a cluster?
4. The Hardware Test Suite seems to require SELinux to be set to "enforcing". In that mode, a diskless system with an NFS root does not allow remote login, because SELinux denies /usr/sbin/sshd "entrypoint" access to /bin/bash (labeled nfs_t). Setting SELinux to "permissive" mode fixed that problem.
HTS 5.1 Release 1 includes support for NFSRoot systems.
Created attachment 231831: Panic when running HTS info test on flipper. When running HTS with flipper as the client and dopple as the NFSroot server, flipper panicked during the "info" test. The same panic occurred when running only that test on flipper without NFSroot.
Created attachment 231841: HTS passed on dopple. There were no errors when running HTS on dopple (without NFSroot).
Created attachment 231861: dmidecode output for flipper
Created attachment 231871: dmidecode output for dopple
Can this be tried again with the -53 kernel? The -52 kernel had a bug where cat'ing /proc/scsi could panic the system; this was fixed in -53. HTS itself doesn't read that file, but sysreport/sos do, and HTS calls them in the info test. That bug supposedly should only have affected the megaraid_sas driver:
- [scsi] megaraid_sas: kabi fix for /proc entries (Chip Coldwell ) [323231]
In any case, I wouldn't expect anything in HTS to be capable of causing a kernel panic.
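Before retrying, it may help to confirm which build a test machine is actually running. The following is a minimal Python sketch, assuming RHEL 5 release strings of the form `2.6.18-NN.el5` (optionally followed by extra dotted numbers or a suffix such as `xen`), as printed by `uname -r`. The helper name `has_proc_scsi_fix` is hypothetical, and the threshold of 53 comes only from this thread's statement that the /proc/scsi panic was fixed in -53.

```python
import platform
import re

# Hypothetical threshold taken from this thread: the /proc/scsi panic
# was reported as fixed in the 2.6.18-53.el5 kernel.
FIXED_BUILD = 53

# Matches e.g. "2.6.18-53.el5", "2.6.18-52.el5xen", "2.6.18-53.1.4.el5".
_RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)(?:\.\d+)*\.el5")


def has_proc_scsi_fix(release: str) -> bool:
    """Return True if a RHEL 5 kernel release string is build -53 or later."""
    match = _RELEASE_RE.match(release)
    if match is None:
        raise ValueError(f"unrecognized RHEL 5 kernel release: {release!r}")
    base = tuple(int(part) for part in match.group(1, 2, 3))
    if base != (2, 6, 18):
        # Only the 2.6.18 RHEL 5 kernel line is covered by this check.
        raise ValueError(f"not a 2.6.18 RHEL 5 kernel: {release!r}")
    return int(match.group(4)) >= FIXED_BUILD


if __name__ == "__main__":
    release = platform.release()  # same string as `uname -r`
    try:
        status = "has fix" if has_proc_scsi_fix(release) else "needs -53 or later"
        print(release, status)
    except ValueError as err:
        print(err)
```

Parsing only the build number after the dash keeps the check independent of errata suffixes; anything that does not look like a 2.6.18 el5 kernel is rejected rather than guessed at.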
> Can this be tried again with the -53 kernel.
The machine I borrowed for the test has been returned. I'll check whether I can borrow it again, but it won't be soon.
> Can this be tried again with the -53 kernel.
I've run HTS with the -53 kernel (info test only), and it passes. However, I ran into a new problem. The NFS root has the following entry in /etc/fstab:
    /dev/VolGroup00/LogVol01 swap swap defaults 0 0
but that swap space doesn't show up once the system is up. As a result, the threaded memory test fails and the system hangs. Also, the swapon man page notes that swap over NFS may not work. How do I get around this?
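One way to confirm the symptom above (a swap entry listed in /etc/fstab that never becomes active) is to compare /etc/fstab against /proc/swaps. This is a minimal Python sketch, not part of HTS; the function name `inactive_swap_devices` is hypothetical. It assumes the fstab device is given as a path (LABEL= and UUID= entries are not resolved) and that `os.path.realpath` maps an LVM path such as /dev/VolGroup00/LogVol01 to the same node name the kernel lists in /proc/swaps. It only detects the problem; it does not address whether swap over NFS can work.

```python
import os
from pathlib import Path
from typing import Callable, List


def inactive_swap_devices(
    fstab_text: str,
    proc_swaps_text: str,
    resolve: Callable[[str], str] = os.path.realpath,
) -> List[str]:
    """Return fstab swap devices that are not listed as active in /proc/swaps.

    Assumption: `resolve` canonicalizes both sides, so a symlinked LVM path
    matches the device name the kernel reports. LABEL=/UUID= entries are not
    looked up and will always be reported as inactive.
    """
    active = set()
    # /proc/swaps starts with a header line, then one row per active swap area.
    for line in proc_swaps_text.splitlines()[1:]:
        fields = line.split()
        if fields:
            active.add(resolve(fields[0]))

    missing = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # fstab fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[2] == "swap":
            if resolve(fields[0]) not in active:
                missing.append(fields[0])
    return missing


if __name__ == "__main__":
    try:
        fstab = Path("/etc/fstab").read_text()
        swaps = Path("/proc/swaps").read_text()
    except OSError as err:
        print(f"cannot read swap configuration: {err}")
    else:
        print(inactive_swap_devices(fstab, swaps))
```

Resolving paths on both sides, rather than comparing strings directly, avoids false alarms when the kernel reports a /dev/mapper or /dev/dm-N name for the same volume.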
Created attachment 261851: HTS results for x86_64 with NFSroot and using iSCSI for swap. Attached are the results from running HTS on an x86_64 system with NFSroot and iSCSI for swap:
1. The NFS root and the swap (512MB) are provided by another x86_64 system. Both systems run 2.6.18-53.el5 (RHEL5.1-GA).
2. The network tests for eth0 and eth1 are disabled because any interruption to NFS causes the system to hang.
3. After running HTS, rpm fails with the following error:
    rpmdb: PANIC: fatal region error detected; run recovery
    error: db4 error(-30977) from dbenv->open: DB_RUNRECOVERY: Fatal error, run database recovery
    error: cannot open Packages index using db3 - (-30977)
    error: cannot open Packages database in /var/lib/rpm
Created bug 436419 for the above issue. Closing this FEAT bug, as the feature has been incorporated.