Red Hat Bugzilla – Bug 991236
more aggressive hardware testing in bkr machine-test
Last modified: 2018-02-05 19:41:31 EST
At present bkr machine-test just schedules /distribution/install in its jobs, which is a good way of making sure the system can boot and install a distro. But there are many hardware problems which this will never find, so it's not a good way to test a flaky system.
We could write a new task which performs actual hardware tests, such as:
* check SMART data on all disks
* perform SMART self-tests on all disks
* run bad block checking on all disks
* run a memory tester?
* run some kinds of CPU self-tests?
The bkr machine-test command could have an option --aggressive which adds this task when it schedules a job.
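As a rough idea of what the SMART part of such a task could look like, here is a minimal sketch in Python. It assumes smartmontools is installed and that `smartctl -H` prints its usual "overall-health self-assessment test result" line for ATA disks. The function names `parse_smart_health` and `check_disk` are made up for illustration and are not part of Beaker.

```python
import re
import subprocess
from typing import Optional

# Health line printed by `smartctl -H` for ATA disks (assumed format).
_HEALTH_RE = re.compile(r"overall-health self-assessment test result:\s*(\S+)")


def parse_smart_health(output: str) -> Optional[bool]:
    """Return True if the disk reports PASSED, False for any other verdict,
    and None when no health line is present (e.g. SMART is not exposed)."""
    match = _HEALTH_RE.search(output)
    if match is None:
        return None
    return match.group(1) == "PASSED"


def check_disk(device: str) -> Optional[bool]:
    """Run `smartctl -H` on one device and interpret its output.

    smartctl's exit status is a bit mask rather than a plain pass/fail code,
    so this sketch parses the text output instead of relying on it.
    """
    result = subprocess.run(
        ["smartctl", "-H", device],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_smart_health(result.stdout)
```

A task built on this could report None as "skipped" rather than as a failure, so that machines without usable SMART data do not fail the whole job.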
Some points that might help:
* SMART is available on only a fairly small number of machines,
because the machines use either
- SCSI/SAS drives (no SMART at all)
- an additional layer (HW RAID) between the drives and the OS
* bad block checking can be done using the "badblocks" utility
- make sure to do write testing
- make sure to specify a larger "N blocks at a time" value, for speed reasons
- make sure to use `-t' to specify at least one pseudorandom pass;
the normal check DOES NOT detect silent offset pointer corruption (!!)
* memory testing via memtest86+ could be hard to do automatically,
but a tool called "memtester" can do it while the system is running
- it doesn't test all memory, just what it can lock, but that is still useful
* CPU stress testing can be done
- using "cpuburn" (burnMMX, ...) running over some period of time
(at least 30min)
- using a Prime95 equivalent for Linux, "mprime" CLI tool, which can
also perform stress tests with verification of result correctness,
however, it uses rather arch-specific instructions (AVX on Intel),
which might not be a relevant test method
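To make the badblocks and memtester points above concrete, here is a minimal sketch in Python that only builds the command lines and does not run anything. The block size, the "blocks at a time" value, and the function names are assumptions picked for illustration. The flags are the standard badblocks ones (`-w` write mode, `-b` block size, `-c` blocks tested at a time, `-t random` for a pseudorandom pattern), and memtester takes a memory size followed by a loop count.

```python
from typing import List


def badblocks_command(device: str,
                      block_size: int = 4096,
                      blocks_at_once: int = 16384) -> List[str]:
    """Build a destructive badblocks invocation with one pseudorandom pass.

    WARNING: -w overwrites the whole device. The default sizes here are
    illustrative assumptions, not tuned values.
    """
    return [
        "badblocks",
        "-w",                       # write-mode (destructive) test
        "-s", "-v",                 # show progress, be verbose
        "-b", str(block_size),      # block size in bytes
        "-c", str(blocks_at_once),  # blocks tested at a time (default is only 64)
        "-t", "random",             # pseudorandom test pattern
        device,
    ]


def memtester_command(megabytes: int, loops: int = 1) -> List[str]:
    """Build a memtester invocation testing `megabytes` MB for `loops` passes."""
    return ["memtester", f"{megabytes}M", str(loops)]
```

For example, `badblocks_command("/dev/sdX")` yields a list that can be passed to `subprocess.run`. Since cpuburn's burn programs run until killed, a CPU test would additionally need a timeout wrapped around it.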
All of this would need to be done from an initramfs, as HDD testing would effectively overwrite/erase everything. A few approaches come to mind, but all of them would need all the tools, along with a Beaker-related result uploader, in the initramfs anyway:
* using anaconda and %pre section
* using RHEL-based (dracut) initramfs
* using a completely custom (glibc-based) kernel + initramfs pair
- might be a bit more complex to be architecture-independent
- not as complex as it seems, I've built several in the past
.. just my $0.02 ..