Red Hat Bugzilla – Bug 200119
sginfo of RAID drives leads to disk corruption
Last modified: 2008-08-15 05:10:27 EDT
Description of problem:
Disk corruption occurs when attempting to read serial numbers of
physical drives in a RAID configuration.
This is possibly specific to the vendor
(Dell 2850 with PERC 4e/Di MegaRAID).
Version-Release number of selected component (if applicable):
RHEL4 Update 3
There is a direct correlation between running the 'sginfo' program and disk
corruption. The corruption is not immediately apparent. It may depend on
the level of system activity.
Steps to Reproduce:
1. Configure Dell 2850 BIOS to use RAID on channel A and channel B
2. Configure Dell MegaRAID BIOS for RAID-1,
2x64kb stripes, WRBACK, ReadAdaptive, DirectIO
3. Install RHEL4 Update 3. A minimum install will do.
4. Run 'sginfo -l'. It will list /dev/sda, /dev/sg0, /dev/sg1.
5. Run 'sginfo -s /dev/sda' multiple times.
sginfo notes that the serial numbers are not accessible.
Repeat a number of times while the network and the disk are active.
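Step 5 can be automated. The following is only a sketch, not what was used for this report: it assumes a modern Python 3 interpreter is available on the test machine, and the helper name run_repeatedly is hypothetical. It runs a command a fixed number of times and records the exit codes.

```python
import subprocess
import sys


def run_repeatedly(cmd, iterations):
    """Run cmd (a list of arguments) `iterations` times.

    Returns the list of exit codes, one per run. Output is discarded
    because only the repeated invocation matters for reproduction.
    """
    codes = []
    for _ in range(iterations):
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        codes.append(result.returncode)
    return codes


# Hypothetical invocation for step 5 (run as root, on a disposable install):
#   codes = run_repeatedly(["sginfo", "-s", "/dev/sda"], 1000)
#   print(len(codes), "runs,", sum(1 for c in codes if c != 0), "non-zero exits")
```

Run it while a large network-to-disk transfer is in progress, since the corruption appears to depend on concurrent activity.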
After a number of reboots, you may notice that services cannot be
started due to files not found. Also, programs may show:
when invoked. /sbin/reboot was one such program.
Expected to see the serial numbers of the physical drives.
Expected no disk corruption.
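Because the corruption is not immediately apparent, one way to detect it is to record checksums of a set of files before running sginfo and compare them after a reboot. The sketch below assumes Python 3; the function names and the choice of SHA-256 are illustrative and were not part of the original testing. On an RPM-based system, 'rpm -V' on the installed packages is an existing alternative.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, or None if it cannot be read."""
    digest = hashlib.sha256()
    try:
        with open(path, "rb") as handle:
            # Read in chunks so large files do not need to fit in memory.
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
    except OSError:
        return None
    return digest.hexdigest()


def snapshot(paths):
    """Map each path to its current digest (None for unreadable files)."""
    return {path: sha256_of(path) for path in paths}


def changed_files(baseline):
    """Return the sorted paths whose digest differs from the baseline.

    A file that has disappeared or become unreadable counts as changed.
    """
    current = snapshot(baseline.keys())
    return sorted(path for path in baseline if current[path] != baseline[path])


# Hypothetical usage: save snapshot(["/sbin/reboot", ...]) before the test,
# reboot afterwards, then pass the saved mapping to changed_files().
```

Comparing after a reboot matters: before a reboot, reads may be served from the page cache rather than from the disk.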
RHEL4 Update 4 kernel-smp-2.6.9-42.EL on Dell 1850 (PERC 4e/Si) and
Dell 2850 (PERC 4e/Di) shows no evidence of corruption.
Hm, so this seems to have been a kernel bug then which got resolved with RHEL4
Read ya, Phil
I can apply the 2.6.9-42.EL linux-2.6.9-megaraid-update.patch to another kernel.
Which one would you deem worthy?
Could you try to apply that patch to your original kernel that caused the problems?
If that prevents the problems, I think we can positively say it was a problem in
the megaraid kernel driver, and that the sg-tools just triggered a bug in the
old driver that could have been triggered otherwise, too.
Read ya, Phil
Applied 2.6.9-42.EL linux-2.6.9-megaraid-update.patch
to 2.6.9-34.0.2.EL source, rebuilt kernel, ran on freshly installed
Dell 2850, which has the PERC 4e/Di, and again on
Dell 1850, which has the PERC 4e/Si.
Confirmed that the disk corruption was no longer produced by
running 'sginfo -s /dev/sda' in a tight loop while transferring
gigabytes from an external source to the filesystem.