Bug 200119 - sginfo of RAID drives leads to disk corruption
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: sg3_utils
Version: 4.0
Hardware: i686 Linux
Priority: medium
Severity: high
Assigned To: Dan Horák
Reported: 2006-07-25 12:14 EDT by Michael J. Slifcak
Modified: 2008-08-15 05:10 EDT
Doc Type: Bug Fix
Last Closed: 2008-08-15 05:10:27 EDT

Description Michael J. Slifcak 2006-07-25 12:14:09 EDT
Description of problem:
Disk corruption occurs when attempting to read serial numbers of
physical drives in a RAID configuration.
This may be specific to the vendor hardware
 (Dell 2850 with PERC 4e/Di MegaRAID).

Version-Release number of selected component (if applicable):
RHEL4 Update 3

How reproducible:
Disk corruption correlates directly with running the 'sginfo' program.
The corruption is not immediately apparent, and it may depend on
system activity.

Steps to Reproduce:
1. Configure Dell 2850 BIOS to use RAID on channel A and channel B
2. Configure Dell MegaRAID BIOS for RAID-1,
        2x64kb stripes, WRBACK, ReadAdaptive, DirectIO
3. Install RHEL4 Update 3. A minimum install will do.
4. Run 'sginfo -l'. It will list /dev/sda, /dev/sg0, /dev/sg1.
5. Run 'sginfo -s /dev/sda' multiple times.
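
Step 5 can be scripted. The following is only a minimal sketch, not something taken from the original report: the function name repeat_sginfo and the SGINFO variable are hypothetical, and it assumes that counting non-zero exit statuses is a useful signal (the report does not say what exit status sginfo returns when the serial number is not accessible). Since this reproduces a corruption bug, it should only be run on a disposable test machine.

```shell
#!/bin/sh
# Sketch only: repeat step 5 a fixed number of times.
# SGINFO and repeat_sginfo are illustrative names, not part of sg3_utils.
SGINFO="${SGINFO:-sginfo}"

# repeat_sginfo DEVICE COUNT
# Runs "$SGINFO -s DEVICE" COUNT times and prints how many of those
# runs exited with a non-zero status.
repeat_sginfo() {
    dev="$1"
    count="$2"
    failures=0
    i=0
    while [ "$i" -lt "$count" ]; do
        "$SGINFO" -s "$dev" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "runs=$count failures=$failures"
}

# Example, on a disposable test machine only:
#   repeat_sginfo /dev/sda 1000
```

The loop alone may not be enough to trigger the problem; the report notes that network and disk activity at the same time appear to matter.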
  
Actual results:
sginfo reports that the serial numbers are not accessible.
Repeat the command a number of times while the network and the disk are active.
After a number of reboots, you may notice that services cannot be
started because files are not found. Programs may also print:
   Segmentation fault
when invoked. /sbin/reboot was one such program.

Expected results:
Expected to see the serial numbers of the physical drives.
Expected no disk corruption.


Additional info:
Comment 1 Michael J. Slifcak 2006-08-17 13:30:47 EDT
With the RHEL4 Update 4 kernel (kernel-smp-2.6.9-42.EL), neither the
Dell 1850 (PERC 4e/Si) nor the Dell 2850 (PERC 4e/Di) shows any evidence
of corruption.
Comment 2 Phil Knirsch 2006-08-22 09:36:40 EDT
Hm, so this seems to have been a kernel bug then which got resolved with RHEL4
Update 4?

Read ya, Phil
Comment 3 Michael J. Slifcak 2006-08-22 11:16:57 EDT
I can apply the 2.6.9-42.EL linux-2.6.9-megaraid-update.patch to another kernel.
Which one would you deem worthy?
Comment 4 Phil Knirsch 2006-08-23 04:20:17 EDT
Could you try to apply that patch to your original kernel that caused the problems?

If that prevents the problems, I think we can positively say it was a problem
in the megaraid kernel driver, and that the sg-tools just triggered a bug
in the old driver that could have been triggered in other ways, too.

Thanks,

Read ya, Phil
Comment 5 Michael J. Slifcak 2006-08-24 20:48:03 EDT
Applied 2.6.9-42.EL linux-2.6.9-megaraid-update.patch
to 2.6.9-34.0.2.EL source, rebuilt kernel, ran on freshly installed
Dell 2850, which has the PERC 4e/Di, and again on
Dell 1850, which has the PERC 4e/Si.
Confirmed that the disk corruption was no longer produced by
running 'sginfo -s /dev/sda' in a tight loop while transferring
gigabytes from an external source to the filesystem.
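
Because the corruption is not immediately apparent, a checksum manifest is one possible way to check transferred files afterwards. This is a sketch under stated assumptions, not the method used in the comment above: make_manifest and verify_manifest are hypothetical helper names, the manifest path must be absolute, and the manifest should be stored outside the directory being checked.

```shell
#!/bin/sh
# Sketch only: detect silent file corruption with an MD5 manifest.
# make_manifest and verify_manifest are illustrative names.

# make_manifest DIR MANIFEST
# Records an MD5 checksum for every regular file under DIR.
# MANIFEST must be an absolute path outside DIR.
make_manifest() {
    ( cd "$1" && find . -type f -print0 | xargs -0 -r md5sum ) > "$2"
}

# verify_manifest DIR MANIFEST
# Re-reads the files listed in MANIFEST and returns non-zero if any
# checksum no longer matches or a file has gone missing.
verify_manifest() {
    ( cd "$1" && md5sum -c "$2" >/dev/null 2>&1 )
}

# Example (paths are placeholders):
#   make_manifest /data/incoming /root/incoming.md5
#   ... run the sginfo loop under load, reboot ...
#   verify_manifest /data/incoming /root/incoming.md5 || echo "mismatch found"
```

For files installed from packages, 'rpm -Va' performs a comparable verification against the RPM database.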
