Bug 139874

Summary: aacraid driver in 2.1ES causes data corruption on HP 4M RAID card
Product: Red Hat Enterprise Linux 2.1 Reporter: Alan Ferrier <alan.ferrier>
Component: kernel Assignee: Don Howard <dhoward>
Status: CLOSED CURRENTRELEASE QA Contact: Brian Brock <bbrock>
Severity: high Docs Contact:
Priority: medium    
Version: 2.1 CC: coughlan, riel, shillman, tburke
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: 2.4.9-e.40 Doc Type: Bug Fix
Last Closed: 2006-02-03 21:35:13 UTC Type: ---
Bug Blocks: 132992    

Description Alan Ferrier 2004-11-18 16:14:34 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5)
Gecko/20041107 Firefox/1.0

Description of problem:
The aacraid driver included in all 2.1 ES kernel versions from
2.4.9-e.24 through 2.4.9-e.49 causes massive data corruption on HP 4M RAID
controllers (and, we suspect, on other hardware relying on this driver).

We are running Oracle 8i on this hardware and have conducted comprehensive
tests on kernels from 2.4.9-e.3 through 2.4.9-e.49. Patching the kernel
with the 1.1.4 aacraid driver from Adaptec fixes the issue, but we
note that the problematic 2.1 ES kernels still report an aacraid version
of 0.9.9-test6.

This problem does not occur on Red Hat 3 ES, where the aacraid
driver is likewise at a later version.

Version-Release number of selected component (if applicable):
kernel-2.4.9-e.24 and above
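
As a rough aid for triage, the sketch below checks whether a kernel release
string falls inside the range named in this report. The bounds (e.24 and e.49)
come from the description above; the function names and the accepted suffix
pattern (for example "smp") are assumptions made for illustration. Being in
range does not by itself show which aacraid driver is actually loaded.

```python
import re

# Bounds taken from the report above: e.24 is the first release named as
# affected, e.49 the last one the reporter tested. Treat both as assumptions.
FIRST_REPORTED_BAD = 24
LAST_REPORTED_BAD = 49

# Matches e.g. "2.4.9-e.24", "kernel-2.4.9-e.49", "2.4.9-e.49smp".
_RELEASE_RE = re.compile(r"(?:kernel-)?2\.4\.9-e\.(\d+)[a-z]*")


def errata_number(release):
    """Return the integer after '-e.' in a 2.1 ES kernel release string."""
    match = _RELEASE_RE.fullmatch(release.strip())
    if match is None:
        raise ValueError("not a 2.4.9-e.N release: %r" % release)
    return int(match.group(1))


def in_reported_range(release):
    """True if the release falls inside the range named in this report."""
    return FIRST_REPORTED_BAD <= errata_number(release) <= LAST_REPORTED_BAD


if __name__ == "__main__":
    for rel in ("2.4.9-e.3", "2.4.9-e.24", "2.4.9-e.49smp"):
        print(rel, in_reported_range(rel))
```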

How reproducible:
Always

Steps to Reproduce:
dbverify is an Oracle-supplied tool that examines Oracle datafiles for
block corruption.

With kernel-2.4.9-e.24 or higher on Red Hat ES 2.1, dbverify reports
corrupt Oracle data blocks:

DBVERIFY: Release 8.1.7.3.0 - Production on Sat Nov 13 22:47:02 2004

(c) Copyright 2000 Oracle Corporation.  All rights reserved.

DBVERIFY - Verification starting : FILE = /data/ukhot8i/sms01.dbf
Page 3935 is influx - most likely media corrupt
***
Corrupt block relative dba: 0x02400f5f (file 0, block 3935)
Fractured block found during dbv:
Data in bad block -
type: 6 format: 2 rdba: 0x02400f5f
last change scn: 0x0000.00007c5a seq: 0x1 flg: 0x00
consistency value in tail: 0x07706378
check value in block header: 0x0, block checksum disabled
spare1: 0x0, spare2: 0x0, spare3: 0x0
***

The media corruption is sometimes fixed by a filesystem journal
rebuild on reboot, but occasionally this does not resolve the problem,
leaving the Oracle data corrupt.
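
When comparing kernels it can help to pull the corrupt block addresses out of
saved DBVERIFY output rather than reading each run by eye. The following is a
minimal sketch that assumes the output has the same "Corrupt block relative
dba" line format shown above; the function name is hypothetical and other
DBVERIFY releases may format this line differently.

```python
import re

# Pattern for lines such as:
#   Corrupt block relative dba: 0x02400f5f (file 0, block 3935)
# The format is assumed from the sample output in this report.
_CORRUPT_RE = re.compile(
    r"Corrupt block relative dba: (0x[0-9a-fA-F]+) \(file (\d+), block (\d+)\)"
)


def corrupt_blocks(report):
    """Return a list of (rdba, file_no, block_no) tuples found in the text."""
    found = []
    for match in _CORRUPT_RE.finditer(report):
        rdba = int(match.group(1), 16)  # int() accepts the 0x prefix in base 16
        found.append((rdba, int(match.group(2)), int(match.group(3))))
    return found


if __name__ == "__main__":
    sample = (
        "Page 3935 is influx - most likely media corrupt\n"
        "***\n"
        "Corrupt block relative dba: 0x02400f5f (file 0, block 3935)\n"
    )
    for rdba, file_no, block_no in corrupt_blocks(sample):
        print("rdba=0x%08x file=%d block=%d" % (rdba, file_no, block_no))
```

Running the same extraction against output captured under each kernel gives a
simple list to diff between test runs.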
    

Additional info:

Comment 1 Tom Coughlan 2004-12-16 20:12:02 UTC
We will investigate this, and will update the driver in U7 as appropriate.

Comment 3 Don Howard 2006-02-03 21:35:13 UTC
It looks like this was corrected prior to e.40.

From the pensacola changelog: 

* Fri Jun 18 2004 Jason Baron <jbaron>
- update mpt fusion to version 2.05.16, 2.05.11 to backup (Adam Manthei)
- update ips to v. 7.00.15, 6.11.07 to backup (Adam Manthei)
- update aacraid to 1.1.5-2440, backup 0.9.9 (Adam Manthei)

Note that the old driver is still shipped, in drivers/addon. The more recent
driver can be found in drivers/scsi.
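
To find the same entry in the changelog of another kernel build, a small
filter over the changelog text is enough. The sketch below assumes the usual
RPM changelog layout quoted above ("* date author" headers followed by "- "
items), for instance as saved from `rpm -q --changelog kernel`; the function
name is hypothetical.

```python
def entries_mentioning(changelog, keyword):
    """Return (header, item) pairs for changelog items containing keyword.

    A header is a line starting with "* "; an item is a line starting
    with "- ". Items seen before any header get a header of None.
    """
    header = None
    hits = []
    for raw in changelog.splitlines():
        line = raw.strip()
        if line.startswith("* "):
            header = line[2:]
        elif line.startswith("- ") and keyword in line:
            hits.append((header, line[2:]))
    return hits


if __name__ == "__main__":
    excerpt = (
        "* Fri Jun 18 2004 Jason Baron <jbaron>\n"
        "- update mpt fusion to version 2.05.16, 2.05.11 to backup (Adam Manthei)\n"
        "- update ips to v. 7.00.15, 6.11.07 to backup (Adam Manthei)\n"
        "- update aacraid to 1.1.5-2440, backup 0.9.9 (Adam Manthei)\n"
    )
    for header, item in entries_mentioning(excerpt, "aacraid"):
        print(header)
        print("  " + item)
```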