Bug 18022

Summary: Database corruption on RAW devices
Product: [Retired] Red Hat Linux
Reporter: Markus Doehr <doehrm>
Component: kernel
Assignee: Stephen Tweedie <sct>
Status: CLOSED ERRATA
Severity: high
Priority: high
Version: 6.1
CC: johnsonm
Hardware: i386
OS: Linux
Doc Type: Bug Fix
Last Closed: 2000-12-01 18:53:02 UTC

Description Markus Doehr 2000-10-01 13:59:25 UTC
We're currently in the process of switching our SAP R/3 systems over to
Linux. The database is SAPDB (formerly ADABAS), about 125 GB in size. For
storage, RAW DEVSPACES are used (2 x RAID-5, 1 x RAID-1).

We have problems with the database under high I/O load. The kernel doesn't
report any hardware error (sync problems or the like), but the database
often dumps with signal 11, reporting 'BAD DATA PAGE' on the RAW device,
which implies a full recovery. No oops or hardware-related errors for
/dev/ida are found in /var/log/messages.

Used kernel:

[root@crmprod /root]# cat /proc/version
Linux version 2.2.16-22SAPenterprise (root.redhat.com) (gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)) #1 SMP Wed Aug 30 14:53:33 CEST 2000

The Red Hat version used is 6.1EE, ordered from http://erponlinux.com

I'd like to know whether there are any known issues with RAW devices that
are bigger than 2 GB and with their behaviour under high or very high load.
Since we have this problem on two nearly identical machines, I assume
that there's no hardware problem with the controller.

The hardware is a Compaq ProLiant 5500 with 1.8 GB of RAM and 3-way P-II
450 Xeon. The SCSI controller is a Compaq SmartArray-3200 with 55 MB RAM.
This machine is certified by Red Hat. We already switched the controller
cache from 50 % READ/50 % WRITE to 100 % READ/0 % WRITE to avoid data loss,
but that didn't get us any further.

Since this is a production system, it's crucial for us to get things running.
This error was also reported to SAP under message # 0120050409 553748.

Please tell me if you need additional information.

I look forward to hearing from you.

Comment 1 Stephen Tweedie 2000-12-01 18:52:59 UTC
I have uploaded a new set of raw IO patches in kiobuf-2.2.18pre24.tar.gz on

	ftp.uk.linux.org:/pub/linux/sct/fs/raw-io
and	ftp.*.kernel.org:/pub/linux/kernel/people/sct/raw-io

This includes fixes for all known raw IO bugs, including one known to cause
possible data corruption on SAP databases under high load.

These patches are relative to the 2.2.18pre24 kernel.  Could you try them?

Comment 2 Stephen Tweedie 2001-01-12 13:42:04 UTC
We've got a set of raw IO patches which fix apparent problems with SAP, and
which have passed basic qualification within SAP.  These will be in the next
errata kernel.