Bug 169154 - Oracle RAC workload using directio on GFS 6.1 causes database corruption.
Summary: Oracle RAC workload using directio on GFS 6.1 causes database corruption.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: gfs
Version: 4
Hardware: i686
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Wendy Cheng
QA Contact: GFS Bugs
URL:
Whiteboard:
Depends On:
Blocks: 164915
 
Reported: 2005-09-23 18:04 UTC by Joseph Salisbury
Modified: 2010-01-12 03:07 UTC
CC List: 5 users

Fixed In Version: RHEL4U3
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-02-07 14:59:47 UTC
Embargoed:


Attachments
Patch set 3-1: kernel_gfs_lock.patch (1.58 KB, patch)
2005-10-21 06:55 UTC, Wendy Cheng
Patch set 3-2: gfs_i_sem.patch (1.05 KB, patch)
2005-10-21 06:57 UTC, Wendy Cheng
Patch set 3-3: gfs_dio_sync.patch (483 bytes, patch)
2005-10-21 06:59 UTC, Wendy Cheng

Description Joseph Salisbury 2005-09-23 18:04:53 UTC
Description of problem:
Running an Oracle RAC workload using directio on GFS 6.1 causes database corruption.


Version-Release number of selected component (if applicable):
Oracle 10gR1, RHEL4U2, GFS 6.1

How reproducible:
Start an Oracle database with directio.  The database datafiles must be on a
shared GFS partition.

Steps to Reproduce:
1. Build some type of Oracle database on GFS with Oracle 10gR1.
2. Start database with directio.
3. Run a workload and watch the database crash with ORA-600 errors.  The
database is now corrupt.
  
Actual results:
Database corruption

Expected results:
No corruption

Additional info:
Also seeing similar issues with directio on RHEL3 with GFS 6.0; see bug 167962.
The system is a four-node cluster (HP DL590s with Intel x86).  The systems have
16 GB of memory.  No issues are seen when using asyncio or syncio.
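
For reference, a minimal sketch (not Oracle code; the path and block size below
are placeholders) of what the directio workload does at the syscall level on the
shared GFS partition - open the datafile with O_DIRECT and issue aligned writes:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    const size_t blksz = 4096;   /* assumed alignment; GFS block size may differ */
    void *buf;

    /* Placeholder path for a datafile on the shared GFS mount. */
    int fd = open("/mnt/gfs/datafile", O_RDWR | O_CREAT | O_DIRECT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* O_DIRECT requires an aligned buffer, offset, and length. */
    if (posix_memalign(&buf, blksz, blksz) != 0) {
        fprintf(stderr, "posix_memalign failed\n");
        close(fd);
        return 1;
    }
    memset(buf, 0xab, blksz);

    ssize_t n = pwrite(fd, buf, blksz, 0);   /* bypasses the page cache */
    if (n < 0)
        perror("pwrite");
    else if ((size_t)n != blksz)
        fprintf(stderr, "short write: %zd of %zu bytes\n", n, blksz);

    free(buf);
    close(fd);
    return 0;
}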

Comment 3 Wendy Cheng 2005-09-26 14:39:52 UTC
I don't know much about Oracle RAC - a few questions here:

1. Does this problem occur with single-node GFS, or does it only occur with
multiple nodes running the Oracle workload at the same time?
2. If the latter, do we know the file access pattern? Say, one node doing writes
and other nodes doing reads, or multiple writers?
3. What kind of corruption? Is the data not completely written when the write
returns, or does the data simply contain garbage?

Comment 5 Joseph Salisbury 2005-09-26 15:57:20 UTC
1.  I'll try out the one node test on our cluster now.    
2.  The problem occurs even when the database is starting up.  At that point all
the nodes are opening files and performing reads.
3.  The corruption is at the Oracle level.  Oracle wrote some data in the past.
 The data is not what Oracle expects when it is read in the future.

Comment 6 Joseph Salisbury 2005-09-26 15:59:03 UTC
I'm also gathering strace and Oracle trace information, which will be posted.

Comment 7 Wendy Cheng 2005-09-26 16:14:14 UTC
I suspect gfs_direct_IO() doesn't flush all the data to disk for some unknown
reason (i.e. we have short writes). If you can prove this from the Oracle side,
it would be a great help. Otherwise, I'm preparing a trace kernel for you to
run.

Also if you can recreate this issue on a single node, it would be super helpful. 
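
One way to check for short or lost direct-IO writes from the application side
(a sketch of my own, not the trace kernel and not what Oracle does internally):
compare the pwrite() return value with the requested length, then read the
block back and compare contents.

#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns 0 if 'len' bytes at offset 'off' were written in full and read back
 * intact.  Both 'buf' and 'scratch' must satisfy the O_DIRECT alignment rules
 * of the descriptor 'fd'.
 */
static int verify_dio_write(int fd, const void *buf, void *scratch,
                            size_t len, off_t off)
{
    ssize_t w = pwrite(fd, buf, len, off);
    if (w != (ssize_t)len) {
        fprintf(stderr, "short write: %zd of %zu bytes\n", w, len);
        return -1;
    }

    ssize_t r = pread(fd, scratch, len, off);
    if (r != (ssize_t)len || memcmp(buf, scratch, len) != 0) {
        fprintf(stderr, "read-back mismatch at offset %lld\n", (long long)off);
        return -1;
    }
    return 0;
}

Reading the block back from a second node would additionally catch the case
where the data only made it into the writing node's local page cache.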



Comment 11 Joseph Salisbury 2005-10-05 18:41:52 UTC
We were able to reproduce data corruption using the debug kernel.  The following
error messages were in the messages file:

Oct  5 13:25:03 war kernel: __generic_file_aio_write_nolock falls back to buffer
IO, written=0, count=1032192
Oct  5 13:25:03 war kernel: __generic_file_aio_write_nolock falls back to buffer
IO, written=0, count=1048576
Oct  5 13:25:03 war last message repeated 11 times
Oct  5 13:25:03 war kernel: __generic_file_aio_write_nolock falls back to buffer
IO, written=0, count=81920
Oct  5 13:25:03 war kernel: __generic_file_aio_write_nolock falls back to buffer
IO, written=0, count=1032192
Oct  5 13:25:03 war kernel: __generic_file_aio_write_nolock falls back to buffer
IO, written=0, count=1048576
Oct  5 13:25:03 war last message repeated 11 times
Oct  5 13:25:03 war kernel: __generic_file_aio_write_nolock falls back to buffer
IO, written=0, count=81920


Oracle reported the following error:
Controlfile created with 836 blocks of size 16384 bytes
Informational message:
Controlfile 0 with seq# 2 has a fractured block, blk# 793
Informational message:
Controlfile 0 with seq# 2 has a fractured block, blk# 793
Informational message:
Controlfile 1 with seq# 2 has a fractured block, blk# 793
Informational message:
Controlfile 1 with seq# 2 has a fractured block, blk# 793
Hex dump of (file 0, block 793) in trace file /mnt/oracle_base/product/10g_R1_rd
bms/rdbms/log/war_ora_14471.trc
Corrupt block relative dba: 0x00000319 (file 0, block 793)
Completely zero block found during controlfile block read


This problem occurs when a "create database" is performed without a controlfile
already present on the GFS filesystem.
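
For what it's worth, a userspace analogue of the pattern those log lines
describe (my simplification, not the RHEL4 kernel source): when the direct-IO
portion of the write completes fewer bytes than requested (written=0 above),
the remainder is finished through the buffered path.  The cluster hazard is
that the buffered portion sits only in the local page cache until it is synced,
so another node reading the block straight from storage can still see stale or
zeroed data - which matches the "completely zero block" Oracle reports.

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* dio_fd was opened with O_DIRECT, buf_fd without; both refer to the same file. */
static ssize_t write_with_fallback(int dio_fd, int buf_fd,
                                   const void *buf, size_t len, off_t off)
{
    ssize_t written = pwrite(dio_fd, buf, len, off);
    if (written == (ssize_t)len)
        return written;                 /* full direct write, data is on disk */

    if (written < 0)
        written = 0;
    fprintf(stderr, "falling back to buffered IO, written=%zd, count=%zu\n",
            written, len);

    /* Finish through the page cache - NOT yet durable on shared storage. */
    ssize_t rest = pwrite(buf_fd, (const char *)buf + written,
                          len - written, off + written);
    return rest < 0 ? rest : written + rest;
}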

Comment 17 Wendy Cheng 2005-10-21 06:55:44 UTC
Created attachment 120237 [details]
Patch set 3-1: kernel_gfs_lock.patch

Base kernel patch for GFS read/write deadlock issue.

Comment 18 Wendy Cheng 2005-10-21 06:57:43 UTC
Created attachment 120238 [details]
Patch set 3-2: gfs_i_sem.patch

GFS-kernel fix for direct IO read/write deadlock.
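
The patch itself is not reproduced here; as a generic illustration of one
classic way such read/write deadlocks arise and are avoided (not the actual GFS
code), the direct-IO read and write paths must take the inode semaphore and the
cluster lock in the same order, otherwise two concurrent callers can each hold
one lock and wait forever on the other:

#include <pthread.h>

static pthread_mutex_t inode_sem  = PTHREAD_MUTEX_INITIALIZER; /* stand-in for i_sem */
static pthread_mutex_t cluster_lk = PTHREAD_MUTEX_INITIALIZER; /* stand-in for a cluster lock */

static void dio_write_path(void)
{
    pthread_mutex_lock(&inode_sem);     /* first the inode lock  */
    pthread_mutex_lock(&cluster_lk);    /* then the cluster lock */
    /* ... issue the direct write ... */
    pthread_mutex_unlock(&cluster_lk);
    pthread_mutex_unlock(&inode_sem);
}

static void dio_read_path(void)
{
    /* Same order as the write path; taking these two locks in the opposite
     * order here is what produces the classic AB/BA deadlock. */
    pthread_mutex_lock(&inode_sem);
    pthread_mutex_lock(&cluster_lk);
    /* ... issue the direct read ... */
    pthread_mutex_unlock(&cluster_lk);
    pthread_mutex_unlock(&inode_sem);
}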

Comment 19 Wendy Cheng 2005-10-21 06:59:28 UTC
Created attachment 120239 [details]
Patch set 3-3: gfs_dio_sync.patch

Sync the dio write buffer to storage - this fixes the data corruption bug.
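
In the same userspace terms as the write_with_fallback() sketch above (again an
analogy, not the contents of gfs_dio_sync.patch), the fix amounts to flushing
the buffered fallback data to storage before the direct write is reported
complete, so other nodes reading straight from disk see the new blocks instead
of zeros:

#include <sys/types.h>
#include <unistd.h>

static ssize_t write_with_fallback_synced(int dio_fd, int buf_fd,
                                          const void *buf, size_t len, off_t off)
{
    ssize_t n = write_with_fallback(dio_fd, buf_fd, buf, len, off);
    if (n > 0 && fdatasync(buf_fd) != 0)    /* push page-cache data to disk */
        return -1;
    return n;
}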

Comment 20 Wendy Cheng 2005-10-21 07:01:40 UTC
The above three patches fix all the problems I know of with GFS dio on RHEL 4.
I will do a small write-up and discuss the build and release issues with the GFS team.

Comment 24 Wendy Cheng 2005-11-30 14:46:23 UTC
Code checked into CVS and I would consider this issue closed. 

