Description of problem:
Running an Oracle RAC workload using directio on GFS 6.1 causes database corruption.

Version-Release number of selected component (if applicable):
Oracle 10gR1, RHEL4U2, GFS 6.1

How reproducible:
Start an Oracle database with directio. The database datafiles must be on a shared GFS partition.

Steps to Reproduce:
1. Build some type of Oracle database on GFS with Oracle 10gR1.
2. Start the database with directio.
3. Run a workload and watch the database crash with ORA-600 errors. The database is now corrupt.

Actual results:
Database corruption.

Expected results:
No corruption.

Additional info:
We are also seeing similar issues with directio on RHEL3 with GFS 6.0; see bug 167962. The system is a four-node cluster (HP DL590s with Intel x86). The systems have 16G of memory. No issues are seen when using asyncio or syncio.
I don't know much about Oracle RAC - a few questions here:

1. Does this problem occur with single-node GFS, or only with multiple nodes running the Oracle workload at the same time?
2. If the latter, do we know the file access pattern - say, one node writing while other nodes read, or multiple concurrent writers?
3. What kind of corruption? Is the data not completely written when write() returns, or does the data simply contain garbage?
1. I'll try out the single-node test on our cluster now.
2. The problem occurs even while the database is starting up. At that point all the nodes are opening files and performing reads.
3. The corruption is at the Oracle level: Oracle wrote some data in the past, and the data is not what Oracle expects when it is read back later.
I'm also gathering strace and Oracle trace information, which will be posted.
I suspect gfs_direct_IO() doesn't flush all the data to disk for some unknown reason (i.e. we have short writes). If you can confirm this from the Oracle side, it would be a great help. Otherwise, I'm preparing a trace kernel for you to run. Also, if you can recreate this issue on a single node, that would be super helpful.
We were able to reproduce the data corruption using the debug kernel. The following error messages were in the messages file:

Oct 5 13:25:03 war kernel: __generic_file_aio_write_nolock falls back to buffer IO, written=0, count=1032192
Oct 5 13:25:03 war kernel: __generic_file_aio_write_nolock falls back to buffer IO, written=0, count=1048576
Oct 5 13:25:03 war last message repeated 11 times
Oct 5 13:25:03 war kernel: __generic_file_aio_write_nolock falls back to buffer IO, written=0, count=81920
Oct 5 13:25:03 war kernel: __generic_file_aio_write_nolock falls back to buffer IO, written=0, count=1032192
Oct 5 13:25:03 war kernel: __generic_file_aio_write_nolock falls back to buffer IO, written=0, count=1048576
Oct 5 13:25:03 war last message repeated 11 times
Oct 5 13:25:03 war kernel: __generic_file_aio_write_nolock falls back to buffer IO, written=0, count=81920

Oracle reported the following errors:

Controlfile created with 836 blocks of size 16384 bytes
Informational message: Controlfile 0 with seq# 2 has a fractured block, blk# 793
Informational message: Controlfile 0 with seq# 2 has a fractured block, blk# 793
Informational message: Controlfile 1 with seq# 2 has a fractured block, blk# 793
Informational message: Controlfile 1 with seq# 2 has a fractured block, blk# 793
Hex dump of (file 0, block 793) in trace file /mnt/oracle_base/product/10g_R1_rdbms/rdbms/log/war_ora_14471.trc
Corrupt block relative dba: 0x00000319 (file 0, block 793)
Completely zero block found during controlfile block read

This problem occurs when a create database is performed without a controlfile already present on the GFS filesystem.
Created attachment 120237 [details]
Patch set 3-1: kernel_gfs_lock.patch

Base kernel patch for the GFS read/write deadlock issue.
Created attachment 120238 [details]
Patch set 3-2: gfs_i_sem.patch

GFS-kernel fix for the direct IO read/write deadlock.
Created attachment 120239 [details]
Patch set 3-3: gfs_dio_sync.patch

Syncs the dio write buffer into storage - this fixes the data corruption bug.
The above three patches fix all the problems I know of with GFS dio on RHEL 4. I will do a small write-up and discuss the build and release process with the GFS team.
Code is checked into CVS; I consider this issue closed.