Red Hat Bugzilla – Bug 161985
O_DIRECT on RHEL v4 may not return correct number of bytes when concurrent I/O
Last modified: 2007-11-30 17:07:18 EST
Description of problem:
We discovered a major flaw with O_DIRECT on RHEL v4 on EXT3 that did not exist on previous versions of RHEL.
We have reproduced this on several machines, and can resolve the problem by switching back to RHEL v3 from RHEL v4. The rest of the environment remains the same, and the problem only occurs under RHEL v4.
The problem is that when there is concurrent I/O on a file that was opened with O_DIRECT, a pread() of n bytes by another process on the same file may return only part of the requested n bytes.
Additionally, a pread() call may fail with an error (errno is set to 5, EIO).
We have temporarily worked around this bug by having the second process retry pread() calls that do not return the correct number of bytes, but this is obviously only a stopgap.
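The retry workaround described above could look roughly like the following C sketch. It is not the reporter's actual code: the helper names pread_full and read_region, the retry limit of 5, and the choice to treat a zero-byte read as end of file are all assumptions made for illustration.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700      /* makes sure pread() is declared */
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: keep calling pread() until `count` bytes have been
 * read, end of file is reached, or a non-retryable error occurs.
 * Short reads are continued from where they stopped; EIO (errno 5) is
 * retried at most `max_retries` times. Returns bytes read, or -1 on error.
 */
static ssize_t pread_full(int fd, void *buf, size_t count, off_t offset,
                          int max_retries)
{
    size_t done = 0;
    int retries = 0;

    while (done < count) {
        ssize_t n = pread(fd, (char *)buf + done, count - done,
                          offset + (off_t)done);
        if (n > 0) {
            done += (size_t)n;      /* partial result: ask for the rest */
            continue;
        }
        if (n == 0)
            break;                  /* assumed to be a genuine end of file */
        if (errno == EINTR)
            continue;               /* interrupted by a signal: just retry */
        if (errno == EIO && retries++ < max_retries)
            continue;               /* the transient error seen in this report */
        return -1;
    }
    return (ssize_t)done;
}

/* Hypothetical caller: a buffered (non-O_DIRECT) read, as a backup tool might do. */
static ssize_t read_region(const char *path, void *buf, size_t count,
                           off_t offset)
{
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    n = pread_full(fd, buf, count, offset, 5);
    close(fd);
    return n;
}
```

A loop like this only hides the symptom: it cannot tell a spurious short read from a real one, so it does not replace a kernel fix.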
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Open a file with O_DIRECT
2. Open the same file with a second process.
3. Do a pread() of n bytes in the second process while the first process continues to read and write to the file.
Actual Results: The second process will sometimes read the correct number of bytes, sometimes read fewer bytes than requested, and sometimes fail with an error.
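A minimal C sketch of the steps above, offered as a starting point for a test case rather than a confirmed reproducer. The file layout (64 blocks of 4096 bytes), the names run_repro, direct_io_child and struct repro_stats, and the loop counts are assumptions; the second process is assumed to use ordinary buffered I/O, since the steps do not say it opens the file with O_DIRECT.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#endif
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0             /* fallback so the sketch compiles; no direct I/O then */
#endif

enum { BLK = 4096, NBLK = 64 };    /* assumed block size and file length in blocks */

struct repro_stats { long ok, short_reads, errors; };

/* Step 1: a child process opens the file with O_DIRECT and keeps reading and writing it. */
static void direct_io_child(const char *path, int loops)
{
    void *buf = NULL;
    int fd = open(path, O_RDWR | O_DIRECT);

    /* O_DIRECT transfers need a suitably aligned buffer. */
    if (fd < 0 || posix_memalign(&buf, BLK, BLK) != 0)
        _exit(1);
    memset(buf, 'x', BLK);
    for (int i = 0; i < loops; i++) {
        off_t off = (off_t)(i % NBLK) * BLK;
        if (pwrite(fd, buf, BLK, off) != BLK || pread(fd, buf, BLK, off) != BLK)
            _exit(2);
    }
    _exit(0);
}

/*
 * Steps 2 and 3: the parent opens the same file without O_DIRECT and counts
 * how each pread() of BLK bytes ends. Returns 0 on success, or -1 if the
 * setup fails (for example when the filesystem rejects O_DIRECT).
 */
static int run_repro(const char *path, int loops, struct repro_stats *st)
{
    char block[BLK];
    int fd, probe;
    pid_t pid;

    memset(st, 0, sizeof *st);
    memset(block, 0, sizeof block);
    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    for (int i = 0; i < NBLK; i++) {        /* preallocate the whole file */
        if (write(fd, block, BLK) != BLK) {
            close(fd);
            return -1;
        }
    }
    probe = open(path, O_RDONLY | O_DIRECT); /* check O_DIRECT before forking */
    if (probe < 0) {
        close(fd);
        return -1;
    }
    close(probe);

    pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0)
        direct_io_child(path, loops);        /* never returns */

    for (int i = 0; i < loops; i++) {
        ssize_t n = pread(fd, block, BLK, (off_t)(i % NBLK) * BLK);
        if (n == BLK)
            st->ok++;
        else if (n < 0)
            st->errors++;
        else
            st->short_reads++;
    }
    waitpid(pid, NULL, 0);
    close(fd);
    return 0;
}
```

A caller would point run_repro() at a file on the affected ext3 filesystem with a large loop count and then print short_reads and errors; every read stays inside the preallocated file, so any nonzero count would match the behavior reported here.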
We actually discovered this running the MySQL database server with O_DIRECT, and using their Hot Backup software to copy the files live. We have tested this extensively with the MySQL support team, and have determined that the problem only happens when the MySQL server starts with O_DIRECT on RHEL v4.
I don't suppose that anyone has written a program which reproduces the problem?
That would help me to do the analysis plus to verify the changes when they
I really do need some sort of testcase which shows the problem. Without it,
I am not sure what I can do. Would you let me know whether there is something
that can be used?
It seems a little odd to be accessing something like a database using
O_DIRECT, but backing it up while not using O_DIRECT. O_DIRECT avoids
the use of the page cache, while regular i/o uses the page cache, i.e.
a copy of the data from the disk, but not necessarily the most recent
copy, due to i/o occurring which does not update the page cache. Shouldn't
the two applications be using the same kind of i/o and perhaps also be
using something to synchronize between the two?
We are running the MySQL database server with the O_DIRECT option to enhance
performance. The server runs fine without a problem.
We use the InnoDB Hot Backup utility, so that we can make live backups
consistently without having to take the database server down. It does not use
Mixed O_DIRECT and normal file I/O seems to work ok in earlier Red Hat distros
than RHEL 4, and similar types of I/O work ok in Windows NT/2000/XP.
Every single machine that we upgrade from RHEL 3 (or earlier versions) has the
problem when we upgrade to RHEL 4, so it seems to only be a problem there.
MySQL says that they could optionally make ibbackup (the Hot Backup program) use
O_DIRECT to read the files. But, they have to test if THAT does work. Using
synchronization between mysqld and ibbackup is difficult.
Red Hat should document in the manual if a file cannot be accessed with normal
file I/O when it has been opened with O_DIRECT by another process, if this
behavior is intentional. There does not seem to be any indication of this in
the man pages currently.
And again, this is only a problem on RHEL 4.
No, there is no reason that a particular file cannot be accessed via
buffered and direct i/o at the same time. As Stephen pointed out, the
contents returned are undefined when one process is writing using direct
i/o to the same region that another process is reading using buffered
i/o, but that, of course, is the same situation which occurs even if
the two processes are both using direct i/o.
I am looking into this race.