Description of problem:
Hi, I am using O_DIRECT to read regular files on ext3, with no special mount options. It works fine except for the final read, when the remaining file size is less than the page size. In that case read() returns 0 instead of the actual number of bytes remaining.

The program I used is diotest.c:

    #include <stdio.h>
    #include <stdlib.h>   /* posix_memalign */
    #include <unistd.h>   /* read, close, getpagesize */
    #include <fcntl.h>
    #include <errno.h>

    int main(void)
    {
        char *message;
        int fd = open("/work/xiaojun/oxx", O_RDONLY|O_DIRECT);
        int len, pagesize = getpagesize();

        printf("pagesize = %d.\n", pagesize);
        /* O_DIRECT needs a buffer aligned to the block size */
        posix_memalign((void **)&message, pagesize, pagesize);
        if (fd < 0) {
            printf("Unable to open file, errno is %d.\n", errno);
        } else {
            while ((len = read(fd, message, pagesize)) > 0) {
                printf("%d bytes read from file.\n", len);
            }
            if (len < 0) {
                perror("read");
            } else {
                printf("%d bytes read from file.\n", len);
            }
            close(fd);
        }
        return 0;
    }

The file /work/xiaojun/oxx is 5587 bytes.

Version-Release number of selected component (if applicable):
kernel-2.4.21-20.0.1Elsmp

How reproducible:
Every time

Steps to Reproduce:
1. Modify diotest.c to change the file name to an existing file whose size is bigger than 4K but less than 8K.
2. gcc -D_GNU_SOURCE diotest.c
3. ./a.out
4. For comparison, remove the O_DIRECT flag from open() and repeat steps 2 and 3 above.

Actual results:
    pagesize = 4096.
    4096 bytes read from file.
    0 bytes read from file.

Expected results:
    pagesize = 4096.
    4096 bytes read from file.
    1491 bytes read from file.
    0 bytes read from file.

Additional info:
Any movement on this?
O_DIRECT can only do block-aligned reads in multiples of the block size. Since the last portion of the file is not a multiple of the block size (or, in the 2.6 kernel, a multiple of 512 bytes), O_DIRECT cannot do that read.

To quote the description of O_DIRECT in the man page for open(2):

    O_DIRECT
        Try to minimize cache effects of the I/O to and from this file. In general this will degrade performance, but it is useful in special situations, such as when applications do their own caching. File I/O is done directly to/from user space buffers. The I/O is synchronous, i.e., at the completion of the read(2) or write(2) system call, data is guaranteed to have been transferred. Under Linux 2.4 transfer sizes, and the alignment of user buffer and file offset must all be multiples of the logical block size of the file system. Under Linux 2.6 alignment to 512-byte boundaries suffices. A semantically similar interface for block devices is described in raw(8).

Since there are programs that depend on exactly this behaviour, I suspect this cannot be changed.
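
For anyone who needs the tail of the file anyway, here is a rough sketch of one possible application-side workaround. It is not a kernel fix and it has not been tested on the 2.4.21 kernel from this report. The helper names (round_up_to_block, direct_io_aligned, read_all_direct) are made up for illustration. It assumes that fcntl(F_SETFL) can clear O_DIRECT on an open descriptor, that a read() returning 0 leaves the file offset unchanged, and that the block size passed in is a power of two acceptable to posix_memalign(). The first two helpers just spell out the three alignment constraints from the man page text quoted above; the third reads with O_DIRECT and retries the short tail through the page cache.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for O_DIRECT in <fcntl.h> */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Round n up to the next multiple of block. */
static size_t round_up_to_block(size_t n, size_t block)
{
    assert(block > 0);
    return (n + block - 1) / block * block;
}

/* Nonzero if buffer address, file offset and length are all multiples
 * of block, i.e. the three constraints listed in the man page. */
static int direct_io_aligned(const void *buf, off_t offset,
                             size_t len, size_t block)
{
    assert(block > 0);
    return (uintptr_t)buf % block == 0
        && (size_t)offset % block == 0
        && len % block == 0;
}

/* Hypothetical helper: read a whole file block by block with O_DIRECT.
 * If read() returns 0 before the size reported by fstat() is reached,
 * clear O_DIRECT and retry the tail as a buffered read.
 * Returns the number of bytes read, or -1 on error. */
static ssize_t read_all_direct(const char *path, size_t block)
{
    struct stat st;
    void *buf = NULL;
    ssize_t len, total = 0;
    int fd = open(path, O_RDONLY | O_DIRECT);

    if (fd < 0 && errno == EINVAL)
        fd = open(path, O_RDONLY);   /* filesystem without O_DIRECT support */
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0 || posix_memalign(&buf, block, block) != 0) {
        close(fd);
        return -1;
    }
    while (total < st.st_size) {
        len = read(fd, buf, block);
        if (len < 0) {
            total = -1;
            break;
        }
        if (len == 0) {
            /* Short tail refused: drop O_DIRECT and try again. If the
             * flag is already clear, give up instead of looping. */
            int flags = fcntl(fd, F_GETFL);
            if (flags < 0 || !(flags & O_DIRECT)
                || fcntl(fd, F_SETFL, flags & ~O_DIRECT) < 0) {
                total = -1;
                break;
            }
            continue;
        }
        total += len;                /* a real program would consume buf here */
    }
    free(buf);
    close(fd);
    return total;
}
```

With the 5587-byte file from the report, read_all_direct("/work/xiaojun/oxx", 4096) would be expected to return 5587 if those assumptions hold: one 4096-byte direct read followed by a 1491-byte buffered read.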