Bug 512100
Summary: | libaio reports success and returns incorrect data when interleaved with fork() | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Marek Dopiera <marek> |
Component: | kernel | Assignee: | Jeff Moyer <jmoyer> |
Status: | CLOSED DUPLICATE | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
Severity: | medium | Priority: | low |
Version: | 5.2 | CC: | dzickus, l_heldt, lwoodman, marek |
Hardware: | x86_64 | OS: | Linux |
Doc Type: | Bug Fix | Last Closed: | 2009-07-17 13:36:06 UTC |
This sounds like a duplicate of bug 471613. Please try a kernel from here:
http://people.redhat.com/dzickus/el5/158.el5/

Thanks,
Jeff

That's correct, the bug you mention is a duplicate, the kernel from the link above works correctly.

Thanks

OK, thanks for taking the time to confirm that, and thanks also for the reproducer. While we are addressing this for RHEL 5, the upstream status of this patch is unclear. Linus has rejected the fixes that we are incorporating, and I'm not sure at this point whether the bug will be fixed at all in upstream kernels.

If you're looking for ways to avoid this in the future, I would suggest not mixing fork(), pthreads and direct I/O to buffers that are smaller than the host system's page size. In other words, if your reproducer program used getpagesize() sized pages instead of 512 byte pages, you wouldn't see this problem.

*** This bug has been marked as a duplicate of bug 471613 ***
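The workaround suggested above (page-sized buffers for direct I/O) could be sketched roughly as follows. This is only an illustration, not code from the reproducer or from the RHEL patch: `round_up_to_page` and `alloc_dio_buffer` are hypothetical helper names, and the sketch assumes that a buffer which starts on a page boundary and spans whole pages is what the comment means by "getpagesize() sized".

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Round len up to a whole number of pages (page must be non-zero). */
static size_t round_up_to_page(size_t len, size_t page)
{
    assert(page > 0);
    return (len + page - 1) / page * page;
}

/*
 * Hypothetical helper: allocate a buffer for direct I/O that starts on a
 * page boundary and covers whole pages, so it does not share a page with
 * any other data in the process. The padded size is stored in *out_len.
 * Returns NULL on allocation failure; release the buffer with free().
 */
static void *alloc_dio_buffer(size_t len, size_t *out_len)
{
    /* On Linux this is the same value getpagesize() returns. */
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t padded = round_up_to_page(len, page);
    void *buf = NULL;

    if (posix_memalign(&buf, page, padded) != 0)
        return NULL;
    if (out_len != NULL)
        *out_len = padded;
    return buf;
}
```

A caller that previously read into 512 byte buffers would request `alloc_dio_buffer(512, &n)` and submit reads of `n` bytes instead. Whether that fully avoids the problem on a given kernel is something to confirm with the reproducer rather than assume.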
Created attachment 353967 [details]
Test program to reproduce the bug.

Description of problem:
Reads scheduled with libaio sometimes return incorrect data while fork() is being called concurrently, even though success is reported.

Version-Release number of selected component (if applicable):
Checked on CentOS 5.2.
uname -a: Linux w-50 2.6.18-128.2.1.el5 #1 SMP Tue Jul 14 06:36:37 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
libaio: 0.3.106

and on Red Hat Enterprise Linux Server release 5.1 (Tikanga).
uname -a: Linux beta12 2.6.18-92.1.6.1 #1 SMP Mon Aug 25 12:16:35 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
libaio: 0.3.106

How reproducible:
Interleave fork() calls with reads submitted through libaio. The attached test program reproduces the bug. Its input should be a 4 GB file filled with zeros. It launches 2 new threads, and the work is split as follows:
1: keeps calling fork() and wait()
2: allocates a buffer, fills it with 0xff and submits reads from the test file with a callback set on completion (loops until the end of file)
3: calls io_queue_run() in a loop to launch callbacks; if no error was reported, each callback checks that the buffer contains zeros and calls exit() with a "corruption found" message if it does not

Steps to Reproduce:
1. gcc -o aiotest aiotest.c -lpthread -laio
2. dd if=/dev/zero of=test bs=1048576 count=4096 oflag=direct
3. ./aiotest test

Actual results:
Terminates with the message "corruption found" (after a nondeterministic amount of time, usually about a second).

Expected results:
After reading the whole 4 GB, it should fall into an infinite loop of forking and waiting.

Additional info:
When the loop with fork() calls is replaced with an empty loop, everything works correctly.
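The attached aiotest.c is not reproduced on this page. As a rough illustration of two of the pieces described above (the fork()/wait() loop and the zero check done in the completion callback), a minimal sketch might look like the following. All function names here are hypothetical, the libaio submission and io_queue_run() parts are deliberately left out, and the fork loop is bounded by a `rounds` parameter where the report describes an endless loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fill the read buffer with 0xff before submitting a read, as the report
 * describes, so stale contents are distinguishable from file data. */
static void poison_buffer(unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    memset(buf, 0xff, len);
}

/* Check a completed read: returns 1 if every byte is zero, 0 otherwise.
 * A 0 result on a read that reported success is the "corruption found"
 * case from the report. */
static int buffer_is_all_zero(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0)
            return 0;
    }
    return 1;
}

/* Fork a child that exits immediately and reap it, `rounds` times.
 * Returns the number of children reaped with a normal exit, or -1 if
 * fork() fails. */
static int fork_and_wait(int rounds)
{
    int reaped = 0;

    for (int i = 0; i < rounds; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(0);           /* child: do nothing, exit at once */

        int status = 0;
        if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
            reaped++;
    }
    return reaped;
}
```

In a full reproducer, `fork_and_wait` would run in its own thread while another thread submits reads into buffers prepared with `poison_buffer`, and the completion callback would call `buffer_is_all_zero` on each buffer.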