Bug 509702 - Implement support for CLONE_IO
Product: Fedora
Classification: Fedora
Component: glibc
Hardware: All  OS: Linux
Priority: medium  Severity: medium
Assigned To: Andreas Schwab
QA Contact: Fedora Extras Quality Assurance
Depends On:
Blocks: 516995 498242
Reported: 2009-07-05 05:51 EDT by Avi Kivity
Modified: 2016-11-24 10:39 EST (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Last Closed: 2009-09-01 22:54:18 EDT

Description Avi Kivity 2009-07-05 05:51:41 EDT
Description of problem:

The kernel (since 2.6.25) supports a CLONE_IO flag which tells the kernel that the new thread cooperates with the current thread on I/O.  This greatly increases the throughput of a thread pool issuing sequential I/O to a single file when using the CFQ scheduler.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Try to create a thread with CLONE_IO
Actual results:
No pthread API

Expected results:
pthread_attr_shareio_np() or something

Additional info:
See also blocked qemu bug.
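
For illustration only, step 1 above can currently be attempted by calling clone() directly instead of going through pthreads. The following is a minimal sketch under stated assumptions, not a proposed glibc interface: io_worker and spawn_shared_io_task are hypothetical names, the fallback value for CLONE_IO is the one from the kernel's <linux/sched.h>, and the new task is a bare clone() child rather than a real pthread, so it lacks the thread-local setup that pthread_create() performs.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#ifndef CLONE_IO
#define CLONE_IO 0x80000000 /* from <linux/sched.h>; older libc headers may omit it */
#endif

#define STACK_SIZE (256 * 1024)

/* Hypothetical worker: a real pool task would issue its I/O here. */
static int io_worker(void *arg)
{
    int *done = arg;
    *done = 1; /* visible to the parent because CLONE_VM shares memory */
    return 0;
}

/*
 * Spawn one task that shares the caller's I/O context, wait for it,
 * and return 0 on success or -1 on failure.
 */
int spawn_shared_io_task(int *done)
{
    assert(done != NULL);

    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* The stack grows downward on most architectures, so pass its top. */
    pid_t pid = clone(io_worker, stack + STACK_SIZE,
                      CLONE_VM | CLONE_IO | SIGCHLD, done);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status;
    pid_t waited = waitpid(pid, &status, 0);
    free(stack);

    if (waited == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return 0;
}
```

A caller would pass a pointer to an int initialized to 0 and check that it is 1 after the function returns 0. Because the child is not a pthread, this does not help an application that needs pthread_create() semantics, which is the gap this bug describes.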
Comment 1 Mark McLoughlin 2009-07-06 05:07:21 EDT
Note, the qemu bug (bug #498242) is on F12VirtTarget
Comment 2 Ulrich Drepper 2009-07-30 17:30:27 EDT
What are the consequences of adding the flag?  Where will there be POSIX incompatibilities if this flag is used?

I ask because if there are none and there are no other drawbacks it should be the default.
Comment 3 Mark McLoughlin 2009-08-07 06:48:15 EDT
Comment 4 Avi Kivity 2009-08-11 12:14:52 EDT
There will be no POSIX incompatibility if CLONE_IO is used by default, but there may be severe performance implications.  Consider a threaded database accessing several indices (in response to different queries).  Without CLONE_IO, each thread gets its own IO context and thus a "time slice" of the disk.  This allows sequential clustered accesses to complete rapidly.

On the other hand, with CLONE_IO, requests from a single thread will have no special affinity to each other, and thus requests from all threads will be interspersed with each other.  If the threads issue sequential or clustered requests, they will be forced to seek more than without CLONE_IO.

To avoid these regressions, I recommend having CLONE_IO as an opt-in choice for applications that know that their threads are making unrelated requests.
Comment 5 Ulrich Drepper 2009-09-01 22:54:18 EDT
I talked to Chris Wright today about this.

He explained that this is meant to consolidate IO contexts so that the kernel doesn't wait for more requests to see whether consolidation of requests is possible.  If all the threads use the same context, consecutive requests cause the outstanding requests to be processed.

But this is really a nice side effect.  The kernel doesn't really get smarter.  It doesn't notice which threads are working on the same files and regions so that requests can be consolidated.  And it doesn't notice when requests can never be consolidated.  Using a single IO context just hides the effects enough.

This is all a detail of the current kernel implementation.  Codifying this in an interface which has to be maintained forever isn't a good idea.

It is likely not a good idea to have more than one IO context for a process.  Chris explained that qemu wants to use the flag for all threads of the thread pool.  And even there, there is a problem: the flag cannot be set for already running threads.

Therefore I suggest an alternative.  Add a new prctl() to select this mode process-wide.  This way all newly created threads will get the support.  And it might even be possible to change all existing threads in a process to revert back to one IO context.

I cannot see a way to formulate all this in a useful way as a thread attribute which makes sense from this point on, even if the kernel IO and thread implementation changes.  Therefore I'm closing this as WONTFIX.
