Bug 509702

Summary: Implement support for CLONE_IO
Product: [Fedora] Fedora
Reporter: Avi Kivity <avi>
Component: glibc
Assignee: Andreas Schwab <schwab>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: medium
Version: rawhide
CC: chrisw, drepper, ehabkost, fweimer, jakub, knoel, markmc, pmuller, schwab, virt-maint
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2009-09-02 02:54:18 UTC
Bug Blocks: 498242, 516995    

Description Avi Kivity 2009-07-05 09:51:41 UTC
Description of problem:

The kernel (since 2.6.25) supports a CLONE_IO flag which tells the kernel that the new thread cooperates with the current thread on I/O.  This greatly increases the throughput of a thread pool issuing sequential I/O to a single file when using the CFQ scheduler.

Version-Release number of selected component (if applicable):
glibc-2.10.1-2.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Try to create a thread with CLONE_IO
  
Actual results:
No pthread API

Expected results:
pthread_attr_shareio_np() or something

Additional info:
See also blocked qemu bug.
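
For reference, a minimal sketch of what is possible today without glibc support, assuming that calling clone() directly is acceptable. Because pthread_create() offers no way to pass extra clone flags, this creates a child process that shares memory with the caller, not a real pthread. The names io_worker, run_io_worker and WORKER_STACK_SIZE are hypothetical and used only for illustration.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define WORKER_STACK_SIZE (256 * 1024)   /* arbitrary size for this sketch */

/* Hypothetical worker body; a real one would issue its I/O here. */
static int io_worker(void *arg)
{
    volatile int *ran = arg;
    *ran = 1;            /* visible to the parent because of CLONE_VM */
    return 0;
}

/*
 * Run io_worker in a child that shares the caller's I/O context.
 * Returns 1 if the worker ran, -1 on error.
 */
int run_io_worker(void)
{
    volatile int ran = 0;
    char *stack = malloc(WORKER_STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* The stack grows down on most architectures, so pass its top. */
    pid_t pid = clone(io_worker, stack + WORKER_STACK_SIZE,
                      CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_IO | SIGCHLD,
                      (void *)&ran);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    /* Wait before freeing the stack the child is still using. */
    int status;
    int rc = waitpid(pid, &status, 0);
    free(stack);
    if (rc == -1)
        return -1;
    return ran ? 1 : -1;
}
```

Calling run_io_worker() from main() is enough to exercise it. The child must avoid most libc calls, since it shares the address space but was not set up as a thread by glibc, which is exactly why a supported pthread-level interface would be preferable.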

Comment 1 Mark McLoughlin 2009-07-06 09:07:21 UTC
Note, the qemu bug (bug #498242) is on F12VirtTarget

Comment 2 Ulrich Drepper 2009-07-30 21:30:27 UTC
What are the consequences of adding the flag?  Where will there be POSIX incompatibilities if this flag is used?

I ask because if there are none and there are no other drawbacks it should be the default.

Comment 3 Mark McLoughlin 2009-08-07 10:48:15 UTC
Avi?

Comment 4 Avi Kivity 2009-08-11 16:14:52 UTC
There will be no POSIX incompatibility if CLONE_IO is used by default, but there may be severe performance implications.  Consider a threaded database accessing several indices (in response to different queries).  Without CLONE_IO, each thread gets its own IO context and thus a "time slice" of the disk.  This allows sequential clustered accesses to complete rapidly.

On the other hand, with CLONE_IO, requests from a single thread will have no special affinity to each other, and thus requests from all threads will be interspersed with each other.  If the threads issue sequential or clustered requests, they will be forced to seek more than without CLONE_IO.

To avoid these regressions, I recommend having CLONE_IO as an opt-in choice for applications that know that their threads are making unrelated requests.

Comment 5 Ulrich Drepper 2009-09-02 02:54:18 UTC
I talked to Chris Wright today about this.

He explained that this is meant to consolidate IO contexts so that the kernel doesn't wait for more requests to see whether consolidation of requests is possible.  If all the threads use the same context the consecutive requests cause the outstanding requests to be processed.

But this is really a nice side effect.  The kernel doesn't really get smarter.  It doesn't notice which threads are working on the same files and regions so that requests can be consolidated.  And it doesn't notice when requests can never be consolidated.  Using a single IO context just hides the effects enough.

This is all a detail of the current kernel implementation.  Codifying this in an interface which has to be maintained forever isn't a good idea.

It is likely not a good idea to have more than one IO context for a process.  Chris explained that qemu wants to use the flag for all threads of the thread pool.  And even there, there is a problem: the flag cannot be set for already running threads.


Therefore I suggest an alternative.  Add a new prctl() to select this mode process-wide.  This way all newly created threads will get the support.  And it might even be possible to change all existing threads in a process to revert back to one IO context.

I cannot see a way to formulate all this in a useful way as a thread attribute which makes sense from this point on, even if the kernel IO and thread implementation changes.  Therefore I'm closing this as WONTFIX.