509702 – Implement support for CLONE_IO

Summary: Implement support for CLONE_IO

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	glibc
Sub Component:
Version:	rawhide
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Andreas Schwab
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	498242 516995

Reported:	2009-07-05 09:51 UTC by Avi Kivity
Modified:	2016-11-24 15:39 UTC
CC List:	10 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2009-09-02 02:54:18 UTC
Type:	---
Embargoed:
Dependent Products:

Description Avi Kivity 2009-07-05 09:51:41 UTC

Description of problem:

The kernel (since 2.6.25) supports a CLONE_IO flag which tells the kernel that the new thread cooperates with the current thread on I/O.  This greatly increases the throughput of a thread pool issuing sequential I/O to a single file when using the CFQ scheduler.

Version-Release number of selected component (if applicable):
glibc-2.10.1-2.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Try to create a thread with CLONE_IO
  
Actual results:
No pthread API

Expected results:
pthread_attr_shareio_np() or something

Additional info:
See also blocked qemu bug.
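
For reference, the flag can already be passed through the raw clone() wrapper; what is missing is a way to request it via pthread_create(). The following is only a minimal sketch of the raw call, not a proposed glibc interface. The names worker() and run_worker_with_clone_io() are hypothetical, and a child created this way is not a real pthread (it has no thread-local storage of its own), so the worker body is kept trivial.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#ifndef CLONE_IO
#define CLONE_IO 0x80000000 /* value from the kernel headers, for older libc headers */
#endif

#define WORKER_STACK_SIZE (64 * 1024)

/* Hypothetical worker body; a real worker would issue its I/O here. */
static int worker(void *arg)
{
    int *counter = arg;
    *counter += 1; /* visible to the parent because of CLONE_VM */
    return 0;
}

/*
 * Run worker() in a clone()d child that shares the caller's I/O context.
 * Returns 0 if the child ran and exited cleanly, -1 otherwise.
 */
static int run_worker_with_clone_io(int *counter)
{
    char *stack = malloc(WORKER_STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* The stack grows downward on common architectures, so pass its top. */
    pid_t pid = clone(worker, stack + WORKER_STACK_SIZE,
                      CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_IO | SIGCHLD,
                      counter);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    free(stack); /* safe: the child has exited by now */
    if (waited != pid)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Calling run_worker_with_clone_io(&counter) from main() with counter initialised to 0 should leave counter at 1 on a kernel that accepts CLONE_IO (2.6.25 or later, per the description above).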

Comment 1 Mark McLoughlin 2009-07-06 09:07:21 UTC

Note, the qemu bug (bug #498242) is on F12VirtTarget

Comment 2 Ulrich Drepper 2009-07-30 21:30:27 UTC

What are the consequences of adding the flag?  Where will there be POSIX incompatibilities if this flag is used?

I ask because if there are none and there are no other drawbacks it should be the default.

Comment 3 Mark McLoughlin 2009-08-07 10:48:15 UTC

Avi?

Comment 4 Avi Kivity 2009-08-11 16:14:52 UTC

There will be no POSIX incompatibility if CLONE_IO is used by default, but there may be severe performance implications.  Consider a threaded database accessing several indices (in response to different queries).  Without CLONE_IO, each thread gets its own IO context and thus a "time slice" of the disk.  This allows sequential clustered accesses to complete rapidly.

On the other hand, with CLONE_IO, requests from a single thread will have no special affinity to each other, and thus requests from all threads will be interspersed with each other.  If the threads issue sequential or clustered requests, they will be forced to seek more than without CLONE_IO.

To avoid these regressions, I recommend having CLONE_IO as an opt-in choice for applications that know that their threads are making unrelated requests.

Comment 5 Ulrich Drepper 2009-09-02 02:54:18 UTC

I talked to Chris Wright today about this.

He explained that this is meant to consolidate IO contexts so that the kernel doesn't wait for more requests to see whether consolidation of requests is possible. If all the threads use the same context the consecutive requests cause the outstanding requests to be processed.

But this is really a nice side effect. The kernel doesn't really get smarter. It doesn't notice which threads are working on the same files and regions so that requests can be consolidated. And it doesn't notice when requests can never be consolidated. Using a single IO context just hides the effects enough.

This is all a detail of the current kernel implementation. Codifying this in an interface which has to be maintained forever isn't a good idea.

It is likely not a good idea to have more than one IO context for a process. Chris explained that qemu wants to use the flag for all threads of the thread pool. And even there, there is a problem: the flag cannot be set for already running threads.

Therefore I suggest an alternative. Add a new prctl() to select this mode process-wide. This way all newly created threads will get the support. And it might even be possible to change all existing threads in a process to revert back to one IO context.

I cannot see a way to formulate all this in a useful way as a thread attribute which makes sense from this point on, even if the kernel IO and thread implementation changes. Therefore I'm closing this as WONTFIX.
