Bug 655096 - sort eats CPU when used with pipe
Summary: sort eats CPU when used with pipe
Alias: None
Product: Fedora
Classification: Fedora
Component: coreutils
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Ondrej Vasik
QA Contact: Fedora Extras Quality Assurance
Depends On:
Reported: 2010-11-19 15:44 UTC by Zdenek Kabelac
Modified: 2010-12-23 13:26 UTC
6 users

Fixed In Version: coreutils-8.8-1.fc15
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2010-12-23 13:26:31 UTC
Type: ---


Description Zdenek Kabelac 2010-11-19 15:44:48 UTC
Description of problem:

Not sure where the problem is - but on my kernel 2.6.37-rc2,
a simple  'sort file | less'  is able to eat 100% CPU
on my machine - here is a 'bt' from gdb attached while it loops:

(gdb) bt
#0  0x00007f37374fe2bd in write () from /lib64/libc.so.6
#1  0x00007f3737499943 in _IO_new_file_write () from /lib64/libc.so.6
#2  0x00007f373749958a in new_do_write () from /lib64/libc.so.6
#3  0x00007f373749aaf5 in _IO_new_do_write () from /lib64/libc.so.6
#4  0x00007f37374996ed in _IO_new_file_xsputn () from /lib64/libc.so.6
#5  0x00007f3737498d45 in fwrite_unlocked () from /lib64/libc.so.6
#6  0x0000000000403976 in write_line (line=<value optimized out>, fp=<value optimized out>, output_file=0x0) at sort.c:3301
#7  0x000000000040507c in mergelines_node (temp_output=0x0, tfp=0x7f37377c48c0, total_lines=627, node=0x7fff3f35c210) at sort.c:3888
#8  merge_loop (temp_output=0x0, tfp=0x7f37377c48c0, total_lines=627, queue=0x7fff3f35c710) at sort.c:3974
#9  sortlines (lines=<value optimized out>, dest=<value optimized out>, nthreads=140734253877056, total_lines=627, parent=<value optimized out>, 
    lo_child=<value optimized out>, merge_queue=0x7fff3f35c710, tfp=0x7f37377c48c0, temp_output=0x0) at sort.c:4088
#10 0x0000000000405448 in sortlines (lines=0x7f3731595330, dest=<value optimized out>, nthreads=2, total_lines=627, parent=<value optimized out>, 
    lo_child=<value optimized out>, merge_queue=0x7fff3f35c710, tfp=0x7f37377c48c0, temp_output=0x0) at sort.c:4067
#11 0x000000000040b81d in sort (nthreads=2, output_file=0x0, nfiles=0, files=0x12673e8) at sort.c:4374
#12 main (argc=<value optimized out>, argv=0x12674b0) at sort.c:5191

Version-Release number of selected component (if applicable):
vanilla 2.6.37-rc2 - I hope sort does not depend on some new feature being built into the kernel.

How reproducible:

Steps to Reproduce:
1.  sort file | less
Actual results:

Expected results:

Additional info:

Comment 1 Zdenek Kabelac 2010-11-19 16:06:51 UTC
I've downgraded to version: coreutils-8.5-10.fc15.x86_64
which seems to be the last properly working release.

Release coreutils-8.6-1.fc15.x86_64  is the first broken one for F15.

Comment 2 Ondrej Vasik 2010-11-19 16:14:50 UTC
One more thing - the file is not just any common file (Zdenek is able to reproduce it on his x86_64 T61 machine with /var/log/messages, but not with e.g. /etc/passwd). I was not able to reproduce it on my i686 T60 machine, though.

Comment 3 Pádraig Brady 2010-11-22 10:09:06 UTC
Does it spin forever?
If not, how much longer does sort-8.7 take than sort-8.5?
Does adding the --parallel=1 option change things?

Comment 4 Zdenek Kabelac 2010-11-22 11:06:03 UTC
Yes, it's spinning forever as long as it waits for 'less' to print more text.

i.e. a workaround is this:   sort file | tee /dev/null | less
though it's not really practical....

And on the positive side, adding '--parallel=1' fixes the problem.

Comment 5 Pádraig Brady 2010-11-22 11:52:51 UTC
Drats. I'd been worried about that. Pity my laptop isn't multicore :(

You could try changing spinlocks to mutexes by configuring with HAVE_PTHREAD_SPINLOCK_T undefined.
Or you could manually put the following at the top of src/sort.c:

/* Map the POSIX spinlock API onto plain mutexes.  */
typedef pthread_mutex_t pthread_spinlock_t;

static inline int
pthread_spin_init (pthread_spinlock_t *lock, int pshared)
{
  return pthread_mutex_init (lock, NULL);
}

static inline int
pthread_spin_destroy (pthread_spinlock_t *lock)
{
  return pthread_mutex_destroy (lock);
}

static inline int
pthread_spin_lock (pthread_spinlock_t *lock)
{
  return pthread_mutex_lock (lock);
}

static inline int
pthread_spin_trylock (pthread_spinlock_t *lock)
{
  return pthread_mutex_trylock (lock);
}

static inline int
pthread_spin_unlock (pthread_spinlock_t *lock)
{
  return pthread_mutex_unlock (lock);
}

Comment 6 Zdenek Kabelac 2010-11-22 13:04:11 UTC
Here is the actual backtrace from the spinlocked thread in the current version:

(gdb) bt
#0  0x00007fd4275cd425 in ?? () from /lib64/libpthread.so.0
#1  0x0000000000404fde in lock_node (node=<value optimized out>) at sort.c:3766
#2  update_parent (queue=0x7fff8098f2a0, merged=<value optimized out>, node=0x7fd3db4d5da0) at sort.c:3943
#3  merge_loop (temp_output=0x0, tfp=0x7fd4275ba8c0, total_lines=257991, queue=0x7fff8098f2a0) at sort.c:3977
#4  sortlines (lines=<value optimized out>, dest=<value optimized out>, nthreads=140735350895272, total_lines=257991, parent=<value optimized out>, 
    lo_child=<value optimized out>, merge_queue=0x7fff8098f2a0, tfp=0x7fd4275ba8c0, temp_output=0x0) at sort.c:4088
#5  0x00000000004054ab in sortlines_thread (data=<value optimized out>) at sort.c:4011
#6  0x00007fd4275c7d5b in start_thread () from /lib64/libpthread.so.0
#7  0x00007fd427301aad in clone () from /lib64/libc.so.6

btw - it's very nicely visible e.g. on a quad core that even though the real time gets faster, the user time is actually much higher
(e.g. on a larger file: 22s real & user single-threaded -> 13s real / 47s user with the multicore solution - so in effect the current implementation takes more than twice as much CPU).
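The real-vs-user time gap described above can be measured with a sketch like the following (the input file, its size, and the thread counts are illustrative, not from the report):

```shell
# Generate an illustrative unsorted input (path and size are made up).
seq 1000000 | shuf > /tmp/sort-input.txt

# Compare wall-clock vs. CPU time with one thread and with several;
# a large gap between "real" and "user" in the multithreaded run
# indicates wasted CPU (e.g. busy-waiting on spinlocks).
time sort --parallel=1 /tmp/sort-input.txt > /tmp/out1.txt
time sort --parallel=4 /tmp/sort-input.txt > /tmp/out4.txt

# The outputs must be identical regardless of thread count.
cmp /tmp/out1.txt /tmp/out4.txt && echo identical
```

`--parallel` is available in GNU coreutils sort since 8.6, the first release the reporter identified as affected.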

I don't think it is a good plan to 'randomly' replace the spinlock with a mutex - that's not how pthreads work - I would expect some condition waits somewhere.

Oh and btw, I really don't understand how such a patch could have been acked...

Comment 7 Pádraig Brady 2010-11-22 14:11:22 UTC
> Oh and btw. I really don't understand how such patch could have been acked

Mea culpa. I've no dedicated multicore hardware, but mainly no time.
Full performance testing is on my todo list (you could help here :))
I thought it better to release early to get varied testing,
given I was happy that it was functionally correct.

Comment 8 Ondrej Vasik 2010-12-23 13:26:31 UTC
coreutils-8.8-1.fc15, with various upstream fixes for parallel sorting, has been built; closing RAWHIDE.
