Bug 655096

Summary: sort eats CPU when used with pipe
Product: [Fedora] Fedora Reporter: Zdenek Kabelac <zkabelac>
Component: coreutils Assignee: Ondrej Vasik <ovasik>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: low    
Version: rawhide CC: aquini, kdudka, meyering, ovasik, p, twaugh
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Fixed In Version: coreutils-8.8-1.fc15 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-12-23 08:26:31 EST Type: ---

Description Zdenek Kabelac 2010-11-19 10:44:48 EST
Description of problem:

Not sure where the problem is - but on my machine (kernel 2.6.37-rc2) even a simple 'sort file | less' is able to eat 100% CPU. Here is the 'bt' output from gdb, attached while it was looping:

(gdb) bt
#0  0x00007f37374fe2bd in write () from /lib64/libc.so.6
#1  0x00007f3737499943 in _IO_new_file_write () from /lib64/libc.so.6
#2  0x00007f373749958a in new_do_write () from /lib64/libc.so.6
#3  0x00007f373749aaf5 in _IO_new_do_write () from /lib64/libc.so.6
#4  0x00007f37374996ed in _IO_new_file_xsputn () from /lib64/libc.so.6
#5  0x00007f3737498d45 in fwrite_unlocked () from /lib64/libc.so.6
#6  0x0000000000403976 in write_line (line=<value optimized out>, fp=<value optimized out>, output_file=0x0) at sort.c:3301
#7  0x000000000040507c in mergelines_node (temp_output=0x0, tfp=0x7f37377c48c0, total_lines=627, node=0x7fff3f35c210) at sort.c:3888
#8  merge_loop (temp_output=0x0, tfp=0x7f37377c48c0, total_lines=627, queue=0x7fff3f35c710) at sort.c:3974
#9  sortlines (lines=<value optimized out>, dest=<value optimized out>, nthreads=140734253877056, total_lines=627, parent=<value optimized out>, 
    lo_child=<value optimized out>, merge_queue=0x7fff3f35c710, tfp=0x7f37377c48c0, temp_output=0x0) at sort.c:4088
#10 0x0000000000405448 in sortlines (lines=0x7f3731595330, dest=<value optimized out>, nthreads=2, total_lines=627, parent=<value optimized out>, 
    lo_child=<value optimized out>, merge_queue=0x7fff3f35c710, tfp=0x7f37377c48c0, temp_output=0x0) at sort.c:4067
#11 0x000000000040b81d in sort (nthreads=2, output_file=0x0, nfiles=0, files=0x12673e8) at sort.c:4374
#12 main (argc=<value optimized out>, argv=0x12674b0) at sort.c:5191

Version-Release number of selected component (if applicable):
vanilla kernel 2.6.37-rc2 - I hope sort does not depend on some new feature being built into the kernel.

How reproducible:

Steps to Reproduce:
1.  sort file | less
Actual results:

Expected results:

Additional info:
Comment 1 Zdenek Kabelac 2010-11-19 11:06:51 EST
I've downgraded to version coreutils-8.5-10.fc15.x86_64,
which seems to be the last properly working release.

Release coreutils-8.6-1.fc15.x86_64  is the first broken one for F15.
Comment 2 Ondrej Vasik 2010-11-19 11:14:50 EST
One more thing - the file is not just any common file (Zdenek is able to reproduce it on his x86_64 T61 machine with /var/log/messages, but not with e.g. /etc/passwd). I was not able to reproduce it on my i686 T60 machine, though.
Comment 3 Pádraig Brady 2010-11-22 05:09:06 EST
Does it spin forever?
If not, how much longer does sort-8.7 take than sort-8.5?
Does adding the --parallel=1 option change things?
Comment 4 Zdenek Kabelac 2010-11-22 06:06:03 EST
Yes, it's spinning forever as long as it waits for 'less' to print more text.

i.e. a workaround is this:   sort file | tee /dev/null | less
though it's not really practical....

And on the positive side, adding '--parallel=1' fixes the problem.
Comment 5 Pádraig Brady 2010-11-22 06:52:51 EST
Drats. I'd been worried about that. Pity my laptop isn't multicore :(

You could try changing spinlocks to mutexes: configure with HAVE_PTHREAD_SPINLOCK_T undefined, or manually put the following at the top of src/sort.c:

typedef pthread_mutex_t pthread_spinlock_t;

static inline int
pthread_spin_init (pthread_spinlock_t *lock, int pshared)
{
  return pthread_mutex_init (lock, NULL);
}

static inline int
pthread_spin_destroy (pthread_spinlock_t *lock)
{
  return pthread_mutex_destroy (lock);
}

static inline int
pthread_spin_lock (pthread_spinlock_t *lock)
{
  return pthread_mutex_lock (lock);
}

static inline int
pthread_spin_trylock (pthread_spinlock_t *lock)
{
  return pthread_mutex_trylock (lock);
}

static inline int
pthread_spin_unlock (pthread_spinlock_t *lock)
{
  return pthread_mutex_unlock (lock);
}
Comment 6 Zdenek Kabelac 2010-11-22 08:04:11 EST
Here is the actual backtrace from the spinlocked thread in the current build:

(gdb) bt
#0  0x00007fd4275cd425 in ?? () from /lib64/libpthread.so.0
#1  0x0000000000404fde in lock_node (node=<value optimized out>) at sort.c:3766
#2  update_parent (queue=0x7fff8098f2a0, merged=<value optimized out>, node=0x7fd3db4d5da0) at sort.c:3943
#3  merge_loop (temp_output=0x0, tfp=0x7fd4275ba8c0, total_lines=257991, queue=0x7fff8098f2a0) at sort.c:3977
#4  sortlines (lines=<value optimized out>, dest=<value optimized out>, nthreads=140735350895272, total_lines=257991, parent=<value optimized out>, 
    lo_child=<value optimized out>, merge_queue=0x7fff8098f2a0, tfp=0x7fd4275ba8c0, temp_output=0x0) at sort.c:4088
#5  0x00000000004054ab in sortlines_thread (data=<value optimized out>) at sort.c:4011
#6  0x00007fd4275c7d5b in start_thread () from /lib64/libpthread.so.0
#7  0x00007fd427301aad in clone () from /lib64/libc.so.6

btw - it's very nicely visible, e.g. on a quad core, that even though real time improves, user time is actually much higher.
(i.e. on a larger file: 22s real & user single-threaded -> 13s real / 47s user multi-core - so in effect the current implementation takes roughly twice as much CPU.)

I don't think it is a good plan to 'randomly' replace spinlocks with mutexes - that's not how pthreads works - I would expect some conditional waits somewhere.

Oh and btw. I really don't understand how such a patch could have been acked....
Comment 7 Pádraig Brady 2010-11-22 09:11:22 EST
> Oh and btw. I really don't understand how such patch could have been acked

Mea culpa. I've no dedicated multicore hardware, but mainly no time.
Full performance testing is on my todo list (you could help here :))
I thought it better to release early to get varied testing,
given I was happy that it was functionally correct.
Comment 8 Ondrej Vasik 2010-12-23 08:26:31 EST
coreutils-8.8-1.fc15 with various upstream fixes for parallel sorting has been built; closing RAWHIDE.