Bug 655096 - sort eats CPU when used with pipe
Summary: sort eats CPU when used with pipe
Alias: None
Product: Fedora
Classification: Fedora
Component: coreutils
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Ondrej Vasik
QA Contact: Fedora Extras Quality Assurance
Depends On:
Reported: 2010-11-19 15:44 UTC by Zdenek Kabelac
Modified: 2010-12-23 13:26 UTC
6 users

Fixed In Version: coreutils-8.8-1.fc15
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2010-12-23 13:26:31 UTC
Type: ---


Description Zdenek Kabelac 2010-11-19 15:44:48 UTC
Description of problem:

Not sure where the problem is - but on my kernel 2.6.37-rc2,
a simple  'sort file | less'  is able to eat 100% CPU
on my machine - here is a 'bt' from gdb attached while it loops:

(gdb) bt
#0  0x00007f37374fe2bd in write () from /lib64/libc.so.6
#1  0x00007f3737499943 in _IO_new_file_write () from /lib64/libc.so.6
#2  0x00007f373749958a in new_do_write () from /lib64/libc.so.6
#3  0x00007f373749aaf5 in _IO_new_do_write () from /lib64/libc.so.6
#4  0x00007f37374996ed in _IO_new_file_xsputn () from /lib64/libc.so.6
#5  0x00007f3737498d45 in fwrite_unlocked () from /lib64/libc.so.6
#6  0x0000000000403976 in write_line (line=<value optimized out>, fp=<value optimized out>, output_file=0x0) at sort.c:3301
#7  0x000000000040507c in mergelines_node (temp_output=0x0, tfp=0x7f37377c48c0, total_lines=627, node=0x7fff3f35c210) at sort.c:3888
#8  merge_loop (temp_output=0x0, tfp=0x7f37377c48c0, total_lines=627, queue=0x7fff3f35c710) at sort.c:3974
#9  sortlines (lines=<value optimized out>, dest=<value optimized out>, nthreads=140734253877056, total_lines=627, parent=<value optimized out>, 
    lo_child=<value optimized out>, merge_queue=0x7fff3f35c710, tfp=0x7f37377c48c0, temp_output=0x0) at sort.c:4088
#10 0x0000000000405448 in sortlines (lines=0x7f3731595330, dest=<value optimized out>, nthreads=2, total_lines=627, parent=<value optimized out>, 
    lo_child=<value optimized out>, merge_queue=0x7fff3f35c710, tfp=0x7f37377c48c0, temp_output=0x0) at sort.c:4067
#11 0x000000000040b81d in sort (nthreads=2, output_file=0x0, nfiles=0, files=0x12673e8) at sort.c:4374
#12 main (argc=<value optimized out>, argv=0x12674b0) at sort.c:5191

Version-Release number of selected component (if applicable):
vanilla 2.6.37-rc2 - I hope sort does not depend on some new feature being built into the kernel.

How reproducible:

Steps to Reproduce:
1.  sort file | less
Actual results:

Expected results:

Additional info:

Comment 1 Zdenek Kabelac 2010-11-19 16:06:51 UTC
I've downgraded to version: coreutils-8.5-10.fc15.x86_64
which seems to be the last properly working release.

Release coreutils-8.6-1.fc15.x86_64  is the first broken one for F15.

Comment 2 Ondrej Vasik 2010-11-19 16:14:50 UTC
One more thing - the file is not just any common file (Zdenek is able to reproduce it on his x86_64 T61 machine with /var/log/messages, but not with e.g. /etc/passwd). I was not able to reproduce it on my i686 T60 machine, though.

Comment 3 Pádraig Brady 2010-11-22 10:09:06 UTC
Does it spin forever?
If not, how much longer does sort-8.7 take than sort-8.5?
Does adding the --parallel=1 option change things?

Comment 4 Zdenek Kabelac 2010-11-22 11:06:03 UTC
Yes, it's spinning forever as long as it waits for 'less' to print more text.

i.e. a workaround is this:   sort file | tee /dev/null | less
though it's not really practical....

And on the positive side, adding '--parallel=1' fixes the problem.

Comment 5 Pádraig Brady 2010-11-22 11:52:51 UTC
Drats. I'd been worried about that. Pity my laptop isn't multicore :(

You could try changing spinlocks to mutexes by configuring with HAVE_PTHREAD_SPINLOCK_T undefined.
Or you could manually put the following at the top of src/sort.c:

/* Map the POSIX spinlock API onto plain mutexes.  */
typedef pthread_mutex_t pthread_spinlock_t;

static inline int
pthread_spin_init (pthread_spinlock_t *lock, int pshared)
{
  return pthread_mutex_init (lock, NULL);
}

static inline int
pthread_spin_destroy (pthread_spinlock_t *lock)
{
  return pthread_mutex_destroy (lock);
}

static inline int
pthread_spin_lock (pthread_spinlock_t *lock)
{
  return pthread_mutex_lock (lock);
}

static inline int
pthread_spin_trylock (pthread_spinlock_t *lock)
{
  return pthread_mutex_trylock (lock);
}

static inline int
pthread_spin_unlock (pthread_spinlock_t *lock)
{
  return pthread_mutex_unlock (lock);
}

Comment 6 Zdenek Kabelac 2010-11-22 13:04:11 UTC
Here is the actual backtrace from the spinlocked thread in the current version:

(gdb) bt
#0  0x00007fd4275cd425 in ?? () from /lib64/libpthread.so.0
#1  0x0000000000404fde in lock_node (node=<value optimized out>) at sort.c:3766
#2  update_parent (queue=0x7fff8098f2a0, merged=<value optimized out>, node=0x7fd3db4d5da0) at sort.c:3943
#3  merge_loop (temp_output=0x0, tfp=0x7fd4275ba8c0, total_lines=257991, queue=0x7fff8098f2a0) at sort.c:3977
#4  sortlines (lines=<value optimized out>, dest=<value optimized out>, nthreads=140735350895272, total_lines=257991, parent=<value optimized out>, 
    lo_child=<value optimized out>, merge_queue=0x7fff8098f2a0, tfp=0x7fd4275ba8c0, temp_output=0x0) at sort.c:4088
#5  0x00000000004054ab in sortlines_thread (data=<value optimized out>) at sort.c:4011
#6  0x00007fd4275c7d5b in start_thread () from /lib64/libpthread.so.0
#7  0x00007fd427301aad in clone () from /lib64/libc.so.6

btw - it's very nicely visible e.g. on a quad core that even though the real time gets faster, the user time is actually much higher
(e.g. on a larger file: 22s real & user single-threaded -> 13s real / 47s user with the multicore solution - so in effect the current implementation takes more than twice as much CPU).
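The real-vs-user time gap described above can be measured with a sketch like the following (the input file, its size, and the thread counts are illustrative, not from the report):

```shell
# Generate an illustrative unsorted input (path and size are made up).
seq 1000000 | shuf > /tmp/sort-input.txt

# Compare wall-clock vs. CPU time with one thread and with several;
# a large gap between "real" and "user" in the multithreaded run
# indicates wasted CPU (e.g. busy-waiting on spinlocks).
time sort --parallel=1 /tmp/sort-input.txt > /tmp/out1.txt
time sort --parallel=4 /tmp/sort-input.txt > /tmp/out4.txt

# The outputs must be identical regardless of thread count.
cmp /tmp/out1.txt /tmp/out4.txt && echo identical
```

`--parallel` is available in GNU coreutils sort since 8.6, the first release the reporter identified as affected.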

I don't think it is a good plan to 'randomly' replace the spinlock with a mutex - that's not how pthreads work - I would expect some condition waits somewhere.

Oh and btw, I really don't understand how such a patch could have been acked...

Comment 7 Pádraig Brady 2010-11-22 14:11:22 UTC
> Oh and btw. I really don't understand how such patch could have been acked

Mea culpa. I've no dedicated multicore hardware, but mainly no time.
Full performance testing is on my todo list (you could help here :))
I thought it better to release early to get varied testing,
given I was happy that it was functionally correct.

Comment 8 Ondrej Vasik 2010-12-23 13:26:31 UTC
coreutils-8.8-1.fc15, with various upstream fixes for parallel sorting, has been built; closing RAWHIDE.
