Bug 82032 - huge sort speed regression
Summary: huge sort speed regression
Alias: None
Product: Red Hat Public Beta
Classification: Retired
Component: coreutils
Version: phoebe
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Tim Waugh
QA Contact:
Depends On:
TreeView+ depends on / blocked
Reported: 2003-01-16 15:54 UTC by Pádraig Brady
Modified: 2005-10-31 22:00 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2003-03-12 10:02:01 UTC

Attachments (Terms of Use)
coreutils-4.5.3-i18n.patch (101.12 KB, patch)
2003-03-11 11:26 UTC, Tim Waugh
no flags Details | Diff

Description Pádraig Brady 2003-01-16 15:54:38 UTC
Description of problem:
OK I downloaded coreutils-4.5.3 myself and compiled and
it did NOT have this problem. Here are the differences in speed:
$time ./mysort_test > /dev/null
 user    0m0.030s
$time ./rhsort_test > /dev/null
 user    0m32.638s

Version-Release number of selected component (if applicable):


How reproducible:

every time

Steps to Reproduce:

find /bin /usr/bin -size +0c -type f -printf "%p\0\060\0%i\0%s\n" |
tr ' \t\0' '\0\1 ' |
sort -k2n -k4nr -k3 -u

Actual results:

takes around 30s (on my 866MHz celery)

Expected results:

should execute in < 1s (after inodes are cached on subsequent runs)

Additional info:

Specifying (more accurate) keys than above actually increases
the run time to about 1 minute. i.e. using:
sort -k2,2n -k4,4nr -k3,3 -u

Comment 1 Tim Waugh 2003-01-16 16:10:59 UTC
With LC_CTYPE=C it is much faster.  In UTF-8 charset it just has much more work
to do.

We patch coreutils to make it handle multi-byte encodings; this also makes it
L18NUX compliant.

Comment 2 Pádraig Brady 2003-01-16 16:38:55 UTC
Wow, that's some speed difference though! 585 times slower?
1m11s to sort 1526 items is pretty much useless IMHO and at least
shouldn't be the default?

Yes my LANG was en_IE.UTF-8 and when I changed it to C
or LC_CTYPE=C it reverted to the previous operation.

Comment 3 Mitsuru Chinen 2003-03-11 09:28:42 UTC
My name is Mitsuru Chinen. I'm a staff of OpenI18N (aka LI18NUX).
I found this problem by chance.

This problem was caused by the 2 matters.
 1) strlen() was called too many times in spite of being unnecessary.
 2) wctype() was called too many times in spite of being unnecessary.

If you modify the coreutis-4.5.3-i18n.patch with the following
methods, I18N sort utility becomes much faster.
(However new I18N sort is still 8 time slower than original one.)

After all patches in rpm package are applied, you can find the
following sentence at ismbblank() in sort.c.

| mblength = mbrtowc (&wc, str, strlen (str), &state);

strlen() spends a lot of time. It should be replaced with MB_LEN_MAX,

There are similar defects in coreutis-4.5.3-i18n.patch.
You can find those with `grep "mbrtowc.*strlen"'.

After all patches in rpm package are applied, you can find the
following sentence at ismbblank() in sort.c.

| return (iswctype (wc, wctype ("blank")));

wctype spends a lot of time, too. 
  iswctype (wc, wctype ("blank"))
should be replaced with
  iswblank (wc) 

There are similar defects in coreutis-4.5.3-i18n.patch, too.
You can find those with `grep "iswctype.*wctype"'.

Thank you,

Comment 4 Tim Waugh 2003-03-11 09:55:00 UTC
> strlen() spends a lot of time. It should be replaced with MB_LEN_MAX

It can't be outright replaced with MB_LEN_MAX; otherwise mbrtowc risks reading
past the end of the string in the case that no complete character is found.  But
it could be replaced withs strnlen (str, MB_LEN_MAX).  Is that what you meant?

> iswctype (wc, wctype ("blank"))

Oops, good catch!


Comment 5 Mitsuru Chinen 2003-03-11 11:21:29 UTC
You are right. I could not find such a risk out.
I agree with you: strnlen (str, MB_LEN_MAX) is correct one.

Thank you,

Comment 6 Tim Waugh 2003-03-11 11:26:11 UTC
Created attachment 90549 [details]

Actually it turns out that most callers of ismbblank() already know what the
length is, so I adjusted its prototype to include that.

Here's a new version of coreutils-4.5.3-i18n.patch with the changes you
suggested and the change I just mentioned.

Comment 7 Mitsuru Chinen 2003-03-12 01:46:38 UTC
I reviewed your new patch. It's very nice.

Not concerned with this problem, I found a tiny bug.

In getmonth_mb(),
 | month_wcs = (wchar_t *) alloca (len + 1);
shoud be
 | month_wcs = (wchar_t *) alloca (len + 1) * sizeof (wchar_t);

I think you should close this bug now.
Thanks so much!

Comment 8 Tim Waugh 2003-03-12 09:58:04 UTC
month_wcs = (wchar_t *) alloca ((len + 1) * sizeof (wchar_t));

Thanks, also fixed.

Comment 9 Tim Waugh 2003-03-12 10:02:01 UTC
Fixed package is coreutils-4.5.3-21, which will appear in rawhide in due course.

Comment 10 Pádraig Brady 2003-03-12 15:50:12 UTC
I tested the patch (rawhide doesn't have it yet) and it's
a LOT better. In fact 4500% better from my tests.
It's still 1300% slower than LC_CTYPE=C though,
maybe room for more improvments?

$ ./sort_test 
sorting 1534 lines
timing: sort -k2,2n -k4,4nr -k3,3 -u
real	1m11.922s
user	2m23.500s
sys	0m0.060s
same with LC_CTYPE=C
real	0m0.063s
user	0m0.080s
sys	0m0.030s
timing: ~/sort -k2,2n -k4,4nr -k3,3 -u
real	0m0.804s
user	0m1.580s
sys	0m0.040s

Note the real time is correct, the user seems to
be multiplied by 2 (sometimes 4) for some reason?

btw coreutils is currently at version 4.5.9

Comment 11 Tim Waugh 2003-03-12 15:53:05 UTC
> maybe room for more improvments?

Perhaps.  Patches welcome.

> btw coreutils is currently at version 4.5.9

The last version that Jim Meyering suggested that distributors use is 4.5.3;
also 4.5.11 is targetted as being a bona fide stable release, so I'm holding out
for that.

Note You need to log in before you can comment on or make changes to this bug.