1259942 – coreutils "sort -M" memory leak

Keywords:
Status:	CLOSED NEXTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	coreutils
Sub Component:
Version:	21
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Assignee:	Kamil Dudka
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1540059

Reported:	2015-09-03 21:25 UTC by Sam Elstob
Modified:	2018-01-30 07:51 UTC
CC List:	9 users
Fixed In Version:	8.23-11.fc22
Clone Of:
Clones:	1540059
Environment:
Last Closed:	2015-09-21 10:47:59 UTC
Type:	Bug
Embargoed:
Dependent Products:

Description Sam Elstob 2015-09-03 21:25:34 UTC

Description of problem:

I believe I have encountered a major memory leak in coreutils sort when sorting by month ("-M").

Version-Release number of selected component (if applicable):

[sam@deben coreutils]$ rpm -q coreutils
coreutils-8.22-22.fc21.x86_64
[sam@deben coreutils]$ sort --version
sort (GNU coreutils) 8.22
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and Paul Eggert.


How reproducible: Every time


Steps to Reproduce:
1. Create a test file

base64 /dev/urandom | head -n 10000 > 10000.txt

2. Run under valgrind (defaults)

valgrind sort 10000.txt > /dev/null

3. Run under valgrind (-M)

valgrind sort -M 10000.txt > /dev/null

Actual results:

[sam@deben coreutils]$ valgrind sort 10000.txt > /dev/null
==8382== Memcheck, a memory error detector
==8382== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==8382== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==8382== Command: sort 10000.txt
==8382== 
==8382== 
==8382== HEAP SUMMARY:
==8382==     in use at exit: 192 bytes in 14 blocks
==8382==   total heap usage: 60 allocs, 46 frees, 74,697,309 bytes allocated
==8382== 
==8382== LEAK SUMMARY:
==8382==    definitely lost: 0 bytes in 0 blocks
==8382==    indirectly lost: 0 bytes in 0 blocks
==8382==      possibly lost: 0 bytes in 0 blocks
==8382==    still reachable: 192 bytes in 14 blocks
==8382==         suppressed: 0 bytes in 0 blocks
==8382== Rerun with --leak-check=full to see details of leaked memory
==8382== 
==8382== For counts of detected and suppressed errors, rerun with: -v
==8382== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)


[sam@deben coreutils]$ valgrind sort -M 10000.txt > /dev/null
==8312== Memcheck, a memory error detector
==8312== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==8312== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==8312== Command: sort -M 10000.txt
==8312== 

==8312== 
==8312== HEAP SUMMARY:
==8312==     in use at exit: 92,753,702 bytes in 481,851 blocks
==8312==   total heap usage: 722,815 allocs, 240,964 frees, 186,001,505 bytes allocated
==8312== 
==8312== LEAK SUMMARY:
==8312==    definitely lost: 92,731,870 bytes in 481,751 blocks
==8312==    indirectly lost: 0 bytes in 0 blocks
==8312==      possibly lost: 21,021 bytes in 78 blocks
==8312==    still reachable: 811 bytes in 22 blocks
==8312==         suppressed: 0 bytes in 0 blocks
==8312== Rerun with --leak-check=full to see details of leaked memory
==8312== 
==8312== For counts of detected and suppressed errors, rerun with: -v
==8312== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)


Expected results:

No "definitely lost" blocks when using -M
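The comparison above comes down to a single figure per run. A minimal sketch of pulling it out of a Memcheck log follows; the helper name definitely_lost is hypothetical, and the sample line is copied from the "sort -M" output above.

```shell
#!/bin/sh
# Hypothetical helper: print the "definitely lost" byte count found in a
# Memcheck log read on stdin, with the thousands separators removed.
definitely_lost() {
    sed -n 's/.*definitely lost: \([0-9,]*\) bytes.*/\1/p' | tr -d ','
}

# Sample line copied from the "sort -M" run above.
printf '%s\n' '==8312==    definitely lost: 92,731,870 bytes in 481,751 blocks' |
    definitely_lost
```

Against a live run, something like "valgrind sort -M 10000.txt 2>&1 >/dev/null | definitely_lost" should work, since Memcheck writes its report to stderr by default.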

Additional info:

This issue was found when a set of machines were receiving intermittent OOM killer triggers.  We tracked it down to a "sort -M <large file>" command being run as root which was using all the available memory.

Comment 1 Sam Elstob 2015-09-03 21:31:22 UTC

> Additional info:
>
> This issue was found when a set of machines were receiving intermittent OOM
> killer triggers.  We tracked it down to a "sort -M <large file>" command being
> run as root which was using all the available memory.

To clarify, the <large file> that led to finding this issue was around 1 GB of syslog text data.

I believe the test case I have included in the previous comment demonstrates that the memory leak exists regardless of the input size.

Comment 2 Pádraig Brady 2015-09-03 23:09:23 UTC

I've not looked into it, but it does seem restricted to the i18n patch.
There is an amazing number of allocs to begin with,
never mind that the number of frees doesn't match.

$ valgrind sort -M 10000.txt > /dev/null
  HEAP SUMMARY:
     in use at exit: 92,797,172 bytes in 482,074 blocks
   total heap usage: 723,145 allocs, 241,071 frees, 186,052,277 bytes allocated
  LEAK SUMMARY:
    definitely lost: 92,794,835 bytes in 482,054 blocks
    indirectly lost: 1,088 bytes in 2 blocks 
      possibly lost: 1,001 bytes in 4 blocks 
    still reachable: 248 bytes in 14 blocks

$ valgrind sort-upstream -M 10000.txt > /dev/null
  HEAP SUMMARY:
     in use at exit: 272 bytes in 15 blocks
   total heap usage: 53 allocs, 38 frees, 74,696,851 bytes allocated
  LEAK SUMMARY:
    definitely lost: 24 bytes in 1 blocks 
    indirectly lost: 0 bytes in 0 blocks 
      possibly lost: 0 bytes in 0 blocks 
    still reachable: 248 bytes in 14 blocks

$ export LC_ALL=C
$ valgrind sort -M 10000.txt > /dev/null
  HEAP SUMMARY:
     in use at exit: 1,344 bytes in 6 blocks 
   total heap usage: 12 allocs, 6 frees, 74,693,156 bytes allocated
  LEAK SUMMARY:
    definitely lost: 56 bytes in 2 blocks 
    indirectly lost: 1,088 bytes in 2 blocks 
      possibly lost: 0 bytes in 0 blocks 
    still reachable: 200 bytes in 2 blocks 
         suppressed: 0 bytes in 0 blocks
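As a rough cross-check of the three summaries above, the block count "in use at exit" should equal allocs minus frees. This is only a sketch; the function name outstanding_blocks is hypothetical and the figures are copied from the output above.

```shell
#!/bin/sh
# Hypothetical helper: blocks still allocated at exit = allocs - frees.
# Thousands separators are stripped before doing the arithmetic.
outstanding_blocks() {
    allocs=$(printf '%s' "$1" | tr -d ',')
    frees=$(printf '%s' "$2" | tr -d ',')
    echo $((allocs - frees))
}

outstanding_blocks 723,145 241,071   # Fedora sort -M, before LC_ALL=C was exported
outstanding_blocks 53 38             # sort-upstream -M
outstanding_blocks 12 6              # Fedora sort -M with LC_ALL=C
```

The three results (482074, 15 and 6) match the reported "in use at exit" block counts, so the mismatch in the first run is entirely unfreed allocations.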

Comment 3 Ondrej Vasik 2015-09-04 07:06:21 UTC

Yes, this is likely an issue with the i18n patch, as Pádraig wrote. This patch is a nightmare, and we recommend using the C locale when running scripts, as the patch heavily affects performance, memory consumption (because of the leaks) and reliability.

We don't want to invest too much time into i18n patch improvements, as there is an effort to do an upstreamable rewrite. Anyway, Ondrej Oprala made several improvements to the sort part of this patch, so I am keeping this bug open for investigation.

Comment 4 Sam Elstob 2015-09-08 10:02:21 UTC

I've confirmed that "LC_ALL=C" is a workaround in our case.

Could you please confirm which cases are affected by the memory leak, as we need to apply this workaround to a number of production systems?

1. sort
2. sort -M
3. Any other "sort" parameters?

If the bug only affects "sort -M" then we can make a more focussed change.

Many thanks

Comment 5 Pádraig Brady 2015-09-08 12:08:41 UTC

The issue should only be with sort -M I think.
I had a quick look and this should fix it:
https://github.com/pixelb/coreutils/commit/fbbe8c06

Comment 6 Ondrej Vasik 2015-09-10 08:07:16 UTC

Thanks Pádraig, the patch makes sense to me. Should I inform Bernie, or do you plan to do that?

As to "what's affected" - I think LC_ALL=C is generally a better idea for sorting in a production environment - unless you rely on some locale-specific collation rules. It is definitely safer and faster - as the i18n patch is downstream-only and has a history of various issues.
This particular bug is limited to sort -M only, but we can't rule out that there are other buggy parameters in the multibyte path.
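One way to apply the LC_ALL=C workaround narrowly is to set the variable for the sort invocation only, rather than exporting it for the whole script. A minimal sketch, assuming a wrapper function whose name (sort_c) is hypothetical:

```shell
#!/bin/sh
# Hypothetical wrapper: run sort in the C locale for this one invocation only,
# leaving the locale of the calling script untouched.
sort_c() {
    LC_ALL=C sort "$@"
}

# In the C locale, -M compares the English abbreviations JAN < FEB < ... < DEC.
printf '%s\n' Mar Jan Feb | sort_c -M
```

Note that in the C locale only the English month abbreviations are recognized, so input that uses month names from another locale would need different handling.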

Comment 7 Pádraig Brady 2015-09-10 10:07:42 UTC

I'll inform Bernie (SUSE).

Yep, LC_ALL=C is definitely recommended anyway.

Comment 8 Pádraig Brady 2015-09-11 01:31:06 UTC

I updated with a test at https://github.com/pixelb/coreutils/commit/4526a88d

I also found a crash in this function with some inputs!
That's fixed in: https://github.com/pixelb/coreutils/commit/0ca5ebdb

Comment 9 Kamil Dudka 2015-09-16 17:34:33 UTC

Thank you for the patches, Pádraig!

I will get them to Fedora...

Comment 10 Kamil Dudka 2015-09-16 18:25:12 UTC

fixed in coreutils-8.24-4.fc24

Comment 11 Fedora Update System 2015-09-17 06:11:31 UTC

coreutils-8.24-4.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2015-16076

Comment 12 Fedora Update System 2015-09-17 21:29:47 UTC

coreutils-8.24-4.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
 su -c 'yum --enablerepo=updates-testing update coreutils'
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-16076

Comment 13 Fedora Update System 2015-09-18 16:24:20 UTC

coreutils-8.23-11.fc22 has been pushed to the Fedora 22 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
 su -c 'yum --enablerepo=updates-testing update coreutils'
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-16075

Comment 14 Fedora Update System 2015-09-21 10:47:56 UTC

coreutils-8.24-4.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.

Comment 15 Fedora Update System 2015-09-22 22:53:38 UTC

coreutils-8.23-11.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.
