Description of problem:
I believe I have encountered a major memory leak in coreutils sort when sorting by month "-M"

Version-Release number of selected component (if applicable):
[sam@deben coreutils]$ rpm -q coreutils
coreutils-8.22-22.fc21.x86_64
[sam@deben coreutils]$ sort --version
sort (GNU coreutils) 8.22
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and Paul Eggert.

How reproducible:
Every time

Steps to Reproduce:
1. Create a test file
   base64 /dev/urandom | head -n 10000 > 10000.txt
2. Run under valgrind (defaults)
   valgrind sort 10000.txt > /dev/null
3. Run under valgrind (-M)
   valgrind sort -M 10000.txt > /dev/null

Actual results:
[sam@deben coreutils]$ valgrind sort 10000.txt > /dev/null
==8382== Memcheck, a memory error detector
==8382== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==8382== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==8382== Command: sort 10000.txt
==8382==
==8382==
==8382== HEAP SUMMARY:
==8382==     in use at exit: 192 bytes in 14 blocks
==8382==   total heap usage: 60 allocs, 46 frees, 74,697,309 bytes allocated
==8382==
==8382== LEAK SUMMARY:
==8382==    definitely lost: 0 bytes in 0 blocks
==8382==    indirectly lost: 0 bytes in 0 blocks
==8382==      possibly lost: 0 bytes in 0 blocks
==8382==    still reachable: 192 bytes in 14 blocks
==8382==         suppressed: 0 bytes in 0 blocks
==8382== Rerun with --leak-check=full to see details of leaked memory
==8382==
==8382== For counts of detected and suppressed errors, rerun with: -v
==8382== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

[sam@deben coreutils]$ valgrind sort -M 10000.txt > /dev/null
==8312== Memcheck, a memory error detector
==8312== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==8312== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==8312== Command: sort -M 10000.txt
==8312==
==8312==
==8312== HEAP SUMMARY:
==8312==     in use at exit: 92,753,702 bytes in 481,851 blocks
==8312==   total heap usage: 722,815 allocs, 240,964 frees, 186,001,505 bytes allocated
==8312==
==8312== LEAK SUMMARY:
==8312==    definitely lost: 92,731,870 bytes in 481,751 blocks
==8312==    indirectly lost: 0 bytes in 0 blocks
==8312==      possibly lost: 21,021 bytes in 78 blocks
==8312==    still reachable: 811 bytes in 22 blocks
==8312==         suppressed: 0 bytes in 0 blocks
==8312== Rerun with --leak-check=full to see details of leaked memory
==8312==
==8312== For counts of detected and suppressed errors, rerun with: -v
==8312== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Expected results:
No "definitely lost" blocks when using -M

Additional info:
This issue was found when a set of machines were receiving intermittent OOM killer triggers. We tracked it down to a "sort -M <large file>" command being run as root which was using all the available memory.
> Additional info:
>
> This issue was found when a set of machines were receiving intermittent OOM
> killer triggers. We tracked it down to a "sort -M <large file>" command being
> run as root which was using all the available memory.

To clarify, the <large file> that led to finding this issue was around 1GB of syslog text data. I believe the test case I included in the previous comment demonstrates that the memory leak exists regardless of the input size.
I've not looked into it, but it does seem restricted to the i18n patch. There is an amazing number of allocs to begin with, never mind that the number of frees doesn't match.

$ valgrind sort -M 10000.txt > /dev/null
HEAP SUMMARY:
    in use at exit: 92,797,172 bytes in 482,074 blocks
  total heap usage: 723,145 allocs, 241,071 frees, 186,052,277 bytes allocated
LEAK SUMMARY:
   definitely lost: 92,794,835 bytes in 482,054 blocks
   indirectly lost: 1,088 bytes in 2 blocks
     possibly lost: 1,001 bytes in 4 blocks
   still reachable: 248 bytes in 14 blocks

$ valgrind sort-upstream -M 10000.txt > /dev/null
HEAP SUMMARY:
    in use at exit: 272 bytes in 15 blocks
  total heap usage: 53 allocs, 38 frees, 74,696,851 bytes allocated
LEAK SUMMARY:
   definitely lost: 24 bytes in 1 blocks
   indirectly lost: 0 bytes in 0 blocks
     possibly lost: 0 bytes in 0 blocks
   still reachable: 248 bytes in 14 blocks

$ export LC_ALL=C
$ valgrind sort -M 10000.txt > /dev/null
HEAP SUMMARY:
    in use at exit: 1,344 bytes in 6 blocks
  total heap usage: 12 allocs, 6 frees, 74,693,156 bytes allocated
LEAK SUMMARY:
   definitely lost: 56 bytes in 2 blocks
   indirectly lost: 1,088 bytes in 2 blocks
     possibly lost: 0 bytes in 0 blocks
   still reachable: 200 bytes in 2 blocks
        suppressed: 0 bytes in 0 blocks
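For anyone repeating this comparison across several builds or locales, the runs above can be reduced to a single number each. The following is only a sketch: the helper name definitely_lost is made up for illustration, and it assumes the Memcheck summary keeps the "definitely lost: N bytes in M blocks" wording shown in this report.

```shell
# Hypothetical helper (name chosen for illustration): read a Memcheck log on
# stdin and print the "definitely lost" byte count as a plain integer.
# Assumes the summary line format shown in this report.
definitely_lost() {
    # Keep only the byte count from the matching line, then drop the
    # thousands separators so the result can be compared numerically.
    sed -n 's/.*definitely lost: \([0-9,]*\) bytes.*/\1/p' | tr -d ','
}
```

Valgrind writes its report to stderr, so one possible invocation is `valgrind sort -M 10000.txt 2>&1 >/dev/null | definitely_lost`, which sends the sorted output to /dev/null and only the report into the pipe.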
Yes, this is likely an issue with the i18n patch, as Pádraig wrote. This patch is a nightmare, and we recommend using the C locale when running scripts, as the patch heavily affects performance, memory consumption (because of the leaks) and reliability. We don't want to invest too much time in improving the i18n patch, as there is an effort to do an upstreamable rewrite. Anyway, Ondrej Oprala made several improvements to the sort part of this patch, so I'm keeping this bug open for investigation.
I've confirmed that "LC_ALL=C" is a workaround in our case. Could you please confirm which cases are affected by the memory leak, as we need to apply this workaround to a number of production systems?

1. sort
2. sort -M
3. Any other "sort" parameters?

If the bug only affects "sort -M" then we can make a more focussed change.

Many thanks
The issue should only be with sort -M I think. I had a quick look and this should fix it: https://github.com/pixelb/coreutils/commit/fbbe8c06
Thanks Pádraig, the patch makes sense to me. Should I inform Bernie, or do you plan to do that? As to "what's affected": I think LC_ALL=C is generally the better idea for sorting in a production environment, unless you rely on locale-specific collation rules. It is definitely safer and faster, as the i18n patch is downstream-only and has a history of various issues. This particular bug is limited to sort -M, but we can't rule out other buggy parameters in the multibyte path.
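One way to apply the workaround narrowly in scripts is to set the locale for the sort command alone rather than exporting it for the whole session. A minimal sketch, assuming a POSIX shell; the wrapper name sort_c is hypothetical and not part of coreutils:

```shell
# Hypothetical wrapper (not part of coreutils): run sort in the C locale for
# this one invocation only, leaving the caller's locale settings untouched.
sort_c() {
    LC_ALL=C sort "$@"
}
```

For example, `sort_c -M 10000.txt` would take the place of `sort -M 10000.txt`. Note that the C locale recognises only the English month abbreviations and collates by byte value, so this is unsuitable where locale-specific ordering is required.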
I'll inform Bernie (SUSE). Yep, LC_ALL=C is definitely recommended anyway.
I updated with a test at https://github.com/pixelb/coreutils/commit/4526a88d I also found a crash in this function with some inputs! That's fixed in: https://github.com/pixelb/coreutils/commit/0ca5ebdb
Thank you for the patches, Pádraig! I will get them to Fedora...
fixed in coreutils-8.24-4.fc24
coreutils-8.24-4.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2015-16076
coreutils-8.24-4.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
 su -c 'yum --enablerepo=updates-testing update coreutils'. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-16076
coreutils-8.23-11.fc22 has been pushed to the Fedora 22 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
 su -c 'yum --enablerepo=updates-testing update coreutils'. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-16075
coreutils-8.24-4.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.
coreutils-8.23-11.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.