Bug 734335
| Summary: | [abrt] coreutils-8.10-2.fc15: different_multi: Process /usr/bin/uniq was killed by signal 11 (SIGSEGV) | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | seoaqua <seoaqua> |
| Component: | coreutils | Assignee: | Ondrej Vasik <ovasik> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 15 | CC: | kdudka, maxamillion, ovasik, rrakus, twaugh, vvitek |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Unspecified | | |
| Whiteboard: | abrt_hash:9ba38693068905d711c4e5638b8ac2630c7ab785 | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2012-07-16 11:38:07 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
seoaqua
2011-08-30 06:13:46 UTC
I think it's caused by using one super long string as one line. I forgot to put "\n" at the end of each line in my program.

Thanks for the report ... 2 questions:
1) Is it possible to get this "a_big_text_file"? Even directly via email, if you don't want to attach it to Bugzilla... I tried a 1G4 file and it didn't crash...
2) Could you please try to run the same with the C locale? (I mean LC_ALL=C uniq ... )

Sorry, it was a temp file in my program, and I think you are right that the long string didn't cause the crash. I'm trying to reproduce the issue in my old way, and I don't know what you are talking about in Q2, sorry >.<

Thanks for the update... Please let me know if you manage to reproduce the crash scenario. About Q2: based on the LANG environment variable (en_US.utf8) automatically sent by ABRT, you use the multibyte support in uniq. I just wanted to check whether the crash also occurs with single-byte (e.g. C) locales. To run the uniq utility with a different locale, you can just put LC_ALL=C in front of the command, so the command from your example will look like:
LC_ALL=C uniq a_big_text_file > target file
In that case, uniq should work just fine (and faster, as multibyte functions are very slow).

Sorry, I need to process Chinese under UTF-8, but I'll try to reproduce it in the old stupid way ^_^

I can't reproduce it. The code wasn't under version control. Just imagine 16,000,000 lines of words, with 5000 thousand in one line, and maybe 15000 in another line, and 99% single words in the rest of the lines.

Maybe you should close this now, sorry to bother :P

If possible, could you tell me how to represent a Unicode range in shell? Like:
sed 's/\u000-\u001/xxx/' filename
produces a range error. Many thanks!

No, closing is not necessary at the moment, as the backtrace is almost complete, so I could analyze the issue even without a reproducer ... but it would be better to have one for verification. As for your question about Unicode range representation: I don't know, so I am adding the sed maintainer to CC ...
we'll see :)

(In reply to comment #7)
> sed 's/\u000-\u001/xxx/' filename

AFAIK, you can't specify a Unicode character with a '\uXXXX' escape sequence in sed yet, thus you can't even make a range like that.

But you can specify the ranges with the Unicode characters themselves:
$ echo -e "øùúûüý" | sed 's/[û-ü]/_/g'
øùú__ý

Or you can use their hexadecimal representation:
$ echo -e "øùúûüý" | sed 's/[\xC3\xBB-\xC3\xBC]/_/g'
øùú__ý

Please beware of strange behaviour when not using plain ASCII ordering (LC_ALL=C). See http://lists.gnu.org/archive/html/bug-gnu-utils/2011-04/msg00016.html for more details.

I have been searching for this answer for three days. Thank you so much! In Chinese: 非常感谢 ^_^

(In reply to comment #9)
> (In reply to comment #7)
> > sed 's/\u000-\u001/xxx/' filename
>
> Ifaik, you can't specify Unicode character with a '\uXXXX' escape sequence in
> sed yet, thus you can't even make a range like that..
>
> But you can specify the ranges with the Unicode characters themselves:
> $ echo -e "øùúûüý" | sed 's/[û-ü]/_/g'
> øùú__ý
>
> Or you can use their hexadecimal representation:
> $ echo -e "øùúûüý" | sed 's/[\xC3\xBB-\xC3\xBC]/_/g'
> øùú__ý
>
> Please, beware of strange behaviour while not using plain ASCII ordering
> (LC_ALL=C). See
> http://lists.gnu.org/archive/html/bug-gnu-utils/2011-04/msg00016.html for more
> details..
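The LC_ALL=C caveat from comment #9 can be made concrete with a minimal sketch. It assumes a POSIX shell and a sed that matches bytewise under LC_ALL=C; the variable names lead, lo, hi and result are hypothetical, and this is a bytewise variant, not the exact command from the comment. The lead byte 0xC3 is held fixed and only the trailing byte is ranged, so the range does not depend on locale collation:

```shell
# Sketch only: match the UTF-8 characters û (C3 BB) and ü (C3 BC) bytewise.
# LC_ALL=C makes sed treat every byte as one character, so the range
# covers only the trailing byte and is compared by plain byte value.
lead=$(printf '\303')   # 0xC3, lead byte shared by both characters
lo=$(printf '\273')     # 0xBB, trailing byte of û
hi=$(printf '\274')     # 0xBC, trailing byte of ü

# The input is "øùúûüý", written as octal escapes so the script stays ASCII.
result=$(printf '\303\270\303\271\303\272\303\273\303\274\303\275\n' |
  LC_ALL=C sed "s/${lead}[${lo}-${hi}]/_/g")

printf '%s\n' "$result"
```

For the sample input this should print øùú__ý. The trick only applies when both endpoints of the range share the same lead bytes; a range such as 一-龥 spans several lead bytes and would have to be split into several such pieces.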
Sorry to bother you again. Actually I want to replace all non-Chinese characters with NULL. I think there are several Chinese characters missing from the sed range. It almost works with
sed 's/[一-龥]//g' filename
or
sed 's/[\xE4\xB8\x80-\xE9\xBE\xA5]//g' filename
but there are some other Chinese characters, and
sed 's/[\xE4\xB8\x80-\xE9\xBE\xA6]//g' filename
causes the error:
sed: -e expression #1, char 14: Invalid collation character

(In reply to comment #9)
> (In reply to comment #7)
> > sed 's/\u000-\u001/xxx/' filename
>
> Ifaik, you can't specify Unicode character with a '\uXXXX' escape sequence in
> sed yet, thus you can't even make a range like that..
>
> But you can specify the ranges with the Unicode characters themselves:
> $ echo -e "øùúûüý" | sed 's/[û-ü]/_/g'
> øùú__ý
>
> Or you can use their hexadecimal representation:
> $ echo -e "øùúûüý" | sed 's/[\xC3\xBB-\xC3\xBC]/_/g'
> øùú__ý
>
> Please, beware of strange behaviour while not using plain ASCII ordering
> (LC_ALL=C). See
> http://lists.gnu.org/archive/html/bug-gnu-utils/2011-04/msg00016.html for more
> details..

Cleanup: sorry for no solution to this bug report; closing as INSUFFICIENT_DATA. I'm not able to reproduce it, the reporter is not able to reproduce it, and I don't see any clear flaw in the code in question (OK, some additional safety checks could be done there, but they would slow down everything for uncertain gain). Anyone can reopen this bug if they are able to reproduce the issue.
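Since the report was closed without a reproducer, a scaled-down attempt along the lines the reporter described (very long lines among many single-word lines) might look like the sketch below. The sizes, the temporary file and the names long_line, in_lines and out_lines are assumptions, and the script is not known to trigger the crash. It runs under LC_ALL=C as suggested earlier in the thread; rerunning it with a UTF-8 locale would exercise the multibyte path named in the summary (different_multi).

```shell
# Hypothetical, scaled-down input; the real file was never attached.
tmp=$(mktemp)

long_line() {
  # Emit one line of 5000 space-separated words.
  awk 'BEGIN { for (i = 0; i < 5000; i++) printf "word%d ", i; print "" }'
}

{
  long_line
  long_line   # adjacent duplicate, so uniq has to compare two long lines
  # 1000 single-word lines, each written twice in a row.
  awk 'BEGIN { for (i = 0; i < 1000; i++) { print "w" i; print "w" i } }'
} > "$tmp"

# Arithmetic expansion strips the padding some wc implementations add.
in_lines=$(( $(wc -l < "$tmp") ))
out_lines=$(( $(LC_ALL=C uniq "$tmp" | wc -l) ))
echo "input lines: $in_lines, after uniq: $out_lines"
rm -f "$tmp"
```

With these sizes the input has 2002 lines and uniq should leave 1001. Raising the loop bounds toward the figures quoted by the reporter is the obvious next step when hunting for the segfault.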