| Summary: | uniq: weird behavior on kind of binary input in non-C locale | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Pavel Raiskup <praiskup> | ||||
| Component: | coreutils | Assignee: | Ondrej Vasik <ovasik> | ||||
| Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||
| Severity: | unspecified | Docs Contact: | |||||
| Priority: | low | ||||||
| Version: | 20 | CC: | admiller, kdudka, kzak, mail, ooprala, ovasik, p, twaugh, yaneti | ||||
| Target Milestone: | --- | ||||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | coreutils-8.22-3.fc21 | Doc Type: | Bug Fix | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2013-12-22 23:30:54 UTC | Type: | Bug | ||||
Thanks for the report; it is reproducible. This is just another issue with the multibyte patch. For reference: with a space added before the "ï" characters, it works fine even for non-C locales in coreutils-8.17 and older (but not with the latest i18n patch), so something is rotten anyway.

Looking at the debug output: mbrtowc returns -2 for the first character, which means an incomplete multibyte sequence. The switch sets mblength to 1 and falls through, and j is incremented. However, this ends the for loop. As no memcpy was performed, uninitialized xmalloc'ed memory is compared, so the two lines are reported as different. One solution is to clear the copy buffer memory upon initialization, either with memset or by using xcalloc. A similar issue can occur in join.

Referring to the patch [1]:

> -+ copy[i] = xmalloc (len[i] + 1);
> ++ copy[i] = xcalloc (0, len[i] + 1);

xcalloc (0, ...) allocates zero bytes of memory, which is probably not intended. Better to stick with xmalloc() + memset() for clarity?

[1] http://pkgs.fedoraproject.org/cgit/coreutils.git/commit/?id=f1ce0c90

Yep, for clarity it would definitely be better. Thanks for spotting this...

patch[1] causes a reproducible malloc protection failure in a certain use case I have here (basically: clone gnome-menus from gnome-git, autogen, make -j7):
*** Error in `uniq': free(): invalid next size (fast): 0x00000000013b6290 ***
Core was generated by `uniq'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007fd9456531c9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0 0x00007fd9456531c9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007fd9456548d8 in __GI_abort () at abort.c:89
#2 0x00007fd945694d94 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7fd9457a0568 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
#3 0x00007fd94569c71b in malloc_printerr (ptr=<optimized out>, str=0x7fd9457a08b8 "free(): invalid next size (fast)", action=3) at malloc.c:4888
#4 _int_free (av=0x7fd9459dd760 <main_arena>, p=<optimized out>, have_lock=0) at malloc.c:3762
#5 0x0000000000402d1e in different_multi (old=old@entry=0x1280f60 "/desktop-directories/Makefile.in\n", new=new@entry=0x1280ff0 "/depcomp\ne\ntus.lineno\n",
oldlen=oldlen@entry=32, newlen=newlen@entry=8, oldstate=..., oldstate@entry=..., newstate=...) at src/uniq.c:480
#6 0x00000000004023bc in check_file (delimiter=10 '\n', outfile=<optimized out>, infile=0x40718a "-") at src/uniq.c:576
#7 main (argc=<optimized out>, argv=<optimized out>) at src/uniq.c:925
Yes, it has to be fixed (xmalloc + memset instead of xcalloc); I'm aware of it, but I can't do that sooner than Monday. (If you need this fixed sooner, just untag coreutils-8.22-2.fc21 from rawhide and ensure that coreutils-8.22-1.fc21 is tagged.)

Changing arg #1 from 0 to 1 would also be okay:

    -+ copy[i] = xcalloc (0, len[i] + 1);
    ++ copy[i] = xcalloc (1, len[i] + 1);

(no renumbering of the patch hunk lines then ... ;-)

I'll stay with xmalloc and memset... (Built as coreutils-8.22-3.fc21, by the way. Closing as Rawhide, since the actual issue only affects incomplete multibyte characters, so it is very rare and to some extent a rubbish-in, rubbish-out case.)
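For readers following the allocation discussion, here is a minimal sketch of the two zero-initializing variants mentioned above. It uses plain malloc/calloc rather than the gnulib xmalloc/xcalloc wrappers so that it stays self-contained; the helper names `alloc_zeroed_memset` and `alloc_zeroed_calloc` are hypothetical and do not appear in the coreutils patch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate len + 1 bytes, then clear them explicitly.
   This mirrors the "xmalloc + memset" variant discussed in the thread. */
static char *alloc_zeroed_memset(size_t len)
{
    char *buf = malloc(len + 1);
    if (buf != NULL)
        memset(buf, 0, len + 1);
    return buf;
}

/* Hypothetical helper: the calloc variant. calloc(nmemb, size) reserves
   nmemb * size bytes, so the first argument must be 1 here. Passing 0
   would request zero bytes, and any later write into the buffer would
   overflow it. */
static char *alloc_zeroed_calloc(size_t len)
{
    return calloc(1, len + 1);
}
```

Both helpers return a buffer of len + 1 zero bytes (or NULL on allocation failure, which the gnulib wrappers would instead treat as fatal). The memset variant is longer but makes the buffer size visible in one place, which seems to be the clarity argument made above.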
Created attachment 830960: Data file.

I tried to clean my ~/.bash_history today and noticed that the uniq utility behaves differently than I would expect, even though this seems to be a real corner case, so I am marking it as low priority. I tried building upstream coreutils and it works as expected (it also works OK in a virtual Debian 7).

Try the behavior of uniq in FC20 (x86_64) on the attached file:

    $ cat data | LC_ALL=C uniq
    xrandr --output VGA1 --left-of LVDS1
    �
    $ cat data | LC_ALL=en_US.utf8 uniq
    xrandr --output VGA1 --left-of LVDS1
    �
    �

Pavel
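The analysis in this thread attributes the locale-dependent difference to mbrtowc returning -2 for an incomplete multibyte character. The sketch below shows that return value in isolation. The helper name `first_char_length` is hypothetical, and the sample bytes are an assumption for illustration (a lone UTF-8 lead byte 0xC3), not the contents of the attached data file. The result depends on the active locale, so the caller is assumed to have selected a UTF-8 locale with setlocale first.

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: return the raw mbrtowc() result for the first
   character found in buf[0..n), starting from a fresh conversion state.
   (size_t)-2 means the bytes form an incomplete multibyte character,
   (size_t)-1 means an invalid sequence, otherwise it is the byte count. */
static size_t first_char_length(const char *buf, size_t n)
{
    mbstate_t state;
    wchar_t wc;

    memset(&state, 0, sizeof state);  /* start in the initial shift state */
    return mbrtowc(&wc, buf, n, &state);
}
```

In a UTF-8 locale, calling `first_char_length("\xc3", 1)` should give `(size_t)-2`, while the complete two-byte sequence `"\xc3\xaf"` ("ï") with n = 2 gives 2. Code that maps the -2 case to a length of 1 still has to copy that byte into its comparison buffer; otherwise the buffer contents stay uninitialized, which is the failure described in the thread.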