Bug 1036289 - uniq: weird behavior on kind of binary input in non-C locale
Summary: uniq: weird behavior on kind of binary input in non-C locale
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: coreutils
Version: 20
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: unspecified
Target Milestone: ---
Assignee: Ondrej Vasik
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-11-30 14:48 UTC by Pavel Raiskup
Modified: 2013-12-22 23:30 UTC (History)

Fixed In Version: coreutils-8.22-3.fc21
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-12-22 23:30:54 UTC
Type: Bug


Attachments
Data file. (132 bytes, text/plain)
2013-11-30 14:48 UTC, Pavel Raiskup

Description Pavel Raiskup 2013-11-30 14:48:53 UTC
Created attachment 830960
Data file.

I tried to clean my ~/.bash_history today and noticed that the uniq utility
behaves differently than I would expect, even if this seems to be a real
corner case.  So I am marking this as low priority.  I built upstream
coreutils and it works as expected (and in a virtual Debian 7 it also works OK).

Try the behavior of uniq in FC20 (x86_64) on attached file:

  $ cat data | LC_ALL=C uniq
  xrandr --output VGA1 --left-of LVDS1
  �
  $ cat data | LC_ALL=en_US.utf8 uniq
  xrandr --output VGA1 --left-of LVDS1
  �
  �

Pavel

Comment 3 Ondrej Vasik 2013-12-01 07:43:02 UTC
Thanks for the report; reproducible. Just another issue with the multibyte patch. For reference: with a space added before the "ï" characters, it works just fine even for non-C locales in coreutils-8.17 and older (but not with the latest i18n patch), so there is something rotten anyway.

Comment 4 Ondrej Vasik 2013-12-18 21:35:38 UTC
Looking at the debug output: mbrtowc returns -2 for the first character, which means an incomplete multibyte sequence. The switch sets mblength to 1 and falls through, and j is incremented. However, this ends the for loop. As no memcpy was performed, uninitialized xmalloc'ed memory is compared, so the lines compare as different. One solution is to clear the copy buffer memory upon initialization, either with memset or by using xcalloc. A similar issue can occur in join.
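
For illustration only (this is not the code from the i18n patch): a minimal sketch of how mbrtowc reports an incomplete sequence and how a "treat it as one byte" fallback might look. The helper names use_utf8_locale and char_length are hypothetical, the locale names tried are assumptions about what is installed, and the lone 0xEF byte is an assumed stand-in for the kind of data in the attachment.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: try a few common UTF-8 locale names.
   Returns 1 if one could be selected, 0 if none is installed. */
int use_utf8_locale(void)
{
    static const char *const names[] = { "C.UTF-8", "en_US.UTF-8", "en_US.utf8" };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        if (setlocale(LC_CTYPE, names[i]) != NULL)
            return 1;
    return 0;
}

/* Hypothetical helper: length in bytes of the character starting at p,
   with at most avail bytes readable.  Invalid ((size_t)-1) and
   incomplete ((size_t)-2) sequences are treated as a single byte. */
size_t char_length(const char *p, size_t avail, mbstate_t *state)
{
    assert(avail > 0);
    mbstate_t saved = *state;
    size_t n = mbrtowc(NULL, p, avail, state);
    if (n == (size_t)-1 || n == (size_t)-2) {
        *state = saved;          /* discard any partial conversion state */
        return 1;
    }
    return n == 0 ? 1 : n;       /* a NUL character still occupies one byte */
}
```

In a UTF-8 locale, 0xEF starts a three-byte sequence, so a single 0xEF byte at the end of a line makes mbrtowc return (size_t)-2. Whatever the caller does next, it still has to make sure the byte ends up in the buffer that is compared later.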

Comment 5 Bernhard Voelker 2013-12-20 08:38:04 UTC
Referring to the patch [1].

> -+ copy[i] = xmalloc (len[i] + 1);
> ++ copy[i] = xcalloc (0, len[i] + 1);

xcalloc (0, ...) allocates zero bytes of memory; this is probably not intended.
Better to stick with xmalloc() + memset() for clarity?

[1] http://pkgs.fedoraproject.org/cgit/coreutils.git/commit/?id=f1ce0c90
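
A minimal sketch of the "allocate, then clear" approach, using plain malloc and memset from the C library instead of gnulib's xmalloc. The names alloc_zeroed and demo_alloc_zeroed are hypothetical and this is not the text of the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, standing in for "xmalloc + memset":
   returns n zero-filled bytes, or aborts if allocation fails. */
void *alloc_zeroed(size_t n)
{
    void *p = malloc(n ? n : 1);   /* avoid a zero-size request */
    if (p == NULL)
        abort();
    memset(p, 0, n);
    return p;
}

/* Usage sketch: the buffer can be compared safely even if nothing
   was copied into it, because every byte has a known value. */
void demo_alloc_zeroed(void)
{
    size_t len = 8;
    char *copy = alloc_zeroed(len + 1);
    for (size_t i = 0; i <= len; i++)
        assert(copy[i] == 0);
    free(copy);
}
```

With the standard calloc (nmemb, size), the request is for nmemb * size bytes, so calloc (1, n) yields the same n zeroed bytes as above, while calloc (0, n) requests zero bytes and any write through the result is out of bounds.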

Comment 6 Ondrej Vasik 2013-12-20 09:48:35 UTC
Yep, for clarity it would definitely be better. Thanks for spotting this...

Comment 7 Yanko Kaneti 2013-12-21 11:24:30 UTC
Patch [1] causes a reproducible malloc protection failure in a use case I have here (basically: clone gnome-menus from gnome-git, autogen, make -j7).

*** Error in `uniq': free(): invalid next size (fast): 0x00000000013b6290 ***


Core was generated by `uniq'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007fd9456531c9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56	  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x00007fd9456531c9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007fd9456548d8 in __GI_abort () at abort.c:89
#2  0x00007fd945694d94 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7fd9457a0568 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
#3  0x00007fd94569c71b in malloc_printerr (ptr=<optimized out>, str=0x7fd9457a08b8 "free(): invalid next size (fast)", action=3) at malloc.c:4888
#4  _int_free (av=0x7fd9459dd760 <main_arena>, p=<optimized out>, have_lock=0) at malloc.c:3762
#5  0x0000000000402d1e in different_multi (old=old@entry=0x1280f60 "/desktop-directories/Makefile.in\n", new=new@entry=0x1280ff0 "/depcomp\ne\ntus.lineno\n", 
    oldlen=oldlen@entry=32, newlen=newlen@entry=8, oldstate=..., oldstate@entry=..., newstate=...) at src/uniq.c:480
#6  0x00000000004023bc in check_file (delimiter=10 '\n', outfile=<optimized out>, infile=0x40718a "-") at src/uniq.c:576
#7  main (argc=<optimized out>, argv=<optimized out>) at src/uniq.c:925

Comment 8 Ondrej Vasik 2013-12-21 15:19:00 UTC
Yes, it has to be fixed (xmalloc + memset instead of xcalloc), I'm aware of it... but I can't do that before Monday.

Comment 9 Ondrej Vasik 2013-12-21 15:21:22 UTC
(If you need to have this fixed sooner, just untag the coreutils-8.22-2.fc21 from rawhide and ensure that coreutils-8.22-1.fc21 is tagged.)

Comment 10 Bernhard Voelker 2013-12-22 20:58:33 UTC
Changing arg#1 from 0 to 1 would also be okay:

-+ copy[i] = xcalloc (0, len[i] + 1);
++ copy[i] = xcalloc (1, len[i] + 1);

(no renumbering of the patch hunk lines then ... ;-)

Comment 11 Ondrej Vasik 2013-12-22 23:25:42 UTC
I'll stay with xmalloc and memset...

Comment 12 Ondrej Vasik 2013-12-22 23:30:54 UTC
(Built as coreutils-8.22-3.fc21, by the way. Closing for Rawhide, as the actual issue only affects incomplete multibyte chars, so it is very rare and to some extent a rubbish-in, rubbish-out case.)

