Bug 238406

Summary:	printf segfault (buffer overrun?) for large precision in multi-byte locale
Product:	[Fedora] Fedora	Reporter:	Jim Meyering <meyering>
Component:	glibc	Assignee:	Jakub Jelinek <jakub>
Status:	CLOSED RAWHIDE	QA Contact:	Brian Brock <bbrock>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	rawhide	CC:	fweimer
Target Milestone:	---
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:	2.5.90-22	Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2007-05-08 07:05:14 UTC	Type:	---

Description Jim Meyering 2007-04-30 11:07:23 UTC

Description of problem:
Excessive precision in a printf format string provokes a segfault.
I.e., "%1.Ns" for large N.


Version-Release number of selected component (if applicable):


How reproducible:
Consistently, on rawhide (updated a day or two ago) and RHEL 3 & 4.


Steps to Reproduce:
1. LC_ALL=en_US.UTF-8 /usr/bin/printf %1.25000000s x; echo
  
Actual results:
Segmentation fault

Expected results:
x

Additional info:
You can reproduce the failure using a simple C program, too:

$ cat kk.c
#include <stdio.h>
#include <locale.h>
int
main ()
{
  setlocale (LC_ALL, "");
  printf ("%1.25000000s", "x");
  return 0;
}
$ gcc -O kk.c && LC_ALL=fr_FR.utf8 ./a.out
zsh: segmentation fault  LC_ALL=fr_FR.utf8 ./a.out
[Exit 139 (SEGV)]

I looked at libc/stdio-common/vfprintf.c's process_string_arg macro, and spotted
this:

  len = prec != -1 ? (size_t) prec : strlen (mbs);	 \
  if (__libc_use_alloca (len * sizeof (wchar_t)))	 \
    string = (CHAR_T *) alloca (len * sizeof (wchar_t)); \

I'm not sure it's related -- haven't used a debugger or rebuilt -- but in that
test, "len * sizeof (wchar_t)" can overflow, which leads to allocating far less
space than is eventually used -> buffer overrun.
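
A minimal sketch of how such a size computation could be guarded against wrap-around, whether or not that is what causes this particular crash. The helper name checked_array_size is hypothetical and is not part of glibc; it only illustrates checking the multiplication before the result is used for an allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not glibc code): store count * elem_size in *out.
   Returns false, leaving *out untouched, if the product would not fit
   in a size_t and would therefore wrap around.  */
static bool
checked_array_size (size_t count, size_t elem_size, size_t *out)
{
  assert (out != NULL);
  if (elem_size != 0 && count > SIZE_MAX / elem_size)
    return false;               /* product would overflow size_t */
  *out = count * elem_size;
  return true;
}
```

A caller would test the return value first and fall back to a heap allocation or an error path when it is false, rather than passing a wrapped-around size to alloca.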

For the record, this started with a report filed against coreutils' printf
command: <http://bugs.debian.org/421555>.

Comment 1 Victor Stinner 2007-04-30 12:40:53 UTC

(Copy of my email sent yesterday at 3:00AM.)
Hi,

I am using your personal email addresses [of a few libc developers] because the bug might be 
a security vulnerability (I don't know the Linux kernel well enough to judge). If it is 
not, I can open a bug report on Bugzilla if you would like.

I found a bug in the dpkg program (from the Debian project):
   COLUMNS=10000000 dpkg -l
=> Crash with segfault (SIGSEGV)

After a long investigation (around one week :-)), I'm certain that the bug comes
from GNU libc. The crash is not specific to this program: any program that lets
the user change the printf() format string may crash. Smallest C test case:
-------------------------------------------------------------
#include <stdlib.h>
#include <stdio.h>
#include <locale.h>

int main()
{
    setlocale (LC_CTYPE, "");
    printf("%-1.30500200s\n", "Hello");
    return 0;
}
-------------------------------------------------------------

If your locale is not UTF-8, pass another multibyte locale to setlocale(). 
The value "30500200" just has to be large enough that the temporary array
exceeds the current stack size limit.

You can also try with the bash/coreutils printf:
-------------------------------------------------------------
printf '%-1.25000000s' 'Hello'
-------------------------------------------------------------


The bug is located in stdio-common/vfprintf.c, macro "process_string_arg", in 
this block:
-------------------------------------------------------------
   if (prec != -1)
     {
       /* Search for the end of the string, but don't search past
          the length (in bytes) specified by the precision.  Also
          don't use incomplete characters.  */
       if (_NL_CURRENT_WORD (LC_CTYPE, _NL_CTYPE_MB_CUR_MAX) == 1)
         len = __strnlen (string, prec);
       else
         {
           /* In case we have a multibyte character set the
              situation is more complicated.  We must not copy
              bytes at the end which form an incomplete character. */
           wchar_t ignore[prec];
           const char *str2 = string;
           mbstate_t ps;

           memset (&ps, '\0', sizeof (ps));
           if (__mbsnrtowcs (ignore, &str2, prec, prec, &ps)
               == (size_t) -1)
             {
               done = -1;
               goto all_done;
             }
           if (str2 == NULL)
             len = strlen (string);
           else
             len = str2 - string - (ps.__count & 7);
         }
     }
   else
     len = strlen (string);
-------------------------------------------------------------

If a precision is given (prec != -1) and LC_CTYPE[_NL_CTYPE_MB_CUR_MAX] is
greater than 1, we enter the "complicated" block :-)

Now imagine that prec is equal to 30500200: an array of about 30 million
wchar_t elements (roughly 122 MB with glibc's 4-byte wchar_t) will be
"allocated" on the stack (by "wchar_t ignore[prec]"), whereas Linux uses 8 MB
(in the default configuration) as the stack limit. The stack *should* grow
automatically, but on my computer (i386) gcc just uses a "sub %eax, %esp"
instruction to allocate the memory, and Linux just raises the SIGSEGV signal.
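
The comparison between the array size and the stack limit can be sketched as below. This is only an illustration: the helper names vla_exceeds_limit and vla_exceeds_stack_rlimit are hypothetical, and the check ignores the stack space already in use, so it shows when the array alone cannot fit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/resource.h>
#include <wchar.h>

/* Hypothetical helper: would `wchar_t ignore[prec]` need more than
   limit_bytes bytes?  Written as a division to avoid overflow.  */
static bool
vla_exceeds_limit (size_t prec, unsigned long long limit_bytes)
{
  return prec > limit_bytes / sizeof (wchar_t);
}

/* Same question against the current soft RLIMIT_STACK.  Returns false
   when the limit cannot be read or is unlimited.  */
static bool
vla_exceeds_stack_rlimit (size_t prec)
{
  struct rlimit rl;

  if (getrlimit (RLIMIT_STACK, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
    return false;
  return vla_exceeds_limit (prec, (unsigned long long) rl.rlim_cur);
}
```

With an 8 MiB limit, vla_exceeds_limit (30500200, 8ULL * 1024 * 1024) is true, while a precision of 1000 stays well below the limit.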

I don't know the locale API (the mbsnrtowcs() function) well enough to fix the bug.
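
For illustration only, here is one way the byte length could be computed without a precision-sized array on the stack: walk the string one character at a time with the standard mbrlen() function. This is a sketch under my own assumptions, not the change that was made upstream, and the helper name complete_prefix_len is hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not glibc code): return the number of bytes, at
   most prec, that form complete multibyte characters at the start of s
   in the current locale.  Returns (size_t) -1 on an invalid sequence.
   Uses constant stack space regardless of prec.  */
static size_t
complete_prefix_len (const char *s, size_t prec)
{
  mbstate_t ps;
  size_t len = 0;

  assert (s != NULL);
  memset (&ps, '\0', sizeof ps);
  while (len < prec)
    {
      size_t n = mbrlen (s + len, prec - len, &ps);
      if (n == (size_t) -1)
        return (size_t) -1;     /* invalid multibyte sequence */
      if (n == (size_t) -2)
        break;                  /* incomplete trailing character: drop it */
      if (n == 0)
        break;                  /* reached the terminating NUL */
      len += n;
    }
  return len;
}
```

For the test case above, complete_prefix_len ("Hello", 30500200) returns 5 without touching more than a few bytes of stack.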

Victor Stinner
http://www.inl.fr/

Comment 2 Jakub Jelinek 2007-05-01 11:16:59 UTC

Fixed upstream.

Comment 3 Jakub Jelinek 2007-05-08 07:05:14 UTC

Fixed in glibc-2.5.90-22.