Bug 238406

Summary: printf segfault (buffer overrun?) for large precision in multi-byte locale
Product: [Fedora] Fedora
Reporter: Jim Meyering <meyering>
Component: glibc
Assignee: Jakub Jelinek <jakub>
Status: CLOSED RAWHIDE
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: rawhide
CC: fweimer
Hardware: All
OS: Linux
Fixed In Version: 2.5.90-22
Doc Type: Bug Fix
Last Closed: 2007-05-08 07:05:14 UTC

Description Jim Meyering 2007-04-30 11:07:23 UTC
Description of problem:
Excessive precision in a printf format string provokes a segfault when a
multibyte locale is in use, i.e., "%1.Ns" for large N.


Version-Release number of selected component (if applicable):


How reproducible:
Consistently, on rawhide (updated a day or two ago) and RHEL 3 & 4.


Steps to Reproduce:
1. LC_ALL=en_US.UTF-8 /usr/bin/printf %1.25000000s x; echo
  
Actual results:
Segmentation fault

Expected results:
x

Additional info:
You can reproduce the failure using a simple C program, too:

$ cat kk.c
#include <stdio.h>
#include <locale.h>
int
main ()
{
  setlocale (LC_ALL, "");
  printf ("%1.25000000s", "x");
  return 0;
}
$ gcc -O kk.c && LC_ALL=fr_FR.utf8 ./a.out
zsh: segmentation fault  LC_ALL=fr_FR.utf8 ./a.out
[Exit 139 (SEGV)]

I looked at libc/stdio-common/vfprintf.c's process_string_arg macro, and spotted
this:

  len = prec != -1 ? (size_t) prec : strlen (mbs);	 \
  if (__libc_use_alloca (len * sizeof (wchar_t)))	 \
    string = (CHAR_T *) alloca (len * sizeof (wchar_t)); \

I'm not sure it's related (I haven't used a debugger or rebuilt), but in that
code "len * sizeof (wchar_t)" can overflow, which would lead to allocating far
less space than is eventually used, i.e. a buffer overrun.
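For illustration only, here is a minimal sketch of the kind of overflow check that the alloca size computation above lacks. The helper name wchar_bytes_checked is hypothetical and not part of glibc. Note that with the 4-byte wchar_t glibc uses on Linux, 25000000 * 4 = 100000000, which fits even in a 32-bit size_t, so this multiplication by itself would not overflow for the precision used in this report.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper: store N * sizeof (wchar_t) in *BYTES and return
   true, or return false if the product would not fit in size_t.  */
static bool
wchar_bytes_checked (size_t n, size_t *bytes)
{
  if (n > SIZE_MAX / sizeof (wchar_t))
    return false;               /* n * sizeof (wchar_t) would wrap around */
  *bytes = n * sizeof (wchar_t);
  return true;
}
```

A caller would treat a false return as an allocation failure instead of passing a wrapped-around size to alloca() or malloc().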

For the record, this started with a report filed against coreutils' printf
command: <http://bugs.debian.org/421555>.

Comment 1 Victor Stinner 2007-04-30 12:40:53 UTC
(Copy of my email sent yesterday at 3:00AM.)
Hi,

I am writing to your personal email addresses [those of a few libc developers]
because the bug might be a security vulnerability (I don't know the Linux kernel
well enough to tell). If it is not, I can open a bug report on Bugzilla if you like.

I found a bug in the dpkg program (Debian's package manager, used by apt-get):
   COLUMNS=10000000 dpkg -l
=> Crash with segfault (SIGSEGV)

After a long investigation (around one week :-)), I'm certain that the bug comes
from GNU libc. The crash is not specific to this program: any program that lets
the user change the format string passed to printf() may crash. Smallest C test case:
-------------------------------------------------------------
#include <stdlib.h>
#include <stdio.h>
#include <locale.h>

int main()
{
    setlocale (LC_CTYPE, "");
    printf("%-1.30500200s\n", "Hello");
    return 0;
}
-------------------------------------------------------------

If your locale is not UTF-8, pass another multibyte locale to setlocale().
The value 30500200 just has to be large enough to exceed the current stack size limit.

You can also try it with the bash or coreutils printf:
-------------------------------------------------------------
printf '%-1.25000000s' 'Hello'
-------------------------------------------------------------


The bug is located in stdio-common/vfprintf.c, macro "process_string_arg", in 
this block:
-------------------------------------------------------------
   if (prec != -1)
     {
       /* Search for the end of the string, but don't search past
          the length (in bytes) specified by the precision.  Also
          don't use incomplete characters.  */
       if (_NL_CURRENT_WORD (LC_CTYPE, _NL_CTYPE_MB_CUR_MAX) == 1)
         len = __strnlen (string, prec);
       else
         {
           /* In case we have a multibyte character set the
              situation is more compilcated.  We must not copy
              bytes at the end which form an incomplete character. */
           wchar_t ignore[prec];
           const char *str2 = string;
           mbstate_t ps;

           memset (&ps, '\0', sizeof (ps));
           if (__mbsnrtowcs (ignore, &str2, prec, prec, &ps)
               == (size_t) -1)
             {
               done = -1;
               goto all_done;
             }
           if (str2 == NULL)
             len = strlen (string);
           else
             len = str2 - string - (ps.__count & 7);
         }
     }
   else
     len = strlen (string);
-------------------------------------------------------------
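The ignore array above exists only so that __mbsnrtowcs has somewhere to store wide characters that are then thrown away; what the code actually needs is the byte length of the longest prefix that fits in prec bytes and ends on a character boundary. The following is a minimal sketch, not the upstream fix, of computing that length with the standard mbrtowc() function and constant stack usage. The name complete_prefix_len is hypothetical, and the sketch assumes setlocale() has already selected the locale.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: length in bytes of the longest prefix of S that
   is at most PREC bytes long, stops at the terminating NUL, and contains
   only complete multibyte characters in the current LC_CTYPE locale.
   Returns (size_t) -1 if an invalid sequence is found within that range.
   Stack usage is constant, whatever the value of PREC.  */
static size_t
complete_prefix_len (const char *s, size_t prec)
{
  mbstate_t ps;
  size_t len = 0;

  memset (&ps, 0, sizeof ps);
  while (len < prec)
    {
      /* Convert one character, looking at no more than the bytes still
         allowed by the precision.  The wide character itself is
         discarded (pwc == NULL).  */
      size_t n = mbrtowc (NULL, s + len, prec - len, &ps);

      if (n == 0)                   /* reached the terminating NUL */
        break;
      if (n == (size_t) -1)         /* invalid multibyte sequence */
        return (size_t) -1;
      if (n == (size_t) -2)         /* character cut off by the precision */
        break;
      len += n;
    }
  return len;
}
```

In a UTF-8 locale, complete_prefix_len ("h\xc3\xa9llo", 2) should return 1, because a precision of 2 would cut the two-byte "é" in half; in every locale, complete_prefix_len ("Hello", 30500200) returns 5 without allocating anything proportional to the precision.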

If a precision is given (prec != -1) and MB_CUR_MAX for LC_CTYPE is greater
than 1, we go into the "complicated" block :-)
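The branch selection can be mirrored in a few lines. This sketch is hypothetical: it approximates the glibc-internal _NL_CURRENT_WORD (LC_CTYPE, _NL_CTYPE_MB_CUR_MAX) lookup with the public MB_CUR_MAX macro from <stdlib.h>, and takes_multibyte_path is an illustrative name, not a glibc function.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper mirroring the test in process_string_arg:
   a precision was given (prec != -1) and the current LC_CTYPE locale
   may use more than one byte per character.  */
static bool
takes_multibyte_path (int prec)
{
  return prec != -1 && MB_CUR_MAX > 1;
}
```

A program starts in the "C" locale, where MB_CUR_MAX is 1 in glibc, so this returns false until something like setlocale (LC_CTYPE, "") selects a multibyte locale such as en_US.UTF-8. That is why both test cases call setlocale() before printf().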

Now imagine that prec is equal to 30500200: "wchar_t ignore[prec]" will
"allocate" 30500200 wide characters on the stack, i.e. about 122 MB with a
4-byte wchar_t, whereas Linux limits the stack to 8 MB by default. The stack
does grow automatically, but only up to that limit: on my computer (i386), gcc
just uses a "sub %eax, %esp" instruction to allocate the array, the stack
pointer lands far outside the mapped stack, and Linux raises SIGSEGV on the
next access.
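A rough way to check this arithmetic is to compare the size of the variable-length array with the soft stack limit reported by getrlimit (RLIMIT_STACK). The helper names below are hypothetical, and the check ignores the stack already in use, so it only shows the order of magnitude involved rather than the exact crash threshold.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>
#include <sys/resource.h>

/* Hypothetical helper: would "wchar_t ignore[prec]" need more than
   LIMIT_BYTES bytes?  Element counts are compared instead of byte
   counts, so prec * sizeof (wchar_t) is never computed and cannot
   overflow.  */
static bool
wchar_vla_exceeds (size_t prec, size_t limit_bytes)
{
  return prec > limit_bytes / sizeof (wchar_t);
}

/* Hypothetical helper: compare against the soft stack limit.  Returns
   false if the limit is unlimited or cannot be queried.  Stack space
   already in use is not taken into account.  */
static bool
wchar_vla_exceeds_stack_limit (size_t prec)
{
  struct rlimit rl;

  if (getrlimit (RLIMIT_STACK, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
    return false;
  if (rl.rlim_cur > SIZE_MAX)   /* treat as effectively unlimited */
    return false;
  return wchar_vla_exceeds (prec, (size_t) rl.rlim_cur);
}
```

With the default 8 MiB limit (8388608 bytes) and a 4-byte wchar_t, any precision above 8388608 / 4 = 2097152 already needs more stack than the limit allows in a multibyte locale; 30500200 is far beyond that.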

I don't know the locale API (the mbsnrtowcs() function) well enough to fix the bug.

Victor Stinner
http://www.inl.fr/

Comment 2 Jakub Jelinek 2007-05-01 11:16:59 UTC
Fixed upstream.

Comment 3 Jakub Jelinek 2007-05-08 07:05:14 UTC
Fixed in glibc-2.5.90-22.