Bug 1751208 - Valgrind warning in tokenize.py
Summary: Valgrind warning in tokenize.py
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: python3
Version: 30
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Charalampos Stratakis
QA Contact: Fedora Extras Quality Assurance
Reported: 2019-09-11 12:26 UTC by Alexander Larsson
Modified: 2019-09-13 08:21 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2019-09-12 07:47:56 UTC
Type: Bug
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Python 38118 0 None None None 2019-09-11 15:29:23 UTC

Description Alexander Larsson 2019-09-11 12:26:54 UTC
While debugging a different issue (https://github.com/alexlarsson/gthree/issues/66), I noticed that importing "tokenize" in Python 3 triggers a warning in valgrind. It can be reproduced like this:

$ valgrind python3 /usr/lib64/python3.7/tokenize.py
==12881== Memcheck, a memory error detector
==12881== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==12881== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==12881== Command: python3 /usr/lib64/python3.7/tokenize.py
==12881== 
==12881== Conditional jump or move depends on uninitialised value(s)
==12881==    at 0x4B74CB7: PyUnicode_Decode (unicodeobject.c:3213)
==12881==    by 0x4B74F40: PyUnicode_FromEncodedObject (unicodeobject.c:3096)
==12881==    by 0x4B8B3D6: unicode_new (unicodeobject.c:15042)
==12881==    by 0x4B73164: UnknownInlinedFun (typeobject.c:951)
==12881==    by 0x4B73164: _PyObject_FastCallKeywords (call.c:199)
==12881==    by 0x4B74828: call_function (ceval.c:4619)
==12881==    by 0x4BABFC7: _PyEval_EvalFrameDefault (ceval.c:3124)
==12881==    by 0x4B63549: UnknownInlinedFun (call.c:283)
==12881==    by 0x4B63549: _PyFunction_FastCallDict (call.c:322)
==12881==    by 0x4B379C5: _PyObject_Call_Prepend (call.c:908)
==12881==    by 0x4B722E2: slot_tp_init (typeobject.c:6636)
==12881==    by 0x4B731C6: UnknownInlinedFun (typeobject.c:971)
==12881==    by 0x4B731C6: _PyObject_FastCallKeywords (call.c:199)
==12881==    by 0x4B74828: call_function (ceval.c:4619)
==12881==    by 0x4BABFC7: _PyEval_EvalFrameDefault (ceval.c:3124)


Further debugging narrows this down to the following call in tokenize.py:
  blank_re = re.compile(br'^[ \t\f]*(?:[#\r\n]|$)', re.ASCII)

The cause seems to be passing a bytes regexp for compilation:

$ valgrind python3 -c "import re; re.compile(br'a', re.ASCII)"
==12999== Memcheck, a memory error detector
==12999== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==12999== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==12999== Command: python3 -c import\ re;\ re.compile(br'a',\ re.ASCII)
==12999== 
==12999== Conditional jump or move depends on uninitialised value(s)
==12999==    at 0x4B74CB7: PyUnicode_Decode (unicodeobject.c:3213)
==12999==    by 0x4B74F40: PyUnicode_FromEncodedObject (unicodeobject.c:3096)
==12999==    by 0x4B8B3D6: unicode_new (unicodeobject.c:15042)
==12999==    by 0x4B73164: UnknownInlinedFun (typeobject.c:951)
==12999==    by 0x4B73164: _PyObject_FastCallKeywords (call.c:199)
==12999==    by 0x4B74828: call_function (ceval.c:4619)
==12999==    by 0x4BABFC7: _PyEval_EvalFrameDefault (ceval.c:3124)
==12999==    by 0x4B63549: UnknownInlinedFun (call.c:283)
==12999==    by 0x4B63549: _PyFunction_FastCallDict (call.c:322)
==12999==    by 0x4B379C5: _PyObject_Call_Prepend (call.c:908)
==12999==    by 0x4B722E2: slot_tp_init (typeobject.c:6636)
==12999==    by 0x4B731C6: UnknownInlinedFun (typeobject.c:971)
==12999==    by 0x4B731C6: _PyObject_FastCallKeywords (call.c:199)
==12999==    by 0x4B74828: call_function (ceval.c:4619)
==12999==    by 0x4BABFC7: _PyEval_EvalFrameDefault (ceval.c:3124)

It's unclear at this point whether this is the cause of the original issue, but it should still be fixed.
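
For convenience, the reduced case above can be kept as a small standalone script and run under valgrind. This is only a sketch: the pattern is copied from the tokenize.py line quoted above, while the helper name is_blank is made up for illustration and is not part of tokenize.py.

```python
import re

# Pattern copied from the tokenize.py line quoted in this report.
blank_re = re.compile(br'^[ \t\f]*(?:[#\r\n]|$)', re.ASCII)

# Reduced case from this report: compiling any bytes pattern.
simple_re = re.compile(br'a', re.ASCII)


def is_blank(line: bytes) -> bool:
    """Hypothetical helper: True if the line is blank or comment-only."""
    return blank_re.match(line) is not None


if __name__ == "__main__":
    print(is_blank(b"   # just a comment\n"))
    print(is_blank(b"x = 1\n"))
```

Running the file under valgrind compiles both patterns at import time, which is the step this report points at.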

Comment 1 Miro Hrončok 2019-09-11 12:57:48 UTC
I can reproduce this with Python 3.5 to 3.8.

$ valgrind python3 -c "import re; re.compile(b'a')"
...
==19015== Conditional jump or move depends on uninitialised value(s)
==19015==    at 0x4B6BCB7: PyUnicode_Decode (in /usr/lib64/libpython3.7m.so.1.0)
==19015==    by 0x4B6BF40: PyUnicode_FromEncodedObject (in /usr/lib64/libpython3.7m.so.1.0)
==19015==    by 0x4B823D6: ??? (in /usr/lib64/libpython3.7m.so.1.0)
==19015==    by 0x4B6A164: _PyObject_FastCallKeywords (in /usr/lib64/libpython3.7m.so.1.0)
==19015==    by 0x4B6B828: ??? (in /usr/lib64/libpython3.7m.so.1.0)
==19015==    by 0x4BA2FC7: _PyEval_EvalFrameDefault (in /usr/lib64/libpython3.7m.so.1.0)
==19015==    by 0x4B5A549: _PyFunction_FastCallDict (in /usr/lib64/libpython3.7m.so.1.0)
==19015==    by 0x4B2E9C5: _PyObject_Call_Prepend (in /usr/lib64/libpython3.7m.so.1.0)
==19015==    by 0x4B692E2: ??? (in /usr/lib64/libpython3.7m.so.1.0)
==19015==    by 0x4B6A1C6: _PyObject_FastCallKeywords (in /usr/lib64/libpython3.7m.so.1.0)
==19015==    by 0x4B6B828: ??? (in /usr/lib64/libpython3.7m.so.1.0)
==19015==    by 0x4BA2FC7: _PyEval_EvalFrameDefault (in /usr/lib64/libpython3.7m.so.1.0)

$ sudo dnf debuginfo-install python3-libs
...
$ valgrind python3 -c "import re; re.compile(b'a')"
...
==31338== Conditional jump or move depends on uninitialised value(s)
==31338==    at 0x4B6BCB7: PyUnicode_Decode (unicodeobject.c:3213)
==31338==    by 0x4B6BF40: PyUnicode_FromEncodedObject (unicodeobject.c:3096)
==31338==    by 0x4B823D6: unicode_new (unicodeobject.c:15042)
==31338==    by 0x4B6A164: UnknownInlinedFun (typeobject.c:951)
==31338==    by 0x4B6A164: _PyObject_FastCallKeywords (call.c:199)
==31338==    by 0x4B6B828: call_function (ceval.c:4619)
==31338==    by 0x4BA2FC7: _PyEval_EvalFrameDefault (ceval.c:3124)
==31338==    by 0x4B5A549: UnknownInlinedFun (call.c:283)
==31338==    by 0x4B5A549: _PyFunction_FastCallDict (call.c:322)
==31338==    by 0x4B2E9C5: _PyObject_Call_Prepend (call.c:908)
==31338==    by 0x4B692E2: slot_tp_init (typeobject.c:6636)
==31338==    by 0x4B6A1C6: UnknownInlinedFun (typeobject.c:971)
==31338==    by 0x4B6A1C6: _PyObject_FastCallKeywords (call.c:199)
==31338==    by 0x4B6B828: call_function (ceval.c:4619)
==31338==    by 0x4BA2FC7: _PyEval_EvalFrameDefault (ceval.c:3124)


I cannot reproduce with 3.4 (or 2.7).
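
A rough sketch of how the same check could be repeated across several interpreters. The interpreter names in the loop are assumptions about what is installed, and the function names are made up for illustration; the script only looks for the message text shown in the valgrind output above.

```python
import shutil
import subprocess

# Message text as it appears in the valgrind output quoted above.
UNINIT_MARKER = "Conditional jump or move depends on uninitialised value(s)"


def has_uninit_warning(valgrind_stderr):
    """Return True if valgrind's stderr contains the uninitialised-value message."""
    return UNINIT_MARKER in valgrind_stderr


def check_interpreter(python):
    """Run the reduced case under valgrind.

    Returns True or False depending on whether the message shows up,
    or None if valgrind or the interpreter is not on PATH.
    """
    if shutil.which("valgrind") is None or shutil.which(python) is None:
        return None
    proc = subprocess.run(
        ["valgrind", python, "-c", "import re; re.compile(b'a')"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        text=True,
        check=False,
    )
    return has_uninit_warning(proc.stderr)


if __name__ == "__main__":
    # Assumed interpreter names; adjust to whatever is installed.
    for name in ("python2.7", "python3.4", "python3.5", "python3.6",
                 "python3.7", "python3.8"):
        print(name, check_interpreter(name))
```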

Comment 2 Miro Hrončok 2019-09-11 13:02:45 UTC
On CPython master:


$ ./configure --with-pydebug --enable-shared && make
$ LD_LIBRARY_PATH=. valgrind -s ./python -c "import re; re.compile(b'a')"
==9395== Memcheck, a memory error detector
==9395== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==9395== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==9395== Command: ./python -c import\ re;\ re.compile(b'a')
==9395== 
==9395== Invalid read of size 4
==9395==    at 0x4934451: address_in_range (obmalloc.c:1420)
==9395==    by 0x4935F76: pymalloc_free (obmalloc.c:1866)
==9395==    by 0x4935F76: _PyObject_Free (obmalloc.c:1920)
==9395==    by 0x4935150: _PyMem_DebugRawFree (obmalloc.c:2226)
==9395==    by 0x4935174: _PyMem_DebugFree (obmalloc.c:2356)
==9395==    by 0x49362A3: PyObject_Free (obmalloc.c:709)
==9395==    by 0x4918AC4: dictresize (dictobject.c:1255)
==9395==    by 0x4918ADA: insertion_resize (dictobject.c:1015)
==9395==    by 0x4920BC2: PyDict_SetDefault (dictobject.c:2917)
==9395==    by 0x499A2C4: PyUnicode_InternInPlace (unicodeobject.c:15366)
==9395==    by 0x499A641: PyUnicode_InternFromString (unicodeobject.c:15399)
==9395==    by 0x494786D: init_slotdefs (typeobject.c:7393)
==9395==    by 0x494BAA5: add_operators (typeobject.c:7623)
==9395==  Address 0x4fd1020 is 20 bytes after a block of size 220 alloc'd
==9395==    at 0x483880B: malloc (vg_replace_malloc.c:309)
==9395==    by 0x4934C17: _PyMem_RawMalloc (obmalloc.c:99)
==9395==    by 0x4934B0C: _PyMem_DebugRawAlloc (obmalloc.c:2159)
==9395==    by 0x4934B9F: _PyMem_DebugRawMalloc (obmalloc.c:2192)
==9395==    by 0x4935EC8: PyMem_RawMalloc (obmalloc.c:572)
==9395==    by 0x49360F8: _PyMem_RawWcsdup (obmalloc.c:644)
==9395==    by 0x4A1707E: PyConfig_SetString (initconfig.c:680)
==9395==    by 0x4A18852: _PyConfig_Copy (initconfig.c:790)
==9395==    by 0x4A243C5: pycore_create_interpreter (pylifecycle.c:531)
==9395==    by 0x4A2572D: pyinit_config (pylifecycle.c:679)
==9395==    by 0x4A277F9: pyinit_core (pylifecycle.c:853)
==9395==    by 0x4A28201: Py_InitializeFromConfig (pylifecycle.c:1028)
==9395== 
==9395== Invalid read of size 4
==9395==    at 0x4934451: address_in_range (obmalloc.c:1420)
==9395==    by 0x4937148: pymalloc_realloc (obmalloc.c:1954)
==9395==    by 0x4937202: _PyObject_Realloc (obmalloc.c:2007)
==9395==    by 0x493520C: _PyMem_DebugRawRealloc (obmalloc.c:2279)
==9395==    by 0x493540E: _PyMem_DebugRealloc (obmalloc.c:2364)
==9395==    by 0x49360A4: PyMem_Realloc (obmalloc.c:623)
==9395==    by 0x4903CEB: list_resize (listobject.c:70)
==9395==    by 0x4903E78: app1 (listobject.c:340)
==9395==    by 0x49097ED: PyList_Append (listobject.c:352)
==9395==    by 0x4A1A374: r_ref (marshal.c:945)
==9395==    by 0x4A1C1E9: r_object (marshal.c:1139)
==9395==    by 0x4A1C378: r_object (marshal.c:1195)
==9395==  Address 0x4ff7020 is 16 bytes after a block of size 640 in arena "client"
==9395== 
==9395== 
==9395== HEAP SUMMARY:
==9395==     in use at exit: 326,149 bytes in 159 blocks
==9395==   total heap usage: 2,243 allocs, 2,084 frees, 3,026,903 bytes allocated
==9395== 
==9395== LEAK SUMMARY:
==9395==    definitely lost: 0 bytes in 0 blocks
==9395==    indirectly lost: 0 bytes in 0 blocks
==9395==      possibly lost: 324,553 bytes in 155 blocks
==9395==    still reachable: 1,596 bytes in 4 blocks
==9395==         suppressed: 0 bytes in 0 blocks
==9395== Rerun with --leak-check=full to see details of leaked memory
==9395== 
==9395== ERROR SUMMARY: 1002 errors from 2 contexts (suppressed: 0 from 0)
==9395== 
==9395== 69 errors in context 1 of 2:
==9395== Invalid read of size 4
==9395==    at 0x4934451: address_in_range (obmalloc.c:1420)
==9395==    by 0x4937148: pymalloc_realloc (obmalloc.c:1954)
==9395==    by 0x4937202: _PyObject_Realloc (obmalloc.c:2007)
==9395==    by 0x493520C: _PyMem_DebugRawRealloc (obmalloc.c:2279)
==9395==    by 0x493540E: _PyMem_DebugRealloc (obmalloc.c:2364)
==9395==    by 0x49360A4: PyMem_Realloc (obmalloc.c:623)
==9395==    by 0x4903CEB: list_resize (listobject.c:70)
==9395==    by 0x4903E78: app1 (listobject.c:340)
==9395==    by 0x49097ED: PyList_Append (listobject.c:352)
==9395==    by 0x4A1A374: r_ref (marshal.c:945)
==9395==    by 0x4A1C1E9: r_object (marshal.c:1139)
==9395==    by 0x4A1C378: r_object (marshal.c:1195)
==9395==  Address 0x4ff7020 is 16 bytes after a block of size 640 in arena "client"
==9395== 
==9395== 
==9395== 933 errors in context 2 of 2:
==9395== Invalid read of size 4
==9395==    at 0x4934451: address_in_range (obmalloc.c:1420)
==9395==    by 0x4935F76: pymalloc_free (obmalloc.c:1866)
==9395==    by 0x4935F76: _PyObject_Free (obmalloc.c:1920)
==9395==    by 0x4935150: _PyMem_DebugRawFree (obmalloc.c:2226)
==9395==    by 0x4935174: _PyMem_DebugFree (obmalloc.c:2356)
==9395==    by 0x49362A3: PyObject_Free (obmalloc.c:709)
==9395==    by 0x4918AC4: dictresize (dictobject.c:1255)
==9395==    by 0x4918ADA: insertion_resize (dictobject.c:1015)
==9395==    by 0x4920BC2: PyDict_SetDefault (dictobject.c:2917)
==9395==    by 0x499A2C4: PyUnicode_InternInPlace (unicodeobject.c:15366)
==9395==    by 0x499A641: PyUnicode_InternFromString (unicodeobject.c:15399)
==9395==    by 0x494786D: init_slotdefs (typeobject.c:7393)
==9395==    by 0x494BAA5: add_operators (typeobject.c:7623)
==9395==  Address 0x4fd1020 is 20 bytes after a block of size 220 alloc'd
==9395==    at 0x483880B: malloc (vg_replace_malloc.c:309)
==9395==    by 0x4934C17: _PyMem_RawMalloc (obmalloc.c:99)
==9395==    by 0x4934B0C: _PyMem_DebugRawAlloc (obmalloc.c:2159)
==9395==    by 0x4934B9F: _PyMem_DebugRawMalloc (obmalloc.c:2192)
==9395==    by 0x4935EC8: PyMem_RawMalloc (obmalloc.c:572)
==9395==    by 0x49360F8: _PyMem_RawWcsdup (obmalloc.c:644)
==9395==    by 0x4A1707E: PyConfig_SetString (initconfig.c:680)
==9395==    by 0x4A18852: _PyConfig_Copy (initconfig.c:790)
==9395==    by 0x4A243C5: pycore_create_interpreter (pylifecycle.c:531)
==9395==    by 0x4A2572D: pyinit_config (pylifecycle.c:679)
==9395==    by 0x4A277F9: pyinit_core (pylifecycle.c:853)
==9395==    by 0x4A28201: Py_InitializeFromConfig (pylifecycle.c:1028)
==9395== 
==9395== ERROR SUMMARY: 1002 errors from 2 contexts (suppressed: 0 from 0)
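
Both contexts above are in address_in_range in obmalloc.c, which looks like the pymalloc behaviour that CPython's Misc/README.valgrind describes as expected under valgrind. A possible way to take pymalloc out of the picture is to rerun with the PYTHONMALLOC=malloc environment variable (available since Python 3.6). The sketch below only builds the command and environment; the function names are made up for illustration, and the ./python path and LD_LIBRARY_PATH value simply mirror the command used above.

```python
import os


def valgrind_command(python="./python", code="import re; re.compile(b'a')"):
    """Build the valgrind command line used above (hypothetical helper)."""
    return ["valgrind", "-s", python, "-c", code]


def valgrind_env(base_env=None):
    """Copy the environment and switch CPython to the system allocator."""
    env = dict(os.environ if base_env is None else base_env)
    env["PYTHONMALLOC"] = "malloc"  # bypass pymalloc
    env.setdefault("LD_LIBRARY_PATH", ".")  # shared build in the source tree
    return env


if __name__ == "__main__":
    print(" ".join(valgrind_command()))
    print(valgrind_env({})["PYTHONMALLOC"])
```

The two return values can be passed to subprocess.run(cmd, env=env) from the CPython build directory.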

Comment 3 Victor Stinner 2019-09-11 15:29:24 UTC
I reported the issue to Python upstream: https://bugs.python.org/issue38118

Comment 4 Miro Hrončok 2019-09-12 07:47:56 UTC
Thanks Victor. 


Alexander, I'm closing this; it will most likely get fixed upstream eventually. If you need us to backport the fix or actively work on this, please reopen and let us know.

Comment 5 Alexander Larsson 2019-09-12 16:05:26 UTC
Sigh, it was closed because one of the warnings was already fixed :(

Comment 6 Miro Hrončok 2019-09-12 16:32:15 UTC
Yes, it will be fixed in the next 3.8 and 3.7 release. Why :( ?

Comment 7 Victor Stinner 2019-09-13 08:09:50 UTC
> Sigh, it was closed because one of the warnings was already fixed :(

If you are talking about https://bugs.python.org/issue38118 : I just reopened it.

Comment 8 Alexander Larsson 2019-09-13 08:21:04 UTC
Thanks!

