Bug 1751208

Summary: Valgrind warning in tokenize.py
Product: [Fedora] Fedora
Reporter: Alexander Larsson <alexl>
Component: python3
Assignee: Charalampos Stratakis <cstratak>
Status: CLOSED UPSTREAM
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 30
CC: cstratak, dmalcolm, m.cyprian, mhroncok, pviktori, rkuska, shcherbina.iryna, slavek.kabrda, tomspur, torsava, vstinner
Last Closed: 2019-09-12 07:47:56 UTC
Type: Bug

Description Alexander Larsson 2019-09-11 12:26:54 UTC
While debugging a different issue (https://github.com/alexlarsson/gthree/issues/66), I noticed that importing "tokenize" in Python 3 triggers a warning in valgrind. It can be reproduced like this:

$ valgrind python3 /usr/lib64/python3.7/tokenize.py
==12881== Memcheck, a memory error detector
==12881== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==12881== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==12881== Command: python3 /usr/lib64/python3.7/tokenize.py
==12881== 
==12881== Conditional jump or move depends on uninitialised value(s)
==12881==    at 0x4B74CB7: PyUnicode_Decode (unicodeobject.c:3213)
==12881==    by 0x4B74F40: PyUnicode_FromEncodedObject (unicodeobject.c:3096)
==12881==    by 0x4B8B3D6: unicode_new (unicodeobject.c:15042)
==12881==    by 0x4B73164: UnknownInlinedFun (typeobject.c:951)
==12881==    by 0x4B73164: _PyObject_FastCallKeywords (call.c:199)
==12881==    by 0x4B74828: call_function (ceval.c:4619)
==12881==    by 0x4BABFC7: _PyEval_EvalFrameDefault (ceval.c:3124)
==12881==    by 0x4B63549: UnknownInlinedFun (call.c:283)
==12881==    by 0x4B63549: _PyFunction_FastCallDict (call.c:322)
==12881==    by 0x4B379C5: _PyObject_Call_Prepend (call.c:908)
==12881==    by 0x4B722E2: slot_tp_init (typeobject.c:6636)
==12881==    by 0x4B731C6: UnknownInlinedFun (typeobject.c:971)
==12881==    by 0x4B731C6: _PyObject_FastCallKeywords (call.c:199)
==12881==    by 0x4B74828: call_function (ceval.c:4619)
==12881==    by 0x4BABFC7: _PyEval_EvalFrameDefault (ceval.c:3124)


Further debugging narrows this down to the following call in tokenize.py:
  blank_re = re.compile(br'^[ \t\f]*(?:[#\r\n]|$)', re.ASCII)

The cause seems to be passing a binary (bytes) regexp to re.compile:

$ valgrind python3 -c "import re; re.compile(br'a', re.ASCII)"
==12999== Memcheck, a memory error detector
==12999== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==12999== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==12999== Command: python3 -c import\ re;\ re.compile(br'a',\ re.ASCII)
==12999== 
==12999== Conditional jump or move depends on uninitialised value(s)
==12999==    at 0x4B74CB7: PyUnicode_Decode (unicodeobject.c:3213)
==12999==    by 0x4B74F40: PyUnicode_FromEncodedObject (unicodeobject.c:3096)
==12999==    by 0x4B8B3D6: unicode_new (unicodeobject.c:15042)
==12999==    by 0x4B73164: UnknownInlinedFun (typeobject.c:951)
==12999==    by 0x4B73164: _PyObject_FastCallKeywords (call.c:199)
==12999==    by 0x4B74828: call_function (ceval.c:4619)
==12999==    by 0x4BABFC7: _PyEval_EvalFrameDefault (ceval.c:3124)
==12999==    by 0x4B63549: UnknownInlinedFun (call.c:283)
==12999==    by 0x4B63549: _PyFunction_FastCallDict (call.c:322)
==12999==    by 0x4B379C5: _PyObject_Call_Prepend (call.c:908)
==12999==    by 0x4B722E2: slot_tp_init (typeobject.c:6636)
==12999==    by 0x4B731C6: UnknownInlinedFun (typeobject.c:971)
==12999==    by 0x4B731C6: _PyObject_FastCallKeywords (call.c:199)
==12999==    by 0x4B74828: call_function (ceval.c:4619)
==12999==    by 0x4BABFC7: _PyEval_EvalFrameDefault (ceval.c:3124)

It's unclear at this point whether this is the cause of the original issue, but it should still be fixed.
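
For reference, the trigger can be isolated into a small standalone script. This is only a sketch: the pattern is copied from the tokenize.py line quoted above, while the helper name compile_bytes_pattern is made up for illustration. Running it on its own does not show the warning; it just gives valgrind a minimal target that compiles a bytes pattern with re.ASCII.

```python
import re

# Pattern copied from the tokenize.py line quoted in this report.
BLANK_RE_SOURCE = br'^[ \t\f]*(?:[#\r\n]|$)'


def compile_bytes_pattern(source=BLANK_RE_SOURCE):
    """Compile a bytes regexp with re.ASCII, as tokenize.py does.

    Hypothetical helper, not part of tokenize or re.
    """
    return re.compile(source, re.ASCII)


if __name__ == "__main__":
    pattern = compile_bytes_pattern()
    # The compiled object keeps the bytes source it was built from.
    print(type(pattern.pattern).__name__)
    # A blank-or-comment line matches; a line of code does not.
    print(pattern.match(b"   # comment\n") is not None)
    print(pattern.match(b"x = 1\n") is not None)
```

Saved to a file, it can be run as "valgrind python3 <file>" in the same way as the commands above.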

Comment 1 Miro Hrončok 2019-09-11 12:57:48 UTC
I can reproduce this with Python 3.5 to 3.8.

$ valgrind python3 -c "import re; re.compile(b'a')"
...
==19015== Conditional jump or move depends on uninitialised value(s)
==19015==    at 0x4B6BCB7: PyUnicode_Decode (in /usr/lib64/libpython3.7m.so.1.0)
==19015==    by 0x4B6BF40: PyUnicode_FromEncodedObject (in /usr/lib64/libpython3.7m.so.1.0)
==19015==    by 0x4B823D6: ??? (in /usr/lib64/libpython3.7m.so.1.0)
==19015==    by 0x4B6A164: _PyObject_FastCallKeywords (in /usr/lib64/libpython3.7m.so.1.0)
==19015==    by 0x4B6B828: ??? (in /usr/lib64/libpython3.7m.so.1.0)
==19015==    by 0x4BA2FC7: _PyEval_EvalFrameDefault (in /usr/lib64/libpython3.7m.so.1.0)
==19015==    by 0x4B5A549: _PyFunction_FastCallDict (in /usr/lib64/libpython3.7m.so.1.0)
==19015==    by 0x4B2E9C5: _PyObject_Call_Prepend (in /usr/lib64/libpython3.7m.so.1.0)
==19015==    by 0x4B692E2: ??? (in /usr/lib64/libpython3.7m.so.1.0)
==19015==    by 0x4B6A1C6: _PyObject_FastCallKeywords (in /usr/lib64/libpython3.7m.so.1.0)
==19015==    by 0x4B6B828: ??? (in /usr/lib64/libpython3.7m.so.1.0)
==19015==    by 0x4BA2FC7: _PyEval_EvalFrameDefault (in /usr/lib64/libpython3.7m.so.1.0)

$ sudo dnf debuginfo-install python3-libs
...
$ valgrind python3 -c "import re; re.compile(b'a')"
...
==31338== Conditional jump or move depends on uninitialised value(s)
==31338==    at 0x4B6BCB7: PyUnicode_Decode (unicodeobject.c:3213)
==31338==    by 0x4B6BF40: PyUnicode_FromEncodedObject (unicodeobject.c:3096)
==31338==    by 0x4B823D6: unicode_new (unicodeobject.c:15042)
==31338==    by 0x4B6A164: UnknownInlinedFun (typeobject.c:951)
==31338==    by 0x4B6A164: _PyObject_FastCallKeywords (call.c:199)
==31338==    by 0x4B6B828: call_function (ceval.c:4619)
==31338==    by 0x4BA2FC7: _PyEval_EvalFrameDefault (ceval.c:3124)
==31338==    by 0x4B5A549: UnknownInlinedFun (call.c:283)
==31338==    by 0x4B5A549: _PyFunction_FastCallDict (call.c:322)
==31338==    by 0x4B2E9C5: _PyObject_Call_Prepend (call.c:908)
==31338==    by 0x4B692E2: slot_tp_init (typeobject.c:6636)
==31338==    by 0x4B6A1C6: UnknownInlinedFun (typeobject.c:971)
==31338==    by 0x4B6A1C6: _PyObject_FastCallKeywords (call.c:199)
==31338==    by 0x4B6B828: call_function (ceval.c:4619)
==31338==    by 0x4BA2FC7: _PyEval_EvalFrameDefault (ceval.c:3124)


I cannot reproduce with 3.4 (or 2.7).
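
A rough sketch of how the per-version check could be scripted, assuming valgrind and the listed interpreters are on PATH. The helper names and the exit code 99 are arbitrary choices for illustration; -q and --error-exitcode are standard valgrind options.

```python
import shutil
import subprocess

# One-liner from this comment that triggers the warning.
SNIPPET = "import re; re.compile(b'a')"

# Arbitrary exit code that valgrind returns when it reports errors.
ERROR_EXIT_CODE = 99


def valgrind_command(python, snippet=SNIPPET):
    """Build the valgrind command line for one interpreter (hypothetical helper)."""
    return [
        "valgrind",
        "-q",
        "--error-exitcode=%d" % ERROR_EXIT_CODE,
        python,
        "-c",
        snippet,
    ]


def reproduces(python):
    """Return True if valgrind reported errors, False if not, None if a tool is missing."""
    if shutil.which("valgrind") is None or shutil.which(python) is None:
        return None
    result = subprocess.run(
        valgrind_command(python),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == ERROR_EXIT_CODE


if __name__ == "__main__":
    for name in ["python2.7", "python3.4", "python3.5", "python3.6", "python3.7", "python3.8"]:
        print(name, reproduces(name))
```

Note that this only reports whether valgrind saw any error at all, so unrelated warnings would also count as a reproduction.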

Comment 2 Miro Hrončok 2019-09-11 13:02:45 UTC
On CPython master:


$ ./configure --with-pydebug --enable-shared && make
$ LD_LIBRARY_PATH=. valgrind -s ./python -c "import re; re.compile(b'a')"
==9395== Memcheck, a memory error detector
==9395== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==9395== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==9395== Command: ./python -c import\ re;\ re.compile(b'a')
==9395== 
==9395== Invalid read of size 4
==9395==    at 0x4934451: address_in_range (obmalloc.c:1420)
==9395==    by 0x4935F76: pymalloc_free (obmalloc.c:1866)
==9395==    by 0x4935F76: _PyObject_Free (obmalloc.c:1920)
==9395==    by 0x4935150: _PyMem_DebugRawFree (obmalloc.c:2226)
==9395==    by 0x4935174: _PyMem_DebugFree (obmalloc.c:2356)
==9395==    by 0x49362A3: PyObject_Free (obmalloc.c:709)
==9395==    by 0x4918AC4: dictresize (dictobject.c:1255)
==9395==    by 0x4918ADA: insertion_resize (dictobject.c:1015)
==9395==    by 0x4920BC2: PyDict_SetDefault (dictobject.c:2917)
==9395==    by 0x499A2C4: PyUnicode_InternInPlace (unicodeobject.c:15366)
==9395==    by 0x499A641: PyUnicode_InternFromString (unicodeobject.c:15399)
==9395==    by 0x494786D: init_slotdefs (typeobject.c:7393)
==9395==    by 0x494BAA5: add_operators (typeobject.c:7623)
==9395==  Address 0x4fd1020 is 20 bytes after a block of size 220 alloc'd
==9395==    at 0x483880B: malloc (vg_replace_malloc.c:309)
==9395==    by 0x4934C17: _PyMem_RawMalloc (obmalloc.c:99)
==9395==    by 0x4934B0C: _PyMem_DebugRawAlloc (obmalloc.c:2159)
==9395==    by 0x4934B9F: _PyMem_DebugRawMalloc (obmalloc.c:2192)
==9395==    by 0x4935EC8: PyMem_RawMalloc (obmalloc.c:572)
==9395==    by 0x49360F8: _PyMem_RawWcsdup (obmalloc.c:644)
==9395==    by 0x4A1707E: PyConfig_SetString (initconfig.c:680)
==9395==    by 0x4A18852: _PyConfig_Copy (initconfig.c:790)
==9395==    by 0x4A243C5: pycore_create_interpreter (pylifecycle.c:531)
==9395==    by 0x4A2572D: pyinit_config (pylifecycle.c:679)
==9395==    by 0x4A277F9: pyinit_core (pylifecycle.c:853)
==9395==    by 0x4A28201: Py_InitializeFromConfig (pylifecycle.c:1028)
==9395== 
==9395== Invalid read of size 4
==9395==    at 0x4934451: address_in_range (obmalloc.c:1420)
==9395==    by 0x4937148: pymalloc_realloc (obmalloc.c:1954)
==9395==    by 0x4937202: _PyObject_Realloc (obmalloc.c:2007)
==9395==    by 0x493520C: _PyMem_DebugRawRealloc (obmalloc.c:2279)
==9395==    by 0x493540E: _PyMem_DebugRealloc (obmalloc.c:2364)
==9395==    by 0x49360A4: PyMem_Realloc (obmalloc.c:623)
==9395==    by 0x4903CEB: list_resize (listobject.c:70)
==9395==    by 0x4903E78: app1 (listobject.c:340)
==9395==    by 0x49097ED: PyList_Append (listobject.c:352)
==9395==    by 0x4A1A374: r_ref (marshal.c:945)
==9395==    by 0x4A1C1E9: r_object (marshal.c:1139)
==9395==    by 0x4A1C378: r_object (marshal.c:1195)
==9395==  Address 0x4ff7020 is 16 bytes after a block of size 640 in arena "client"
==9395== 
==9395== 
==9395== HEAP SUMMARY:
==9395==     in use at exit: 326,149 bytes in 159 blocks
==9395==   total heap usage: 2,243 allocs, 2,084 frees, 3,026,903 bytes allocated
==9395== 
==9395== LEAK SUMMARY:
==9395==    definitely lost: 0 bytes in 0 blocks
==9395==    indirectly lost: 0 bytes in 0 blocks
==9395==      possibly lost: 324,553 bytes in 155 blocks
==9395==    still reachable: 1,596 bytes in 4 blocks
==9395==         suppressed: 0 bytes in 0 blocks
==9395== Rerun with --leak-check=full to see details of leaked memory
==9395== 
==9395== ERROR SUMMARY: 1002 errors from 2 contexts (suppressed: 0 from 0)
==9395== 
==9395== 69 errors in context 1 of 2:
==9395== Invalid read of size 4
==9395==    at 0x4934451: address_in_range (obmalloc.c:1420)
==9395==    by 0x4937148: pymalloc_realloc (obmalloc.c:1954)
==9395==    by 0x4937202: _PyObject_Realloc (obmalloc.c:2007)
==9395==    by 0x493520C: _PyMem_DebugRawRealloc (obmalloc.c:2279)
==9395==    by 0x493540E: _PyMem_DebugRealloc (obmalloc.c:2364)
==9395==    by 0x49360A4: PyMem_Realloc (obmalloc.c:623)
==9395==    by 0x4903CEB: list_resize (listobject.c:70)
==9395==    by 0x4903E78: app1 (listobject.c:340)
==9395==    by 0x49097ED: PyList_Append (listobject.c:352)
==9395==    by 0x4A1A374: r_ref (marshal.c:945)
==9395==    by 0x4A1C1E9: r_object (marshal.c:1139)
==9395==    by 0x4A1C378: r_object (marshal.c:1195)
==9395==  Address 0x4ff7020 is 16 bytes after a block of size 640 in arena "client"
==9395== 
==9395== 
==9395== 933 errors in context 2 of 2:
==9395== Invalid read of size 4
==9395==    at 0x4934451: address_in_range (obmalloc.c:1420)
==9395==    by 0x4935F76: pymalloc_free (obmalloc.c:1866)
==9395==    by 0x4935F76: _PyObject_Free (obmalloc.c:1920)
==9395==    by 0x4935150: _PyMem_DebugRawFree (obmalloc.c:2226)
==9395==    by 0x4935174: _PyMem_DebugFree (obmalloc.c:2356)
==9395==    by 0x49362A3: PyObject_Free (obmalloc.c:709)
==9395==    by 0x4918AC4: dictresize (dictobject.c:1255)
==9395==    by 0x4918ADA: insertion_resize (dictobject.c:1015)
==9395==    by 0x4920BC2: PyDict_SetDefault (dictobject.c:2917)
==9395==    by 0x499A2C4: PyUnicode_InternInPlace (unicodeobject.c:15366)
==9395==    by 0x499A641: PyUnicode_InternFromString (unicodeobject.c:15399)
==9395==    by 0x494786D: init_slotdefs (typeobject.c:7393)
==9395==    by 0x494BAA5: add_operators (typeobject.c:7623)
==9395==  Address 0x4fd1020 is 20 bytes after a block of size 220 alloc'd
==9395==    at 0x483880B: malloc (vg_replace_malloc.c:309)
==9395==    by 0x4934C17: _PyMem_RawMalloc (obmalloc.c:99)
==9395==    by 0x4934B0C: _PyMem_DebugRawAlloc (obmalloc.c:2159)
==9395==    by 0x4934B9F: _PyMem_DebugRawMalloc (obmalloc.c:2192)
==9395==    by 0x4935EC8: PyMem_RawMalloc (obmalloc.c:572)
==9395==    by 0x49360F8: _PyMem_RawWcsdup (obmalloc.c:644)
==9395==    by 0x4A1707E: PyConfig_SetString (initconfig.c:680)
==9395==    by 0x4A18852: _PyConfig_Copy (initconfig.c:790)
==9395==    by 0x4A243C5: pycore_create_interpreter (pylifecycle.c:531)
==9395==    by 0x4A2572D: pyinit_config (pylifecycle.c:679)
==9395==    by 0x4A277F9: pyinit_core (pylifecycle.c:853)
==9395==    by 0x4A28201: Py_InitializeFromConfig (pylifecycle.c:1028)
==9395== 
==9395== ERROR SUMMARY: 1002 errors from 2 contexts (suppressed: 0 from 0)

Comment 3 Victor Stinner 2019-09-11 15:29:24 UTC
I reported the issue to Python upstream: https://bugs.python.org/issue38118

Comment 4 Miro Hrončok 2019-09-12 07:47:56 UTC
Thanks, Victor.

Alexander, I'm closing this; it will most likely get fixed upstream eventually. If you need us to backport the fix or actively work on this, please reopen and let us know.

Comment 5 Alexander Larsson 2019-09-12 16:05:26 UTC
Sigh, it was closed because one of the warnings was already fixed :(

Comment 6 Miro Hrončok 2019-09-12 16:32:15 UTC
Yes, it will be fixed in the next 3.8 and 3.7 release. Why :( ?

Comment 7 Victor Stinner 2019-09-13 08:09:50 UTC
> Sigh, it was closed because one of the warnings was already fixed :(

If you are talking about https://bugs.python.org/issue38118 : I just reopened it.

Comment 8 Alexander Larsson 2019-09-13 08:21:04 UTC
Thanks!