2028497 – "re.error: bad character range l-3 at position 140" in /usr/lib64/python3.10/sre_parse.py

Status:	CLOSED NOTABUG
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	python3.10
Version:	35
Hardware:	All
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Assignee:	Python Maintainers
QA Contact:	Fedora Extras Quality Assurance

Reported:	2021-12-02 14:06 UTC by Renaud Métrich
Modified:	2022-01-04 10:25 UTC
CC List:	7 users
Last Closed:	2022-01-04 10:25:19 UTC
Type:	Bug

Description Renaud Métrich 2021-12-02 14:06:36 UTC

Description of problem:

Since upgrading my laptop from Fedora 34 to Fedora 35, I get a strange error when using the "fnmatch_lines()" method of pytest's pytester plugin.
This happens when I match two non-consecutive lines and the second line contains square brackets.

/usr/lib64/python3.10/fnmatch.py:42: in fnmatch
    return fnmatchcase(name, pat)
/usr/lib64/python3.10/fnmatch.py:76: in fnmatchcase
    match = _compile_pattern(pat)
/usr/lib64/python3.10/fnmatch.py:52: in _compile_pattern
    return re.compile(res).match
/usr/lib64/python3.10/re.py:251: in compile
    return _compile(pattern, flags)
/usr/lib64/python3.10/re.py:303: in _compile
    p = sre_compile.compile(pattern, flags)
/usr/lib64/python3.10/sre_compile.py:764: in compile
    p = sre_parse.parse(p, flags)
/usr/lib64/python3.10/sre_parse.py:948: in parse
    p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
/usr/lib64/python3.10/sre_parse.py:443: in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
/usr/lib64/python3.10/sre_parse.py:834: in _parse
    p = _parse_sub(source, state, sub_verbose, nested + 1)
/usr/lib64/python3.10/sre_parse.py:443: in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

source = <sre_parse.Tokenizer object at 0x7f4777bb86a0>, state = <sre_parse.State object at 0x7f4777bbb4f0>
verbose = 0, nested = 3, first = False

[...]
                            msg = "bad character range %s-%s" % (this, that)
>                           raise source.error(msg, len(this) + 1 + len(that))
E                           re.error: bad character range l-3 at position 26

/usr/lib64/python3.10/sre_parse.py:598: error


Version-Release number of selected component (if applicable):

python3-libs-3.10.0-1.fc35.x86_64


How reproducible:

Always


Steps to Reproduce:
1. Create a "hello" script that writes to stdout

  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
  #!/bin/sh

  echo "LINE 1"
  echo "DUMMY LINE"
  echo "LINE 2 ['/lib64/libnl-3.so.200']"
  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

  $ chmod +x hello

2. Create a pytester Python script "test_hello.py"

  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
  import os
  import pytest
  pytest_plugins = 'pytester'
  
  EXPECTED = [
      "LINE 1",
      "LINE 2 ['/lib64/libnl-3.so.200']",
  ]
  
  def test_hello(pytester, request):
      testdir = os.path.dirname(request.fspath)
      prog = os.path.join(testdir, 'hello')
      args = [ prog ]
      result = pytester.run(*args)
      result.stdout.fnmatch_lines(EXPECTED)
  -------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

3. Execute pytest

  $ python3 -m pytest

Actual results:

error above

Expected results:

success, no error

Additional info:

This seems to happen only if the lines to match are not consecutive: if I remove "DUMMY LINE" from the "hello" script, it works fine.

Comment 1 Miro Hrončok 2021-12-02 15:22:29 UTC

fnmatch is supposed to match Unix-path globs, isn't it? The "LINE 2 ['/lib64/libnl-3.so.200']" glob contains [ and ] characters -- in glob language, they delimit a set that matches any single character listed inside it.

However, the set contains the sequence "l-3", which is read as a character range whose start (l) sorts after its end (3). When converted to regex, this is invalid for Python (3.9 and 3.10 alike):


>>> import re, fnmatch
>>> fnmatch.translate("LINE 2 ['/lib64/libnl-3.so.200']")
"(?s:LINE\\ 2\\ ['/lib64/libnl-3.so.200'])\\Z"
>>> re.match(fnmatch.translate("LINE 2 ['/lib64/libnl-3.so.200']"), "DUMMY LINE")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.9/re.py", line 191, in match
    return _compile(pattern, flags).match(string)
  File "/usr/lib64/python3.9/re.py", line 304, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/lib64/python3.9/sre_compile.py", line 764, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib64/python3.9/sre_parse.py", line 948, in parse
    p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
  File "/usr/lib64/python3.9/sre_parse.py", line 443, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
  File "/usr/lib64/python3.9/sre_parse.py", line 834, in _parse
    p = _parse_sub(source, state, sub_verbose, nested + 1)
  File "/usr/lib64/python3.9/sre_parse.py", line 443, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
  File "/usr/lib64/python3.9/sre_parse.py", line 598, in _parse
    raise source.error(msg, len(this) + 1 + len(that))
re.error: bad character range l-3 at position 26

We see that position 26 is where the offending "l-3" range starts:

>>> fnmatch.translate("LINE 2 ['/lib64/libnl-3.so.200']")[:26]
"(?s:LINE\\ 2\\ ['/lib64/libn"
>>> fnmatch.translate("LINE 2 ['/lib64/libnl-3.so.200']")[26:]
"l-3.so.200'])\\Z"
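The same error can be reproduced with re alone, without fnmatch. The following is a minimal sketch; the variable names are illustrative, and the only assumption is that "l-3" inside a set is parsed as a range from "l" to "3":

```python
import re

# "l" (code point 108) sorts after "3" (code point 51), so "l-3" is a reversed range.
start, end = "l", "3"
is_reversed = ord(start) > ord(end)

try:
    re.compile("[l-3]")
    reversed_range_error = None
except re.error as exc:
    reversed_range_error = str(exc)

# With "-" placed last in the set it is a literal, so the same characters compile fine.
literal_set = re.compile("[l3-]")

print(is_reversed)
print(reversed_range_error)
print(bool(literal_set.fullmatch("-")))
```

Note that the position reported here is relative to this shorter pattern, not to the translated glob.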

This also happens when using fnmatch() directly:

>>> fnmatch.fnmatch("DUMMY LINE", "LINE 2 ['/lib64/libnl-3.so.200']")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.9/fnmatch.py", line 42, in fnmatch
    return fnmatchcase(name, pat)
  File "/usr/lib64/python3.9/fnmatch.py", line 76, in fnmatchcase
    match = _compile_pattern(pat)
  File "/usr/lib64/python3.9/fnmatch.py", line 52, in _compile_pattern
    return re.compile(res).match
  File "/usr/lib64/python3.9/re.py", line 252, in compile
    return _compile(pattern, flags)
  File "/usr/lib64/python3.9/re.py", line 304, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/lib64/python3.9/sre_compile.py", line 764, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib64/python3.9/sre_parse.py", line 948, in parse
    p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
  File "/usr/lib64/python3.9/sre_parse.py", line 443, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
  File "/usr/lib64/python3.9/sre_parse.py", line 834, in _parse
    p = _parse_sub(source, state, sub_verbose, nested + 1)
  File "/usr/lib64/python3.9/sre_parse.py", line 443, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
  File "/usr/lib64/python3.9/sre_parse.py", line 598, in _parse
    raise source.error(msg, len(this) + 1 + len(that))
re.error: bad character range l-3 at position 26



I don't know what changed here, but I suppose passing in an invalid glob like this was never supposed to work.

Comment 2 Miro Hrončok 2021-12-02 15:26:46 UTC

As for Python 3.9 vs Python 3.10, I can reproduce the error with your example even with Python 3.9 on Fedora 33 (I don't have a Fedora 34 machine available now).

Comment 3 Miro Hrončok 2021-12-02 15:28:29 UTC

I can reproduce this in Fedora 34 with Python 3.9 as well.

How sure are you that this is only happening "Since upgrading [your] laptop to Fedora 35 (from Fedora 34)"?

Comment 4 Renaud Métrich 2021-12-02 17:03:18 UTC

Thanks for looking into this.
Actually, I have a CI for an internal project that has been running with these kinds of strings for months, and I'm 99% sure this wasn't happening before.

The fact that this works as I expect when the matched lines are consecutive puzzles me as well.

Anyway, thanks, I think I will go with escaping these lines using glob.escape().
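A minimal sketch of that workaround, using only the standard library. It assumes the expected lines are meant to be matched literally; fnmatchcase is used here instead of fnmatch only to keep the example independent of the platform's case rules:

```python
import glob
from fnmatch import fnmatchcase

line = "LINE 2 ['/lib64/libnl-3.so.200']"

# glob.escape() wraps the special characters "*", "?" and "[" in brackets,
# so the opening "[" no longer starts a character set.
escaped = glob.escape(line)

matches_itself = fnmatchcase(line, escaped)         # literal line still matches
matches_other = fnmatchcase("DUMMY LINE", escaped)  # unrelated line: no match, no error

print(escaped)
print(matches_itself, matches_other)
```

In the test from the description, the same idea would be passing `[glob.escape(l) for l in EXPECTED]` to `fnmatch_lines()`, at the cost of losing any intentional wildcards in the expected lines.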

Comment 5 Miro Hrončok 2021-12-02 18:11:40 UTC

> The fact that this works as I expect when the matched lines are consecutive puzzles me as well.

That is because of the way pytester implements the check. You can see here: https://github.com/pytest-dev/pytest/blob/6.2.5/src/_pytest/pytester.py#L1841

If the line matches exactly, it is considered a match; only if it doesn't match exactly does the check go further and call match_func (which is fnmatch in this case).
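The behaviour described above can be modelled roughly as follows. This is a simplified sketch, not pytester's actual code: match_lines and never_matches are hypothetical names, and never_matches stands in for fnmatch so that the result does not depend on how a given Python version handles the bad range.

```python
def match_lines(expected, actual, match_func):
    """Simplified model of the check: exact equality first, match_func as fallback.

    Returns the (actual_line, pattern) pairs that were handed to match_func.
    """
    calls = []
    remaining = list(actual)
    for pattern in expected:
        while remaining:
            line = remaining.pop(0)
            if line == pattern:
                break  # exact match: the pattern is never treated as a glob
            calls.append((line, pattern))
            if match_func(line, pattern):
                break
        else:
            raise AssertionError(f"no match for {pattern!r}")
    return calls


def never_matches(line, pattern):
    # Hypothetical stand-in for fnmatch so the sketch runs the same everywhere.
    return False


EXPECTED = ["LINE 1", "LINE 2 ['/lib64/libnl-3.so.200']"]

consecutive = match_lines(EXPECTED, ["LINE 1", EXPECTED[1]], never_matches)
with_dummy = match_lines(EXPECTED, ["LINE 1", "DUMMY LINE", EXPECTED[1]], never_matches)

print(consecutive)  # no fallback calls
print(with_dummy)   # the bracketed pattern is tried against "DUMMY LINE"
```

With consecutive lines the bracketed pattern is only ever compared by equality; with "DUMMY LINE" in between, it reaches the fallback matcher, which is where the glob gets compiled.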


> Actually, I have a CI for an internal project that has been running with these kinds of strings for months, and I'm 99% sure this wasn't happening before.

It might be because it always got the exact match up until now?

Comment 6 Miro Hrončok 2022-01-04 10:25:19 UTC

Feel free to reopen if you still think this is actually a regression in Python.
