Bug 1986421 - glibc: New C.UTF-8 locale breaks sed
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Carlos O'Donell
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 1986428
Depends On:
Blocks:
Reported: 2021-07-27 13:52 UTC by Florian Weimer
Modified: 2021-08-03 13:18 UTC (History)

Fixed In Version: glibc-2.33.9000-54.fc35
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-08-03 13:18:59 UTC
Type: Bug


Links
System ID Private Priority Status Summary Last Updated
Fedora Pagure releng issue 10234 0 None None None 2021-07-27 18:24:12 UTC

Description Florian Weimer 2021-07-27 13:52:34 UTC
As reported on the devel list.  Seen with glibc-2.33.9000-53.fc35.x86_64.

$ echo '#define _AUD_PLUGIN_VERSION 48     /* 3.8-devel */' | LC_ALL=C.UTF-8 sed 's!.*_AUD_PLUGIN_VERSION[ ]*\([0-9]\+\).*!{{\1}}!'
{{48     /* 3.8-devel */}}

[0-9] matches non-digit characters, which is bad.

Source: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/4FLMNK57MYZMKBYEQDNIXO7S3TRNRKCJ/
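
To see which characters the bracket expression accepts, here is a minimal sketch that feeds every printable ASCII character through sed under a chosen locale. The helper name digit_range_matches is hypothetical and not part of the original report. With a correctly working locale only the ten digits should come back; on the affected build extra characters would be expected, although this sketch has not been run against that build.

```shell
# Hypothetical helper: print the printable ASCII characters that the
# bracket expression [0-9] matches under the locale given as $1.
digit_range_matches() {
    i=32
    while [ "$i" -le 126 ]; do
        # Emit character number $i on its own line (octal escape via %b).
        printf '%b\n' "\\0$(printf '%03o' "$i")"
        i=$((i + 1))
    done | LC_ALL="$1" sed -n '/^[0-9]$/p' | tr -d '\n'
    echo
}

digit_range_matches C        # prints 0123456789 with a working C locale
digit_range_matches C.UTF-8  # should print the same on an unaffected glibc
```

If C.UTF-8 is not installed, sed typically falls back to the C locale, so identical output alone does not prove the locale is healthy.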

Comment 1 Petr Pisar 2021-07-27 14:04:19 UTC
*** Bug 1986428 has been marked as a duplicate of this bug. ***

Comment 2 Carlos O'Donell 2021-07-27 15:00:27 UTC
I can reproduce this locally with sed-4.8-5.fc33.x86_64 in F33 and an upstream in-development glibc.

[carlos@athas glibc-work]$ echo '#define _AUD_PLUGIN_VERSION 48     /* 3.8-devel */' | env GCONV_PATH=/home/carlos/build/glibc-work/iconvdata LOCPATH=/home/carlos/build/glibc-work/localedata LC_ALL=C.UTF-8 /home/carlos/build/glibc-work/elf/ld.so --library-path /home/carlos/build/glibc-work:/home/carlos/build/glibc-work/nptl:/home/carlos/build/glibc-work/elf:/home/carlos/build/glibc-work/dlfcn /usr/bin/sed 's!.*_AUD_PLUGIN_VERSION[ ]*\([0-9]\+\).*!{{\1}}!'
{{48     /* 3.8-devel */}}

I'm reviewing.

Comment 3 Florian Weimer 2021-07-27 15:05:37 UTC
It's a busted locale compiler:

(gdb) print collseqmb
$22 = (const unsigned char *) 0x7ffff7f9c05c "UTF-8"

The codeset name is treated as the collation sequence array.

Comment 4 Florian Weimer 2021-07-27 15:08:01 UTC
Note that GDB does not give me the right collseqmb variable in all places. The above output is from regecomp.c:3110, after the assignment (with -O0):

  collseqmb = (const unsigned char *)
    _NL_CURRENT (LC_COLLATE, _NL_COLLATE_COLLSEQMB);
  nrules = _NL_CURRENT_WORD (LC_COLLATE, _NL_COLLATE_NRULES);

So there's no nested function there.
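
The nrules value read in the snippet above can probably also be inspected from the command line. This is a sketch that assumes glibc's locale utility reports it as a collate-nrules keyword in the output of `locale -k LC_COLLATE`. The helper name collate_nrules is hypothetical, and it prints "unknown" when that keyword is not reported.

```shell
# Hypothetical helper: print the number of collation rules for the
# locale given as $1, or "unknown" if the keyword is not reported.
collate_nrules() {
    # Parse with sed pinned to the C locale so the parsing itself does
    # not depend on the locale being examined.
    value=$(LC_ALL="$1" locale -k LC_COLLATE 2>/dev/null |
        LC_ALL=C sed -n 's/^collate-nrules=\([0-9][0-9]*\)$/\1/p')
    echo "${value:-unknown}"
}

collate_nrules C
collate_nrules C.UTF-8
```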

Comment 5 Carlos O'Donell 2021-07-27 17:37:09 UTC
Fixing this requires additional changes.

I've pushed this to codonell/c-utf8

commit 3bca8f2cb69f1cc453511e1b8ac5be5cceb35bd7 (HEAD -> codonell/c-utf8, origin/codonell/c-utf8)
Author: Carlos O'Donell <carlos@redhat.com>
Date:   Tue Jul 27 13:32:26 2021 -0400

    Fix fnmatch and regcomp for zero collation rule locales.
    
    Add test coverage for a zero rule locale via additional
    testing in bug-regex1, bug-regex4, bug-regex6, bug-regex19,
    transbug, tst-fnmatch, tst-fnmatch7, tst-regcomp-truncated,
    and tst-regex using C.UTF-8 (zero collation rule locale).

I think I'd delete and rebase the branch with these changes split out a bit better, e.g.:

* All the fixes in fnmatch/regcomp first for zero collation rule locales.
* Add strcmp_collation that allows a zero collation rule locale to exist.
* Add C.UTF-8 with additional testing coverage.

Comment 6 Florian Weimer 2021-07-27 20:15:56 UTC
Build of glibc-2.33.9000-54.fc35 with the revert is finally running (it was blocked by broken gd in the buildroot).

Comment 7 Steven Usdansky 2021-07-30 01:59:42 UTC
Not sure if this is the same issue affecting falkon-3.1.0-9.fc35.x86_64, which worked with glibc-2.33.9000-43.fc35.x86_64 and does not work with glibc-2.33.9000-50.fc35.x86_64 or higher, including glibc-2.33.9000-54.fc35.x86_64. "Does not work" means it can't load anything in a tab: the tab throbber hangs and the tab text blinks as if it's being constantly refreshed. Sorry about the poor description. Reverting glibc and its dependents from 2.33.9000-50 to 2.33.9000-43 solved the problem. With today's updates there are too many dependents to revert, so I'm stuck with a non-working falkon. No problem with Firefox-90.0.2-1.fc35.x86_64.

Comment 8 Florian Weimer 2021-07-30 05:37:47 UTC
(In reply to Steven Usdansky from comment #7)
> Not sure if this is the same issue affecting falkon-3.1.0-9.fc35.x86_64,
> which worked with glibc-2.33.9000-43.fc35.x86_64 and does not work with
> glibc-2.33.9000-50.fc35.x86_64 or higher, including
> glibc-2.33.9000-54.fc35.x86_64.

As falkon is Chromium-based, this is probably bug 1976893.

Comment 9 Petr Pisar 2021-07-30 07:42:57 UTC
I confirm that sed works again with glibc-2.33.9000-55.fc35.x86_64.
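
A check like this could be scripted by comparing the output of the reported command under C.UTF-8 with its output under the plain C locale. This is only a sketch. The helper names extract_plugin_version and check_c_utf8_sed are hypothetical, and the `\+` operator assumes GNU sed, as in the original report.

```shell
# Hypothetical helper: run the sed expression from the report under
# the locale given as $1.
extract_plugin_version() {
    echo '#define _AUD_PLUGIN_VERSION 48     /* 3.8-devel */' |
        LC_ALL="$1" sed 's!.*_AUD_PLUGIN_VERSION[ ]*\([0-9]\+\).*!{{\1}}!'
}

# Report whether C.UTF-8 agrees with the plain C locale.
check_c_utf8_sed() {
    expected=$(extract_plugin_version C)
    actual=$(extract_plugin_version C.UTF-8)
    if [ "$actual" = "$expected" ]; then
        echo "ok: $actual"
    else
        echo "mismatch: C gives $expected, C.UTF-8 gives $actual"
        return 1
    fi
}

check_c_utf8_sed
```

On a system without a C.UTF-8 locale installed, sed typically falls back to the C locale, so the check would pass without exercising the affected code.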

