Bug 549651 - Grep incorrect work with Unicode string
Summary: Grep incorrect work with Unicode string
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: grep
Version: 32
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Jaroslav Škarvada
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-12-22 09:16 UTC by Pavel Zhukov
Modified: 2021-05-25 18:42 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-25 18:42:52 UTC
Type: ---
Embargoed:


Description Pavel Zhukov 2009-12-22 09:16:03 UTC
Description of problem:

grep works incorrectly with Russian text.

Version-Release number of selected component (if applicable):

Version     : 2.5.3                           
Release     : 4.fc11  

How reproducible:

Run the following commands:

$ echo "Это просто текст" | grep '\<просто\>'
(no result)

$ echo "This is a text" | grep '\<is\>'
This is a text

  
Actual results:
sed and other utilities work properly, for example:

$ echo  "Это просто простой текст" | sed s/'\<просто\>'/'не очень'/
Это не очень простой текст

$ locale
LANG=ru_RU.UTF-8

On other *nix systems it works properly (Debian, RHEL, and FreeBSD have been tested).
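
One possible factor, not confirmed anywhere in this report, is the difference between bytes and characters: in UTF-8 each Cyrillic letter occupies two bytes, so a matcher that looks for word boundaries byte by byte sees a different string than one that works on characters. The sketch below only measures that difference for the two words used in the commands above. The helper name `byte_length` is made up for illustration, and the script is assumed to be saved as UTF-8.

```shell
# Hypothetical helper (not part of grep): print the number of bytes in a string.
# Assumption: this script file is encoded as UTF-8.
byte_length() {
  # wc -c counts bytes regardless of locale; tr strips padding some wc versions add.
  printf '%s' "$1" | wc -c | tr -d ' '
}

byte_length 'просто'   # prints 12: six Cyrillic letters, two bytes each
byte_length 'is'       # prints 2: two ASCII letters, one byte each
```

If the two counts differ from the number of visible letters, any tool that is not fully multibyte-aware may place `\<` and `\>` differently than expected.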

Comment 1 Jaroslav Škarvada 2010-04-13 07:45:49 UTC
The latest grep-2.6.3 is also affected, forwarded to upstream bugzilla.

Comment 5 Bug Zapper 2010-04-28 11:36:40 UTC
This message is a reminder that Fedora 11 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 11.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '11'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 11's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 11 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 6 Bug Zapper 2010-07-30 10:48:58 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 14 development cycle.
Changing version to '14'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 7 Jaroslav Škarvada 2010-09-21 14:15:39 UTC
Upstream bugreport: https://savannah.gnu.org/bugs/?29537

Comment 8 Russ Hammer 2010-10-20 23:31:00 UTC
I think this needs to be a higher severity than just medium! This is totally broken and will cause data corruption and incorrect results in scripts.

$ printf "%s\n" {a..z} {A..Z} | grep '[^a-z]'
Z
$ grep --version
GNU grep 2.6.3

Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Comment 9 Russ Hammer 2010-10-20 23:36:44 UTC
This is more than just case insensitivity. The severity needs to be upped.

$ printf "%s\n" {a..z} {A..Z} | grep '[^A-Z]'
a

$ rpm -q grep
grep-2.6.3-1.fc13.x86_64

Comment 10 Jaroslav Škarvada 2010-10-21 07:38:54 UTC
(In reply to comment #9)
> This is more than just case insensitivity. The severity needs to be upped.
> 
> $ printf "%s\n" {a..z} {A..Z} | grep '[^A-Z]'
> a
> 
> $ rpm -q grep
> grep-2.6.3-1.fc13.x86_64

This seems to be unrelated to this bug report and it was fixed in grep-2.7 that is currently available in F14.
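
The output in comments 8 and 9 is consistent with the bracket ranges being evaluated in the locale's collation order rather than in ASCII order, although the thread does not state the cause. A minimal sketch of two spellings that do not depend on that order, assuming a POSIX shell and a POSIX grep; the function names are illustrative only:

```shell
# Count input lines that contain a character outside a-z.
# LC_ALL=C forces plain byte (ASCII) order for the range a-z.
count_non_lower_range() {
  printf '%s\n' a m z A M Z | LC_ALL=C grep -c '[^a-z]'
}

# Same check with a POSIX character class instead of a range,
# which avoids relying on range order at all.
count_non_lower_class() {
  printf '%s\n' a m z A M Z | LC_ALL=C grep -c '[^[:lower:]]'
}

count_non_lower_range   # prints 3 (the lines A, M, Z)
count_non_lower_class   # prints 3
```

Scripts that must behave the same on every system are usually safer with `LC_ALL=C` or with classes such as `[[:lower:]]` and `[[:upper:]]` than with ranges like `a-z`.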

Comment 11 Jaroslav Škarvada 2010-12-03 13:53:02 UTC
Reopened, it still does not work in RU locale.

Comment 12 Fedora End Of Life 2013-04-03 20:06:26 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could affect also pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19

Comment 13 Andrey Zaslavskiy 2014-02-20 07:14:32 UTC
Same thing with --word-regexp

grep --version
grep (GNU grep) 2.16


cat rus-test 
Не стой столбом за правду жизни.
стойлол!
столбик
конец файла

grep -w стой rus-test 
Не стой столбом за правду жизни.
стойлол!


But there should be only one match!
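
Where `-w` misbehaves, one possible workaround is to spell out the delimiters explicitly and match byte-wise in the C locale. This is a sketch under stated assumptions, not a fix from the grep maintainers: it assumes UTF-8 input, treats only line start, line end, and ASCII whitespace or punctuation as word delimiters, and uses a made-up function name `grep_word`.

```shell
# Hypothetical helper: read lines on stdin and print those containing $1
# as a whole word, where a word is delimited by line start/end or by
# ASCII whitespace/punctuation.
# Assumptions: UTF-8 input; $1 contains no regex metacharacters.
grep_word() {
  LC_ALL=C grep -E "(^|[[:space:][:punct:]])$1([[:space:][:punct:]]|\$)"
}

# Same four lines as the rus-test file shown above.
printf '%s\n' \
  'Не стой столбом за правду жизни.' \
  'стойлол!' \
  'столбик' \
  'конец файла' | grep_word 'стой'
# Expected to print only the first line.
```

Because the C locale classifies only ASCII bytes as whitespace or punctuation, the bytes of a following Cyrillic letter never count as a delimiter here. Unlike `-w`, this sketch also treats `_` as a delimiter and does not recognize non-ASCII punctuation.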

Comment 14 Fedora End Of Life 2015-01-09 21:41:37 UTC
This message is a notice that Fedora 19 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 19. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained. Approximately 4 (four) weeks from now this bug will
be closed as EOL if it remains open with a Fedora 'version' of '19'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 19 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed, as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 15 Fedora End Of Life 2015-11-04 15:45:01 UTC
This message is a reminder that Fedora 21 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 21. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '21'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 21 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed, as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 16 Fedora End Of Life 2015-12-02 02:32:26 UTC
Fedora 21 changed to end-of-life (EOL) status on 2015-12-01. Fedora 21 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 17 Pavel Zhukov 2015-12-02 09:36:17 UTC
Works fine in F23 btw

Comment 18 Andrew 2019-10-28 14:35:22 UTC
Seems broken again as an upstream issue:

$ echo йцукен | grep -w кен
йцукен

Works fine for English text:

$ echo qwerty | grep -w rty | wc -c
0

It does not seem to depend on the locale (en_US.utf8 and uk_UA.utf8 are affected, at least).

Can anybody reopen this?

Comment 19 Jaroslav Škarvada 2019-10-30 10:13:14 UTC
I can reproduce with grep-3.1-9.fc30.x86_64

But it needs to be fixed upstream.

Comment 21 Ben Cotton 2020-04-30 20:45:31 UTC
This message is a reminder that Fedora 30 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 30 on 2020-05-26.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '30'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 30 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed, as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 22 Pavel Zhukov 2020-05-01 15:11:11 UTC
grep-3.3-3.fc31.x86_64

Comment 23 Andrew 2020-05-06 18:17:23 UTC
Fedora 32 is affected as well
grep-3.3-4.fc32.x86_64

Comment 24 Fedora Program Management 2021-04-29 17:21:16 UTC
This message is a reminder that Fedora 32 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 32 on 2021-05-25.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '32'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 32 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed, as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 25 Andrew 2021-04-29 21:12:31 UTC
Upstream bugs in Comment 20 are still open, but the case in Comment 18 no longer reproduces for me (it seems to have been fixed somehow).

Comment 26 Ben Cotton 2021-05-25 18:42:52 UTC
Fedora 32 changed to end-of-life (EOL) status on 2021-05-25. Fedora 32 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

