Bug 142948 - wc -m option unable to count the number of multibyte characters in a file
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: coreutils
Version: 4.0
Hardware: All Linux
Priority: medium, Severity: medium
Assigned To: Tim Waugh
QA Contact: David Lawrence
Duplicates: 142949
Depends On:
Blocks: RHEL4-LI18NUX
Reported: 2004-12-15 04:16 EST by Lawrence Lim
Modified: 2014-03-25 20:51 EDT (History)
Doc Type: Bug Fix
Last Closed: 2004-12-15 04:29:48 EST

Attachments
script with a single Test Case from LI18NUX Test for testing wc -m option (11.30 KB, application/octet-stream)
2004-12-15 04:20 EST, Lawrence Lim

Description Lawrence Lim 2004-12-15 04:16:21 EST
Description of problem:
This bug is filed because it is one of the errors reported by the LI18NUX
test suite, which is part of the LSB conformance tests.

The test reports that when the -m option is specified, the wc utility
fails to output the number of characters in the input file when the
characters are multibyte characters.

I have stripped the test suite down to this single test case. I would
really appreciate it if you could have a look and let me know whether
this is a glibc error or a test case issue.

Version-Release number of selected component (if applicable):
glibc-devel-2.3.3-86
glibc-profile-2.3.3-86
glibc-headers-2.3.3-86
compat-glibc-headers-2.3.2-95.30
glibc-2.3.3-86
glibc-kernheaders-2.4-9.1.87
compat-glibc-2.3.2-95.30
glibc-common-2.3.3-86
glibc-utils-2.3.3-86

How reproducible:
Always

Steps to Reproduce:
1. Download the attachment, wc1.tar.gz
2. Run ./wc1.sh
3. View the result in the output file, tet_xres

Actual results:
Can't count number of characters.

Expected results:
Able to count these characters:
この ファイル には 日本語も 含まれて います。 in a file
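As a rough illustration of what the -m test is checking, here is a minimal Python sketch of the difference between byte counts (wc -c) and character counts (wc -m). It assumes the test file holds just the line above followed by a newline, and it tries both UTF-8 and EUC-JP because the report does not say which locale the test runs under. The helper name count_chars is hypothetical.

```python
# Assumption: the test file is this single line plus a trailing newline.
text = "この ファイル には 日本語も 含まれて います。\n"


def count_chars(data: bytes, encoding: str) -> int:
    """Decode the bytes in the given encoding and count characters,
    which is what wc -m is expected to report in a matching locale."""
    return len(data.decode(encoding))


# The encodings below are assumptions; the report does not name the locale.
for enc in ("utf-8", "euc_jp"):
    raw = text.encode(enc)
    print(enc, "bytes:", len(raw), "chars:", count_chars(raw, enc))
```

Under that assumption the character count is 26 in both encodings, while the byte count is 66 in UTF-8 and 46 in EUC-JP.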

Additional info:
https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=108593&action=view
Comment 1 Lawrence Lim 2004-12-15 04:20:02 EST
Created attachment 108596
script with a single Test Case from LI18NUX Test for testing wc -m option
Comment 2 Jakub Jelinek 2004-12-15 04:29:48 EST
Here is my analysis of this for LSB 1.3.  Note that this has nothing
to do with glibc, but with coreutils:

10|570 /tset/LI18NUX2K.L1/utils/wc/wc 00:21:33|TC Start, scenario ref 574-0
520|570 1 6181 1 1|* When -m option is specified, verify this utility outputs the number of characters in each input file even if the characters are multibyte characters.
520|570 1 6181 1 2|
520|570 1 6181 1 3|Can't count number of characters.
220|570 1 1 00:21:33|FAIL
520|570 2 6181 1 1|* When this utility writes to the standard output the number of words, this utility correctly recognizes the boundaries of words. The boundaries are shown as white-space characters constituted in current locale.
520|570 2 6181 1 2|
520|570 2 6181 1 3|Can't count number of words.
220|570 2 1 00:21:33|FAIL

Here the LSB test expects wc to violate the POSIX standard and print, say,
     26 text.txt
instead of
26 text.txt
POSIX requires the format "%d %d %d %s\n", <newlines>, <words>, <bytes>, <file>
(omitting the numbers that are not requested), see
http://www.opengroup.org/onlinepubs/009695399/utilities/wc.html,
which also notes:
The output file format pseudo-printf() string differs from the System V version of wc:
"%7d%7d%7d %s\n"
which produces possibly ambiguous and unparsable results for very large files, as it assumes no number shall exceed six digits.
I'd say the LSB test suite should at least be changed to accept both.
This is already fixed in LSB 2.0.
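
To make the format difference concrete, here is a small Python sketch that produces both output styles for a single count and shows one way a checker could accept either. The function names (posix_format, sysv_format, parse_wc_line) are hypothetical and are not taken from the LSB test suite or from coreutils.

```python
def posix_format(count: int, name: str) -> str:
    """POSIX style: numbers separated by single spaces, no padding."""
    return "%d %s\n" % (count, name)


def sysv_format(count: int, name: str) -> str:
    """System V style: each number right-aligned in a 7-column field."""
    return "%7d %s\n" % (count, name)


def parse_wc_line(line: str) -> tuple[int, str]:
    """Accept either style by ignoring leading whitespace before the count."""
    count_str, name = line.split(None, 1)
    return int(count_str), name.rstrip("\n")


print(repr(posix_format(26, "text.txt")))   # '26 text.txt\n'
print(repr(sysv_format(26, "text.txt")))    # '     26 text.txt\n'

# With counts of seven or more digits the fixed-width fields run together,
# which is the ambiguity the POSIX rationale describes.
print(repr("%7d%7d%7d %s\n" % (1234567, 7654321, 12345678, "big.txt")))
```

A checker built on something like parse_wc_line would pass for both formats in the single-count case, which is roughly what accepting both would mean for this test.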
Comment 3 Jakub Jelinek 2004-12-15 04:40:04 EST
*** Bug 142949 has been marked as a duplicate of this bug. ***
