Red Hat Bugzilla – Bug 98119
cat <file> | sort -u > <file2> drops some words with accents
Last modified: 2007-04-18 12:55:20 EDT
Description of problem:
There is a problem with sort in Red Hat 9.0 that doesn't happen with Red Hat
7.2. With Brazilian Portuguese (I have only seen the problem with this language),
sort removes some accented words.
I used the following command:
cat <file> | sort -u
Some accented words disappear from the output. I tried the same command
with the same <file> on a machine with Red Hat 7.2 and the problem doesn't
occur. The <file> is 4 Mbytes and has around 53 thousand unique words. I
cannot send the original file or the results because it is an internal document
from my company.
Version-Release number of selected component (if applicable):
Red Hat 9.0
Steps to Reproduce:
1. cat <file>
2. sort -u
<without> j´unio (I cannot type the accent properly here)
Could you send me a minimal test case (or provide a pointer to one) that
demonstrates the problem? Perhaps obscuring the words with "tr '[a-z]' x" would
Also what locale are you using? What does 'locale' say?
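A minimal sketch of how the locale question could be answered, and how to check whether collation affects the count of distinct lines. The file name sample.txt and its three words are hypothetical stand-ins for the confidential <file>; LC_ALL=C forces plain byte-order comparison.

```shell
#!/bin/sh
# Show the locale settings sort(1) will use; LC_ALL overrides LC_COLLATE and LANG.
locale

# Hypothetical sample file standing in for the confidential <file>.
printf 'junio\njúnio\njunio\n' > sample.txt

# Distinct-line count using the current locale's collation rules...
sort -u sample.txt | wc -l

# ...and using byte-order comparison, for comparison.
LC_ALL=C sort -u sample.txt | wc -l
```

If the two counts differ on the real file, the locale's collation (rather than the file contents) is the likely place to look.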
I am trying to find a minimal file that reproduces this error. I really cannot send
you the original file.
The problem is related to very large files. The original file is 8 Mbytes, with
1.3 million words and 65K unique words. I couldn't reproduce the problem with a
smaller version of the file.
I notice that Red Hat 9.0 and Red Hat 7.2 both have bugs in this case, but they
are different bugs. In Red Hat 7.2, a couple of words without accents are missing,
but in Red Hat 9.0, accented words are missing. I cannot reproduce this
error with a small file.
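A hedged sketch of a helper that lists exactly which distinct lines are missing from the locale-aware sort -u output, which could help narrow the large file down without sending it. The function name missing_from_sort_u is hypothetical. Both lists are re-sorted with LC_ALL=C so comm(1) compares them in one consistent order. Note that sort -u keeps only one line from each set of lines that compare equal under the locale's collation, so a non-empty result is not automatically a bug; it shows which words were merged or dropped.

```shell
#!/bin/sh
# Hypothetical helper: print every distinct line of the file given as $1
# that does not appear in "sort -u" output under the current locale.
missing_from_sort_u() {
    tmp=$(mktemp -d) || return 1
    # Reference list: all distinct lines, compared byte by byte.
    LC_ALL=C sort -u "$1" > "$tmp/expected"
    # Locale-aware sort -u, then re-sorted in byte order for comm(1).
    sort -u "$1" | LC_ALL=C sort > "$tmp/actual"
    # Lines present only in the reference list were dropped by sort -u.
    LC_ALL=C comm -23 "$tmp/expected" "$tmp/actual"
    rm -rf "$tmp"
}

# Usage (words.txt is a placeholder file name):
# missing_from_sort_u words.txt
```

Running it under both the Brazilian Portuguese locale and LC_ALL=C on the same file would show whether the missing words depend on the locale.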
I don't know if you can arrange a very big text file to test this. Unfortunately, I
really cannot send you the file.
Need a test case before I can analyse the problem. :-/