Description of problem:
There is a problem with sort in Red Hat 9.0 that doesn't happen with Red Hat 7.2. In Brazilian Portuguese (the only language in which I have seen the problem), sort removes some accented words. I used the following command:

  cat <file> | sort -u

Some accented words disappear from the output. I tried the same command with the same <file> on a machine running Red Hat 7.2 and the problem doesn't occur. The <file> is 4 Mbytes and has around 53 thousand unique words. I cannot send the original file or the results because it is an internal document from my company.

Version-Release number of selected component (if applicable):
Red Hat 9.0

How reproducible:
Always.

Steps to Reproduce:
1. cat <file>
2. sort -u

Actual results:
<without> j´unio (I cannot write the accent properly here)

Expected results:
<with> j´unio

Additional info:
Could you send me a minimal test case (or provide a pointer to one) that demonstrates the problem? Perhaps obscuring the words with "tr '[a-z]' x" would help? Also what locale are you using? What does 'locale' say?
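For reference, the two requests above could be carried out roughly as follows. This is only a sketch: `obscure` is a hypothetical helper name, and `'[x*]'` is the POSIX spelling of "repeat x as often as needed", used here in place of the bare `x` in the suggestion. Bytes outside a-z, including accented characters, pass through unchanged. Note that masking every letter makes distinct words of the same shape identical, so `sort -u` on the masked file will not give the same counts as on the original.

```shell
# Show the active locale settings (LANG, LC_COLLATE, LC_ALL, ...).
locale

# Hypothetical helper: replace every ASCII lowercase letter on stdin with "x".
# Accented characters and all other bytes are left untouched.
obscure() {
    tr 'a-z' '[x*]'
}

# Small demonstration on two words, one of them accented (UTF-8 bytes for u-acute).
printf 'junio\nj\303\272nio\n' | obscure
```

To mask a whole file, something like `obscure < words.txt > masked.txt` would do, where both file names are placeholders.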
I am trying to find a minimal file that reproduces this error. I really cannot send you the original file. The problem is related to very large files: the original file is 8 Mbytes, with 1.3M words and 65K unique words, and I couldn't reproduce the problem with a smaller version of it. I noticed that both Red Hat 9.0 and Red Hat 7.2 have bugs in this case, but they are different bugs. In Red Hat 7.2, a couple of unaccented words are missing, but in Red Hat 9.0, accented words are missing. I don't know if you can arrange a very big text file to test this. Unfortunately, I really cannot send you the file. Luis
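One possible way to narrow this down without sharing the file is to list exactly which words vanish. The sketch below assumes the loss depends on the locale, which has not been established in this report: it compares `sort -u` under the current locale with `sort -u` under `LC_ALL=C` (plain byte order) and prints the lines that only the byte-order run keeps. `missing_words` is a hypothetical helper name.

```shell
# Hypothetical helper: print lines that "sort -u" keeps under LC_ALL=C
# but drops under the current locale. $1 is the input file.
missing_words() {
    mw_in=$1
    mw_tmp=$(mktemp -d) || return 1

    # Unique lines under the current locale, re-sorted in byte order for comm.
    sort -u "$mw_in" | LC_ALL=C sort > "$mw_tmp/locale.out"

    # Unique lines under byte-order comparison.
    LC_ALL=C sort -u "$mw_in" > "$mw_tmp/c.out"

    # comm -13 prints only the lines unique to the second file.
    LC_ALL=C comm -13 "$mw_tmp/locale.out" "$mw_tmp/c.out"

    rm -rf "$mw_tmp"
}
```

Running `missing_words words.txt` (placeholder file name) on the affected machine would print the dropped words. If the list is non-empty, a few of those words plus the lines they were merged with might already make a small test case.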
Need a test case before I can analyse the problem. :-/