Bug 1158494 - expand and unexpand don't correctly manage files having a BOM header
Summary: expand and unexpand don't correctly manage files having a BOM header
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: coreutils
Version: 23
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Assignee: Jakub Martisko
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Keywords: Reopened
Depends On:
Blocks:
Reported: 2014-10-29 13:51 UTC by Berthault
Modified: 2016-11-03 23:54 UTC (History)
Last Closed: 2016-11-03 23:54:57 UTC


Attachments
UTF-8 file with BOM header for testing unexpand (37 bytes, application/octet-stream)
2014-10-29 14:18 UTC, Berthault
UTF-8 file with BOM header for testing expand (23 bytes, application/octet-stream)
2014-10-29 14:19 UTC, Berthault

Description Berthault 2014-10-29 13:51:31 UTC
Description of problem:
When the locale isn't UTF-8, the expand and unexpand tools don't correctly handle files that have a UTF-8 BOM header. With expand, some spaces are missing; with unexpand, there are extra spaces. In both cases the result is incorrect.

Version-Release number of selected component (if applicable):
Fedora-20
coreutils-8.21-21.fc20.x86_64

How reproducible:
See below

Steps to Reproduce:
1. Write a UTF-8 file (file_spaces) having a BOM header with the following contents:
    é   à   ç
    é   à   ç

Two lines with four spaces + 'é' + three spaces + 'à' + three spaces + 'ç' + EOL

It's important to test with accented characters. With ASCII characters, only the first line is incorrect and the second one is OK. With accented characters, all lines are incorrect.

NB: To write files with a BOM header, I'm using the scite editor.

2. LANG=C unexpand -t4 file_spaces > file_unexpand

3. Edit the two files (e.g. with gedit or scite) and see the problem in file_unexpand

4. Write a UTF-8 file (file_tabs) having a BOM header with the following contents:
    é   à   ç
    é   à   ç

Two lines with one tab + 'é' + one tab + 'à' + one tab + 'ç' + EOL

5. LANG=C expand -t4 file_tabs > file_expand

6. Edit the two files (e.g. with gedit or scite) and see the problem in file_expand
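
For anyone without an editor that can write a BOM, the two input files from the steps above can probably be produced directly from the shell. This is only a sketch, assuming a POSIX shell whose printf accepts octal escapes; the byte values are the standard UTF-8 encodings of the BOM and of 'é', 'à', 'ç', and the resulting sizes (37 and 23 bytes) match the attachments on this bug. The od calls are just one way to look at the output and are not part of the original report.

```shell
# Sketch only: build the test files byte by byte instead of using an editor.
# \357\273\277 is the UTF-8 BOM (EF BB BF).
# \303\251 = 'é', \303\240 = 'à', \303\247 = 'ç'.

# file_spaces: BOM, then two lines of
# four spaces + 'é' + three spaces + 'à' + three spaces + 'ç' + EOL
{
  printf '\357\273\277'
  printf '    \303\251   \303\240   \303\247\n'
  printf '    \303\251   \303\240   \303\247\n'
} > file_spaces

# file_tabs: BOM, then two lines of
# one tab + 'é' + one tab + 'à' + one tab + 'ç' + EOL
{
  printf '\357\273\277'
  printf '\t\303\251\t\303\240\t\303\247\n'
  printf '\t\303\251\t\303\240\t\303\247\n'
} > file_tabs

# Same commands as in steps 2 and 5.
LANG=C unexpand -t4 file_spaces > file_unexpand
LANG=C expand -t4 file_tabs > file_expand

# Dump the results byte by byte to see where tabs and spaces ended up.
od -c file_unexpand
od -c file_expand
```

Dumping with od avoids any reinterpretation of the BOM or the tabs by an editor.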

Actual results:


Expected results:


Additional info:

Comment 1 Ondrej Vasik 2014-10-29 14:08:03 UTC
Thanks for the report. Can you please attach the file_tabs file to this bugzilla so it is easier for everyone? E.g. scite is in neither EPEL nor RHEL, so I would have to compile it from sources...
From the description you provided, for UTF-8 locales, everything is ok, correct? With the C locale, even multibyte characters are handled byte by byte and a different code path is used (Fedora has a downstream i18n patch which is active in multibyte locales).
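
To illustrate the byte-by-byte point, here is a rough sketch, assuming a POSIX shell with printf, wc and tr. It takes the bytes that precede the 'à' on the first line of the reporter's file_spaces and compares the byte count with the number of code points. The variable names are made up for the example, and this only shows the size mismatch, not the actual code path inside expand or unexpand.

```shell
# Bytes that precede the 'à' on the first line of file_spaces:
# BOM + four spaces + 'é' + three spaces.
prefix='\357\273\277    \303\251   '

# What a byte-oriented tool sees: every byte counts as one position.
bytes=$(( $(printf "$prefix" | LC_ALL=C wc -c) ))

# Code points: drop UTF-8 continuation bytes (0x80-0xBF) and count the rest.
chars=$(( $(printf "$prefix" | LC_ALL=C tr -d '\200-\277' | wc -c) ))

echo "bytes=$bytes code_points=$chars"
# prints "bytes=12 code_points=9"
```

The BOM accounts for two of the three extra bytes and appears only on the first line, while each accented character adds one extra byte on every line. That fits the observation that ASCII-only content is wrong on the first line only, whereas accented content is wrong on all lines.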

Comment 2 Berthault 2014-10-29 14:18:18 UTC
Created attachment 951805 [details]
UTF-8 file with BOM header for testing unexpand

Comment 3 Berthault 2014-10-29 14:19:07 UTC
Created attachment 951807 [details]
UTF-8 file with BOM header for testing expand

Comment 4 Berthault 2014-10-29 14:22:54 UTC
> From the description you provided, for UTF-8 locales, everything is ok, correct?

Yes

Comment 5 Fedora End Of Life 2015-05-29 13:11:06 UTC
This message is a reminder that Fedora 20 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 20. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '20'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 20 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 6 Ondrej Vasik 2015-07-09 11:50:26 UTC
Just as a side note: Ondrej is working on a new implementation of i18n and he expects to fix this issue there. A fix is not planned for the old implementation, though. Reassigning to him for now.

Comment 7 Jan Kurik 2015-07-15 14:37:01 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 23 development cycle.
Changing version to '23'.

(As we did not run this process for some time, it could also affect pre-Fedora 23 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 23 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora23

Comment 11 Fedora Update System 2016-10-31 17:46:02 UTC
coreutils-8.25-7.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-75adc7da4f

Comment 12 Fedora Update System 2016-11-01 18:22:43 UTC
coreutils-8.25-7.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-75adc7da4f

Comment 13 Fedora Update System 2016-11-03 23:54:57 UTC
coreutils-8.25-7.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.

