Bug 434694 - file-not-utf8 for an OCaml source file
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: rpmlint
Version: 9
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Ville Skyttä
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-02-24 14:56 UTC by Richard W.M. Jones
Modified: 2008-06-26 08:30 UTC

Fixed In Version: 0.83-1.fc9
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-06-26 08:29:59 UTC
Type: ---
Embargoed:



Description Richard W.M. Jones 2008-02-24 14:56:13 UTC
Example:

  ocaml-odbc-devel.i386: W: file-not-utf8 /usr/share/doc/ocaml-odbc-devel-2.15/Exemples/monitor.ml

The situation here is complicated.  A standard OCaml source file
can usually be considered ISO-8859-1 encoded.  More precisely:

Identifiers may contain ISO-8859-1 characters.  In particular, if a
file containing such identifiers is converted to UTF-8, the program
will no longer compile:

  $ echo -e 'let m\xe9 = 1' > test.ml
  $ hexdump -C test.ml
  00000000  6c 65 74 20 6d e9 20 3d  20 31 0a     |let m. = 1.|
  0000000b
  $ ocamlc test.ml
  $ iconv -f iso-8859-1 -t utf-8 < test.ml > testu.ml
  $ ocamlc testu.ml
  File "testu.ml", line 1, characters 6-7:
  Illegal character (\169)
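
To make the byte-level effect of the iconv step explicit, here is a small Python sketch (Python is only used for illustration; it is not part of the original report).  It re-encodes the same one-line program and shows that the single byte 0xE9 becomes the pair 0xC3 0xA9.  The second byte, 0xA9, is decimal 169, which matches the "\169" in the compiler error.  Presumably the compiler accepts 0xC3 because it is itself an ISO-8859-1 letter, but that is my reading of the error rather than something verified against the lexer.

```python
# Illustrative only: trace what the iconv step does to the byte 0xE9.
latin1_source = b"let m\xe9 = 1\n"

# Re-encode from ISO-8859-1 to UTF-8, like `iconv -f iso-8859-1 -t utf-8`.
utf8_source = latin1_source.decode("iso-8859-1").encode("utf-8")


def hexdump(data: bytes) -> str:
    """Return the bytes as space-separated two-digit hex values."""
    return " ".join(f"{b:02x}" for b in data)


print(hexdump(latin1_source))
print(hexdump(utf8_source))

# Offsets 0-4 hold "let m"; the UTF-8 form of e-acute occupies offsets 5 and 6.
first_byte = utf8_source[5]
second_byte = utf8_source[6]
print(first_byte, second_byte)
```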

Comments in the source can contain ISO-8859-1 characters (and
given that the primary developers are French, this is not just a
theoretical consideration).

Literal strings in OCaml programs are really byte arrays and
as such could contain just about anything.

Literal strings in, say, OCaml GTK2 programs might contain UTF-8
because GTK itself would be expecting UTF-8 for labels, messages, etc.

All of the above are (in my opinion) very bad practice: strings which
could contain non-ASCII characters are better stored either as \escapes
or, better still, as external resources, and using ISO-8859-1 in
identifiers is just insane.  Nevertheless, all of the above
are possible.

So I think the best thing is to disable this warning for *.ml, *.mli,
*.mly and *.mll files, unless you can think of a better way of handling
this.
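
As a rough sketch of what such an exemption could look like, assuming the check is a plain "does this file decode as UTF-8" test: the function names and the extension tuple below are hypothetical and are not taken from rpmlint's source.

```python
import os

# Hypothetical exemption list, taken from the proposal in this report.
OCAML_SOURCE_EXTENSIONS = (".ml", ".mli", ".mly", ".mll")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def should_warn_not_utf8(path: str, data: bytes) -> bool:
    """Decide whether a file-not-utf8 style warning applies (sketch only)."""
    extension = os.path.splitext(path)[1]
    if extension in OCAML_SOURCE_EXTENSIONS:
        # OCaml sources may legitimately be ISO-8859-1, so skip them.
        return False
    return not is_valid_utf8(data)


latin1_bytes = b"let m\xe9 = 1\n"
ocaml_path = "/usr/share/doc/ocaml-odbc-devel-2.15/Exemples/monitor.ml"
print(should_warn_not_utf8(ocaml_path, latin1_bytes))
print(should_warn_not_utf8("README", latin1_bytes))
```

With this sketch the OCaml example file is skipped, while the same bytes in a file named README would still be flagged.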

I'm open to discussing changing Fedora OCaml policy to forbid this
sort of thing.  Hopefully it's fairly rare outside comments.

Comment 1 Ville Skyttä 2008-03-01 10:34:22 UTC
Done upstream, will be in the next release:
http://rpmlint.zarb.org/cgi-bin/trac.cgi/changeset/1407

Comment 2 Bug Zapper 2008-05-14 05:36:50 UTC
Changing version to '9' as part of upcoming Fedora 9 GA.
More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 3 Fedora Update System 2008-06-09 18:33:04 UTC
rpmlint-0.83-1.fc9 has been submitted as an update for Fedora 9

Comment 4 Fedora Update System 2008-06-09 18:35:10 UTC
rpmlint-0.83-1.fc8 has been submitted as an update for Fedora 8

Comment 5 Fedora Update System 2008-06-11 04:34:29 UTC
rpmlint-0.83-1.fc8 has been pushed to the Fedora 8 testing repository.  If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with:

  su -c 'yum --enablerepo=updates-testing update rpmlint'

You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F8/FEDORA-2008-5185

Comment 6 Fedora Update System 2008-06-26 08:29:46 UTC
rpmlint-0.83-1.fc9 has been pushed to the Fedora 9 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 7 Fedora Update System 2008-06-26 08:30:46 UTC
rpmlint-0.83-1.fc8 has been pushed to the Fedora 8 stable repository.  If problems still persist, please make note of it in this bug report.

