Bug 434694

Summary: file-not-utf8 for an OCaml source file
Product: [Fedora] Fedora
Reporter: Richard W.M. Jones <rjones>
Component: rpmlint
Assignee: Ville Skyttä <ville.skytta>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: low
Priority: low
Version: 9
CC: manuel.wolfshant, tmz
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: 0.83-1.fc9
Doc Type: Bug Fix
Last Closed: 2008-06-26 08:29:59 UTC

Description Richard W.M. Jones 2008-02-24 14:56:13 UTC
Example:

  ocaml-odbc-devel.i386: W: file-not-utf8 /usr/share/doc/ocaml-odbc-devel-2.15/Exemples/monitor.ml

The situation here is complicated.  A standard OCaml source file
can usually be considered ISO-8859-1 encoded.  More precisely:

Identifiers may contain ISO-8859-1 characters.  In particular if
identifiers are converted to UTF-8 the program will no longer
compile:

  $ echo -e 'let m\xe9 = 1' > test.ml
  $ hexdump -C test.ml
  00000000  6c 65 74 20 6d e9 20 3d  20 31 0a     |let m. = 1.|
  0000000b
  $ ocamlc test.ml
  $ iconv -f iso-8859-1 -t utf-8 < test.ml > testu.ml
  $ ocamlc testu.ml
  File "testu.ml", line 1, characters 6-7:
  Illegal character (\169)
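
To make the byte-level effect of the iconv step concrete, here is a small Python sketch (not part of the original report) that performs the same ISO-8859-1 to UTF-8 conversion on the same bytes. The variable names are illustrative only. The result is consistent with the error above: a lexer reading the file as ISO-8859-1 would see byte 195 as a letter, and then stop at byte 169 in characters 6-7.

```python
# Same bytes as written by: echo -e 'let m\xe9 = 1' > test.ml
latin1_source = b"let m\xe9 = 1\n"

# Equivalent of: iconv -f iso-8859-1 -t utf-8 < test.ml > testu.ml
utf8_source = latin1_source.decode("iso-8859-1").encode("utf-8")

# The single byte 0xE9 becomes the two-byte UTF-8 sequence 0xC3 0xA9.
print(utf8_source)  # b'let m\xc3\xa9 = 1\n'

# Byte values at offsets 5 and 6 of the converted file.
first, second = utf8_source[5], utf8_source[6]
print(first, second)  # 195 169
```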

Comments in the source can contain ISO-8859-1 characters (and
given that the primary developers are French, this is not just a
theoretical consideration).

Literal strings in OCaml programs are really byte arrays and
as such could contain just about anything.

Literal strings in, say, OCaml GTK2 programs might contain UTF-8
because GTK itself would be expecting UTF-8 for labels, messages, etc.

All of the above are (in my opinion) very bad practice -- using
ISO-8859-1 in identifiers, for example, is just insane, and strings
which could contain non-ASCII characters are better stored either as
\escapes or, better still, as external resources.  Nevertheless, all
of the above are possible.

So I think the best thing is to disable this warning for *.ml, *.mli,
*.mly and *.mll files, unless you can think of a better way of handling
this.
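
A minimal sketch of what such an exemption could look like, written in Python because rpmlint is a Python tool. This is an assumption-laden illustration, not rpmlint's actual code: the names OCAML_SOURCE_EXTS, is_valid_utf8 and should_warn_not_utf8 are hypothetical, and the extension list is simply the one proposed above.

```python
import os

# Hypothetical exemption list: the four extensions named in this report.
OCAML_SOURCE_EXTS = (".ml", ".mli", ".mly", ".mll")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def should_warn_not_utf8(path: str, data: bytes) -> bool:
    """Decide whether a file-not-utf8 style warning applies to this file."""
    # OCaml sources may legitimately be ISO-8859-1, so skip them entirely.
    if os.path.splitext(path)[1] in OCAML_SOURCE_EXTS:
        return False
    return not is_valid_utf8(data)
```

With this sketch, an ISO-8859-1 encoded monitor.ml would no longer trigger the warning, while a non-UTF-8 README still would.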

I'm open to discussing changing Fedora OCaml policy to forbid this
sort of thing.  Hopefully it's fairly rare outside comments.

Comment 1 Ville Skyttä 2008-03-01 10:34:22 UTC
Done upstream, will be in the next release:
http://rpmlint.zarb.org/cgi-bin/trac.cgi/changeset/1407

Comment 2 Bug Zapper 2008-05-14 05:36:50 UTC
Changing version to '9' as part of upcoming Fedora 9 GA.
More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 3 Fedora Update System 2008-06-09 18:33:04 UTC
rpmlint-0.83-1.fc9 has been submitted as an update for Fedora 9

Comment 4 Fedora Update System 2008-06-09 18:35:10 UTC
rpmlint-0.83-1.fc8 has been submitted as an update for Fedora 8

Comment 5 Fedora Update System 2008-06-11 04:34:29 UTC
rpmlint-0.83-1.fc8 has been pushed to the Fedora 8 testing repository.  If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with

  su -c 'yum --enablerepo=updates-testing update rpmlint'

You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F8/FEDORA-2008-5185

Comment 6 Fedora Update System 2008-06-26 08:29:46 UTC
rpmlint-0.83-1.fc9 has been pushed to the Fedora 9 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 7 Fedora Update System 2008-06-26 08:30:46 UTC
rpmlint-0.83-1.fc8 has been pushed to the Fedora 8 stable repository.  If problems still persist, please make note of it in this bug report.