I'm packaging some third-party stuff for easier deployment, and when I build the package I get nonsense errors which terminate the build, such as:

    error: Package xxx: invalid utf-8 encoding in Classdict: TrueType Font data, digitally signed, 19 tables, 1st "DSIG", 26 names, Macintosh, Digitized data copyright © 2010-2011, Google Corporation.Open SansItalic1.10;1ASC;OpenSans-Ital - Invalid or incomplete multibyte or wide character

There are two problems with this.

1. RPM does not tell me which file has the invalid encoding, which makes it hard to fix if this is a real error.
2. There's actually nothing wrong with the file in question. RPM gets this description by calling 'file', which outputs in the current locale, but it then tries to validate it as UTF-8, which won't work if your locale is not UTF-8.

So either: (a) if your locale is not UTF-8, RPM should not try to validate the description as UTF-8; or (b) if RPM wants UTF-8, it should set the locale to UTF-8 before calling 'file'.

Of course there's an obvious workaround, which is to set the locale to UTF-8 before building the package. But this is a confusing error if you are not aware of that.

Such a problem can certainly occur, but there are invalid assumptions in your post: rpm does not call file, it uses the libmagic API, and the locale does not affect the outcome because libmagic strings are not translated at all.

Rpm cannot directly tell you the associated file because the encoding is not checked in that context at all (instead, the encoding check is run on the entire header). Just run 'file' manually on the fonts in the buildroot to see what matches, and possibly fix it.

For Fedora packages, utf-8 is mandatory, but for your own purposes, if you don't care about the encoding, it's trivially worked around by adding the following to the spec:

    %global _invalid_encoding_terminates_build 0

After that you can also associate the broken description with the file in question by running 'rpm -q --fileclass <pkg>' if you want to try fixing it instead.

So, not a bug: the check is doing exactly what it's meant to do.

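The manual check suggested above can be sketched in Python. This is only a rough sketch and not how rpm performs its own check: the helper names `describe`, `invalid_utf8` and `scan` are hypothetical, the `*.ttf` pattern is an assumption about the fonts involved, and it assumes the `file` command is installed and that its `-b` (brief) output resembles the description rpm stores.

```python
import subprocess
from pathlib import Path


def describe(path):
    """Return the raw bytes that `file -b` prints for path (assumes `file` is installed)."""
    result = subprocess.run(["file", "-b", str(path)], capture_output=True, check=True)
    return result.stdout.strip()


def invalid_utf8(descriptions):
    """Return the keys whose description bytes do not decode as UTF-8."""
    bad = []
    for name, raw in descriptions.items():
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(name)
    return bad


def scan(buildroot, pattern="*.ttf"):
    """Describe every file matching pattern under buildroot and list the offenders."""
    found = {str(p): describe(p) for p in Path(buildroot).rglob(pattern)}
    return invalid_utf8(found)
```

Running `scan` on the buildroot under the same LANG as the failing build may help narrow down which files produce a description that is not valid UTF-8.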
"file" and "libmagic" are basically the same thing, and it's trivial to demonstrate that locale does make a difference because the exact same package builds fine with LANG=en_GB.UTF-8 when it errors out with LANG=en_GB.iso8859-1 (the font files came from a third party and are not changed during the build). So as I said, there is a trivial workaround; but the message is confusing because the files in the package are not broken.
Right, translation != encoding. I don't see what we could do to help that though, except:

a) better document the issue and the workaround (short term)
b) get rid of the libmagic classification strings in the first place (longer term)

Rpm itself couldn't care less about the encoding, but the world at large expects utf-8 these days, which is why the check is there to begin with.

P.S. file and libmagic are of course quite literally "the same thing", but technically "calling file" is quite different, involving forks and shells etc., from using the libmagic API, as you surely know. A number of rpm scripts do "call file" instead, so the distinction matters.