Description of problem:
Since RHL 8.0 there has been the problem of migrating filenames from ISO-8859-* encodings
to UTF-8 (otherwise filenames with non-ASCII characters are not accessible by some
Convmv does this easily; it should definitely go in before the next release.
It is very small (~20KB) and needs only Perl modules that are already included.
$ rpm -q --requires convmv
perl >= 0:5.008
rpmlib(CompressedFileNames) <= 3.0.4-1
rpmlib(PayloadFilesHavePrefix) <= 4.0-1
Maybe it should also be provided as an errata for Psyche and Shrike.
I can attach my spec file if you like.
What is the right component for package requests?
This can be done pretty simply with a shell wrapper around iconv: I don't think
we need a special program for this ATM.
This is not about file contents but file names.
And this is not simple enough with shell + iconv. (Can you write it within 5
minutes? Show it!)
The UTF-8 default in Psyche was a real nightmare, as many files were not
accessible with file managers. Renaming them manually in an xterm was a real
pain. And I'm sure not everybody has migrated yet.
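# "foo" stands for the source encoding; names containing whitespace still break this loop
for file in $(find foodir -type f); do
    dir=$(dirname "$file")
    base=$(basename "$file" | iconv -f foo -t UTF-8)
    mv "$file" "$dir/$base"
done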
is the simple variant someone has posted internally at some point.
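For comparison, here is a minimal sketch of the same idea in Python. It works on raw bytes, so names containing spaces or newlines survive, and it renames directories as well as files. The names `recode_name` and `recode_tree` and the `dry_run` flag are made up for illustration; this is neither convmv nor the internally posted script. Like the shell loop, it blindly assumes every name is in the source encoding.

```python
import os

def recode_name(name: bytes, src: str) -> bytes:
    """Reinterpret a filename's bytes as `src` and re-encode them as UTF-8."""
    return name.decode(src).encode("utf-8")

def recode_tree(root: bytes, src: str, dry_run: bool = True):
    """Rename entries below `root`; returns the (old, new) pairs it planned."""
    planned = []
    # topdown=False visits children before their parent directory, so a
    # directory is renamed only after everything inside it has been handled.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            new = recode_name(name, src)
            if new == name:
                continue  # pure-ASCII names come out unchanged
            old_path = os.path.join(dirpath, name)
            new_path = os.path.join(dirpath, new)
            planned.append((old_path, new_path))
            if not dry_run:
                os.rename(old_path, new_path)
    return planned
```

Calling `recode_tree(b"foodir", "iso-8859-1")` only lists what would be renamed; passing `dry_run=False` performs the renames.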
Assigning to someone who was debating what to do about this at some point.
ftp://people.redhat.com/hp/recode-files.c is not world readable
things a conversion tool should be aware of:
- already converted files
and there should be a hint in the release notes that such a thing exists
Sorry, I chmod'd it now.
I don't think we'll add to release notes unless we add it to the distribution,
right now it's just something people can try out if they want and
comment on whether to include.
Changing symlink targets... hmm. Sounds hard but possible.
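A rough sketch of what changing a symlink target could look like, assuming a POSIX filesystem where link targets are arbitrary bytes. The function name `recode_symlink_target` and the `.recode-tmp` suffix are hypothetical; the sketch only rewrites the stored target string and does not check whether the file it points to has been renamed.

```python
import os

def recode_symlink_target(link: bytes, src: str, dry_run: bool = True) -> bytes:
    """Return the UTF-8 form of the link's target; rewrite the link unless dry_run."""
    target = os.readlink(link)  # bytes path in, bytes target out
    new_target = target.decode(src).encode("utf-8")
    if new_target != target and not dry_run:
        tmp = link + b".recode-tmp"  # hypothetical temporary name
        os.symlink(new_target, tmp)
        os.replace(tmp, link)  # swap the new link into place over the old one
    return new_target
```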
I don't know of any way to reliably handle already-converted files
(this is the same problem as "handle a filesystem with filenames of
mixed encoding" - and there's simply no reliable way to autodetect filename
encodings, so I don't see how you can handle this reliably).
I don't think it's very likely that valid UTF-8 strings make sense when
interpreted with other encodings. (maybe it's completely irrelevant in practice)
And it would be better to not recode it than double encode it.
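The "leave it alone rather than double encode" policy could be sketched like this in Python. The helper names and the ISO-8859-1 default are assumptions for illustration only. The check is a heuristic: a name that happens to be valid UTF-8 but was really meant in the old encoding is left unconverted.

```python
def is_valid_utf8(name: bytes) -> bool:
    """True if the bytes decode as strict UTF-8 (pure ASCII counts as valid)."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def convert_if_needed(name: bytes, src: str = "iso-8859-1") -> bytes:
    """Convert from `src` to UTF-8, but leave names that already are UTF-8 alone."""
    if is_valid_utf8(name):
        return name  # skip: converting again would double encode
    return name.decode(src).encode("utf-8")
```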
Created attachment 93196
convmv perl utility
it's really readable, you should have a look
> I don't think it's very likely that valid UTF-8 strings make sense when
> interpreted with other encodings
It's extremely likely, because all possible bytes are valid in the Latin-*
encodings and other 8-bit encodings...
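To illustrate the point with ISO-8859-1 as Python implements it: every one of the 256 byte values decodes without error, so UTF-8 bytes are never rejected and a blind conversion double encodes them. The function name `double_encode` is made up for this sketch.

```python
def double_encode(text: str) -> bytes:
    """Encode text as UTF-8, misread those bytes as ISO-8859-1, and encode again."""
    utf8_bytes = text.encode("utf-8")
    misread = utf8_bytes.decode("iso-8859-1")  # never raises for this codec
    return misread.encode("utf-8")

# Every byte value is accepted by the ISO-8859-1 codec.
all_bytes_decoded = bytes(range(256)).decode("iso-8859-1")
```

For example, `double_encode("\u00e9")` turns the two UTF-8 bytes of "é" into four bytes that display as "Ã©".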
The perl script does look interesting, we should consider it for sure.
I was just posting the C program since it's what we were discussing
Valid doesn't mean it also makes sense. I myself haven't seen any UTF-8
that looked useful when viewed as Latin-*.
And I can only reiterate it's better to leave files untouched than double encode
Can you define "makes sense" in terms of a computer algorithm ;-)
This would be an unsolved problem in AI. Especially since filenames need not
be any kind of dictionary word or combination thereof.
If filenames contain non-ASCII characters, they are mostly dictionary based.
But it really doesn't matter that much, see my last sentence above. ;)
Havoc, ISO8859-* encodings and most other encodings really cannot
be "guessed" very well, but UTF-8 can be "guessed" *very* reliably,
and convmv does that. Convmv can also convert from and to NFC and NFD,
which is important for Mac OS X interoperability. It is also important
that convmv does efficient checks for invalid and insufficient
Though I wrote convmv for UTF-8 locale migration, it turned out that
it's mainly used these days for Samba repository conversions when
people migrate to Samba3 or change the "unix charset" option.
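As an illustration of the NFC/NFD point, the same visible name can be stored as different byte sequences depending on the Unicode normalization form. The sketch below uses Python's standard `unicodedata` module and a made-up helper name `to_form`; it does not show how convmv itself does this.

```python
import unicodedata

def to_form(name: bytes, form: str) -> bytes:
    """Normalize a UTF-8 filename to the given Unicode form ("NFC" or "NFD")."""
    text = name.decode("utf-8")
    return unicodedata.normalize(form, text).encode("utf-8")

composed = to_form(b"cafe\xcc\x81", "NFC")    # "e" + combining acute -> one code point
decomposed = to_form(b"caf\xc3\xa9", "NFD")   # one code point -> "e" + combining acute
```

The two results look identical in a file manager, but they are different filenames at the byte level.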
I think this is useful but the bug is just in limbo while assigned to
me; we need someone who will in fact do the work to package/maintain.
inside or outside RH?
Outside is fine, though at the moment outside would have to be done
via fedora.us (hopefully this is changing soon...)
Inside is fine too of course.
convmv is available in extras.
I still think it should be in core.
REOPENED status has been deprecated. ASSIGNED with keyword of Reopened is preferred.
Core/Extras are merging, solving this problem.