Bug 100934 - add package convmv to fedora core
Summary: add package convmv to fedora core
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: distribution
Version: rawhide
Hardware: All
OS: Linux
Priority: high
Severity: medium
Target Milestone: ---
Assignee: Bill Nottingham
QA Contact: Bill Nottingham
URL: http://j3e.de/linux/convmv/
Whiteboard:
Depends On:
Blocks:
Reported: 2003-07-27 16:40 UTC by Ronny Buchmann
Modified: 2014-03-17 02:37 UTC

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-02-05 19:31:13 UTC
Type: ---
Embargoed:


Attachments
convmv perl utility (14.80 KB, text/plain)
2003-07-28 20:17 UTC, Ronny Buchmann

Description Ronny Buchmann 2003-07-27 16:40:14 UTC
Description of problem:
Since RHL 8.0 there has been the problem of migrating filenames from
ISO-8859-* encodings to UTF-8 (otherwise filenames with non-ASCII characters
are not accessible to some applications).

Convmv does this easily; it should definitely go in before the next release.

It is very small (~20KB) and needs only Perl modules that are already included.
$ rpm -q --requires convmv
/usr/bin/perl
perl >= 0:5.008
perl(Cwd)
perl(Encode)
perl(File::Basename)
perl(File::Compare)
perl(File::Find)
perl(Getopt::Long)
perl(Unicode::Normalize)
perl(bytes)
perl(utf8)
rpmlib(CompressedFileNames) <= 3.0.4-1
rpmlib(PayloadFilesHavePrefix) <= 4.0-1


Maybe it should also be provided as an errata for Psyche and Shrike.

I can attach my spec file if you like.

--
what is the right component for package requests?

Comment 1 Jeremy Katz 2003-07-28 16:02:30 UTC
distribution

Comment 2 Bill Nottingham 2003-07-28 16:08:07 UTC
This can be done pretty simply with a shell wrapper around iconv: I don't think
we need a special program for this ATM.

Comment 3 Ronny Buchmann 2003-07-28 18:30:56 UTC
This is not about file contents but file names.

And this is not simple enough with shell + iconv. (Can you write it within 5
minutes? Show it!)

The UTF-8 default in Psyche was a real nightmare, as many files were not
accessible with file managers. Renaming them manually in an xterm was a real
pain. And I'm sure not everybody has migrated yet.

Comment 4 Bill Nottingham 2003-07-28 18:36:39 UTC
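# bash; "foodir" and the source encoding "foo" are placeholders
find foodir -type f -print0 | while IFS= read -r -d '' file
do
        # skip this file if iconv cannot convert its name
        base=$(basename -- "$file" | iconv -f foo -t UTF-8) || continue
        dir=$(dirname -- "$file")
        # rename only when the converted name differs
        [ "$dir/$base" = "$file" ] || mv -- "$file" "$dir/$base"
done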

is the simple variant someone has posted internally at some point.

Assigning to someone who was debating what to do about this at some point.

Comment 5 Havoc Pennington 2003-07-28 18:39:15 UTC
See 
ftp://people.redhat.com/hp/recode-files.c

Comment 6 Ronny Buchmann 2003-07-28 19:27:27 UTC
ftp://people.redhat.com/hp/recode-files.c is not world readable

things a conversion tool should be aware of:
 - already converted files
 - symlinks

and there should be a hint in the release notes that such a thing exists

Comment 7 Havoc Pennington 2003-07-28 19:31:02 UTC
Sorry, I chmod'd it now.

I don't think we'll add it to the release notes unless we add it to the
distribution; right now it's just something people can try out if they want
and comment on whether to include.

Changing symlink targets... hmm. Sounds hard but possible.

I don't know of any way to reliably handle already-converted files
(this is the same problem as "handle a filesystem with filenames of 
mixed encoding" - and there's simply no reliable way to autodetect filename
encodings, so I don't see how you can handle this reliably).

Comment 8 Ronny Buchmann 2003-07-28 20:12:08 UTC
I don't think it's very likely that valid UTF-8 strings make sense when
interpreted with other encodings. (Maybe it's completely irrelevant in practice.)

And it would be better to not recode it than double encode it.

Comment 9 Ronny Buchmann 2003-07-28 20:17:59 UTC
Created attachment 93196
convmv perl utility

it's really readable, you should have a look

Comment 10 Havoc Pennington 2003-07-28 20:49:53 UTC
> I don't think it's very likely that valid UTF-8 strings make sense when
> interpreted with other encodings

It's extremely likely, because all possible bytes are valid in the Latin-*
encodings and other 8-bit encodings...
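
A minimal sketch of that point, assuming an iconv command is available and
taking ISO-8859-1 as the example 8-bit encoding (the sample name and the
variable names are made up for illustration). It shows that a name which is
already UTF-8 converts "successfully" a second time and comes out double
encoded:

```shell
# "café" already encoded as UTF-8: the bytes c3 a9 stand for "é".
already_utf8=$(printf 'caf\303\251')

# Treating those bytes as ISO-8859-1 succeeds, because every byte value
# maps to some character there, so iconv cannot notice the mistake.
double=$(printf '%s' "$already_utf8" | iconv -f ISO-8859-1 -t UTF-8)

# Hex dump of the result: "é" has turned into "Ã©", i.e. the bytes
# 63 61 66 c3 83 c2 a9.
printf '%s' "$double" | od -An -tx1
```

The conversion reports no error, so a tool that relies only on the converter's
exit status cannot tell an already converted name from a legacy one.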

The perl script does look interesting, we should consider it for sure.
I was just posting the C program since it's what we were discussing 
previously.

Comment 11 Ronny Buchmann 2003-07-28 21:00:03 UTC
Valid doesn't mean it also makes sense. I, for one, haven't seen any UTF-8
that looked useful when viewed as Latin-*.

And I can only reiterate it's better to leave files untouched than double encode
them.

Comment 12 Havoc Pennington 2003-07-28 22:50:14 UTC
Can you define "makes sense" in terms of a computer algorithm ;-)
This would be an unsolved problem in AI. Especially since filenames need not 
be any kind of dictionary word or combination thereof.

Comment 13 Ronny Buchmann 2003-07-29 06:51:23 UTC
If filenames contain non-ASCII characters, they are mostly dictionary based.

But it really doesn't matter that much, see my last sentence above. ;)

Comment 14 Björn Jacke 2003-11-23 20:27:32 UTC
Havoc, ISO8859-* encodings and most other encodings cannot really be
"guessed" very well, but UTF-8 can be "guessed" *very* reliably,
and convmv does that. Convmv can also convert to and from NFC and NFD,
which is important for Mac OS X interoperability. It is also important
that convmv does efficient checks for invalid and insufficient
charsets.
Though I wrote convmv for UTF-8 locale migration, it turned out that
these days it's mainly used for Samba repository conversions when
people migrate to Samba3 or change the "unix charset" option.
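
The idea of checking for valid UTF-8 before converting can be sketched in
shell as follows. This is not convmv's code (convmv is a Perl script); it
assumes an iconv that rejects malformed UTF-8 input, and the function names
is_valid_utf8 and to_utf8_once, as well as ISO-8859-1 as the source encoding,
are hypothetical choices for illustration:

```shell
# Succeeds if the argument is a well-formed UTF-8 byte sequence.
# Assumption: iconv exits non-zero on malformed input (glibc iconv does).
is_valid_utf8() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Prints the name in UTF-8. Names that already are valid UTF-8
# (including plain ASCII) are printed unchanged; anything else is
# assumed to be ISO-8859-1 and converted.
to_utf8_once() {
    if is_valid_utf8 "$1"; then
        printf '%s' "$1"
    else
        printf '%s' "$1" | iconv -f ISO-8859-1 -t UTF-8
    fi
}

# Example: a legacy Latin-1 name ("café.txt" with é as byte e9).
legacy=$(printf 'caf\351.txt')
converted=$(to_utf8_once "$legacy")
# Running it again leaves the converted name alone.
again=$(to_utf8_once "$converted")
```

Because the converted name passes the validity check, a second run does not
double encode it. A legacy name whose bytes happen to form valid UTF-8 would
be skipped, which is the ambiguity raised in comment 10.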

Comment 15 Havoc Pennington 2004-05-25 18:08:26 UTC
I think this is useful but the bug is just in limbo while assigned to
me; we need someone who will in fact do the work to package/maintain.


Comment 16 Ronny Buchmann 2004-06-05 10:05:11 UTC
inside or outside RH?

Comment 17 Havoc Pennington 2004-06-06 18:32:43 UTC
Outside is fine, though at the moment outside would have to be done
via fedora.us (hopefully this is changing soon...)

Inside is fine too of course.

Comment 18 Ronny Buchmann 2005-05-15 13:02:05 UTC
convmv is available in extras.

I still think it should be in core.

Comment 19 Red Hat Bugzilla 2007-02-05 19:14:01 UTC
REOPENED status has been deprecated. ASSIGNED with keyword of Reopened is preferred.

Comment 20 Jesse Keating 2007-02-05 19:31:13 UTC
Core/Extras are merging, solving this problem.

