Bug 100934 - add package convmv to fedora core
Status: CLOSED NEXTRELEASE
Product: Fedora
Classification: Fedora
Component: distribution
Version: rawhide
Hardware: All
OS: Linux
Priority: high
Severity: medium
Assigned To: Bill Nottingham
Bill Nottingham
URL: http://j3e.de/linux/convmv/
Keywords: FutureFeature, Reopened, Triaged
Depends On:
Blocks:
Reported: 2003-07-27 12:40 EDT by Ronny Buchmann
Modified: 2014-03-16 22:37 EDT
CC List: 4 users

Doc Type: Enhancement
Last Closed: 2007-02-05 14:31:13 EST

Attachments
convmv perl utility (14.80 KB, text/plain)
2003-07-28 16:17 EDT, Ronny Buchmann

Description Ronny Buchmann 2003-07-27 12:40:14 EDT
Description of problem:
Since RHL 8.0 there has been the problem of migrating filenames from ISO-8859-*
encodings to UTF-8 (otherwise filenames with non-ASCII characters are not
accessible to some applications).

Convmv does this easily; it should definitely go in before the next release.

It is very small (~20KB) and needs only Perl modules that are already included.
$ rpm -q --requires convmv
/usr/bin/perl
perl >= 0:5.008
perl(Cwd)
perl(Encode)
perl(File::Basename)
perl(File::Compare)
perl(File::Find)
perl(Getopt::Long)
perl(Unicode::Normalize)
perl(bytes)
perl(utf8)
rpmlib(CompressedFileNames) <= 3.0.4-1
rpmlib(PayloadFilesHavePrefix) <= 4.0-1


Maybe it should also be provided as an errata for Psyche and Shrike.

I can attach my spec file if you like.

--
what is the right component for package requests?
Comment 1 Jeremy Katz 2003-07-28 12:02:30 EDT
distribution
Comment 2 Bill Nottingham 2003-07-28 12:08:07 EDT
This can be done pretty simply with a shell wrapper around iconv: I don't think
we need a special program for this ATM.
Comment 3 Ronny Buchmann 2003-07-28 14:30:56 EDT
This is not about file contents but file names.

And this is not simple enough with shell + iconv. (Can you write it within 5
minutes? Show it!)

The UTF-8 default in Psyche was a real nightmare, as many files were not
accessible with file managers. Renaming them manually in an xterm was a real
pain. And I'm sure not everybody has migrated yet.
Comment 4 Bill Nottingham 2003-07-28 14:36:39 EDT
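# "foodir" and the source encoding "foo" are placeholders; needs bash for read -d
find foodir -type f -print0 | while IFS= read -r -d '' file
do
        # quote every expansion so names with spaces or globs survive intact
        base=$(basename -- "$file" | iconv -f foo -t UTF-8)
        dir=$(dirname -- "$file")
        # skip names that iconv left unchanged (or failed to convert)
        [ -n "$base" ] && [ "$file" != "$dir/$base" ] && mv -- "$file" "$dir/$base"
done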

is the simple variant someone has posted internally at some point.

Assigning to someone who was debating what to do about this at some point.
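
The loop above renames only regular files and does not check whether the target name already exists. A slightly more defensive variant can be sketched in Python, which can treat filenames as raw bytes. This is only a sketch under stated assumptions: the function names `convert_name`, `plan_renames` and `apply_renames` are made up for illustration, ISO-8859-1 stands in for the `foo` placeholder, and it does not reflect how convmv or recode-files.c work. Symlink targets are left untouched.

```python
import os


def convert_name(name: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Re-encode one path component from source_encoding to UTF-8."""
    return name.decode(source_encoding).encode("utf-8")


def plan_renames(root: bytes, source_encoding: str = "iso-8859-1"):
    """Yield (old_path, new_path) for every entry whose name would change.

    Passing bytes to os.walk makes it return bytes names, so whitespace,
    newlines and undecodable bytes in filenames need no special handling.
    topdown=False visits children before their parent directory, so a
    directory is renamed only after everything inside it.
    """
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            new_name = convert_name(name, source_encoding)
            if new_name != name:
                yield os.path.join(dirpath, name), os.path.join(dirpath, new_name)


def apply_renames(root: bytes, source_encoding: str = "iso-8859-1") -> int:
    """Perform the planned renames; return how many were carried out."""
    done = 0
    for old, new in plan_renames(root, source_encoding):
        if os.path.lexists(new):
            continue  # never overwrite an existing entry
        os.rename(old, new)
        done += 1
    return done
```

Calling `list(plan_renames(b"foodir"))` first gives a dry run of what `apply_renames(b"foodir")` would do.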
Comment 5 Havoc Pennington 2003-07-28 14:39:15 EDT
See 
ftp://people.redhat.com/hp/recode-files.c
Comment 6 Ronny Buchmann 2003-07-28 15:27:27 EDT
ftp://people.redhat.com/hp/recode-files.c is not world readable

things a conversion tool should be aware of:
 - already converted files
 - symlinks

and there should be a hint in the release notes that such a thing exists
Comment 7 Havoc Pennington 2003-07-28 15:31:02 EDT
Sorry, I chmod'd it now.

I don't think we'll add it to the release notes unless we add it to the
distribution; right now it's just something people can try out if they want and
comment on whether to include.

Changing symlink targets... hmm. Sounds hard but possible.

I don't know of any way to reliably handle already-converted files
(this is the same problem as "handle a filesystem with filenames of 
mixed encoding" - and there's simply no reliable way to autodetect filename
encodings, so I don't see how you can handle this reliably).
Comment 8 Ronny Buchmann 2003-07-28 16:12:08 EDT
I don't think it's very likely that valid UTF-8 strings make sense when
interpreted with other encodings. (maybe it's completely irrelevant in practice)

And it would be better to not recode it than double encode it.
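
One conservative way to act on this preference is to leave alone any name that already decodes as strict UTF-8 and to convert only the rest. The Python sketch below assumes ISO-8859-1 as the legacy encoding; `to_utf8_once` is a made-up helper name, and nothing here is taken from convmv itself. It can still misjudge a legacy name whose bytes happen to form valid UTF-8, in which case the name is simply left unconverted rather than double encoded.

```python
def to_utf8_once(name: bytes, legacy_encoding: str = "iso-8859-1") -> bytes:
    """Convert a legacy-encoded filename to UTF-8 unless it already is UTF-8."""
    try:
        # Strict decoding raises on any byte sequence that is not valid UTF-8.
        name.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        # Not valid UTF-8: assume the legacy encoding and convert.
        return name.decode(legacy_encoding).encode("utf-8")
    # Already valid UTF-8 (this includes plain ASCII): leave untouched.
    return name


latin1_name = b"caf\xe9.txt"            # "café.txt" in ISO-8859-1
converted = to_utf8_once(latin1_name)   # becomes UTF-8
unchanged = to_utf8_once(converted)     # a second pass does nothing
```

Running the function twice on the same name is therefore harmless, which is the property asked for here.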
Comment 9 Ronny Buchmann 2003-07-28 16:17:59 EDT
Created attachment 93196
convmv perl utility

it's really readable, you should have a look
Comment 10 Havoc Pennington 2003-07-28 16:49:53 EDT
> I don't think it's very likely that valid UTF-8 strings make sense when
> interpreted with other encodings

It's extremely likely, because all possible bytes are valid in the Latin-*
encodings and other 8-bit encodings...

The perl script does look interesting, we should consider it for sure.
I was just posting the C program since it's what we were discussing 
previously.
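
The point about Latin-* encodings can be checked directly. In the Python sketch below (the filename is an invented example, and only ISO-8859-1 is tested), the UTF-8 bytes of a name decode as ISO-8859-1 without any error, so a Latin-1 to UTF-8 converter has no failure to react to and quietly produces a double-encoded name.

```python
# Every one of the 256 byte values decodes under Python's ISO-8859-1 codec.
all_bytes_decoded = bytes(range(256)).decode("iso-8859-1")

original = "café.txt"
utf8_bytes = original.encode("utf-8")        # b'caf\xc3\xa9.txt'

# Misreading the UTF-8 bytes as Latin-1 succeeds and yields mojibake.
misread = utf8_bytes.decode("iso-8859-1")    # 'cafÃ©.txt'

# Converting that "Latin-1" name to UTF-8 double encodes the é.
double_encoded = misread.encode("utf-8")     # b'caf\xc3\x83\xc2\xa9.txt'

print(misread)
print(double_encoded)
```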
Comment 11 Ronny Buchmann 2003-07-28 17:00:03 EDT
Valid doesn't mean it also makes sense. I myself haven't seen any UTF-8
that looked useful when viewed as Latin-*.

And I can only reiterate it's better to leave files untouched than double encode
them.
Comment 12 Havoc Pennington 2003-07-28 18:50:14 EDT
Can you define "makes sense" in terms of a computer algorithm ;-)
This would be an unsolved problem in AI. Especially since filenames need not 
be any kind of dictionary word or combination thereof.
Comment 13 Ronny Buchmann 2003-07-29 02:51:23 EDT
If filenames contain non-ASCII characters, they are mostly dictionary based.

But it really doesn't matter that much, see my last sentence above. ;)
Comment 14 Björn Jacke 2003-11-23 15:27:32 EST
Havoc, ISO-8859-* encodings and most other encodings really cannot
be "guessed" very well, but UTF-8 can be "guessed" *very* reliably,
and convmv does that. Convmv can also convert to and from NFC and NFD,
which is important for Mac OS X interoperability. It is also important
that convmv does efficient checks for invalid and insufficient
charsets.
Though I wrote convmv for UTF-8 locale migration, it turned out that
these days it's mainly used for Samba repository conversions when
people migrate to Samba3 or change the "unix charset" option.
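
Both claims can be illustrated with a short Python sketch. It is not convmv's actual code: `looks_like_utf8` and `renormalize` are made-up names, and the check shown (strict decoding plus at least one non-ASCII byte) is just one plausible way to "guess" UTF-8. The second half shows NFC versus NFD, the two Unicode normalization forms, using the standard `unicodedata` module.

```python
import unicodedata


def looks_like_utf8(name: bytes) -> bool:
    """Guess whether a non-ASCII filename is UTF-8 by decoding it strictly."""
    if name.isascii():
        return False  # pure ASCII looks the same in all these encodings
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def renormalize(name: bytes, form: str) -> bytes:
    """Convert a UTF-8 filename to the given normalization form (NFC or NFD)."""
    return unicodedata.normalize(form, name.decode("utf-8")).encode("utf-8")


word = "Grüße"
print(looks_like_utf8(word.encode("utf-8")))       # True
print(looks_like_utf8(word.encode("iso-8859-1")))  # False: 0xFC never occurs in UTF-8

composed = renormalize("é".encode("utf-8"), "NFC")    # one code point, U+00E9
decomposed = renormalize("é".encode("utf-8"), "NFD")  # "e" + combining acute U+0301
print(composed, decomposed)
```

The two normalized names render identically but differ as byte strings, so a filename written in one form will not match a byte-wise lookup in the other.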
Comment 15 Havoc Pennington 2004-05-25 14:08:26 EDT
I think this is useful but the bug is just in limbo while assigned to
me; we need someone who will in fact do the work to package/maintain.
Comment 16 Ronny Buchmann 2004-06-05 06:05:11 EDT
inside or outside RH?
Comment 17 Havoc Pennington 2004-06-06 14:32:43 EDT
Outside is fine, though at the moment outside would have to be done
via fedora.us (hopefully this is changing soon...)

Inside is fine too of course.
Comment 18 Ronny Buchmann 2005-05-15 09:02:05 EDT
convmv is available in extras.

I still think it should be in core.
Comment 19 Red Hat Bugzilla 2007-02-05 14:14:01 EST
REOPENED status has been deprecated. ASSIGNED with keyword of Reopened is preferred.
Comment 20 Jesse Keating 2007-02-05 14:31:13 EST
Core/Extras are merging, solving this problem.
