Very often users put files with non-ASCII names into archives -- i.e., they simply name a file in their local language and then zip it. Currently, "unzip" assumes that the so-called "OEM encoding" is always CP850 and that the locale's encoding is CP1252. (This matters only for archives not created under Linux; Linux-created archives seem to store the names transparently.) Because of this, to obtain the correct filenames under, say, a Russian locale, we have to run: "unzip -l file.zip | iconv -f CP1252 -t CP850 | iconv -f CP866", since CP866 is what Russian win32 locales actually use when writing filenames into zip files.

Therefore, instead of the "CP850 --> CP1252" conversion, which suits latin1 users only, we need a more intelligent approach. I've created a patch which tries to solve this issue. The idea is to inspect the current locale (under which "unzip" was invoked) and determine the conversion actually needed. Obtaining the target encoding is trivial (using nl_langinfo(3) and friends). To guess which "OEM" encoding was actually used by the win32 system, we inspect the language/country part of the LC_ALL/LANG environment variables and derive the needed CPxxx from it. There are two ways to do this: either use a direct table (i.e. "en" --> "CP850", "ru" --> "CP866", "ja" --> "CP932" etc.), or first get the non-UTF encoding for this locale and use a table like "ISO-8859-1" --> "CP850", "ISO-8859-5" --> "CP866", "EUC-JP" --> "CP932" ... Currently I prefer the second way.

The attached patch was successfully tested under ru_RU locales. Please note that I'm ready to do any further work on this if needed.
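As a rough illustration of the idea (this is my own sketch in shell, not the patch itself, and the codepage used is only an example), the target charset comes from the locale and iconv does a single-step conversion:

```shell
# Hypothetical sketch: recode one stored filename from a guessed OEM
# codepage straight to the current locale charset, replacing the
# hardwired CP850 -> CP1252 assumption.
recode_name() {
    oem_cp=$1
    raw_name=$2
    # `locale charmap` yields the destination charset -- the same thing
    # nl_langinfo(CODESET) returns inside a C program.
    printf '%s' "$raw_name" | iconv -f "$oem_cp" -t "$(locale charmap)"
}
```

For a Russian win32 archive one would call `recode_name CP866 "$name"`; ASCII names pass through unchanged.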
Created attachment 147018 [details] support for non-latin1 filenames in archive
Surely the basic idea rests on the assumption that the Fedora host and the zip-creator host both live in the same country... :)
I have a related encoding issue: filenames are encoded in CP737. The function Ext_ASCII_TO_Native() mangles the names, so that when a name is printed out (unzip -l) or used to create the file, it is no longer in any valid encoding, and so convmv can't be used on it, etc. If I comment out the function call, then I can run convmv -f CP737 -t utf8 (my locale: el_GR.UTF8) and it works fine. This patch unfortunately does not work for me (applied to unzip-5.52-4.fc7.src.rpm); it produces unrecognizable output. Attachment follows with sample output...
Created attachment 157351 [details] sample output with unzip on archive with CP737-encoded filenames

Four runs, in order:
1. output from /usr/bin/unzip as shipped;
2. output from unzip with Dmitry's patch;
3. output from unzip with the Ext_ASCII_TO_Native() call commented out;
4. output from unzip with the call commented out, piped to iconv, successful.

Maybe we could have a switch to allow the user to turn that call off; that's better than the present state of affairs (having to run WinZip under wine, or capture the list of names with zipnote and rename them manually).
It seems that my patch is just a bit incomplete. Currently, for your locale ("el_GR"), the patch determines the non-UTF charmap as "ISO-8859-7" and chooses the corresponding "CP869", whereas you need "CP737". It would be nice if you could write a few more words about ISO-8859-7 and el_GR, and I'll change my patch according to your information. Anyway, it is impossible to handle all locales at once -- I simply don't have all the needed information about them. I've tried to handle what I know (presumably the most widely used locales); then people will report bugs (like this one) and I or the "unzip" maintainer will add support for new locales too. BTW, maybe you know some resource where all the "xx_XX --> CPxxx" mappings can be found?
The problem is that sometimes I will want cp737 -> utf8, sometimes iso8859-7 -> utf-8, and sometimes something else, depending on the source of the files. For that matter, zip should not assume (without an option to override) that the files to be unzipped necessarily map to the user's locale. If you really want encoding conversion, let the user specify it from the command line, as with iconv and convmv. Perhaps code could be borrowed from iconv.

I don't know of a good list of cp -> utf-8, but a couple of sources that might be useful to you are here: http://nlso-objects.sourceforge.net/languagedata.php?pageindex=5 and the comprehensive iana list is here: http://www.iana.org/assignments/character-sets

But let me make one more pitch for *not converting*: we already have tools that will do the conversion after the fact, something that works for all users right out of the box.
> The problem is that sometimes I will want cp737 -> utf8, and sometimes
> iso8859-7 -> utf-8, and sometimes something else,
> depending on the source of the files.

AFAIK the Zip file standard is only capable of supporting a single language at a time, by using a single OEM code page for it. Now, I've found a list of OEM code pages:

  437  OEM - United States
  737  OEM - Greek (formerly 437G)
  775  OEM - Baltic
  850  OEM - Multilingual Latin I
  852  OEM - Latin II
  855  OEM - Cyrillic (primarily Russian)
  857  OEM - Turkish
  858  OEM - Multilingual Latin I + Euro symbol
  860  OEM - Portuguese
  861  OEM - Icelandic
  862  OEM - Hebrew
  863  OEM - Canadian - French
  864  OEM - Arabic
  865  OEM - Nordic
  866  OEM - Russian
  869  OEM - Modern Greek
  874  ANSI/OEM - Thai (same as 28605, ISO 8859-15)
  932  ANSI/OEM - Japanese, Shift-JIS
  936  ANSI/OEM - Simplified Chinese (PRC, Singapore)
  949  ANSI/OEM - Korean (Unified Hangeul Code)
  950  ANSI/OEM - Traditional Chinese (Taiwan; Hong Kong SAR, PRC)

It seems that when you create a zip file, all filenames are preserved in one of these encodings. The problem is that there are two "Greek" ones: CP737 and CP869. Is there some way to guess between them at run-time?

> we already have tools that will do conversion after the fact

From the command line -- yes. But the main reason for writing this patch is wrong filenames in the file-roller window. (Some sites offer information to users just as zip files; when the user follows such a link, the browser downloads the zip file and then file-roller is invoked, which invokes unzip.) Unfortunately it is impossible to patch the many applications that can invoke unzip (either to handle the encoding themselves or to invoke unzip with some "new" command-line options), hence we should solve this issue in unzip itself. But maybe some environment variable could help? (i.e. put UNZIP_OEM=CP737 somewhere in /etc/profile.d/* ...) And do nothing if this variable is not set...
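One hedged sketch of such a run-time guess (my own illustration, not part of any patch): try each candidate codepage with iconv and keep the first one that converts cleanly, on the theory that bytes unassigned in a codepage make that conversion fail. Plain-ASCII names match every candidate, so this can only ever be a heuristic:

```shell
# Hypothetical helper: print the first candidate codepage under which
# the raw name converts to UTF-8 without error.
pick_codepage() {
    raw_name=$1
    shift
    for cp in "$@"; do
        if printf '%s' "$raw_name" | iconv -f "$cp" -t UTF-8 >/dev/null 2>&1; then
            echo "$cp"
            return 0
        fi
    done
    return 1
}
```

For the Greek case one would try `pick_codepage "$name" CP737 CP869`; whether a given iconv implementation actually rejects the unassigned CP869 bytes is an assumption that would need checking.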
Even if file-roller can't utilize them, I think unzip should have command line switches. For file-roller and other apps, it would be nice if unzip had a default behavior. The problem is that there isn't a good choice of default: cp737, cp869, and iso8859-7 are all different encodings. I guess you could test for characters that fall in the unpermitted range of one of these and rule it out that way, but that will only work in some of the cases. Were you thinking of providing a way for the user to select the encoding from a list in the file-roller GUI, and then setting the environment variable based on that?
> I think unzip should have command line switches.

I agree, but it is a task for the upstream unzip team. Otherwise different distros will have different flags for this, as usual :(

> The problem is that there isn't a good choice of default.

In the case of Greek only, or for other locales too? (Just to decide whether some "auto-guessing" could be useful at least for some widespread locales.)
First of all, a core difference between Windows and Linux is that in Windows you can configure a basic legacy encoding; when a filename or text is not valid UTF-16, it is assumed to be in that legacy 8-bit encoding and auto-converted. Because of this, people end up with content in the legacy encoding. I do not know whether it would make sense to replicate this functionality; it should solve a big part of the migration issues. I think implementing it would require some changes in glibc (???). I am not sure whether WinZip works because it is smart about encodings or because it benefits from the basic legacy setting in Windows. It appears 7zip for Windows cannot automatically fix the encoding, but that problem might not be related (perhaps 7zip forces utf-8, bypassing any autoconversion).

Secondly, the issue we are trying to fix with zip filenames is similar to ID3 tags in music files. There as well, 8-bit encodings are used to create content. In addition, CDDB databases store song metadata in a variety of encodings and do not try to keep the encoding sane (they can't solve the problem). For example see http://bugzilla.mugshot.org/show_bug.cgi?id=724

The way this should be solved is at the point of entry of the text of dubious encoding. The way forward that I see is to use a library that deals with autoconverting text fragments that are not valid UTF-8 sequences into proper UTF-8 text. Apparently, such a library exists: http://trific.ath.cx/software/enca/

I think that unzip should indeed have an additional command line option that one can use to invoke the "autofixing" of the encoding, if required. If unzip finds the "libenca" library, it will ask it to fix the filename. Obviously, normal users of unzip who are not aware of the parameter will not be affected. file-roller, the KDE counterpart, etc., would need a small change in their code to add the extra command line parameter for unzip.
In general I see more complaints about ID3 tags with the wrong encoding than about bad Zip filenames, prompting for a generic library solution. If it is too much effort to go for the library solution (or it's difficult to get the unzip developers to add the option), I think it would be OK to keep the cut-down solution proposed here.
The "Enca" library does not work well on very short text fragments (i.e., short filenames); a 4- or 5-byte filename may well be converted wrongly. Unzip supports the "UNZIP" environment variable, which allows specifying additional command line options. This way no modifications are needed for file-roller and friends, but the unzip authors have to implement an option anyway... Once the option appears, we just need an /etc/profile.d/unzip.sh profile which specifies something like UNZIP="-<option> CP737", or even computes the appropriate codepage from the current locale settings. Any thoughts?
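Such a profile script might look like this (hypothetical: "-F" stands in for whatever option letter upstream eventually picks, and the two table entries are just examples):

```shell
# Hypothetical /etc/profile.d/unzip.sh: derive an OEM codepage from the
# language part of the locale name and pass it to unzip via the UNZIP
# environment variable.
set_unzip_from_locale() {
    case ${1%%_*} in          # "ru_RU.UTF-8" -> "ru"
        ru) _oem=CP866 ;;
        el) _oem=CP737 ;;
        *)  _oem= ;;          # unknown language: keep default behaviour
    esac
    if [ -n "$_oem" ]; then
        export UNZIP="-F $_oem"
    fi
    unset _oem
}
set_unzip_from_locale "${LANG-}"
```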
Enca, provided that its developers welcome the idea, can be used as a basis for matching the encoding. In the domain we are looking at now (Zip filenames), and in future domains (ID3 tags), one can make simplifications when matching the encoding. Therefore, in the case of figuring out the encoding of a small text fragment, the library would need to take into account the locale of the system. If the ZIP file has several files in it, then one option would be to delegate the task of encoding detection to a level higher (the file-roller application), because file-roller can extract all the file names first and detect the correct encoding over them together. In any case, the direction taken to sort out this issue depends on the developer(s) who will undertake the task. Depending on resources, the priority could be to cover the biggest chunk of encoding problems, such as the legacy encodings used in Windows. Would it make sense to write up a blueprint on this?
> in the case of figuring out the encoding of a small text fragment,
> the library would need to take into account the locale of the system.

But this way enca itself seems not to be needed -- for most locales, we can determine the appropriate codepage just from the locale of the system. That is exactly what I do in the patch... But in general, hinting the language to enca does not help much -- see "enca -l languages". E.g. for Russian, a short filename can be detected as either IBM866, or CP1251, or KOI8-R ...

> Would it make sense to write up a blueprint on this?

Perhaps. What do you mean exactly?
> > Would it make sense to write up a blueprint on this?
> Perhaps. What do you mean exactly?

Apparently I meant a specification, sorry. A "blueprint" is launchpad lingo, and I created such a thing to follow the progress at https://blueprints.launchpad.net/unzip/+spec/unzip-detect-filename-encoding Please see the references for some extra information.

What's missing is a Wiki page to host the specification. Since the initiative comes from this community, it would be good to host it on http://fedoraproject.org/wiki/ What's a good place to put such a page? Please start the page and I'll help out with the specification. If there is no suitable location, please tell me so we can find some other one.

The specification should include:
1. We put the change in "unzip" and not in the applications that make use of "unzip", because unzip is the source of the problem.
2. We do not make the autodetection of the filename encoding a default feature (at least not yet), so that all the utilities that use unzip continue to exhibit the same expected behaviour.
3. We add a command line option to unzip that enables the attempt to autoconvert the filename encoding to UTF-8.
4. Tools such as file-roller would need a simple patch: if the system encoding is UTF-8, use the special command line option of unzip when converting filenames.
5. We need to contact the makers of unzip et al. at http://www.info-zip.org/pub/infozip/ Without their agreement we cannot push this upstream, and we are stuck; in an extreme case, we might have to maintain a distribution-specific patch.
6. The code that does the autodetection should be "portable" (easy to compile on its own) so that it can be used with the test cases we are going to make. We do not know yet which initial encodings will work for most languages.

We shall advertise the specification to interested parties, possibly get some input, and then implement.
You currently seem happy to implement this; please tell me if the process looks too slow or if you want to change something.
Simos, could you please repeat all your thoughts on fedora-devel-list? I think this discussion should be moved there, so that more people can participate.
Dmitry, while writing the blueprint I found that AltLinux has a similar patch, which is now also used in Ubuntu: https://bugs.launchpad.net/debian/+source/unzip/+bug/10979 Can you have a look at it and provide some comments?
Just a note that I am happy to put some coding time into this, with the caveat that I cannot do cross-platform work, only code and testing on linux platforms. I'd want to see two sets of command line options though: one would be the "autoattempt" switch, and the other would allow the user to fully specify which codepage/character set the filenames are likely in, in case the automated guess fails.
@Ariel: By all means, go ahead and try out what you have in mind. I think we have covered all cases of prior work on this. It is nice to have a good look at the patch that has been added in Ubuntu.
For comment #16:

> AltLinux has a similar patch,

I know about it. This patch adds two options, "-I" and "-O", to specify the "ISO" and "OEM" encodings respectively. But I doubt whether this is useful for end users. The destination charset can be determined from the current locale by nl_langinfo(CODESET) (it may be either UTF-8 or some legacy 8-bit encoding). Then we need only one option, to specify the source charset (both OEM and ISO -- it just causes unzip to recode filenames properly).

For comment #17:

> I'd want to see two sets of command line options though

The autodetection could be chosen just by using "auto" in place of the encoding name (e.g. "-F CP737", "-F CP866", "-F auto" etc.). Anyway, to avoid a mess of extra options between different distributions, the choice of the new option(s) must be coordinated with upstream...

BTW, the current unzip beta (unzip60c) has the ability to recode text files (using iconv()), see the "-a" option. It seems that the unzip upstream is ready to implement filename recoding as well.
One more note: there are two environment variables that affect the interpretation of the file name encoding on current systems.

G_FILENAME_ENCODING. This environment variable can be set to a comma-separated list of character set names. GLib assumes that filenames are encoded in the first character set from that list rather than in UTF-8. The special token "@locale" can be used to specify the character set for the current locale.

G_BROKEN_FILENAMES. If this environment variable is set, GLib assumes that filenames are in the locale encoding rather than in UTF-8. G_FILENAME_ENCODING takes priority over G_BROKEN_FILENAMES.

I understand that this is somewhat gtk-specific, but perhaps it would be good if zip honored these variables, to avoid unsynchronized behaviour between different programs.
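A sketch of how unzip could resolve the target filename encoding if it honoured these variables (this follows my reading of the GLib rules above; it is not existing unzip behaviour):

```shell
# Hypothetical resolver following the GLib precedence rules:
# G_FILENAME_ENCODING (first list element, with "@locale" meaning the
# locale charset) beats G_BROKEN_FILENAMES, which forces the locale
# charset; with neither set, filenames are assumed to be UTF-8.
target_fname_encoding() {
    if [ -n "${G_FILENAME_ENCODING-}" ]; then
        enc=${G_FILENAME_ENCODING%%,*}
        if [ "$enc" = "@locale" ]; then
            enc=$(locale charmap)
        fi
        echo "$enc"
    elif [ -n "${G_BROKEN_FILENAMES-}" ]; then
        locale charmap
    else
        echo UTF-8
    fi
}
```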
Thanks for this addition Andrew. There is more documentation at http://library.gnome.org/devel/glib/unstable/glib-Character-Set-Conversion.html#file-name-encodings Per http://live.gnome.org/GuideForISVs G_BROKEN_FILENAMES appears to have become obsolete.
I have this issue with Hebrew filenames inside zip archives I get from Windows XP systems. But... it appears gunzip does decompress and open stored files with Hebrew filenames encoded in iso-8859-8 (or windows-1255) correctly.
I've patched the latest unzip60c development sources with the attached "support for non-latin1 filenames in archive" and successfully opened zip files that came from Windows XP, including archived filenames that were made up of windows-1255 or iso-8859-8 characters. I had to make sure LANG was set to he_IL.utf8. Using the CLI unzip, all went well. Using KDE's Ark or GNOME's File-Roller I got latin1 file names inside the GUI view _BUT_ it extracted properly after all.
A little bash script to open zip files with Hebrew-encoded filenames that include spaces (using the patched unzip60c !!! from last comment #23):

#!/bin/bash
LANG=he_IL.utf8
FULLNAME="$1"
NAME=${FULLNAME##*/}
PATHPART=${FULLNAME%%$NAME}
/usr/bin/unzip "$FULLNAME" -d "$PATHPART"
I'm one of the developers working on UnZip 6.0 and saw the mail to us. I've only scanned this thread quickly so far, so definitely tell me if I'm not following something.

First, Info-ZIP has put together an extension to the Zip standard (AppNote) that allows storing and restoring UTF-8 paths. This allows zipping paths on, say, Windows and unzipping them on, maybe, Unix. The approach took a few months to coordinate, but WinZip and PKWARE have agreed to it and it is in the latest AppNote update. In brief, zip converts the local paths to UTF-8 and stores them in an extra field; unzip can read the UTF-8 and convert it to the local character set. This avoids translations between character sets and dependence on the utilities to do them, and it is completely automatic. Betas Zip 3.0f (published) and UnZip 6.0d (not published yet) support this, and some testing has been done. Note that beta Zip 3.0g, with mostly minor bug fixes, is close to posting and Zip 3.0 is getting close to going out the door, but UnZip may have a little work left.

Second, it looks like much of this discussion may still be worth looking at for implementation. Non-Unicode archives and tools will continue to exist, and allowing UnZip to handle non-Unicode archives is definitely useful. It looks like this change only impacts UnZip, so Zip 3.0 can probably stay on schedule for release, maybe in a month. Further, if it takes some time to get UnZip 6.0 released, this patch may be all there is for now.

I've quickly scanned the attached 5.52 patch. One question is the need to detect whether the appropriate libraries exist on the system, as we support various Unix platforms. For UnZip 6.0 we can probably use the new configure script. I'm not sure which way the group will go on this: decide against it, patch the old code, patch the new code, or patch both. It's also possible we may just post the (possibly updated) patches but not include them in the main code.
Though UnZip 6.0d is still being worked, it may be stable enough to post public soon, but I can't say when that will happen. It's possible for individuals to get access to our internal betas, but we strictly control those bug-ridden things until they're tested enough for public posting.
This issue should be solved by the upstream maintainers, so thank you for your work, Dmitry. Could you send your patch upstream and discuss it there?
I don't have enough time right now to work on this further. OTOH I hope that upstream is already aware of it...
(In reply to comment #26)
> This issue should be solved by upstream maintainers so thank you for your work
> Dmitry could you send your patch to upstream and discuss it there.

The problem is not solved. To be precise: the problems described are still present, and applying the patch is still required. This has been verified against the upstream version, unzip 6.0.
Please be more specific as to what is not solved.

As conversion between character sets (code pages) generally requires both the source and destination character sets to be identified, which can be difficult to impossible in some situations, Info-ZIP chose to store UTF-8 encodings that can then be converted to the destination character set without knowledge of the source character set. This approach has been picked up by PKWare and WinZip and is reflected in the latest AppNote (the Zip standard). New archives should be created with this information stored, to allow filenames (and file comments, if supported) to be converted to other character sets.

However, this does not address older archives without UTF-8 information. As far as I know, the preference in the Zip developer community has been to require users to update to newer tools that support UTF-8 filename storage, rather than to support storing and converting specific character set encodings. That said, Info-ZIP has been looking at implementing this change to support older archives, but given other priorities in the queue (adding the latest compression methods, etc.) we haven't gotten to it, or even come to consensus within the group to support it. There are also issues with relying on outside libraries such as iconv to perform character set conversions, though they seem workable.

So please be more specific as to what you want done. (Also, due to other things going on, Info-ZIP development has lately gone from slow to almost nonexistent. It should pick back up to slow shortly.)
I simply want to extract archives with filenames in an encoding I can actually read. Without this patch I get rhombi instead of Cyrillic characters. Example:

$ unzip Tracktor\ Bowling\ -\ Черта.zip
Archive:  Tracktor Bowling - Черта.zip
 extracting: Tracktor Bowling - �����/Tracktor Bowling - ���.mp3
 extracting: Tracktor Bowling - �����/folder.jpg
 extracting: Tracktor Bowling - �����/Tracktor Bowling - �����.m3u

After applying the patch:

$ unzip Tracktor\ Bowling\ -\ Черта.zip
Archive:  Tracktor Bowling - Черта.zip
 extracting: Tracktor Bowling - Черта/Tracktor Bowling - Сны.mp3
 extracting: Tracktor Bowling - Черта/folder.jpg
 extracting: Tracktor Bowling - Черта/Tracktor Bowling - Черта.m3u

An alternative solution is:

convmv -f iso8859-1 -t cp850 -r --notest --nosmart .
convmv -f cp866 -t utf8 -r --notest --nosmart .

What additional information is needed?
Bug still not fixed.
We, the Russian Fedora team (https://fedoraproject.org/wiki/Ru_RU/Russian_Fedora), ask you to include the existing patch in the Fedora rpm build, and not wait for upstream zip.
Created attachment 513838 [details] support for not-latin1 filenames for version 6.0 The same patch as of 2007, but for the current unzip-6.0
Please look at the -I and -O options in the UnZip 6.10b beta. These come from a previous patch submission. If these do what you need, let us know. If not, please suggest improvements or, if they are hopeless, then alternative code. We might also be interested in other solutions if they make life easier for the user. It looks like we're getting close to releasing Zip 3.1 and UnZip 6.10, maybe in the next couple months for Zip and maybe September for UnZip. It would be good to get all this worked out before the final release candidate for UnZip goes out maybe late next month. Here's your chance to impact a release that may be out shortly. (Sorry Info-ZIP doesn't do that more often.)
The main problem with the '-I'/'-O' design is that users need to *manually* specify the input and output encodings. That is inconvenient in itself; moreover, when unzip is invoked by, say, file-roller, there is no way to specify such options at all. My patch follows another idea -- try to guess the codepage automatically from the information in the current Linux locale (the language setting). In most cases the actual codepage corresponds to the language under which the user is working, so we can avoid manual user intervention for specifying codesets. Certainly the best design would be to have both the '-I'/'-O' options (for manual setting) and an automagic fallback for when '-I'/'-O' is not specified. Feel free to use my patch to implement the final solution.
Do you think you can redo your patch against UnZip 6.10b? There have been changes to those files since UnZip 6.00, and we need to focus on getting the betas ready to go out. Your changes need to work with the -I/-O feature. Actually, I remember a user proposing yet another approach using a character recognition library. At some point that might be the better approach, but when we looked at it a couple of years ago I think the library was not generally available on many platforms, and we decided to hold off on implementing that approach.
Well, 6.10b already has some logic for auto-detecting OEM_CP. Not very correct logic, but it's an initial step... :) Currently, unix/unix.c:init_converstions_charsets() obtains the locale charset and then tries to determine OEM_CP from it. But most modern systems now use UTF-8 as the locale charset. Hence you always get "UTF-8", which is not informative at all for our needs (in 6.10b you then always use "CP866" under utf-8, etc.).

The right way is to obtain the local _language_, not the charset -- i.e. when LANG=ru_RU.UTF-8 you get "ru_RU". Then, using a table of pairs like "<lang> -- <typical_OEM_CP_for_this_lang>", we can determine the most likely OEM_CP. For example:

  ...
  { "en_US", "CP850" },
  { "en_UK", "CP850" },
  { "ru_RU", "CP866" },
  ...

and so on. I can provide an initial patch for this (if it is needed), but for the full "lang <--> OEM" table some additional research should be done by somebody else.

> Actually I remember a user proposing yet another approach using a character
> recognition library.

Yes, there is the "enca" library. But for correct character recognition the inspected data should be long enough (several tens of bytes) and should consist of real language phrases (for the statistical analysis). Actually, file names are often short (e.g. one word of a few bytes) and often acronyms. Hence character recognition seems not applicable here.
Anything you can give us to start with should save us time, and it really comes down to selecting what we can do in the time we have, so saving us time puts this higher on the list. Also, you probably know more about this than me, though that may change by the time the patch you provide is researched, wired in, tested, and documented. Oh, any additional description of the patch's operation would be helpful. Feel free to correct any issues with the current -I/-O code as well, if you can. We would appreciate you doing what you can, including creating any needed tables and putting in any mappings you know. I suspect that once the framework is there, others will want to add mappings to it.

Agreed that character recognition using relatively short file paths probably would not be reliable enough to be useful.

Note that UTF-8 is still the current zip community approach to converting paths. Specifically, most modern zip utilities tend to either (1) store the UTF-8 path directly (and should then set the new UTF-8 path flag added a few years ago to the zip standard) or (2)
Not sure what happened there. Continuing... or (2) store the local path and use the UTF-8 extra fields. This approach is more for dealing with less capable utilities and older archives. We would prefer, for backward compatibility, that this auto-detection require an option to enable it. If you can live with that, feel free to suggest an option letter, or we'll pick one.
Created attachment 515107 [details] The patch (hypothetical) for 6.10b

Well, I've created this patch with the idea of a lang <--> codeset table. Certainly the table is not complete. Moreover, for some locales it might be necessary to check the full locale name (as in xx_XX) instead of the short one (xx). I am not sure it is useful to enable such auto-detection only explicitly -- normally we should spare our users from running unzip each time with '-O its_cp', or from putting 'alias unzip="unzip -I its_cp"' into their .bashrc -- it seems much better to enable auto-detection by default...
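The two-level lookup mentioned above could be sketched like this (the entries are illustrative only; a real table needs the research discussed earlier):

```shell
# Hypothetical lookup: try the full xx_XX locale name first, so a
# territory-specific codepage can override the per-language default,
# then fall back to the short language code.
oem_cp_for_locale() {
    full=${1%%.*}            # "ru_RU.UTF-8" -> "ru_RU"
    short=${full%%_*}        # "ru_RU" -> "ru"
    for key in "$full" "$short"; do
        case $key in
            ru_RU|ru) echo CP866; return 0 ;;
            el_GR|el) echo CP737; return 0 ;;
            en_US|en) echo CP850; return 0 ;;
        esac
    done
    echo CP850               # fallback guess for unknown locales
}
```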
Got the patch. Typically we don't like making changes that impact default behavior in a way that is not backward compatible, unless what Zip or UnZip is doing now is considered broken to the point of being nearly useless. Not sure about this situation; it seems that this change should do the same thing where that makes sense and only change things where it doesn't. Anyway, we'll probably need to discuss this in the development group, but this change may make it into the next UnZip public beta, which we are starting to prepare now. It might go out in a couple of weeks.
Bug still not fixed. Why?
Created attachment 1033515 [details] screenshot of File Roller
There is a modern patch to fix this: https://bugzilla.redhat.com/show_bug.cgi?id=890188 I think this task can be closed. And let's hope that the libnatspec patch will be applied.
Oh, I missed this bug. It's already patched in rawhide -- an update of released Fedoras isn't planned. Alternatively, for a released Fedora you can try unzip from the copr repository, where the patch is applied as well:

# dnf copr enable pstodulk/unzip
(In reply to pstodulk from comment #45)
> Oh, I miss this bug. It's already patched in rawhide - update of released
> fedoras isn't planned. Alternatively for released fedora you can try unzip
> from copr repository, where patch is applied as well:
>
> # dnf copr enable pstodulk/unzip

I installed unzip from "pstodulk/unzip" but it does not help :(
Created attachment 1097704 [details] screenshot 1
Created attachment 1097705 [details] screenshot 2
I see -- you didn't use the -I parameter. See the help.
(In reply to pstodulk from comment #49)
> I see, you didn't use -I parameter. See help

$ unzip -I 866 otchety.zip

After applying the "-I 866" option the extracted file names are right, but the console listing still shows '??????????????'.
Created attachment 1097765 [details] screenshot 3
(In reply to pstodulk from comment #49)
> I see, you didn't use -I parameter. See help

And what about graphical utilities such as file-roller and Midnight Commander? It is impossible to set the "-I" option for them. :(
(In reply to Mikhail from comment #50)
> (In reply to pstodulk from comment #49)
> > I see, you didn't use -I parameter. See help
>
> $ unzip -I 866 otchety.zip
>
> After applying "I 866" option extracted files names are right, but in
> console still saw '??????????????'.

Very interesting: I too extracted files with correct file names without the -I option, but I couldn't get correct file names when listing them with the -l option.
Hmmm... do you mean the list of filenames inside the archive, or just the uncompressed files on the system? If it is the first case, can you provide/upload the archive for testing?
(In reply to pstodulk from comment #54)
> Hmmm...do you mean list of filenames inside archive or just uncompressed
> files on system? If it is the first case, can you provide/upload the archive
> for testing?

Yes: the uncompressed files on the system are OK (without the -I option), but the list of filenames inside the archive still has this issue.
Created attachment 1097835 [details] archive for testing
Thanks Mikhail. From my point of view that's still an issue which should be fixed. I will look at it.
Created attachment 1098716 [details] fix print issue
A new build in copr will be completed soon; you can try it. Btw, for this you should use the -O parameter instead of -I, but unzip can guess it correctly for CP866.
Thanks, I see that the issue is fixed for console unzip and Midnight Commander -- please see my attached video. But it is still not fixed for file-roller.
Created attachment 1098839 [details] screencast
That's a problem of file-roller now. It was already reported -- see bug 1177950.
(In reply to pstodulk from comment #62)
> That's problem of file-roller now. It was already reported - see bug 1177950

I can't read or subscribe to this issue :(

====================================================
You are not authorized to access bug #1177950.

Most likely the bug has been restricted for internal development processes and we cannot grant access. If you are a Red Hat customer with an active subscription, please visit the Red Hat Customer Portal for assistance with your issue. If you are a Fedora Project user and require assistance, please consider using one of the mailing lists we host for the Fedora Project.
Ah, sorry, I missed that. You can report it against Fedora rawhide if you want.