Bug 825568 - Midnight commander lists whole archive contents before extracting each file
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: mc
Version: 6.3
Hardware: x86_64 Linux
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Denys Vlasenko
QA Contact: BaseOS QE - Apps
Depends On:
Blocks:
Reported: 2012-05-27 17:37 EDT by Jiri Pospisil
Modified: 2013-07-04 04:13 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-02-25 06:20:21 EST
Type: Bug


Description Jiri Pospisil 2012-05-27 17:37:05 EDT
Description of problem:
When extracting a couple of files from a zip archive via mc, mc calls unzip -l on the archive before extracting each file (it's probably checking whether the file is present). This makes extracting files a lot slower when working with large archives.

Version-Release number of selected component (if applicable):
4.7.0.2

How reproducible:

Steps to Reproduce:
1. create a huge zip archive (e.g. a million small files)
2. extract a few (e.g. 5) files from the archive via unzip
3. navigate to the archive in mc, mark the same files and extract them
4. compare how long steps 2 and 3 took
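
For anyone without a suitable archive at hand, steps 1 and 2 can be approximated with a small script. This is only a sketch: it uses Python's zipfile module rather than the unzip binary, and the function names, the z/fN member naming, and the archive size are all assumptions chosen for illustration.

```python
import os
import tempfile
import time
import zipfile


def build_archive(path, n_files):
    """Write a zip containing n_files one-byte members named z/f0, z/f1, ..."""
    with zipfile.ZipFile(path, "w", zipfile.ZIP_STORED) as zf:
        for i in range(n_files):
            zf.writestr(f"z/f{i}", b"x")


def extract_members(path, names):
    """Open the archive once and read only the requested members."""
    with zipfile.ZipFile(path) as zf:
        return {name: zf.read(name) for name in names}


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    archive = os.path.join(workdir, "z.zip")
    build_archive(archive, 10000)  # raise this to approach the reported 1.6M files

    start = time.perf_counter()
    extract_members(archive, ["z/f1", "z/f2", "z/f3"])
    print(f"direct extraction of 3 members: {time.perf_counter() - start:.3f}s")
```

The resulting z.zip can then be opened in mc for step 3, so that both timings refer to the same archive.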

Actual results:
Extraction via mc is a lot slower: in my case (1.6M files, extracting three of them) unzip is instant, while extraction via mc takes about one minute.

Expected results:
Extraction takes approximately the same time

Additional info:
Comment 2 RHEL Product and Program Management 2012-09-07 01:36:20 EDT
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unable to address this
request at this time.

Red Hat invites you to ask your support representative to
propose this request, if appropriate, in the next release of
Red Hat Enterprise Linux.
Comment 3 Denys Vlasenko 2013-02-22 11:50:37 EST
Unpacking zip files is done by /usr/libexec/mc/extfs.d/uzip script (in mc source tree it is src/vfs/extfs/helpers/uzip.in).

This script will run the following command:

/usr/bin/unzip -p FILE.zip NAME_IN_ARCHIVE >/tmp/mc-root/extRANDOM

for each unpacked file. This shouldn't be much slower than unzip in the "extract a few files" scenario: at worst, it will be 5 times slower than using unzip to extract all 5 files in one command invocation. I will now take a look at why it is slow (maybe processing of the huge file list is slow?)

Meanwhile, I have my doubts about that script's safety wrt names with spaces,
like "file >/etc/passwd" :(
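
To make the quoting concern concrete, here is a minimal sketch of what happens when such a name is interpolated into a shell command string, next to two safer variants. It is not the uzip code; naive_command, quoted_command and copyout are hypothetical helpers, and only the `unzip -p ARCHIVE MEMBER` form shown above is assumed.

```python
import shlex
import subprocess


def naive_command(archive, member, dest):
    """Unsafe: names are pasted into the shell string without quoting."""
    return f"unzip -p {archive} {member} >{dest}"


def quoted_command(archive, member, dest):
    """Each name is quoted so the shell treats it as a single word."""
    a, m, d = (shlex.quote(s) for s in (archive, member, dest))
    return f"unzip -p {a} {m} >{d}"


def copyout(archive, member, dest):
    """Shell-free variant: argv list, with stdout redirected by Python.

    Note: unzip still interprets the member argument as a wildcard
    pattern, so characters such as '*' or '[' would need separate care.
    """
    with open(dest, "wb") as out:
        subprocess.run(["unzip", "-p", archive, member], stdout=out, check=True)


if __name__ == "__main__":
    name = "file >/etc/passwd"
    print(naive_command("a.zip", name, "/tmp/out"))
    print(quoted_command("a.zip", name, "/tmp/out"))
```

In the naive string the shell would see a second redirection to /etc/passwd; in the quoted string the whole name stays one argument.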
Comment 4 Denys Vlasenko 2013-02-22 13:14:25 EST
Copying out one file from a 100000-file archive.

mc executes this:
/usr/libexec/mc/extfs.d/uzip copyout /path/z.zip z/f1054 /tmp/mc-root/extfsQmUzPaf1054

which in turn executes:
/usr/bin/unzip -Z -l -T \/path\/z\.zip

which is a command to list all files in the archive. That is slow, and it isn't helped one iota by unzip not buffering its output at all and executing three syscalls to output one line:

19:05:52.130997 write(1, "z/f31064", 8) = 8
19:05:52.131100 ioctl(1, TIOCGWINSZ, 0xbfcd8ab8) = -1 ENOTTY (Inappropriate ioctl for device)
19:05:52.131212 write(1, "\n", 1)       = 1
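
The cost of unbuffered output can be illustrated with a small model that counts write calls. This is a sketch under assumptions: CountingRaw is a made-up stand-in for the file descriptor, only the two write() calls per line are modelled (not the ioctl), and the 64 KiB buffer size is arbitrary.

```python
import io


class CountingRaw(io.RawIOBase):
    """Raw stream that discards data and counts write() calls."""

    def __init__(self):
        super().__init__()
        self.calls = 0

    def writable(self):
        return True

    def write(self, data):
        self.calls += 1
        return len(data)


def list_unbuffered(names):
    """Emit each name and its newline as separate raw writes, as in the trace."""
    raw = CountingRaw()
    for name in names:
        raw.write(name.encode())
        raw.write(b"\n")
    return raw.calls


def list_buffered(names, buffer_size=64 * 1024):
    """Collect lines in a buffer and hand them to the raw stream in large chunks."""
    raw = CountingRaw()
    out = io.BufferedWriter(raw, buffer_size=buffer_size)
    for name in names:
        out.write(name.encode() + b"\n")
    out.flush()
    return raw.calls


if __name__ == "__main__":
    names = [f"z/f{i}" for i in range(100000)]
    print("unbuffered writes:", list_unbuffered(names))
    print("buffered writes:  ", list_buffered(names))
```

The unbuffered count grows with two calls per listed file, while the buffered count grows only with the total number of bytes written.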

This listing is triggered by the zipfs_realpathname() call in uzip:
sub mczipfs_copyout {
HERE->  my ($qafile, $qfsfile) = map { &zipquotemeta(zipfs_realpathname($_)) } @_;
        &checkargs(1, 'archive file', @_);
        &checkargs(2, 'local file', @_);
        &safesystem("$cmd_extract $qarchive $qafile >$qfsfile", 11);
        exit;
}
...

Apparently zipfs_realpathname() is needed to fix some problem with mangled names:

# The Midnight Commander never calls this script with archive pathnames
# starting with either "./" or "../". Some ZIP files contain such names,
# so we need to build a translation table for them.
my $zipfs_realpathname_table = undef;
sub zipfs_realpathname($) {
    my ($fname) = @_;

    if (!defined($zipfs_realpathname_table)) {
        $zipfs_realpathname_table = {};
        if (!open(ZIP, "$cmd_list $qarchive |")) {
            return $fname;
        }
...
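
The idea behind the translation table can be sketched as follows. The elided part of the Perl code is not reproduced here, so the normalization rule (stripping leading "./" and "../" components) and the names canonical and RealPathTable are assumptions for illustration only.

```python
def canonical(name):
    """Strip leading './' and '../' components (assumed normalization)."""
    parts = name.split("/")
    while parts and parts[0] in (".", ".."):
        parts.pop(0)
    return "/".join(parts)


class RealPathTable:
    """Lazily built map from the name mc passes in to the name stored in the archive."""

    def __init__(self, list_names):
        self._list_names = list_names  # callable returning every member name
        self._table = None
        self.listings = 0  # how many times the full archive listing was requested

    def lookup(self, fname):
        if self._table is None:
            # First lookup: this is the step that lists the whole archive.
            self.listings += 1
            self._table = {canonical(n): n for n in self._list_names()}
        # Unknown names are passed through unchanged.
        return self._table.get(fname, fname)


if __name__ == "__main__":
    table = RealPathTable(lambda: ["./a/b", "../c", "d"])
    print(table.lookup("a/b"), table.lookup("c"), table.lookup("d"))
    print("full listings performed:", table.listings)
```

Within one process the table is built only once, but since mc starts the helper anew for every copyout, the full listing is repeated for every extracted file.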
Comment 5 Denys Vlasenko 2013-02-22 13:21:49 EST
.tar.gz archives work even more slowly.
Comment 6 Denys Vlasenko 2013-02-25 06:20:21 EST
I think fixing this bug requires a serious overhaul of mc's virtual filesystem code.

Not only do the archive handling helpers need to be better (say, they need to handle filenames with all the weird characters such as newline), but it also looks like the data structures mc uses to represent lists of files are not efficient for millions of files.

I think this needs to be fixed in upstream first.

Closing as WONTFIX.

Please reopen and explain if you think this really is an urgent thing to fix.
