Bug 128335

Summary: rpm doesn't handle sparse files efficiently
Product: [Fedora] Fedora
Reporter: Jakub Jelinek <jakub>
Component: rpm
Assignee: Paul Nasrat <nobody+pnasrat>
Status: CLOSED WONTFIX
QA Contact: Mike McLean <mikem>
Severity: medium
Priority: medium
Version: rawhide
CC: drepper, herrold, laroche, nobody+pnasrat, yersinia.spiros
Target Milestone: ---
Keywords: FutureFeature
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Enhancement
Last Closed: 2006-04-04 11:16:57 UTC
Bug Blocks: 150224

Description Jakub Jelinek 2004-07-21 20:37:40 UTC
Some ELF programs/shared libraries are sparse.
E.g. on x86-64 if -Wl,-z,relro is used, most of the shared libraries
have around 1MB of zeros in the middle.
$ rpm -qf --qf '%{name}-%{version}-%{release}.%{arch}\n' /usr/lib64/gconv/IBM904.so
glibc-2.3.3-37.x86_64
$ cp -a --sparse=always /usr/lib64/gconv{,.new}
$ du -sk /usr/lib64/gconv{,.new}
197972  /usr/lib64/gconv
6368    /usr/lib64/gconv.new

Either rpm could do what e.g. cpio --sparse does when unpacking,
or it could special case just ELF files which have sufficiently big
gap between PT_LOAD segments.
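
The first option, writing holes instead of runs of zeros while unpacking, can be sketched roughly as follows. This is only an illustration of the general technique, not rpm's or cpio's actual code: the function name write_sparse and the 4096-byte block size are assumptions made for the example.

```python
import os

BLOCK = 4096  # assumed block size; real filesystems may use another value


def write_sparse(src, dst_path, block=BLOCK):
    """Copy the binary stream src to dst_path, seeking over all-zero blocks.

    Hypothetical helper: blocks that contain only zeros are skipped with
    seek(), so a filesystem that supports holes need not allocate them.
    """
    zeros = bytes(block)
    size = 0
    with open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(block)
            if not chunk:
                break
            if chunk == zeros[:len(chunk)]:
                # Leave a hole instead of writing the zeros out.
                dst.seek(len(chunk), os.SEEK_CUR)
            else:
                dst.write(chunk)
            size += len(chunk)
        # If the stream ended in a hole, the file is still too short.
        dst.truncate(size)
    return size
```

Reading the result back yields the same bytes as the input; whether any disk space is actually saved depends on the filesystem.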

Comment 1 Jeff Johnson 2004-07-26 12:54:31 UTC
rpm decompresses into a mmap'd buffer, so sparse files
are filled in by zlib I believe.

Possibly blocks of zeros can be eliminated after installing.

Special handling for PT_LOAD and elf within cpio unpacking
is almost certainly not the right approach.

Comment 2 Ulrich Drepper 2004-07-26 15:07:19 UTC
> Special handling for PT_LOAD and elf within cpio unpacking
> is almost certainly not the right approach.

If you don't want to do this, you are required to record somewhere
in the .rpm file which parts of the original files were sparsely
allocated.  The problem is that, in general, you cannot just force
every file to be sparse.  There are noticeable differences: the file
layout will be non-sequential in some cases (-> performance), or no
disk space may be left to allocate once data is written to the
sparse area.
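
As an illustration of what "recording which parts were sparsely allocated" could look like at build time, here is a minimal sketch that lists the data extents of a file using lseek() with SEEK_DATA and SEEK_HOLE. The function name data_extents is hypothetical, nothing in this report says rpm stores such a map, and filesystems without hole support simply report the whole file as data.

```python
import errno
import os


def data_extents(path):
    """Return [(offset, length), ...] for the regions of path that hold data.

    Hypothetical helper: everything between the returned extents is a hole
    as reported by the filesystem.
    """
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        end = os.fstat(fd).st_size
        pos = 0
        while pos < end:
            try:
                start = os.lseek(fd, pos, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:
                    break  # only a trailing hole remains
                raise
            stop = os.lseek(fd, start, os.SEEK_HOLE)
            extents.append((start, stop - start))
            pos = stop
    finally:
        os.close(fd)
    return extents
```

An unpacker given such a list could recreate exactly the original holes instead of guessing from runs of zeros.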

On the other hand, these issues will never pop up with executables.
So it makes sense to treat them specially if you don't want to do it
100% right.

Comment 3 Jeff Johnson 2004-07-28 20:59:17 UTC
OK, I know how to implement sparse file handling in rpm for PT_LOAD
segment gaps.

If this change is actually necessary, rather than desirable,
then I suggest you expedite it through channels other than bugzilla.

Comment 4 Jeff Johnson 2004-12-08 05:13:39 UTC
NEEDINFO, I know what to do in rpm, but perhaps installing
all executables sparsely needs to be carefully thought through
before attempting an implementation.

I've never seen any unix user who was not surprised by, say,
    cp -R /usr /var/tmp
when sparse files are involved: one's disk space mysteriously
vanishes. And yes, I know there are options to handle sparse files
correctly; I just don't believe those options are widely known or used.

Comment 5 Ulrich Drepper 2005-01-04 20:58:37 UTC
I cannot see any problem with installing all binaries sparsely.  You
can recognize ELF binaries easily and then treat them appropriately.
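
A rough sketch of that recognition step, assuming only 64-bit ELF files need handling: check the magic bytes, then walk the program headers and report the file-offset gaps between consecutive PT_LOAD segments. The names is_elf and load_segment_gaps are made up for this example, and a gap found this way is only a candidate hole; the bytes would still have to be checked for zeros before skipping them.

```python
import struct

PT_LOAD = 1  # program header type for loadable segments


def is_elf(data):
    """True if data starts with the ELF magic bytes."""
    return data[:4] == b"\x7fELF"


def load_segment_gaps(data):
    """Return [(offset, length), ...] gaps between PT_LOAD segments.

    Hypothetical helper; handles ELF64 only (EI_CLASS == 2) and returns
    an empty list for anything else.
    """
    if not is_elf(data) or len(data) < 64 or data[4] != 2:
        return []
    endian = "<" if data[5] == 1 else ">"  # EI_DATA: 1 means little-endian
    ehdr = struct.unpack_from(endian + "16sHHIQQQIHHHHHH", data, 0)
    phoff, phentsize, phnum = ehdr[5], ehdr[9], ehdr[10]

    segments = []
    for i in range(phnum):
        off = phoff + i * phentsize
        if off + 56 > len(data):
            break  # truncated program header table
        p_type, _, p_offset, _, _, p_filesz, _, _ = struct.unpack_from(
            endian + "IIQQQQQQ", data, off)
        if p_type == PT_LOAD:
            segments.append((p_offset, p_filesz))

    segments.sort()
    gaps = []
    for (off_a, size_a), (off_b, _) in zip(segments, segments[1:]):
        end_a = off_a + size_a
        if off_b > end_a:
            gaps.append((end_a, off_b - end_a))
    return gaps
```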

Comment 6 Jeff Johnson 2005-01-08 01:10:08 UTC
Sure there's no problem installing elf files sparsely in rpm.

The problem is the change in behavior when users run "cp -R" or any
other command that does not copy files sparsely. Disk space can/will
be used when sparse blocks are filled in, which is invariably
surprising to end-users.
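
The effect described here can be observed by comparing a file's apparent size with the space actually allocated for it, before and after a copy that rewrites every byte. The sketch below is illustrative only: the helper names are invented, copy_dense stands in for any tool that ignores holes, and the allocated figures depend on the filesystem.

```python
import os


def apparent_size(path):
    """Size in bytes as reported by st_size (what ls -l shows)."""
    return os.stat(path).st_size


def allocated_size(path):
    """Bytes actually allocated; st_blocks is counted in 512-byte units."""
    return os.stat(path).st_blocks * 512


def make_sparse_file(path, size):
    """Create a file of the given size that consists of a single hole."""
    with open(path, "wb") as f:
        f.truncate(size)


def copy_dense(src, dst, block=1 << 16):
    """Copy by reading and writing every byte, holes included."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(block)
            if not chunk:
                break
            fout.write(chunk)
```

On a filesystem that supports holes, allocated_size() of the original would typically stay near zero while the dense copy approaches its apparent size.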

Again, not my call. You want sparse file handling in rpm, it's
easier to implement than to discuss the merits of sparse files.

Off to distribution, please attach to some tracking bug so that
I have a clue when and where sparse file rpm installs are desired.

Comment 7 Paul Nasrat 2005-02-08 19:33:02 UTC
Currently not being considered for U5 unless informed otherwise.