Bug 1848199 - RPM doesn't seem to include symlinks files for size calculation
Summary: RPM doesn't seem to include symlinks files for size calculation
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: rpm
Version: 33
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Packaging Maintenance Team
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-06-17 23:03 UTC by Jan Pokorný [poki]
Modified: 2021-02-05 15:04 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-05 15:04:42 UTC
Type: Bug
Embargoed:



Description Jan Pokorný [poki] 2020-06-17 23:03:53 UTC
See https://pagure.io/minimization/issue/20 for details.

Those files are not %ghost'd; they are carried right within the *.rpm,
hence they are as physical as any other delivered bits.  Any exception
win size accounting is questionable, but perhaps it is merely
an unintended bug.

Comment 1 Jan Pokorný [poki] 2020-06-17 23:04:28 UTC
s/win/with/

Comment 2 Jan Pokorný [poki] 2020-06-17 23:07:18 UTC
$ rpm -q rpm
> rpm-4.16.0-0.beta1.2.fc33.1.x86_64

Comment 3 Mark Wielaard 2020-06-17 23:16:53 UTC
Looks like the issue is that those files are symlinks. If you take a package like xz, which has multiple other symlinks, you'll see those are also not included in the "Size".
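To make the symlink point concrete, here is a minimal Python sketch (not rpm's code) that sums sizes using os.lstat(), so symlinks are measured themselves rather than followed. Per POSIX, a symlink's st_size is the length of the path it stores, so leaving symlinks out undercounts by exactly those bytes. The file names "xz" and "unxz" are purely illustrative.

```python
import os
import stat
import tempfile


def tree_size(root: str, include_symlinks: bool = True) -> int:
    """Sum st_size of regular files (and optionally symlinks) under root.

    os.lstat() is used so that symlinks are measured themselves rather
    than the files they point to.
    """
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # os.walk() lists symlinks to directories under dirnames (without
        # following them), so both lists are checked.
        for name in dirnames + filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if stat.S_ISREG(st.st_mode):
                total += st.st_size
            elif stat.S_ISLNK(st.st_mode) and include_symlinks:
                # For a symlink, st_size is the length of the stored target path.
                total += st.st_size
    return total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "xz"), "wb") as f:
            f.write(b"\0" * 1000)
        os.symlink("xz", os.path.join(root, "unxz"))  # stores the 2-byte path "xz"
        print(tree_size(root))                          # 1002
        print(tree_size(root, include_symlinks=False))  # 1000
```

The difference per link is small, but it is real payload data, which is the point being argued in this bug.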

Comment 4 Jan Pokorný [poki] 2020-06-18 05:46:27 UTC
Good catch ... but these files occupy space on disk nonetheless,
right?  In a non-metadata sense, I mean.

Comment 5 Jan Pokorný [poki] 2020-06-18 06:03:43 UTC
What I'd see as a convincing argument to include the size of the symlinks
themselves:

  When I ask the system how much space on disk is left (df), and
  assuming no thin provisioning/BTRFS fanciness, I obtain the
  space that I can deliberately fill up with symlinks only
  (there's room for their basic metadata, like modification
  time stamp, ownership and permissions info, already reserved to
  some extent, and when that overflows, it is accounted separately;
  those limits are likely to be hit only after the actual user-visible
  free space is filled up).
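The df side of this argument can be sketched with os.statvfs(), which reports free data space and, separately, free inodes, matching the split between file contents and per-file metadata described above. This is a minimal POSIX-only illustration; free_space and FreeSpace are hypothetical names. Some filesystems (BTRFS among them) report 0 for the inode counts, which is part of the "fanciness" being set aside here.

```python
import os
from typing import NamedTuple


class FreeSpace(NamedTuple):
    bytes_avail: int   # bytes usable by an unprivileged user
    inodes_avail: int  # files (symlinks included) that can still be created
    block_size: int    # fragment size used to express the block counts


def free_space(path: str) -> FreeSpace:
    """Return roughly what df reports for the filesystem holding path."""
    st = os.statvfs(path)
    return FreeSpace(
        bytes_avail=st.f_bavail * st.f_frsize,
        inodes_avail=st.f_favail,
        block_size=st.f_frsize,
    )


if __name__ == "__main__":
    fs = free_space("/")
    print(f"{fs.bytes_avail} bytes and {fs.inodes_avail} inodes available")
```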

If I am not mistaken, this "bug" could also mean that an artificial RPM
package with millions of symlinks would not refuse to install early
on, as it should, when there is not enough free space to accommodate them,
but only later, with the actual I/O operations failing.  That is rather
undesirable, since the transaction is interrupted in the middle.
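A simplified sketch of the kind of up-front check being asked for: total what every payload file needs (rounded up to whole blocks, one inode per file, symlinks included) and refuse before writing anything. The helper names and the flat list of sizes are hypothetical illustrations, not rpm's transaction code. Real filesystems often store short symlink targets inside the inode itself, so rounding every link up to a full block errs on the safe side.

```python
from typing import Iterable, Tuple


def needed_space(sizes: Iterable[int], block_size: int) -> Tuple[int, int]:
    """Return (bytes, inodes) needed, rounding each file up to whole blocks."""
    total_bytes = 0
    inodes = 0
    for size in sizes:
        blocks = -(-size // block_size)  # ceiling division
        total_bytes += blocks * block_size
        inodes += 1
    return total_bytes, inodes


def check_before_install(sizes, block_size, bytes_avail, inodes_avail):
    """Raise before any I/O happens if the payload cannot fit."""
    need_bytes, need_inodes = needed_space(sizes, block_size)
    if need_bytes > bytes_avail:
        raise OSError(f"needs {need_bytes - bytes_avail} more bytes")
    if need_inodes > inodes_avail:
        raise OSError(f"needs {need_inodes - inodes_avail} more inodes")


if __name__ == "__main__":
    # A million 2-byte symlinks: tiny in bytes, but each costs an inode.
    try:
        check_before_install([2] * 1_000_000, block_size=4096,
                             bytes_avail=10 * 2**30, inodes_avail=500_000)
    except OSError as e:
        print("refused up front:", e)  # refused up front: needs 500000 more inodes
```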

Comment 6 Panu Matilainen 2020-06-18 06:48:49 UTC
Mark's comment didn't imply there is no bug, just that it's not specifically about build-ids.

Fix submitted upstream now: https://github.com/rpm-software-management/rpm/pull/1275. However, this doesn't affect any rpm operations; individual file sizes are always used for disk-space accounting. The size tag is an informative thing mostly aimed at humans, as the total installed size can vary drastically depending on file policies, such as whether documentation, translations, etc. are installed or not.
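Why a single Size number can only be indicative can be shown with a small sketch: the bytes that actually land on disk depend on install-time policies such as rpm's --excludedocs option or the %_install_langs macro. The PkgFile record, its flags and the sizes below are hypothetical stand-ins, not rpm's header format or real xz numbers.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass
class PkgFile:
    path: str
    size: int
    is_doc: bool = False
    lang: Optional[str] = None  # None means not language-specific


def installed_size(files: Iterable[PkgFile], exclude_docs: bool = False,
                   langs: Optional[Set[str]] = None) -> int:
    """Sum sizes of the files that would actually be installed."""
    total = 0
    for f in files:
        if exclude_docs and f.is_doc:
            continue
        if langs is not None and f.lang is not None and f.lang not in langs:
            continue
        total += f.size
    return total


files = [
    PkgFile("/usr/bin/xz", 80_000),
    PkgFile("/usr/share/doc/xz/README", 10_000, is_doc=True),
    PkgFile("/usr/share/locale/de/LC_MESSAGES/xz.mo", 20_000, lang="de"),
]
print(installed_size(files))                                   # 110000
print(installed_size(files, exclude_docs=True, langs={"en"}))  # 80000
```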

Inspired by this, I also submitted a patch to optionally exclude artifact files, which allows you to eliminate those build-id links entirely (such as for absolutely minimal installations).
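Conceptually, excluding artifacts is a filter over the package's file list: entries marked as artifacts (in rpm, via the %artifact file attribute in the spec; in Fedora, the .build-id links) are skipped at install time when --excludeartifacts is given. The Entry type, its artifact field and the build-id path are hypothetical illustrations of that idea, not rpm's internals.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    path: str
    artifact: bool = False  # hypothetical stand-in for rpm's artifact file flag


def files_to_install(entries: List[Entry], exclude_artifacts: bool) -> List[str]:
    """Return the paths that would be written to disk."""
    return [e.path for e in entries
            if not (exclude_artifacts and e.artifact)]


entries = [
    Entry("/usr/bin/xz"),
    Entry("/usr/lib/.build-id/1a/2b3c4d", artifact=True),  # illustrative build-id
]
print(files_to_install(entries, exclude_artifacts=True))  # ['/usr/bin/xz']
```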

Comment 7 Jan Pokorný [poki] 2020-06-18 08:00:14 UTC
> Mark's comment didn't imply there is no bug

To clarify, for me it wasn't implied by Mark, just silently by my own thought
process, reflecting on how mature/stable RPM is at this point (respect),
in addition to how widely it gets used.  That suggested to me there was a good
chance that any shenanigans might in fact be intentional behaviour with
a twisted history (and I am perhaps too lazy to dive into the commit log) :-)

> Fix submitted upstream

Thanks

> Inspired by this, I also submitted a patch to optionally exclude artifact
> files, which allows you to eliminate those build-id links entirely
> (such as for absolutely minimal installations)

Oh, thanks:
https://github.com/rpm-software-management/rpm/pull/1274

Shouldn't it be presented as something like "non-essential integration data
making it easier, for instance, to debug the packaged software"?

Are not .build-id links distro-specific?

Comment 8 Panu Matilainen 2020-06-18 08:27:41 UTC
> Shouldn't it be presented as something like "non-essential integration data
> making it easier, for instance, to debug the packaged software"?

The concept of "artifact" in rpm isn't strictly defined anywhere, but the idea is that it could be used for various things that are created as a by-product of package generation. I suppose "non-essential integration data" would not be a bad definition. For example, byte-compiled Python files could also be seen as artifacts (but are not currently marked as such); build-id links are the only thing it's currently used for.

> Are not .build-id links distro-specific?

Not sure what you mean by that. There are some config options controlling where the links get placed, and that's kinda distro-specific, but in the variant Fedora uses, the links point to files in the same package.
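The "links point to files in the same package" property can be checked mechanically: resolve each link's target relative to the link's own directory and look it up in the package's file list. A minimal sketch, assuming relative link targets; the paths and the target string below are illustrative, not taken from a real package, and the resolution is purely lexical (intermediate symlinks are not followed).

```python
import posixpath
from typing import Set


def link_stays_in_package(link_path: str, target: str, pkg_files: Set[str]) -> bool:
    """True if a symlink at link_path pointing to target names a packaged file."""
    if posixpath.isabs(target):
        resolved = posixpath.normpath(target)
    else:
        resolved = posixpath.normpath(
            posixpath.join(posixpath.dirname(link_path), target))
    return resolved in pkg_files


pkg = {"/usr/bin/xz", "/usr/lib/.build-id/1a/2b3c4d"}
print(link_stays_in_package("/usr/lib/.build-id/1a/2b3c4d",
                            "../../../../usr/bin/xz", pkg))  # True
```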

Comment 9 Jan Pokorný [poki] 2020-06-18 09:00:35 UTC
> Not sure what you mean by that.

Oh, I meant that there was a time when even Fedora packages did not carry
anything like that.  If it's a rather optional/configurable thing in
the RPM ecosystem, I'm not sure it deserves to be stated as the primary
example in the upstream (= generic) RPM documentation when there is
no further explanation of what it means, and regardless of whether the
system at hand uses it at all.

A vaguer description reflecting "we shall yet figure out the details"
would perhaps be a better fit than such specifics.

Comment 10 Mark Wielaard 2020-06-22 10:05:45 UTC
(In reply to Jan Pokorný [poki] from comment #9)
> > Not sure what you mean by that.
> 
> Oh, I've meant there was a time even Fedora packages did not carry
> anything like that.  If that's rather optinal/configurable thing in
> the RPM ecosystem, not sure if it deserves to be stated as a primary
> example in upstream=generic sense in RPM documentation if there is
> no further explanation what that means and regardless whether the
> system at hand uses it all.
> 
> Vague description reflecting "we shall yet figure out the details"
> would perhaps be better fit than such specifics.

Are you looking for the documentation about the various ways macros can be used to instruct rpmbuild how to handle debuginfo?
That is all documented in the (upstream) macros file:

#
# Should an ELF file processed by find-debuginfo.sh having no build ID
# terminate a build?  This is left undefined to disable it and defined to
# enable.
#
#%_missing_build_ids_terminate_build    1

#
# Include minimal debug information in build binaries.
# Requires _enable_debug_packages.
#
#%_include_minidebuginfo        1

#
# Include a .gdb_index section in the .debug files.
# Requires _enable_debug_packages and gdb-add-index installed.
#
#%_include_gdb_index    1

#
# Defines how and if build_id links are generated for ELF files.
# The following settings are supported:
#
# - none
#   No build_id links are generated.
#
# - alldebug
#   build_id links are generated only when the __debug_package global is
#   defined. This will generate build_id links in the -debuginfo package
#   for both the main file as /usr/lib/debug/.build-id/xx/yyy and for
#   the .debug file as /usr/lib/debug/.build-id/xx/yyy.debug.
#   This is the old style build_id links as generated by the original
#   find-debuginfo.sh script.
#
# - separate
#   build_id links are generated for all binary packages. If this is a
#   main package (the __debug_package global isn't set) then the
#   build_id link is generated as /usr/lib/.build-id/xx/yyy. If this is
#   a -debuginfo package (the __debug_package global is set) then the
#   build_id link is generated as /usr/lib/debug/.build-id/xx/yyy.
#
# - compat
#   Same as for "separate" but if the __debug_package global is set then
#   the -debuginfo package will have a compatibility link for the main
#   ELF /usr/lib/debug/.build-id/xx/yyy -> /usr/lib/.build-id/xx/yyy
%_build_id_links compat

# Whether build-ids should be made unique between package version/releases
# when generating debuginfo packages. If set to 1 this will pass
# --build-id-seed "%{VERSION}-%{RELEASE}" to find-debuginfo.sh which will
# pass it onto debugedit --build-id-seed to be used to prime the build-id
# note hash.
%_unique_build_ids      1

# Do not recompute build-ids but keep whatever is in the ELF file already.
# Cannot be used together with _unique_build_ids (which forces recomputation).
# Defaults to undefined (unset).
#%_no_recompute_build_ids 1

# Whether .debug files should be made unique between package version,
# release and architecture. If set to 1 this will pass
# --unique-debug-suffix "-%{VERSION}-%{RELEASE}.%{_arch}" to find-debuginfo.sh
# to create debuginfo files which end in -<ver>-<rel>.<arch>.debug
# Requires _unique_build_ids.
%_unique_debug_names    1

# Whether the /usr/src/debug/<package> directories should be unique between
# package version, release and architecture. If set to 1 this will pass
# --unique-debug-src-base "%{name}-%{VERSION}-%{RELEASE}.%{_arch}" to
# find-debuginfo.sh to name the directory under /usr/src/debug as
# <name>-<ver>-<rel>.<arch>.
%_unique_debug_srcs     1

# Whether rpm should put debug source files into its own subpackage
#%_debugsource_packages 1

# Whether rpm should create extra debuginfo packages for each subpackage
#%_debuginfo_subpackages 1
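The %_build_id_links modes and the %_unique_debug_names suffix described in the macros comments can be paraphrased as two small functions. This is a hedged Python reading of those comments with hypothetical function names, not find-debuginfo.sh's implementation. One assumption goes beyond the comments: for the -debuginfo link in "separate"/"compat" mode the sketch uses xx/yyy.debug (which is how Fedora's -debuginfo packages name it), leaving plain xx/yyy for the compat link to the main package.

```python
from typing import List


def build_id_links(mode: str, debug_package: bool, build_id: str) -> List[str]:
    """Link paths generated for one ELF file under %_build_id_links.

    The first two hex characters of build_id form the "xx" directory,
    the rest the "yyy" file name.
    """
    xx, yyy = build_id[:2], build_id[2:]
    main = f"/usr/lib/.build-id/{xx}/{yyy}"
    debug = f"/usr/lib/debug/.build-id/{xx}/{yyy}"
    if mode == "none":
        return []
    if mode == "alldebug":
        # Only in -debuginfo: one link for the main file, one for the .debug file.
        return [debug, debug + ".debug"] if debug_package else []
    if mode == "separate":
        # Assumption: the -debuginfo link carries the .debug suffix.
        return [debug + ".debug"] if debug_package else [main]
    if mode == "compat":
        # As "separate", plus a -debuginfo compat link (debug) -> (main).
        return [debug + ".debug", debug] if debug_package else [main]
    raise ValueError(f"unknown %_build_id_links mode: {mode!r}")


def debug_file_name(path: str, version: str, release: str, arch: str,
                    unique: bool = True) -> str:
    """Name of the .debug file for the ELF at path (%_unique_debug_names)."""
    suffix = f"-{version}-{release}.{arch}" if unique else ""
    return f"/usr/lib/debug{path}{suffix}.debug"


print(build_id_links("compat", debug_package=True, build_id="1a2b3c"))
print(debug_file_name("/usr/bin/xz", "5.2.5", "1.fc33", "x86_64"))
```

The version, release and build-id values are made up for illustration.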

Comment 11 Jan Pokorný [poki] 2020-06-23 10:26:08 UTC
re [comment 10]:

> Are you looking for the documentation about the various ways macros

Oh, to clarify some more, perhaps.

I think there should be some distinction between common (~user) knowledge
and internal details that one normally shouldn't need to be concerned about.

It is more difficult in the RPM world, since internal details occasionally get
exposed in parts normally visible to users, such as with the (fittingly
named!) _artificial_ .build-id links that become visible with a standard
user-triggered package listing/query.

So far so good, perhaps, even though I think that, since it gets leaked
"publicly" like that, users would deserve to learn more about it from
the most authoritative place -- rpm's man pages?  Definitely, it's not
something users should have to seek an explanation for in the internals,
which macros mostly are, correct?  (I mean, when you flip the mode and look
at it from a packager's perspective, it becomes your API of sorts, but
not earlier.)

And this is the angle I looked at it from regarding the explicit mention of
".build-id links" in the rpm man page.  At that point, I think you mix two
rather different levels of familiarity with RPM, at the expense of leaving
plain users rather baffled and possibly left to seek said explanation in the
internals.

Trying to see the situation from this user's perspective, I indeed think
that the ".build-id links" open loop should not be started just for the sake
of an example now; or, on the contrary, it should be closed in a newly added
section of the RPM man pages that would detail the concept of artificial
files and how ".build-id links" are generated by rpmbuild by default,
plus briefly what they are good for.

Thanks for keeping (any) technical documentation as loophole-less as
possible!

(And don't get me wrong, I don't mean that everything needs a detailed
explanation; relative brevity is also important, and it's OK to consider
many similar concepts, such as "documentation" and "licences", as
self-explanatory.  It's really this specific ".build-id" reference
that, IMHO, opens an unnecessary loophole towards the internals.)

Comment 12 Panu Matilainen 2020-06-23 10:53:30 UTC
Closed-loop documentation is a nice ideology, but the rpm reality is so far off that it's just not applicable. There's simply no place to put such information, as explaining %artifact (never mind build-ids) would require creating a manpage that explains every single feature of rpm spec files. It'd be nice to have, of course, but it's a *daunting* task.

Most of the switches in the rpm manual are aimed at a rather advanced user, and --excludeartifacts and other build-id/debuginfo switches are no exception. I'm not going to withhold a useful hint from the intended target audience just because it might baffle the newbie.

Comment 13 Jan Pokorný [poki] 2020-06-23 21:08:07 UTC
I understand this practical perspective.  Just as a reminder, having
slightly more information doesn't imply concurrently equalizing
the depth of information overall (even though, in the long, long term, it
would then be natural to provide a coherent body of knowledge that doesn't
require examining the code).  I remember the early days of my RPM packaging:
it was quite a steep curve, just because the information and the answers to
questions like "why this?" and "what else can be done?" were too scattered;
eventually, it was liberating to "use the source", and only then did
some things "click", like the fact that some macro functions are actually
hardcoded in the parser.

Hopefully it was worthwhile to share this perspective, which should be
clear by now, and it's OK to defer/refuse any follow-up.

Comment 14 Ben Cotton 2020-08-11 13:38:47 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 33 development cycle.
Changing version to 33.

Comment 15 Panu Matilainen 2021-02-05 15:04:42 UTC
This was fixed in rpm 4.16 final; only this bug report was left behind.

Thanks for the report!

