Bug 1380878 - glibc: Please consider adding RPM file triggers for ldconfig
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: 27
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: glibc team
QA Contact: Fedora Extras Quality Assurance
Depends On: 1381764 1389956
Blocks: 1566485
Reported: 2016-09-30 22:36 UTC by Jason Tibbitts
Modified: 2018-04-12 12:18 UTC (History)

Last Closed: 2018-02-24 10:04:31 UTC
Type: Bug


Attachments
Patch adding file triggers to call ldconfig automatically. (1.03 KB, patch)
2016-09-30 22:36 UTC, Jason Tibbitts
DSOs without symbolic links with their soname (6.05 KB, text/plain)
2016-10-04 08:19 UTC, Florian Weimer
Packages which install alternative library search paths (84.39 KB, text/plain)
2016-10-04 08:21 UTC, Florian Weimer

Description Jason Tibbitts 2016-09-30 22:36:07 UTC
Created attachment 1206376 [details]
Patch adding file triggers to call ldconfig automatically.

All Fedora releases include a version of RPM which supports file triggers as described at http://www.rpm.org/wiki/FileTriggers.

The packaging committee would really like to be able to tell packagers that they no longer have to paste various arcane scriptlets into their packages.  Adding the relevant triggers to glibc should allow the ldconfig scriptlets to be dropped from essentially every library-containing package in rawhide.  (Or other releases if you wanted to add them there, though that's not what we're requesting.)  Several packages have already added such triggers as of F24 and they have proven to be stable and functional.  You can see the list at https://fedoraproject.org/wiki/Packaging:Scriptlets (noted in the boxes which say "F23 only" or the like).

Attached is a patch which adds two triggers (%transfiletriggerin and %transfiletriggerpostun) to glibc, which will run ldconfig automatically when an RPM operation adds or removes files in /lib, /usr/lib, /lib64, /usr/lib64 or /etc/ld.so.conf.d.  I have included comments, but the relevant lines are just: 

%transfiletriggerin -P 2000000 -- /lib /usr/lib /lib64 /usr/lib64 /etc/ld.so.conf.d
/usr/sbin/ldconfig

%transfiletriggerpostun -P 2000000 -- /lib /usr/lib /lib64 /usr/lib64 /etc/ld.so.conf.d
/usr/sbin/ldconfig

which is merely an extension of the example in the RPM documentation linked above.

Note that this does the ldconfig call up to twice at the end of the RPM transaction, which is technically a change in behavior since currently the commonly used scriptlets will run after each package is installed/removed.  My understanding is that ldconfig is merely a performance improvement and having it run only at the end of the transaction should cause no issues.  This mirrors what other distributions are currently doing.  However, if it is desirable to do this at the end of each individual package installation and removal, the triggers can be changed to %filetriggerin and %filetriggerpostun.  There will be no performance improvement from performing the operation only once at the end of the transaction, but packagers will still be able to remove the scriptlets from their packages.

I would be happy to update the package for you if you like.

Thanks!

Comment 1 Florian Weimer 2016-10-04 08:13:30 UTC
If we want to go down this route, we need to change the Package Guidelines in two aspects (in addition to dropping the explicit ldconfig invocations):

(1) All packages which ship shared objects need to ship the symbolic links which would be created by ldconfig.  At present, there are a few exceptions, listed in a forthcoming attachment.  Such packages, when installed, are not usable until the transaction ends.  This means that RPM dependencies do not work as expected during installation, and if an installation is aborted, the system is more likely to be in an unusable and difficult-to-recover state.

(2) Packages must not add additional library search paths via /etc/ld.so.conf.d.  Such directories are not searched by ld.so on its own, so the same problem as the missing symbolic links applies.

If the package guidelines are changed in these two ways, we won't have to run ldconfig in the middle of the installation process anymore.

Comment 2 Florian Weimer 2016-10-04 08:19:26 UTC
Created attachment 1207100 [details]
DSOs without symbolic links with their soname

symboldb query:

SELECT symboldb.nevra(package), file.name, elf_file.soname
  FROM symboldb.package
  JOIN symboldb.file USING (package_id)
  JOIN symboldb.elf_file USING (contents_id)
  WHERE (file.name ~ '^/usr/lib64/lib([^/]*)$'
         OR file.name ~ '^/lib64/lib([^/]*)$')
    AND elf_file.soname <> symboldb.basename(file.name)
    AND NOT EXISTS (SELECT 1 FROM symboldb.symlink
                      WHERE package_id = file.package_id
                        AND name = '/usr/lib64/' || elf_file.soname)
    AND NOT EXISTS (SELECT 1 FROM symboldb.symlink
                      WHERE package_id = file.package_id
                      AND name = '/lib64/' || elf_file.soname)
  ORDER BY 1, 2;

Comment 3 Florian Weimer 2016-10-04 08:21:25 UTC
Created attachment 1207101 [details]
Packages which install alternative library search paths

symboldb query:

SELECT symboldb.nevra(package), file.name,
    encode(file_contents.contents, 'escape')
  FROM symboldb.package
  JOIN symboldb.file USING (package_id)
  JOIN symboldb.file_contents USING (contents_id)
  WHERE file.name LIKE '/etc/ld.so.conf.d/%'
  ORDER BY 1, 2

Comment 4 Jason Tibbitts 2016-10-04 15:33:52 UTC
I guess there are really three options:

0) Do nothing

1) Use %filetrigger(in|postun)

2) Use %transfiletrigger(in|postun)

Doing nothing is always possible, but anything which simplifies life for packagers is good, so...

If I'm understanding correctly, #1 lets packagers drop the scriptlets with no other changes required.  It was probably what I should have suggested, but I was swayed by influence from what Mageia does.

#2 does what #1 does but also gives a performance bonus by only running ldconfig once at the end of the transaction.  However, it requires... other things.

Is my understanding correct there?  Can we do #1 now with little extra effort and then consider more deeply whether we want to move in the direction of #2?  I'm not even sure the performance savings would be all that worth it.

I'm curious as to how other distributions that use file triggers for this (which as I understand is pretty much all of them) handle this case.  I know Mageia does #2, and I think openSUSE does as well, but they might also have additional requirements for libraries which might mirror the changes you mention above.

Also, what are those symboldb queries?  Is that database public somewhere?

Comment 5 Jakub Jelinek 2016-10-04 15:42:42 UTC
If you want to trigger ldconfig on any changes in e.g. /usr/lib or /usr/lib64 directories, won't that slow rpm updates down significantly?  I mean, shared libraries aren't the only files that are in those directories, static libraries are too, or if it applies also for all subdirectories, then lots of packages have tons of files that aren't shared libraries in there.
rpm knows what subpackages contain shared libraries, it computes e.g. the provides/requires for them, so if you want automation, it would be better to design some rpm extension for that.

Comment 6 Jason Tibbitts 2016-10-04 16:37:43 UTC
This is exactly the extension that RPM has designed to do this.  It's explicitly mentioned as the solution for this in the RPM documentation.  It's proven sufficient for other RPM-based distros for a while now.

To be clear, let's take %filetriggerin.  The glibc package declares something like this:

%filetriggerin -- /lib /usr/lib /lib64 /usr/lib64
shell
script
here

Now that shell script will get called after each package installation where something was installed into any of those four directories.  The script will be provided with all of the matching pathnames on standard input.

So if extraneous ldconfig calls are a concern, it is rather easy to grep for anything matching *.so* (or the pattern of your choice) and only call ldconfig when necessary.
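
A minimal sketch of that filtering idea, in POSIX shell. The function name `needs_ldconfig` and the `lib*.so*` pattern are illustrative assumptions, not anything glibc ships. It reads the path list on standard input, the way RPM hands it to a file trigger, and succeeds only when some path looks like a shared object.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if any path on stdin looks like a
# shared object, e.g. /usr/lib64/libfoo.so.1.2.  The pattern is an assumption
# and would need tuning for real use.
needs_ldconfig() {
    while IFS= read -r path; do
        # Compare only the file name, not the directory part.
        case "${path##*/}" in
            lib*.so|lib*.so.*) return 0 ;;
        esac
    done
    return 1
}

# In a trigger body one might then write (not executed here):
#   needs_ldconfig && /usr/sbin/ldconfig
```

Using `case` avoids starting a separate grep process, though the shell itself still has to be started for every matching package.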

Obviously a similar %filetriggerpostun is needed to handle the simple package removal case.

Mageia at least from what I can tell does not do any fanciness to limit ldconfig calls, but it also uses %transfiletriggerin/postun so that ldconfig only gets called at the end of the transaction.  And note that I'm not trying to ape Mageia here; it's simply the example I had at hand.  Also note that I'm far from an expert in this; FPC just has a goal of eliminating packager-provided boilerplate scriptlets wherever possible.

Comment 7 Zbigniew Jędrzejewski-Szmek 2016-10-04 23:17:23 UTC
Shouldn't those packages on the list just keep calling ldconfig as before for now? At least those in list 2. Those in list 1 are more complicated, so maybe they need to be fixed, but anyway, it should be safe to keep them calling ldconfig in %post like they do now.

Comment 8 Florian Weimer 2016-10-05 07:42:59 UTC
(In reply to Jason Tibbitts from comment #6)
> This is exactly the extension that RPM has designed to do this.  It's
> explicitly mentioned as the solution for this in the RPM documentation. 
> It's proven sufficient for other RPM-based distros for a while now.
> 
> To be clear, let's take %filetriggerin.  The glibc package declares
> something like this:
> 
> %filetriggerin -- /lib /usr/lib /lib64 /usr/lib64
> shell
> script
> here
> 
> Now that shell script will get called after each package installation where
> something was installed into any of those four directories.  The script will
> be provided with all of the matching pathnames on standard input.

And the additional shell invocation is really faster than calling ldconfig unconditionally?  You'll need many shared objects for it to matter.

The %filetriggerin approach is a rather big hammer.

> Mageia at least from what I can tell does not do any fanciness to limit
> ldconfig calls, but it also uses %transfiletriggerin/postun so that ldconfig
> only gets called at the end of the transaction.  And note that I'm not
> trying to ape Mageia here; it's simply the example I had at hand.  Also note
> that I'm far from an expert in this; FPC just has a goal of eliminating
> packager-provided boilerplate scriptlets wherever possible.

Do you have a proposal for the FPC wording change already?  My main concern here is that we end up with yet another half-finished transition.

Comment 9 Jason Tibbitts 2016-10-05 18:20:16 UTC
(In reply to Florian Weimer from comment #8)

> And the additional shell invocation is really faster than calling ldconfig
> unconditionally?  You'll need many shared objects for it to matter.

I don't know; Jakub stated that there could be a performance consideration; I indicated the way RPM provides to only call ldconfig when necessary.

> The %filetriggerin approach is a rather big hammer.

Well, OK.  I suggested %transfiletriggerin, was told it would require significant changes elsewhere.  I suggested %filetriggerin and am told that it's a big hammer.

> Do you have a proposal for the FPC wording change already?

I haven't filed one, and there's not much point in that without knowing exactly what of the various options would be chosen and just what would have to change to accommodate.

With %filetriggerin/postun, this would be just sticking the note at top of https://fedoraproject.org/wiki/Packaging:Scriptlets#Shared_libraries saying "F25 and older only".

> My main concern here is that we end up with yet another half-finished 
> transition.

In the case of %filetriggerin/postun, extra calls don't really hurt anything; I'd have a report that greps the specs to note the packages which are still using the scriptlets.

If we go for something more complicated, then we make a list of requirements, FPC takes a look at what needs to change, and we work on a plan.  If there's any way at all to automatically detect packages which need updating, then we make reports out of that, tell packagers what needs to change, and eventually get down to having some provenpackagers just making the required changes.

Neither FPC nor I personally have any interest in a half-finished solution.  Certainly this is more complicated than I'd originally thought, but I don't think it's impossible. The real question is whether it's worth the trouble.

To go back to comment #1, is this actually the whole of the changes required if we were to use %transfiletriggerin/postun?  I just want to double check.

For the first point, basically the requirement is that the symlinks be included.  Are these easy to generate?  It doesn't seem like it should be difficult, but... if we just replace two single-line scriptlets with a call to something and some extra lines in %files, well, I'm not sure that simplifies things measurably.

As for the second point, I'm unclear on why that's a requirement.  Surely a %filetriggerin matching /etc/ld.so.conf.d and running ldconfig could handle the case.

Comment 10 Jakub Jelinek 2016-10-06 06:56:09 UTC
(In reply to Jason Tibbitts from comment #9)
> (In reply to Florian Weimer from comment #8)
> 
> > And the additional shell invocation is really faster than calling ldconfig
> > unconditionally?  You'll need many shared objects for it to matter.
> 
> I don't know; Jakub stated that there could be a performance consideration;
> I indicated the way RPM provides to only call ldconfig when necessary.

Running ldconfig 1000 times during a transaction certainly isn't in the noise, that problem has been raised many times in the past, just search bugzilla.
Running an extra shell script 10000 times during transaction (for each package that contains any files whatsoever in the /usr/lib64 etc. subdirectories) isn't free either, especially if you call grep etc. in there.  So, unless you can really narrow it into just packages that actually contain shared libraries, or say use a lua script there that will narrow it down to only those packages, I'd be worried about transaction slowdowns.

Comment 11 Jason Tibbitts 2016-10-06 22:21:32 UTC
I could probably write something in lua, yes, if you really think that's the way to go.  (I have to look at how lua would get the file list.)  It might be an interesting exercise in any case.

But still, testing has been done which apparently shows that full from-scratch installations work just fine if you never even call ldconfig.  Supposedly Mageia simply runs it once at the end of the transaction and never bothers packaging the symlinks.  It may be possible to construct a situation where problems would arise, but the other folks in the recent packaging committee meeting couldn't think of one.

Anyway, it should be reasonably easy to construct a mutant glibc package which has ldconfig linked to /bin/true and a call to a real ldconfig in %transfiletriggerin.  Doing some installs with that would at least be interesting and should point out any obvious problems.

Comment 12 Carlos O'Donell 2016-10-07 00:50:09 UTC
(In reply to Jason Tibbitts from comment #11)
> I could probably write something in lua, yes, if you really think that's the
> way to go.  (I have to look at how lua would get the fil.)  It might be an
> interesting exercise in any case.

Using lua is probably less costly than spawning a shell to process the regexp to find if a DSO is present and then call ldconfig, but benchmarking would be required.

I understand the desire to make packaging easier, but since this change is going to require active work (the eventual removal of old scriptlets) and educational changes to remove cargo-culting (copy-and-pasting the ldconfig scriptlet), we better end up with a final solution that is as optimal as our previous one or better.
 
> But still, testing has been done which apparently shows that full
> from-scratch installations work just fine if you never even call ldconfig. 

Except that this is not true for the packages in comment #3 which add custom library search paths under the assumption they need those paths. The use of ldconfig in this case is a requirement and not an optimization.

> Supposedly Mageia simply runs it once at the end of the transaction and
> never bothers packaging the symlinks.  It may be possible to construct a
> situation where problems would arise, but the other folks in the recent
> packaging committee meeting couldn't think of one.

It arises any time you have a shared library whose filename doesn't match the SONAME of the API being exported. It's a trivial case that we must get right.

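cat > bar.c <<EOF
/* Drop-in replacement for libfoo.so API.  */
#include <stdio.h>
void
bar (void)
{
  printf ("Called bar\n");
}
EOF
cat > main.c <<EOF
/* Minimal caller; main.c was not shown, so this is an assumed example.  */
void bar (void);
int
main (void)
{
  bar ();
  return 0;
}
EOF
gcc -Wl,-soname,libfoo.so -fPIC -shared -o libbar.so bar.c
gcc -o main main.c -lbar -L.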

$ ./main
./main: error while loading shared libraries: libfoo.so: cannot open shared object file: No such file or directory

$ ldconfig

$ ./main
Called bar

Without running ldconfig the libfoo.so -> libbar.so symlink is missing and the application cannot work.

The package shipping libbar.so must also ship libfoo.so as Florian states in comment 1 item 1.
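
The check implied here can be sketched in shell: for each shared object in a library directory, read its SONAME and see whether a file or symlink with that name sits beside it. The function names are hypothetical, and the sketch assumes `readelf` from binutils is available. It only reports missing entries and does not create links.

```shell
#!/bin/sh
# Print the SONAME recorded in an ELF file, or nothing if there is none.
# Assumes binutils' readelf, whose dynamic-section output includes a line
# such as "(SONAME)  Library soname: [libfoo.so]".
soname_of() {
    readelf -d "$1" 2>/dev/null |
        sed -n 's/.*(SONAME).*\[\(.*\)\].*/\1/p'
}

# Hypothetical check: given a library path (with a directory part) and its
# SONAME, succeed if an entry named after the SONAME exists in the same
# directory.  -L also accepts a dangling symlink.
has_soname_entry() {
    lib_path=$1
    wanted=$2
    lib_dir=${lib_path%/*}
    [ -e "$lib_dir/$wanted" ] || [ -L "$lib_dir/$wanted" ]
}

# Report every regular lib*.so* file in a directory that lacks a
# SONAME-named entry next to it.
report_missing_links() {
    for f in "$1"/lib*.so*; do
        [ -f "$f" ] && [ ! -L "$f" ] || continue
        soname=$(soname_of "$f") || continue
        [ -n "$soname" ] || continue
        has_soname_entry "$f" "$soname" || echo "$f: missing $soname"
    done
}
```

For the example above, `report_missing_links .` would be expected to flag libbar.so until a libfoo.so link is shipped alongside it.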
 
> Anyway, it should be reasonably easy to construct a mutant glibc package
> which has ldconfig linked to /bin/true and a call to a real ldconfig in
> %transfiletriggerin.  Doing some installs with that would at least be
> interesting and should point out any obvious problems.

No, it would just leave you with a brittle system that might break in unexpected ways. This is not a robust solution.

To wind all the way back to a summary:

(a) The present day solution is optimal from a performance and technical design perspective. Only those packages that need or want to call ldconfig do so. The cache is kept up to date. The cost of running ldconfig is minimal (not minimized). The performance of the dynamic loader cache is the benefit.

(b) The present day solution requires shared library packagers that need custom search paths to know and understand how ldconfig and the dynamic loader cache work (a feature designed specifically for them).

(c) The present day solution requires shared library packagers looking to reduce their startup times to know and understand how ldconfig works and when to call it in their scriptlets.

(d) An optimal solution (fewer ldconfig calls and the removal of ldconfig scriptlets from all packages shipping DSOs) in the form of %transfiletriggerin et al. exists, but in order to capitalize on it we must (again as Florian notes in comment 1) have packages ship all shared library links that ldconfig would have created (to satisfy RPM dependencies) and stop using custom search paths (they won't be searched until ldconfig runs at transaction end, which causes the same problem, i.e. RPM dependencies that are not satisfied).

(e) A big-hammer solution (potentially more ldconfig calls, but still the removal of scriptlets from all packages shipping DSOs) in the form of %filetriggerin et al. exists, but could significantly impact the performance of transactions if we don't find a way to limit the ldconfig calls.

Next steps:

It is my opinion that (d) is a non-starter since removing custom library search paths would require rearchitecting some upstream packages and that might not be palatable upstream. The custom search paths can almost always be replaced by dynamic string tokens in the binary or library e.g. $ORIGIN relative paths in DT_RUNPATH, but it is significant work.

The desire to stop having packagers 'paste various arcane scriptlets into their packages' has a solution (d), which is optimal but considerable work, and a solution (e), which could have serious transaction performance costs, e.g. disk IO which you pay for in the cloud, drive wear, etc.

One way forward is solution (e) coupled with lua (to filter and reduce ldconfig calls) and a performance test showing that the number of ldconfig calls is similar.

Florian, Jakub, Did I miss anything?

Comment 13 Jason Tibbitts 2016-10-07 03:32:57 UTC
(In reply to Carlos O'Donell from comment #12)
> Using lua is probably less costly than spawning a shell to process the regexp
> to find if a DSO is present and then call ldconfig, but benchmarking would
> be required.

Honestly I'd think the speed of the thing would be further down in the list of considerations.  But you have my thanks for addressing the correctness of the various options.

It occurs to me that a simple C program to do the same thing would also work, but then it would have to be linked statically and might be considered too much bother.  I'm also assuming that modifying ldconfig to handle this itself is way out of the question.

> I understand the desire to make packaging easier, but since this change is
> going to require active work (the eventual removal of old scriptlets) and
> educational changes to remove cargo-culting (copy-and-pasting the ldconfig
> scriptlet), we better end up with a final solution that is as optimal as our
> previous one or better.

Well, sure, but nobody is asking you to do any of the work to change things.  And, really, the cost of a package that hasn't yet transitioned is nothing more than an extra call to ldconfig when that package is installed or removed.

> Except that this is not true for the packages in comment #3 which add custom
> library search paths under the assumption they need those paths. The use of
> ldconfig in this case is a requirement and not an optimization.

OK, perhaps those are rare enough that the test package set just didn't include one.  But I don't see why that isn't trivially handled by a %filetriggerin/postun pair (on /etc/ld.so.conf.d).  Or am I really just being dense?  This honestly seems like the least of the issues involved here so I must be missing something.

> It arises any time you have a shared library whose filename doesn't match
> the SONAME of the API being exported. It's a trivial case that we must get
> right.

I wonder how the other distributions manage to get this right without having to do any extra worrying about this.  (That's an honest question; I really don't know.)

> No, it would just leave you with a brittle system that might break in
> unexpected ways. This is not a robust solution.

You must be misunderstanding me; I never proposed that as a solution.  I mentioned it as a way of trying to break things.  It would appear that you are agreeing with me here.

Anyway, it's painfully obvious that I'm not the person who should be advocating for this one way or the other.  I've asked folks who know more to jump in but if they don't then I'm certainly not going to try and drive this.  If nothing else, this ticket can exist as reasonable documentation of the reasons why it's not that simple.

I do wish I could understand how at least Mageia just doesn't have the problems mentioned here as downsides of this.  There must be something else they're doing which hasn't been explained to me yet.

> One way forward is solution (e) coupled with lua (to filter and reduce
> ldconfig calls) and a performance test showing that the number of ldconfig
> calls is similar.

I will have a look at this.  Might take me a bit, though; my list is long.

Since I'm simple-minded, here's what I'm taking away from this so far:

Using %filetriggerin/postun is correct (i.e. in all cases where ldconfig must be run, it gets run) but there are concerns about performance because we have plenty of stuff under %_libdir which doesn't require ldconfig runs, so we'd be calling ldconfig too often.  This has yet to be quantified by anyone but obviously any proponents of this thing should be prepared to do that.

Spawning a single grep in the scriptlet in an attempt to only call ldconfig when libraries are placed in the relevant directories is also considered unacceptable overhead, in part because it also requires that the scriptlet actually run in the shell instead of having rpm just exec ldconfig directly.

Using Lua for the scriptlet and having it only exec ldconfig when necessary may be acceptable if that is actually possible.

Using %transfiletriggerin/postun doesn't have performance issues, but misses important cases where ldconfig would need to be run during the transaction in order for the set of accessible libraries to actually match the set of installed RPMs at any point in the transaction.  The specific case of additional paths added to /etc/ld.so.conf.d cannot possibly wait until the end of the transaction, and so does require a %filetriggerin/postun pair on that directory.  This doesn't have any performance implications, though.

Comment 14 Florian Weimer 2016-10-07 06:59:58 UTC
(In reply to Jakub Jelinek from comment #10)
> (In reply to Jason Tibbitts from comment #9)
> > (In reply to Florian Weimer from comment #8)
> > 
> > > And the additional shell invocation is really faster than calling ldconfig
> > > unconditionally?  You'll need many shared objects for it to matter.
> > 
> > I don't know; Jakub stated that there could be a performance consideration;
> > I indicated the way RPM provides to only call ldconfig when necessary.
> 
> Running ldconfig 1000 times during a transaction certainly isn't in the
> noise, that problem has been raised many times in the past, just search
> bugzilla.

It's much faster than it used to be because there is now a cache for building the cache.  I have not seen such a report myself.

We will have to make ldconfig slower again because critical fsync calls are missing from ldconfig.  One of the caches contains timestamps, which is why checking for modification of the cache before writing to disk will not be a win.

> Running an extra shell script 10000 times during transaction (for each
> package that contains any files whatsoever in the /usr/lib64 etc.
> subdirectories) isn't free either, especially if you call grep etc. in
> there.

Agreed.  But I'm not sure if this is necessary for a transaction trigger.  We just have to make sure that all DSOs on the default search path ship with symlinks for their sonames.  This should not be too difficult to achieve.

For those DSOs which are on the augmented search path, we need to keep the ldconfig invocations.

Comment 15 Florian Weimer 2016-10-07 07:10:49 UTC
(In reply to Carlos O'Donell from comment #12)
> (a) The present day solution is optimal from a performance and technical
> design perspective. Only those packages that need or want to call ldconfig
> do so. The cache is kept up to date. The cost of running ldconfig is minimal
> (not minimized). The performance of the dynamic loader cache is the benefit.

ldconfig currently is not crash-safe, which is why the cost is lower than it has to be.

The solution is not optimal because within an RPM transaction, ldconfig is run multiple times, without RPM running any other scriptlets in-between.  All but one of these ldconfig calls are wasteful.

> It is my opinion that (d) is a non-starter since removing custom library
> search paths would require rearchitecting some upstream packages and that
> might not be palatable upstream.

The custom library search paths are mostly a Fedora thing, where Fedora couldn't agree upon the canonical implementation or something like that.  Other distributions don't use it this much.

> The custom search paths can almost always
> be replaced by dynamic string tokens in the binary or library e.g. $ORIGIN
> relative paths in DT_RUNPATH, but it is significant work.

DT_RUNPATH doesn't help with that because the library-using package encodes which library variant it wants to use.

Anyway, this is somewhat unrelated because it only affects a small subset of the packages.  I think we can ignore it and require that those packages call ldconfig explicitly.

I'm leaning towards the %trans approach with the symlinks mandate.

Comment 16 Jakub Jelinek 2016-10-07 07:18:39 UTC
(In reply to Florian Weimer from comment #15)
> Anyway, this is somewhat unrelated because it only affects a small subset of
> the packages.  I think we can ignore it and require that those packages call
> ldconfig explicitly.
> 
> I'm leaning towards the %trans approach with the symlinks mandate.

I agree.  But we should then have tools that would complain loudly if something doesn't package the symlinks at koji build time (some nag mail to owner).

Comment 17 James Antill 2016-10-18 04:28:27 UTC
(In reply to Jakub Jelinek from comment #16)
> (In reply to Florian Weimer from comment #15)
> > Anyway, this is somewhat unrelated because it only affects a small subset of
> > the packages.  I think we can ignore it and require that those packages call
> > ldconfig explicitly.
> > 
> > I'm leaning towards the %trans approach with the symlinks mandate.
> 
> I agree.  But we should then have tools that would complain loudly if
> something doesn't package the symlinks at koji build time (some nag mail to
> owner).

We've traditionally had very few checks at koji build time, and this seems a little excessive:

1. Almost all packages seem to ship their symlinks, judging from the symboldb results and from checking my F24 system (rpm --qf '' -qf /usr/lib64/*).

2. Even if they fail #1, something still has to need the binary before the end of the transaction. This is possible, but not likely.

...to add a check anyway requires roughly 2 things:

1. Write code which listens to fedmsg events for koji build success, and then checks the packages (sending emails, or updating productdb and having something trigger on that, or whatever). From what I can tell in this thread, that means downloading the package data ... which AFAIK nothing does atm.

2. Get fedora infra. to run it (they'll probably want someone else to own it).

...which is probably a long way of saying this ticket will just be a warning to others.

Comment 18 Carlos O'Donell 2016-10-18 13:21:43 UTC
(In reply to Jakub Jelinek from comment #16)
> (In reply to Florian Weimer from comment #15)
> > Anyway, this is somewhat unrelated because it only affects a small subset of
> > the packages.  I think we can ignore it and require that those packages call
> > ldconfig explicitly.
> > 
> > I'm leaning towards the %trans approach with the symlinks mandate.
> 
> I agree.  But we should then have tools that would complain loudly if
> something doesn't package the symlinks at koji build time (some nag mail to
> owner).

I think we should lower this to the RPM level and add %_missing_dso_links_terminate_build (like the equivalent docs macro) and then control it at that level, that way the build fails outright.

So in summary:

- Add %_missing_dso_links_terminate_build option to RPM and enforce it at the rpm build level, and turn this on immediately in rawhide and fix the failures.
- The glibc package switches to using %trans and calling ldconfig at the end of a transaction for changes in the /lib* directories.
- All packages currently using custom ld.so search paths via /etc/ld.so.conf* would continue to use this method and continue to call ldconfig as required.
- All other packages that don't have custom ld.so search paths would remove their ldconfig calls. Mass file bugs for this issue to get it fixed.

Thoughts?

Comment 19 Jason Tibbitts 2016-10-18 16:15:28 UTC
If you can actually implement %_missing_dso_links_terminate_build, great.  I'm not entirely sure how you would detect these.  Do you have a quick shell script that can be run against, say, the composed %buildroot at the end of the RPM generation process?

Hooking up the actual scripting for it in RPM isn't difficult and I can help to get that into redhat-rpm-config.

It's minor and can certainly be addressed separately, but I'm still not understanding why a %filetriggerin/postun pair on /etc/ld.so.conf.d isn't sufficient to allow those packages who place things there to drop their ldconfig calls.  

The packaging committee can take care of the process for getting packages to drop their ldconfig calls, including sending notices, involving provenpackagers, filing bugs and whatnot.  We may end up making some convenience macros or something so that people who want cross-release spec compatibility don't end up carrying around too much cruft.  It will certainly take a couple of releases for this to all work itself out, but the glibc maintainers needn't worry about that kind of thing.

Comment 20 Jason Tibbitts 2016-10-19 02:35:11 UTC
Rereading that, I realized I made kind of a mess out of what I was trying to say.

Basically, how do you detect that the symlinks are in place?  This has to happen just after the end of %install, when the package has placed all of its files into the buildroot.   We have flexibility as to whether it happens before or after executables are stripped, if that's important.  Obviously you can't depend on too much but you do have all of rpm-build's dependencies, including binutils.

If you have a quick shell script or even just a description of what needs to be done, the packaging committee can get it written, documented, approved by committees, announced and into redhat-rpm-config.  I guess it would be nice to have a good example of a broken package and how it would be fixed, since I wouldn't trust myself to come up with a good example.

And the packaging committee can take care of the rest of the packaging fixing and such.  The glibc maintainers need only add the few lines of trigger to the glibc package.

Comment 21 Jason Tibbitts 2016-10-21 01:20:59 UTC
James and I did a bit more work on this.  Here's where I'm at.

You detect the problem by running "ldconfig -v -N -r %buildroot" and seeing if it creates new symlinks.  If it does then the package is broken and the build should be failed.

ldconfig currently doesn't have any way to tell you whether it would create a symlink, but it does include "(changed)" in the verbose output, so grepping for that is sufficient.  You could also just look for any new files, though that either requires assuming that the filesystem has very high resolution timestamps or diffing directory listings.

James suggested that the best thing to do is to add an option to ldconfig which simply makes it exit nonzero if it would create a symlink.  I have no idea if that's acceptable, but if it does happen then we can adapt when the new version hits rawhide.

Another possibility is simply to run ldconfig and let it create new files.  The build will fail if they aren't packaged.  This could only be a problem in the unlikely case that they're matched by an existing pattern in the wrong subpackage.

Anyway, I've implemented this in redhat-rpm-config (locally) and I will be doing some rebuilds to test.  Will eventually have to fire up my build infra and try out a full rawhide rebuild to see if anything breaks.
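
For illustration only, here is a rough sketch of the "diffing directory listings" alternative mentioned above. The helper name new_symlinks_after is made up for this sketch and is not part of rpm or redhat-rpm-config. It lists the symlinks under a directory before and after running an arbitrary command and prints any that appeared; in a real check the command would presumably be the ldconfig invocation against the buildroot.

```shell
#!/bin/bash
# Hypothetical helper (not part of rpm or redhat-rpm-config).
# Usage: new_symlinks_after DIR COMMAND [ARGS...]
# Prints symlinks under DIR that exist after COMMAND runs but did not before.
# Returns 1 if any new symlink appeared, 0 if none, 2 on setup failure.
new_symlinks_after() {
    local dir=$1; shift
    local before after new
    before=$(mktemp) || return 2
    after=$(mktemp) || { rm -f "$before"; return 2; }

    # Snapshot the symlinks, run the command, then snapshot again.
    find "$dir" -type l | LC_ALL=C sort > "$before"
    "$@" > /dev/null 2>&1
    find "$dir" -type l | LC_ALL=C sort > "$after"

    # comm -13 keeps only lines unique to the second (after) listing.
    new=$(LC_ALL=C comm -13 "$before" "$after")
    rm -f "$before" "$after"

    if [ -n "$new" ]; then
        printf '%s\n' "$new"
        return 1
    fi
    return 0
}
```

Under those assumptions it could be called as: new_symlinks_after "$RPM_BUILD_ROOT" /sbin/ldconfig -N -r "$RPM_BUILD_ROOT" || exit 1. Unlike grepping for "(changed)", this does not depend on ldconfig's verbose output format, at the cost of walking the buildroot twice.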

Comment 22 Jason Tibbitts 2016-10-25 16:17:14 UTC
For the record, here's the script I'm using.  Could be potentially improved by sed'ing out the " (changed)" bit from the error output, I suppose, but I think these errors would be pretty rare in any case.  I was unsuccessful in finding a package which was actually broken, but I could create one trivially by simply removing a symlink at the end of %install.  The errors produced by this script pinpointed the exact problem and failed the build as expected.

One other issue of note is that some libraries are actually set up using the alternatives system.  I have no problem whatsoever with saying that those packages must still call ldconfig in their scriptlets.  I guess the alternatives system could learn to do it, but I'd leave that to an unrelated discussion about making the alternatives system use triggers.

The script:

#!/bin/bash

# Check for any unpackaged DSO symlinks.

# This calls ldconfig and asks it to update the symlinks in the buildroot.  If
# it creates any symlinks then it will indicate that fact by adding '(changed)'
# to the end of the debugging line it prints.  If that's seen, it triggers a
# nonzero exit.

# It would be better, though, if ldconfig acquired a link-check mode where it
# simply fails when it needs to create a symlink.

# Do nothing if the buildroot is unset or is the real root
if [ -z "$RPM_BUILD_ROOT" ] || [ "$RPM_BUILD_ROOT" = "/" ]; then
    exit 0
fi

tmp=$(mktemp "${TMPDIR:-/tmp}/check-missing-dso-links.XXXXXX") || exit 1
# Single quotes defer expansion of $tmp until the trap fires
trap 'rm -f "$tmp"' EXIT

# The pipeline's exit status is grep's: 0 only if a '(changed)' line was seen
if ldconfig --verbose -N -r "$RPM_BUILD_ROOT" 2>&1 | grep -E '\(changed\)$' > "$tmp"; then
    echo "ldconfig would create additional library symlink(s)."
    echo "These MUST be created and included in the package."
    cat "$tmp"
    exit 1
fi

Comment 23 Panu Matilainen 2016-10-28 05:20:44 UTC
Rather than trying to detect missing symlinks in vain, why not make rpm run ldconfig automatically as a part of the build? That way the regular rpm checks of unpackaged/extra files in buildroot will trip up the build if there are extra links in the buildroot.

For a PoC test you can do something like this in /usr/lib/rpm/redhat/macros:
--- macros.orig	2016-10-28 07:57:09.327669313 +0300
+++ macros	2016-10-28 08:14:53.456145834 +0300
@@ -85,6 +85,7 @@
 %__arch_install_post   /usr/lib/rpm/check-buildroot
 
 %__os_install_post    \
+    /sbin/ldconfig -r %{buildroot} -N \
     /usr/lib/rpm/brp-compress \
     %{!?__debug_package:\
     /usr/lib/rpm/brp-strip %{__strip} \

For a "real world" version one would want to have it in a separate brp-ldconfig script where it can do all the magic it needs with config files etc, and a way to easily disable it, just in case.
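
As a rough illustration of what such a separate script might look like (a sketch only, not a shipped script): the function name run_brp_ldconfig and the LDCONFIG and DISABLE_BRP_LDCONFIG environment variables are invented here for the example, and the config file handling mentioned above is left out.

```shell
#!/bin/bash
# Sketch of a standalone brp-ldconfig step (illustrative only).
# run_brp_ldconfig, LDCONFIG and DISABLE_BRP_LDCONFIG are hypothetical names.
run_brp_ldconfig() {
    local ldconfig=${LDCONFIG:-/sbin/ldconfig}

    # Easy opt-out for packages that need to skip the step.
    if [ -n "${DISABLE_BRP_LDCONFIG:-}" ]; then
        return 0
    fi

    # Never touch the live system: do nothing if the buildroot is unset or "/".
    if [ -z "${RPM_BUILD_ROOT:-}" ] || [ "$RPM_BUILD_ROOT" = "/" ]; then
        return 0
    fi

    # -N: do not rebuild the cache; -r: use the buildroot as the root directory.
    # Symlinks created here but not listed in %files are then caught by
    # rpm's existing unpackaged-files check.
    "$ldconfig" -N -r "$RPM_BUILD_ROOT"
}
```

A real script would end with a call to run_brp_ldconfig and be listed in %__os_install_post in place of the inline /sbin/ldconfig line from the PoC diff.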

Comment 24 Zbigniew Jędrzejewski-Szmek 2016-10-29 22:47:25 UTC
The check in comment #22 would work, but I think just creating the symlinks like in comment #23 would be better: it gives very similar information to the packager but removes one step that they'd have to do anyway.

Comment 25 Florian Weimer 2016-10-31 16:27:08 UTC
(In reply to Jason Tibbitts from comment #22)
> For the record, here's the script I'm using.  Could be potentially improved
> by sed'ing out the " (changed)" bit from the error output, I suppose, but I
> think these errors would be pretty rare in any case.  I was unsuccessful in
> finding a package which was actually broken, but I could create one
> trivially by simply removing a symlink at the end of %install.  The errors
> produced by this script pinpointed the exact problem and failed the build as
> expected.

I don't think this is compatible with RemovePathPostfixes.  Do you see a way to remedy this?

Comment 26 Jason Tibbitts 2016-11-01 20:07:13 UTC
Regarding just running ldconfig unconditionally, I started there but thought that it wouldn't be acceptable to just have the "unpackaged files" error kick in and fail the build.  If that is OK with folks, I certainly have no objection.

I've not heard of RemovePathPostfixes before this, and indeed, it's not compatible with the checking script.  I only know of one thing actually using RemovePathPostfixes, and it's a private branch of a Fedora package (curl, the branch is private-kdudka-libcurl-minimal).  But I tested it and it does indeed fail:

Unpackaged library symlink(s):
        libcurl.so.4 -> libcurl.so.4.4.0.minimal (changed)

As far as I can understand things, ldconfig only knows what link to make based on what's inside the library.  There's no real way it can know that you want some postfix tacked onto the end.

Went ahead and tried to build the same package without my checking script in place but with an ldconfig -r %buildroot -N call at the end of the build.  It appears to build OK, and I think I understand why, but it seems to me that it works purely by accident.  Which personally doesn't bother me, but I do think that RemovePathPostfixes would be sufficiently rare wizardry that you might just have to disable some other checks.

Comment 27 Carlos O'Donell 2016-11-01 22:57:26 UTC
(In reply to Jason Tibbitts from comment #26)
> Regarding just running ldconfig unconditionally, I started there but thought
> that it wouldn't be acceptable to just have the "unpackaged files" error
> kick in and fail the build.  If that is OK with folks, I certainly have no
> objection.

I have no objection.
 
> I've not heard of RemovePathPostfixes before this, and indeed, it's not
> compatible with the checking script.  I only know of one thing actually
> using RemovePathPostfixes, and it's a private branch of a Fedora package
> (curl, the branch is private-kdudka-libcurl-minimal).  But I tested it and
> it does indeed fail:
> 
> Unpackaged library symlink(s):
>         libcurl.so.4 -> libcurl.so.4.4.0.minimal (changed)
> 
> As far as I can understand things, ldconfig only knows what link to make
> based on what's inside the library.  There's no real way it can know that
> you want some postfix tacked onto the end.

Correct, ldconfig knows only about DT_SONAME values, and knows nothing about the application specific DSO name.
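
To make that concrete, the DT_SONAME value can be read from the dynamic section that readelf -d prints. The helper below is only a sketch, and its name soname_from_readelf is made up for this example; it extracts the soname from readelf -d output supplied on stdin.

```shell
#!/bin/sh
# Hypothetical helper: read `readelf -d` output on stdin and print the
# DT_SONAME value, i.e. the name ldconfig would use for the symlink.
soname_from_readelf() {
    # Keep only the (SONAME) line and print the text between the brackets.
    sed -n 's/.*(SONAME).*\[\(.*\)].*/\1/p'
}
```

For the curl case quoted above, piping readelf -d output for libcurl.so.4.4.0.minimal into this helper would be expected to print libcurl.so.4, which is the link name shown in the error output; the ".minimal" postfix appears nowhere in the DSO itself.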

> Went ahead and tried to build the same package without my checking script in
> place but with an ldconfig -r %buildroot -N call at the end of the build. 
> It appears to build OK, and I think I understand why, but it seems to me
> that it works purely by accident.  Which personally doesn't bother me, but I
> do think that RemovePathPostfixes would be sufficiently rare wizardry that
> you might just have to disable some other checks.

Could you explain why you think this works by accident?

Comment 28 Jason Tibbitts 2016-11-02 00:18:45 UTC
(In reply to Carlos O'Donell from comment #27)

> Could you explain why you think this works by accident?

Well, of course I just learned what RemovePathPostfixes: actually does and I'm not sure of the precise implementation so certainly I'm no expert.  To me it _looks like_ ldconfig runs and creates a symlink.  The name of this symlink happens to match the proper filename that RPM would use after it trims the postfix provided with RemovePathPostfixes:.  So RPM doesn't blow up and everything works out in the end.

The checking script, however, fails things at the point that the link is actually created.

In any case I think this issue is extremely rare and if somehow something does fall afoul of it, the packager could just disable the ldconfig run.  I did find that there is exactly one package in the distribution which currently uses RemovePathPostfixes:, and that's coreutils, but it has no build failures with the checking script in place so it should be fine regardless of which method we choose.

Comment 29 Fedora End Of Life 2017-02-28 10:22:35 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 26 development cycle.
Changing version to '26'.

Comment 30 Carlos O'Donell 2017-07-31 19:26:30 UTC
Note:
The Fedora Packaging Committee: glibc file triggers
https://pagure.io/packaging-committee/issue/654

Comment 31 Carlos O'Donell 2017-07-31 19:26:45 UTC
*** Bug 1476839 has been marked as a duplicate of this bug. ***

Comment 32 Carlos O'Donell 2017-07-31 19:32:57 UTC
Again, in summary, with the next actions as I see it:

[fedora packaging guidelines]:
- Document in fedora packaging guidelines that you need to provide your DSO symlink. In addition document that you need an additional ldconfig if you add custom search paths via /etc/ld.so.conf*.

[rpm]:
- Add %_missing_dso_links_terminate_build option to RPM and enforce it at the rpm build level, and turn this on immediately in rawhide and fix the failures.

[glibc]:
- The glibc package switches to using %trans and calling ldconfig at the end of a transaction for changes in the /lib* directories.

[packing work]:
- All packages currently using custom ld.so search paths via /etc/ld.so.conf* would continue to use this method and continue to call ldconfig as required.
- All other packages that don't have custom ld.so search paths would remove their ldconfig calls. Mass file bugs for this issue to get it fixed.

We need a robust way to prevent difficult to debug scenarios from creeping into our package management systems and any optimizations we introduce need to be bulletproof and as good as we can make them. Nobody trusts an OS that sometimes breaks your binary during an upgrade.

Comment 33 Zbigniew Jędrzejewski-Szmek 2017-07-31 20:18:39 UTC
(In reply to Carlos O'Donell from comment #32)

> Again, in summary, with the next actions as I see it:
> 
> [fedora packaging guidelines]:
> - Document in fedora packaging guidelines that you need to provide your DSO
> symlink.
Ack. It'd be great to (re?)-generate the list of packages which do NOT do this currently.

> In addition document that you need an additional ldconfig if you
> add custom search paths via /etc/ld.so.conf*.
Could we instead add a second filetrigger for /etc/ld.so.conf.d/?
This would cover the majority of cases it seems. (I did an unscientific
study of 'rpm -q --scripts -f /etc/ld.so.conf.d/*' and it seems that all
those packages just do the standard ldconfig calls.)

> [rpm]:
> - Add %_missing_dso_links_terminate_build option to RPM and enforce it at
> the rpm build level, and turn this on immediately in rawhide and fix the
> failures.
> 
> [glibc]:
> - The glibc package switches to using %trans and calling ldconfig at the end
> of a transaction for changes in the /lib* directories.
> 
> [packing work]:
> - All packages currently using custom ld.so search paths via
> /etc/ld.so.conf* would continue to use this method and continue to call
> ldconfig as required.
> - All other packages that don't have custom ld.so search paths would remove
> their ldconfig calls. Mass file bugs for this issue to get it fixed.
This is similar to the python-/python2- subpackages transition that is
now being discussed on fedora-devel. I think it'd be better to not do
mass bug filing, and just script & commit the fix for all the packages
where it can be done easily automatically, and fix up the remaining packages
by hand. Filing bugs for approx. 1/3 – 1/2 of the distribution would be just
too painful. But this part is the last step, so it can be discussed and
decided later.

So, is there any plan of action? Could we get the glibc file triggers in
first, or do you think FPC should ack the new guidelines first?

Comment 34 Jason Tibbitts 2017-07-31 20:27:29 UTC
I had suggested the filetrigger specifically for /etc/ld.so.conf.d which runs at the end of the package install instead of at the end of the transaction.  It's in this ticket somewhere.  I'm still not sure why it wouldn't work.

Comment 35 Tomasz Kłoczko 2017-07-31 20:33:40 UTC
(In reply to Carlos O'Donell from comment #32)
> Again, in summary, with the next actions as I see it:
> 
> [fedora packaging guidelines]:
> - Document in fedora packaging guidelines that you need to provide your DSO
> symlink. In addition document that you need an additional ldconfig if you
> add custom search paths via /etc/ld.so.conf*.
> 
> [rpm]:
> - Add %_missing_dso_links_terminate_build option to RPM and enforce it at
> the rpm build level, and turn this on immediately in rawhide and fix the
> failures.

If a package is missing such symlinks, that is something which should be fixed in that package's spec file.
IMO none of the global build environment macros should be fixing particular classes of package bugs.
The set of such bugs is effectively unbounded, so this approach would force us to keep adding an unbounded number of such fixes, even for a finite number of spec files.

> [glibc]:
> - The glibc package switches to using %trans and calling ldconfig at the end
> of a transaction for changes in the /lib* directories.
> 
> [packing work]:
> - All packages currently using custom ld.so search paths via
> /etc/ld.so.conf* would continue to use this method and continue to call
> ldconfig as required.
> - All other packages that don't have custom ld.so search paths would remove
> their ldconfig calls. Mass file bugs for this issue to get it fixed.

Adding the triggers to glibc NOW will not harm anything.
It will allow the ldconfig calls in %post/%postun to be removed one by one from each library package for which the global trigger is sufficient.

IMO, in the end, the limited number of remaining packages could be handled individually without touching any global macros or glibc.

If someone introduces %transfiletriggerin/%transfiletriggerpostun as a first step, then finds all packages that execute /sbin/ldconfig in %post/%postun and does a mass rebuild, IMO it will be a huge mistake and people will start cursing those changes.

Comment 36 Jan Kurik 2017-08-15 07:21:29 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 27 development cycle.
Changing version to '27'.

Comment 37 Tomasz Kłoczko 2017-08-17 08:28:09 UTC
So can we get an answer to the question of whether the ldconfig file triggers can be introduced?
If not, why? What is blocking such a change from happening ASAP?

After the last mass rebuild, about 2k packages were upgraded on my system, and I estimate that probably more than 30% of the total upgrade time was spent in thousands of ldconfig executions, which could be reduced to a single one at the end of the upgrade.

Another thing.
I just had a look at the patch attached here, where the proposed trigger definition is:

%transfiletriggerin -P 2000000 -- /lib /usr/lib /lib64 /usr/lib64 /etc/ld.so.conf.d
/usr/sbin/ldconfig

%transfiletriggerpostun -P 2000000 -- /lib /usr/lib /lib64 /usr/lib64 /etc/ld.so.conf.d
/usr/sbin/ldconfig


There are two issues with the above:
1) ldconfig is in /sbin
2) this trigger definition automatically creates an additional dependency on /bin/sh, because ldconfig will be executed from a script:

---
#!/bin/sh
/usr/sbin/ldconfig
---

New proposed trigger definitions fixing the above issues:

%transfiletriggerin -p /sbin/ldconfig -P 2000000 -- /lib /usr/lib /lib64 /usr/lib64 /etc/ld.so.conf.d

%transfiletriggerpostun -p /sbin/ldconfig -P 2000000 -- /lib /usr/lib /lib64 /usr/lib64 /etc/ld.so.conf.d

This will generate an empty script with /sbin/ldconfig as the interpreter:

---
#!/sbin/ldconfig
---

This is exactly how ldconfig is executed in more than 98% of all currently used %post/%postun scriptlets, and there is no reason why it should be executed via an additional /bin/sh.

Comment 38 Neal Gompa 2017-10-01 21:30:14 UTC
(In reply to Carlos O'Donell from comment #32)
> Again, in summary, with the next actions as I see it:
> 
> [fedora packaging guidelines]:
> - Document in fedora packaging guidelines that you need to provide your DSO
> symlink. In addition document that you need an additional ldconfig if you
> add custom search paths via /etc/ld.so.conf*.
> 

Umm, why?

> [rpm]:
> - Add %_missing_dso_links_terminate_build option to RPM and enforce it at
> the rpm build level, and turn this on immediately in rawhide and fix the
> failures.
> 

Again, why?

> [glibc]:
> - The glibc package switches to using %trans and calling ldconfig at the end
> of a transaction for changes in the /lib* directories.
> 

This is fine with me here.

> [packing work]:
> - All packages currently using custom ld.so search paths via
> /etc/ld.so.conf* would continue to use this method and continue to call
> ldconfig as required.

Uhh, no? Why can't we just have a file trigger to handle this correctly?

> - All other packages that don't have custom ld.so search paths would remove
> their ldconfig calls. Mass file bugs for this issue to get it fixed.
> 

This I agree with.

> We need a robust way to prevent difficult to debug scenarios from creeping
> into our package management systems and any optimizations we introduce need
> to be bulletproof and as good as we can make them. Nobody trusts an OS that
> sometimes breaks your binary during an upgrade.

I have a terrible feeling you're overthinking this...

I have observed in several Linux distributions (including one that I am involved with, Mageia) that only the following file triggers are needed to deal with all cases except for the PathPostfixes thing (we don't use that in Mageia):


> %filetriggerin -p /sbin/ldconfig -P 2000000 -- /etc/ld.so.conf.d

> %filetriggerun -p /sbin/ldconfig -P 2000000 -- /etc/ld.so.conf.d

> %transfiletriggerin -p /sbin/ldconfig -P 2000000 -- /lib /usr/lib /lib64 /usr/lib64

> %transfiletriggerpostun -p /sbin/ldconfig -P 2000000 -- /lib /usr/lib /lib64 /usr/lib64


Can anyone explain to me why this doesn't work?

I'm not against the idea of shipping portions of the ldconfig cache in packages (it would make things way nicer for stateless Linux deployments and things like that). But that's a hell of a lot more work than just moving to file triggers. And as a distribution, we're really bad about making mass changes like this.

The file triggers approach is way easier to implement because we can just tell everyone that they can delete the ldconfig runs for Fedora and EPEL8 (whenever EPEL8 becomes a thing).

Comment 39 Neal Gompa 2017-10-01 21:32:51 UTC
> except for the PathPostfixes thing (we don't use that in Mageia):

And as it occurs to me, my proposed file triggers should work for PathPostfix libraries, too.

Comment 40 Neal Gompa 2017-12-27 01:58:51 UTC
I've been reminded that this is still a problem, and I'm seriously annoyed at the lack of response from you guys in months.

Carlos, can you *please* respond?

Comment 41 Florian Weimer 2018-01-12 14:01:55 UTC
I started a discussion on the devel list: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/DVBZI2UJRHI5GVMJXIQNLHK4TUFS5ZM2/

Comment 42 Tomasz Kłoczko 2018-02-24 10:04:31 UTC
File triggers have been implemented.

