Bug 2134960 - abidiff is running extremely slowly for firefox and thunderbird
Summary: abidiff is running extremely slowly for firefox and thunderbird
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: libabigail
Version: 38
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Dodji Seketeli
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On: 2152553
Blocks:
Reported: 2022-10-14 20:02 UTC by David Cantrell
Modified: 2024-05-21 14:19 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2024-05-21 14:19:28 UTC
Type: Bug
Embargoed:



Description David Cantrell 2022-10-14 20:02:17 UTC
We noticed recently that abidiff runs on recent firefox and thunderbird builds take hours to complete.  I timed a comparison of just libxul.so from two firefox builds, and that alone ran for over an hour.

I do not know enough about the internal workings of abidiff to understand what is going on.  Here's the output of time -v running it on just libxul.so:

Command exited with non-zero status 12
        Command being timed: "abidiff --verbose --d1 ./firefox-debuginfo-91.13.0-1.el8_4.x86_64/usr/lib/debug --d2 ./firefox-debuginfo-102.3.0-3.el8_4.x86_64/usr/lib/debug firefox-91.13.0-1.el8_4.x86_64/usr/lib64/firefox/libxul.so firefox-102.3.0-3.el8_4.x86_64/usr/lib64/firefox/libxul.so"
        User time (seconds): 3364.26
        System time (seconds): 853.81
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 1:10:43
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 95043620
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 28188
        Minor (reclaiming a frame) page faults: 25178091
        Voluntary context switches: 257485
        Involuntary context switches: 16267
        Swaps: 0
        File system inputs: 5465712
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 12

We only noticed this recently.  Prior versions ran much faster.
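
To put the figures above in perspective, here is a minimal Python sketch that converts them into more readable units.  The values are copied from the time -v output; the helper name parse_elapsed is made up for illustration.

```python
def parse_elapsed(text):
    """Convert an 'h:mm:ss' or 'm:ss' string into seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


# Values copied from the time -v output in the description.
user_s = 3364.26
system_s = 853.81
elapsed_s = parse_elapsed("1:10:43")
max_rss_kib = 95043620

cpu_s = user_s + system_s               # total CPU time consumed
cpu_share = cpu_s / elapsed_s           # fraction of one core kept busy
max_rss_gib = max_rss_kib / (1024 ** 2) # KiB -> GiB

print(f"wall clock: {elapsed_s:.0f} s")
print(f"CPU share:  {cpu_share:.1%}")
print(f"peak RSS:   {max_rss_gib:.1f} GiB")
```

By this arithmetic the run peaked at roughly 90.6 GiB of resident memory and kept one core busy for nearly all of its 4243 seconds of wall-clock time.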

Comment 1 Dodji Seketeli 2022-10-20 15:39:54 UTC
Indeed.  It's super slow on webkit2gtk3 as well.

It's super slow on huge C++ binaries, I think.

I'll be looking into this.

Comment 2 Tomas Popela 2022-10-20 15:42:52 UTC
(In reply to Dodji Seketeli from comment #1)
> Indeed.  It's super slow on webkit2gtk3 as well.
> 
> It's super slow on huge C++ binaries, I think.

If you want another vehicle to test on, then it's super slow on LibreOffice as well (as you mentioned a huge C++ binary).

Comment 3 Dodji Seketeli 2022-11-28 09:38:48 UTC
Hello,

I have made some progress on this front.  Where the comparison used to run for hours and hours without even completing, it now takes about an hour and a half on libxul.so: https://people.redhat.com/dseketel/paste/trace-rhbz2134960.log.txt.  You can scroll down to the end of the log to see the time spent.  This is with the work that is in the "rhbz2134960" branch at https://sourceware.org/git/?p=libabigail.git;a=shortlog;h=refs/heads/users/dodji/rhbz2134960.

I don't know if you have a way to test this on a machine that you have access to; but if you do, you might want to use abipkgdiff, which knows how to compare RPMs directly, so that you would not have to go through the hassle of unpacking the RPMs to run abidiff on their content.  You would just need the RPMs of interest, as well as their accompanying debuginfo RPMs.
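
As a rough sketch of the abipkgdiff invocation described above, the following Python snippet only assembles the command line and prints it.  The --d1 and --d2 options name the debuginfo packages of the first and second package.  The RPM file names are assumptions derived from the directory names in the description, and the helper name abipkgdiff_command is hypothetical.

```python
import shlex


def abipkgdiff_command(old_rpm, new_rpm, old_debuginfo, new_debuginfo):
    """Build an abipkgdiff command line comparing two RPMs.

    --d1/--d2 point at the debuginfo RPMs of the first and second package.
    """
    return [
        "abipkgdiff",
        "--d1", old_debuginfo,
        "--d2", new_debuginfo,
        old_rpm,
        new_rpm,
    ]


# Assumed file names, derived from the unpacked directories in the report.
cmd = abipkgdiff_command(
    old_rpm="firefox-91.13.0-1.el8_4.x86_64.rpm",
    new_rpm="firefox-102.3.0-3.el8_4.x86_64.rpm",
    old_debuginfo="firefox-debuginfo-91.13.0-1.el8_4.x86_64.rpm",
    new_debuginfo="firefox-debuginfo-102.3.0-3.el8_4.x86_64.rpm",
)

# Print the command instead of running it; subprocess.run(cmd) would execute it.
print(shlex.join(cmd))
```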

So, here is what I think is happening.

First of all, libxul.so is HUGE.  Its DWARF alone takes more than 25GB in the flat text representation emitted by eu-readelf.  At the moment, it takes at least 80GB of RAM to analyze the two versions of libxul.so in memory, so a machine that has less than 80GB of RAM will swap and the time to complete the task will skyrocket.  Note that libxul.so is so big that the DWZ tool seems to give up trying to compress its debug info.  I have implemented a solution that leverages the work done by DWZ to speed up the analysis.  Alas, as DWZ can't run on libxul.so (due to its sheer size), that solution is not useful on this particular library, but it might be useful on other C++ libraries that have been DWZ'ed.  I am talking about this patch, in the rhbz2134960 branch: https://sourceware.org/git/?p=libabigail.git;a=commit;h=20719e6e7bbc22ba5a5d49bd2c26c17c53674f71.
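
Given that memory requirement, a runner could check its installed RAM before starting the comparison.  This is a minimal sketch, assuming a Linux host where /proc/meminfo reports MemTotal in kB.  The function names, the sample text, and the choice to treat the 80GB figure as 80 GiB are all assumptions made for illustration.

```python
def mem_total_gib(meminfo_text):
    """Extract MemTotal from /proc/meminfo-style text and return it in GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])  # the value is reported in kB (KiB)
            return kib / (1024 ** 2)
    raise ValueError("MemTotal not found")


def enough_memory(meminfo_text, required_gib=80.0):
    """Return True if the host has at least required_gib of RAM installed."""
    return mem_total_gib(meminfo_text) >= required_gib


# Hypothetical sample; on a real host use open("/proc/meminfo").read().
sample = "MemTotal:       65536000 kB\nMemFree:         1234567 kB\n"
print(f"{mem_total_gib(sample):.1f} GiB installed, enough: {enough_memory(sample)}")
```

A check like this only tells a runner whether to expect swapping; it does not reduce what the comparison itself needs.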

Second, there is indeed a libabigail issue that triggers on some C++ classes with virtual member functions, which makes comparing classes have a quadratic behaviour, as far as I can tell.  This was not noticed on smaller C++ libraries, I guess, but on libxul.so, the impact is magnified by the sheer size of the debug info.  That issue is fixed by the patch https://sourceware.org/git/?p=libabigail.git;a=commit;h=e0f50dc04bc28306746542745a8bfc604e88d31d, also in the rhbz2134960 branch.
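
To illustrate why quadratic behaviour hurts at this scale, here is a generic Python sketch.  It is not taken from libabigail and does not describe the actual patch.  It matches two lists of hypothetical virtual member function names, once with a nested scan and once with a hash-based index, and counts the work each approach does.

```python
def match_by_scan(old_names, new_names):
    """Match names with a nested scan; worst case O(n^2) comparisons."""
    comparisons = 0
    matched = []
    for name in old_names:
        for candidate in new_names:
            comparisons += 1
            if candidate == name:
                matched.append(name)
                break
    return matched, comparisons


def match_by_index(old_names, new_names):
    """Match names through a set; one hash lookup per old name."""
    index = set(new_names)
    matched = [name for name in old_names if name in index]
    return matched, len(old_names)


n = 1000
old = [f"vfn{i}" for i in range(n)]  # hypothetical function names
new = list(reversed(old))            # same names, opposite order

scan_matched, scan_count = match_by_scan(old, new)
index_matched, index_count = match_by_index(old, new)
print(scan_count, index_count)
```

Both approaches find the same matches, but the nested scan needs n(n+1)/2 comparisons on this input while the indexed version needs n lookups, and the gap widens as the number of functions grows.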

If this is enough of an improvement to ease things up, I might have to roll out a new release with this fix.  What do you think?

Comment 4 Tomas Popela 2022-11-28 10:16:51 UTC
That's a huge improvement Dodji! Thank you for looking at it. I'm just worried that we won't have the ability to ensure that the runners have 80 GB of free RAM, but I will ask the OSCI people what's the possibility there. It would be great if you would roll out a new release with those fixes so we can try them.

Comment 5 Dodji Seketeli 2022-11-29 11:29:38 UTC
Hello,

Here is an el9 scratch build to test with.  Would you need an el8 one?

https://kojihub.stream.rdu2.redhat.com/koji/taskinfo?taskID=1626424

Comment 11 Dodji Seketeli 2022-12-05 09:23:17 UTC
OK, finally, I released libabigail 2.2 with the memory usage improvements done on this issue and some important fixes.  I'll look into updating the CRB 9.2 package too.  In the meantime, there is an EPEL 9 package available: https://koji.fedoraproject.org/koji/buildinfo?buildID=2096104.


However, I have been thinking about this more fundamentally.  Do we even need to verify the ABI of these big /private/ libraries (libxul.so and the libreoffice libraries)?
I mean, none of these libraries are meant to be used by developers of other applications.  They are meant to be private dependencies of the application they are part of.  Libxul.so is a private dependency of firefox and similarly, the libreoffice libraries are private dependencies of the libreoffice apps.  When the libraries are modified, their application is modified and recompiled accordingly so there is no ABI compatibility challenge involved, as far as I understand.

So I think that the most effective measure moving forward is for the maintainers of these application packages to add an rpminspect.yaml file to the git repository of the package, turning off abidiff altogether
(https://rpminspect.readthedocs.io/en/latest/configuration.html#rpminspect-yaml).

@dcantrell, what do you think?

Comment 12 Michael Catanzaro 2022-12-05 14:00:50 UTC
There are two cases:

 * The library is installed to a public location like /usr/lib64/. abidiff should surely it by default.
 * The library is installed to a private location like /usr/lib64/firefox/libxul.so. Should consider skipping the private libraries by default instead of asking maintainers to modify rpminspect.yaml.

In theory, ABI changes in a private library *could* break things if one package installs plugins for another package to use. But realistically, changes in private libraries will almost always be intentional changes, and the risk here is much lower than for libraries that are installed to public locations, where ABI changes could be devastating. If abidiff runs on private libraries by default, that encourages maintainers to disable abidiff for the entire package, which could stop it from finding problems with public libraries.

Comment 13 Michael Catanzaro 2022-12-05 14:01:32 UTC
(In reply to Michael Catanzaro from comment #12)
> There are two cases:
> 
>  * The library is installed to a public location like /usr/lib64/. abidiff
> should surely it by default.

I meant to write "check it by default."

Comment 14 David Cantrell 2022-12-05 15:40:58 UTC
My thought is that private libraries are not part of the ABI we are trying to guarantee and prevent breakage on for users.  I think an option to abidiff to enable checking private libraries could be useful on occasion, but I think the default behavior of abidiff should be to ignore them.

It's hard to determine what to skip based on filesystem location since anything can be added to the default search path for the dynamic linker, but I guess there is an established expectation of what are "public" and what are "private" directories.

Comment 15 Dodji Seketeli 2022-12-05 16:10:05 UTC
Okay, thank you to both mcatanza and dcantrell.

I agree that by default, we shouldn't abidiff private libraries.

I agree with mcatanza that shared libraries that are in /usr/lib64/ should be verified /by default/.
And yes, there should be a way to opt an arbitrary binary into the ABI verification process.

Now, the question is how?

Here are two proposals:

1/ Either rpminspect determines that a binary ought to be ABI verified (based on heuristics as well as user input), in general, and then it invokes abidiff on it.

2/ Or rpminspect invokes abidiff on all binaries, using a new option named, e.g., --public-shared-libs-only, where the policy that decides whether the shared library is actually verified is implemented inside abidiff.

I would tend to go for 1/ because I think that policies should be left to components that are higher in the stack as that eases the maintenance of the whole stack moving forward.  But I am honestly open to what you guys think.
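
To make proposal 1/ more concrete, here is a minimal Python sketch of such a policy: a library is checked if it sits directly in a public library directory or if the user opted it in explicitly.  The directory list, the function name, and the example paths other than libxul.so are assumptions.  As noted in comment 14, a real policy would also have to cope with directories added to the dynamic linker's search path.

```python
import posixpath

# Assumed set of "public" library directories; not an authoritative list.
PUBLIC_LIB_DIRS = frozenset({"/usr/lib64", "/usr/lib", "/lib64", "/lib"})


def should_check_abi(path, opt_in=frozenset()):
    """Decide whether a shared library should be ABI verified.

    A library qualifies if the user opted it in explicitly, or if it is
    installed directly in one of the public library directories.
    """
    normalized = posixpath.normpath(path)
    if normalized in opt_in:
        return True
    return posixpath.dirname(normalized) in PUBLIC_LIB_DIRS


private_lib = "/usr/lib64/firefox/libxul.so"
public_lib = "/usr/lib64/libexample.so.1"  # hypothetical public library

print(should_check_abi(public_lib))                         # public: checked
print(should_check_abi(private_lib))                        # private: skipped
print(should_check_abi(private_lib, opt_in={private_lib}))  # opted in: checked
```

Keeping this decision in the caller means abidiff itself needs no new option, which is the trade-off argued for above.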

Comment 16 Ben Cotton 2023-02-07 15:10:51 UTC
This bug appears to have been reported against 'rawhide' during the Fedora Linux 38 development cycle.
Changing version to 38.

Comment 17 Aoife Moloney 2024-05-07 15:51:00 UTC
This message is a reminder that Fedora Linux 38 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 38 on 2024-05-21.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '38'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version. Note that the version field may be hidden.
Click the "Show advanced fields" button if you do not see it.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 38 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 18 Aoife Moloney 2024-05-21 14:19:28 UTC
Fedora Linux 38 entered end-of-life (EOL) status on 2024-05-21.

Fedora Linux 38 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.

If you are unable to reopen this bug, please file a new report against an
active release.

Thank you for reporting this bug and we are sorry it could not be fixed.

