Bug 1516045 - dnf clean_requirements_on_remove breaks everything and makes the system unbootable
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: libsolv
Version: 27
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: rpm-software-management
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On: 1284349 1338921
Blocks:
Reported: 2017-11-21 22:12 UTC by Stephen Herr
Modified: 2017-11-22 13:52 UTC (History)
CC List: 9 users

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-11-22 13:45:26 UTC
Type: Bug
Embargoed:


Attachments
debugdata file as requested (1.15 MB, application/x-gzip)
2017-11-21 22:26 UTC, Stephen Herr

Description Stephen Herr 2017-11-21 22:12:24 UTC
You guys have an enormous problem. Well, two: one software and one process. dnf currently has a default option, clean_requirements_on_remove=True, which *will* lead to unbootable Fedora systems. Here is the sequence of events that leads to this problem:

1) rpm-build (and friends) have automatic behaviour that cannot be turned off that will look at C-level libraries that your package depends on or provides. It automatically adds them to the requires / provides list of the built RPM, so yum / dnf can resolve dependencies correctly and all is well.

2) 3rd party rpm developers (and some fedora developers) often are targeting multiple versions of multiple operating systems with their software. So they often do the only practical thing, and bundle the dependencies that they need inside their rpms. Those libraries get picked up by rpm-build and added to the Provides list, because they are in the RPM. But of course these libraries are not being added to the standard system locations; they go into some path specific to this rpm, where only this rpm's software can find them.

3) At some point, someone uninstalls something that depends on a core library. Or something that depends on something that depends on something that depends on a core library. And dnf, with this bad configuration, checks to see if it can remove the dependencies too. And it decides that it can, since something else still provides that library (though not in a discoverable location, which dnf doesn't know).

4) So the core system library gets uninstalled, and everything breaks.
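
The four steps above can be sketched as a toy model. This is only an illustration of the failure mode being described, under loud assumptions: the package names (app, core-libs, system-tool, vendor-app), the capability libcore.so.1 and the function naive_unused are all hypothetical, and the logic is a deliberately naive cleaner, not libsolv's actual algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    # Hypothetical, simplified package record (not an rpm or libsolv structure).
    name: str
    provides: set = field(default_factory=set)
    requires: set = field(default_factory=set)
    user_installed: bool = False


def naive_unused(installed, removing):
    """Return names of dependency packages a naive cleaner would also remove.

    A dependency counts as unused when every capability it offers that is
    still required by the remaining packages is also provided by some other
    remaining package. The check never asks whether that other provider
    installs the library somewhere the rest of the system can find it.
    """
    remaining = [p for p in installed if p.name not in removing]
    unused = set()
    for candidate in remaining:
        if candidate.user_installed:
            continue  # only automatically installed dependencies are cleaned
        others = [p for p in remaining if p is not candidate]
        needed = set().union(*(p.requires for p in others))
        still_provided = set().union(*(p.provides for p in others))
        if (candidate.provides & needed) <= still_provided:
            unused.add(candidate.name)
    return unused


installed = [
    Package("app", requires={"libcore.so.1"}, user_installed=True),
    Package("core-libs", provides={"libcore.so.1"}),
    Package("system-tool", requires={"libcore.so.1"}, user_installed=True),
    # Third-party rpm that bundles a private copy and so also "provides" it.
    Package("vendor-app", provides={"libcore.so.1"}, user_installed=True),
]

with_bundle = naive_unused(installed, {"app"})
without_bundle = naive_unused(
    [p for p in installed if p.name != "vendor-app"], {"app"}
)
print(with_bundle, without_bundle)
```

In this model core-libs is flagged for removal only when vendor-app is installed, even though system-tool still needs the library from its standard location.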

For example:
----
$ sudo dnf remove keepass
Dependencies resolved.
================================================================================
 Package             Arch      Version                   Repository        Size
================================================================================
Removing:
 keepass             x86_64    2.35-3.fc26               @@commandline    2.9 M
Removing dependent packages:
 mono-core           x86_64    4.8.0-7.fc26              @@commandline     53 M
 mono-data           x86_64    4.8.0-7.fc26              @@commandline     16 M
 mono-data-sqlite    x86_64    4.8.0-7.fc26              @@commandline    207 k
 mono-extras         x86_64    4.8.0-7.fc26              @@commandline    1.5 M
 mono-mvc            x86_64    4.8.0-7.fc26              @@commandline    1.6 M
 mono-wcf            x86_64    4.8.0-7.fc26              @@commandline    3.1 M
 mono-web            x86_64    4.8.0-7.fc26              @@commandline    8.4 M
 mono-winforms       x86_64    4.8.0-7.fc26              @@commandline    5.1 M
 xdotool             x86_64    1:3.20150503.1-3.fc26     @@commandline     90 k
Removing unused dependencies:
 libgdiplus          x86_64    4.2-3.fc26                @@commandline    430 k
 libxdo              x86_64    1:3.20150503.1-3.fc26     @@commandline     74 k
 sqlite              x86_64    3.20.1-1.fc26             @updates         1.1 M
 xsel                x86_64    1.2.0-19.fc26             @updates          43 k

Transaction Summary
================================================================================
Remove  14 Packages

Freed space: 93 M
Is this ok [y/N]: 
----

Notice that it's going to uninstall sqlite. That's because mono-data-sqlite Requires "sqlite" (the rpm name), and happens to be the only thing installed right now which does. However, I'm sure you know that there's a literal ton of things that actually require sqlite, including but not limited to NetworkManager, wayland, and dnf itself. So if I had said "yes" here it would have broken my ability to boot/log in, my ability to connect to the internet even in a non-graphical environment, and my ability to re-install the package from fedora's repos even if I had an internet connection.

Automatically removing no-longer-needed leaf dependencies is a nice idea. But that's it, it's a nice-to-have. It's also completely unsafe and is guaranteed to result in people having unbootable systems. It's furthermore a change over yum behaviour, and one you should not have made without considering the implications.

A rational argument can be made that maybe the 3rd party developers shouldn't bundle system libraries. Okay fine, maybe they shouldn't. But they do, and always have and always will. And you don't blow up users' systems because you don't like something that some other developers are doing. And this is most definitely your problem and not theirs, since you are the ones making the change that triggers the problem. 3rd party developers aren't the ones changing default behaviour or uninstalling sqlite (for example).

So that's one problem; the technical one. And you need to fix it. That option should not exist at all, much less be default.

The second problem, the process problem, is that this exact issue has been reported before. See bug 1284349 and bug 1338921 for example. And the issue was explained to you. And you reverted the automatically-remove-leaf-dependencies behaviour because it was wrong and blowing up users' systems. And then you re-enabled it at a later date. That should never have happened. You need to fix whatever went wrong with your process that caused you to re-push an already-fixed urgent-level bug.

The correct fix is to push a new version of dnf that disables this functionality, regardless of what the conf file says. And then never re-enable it again in the future. Because Fedora working and not bricking users' systems is vastly more important than whatever petty reasons you have for putting this functionality back in. And yes, I do expect more from the dnf maintainers. You should know that dnf has the ability to break everything and treat these decisions with the appropriate amount of gravitas and thoughtfulness.
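
Until any such change ships, the option named above can be switched off locally in the conf file. The following is a minimal sketch of doing that programmatically, assuming an INI-style dnf.conf with a [main] section. The SAMPLE contents and the helper names disable_autoclean and render are made up for illustration; only the option name clean_requirements_on_remove comes from this report.

```python
import configparser
import io

# Assumed file contents for illustration; a real /etc/dnf/dnf.conf will differ.
SAMPLE = """\
[main]
gpgcheck=1
clean_requirements_on_remove=True
"""


def disable_autoclean(conf_text):
    """Parse dnf.conf-style text and set clean_requirements_on_remove=False."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    if not parser.has_section("main"):
        parser.add_section("main")
    parser.set("main", "clean_requirements_on_remove", "False")
    return parser


def render(parser):
    """Serialize the parser back to key=value text without extra spaces."""
    buf = io.StringIO()
    parser.write(buf, space_around_delimiters=False)
    return buf.getvalue()


updated = disable_autoclean(SAMPLE)
rendered = render(updated)
print(rendered)
```

The sketch only transforms text in memory; writing the result back to the real configuration file is left out on purpose.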

Comment 1 Igor Gnatenko 2017-11-21 22:16:12 UTC
Please run dnf with --debugsolver and attach the debugdata directory. Since some release, libsolv should no longer remove such things automatically.

Comment 2 Stephen Herr 2017-11-21 22:20:18 UTC
$ sudo dnf --debugsolver remove keepass
Dependencies resolved.
====================================================================================================
 Package                  Arch           Version                        Repository             Size
====================================================================================================
Removing:
 keepass                  x86_64         2.35-3.fc26                    @@commandline         2.9 M
Removing dependent packages:
 mono-core                x86_64         4.8.0-7.fc26                   @@commandline          53 M
 mono-data                x86_64         4.8.0-7.fc26                   @@commandline          16 M
 mono-data-sqlite         x86_64         4.8.0-7.fc26                   @@commandline         207 k
 mono-extras              x86_64         4.8.0-7.fc26                   @@commandline         1.5 M
 mono-mvc                 x86_64         4.8.0-7.fc26                   @@commandline         1.6 M
 mono-wcf                 x86_64         4.8.0-7.fc26                   @@commandline         3.1 M
 mono-web                 x86_64         4.8.0-7.fc26                   @@commandline         8.4 M
 mono-winforms            x86_64         4.8.0-7.fc26                   @@commandline         5.1 M
 xdotool                  x86_64         1:3.20150503.1-3.fc26          @@commandline          90 k
Removing unused dependencies:
 libgdiplus               x86_64         4.2-3.fc26                     @@commandline         430 k
 libxdo                   x86_64         1:3.20150503.1-3.fc26          @@commandline          74 k
 sqlite                   x86_64         3.20.1-1.fc26                  @updates              1.1 M
 xsel                     x86_64         1.2.0-19.fc26                  @updates               43 k

Transaction Summary
====================================================================================================
Remove  14 Packages

Freed space: 93 M
Is this ok [y/N]:

Comment 3 Stephen Herr 2017-11-21 22:22:16 UTC
Where is the debugdata directory?

Comment 4 Igor Gnatenko 2017-11-21 22:22:46 UTC
(In reply to Stephen Herr from comment #3)
> Where is the debugdata directory?

In cwd.

Comment 5 Stephen Herr 2017-11-21 22:26:08 UTC
Created attachment 1357039
debugdata file as requested

Comment 6 Stephen Herr 2017-11-21 22:35:14 UTC
Please justify why bricking a user's system with no warning is a "medium severity / urgency" issue. I know that *I* can recover from this problem, by debugging what is missing, discovering the http fedora repo locations, using another computer to download the appropriate missing package to a usb, and then copying it over to the affected system and using 'rpm' to install it. But I am not an average user. To the average user this would be a completely bricked system.

Comment 7 Igor Gnatenko 2017-11-21 22:38:30 UTC
(In reply to Stephen Herr from comment #6)
> Please justify why bricking a user's system with no warning is a "medium
> severity / urgency" issue. I know that *I* can recover from this problem, by
> debugging what is missing, discovering the http fedora repo locations, using
> another computer to download the appropriate missing package to a usb, and
> then copying it over to the affected system and using 'rpm' to install it.
> But I am not an average user. To the average user this would be a completely
> bricked system.

Anyone using dnf is an experienced user. Everyone else uses gnome-software, which doesn't have that bug.

Also, how does the priority in bz matter? Maintainers work in their free time...

Comment 8 Stephen Herr 2017-11-21 22:46:33 UTC
And "libsolv" is used only by dnf? (honest question, I don't know)

And it's not true that only power users use the command line. All forums / how-to articles always give instructions in terms of command line commands because it's easier. Do we not want people who read help articles to be able to use Fedora?

Even if it is only power users, and even if it is only your spare time, it doesn't matter. The correct severity / urgency of this issue is Urgent, in any sane way that you could define them. So it should be marked as such. Just because it's not your day job to maintain this package does not mean it's impossible for there to exist urgent-priority issues. If absolutely nothing else the urgency should help you prioritize which issue to tackle next.

Comment 9 Stephen Herr 2017-11-21 22:55:02 UTC
Sorry for changing the title, I realize that must seem petty. But the title is not for the developer, who already knows what the actual problem is. Titles are for searchers who are looking to see if a bug has already been reported.

Comment 10 Igor Gnatenko 2017-11-21 22:57:56 UTC
(In reply to Stephen Herr from comment #8)
> And "libsolv" is used only by dnf? (honest question, I don't know)
AFAIK, yes. There are a bunch of tools which use libsolv, though.

Comment 11 Igor Gnatenko 2017-11-21 23:03:55 UTC
So, looking more into the debugdata: the sqlite package contains only the binary, while all the sqlite libraries are in the sqlite-libs package, which libsolv didn't remove. So NOTABUG :)

Comment 12 Stephen Herr 2017-11-22 13:37:04 UTC
(In reply to Igor Gnatenko from comment #11)
> So, looking more into the debugdata: the sqlite package contains only the
> binary, while all the sqlite libraries are in the sqlite-libs package, which
> libsolv didn't remove. So NOTABUG :)

Well that's good that I was not seconds from a bricked system, but what makes you think it matters?

Can the root problem as described in comment 0 still occur? I don't think I have any 3rd party rpms installed on this computer right now, so I guess I didn't trigger the described problem.

But the root problem is that you are removing leaf dependencies, which is not safe and can and will brick people's systems if they have RPMs installed that bundle system libraries. It's only NOTABUG if anything in that sentence is not true.

Comment 13 Igor Gnatenko 2017-11-22 13:45:26 UTC
(In reply to Stephen Herr from comment #12)
> (In reply to Igor Gnatenko from comment #11)
> > So, looking more into the debugdata: the sqlite package contains only the
> > binary, while all the sqlite libraries are in the sqlite-libs package,
> > which libsolv didn't remove. So NOTABUG :)
> 
> Well that's good that I was not seconds from a bricked system, but what
> makes you think it matters?
> 
> Can the root problem as described in comment 0 still occur? I don't
> think I have any 3rd party rpms installed on this computer right now, so I
> guess I didn't trigger the described problem.
> 
> But the root problem is that you are removing leaf dependencies, which is
> not safe and can and will brick people's systems if they have RPMs installed
> that bundle system libraries. It's only NOTABUG if anything in that sentence
> is not true.

mls changed it so that "alternatives" are not removed automatically: if both sqlite-libs and foobarbaz provide libsqlite.so.3, autoremove will not remove either of them. So it's not harmful in any way nowadays. Previously I thought that this codepath didn't work, but after looking closely, it does work as expected.
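
The rule described here can be illustrated with a small toy model. It is a sketch under stated assumptions, not libsolv's implementation: the function autoremove_candidates, the package some-daemon and the dictionary layout are hypothetical, and the capability string libsqlite.so.3 is copied as written in this comment. The model does a single pass and keeps any package that provides a still-needed capability, no matter how many other installed packages provide the same one.

```python
def autoremove_candidates(packages, user_installed):
    """Return names of packages a conservative cleaner would treat as unused.

    packages maps a package name to {"provides": set, "requires": set}.
    A package is kept whenever any installed package still requires one of
    its capabilities, even if an alternative provider is also installed.
    """
    needed = set()
    for info in packages.values():
        needed |= info["requires"]

    candidates = set()
    for name, info in packages.items():
        if name in user_installed:
            continue  # explicitly installed packages are never cleaned up
        if info["provides"] & needed:
            continue  # still needed; alternatives do not make it removable
        candidates.add(name)
    return candidates


# State after keepass and the mono packages are gone (hypothetical data).
packages = {
    "sqlite-libs": {"provides": {"libsqlite.so.3"}, "requires": set()},
    "foobarbaz": {"provides": {"libsqlite.so.3"}, "requires": set()},
    "some-daemon": {"provides": set(), "requires": {"libsqlite.so.3"}},
    "sqlite": {"provides": {"sqlite"}, "requires": {"libsqlite.so.3"}},
}

result = autoremove_candidates(packages, user_installed={"some-daemon"})
print(result)
```

With this data only the sqlite package (the binary, which nothing requires by name any more) is a candidate, while both providers of the library stay installed, matching the outcome reported in comment 11.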

Comment 14 Stephen Herr 2017-11-22 13:52:52 UTC
Okay, thank you for looking. Sorry for being overly alarmist.

