Bug 1382224

Summary: RFE: dnf transactions should run in a transient systemd service
Product: [Fedora] Fedora
Reporter: Andy Lutomirski <luto>
Component: dnf
Assignee: rpm-software-management
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: medium
Version: rawhide
CC: aliakc, cleaver-redhat, dmach, fweimer, grinnz, ilya.gradina, jmracek, kevin, mikhail.v.gavrilov, packaging-team-maint, peljasz, pmatilai, rpm-software-management, samuel-rhbugs, sgraf, Simon.Gerhards, viorel.tabara, vmukhame, vondruch, zbyszek
Target Milestone: ---
Keywords: Triaged, UserExperience
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-03-02 14:39:33 UTC
Type: Bug

Description Andy Lutomirski 2016-10-06 03:27:57 UTC
When possible (which should be pretty much always), I think that dnf should run its RPM transactions in a transient systemd service a la systemd-run.

Pros:
 - The caller's environment won't matter any more.  (Minor.)
 - Loss of the terminal in which dnf is running will no longer hose your system.
 - Outright nuking of the dnf process by systemd if KillUserProcesses=yes will no longer happen.

Cons: No significant downsides I can think of.
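
As a rough illustration of the request above, the sketch below builds a systemd-run command line that would start a dnf command as a transient service instead of as a child of the calling terminal. The helper name build_transient_command and the unit name are hypothetical and are not part of dnf. --unit, --description and --wait are existing systemd-run options, but whether this combination is sufficient for dnf is an assumption.

```python
import shlex


def build_transient_command(dnf_args, unit_name="dnf-transaction"):
    """Return an argv list that runs dnf as a transient systemd service.

    Hypothetical helper for illustration only; dnf does not ship this.
    """
    return [
        "systemd-run",
        f"--unit={unit_name}",            # transient unit name (assumed)
        "--description=dnf transaction",  # free-form description (assumed)
        "--wait",                         # block until the service exits
        "dnf",
        *dnf_args,
    ]


if __name__ == "__main__":
    # Only prints the command; pass the list to subprocess.run() to execute it.
    print(shlex.join(build_transient_command(["-y", "upgrade"])))
```

The command is only constructed here, not executed, so the sketch can be inspected without root privileges or a running systemd.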

Comment 1 Adam Williamson 2016-10-06 04:21:43 UTC
Rather than mandating a specific solution, perhaps we should just ask that dnf should be designed - somehow - to avoid catastrophic failure in the case of the controlling terminal going away? That's the real request here, right?

FWIW, I've been told today (by someone who really likes the F-word) that this is how apt-get works.

Comment 2 Panu Matilainen 2016-10-06 05:49:36 UTC
(In reply to Adam Williamson from comment #1)
> Rather than mandating a specific solution, perhaps we should just ask that
> dnf should be designed - somehow - to avoid catastrophic failure in the case
> of the controlling terminal going away? That's the real request here, right?

Bingo.

> FWIW, I've been told today (by someone who really likes the F-word) that
> this is how apt-get works.

This == systemd service, as proposed here? Or just "not effing crash when the effing terminal effing crashes"? :)

Comment 3 Japheth Cleaver 2016-10-06 07:37:49 UTC
How did we get to a place where systemd is required in order to update RPMs reliably? That's not a healthy solution.

Comment 4 Andy Lutomirski 2016-10-06 16:19:41 UTC
For the cases that currently matter, surviving the loss of the controlling terminal should be sufficient.  For future cases (KillUserProcesses), I think that systemd may SIGKILL the dnf process itself, in which case dnf should IMO be able to survive being SIGKILLed without causing problems.  The only decent solution I can think of would be to ask systemd to run the meat outside as something like a service so that it's outside the scope/session/whatever that is at risk of being SIGKILLed.

Comment 5 Zbigniew Jędrzejewski-Szmek 2016-10-07 00:36:17 UTC
"systemd-run -t dnf ..." apparently is not enough, dnf gets killed as soon as the terminal it is running from is closed. But it certainly could be made to work.

Comment 6 Andy Lutomirski 2016-10-07 00:40:53 UTC
(In reply to Zbigniew Jędrzejewski-Szmek from comment #5)
> "systemd-run -t dnf ..." apparently is not enough, dnf gets killed as soon
> as the terminal it is running from is closed. But it certainly could be made
> to work.

I assume that's because dnf can't survive the loss of its controlling terminal.  IMO it should die when the ctty dies *except when doing critical things*.
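
One way to picture "die when the ctty dies, except when doing critical things" is a guard that ignores SIGHUP only while a critical section runs. This is a minimal sketch, not dnf code: the name critical_section is hypothetical, and it does not protect against SIGKILL or against failed writes to a terminal that has gone away.

```python
import contextlib
import signal


@contextlib.contextmanager
def critical_section():
    """Ignore SIGHUP while the body runs, then restore the old handler.

    Must be used from the main thread, because signal.signal() only
    works there.
    """
    previous = signal.signal(signal.SIGHUP, signal.SIG_IGN)
    try:
        yield
    finally:
        # previous is None if the old handler was not installed from Python;
        # in that case there is nothing we can restore.
        if previous is not None:
            signal.signal(signal.SIGHUP, previous)


if __name__ == "__main__":
    with critical_section():
        pass  # the RPM transaction would run here
```

Outside the guarded block the process keeps its previous SIGHUP behaviour, so it can still exit normally when the terminal disappears.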

Comment 7 Andy Lutomirski 2016-10-07 19:01:10 UTC
Here's my attempt at figuring out what's going on.

TransactionDisplay.callback calls CliTransactionDisplay.progress, which calls CliTransactionDisplay._out_progress, which calls sys.stdout.write(), which can throw IOError.

TransactionDisplay.callback seems to be passed directly to rpm.TransactionSet.run.

If I were the one fixing this, I would seriously consider invoking rpm.TransactionSet.run() from an entirely separate process and piping the callbacks back to the main cli process so that even if the cli process crashed outright we'd be okay.  (And I'd then add the ability for that subprocess to actually be a transient systemd service if systemd were running.)
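
The separate-process idea can be sketched roughly as follows. Everything here is an assumption for illustration: run_transaction is a hypothetical stand-in for the code that would call rpm.TransactionSet.run(), and drive stands in for the cli process. The point is only that the worker keeps going if it can no longer report progress. A real design would also have to detach the worker from the terminal session, which is not shown.

```python
import multiprocessing


def run_transaction(conn, packages):
    """Worker: hypothetical stand-in for rpm.TransactionSet.run()."""
    reporting = True
    for name in packages:
        # ... the real package operation would happen here ...
        if reporting:
            try:
                conn.send(("installed", name))
            except OSError:
                # The front end is gone; keep working, stop reporting.
                reporting = False
    if reporting:
        try:
            conn.send(("done", None))
        except OSError:
            pass
    conn.close()


def drive(packages):
    """Front end: start the worker and collect its progress events."""
    ctx = multiprocessing.get_context("fork")  # Unix-only start method
    # With duplex=False the first end can only receive, the second only send.
    parent_conn, child_conn = ctx.Pipe(duplex=False)
    worker = ctx.Process(target=run_transaction, args=(child_conn, packages))
    worker.start()
    child_conn.close()  # the parent keeps only the receiving end
    events = []
    while True:
        try:
            event = parent_conn.recv()
        except EOFError:
            break  # worker closed its end without sending "done"
        events.append(event)
        if event[0] == "done":
            break
    worker.join()
    return events


if __name__ == "__main__":
    for event in drive(["foo", "bar"]):
        print(event)
```

Because progress reporting is the only link between the two processes, a crash in the front end costs the user the progress display but not the transaction.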

An alternative would be to modify Base._run_transaction to swallow exceptions from the callback and spit them out after the transaction is done.
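
The exception-swallowing alternative could look something like the wrapper below. This is a sketch under assumptions, not the change that was made to Base._run_transaction: DeferredErrors and fake_transaction are hypothetical names, and a real rpm callback may need to return values for some events, which this sketch ignores.

```python
class DeferredErrors:
    """Wrap a callback so its exceptions are recorded instead of raised."""

    def __init__(self, callback):
        self._callback = callback
        self.errors = []

    def __call__(self, *args, **kwargs):
        try:
            return self._callback(*args, **kwargs)
        except Exception as exc:
            # Remember the failure; report it once the transaction is over.
            self.errors.append(exc)
            return None


def fake_transaction(callback, steps):
    """Hypothetical stand-in for rpm.TransactionSet.run()."""
    completed = 0
    for step in steps:
        callback(step)  # progress display; may fail if the terminal is gone
        completed += 1
    return completed


def broken_display(step):
    """Simulates sys.stdout.write() failing after the terminal is lost."""
    raise IOError("terminal went away")


if __name__ == "__main__":
    safe = DeferredErrors(broken_display)
    done = fake_transaction(safe, ["install foo", "install bar"])
    print(f"{done} steps completed, {len(safe.errors)} display errors deferred")
```

The transaction loop never sees the display failure, and the collected errors remain available for reporting afterwards.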

Comment 8 Ali Akcaagac 2016-10-13 07:36:19 UTC
I have been following this discussion on the ML for a few days now and came across this bug report via a comment made there, so allow me to comment here.

About two days ago I installed Fedora 25 Beta on one of my notebooks (from Netinstall: Xfce). The installation went fine and the system works properly.

I archived the installation as a tar.bz2 backup, which I usually use within a chroot environment for further dnf updates, removals, installations and *normal* system administration.

The same as I always do, since using Fedora.

The backup is usually untarred into a directory called /.cdrom (a name chosen for historical reasons).

I then chrooted into this directory from my running desktop and wanted to remove some *rudimentary* (non-critical) packages from this chroot backup.

During the dnf delete process (inside the chroot), the entire chroot *AND* the running Xfce desktop (host) got shut down, and I was dropped back to the Linux console.

This never happened with any Fedora version before.

One of my main tasks is dealing with Fedora installations, and I usually make changes in a chroot environment so that the *new* backup doesn't affect the running system.

It would be nice to have this issue sorted out before Fedora 25 Final is released.

Comment 9 Michal Luscon 2016-10-17 11:33:27 UTC
Hi,
we will probably implement better exception handling in the Base class as a temporary solution and will focus on systemd in the long term.

Comment 10 Honza Silhan 2016-10-17 14:49:15 UTC
*** Bug 1383490 has been marked as a duplicate of this bug. ***

Comment 11 Jaroslav Mracek 2016-10-31 12:16:55 UTC
*** Bug 1389867 has been marked as a duplicate of this bug. ***

Comment 12 Jaroslav Mracek 2016-10-31 12:20:39 UTC
There is a pull-request that should improve behavior: https://github.com/rpm-software-management/dnf/pull/638

Comment 13 Jaroslav Mracek 2016-11-30 09:11:25 UTC
*** Bug 1256943 has been marked as a duplicate of this bug. ***

Comment 14 Honza Silhan 2017-01-30 12:58:00 UTC
This bug is planned to be fixed in the next 1-2 months.

Comment 15 Vít Ondruch 2017-02-03 11:42:29 UTC
Is something like https://github.com/timlau/dnf-daemon/ considered?

Comment 16 Honza Silhan 2017-02-06 11:35:05 UTC
That might also be an option.

We have reconsidered the priority of this bug, as it would require bigger DNF design changes for little benefit.

Comment 17 Daniel Mach 2019-03-02 14:39:33 UTC
The DNF team doesn't plan to run transactions as systemd services.

We'll probably start looking into creating a daemon to replace PackageKit,
which has reached its end of life:
https://blogs.gnome.org/hughsie/2019/02/14/packagekit-is-dead-long-live-well-something-else/
(Just to make it clear: it's going to be a service in addition to the existing DNF, which is not going away.)