Bug 1157233 - 'dnf download' cannot be run in parallel as non-root user
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: dnf
Version: rawhide
Hardware: Unspecified Unspecified
Priority: low, Severity: unspecified
Assigned To: Michal Luscon
QA Contact: Fedora Extras Quality Assurance
Keywords: Reopened
Depends On:
Blocks: TRACKER-bugs-affecting-libguestfs supermin-dnf
Reported: 2014-10-26 08:03 EDT by Richard W.M. Jones
Modified: 2015-05-08 03:33 EDT
Fixed In Version: dnf-plugins-core-0.1.5-2.fc21
Doc Type: Bug Fix
Last Closed: 2015-04-06 14:49:06 EDT
Type: Bug


Attachments
0001-dnf-Ensure-two-processes-cannot-overwrite-each-other.patch (2.08 KB, patch), 2014-11-21 07:04 EST, Richard W.M. Jones
0001-dnf-Ensure-two-processes-cannot-overwrite-each-other.patch (2.08 KB, patch), 2014-11-21 13:55 EST, Richard W.M. Jones

Description Richard W.M. Jones 2014-10-26 08:03:03 EDT
Description of problem:

The 'dnf download' command cannot be run in parallel on a single
machine as a non-root user.

There are a variety of errors:

* [Errno unknown] ftruncate() failed: No such file or directory: 'Input/Output error'
* [Errno 2] No such file or directory: u'/var/tmp/dnf-rjones-YkrAvP/x86_64/21/updates-testing/packages/bash-4.3.30-2.fc21.x86_64.rpm'
(and others)

Version-Release number of selected component (if applicable):

dnf-plugins-core-0.1.3-1.fc21.noarch

How reproducible:

100%

Steps to Reproduce:

In one window, run:

mkdir -p /tmp/t1 ; while dnf download --destdir /tmp/t1 bash.x86_64 glibc.x86_64 >& /tmp/log1 ; do echo -n .; done

In a second window, run:

mkdir -p /tmp/t2 ; while dnf download --destdir /tmp/t2 bash.x86_64 glibc.x86_64 >& /tmp/log2 ; do echo -n .; done

Actual results:

Random errors as above.

Expected results:

Should just work, locking as necessary.  (If one dnf instance is
running, please make the other one wait, NOT fail).

Additional info:

yumdownloader worked correctly in this case.
Comment 1 Honza Silhan 2014-10-27 08:54:23 EDT
Hi, thanks for the report, we'll take a look.
Comment 2 Richard W.M. Jones 2014-11-19 08:08:28 EST
Why not just fix this now?  It's an obvious bug that prevents
'dnf download' from being used safely; as well as blocking us from
adopting dnf at all.
Comment 3 Jan Zeleny 2014-11-19 08:28:43 EST
(In reply to Richard W.M. Jones from comment #2)
> Why not just fix this now?  It's an obvious bug that prevents
> 'dnf download' from being used safely; as well as blocking us from
> adopting dnf at all.

May I ask who "us" is, and why this should block people from adopting dnf? Nobody else has expressed an opinion in this Bugzilla and, if I understand correctly, the only thing actually blocked is concurrent execution of dnf download; sequential execution still works.

While I understand people reporting bugs consider each one very important, we have to carefully evaluate every bug and decide which bugs get fixed now and which later. Obviously, those bugs that affect major use cases have preference.
Comment 4 Richard W.M. Jones 2014-11-19 08:36:02 EST
"Us" is supermin/libguestfs.  See the dependent bug 1156498
which was filed because apparently everything should use dnf
in Fedora 22.

'dnf download' is broken if it randomly corrupts things when
you run it in parallel.  Imagine the case where someone writes
a script to download packages, and then discovers that
the script breaks if run in parallel.  I really can't believe
you can argue this is not broken.

Also we cannot solve this in supermin because although we
could make instances of supermin run serially (which would
be horrible, but could be done) it would still break if
an unrelated script or the user ran 'dnf download' at the same time.
Comment 5 Jan Zeleny 2014-11-19 09:01:30 EST
(In reply to Richard W.M. Jones from comment #4)
> "Us" is supermin/libguestfs.  See the dependent bug 1156498
> which was filed because apparently everything should use dnf
> in Fedora 22.

Fair enough. On the other hand, if you just download the package and don't use yum/dnf to do anything else, my guess is that it might be possible to postpone the switch for one more Fedora release. yum is not going away instantly; it will just be phased out.

> 'dnf download' is broken if it randomly corrupts things when
> you run it in parallel.  Imagine the case where someone writes
> a script to download packages, and then discovers that
> the script breaks if run in parallel.  I really can't believe
> you can argue this is not broken.

I don't. I just questioned the importance of the use case. If you had provided the information below right at the beginning, I would not have done that.

> Also we cannot solve this in supermin because although we
> could make instances of supermin run serially (which would
> be horrible, but could be done) it would still break if
> an unrelated script or the user ran 'dnf download' at the same time.

Ok, let's get this back for evaluation. With this new information I believe it can be addressed sooner than "later or never".
Comment 6 Radek Holy 2014-11-19 11:27:38 EST
Can you please elaborate on the use case a bit? For what do you use "dnf download" exactly? What can be the expected input and what should be the expected output in your case? Thank you in advance.
Comment 7 Richard W.M. Jones 2014-11-19 12:16:01 EST
https://github.com/libguestfs/supermin/blob/master/src/rpm.ml#L308

See line 308 at the link above for the currently working yumdownloader version.

It is passed a list of RPM NEVRs and runs this command:

dnf download --destdir <some-tmpdir> <list-of-RPM-NEVR>

It's necessary because we need to unpack some RPMs to get original
files like /etc files that might be modified by the user.
Comment 8 Honza Silhan 2014-11-21 05:40:49 EST
Richard, I still don't get why you run it in parallel twice for the same list. It only happens when the same packages are downloaded by two processes at once, so the processes overwrite each other's files. Yes, it is a bug, but a bug arising from the wrong use case. If you want to download package sets simultaneously, divide the <list-of-RPM-NEVR> in half and execute "dnf download" for each part. When multiple users unintentionally download RPMs to the same directory, you can set a different --destdir for each and then use "mv". If we find that this bug is inside DNF and related to other filed bugs, we will raise its priority.
Comment 9 Richard W.M. Jones 2014-11-21 05:45:50 EST
It's because you can run supermin twice in two independent
processes.  There's no way to "divide the list" between independent
copies of supermin, run from different places.

Even if we did coordinate multiple copies of supermin, there's still
a problem that supermin could be running and the user could independently
run 'dnf download' causing supermin and/or dnf to break.

Not sure about your point about --destdir, as supermin always uses a randomly
generated tmpdir for --destdir, so they are never going to be the same.
Comment 10 Richard W.M. Jones 2014-11-21 07:04:41 EST
Created attachment 959666 [details]
0001-dnf-Ensure-two-processes-cannot-overwrite-each-other.patch

This patch adds a simple lock file to the non-root cache
directory, which fixes the problem for me.
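
For illustration only (this is not the attached patch): a minimal Python sketch of what a blocking lock file in a cache directory could look like. The names `cache_lock` and `download.lock` are hypothetical, and the sketch assumes a POSIX system where `fcntl.flock` is available.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def cache_lock(cachedir, name="download.lock"):
    """Hold an exclusive, blocking lock on a file inside cachedir.

    Hypothetical helper for illustration; not dnf's actual code.
    """
    os.makedirs(cachedir, exist_ok=True)
    path = os.path.join(cachedir, name)
    # Open (creating if needed) without truncating, so that concurrent
    # openers never disturb each other.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # LOCK_EX without LOCK_NB blocks until any other holder releases,
        # so a second process waits instead of failing.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield path
    finally:
        # Closing the descriptor also releases the lock.
        os.close(fd)
```

A caller would wrap the whole download in `with cache_lock(cachedir):` so that only one process touches the cache at a time.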
Comment 11 Pino Toscano 2014-11-21 13:08:04 EST
Another alternative might be to allow the cachedir used when running to be specified on the command line; this way each dnf run in supermin could have a different cachedir than /var/tmp/dnf-$USER-$RANDOM/, reducing even further the conflicts between dnf runs.
Comment 12 Richard W.M. Jones 2014-11-21 13:52:48 EST
(In reply to Pino Toscano from comment #11)
> Another alternative might be allow to specify on command line the cachedir
> used when running; this way each dnf run in supermin could have a different
> cachedir than /var/tmp/dnf-$USER-$RANDOM/, reducing even further the
> conflicts between dnf runs.

For supermin that would certainly be sufficient.  In fact the code
mentions a --tempcache option, although it doesn't appear to be
implemented.

For general dnf download use (eg from user scripts), locking the
cache is better.
Comment 13 Richard W.M. Jones 2014-11-21 13:55:49 EST
Created attachment 959921 [details]
0001-dnf-Ensure-two-processes-cannot-overwrite-each-other.patch

Second version without the obvious bug this time.
Comment 14 Honza Silhan 2014-11-24 04:07:44 EST
Pino, that's a good point. Until the bug is fixed, append `--setopt=cachedir=<dir>` to the command line.

Richard, thanks for taking initiative, we will look at the patch.
Comment 15 Pino Toscano 2014-11-24 05:02:12 EST
(In reply to Jan Silhan from comment #14)
> Pino, thats a good point. Till the bug is fixed, append
> `--setopt=cachedir=<dir>` to command line.

It does not seem to work here (current rawhide updated as of ~right now):

$ rm -rf /var/tmp/dnf-pino-*
$ ls -d /var/tmp/dnf-pino-* 2>/dev/null | wc -l
0
$ mkdir dest
$ dnf download --destdir=dest --setopt=cachedir=$PWD/dest -v bash.x86_64
cachedir: /var/tmp/dnf-pino-CkAXFj/x86_64/22
Loaded plugins: copr, playground, download, Query, kickstart, generate_completion_cache, debuginfo-install, builddep, noroot, protected_packages
DNF version: 0.6.2
Fedora - Rawhide - Developmental packages for the next Fedora release       [...]  71 MB/s |  43 MB     00:00
not found updateinfo for: Fedora - Rawhide - Developmental packages for the next Fedora release
Completion plugin: Can't write completion cache: [Errno 13] Permission denied: u'/var/cache/dnf/available.cache'
bash-4.3.30-2.fc22.x86_64.rpm                                               [...]  40 MB/s | 1.6 MB     00:00
$ ls -d /var/tmp/dnf-pino-* 2>/dev/null | wc -l
1
$ ls dest/
bash-4.3.30-2.fc22.x86_64.rpm

According to your suggestion, no /var/tmp/dnf-$USER-* should have been created at all, and its content should have ended up in "destdir" instead.
Comment 16 Michal Luscon 2015-01-22 04:15:57 EST
The download phase in dnf-0.6.4 will be protected by a blocking lock mechanism.
Comment 17 Fedora Update System 2015-02-15 19:03:26 EST
dnf-plugins-core-0.1.5-1.fc21,hawkey-0.5.3-2.fc21,dnf-0.6.4-1.fc21 has been submitted as an update for Fedora 21.
https://admin.fedoraproject.org/updates/dnf-plugins-core-0.1.5-1.fc21,hawkey-0.5.3-2.fc21,dnf-0.6.4-1.fc21
Comment 18 Richard W.M. Jones 2015-02-16 04:01:19 EST
I'm now testing with dnf-plugins-core-0.1.5-1.fc21,
hawkey-0.5.3-2.fc21, dnf-0.6.4-1.fc21 but this functionality doesn't
work correctly.

I see this error, although it is not easily reproducible:

 - "metadata already locked by <pid>"

This causes a failure, but it should just wait for the other
process to complete:

cachedir: /var/tmp/dnf-rjones-wwvQYa/x86_64/21
Loaded plugins: copr, playground, download, Query, protected_packages, needs-restarting, builddep, debuginfo-install, reposync, kickstart, noroot, generate_completion_cache
DNF version: 0.6.4
metadata already locked by 29226
  The application with PID 29226 is: dnf
    Memory :  58 M RSS (284 MB VSZ)
    Started: Mon Feb 16 08:51:02 2015 - 00:02 ago
    State  : Running
supermin: /usr/bin/dnf download -v --destdir '/tmp/supermin01fa6c.tmpdir/yo45jxsg' 'bash.x86_64' 'coreutils.x86_64' 'glibc.x86_64' 'info.x86_64' 'grep.x86_64' 'libattr.x86_64' 'openssl-libs.x86_64' 'glibc-common.x86_64' 'ca-certificates.noarch' 'crypto-policies.noarch' 'krb5-libs.x86_64' 'setup.noarch' 'fedora-release.noarch' 'fedora-repos.noarch': command failed, see earlier errors

----

Also, if you use --destdir with a non-existent directory name, then
dnf creates destdir as a *file* and writes every package to the
same file, which seems like a bug, although we work around it by
creating a randomly named destdir.
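
A minimal sketch of that kind of workaround, written in Python for illustration (supermin itself is not written in Python, and `make_download_command` is a hypothetical name): create the destination directory first, under a random name, and only then build the command line.

```python
import os
import tempfile


def make_download_command(packages, parent=None):
    """Create a fresh destination directory and build the dnf argv for it.

    Hypothetical helper for illustration only.
    """
    # mkdtemp creates the directory itself, with a unique random name,
    # so --destdir always points at an existing directory.
    destdir = tempfile.mkdtemp(prefix="download-", dir=parent)
    argv = ["dnf", "download", "--destdir", destdir] + list(packages)
    return destdir, argv
```

The returned `argv` can be passed to `subprocess.run`, and the caller remains responsible for removing `destdir` afterwards.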
Comment 19 Richard W.M. Jones 2015-02-16 04:05:55 EST
Another random failure (rarer than the above) is:

Completion plugin: Can't write completion cache: unable to open database file
Comment 20 Richard W.M. Jones 2015-02-16 04:10:31 EST
The way to reproduce at least some of these bugs is to run the following
commands in parallel, as non-root (same user):

In Window 1, run:

mkdir -p /tmp/t1 ; while dnf -v download --destdir /tmp/t1 bash.x86_64 glibc.x86_64 info.x86_64 grep.x86_64 libattr.x86_64 openssl-libs.x86_64 coreutils.x86_64 glibc-common.x86_64 ca-certificates.noarch crypto-policies.noarch krb5-libs.x86_64 setup.noarch >& /tmp/log1 ; do echo -n .; done

In Window 2, run:

mkdir -p /tmp/t2 ; while dnf -v download --destdir /tmp/t2 bash.x86_64 glibc.x86_64 info.x86_64 grep.x86_64 libattr.x86_64 openssl-libs.x86_64 coreutils.x86_64 glibc-common.x86_64 ca-certificates.noarch crypto-policies.noarch krb5-libs.x86_64 setup.noarch >& /tmp/log2 ; do echo -n .; done

In Window 3, run:

mkdir -p /tmp/t3 ; while dnf -v download --destdir /tmp/t1 bash.x86_64 glibc.x86_64 info.x86_64 grep.x86_64 libattr.x86_64 openssl-libs.x86_64 coreutils.x86_64 glibc-common.x86_64 ca-certificates.noarch crypto-policies.noarch krb5-libs.x86_64 setup.noarch >& /tmp/log3 ; do echo -n .; done

Let that run for quite a while.  When it fails, examine the log
files (/tmp/log[123]).
Comment 21 Richard W.M. Jones 2015-02-16 04:12:42 EST
Try again, this time with the correct commands:

In Window 1, run:

mkdir -p /tmp/t1 ; while dnf -v download --destdir /tmp/t1 bash.x86_64 glibc.x86_64 info.x86_64 grep.x86_64 libattr.x86_64 openssl-libs.x86_64 coreutils.x86_64 glibc-common.x86_64 ca-certificates.noarch crypto-policies.noarch krb5-libs.x86_64 setup.noarch >& /tmp/log1 ; do echo -n .; done

In Window 2, run:

mkdir -p /tmp/t2 ; while dnf -v download --destdir /tmp/t2 bash.x86_64 glibc.x86_64 info.x86_64 grep.x86_64 libattr.x86_64 openssl-libs.x86_64 coreutils.x86_64 glibc-common.x86_64 ca-certificates.noarch crypto-policies.noarch krb5-libs.x86_64 setup.noarch >& /tmp/log2 ; do echo -n .; done

In Window 3, run:

mkdir -p /tmp/t3 ; while dnf -v download --destdir /tmp/t3 bash.x86_64 glibc.x86_64 info.x86_64 grep.x86_64 libattr.x86_64 openssl-libs.x86_64 coreutils.x86_64 glibc-common.x86_64 ca-certificates.noarch crypto-policies.noarch krb5-libs.x86_64 setup.noarch >& /tmp/log3 ; do echo -n .; done
Comment 22 Richard W.M. Jones 2015-02-16 04:39:27 EST
Another error is:

repo: using cache for: updates
Using metadata from Mon Feb 16 08:44:21 2015
Completion plugin: Can't write completion cache: unable to open database file
Waiting for process with pid 4268 to finish.
Waiting for process with pid 4022 to finish.
[Errno 2] No such file or directory: u'/var/tmp/dnf-rjones-wwvQYa/x86_64/21/updates/packages/bash-4.3.33-1.fc21.x86_64.rpm'

where it seems as if a parallel dnf deleted the file from
the cache.
Comment 23 Richard W.M. Jones 2015-02-16 04:46:53 EST
Here's a better and simpler reproducer of the 'metadata already locked'
bug.  It seems to happen when two instances of 'dnf download' are
started at exactly the same time:

$ mkdir -p /tmp/t1 /tmp/t2 ; dnf download --destdir /tmp/t1 bash.x86_64 & dnf download --destdir /tmp/t2 bash.x86_64

That fails about 2/3rds of the time for me.

When it fails you will see:

metadata already locked by 3199
  The application with PID 3199 is: dnf
    Memory :  35 M RSS (361 MB VSZ)
    Started: Mon Feb 16 09:46:17 2015 - 00:02 ago
    State  : Running
Using metadata from Mon Feb 16 08:44:21 2015
bash-4.3.33-1.fc21.x86_64.rpm                   2.2 MB/s | 1.6 MB     00:00    
[1]+  Exit 1                  dnf download --destdir /tmp/t1 bash.x86_64

Note the 'Exit 1' indicating that one of the dnf processes
exited with a failure instead of waiting.
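
To repeat this check many times, a small harness along these lines could start both commands before waiting on either and then report the exit statuses. It is only a sketch: `run_in_parallel` and `REPRODUCER` are hypothetical names, and the command lines are copied from the reproducer in this comment (they assume /tmp/t1 and /tmp/t2 already exist).

```python
import subprocess

# The two invocations from the reproducer; not executed on import.
REPRODUCER = [
    ["dnf", "download", "--destdir", "/tmp/t1", "bash.x86_64"],
    ["dnf", "download", "--destdir", "/tmp/t2", "bash.x86_64"],
]


def run_in_parallel(commands):
    """Start every command before waiting on any; return the exit statuses."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]
```

Calling `run_in_parallel(REPRODUCER)` returns one exit status per command; any non-zero entry corresponds to the 'Exit 1' case described here.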
Comment 24 Fedora Update System 2015-02-17 03:04:20 EST
Package hawkey-0.5.3-2.fc21, dnf-plugins-core-0.1.5-1.fc21, dnf-0.6.4-1.fc21:
* should fix your issue,
* was pushed to the Fedora 21 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing hawkey-0.5.3-2.fc21 dnf-plugins-core-0.1.5-1.fc21 dnf-0.6.4-1.fc21'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2015-2139/dnf-plugins-core-0.1.5-1.fc21,hawkey-0.5.3-2.fc21,dnf-0.6.4-1.fc21
then log in and leave karma (feedback).
Comment 25 Richard W.M. Jones 2015-02-17 05:07:05 EST
Please can you remove this bug from the update.
Comment 26 Fedora Update System 2015-02-20 03:32:39 EST
hawkey-0.5.3-2.fc21, dnf-plugins-core-0.1.5-1.fc21, dnf-0.6.4-1.fc21 has been pushed to the Fedora 21 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 27 Jaroslav Reznik 2015-03-03 12:04:40 EST
This bug appears to have been reported against 'rawhide' during the Fedora 22 development cycle.
Changing version to '22'.

More information and reason for this action is here:
https://fedoraproject.org/wiki/Fedora_Program_Management/HouseKeeping/Fedora22
Comment 28 Honza Silhan 2015-03-04 09:04:28 EST
(In reply to Richard W.M. Jones from comment #25)
> Please can you remove this bug from the update.

Too late.

We will try to hold the download lock longer during the rpm transaction and make the metadata lock non-blocking.
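
As a rough illustration of a non-blocking lock that still lets a caller wait (a sketch under stated assumptions, not the code from dnf): the log in comment 22 shows the message "Waiting for process with pid ... to finish.", and polling a try-lock is one way such behaviour could be produced. The names `try_lock`, `holder_pid` and `lock_or_wait` are hypothetical, and `fcntl.flock` assumes a POSIX system.

```python
import fcntl
import os
import time


def try_lock(path):
    """Try once to take an exclusive lock; return the open fd, or None."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # LOCK_NB makes flock fail immediately instead of blocking.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    # Record our PID so that a waiting process can report who holds the lock.
    os.ftruncate(fd, 0)
    os.write(fd, str(os.getpid()).encode())
    return fd


def holder_pid(path):
    """Return the PID text recorded by the lock holder, or None if unknown."""
    try:
        with open(path) as f:
            return f.read().strip() or None
    except FileNotFoundError:
        return None


def lock_or_wait(path, poll=0.5, timeout=None):
    """Poll try_lock until it succeeds; raise TimeoutError after timeout seconds."""
    deadline = None if timeout is None else time.monotonic() + timeout
    announced = False
    while True:
        fd = try_lock(path)
        if fd is not None:
            return fd  # the caller releases the lock with os.close(fd)
        if not announced:
            # The PID may be missing if the holder has not written it yet.
            print("Waiting for process with pid %s to finish." % holder_pid(path))
            announced = True
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError("lock still held: " + path)
        time.sleep(poll)
```

The design choice here is that the single attempt (`try_lock`) never blocks, while the decision to wait, report the holder, or give up is left to the caller.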
Comment 29 Michal Luscon 2015-03-09 12:48:09 EDT
Implemented in https://github.com/rpm-software-management/dnf/pull/234
Comment 30 Fedora Update System 2015-03-31 14:17:34 EDT
hawkey-0.5.4-1.fc22,dnf-0.6.5-1.fc22 has been submitted as an update for Fedora 22.
https://admin.fedoraproject.org/updates/hawkey-0.5.4-1.fc22,dnf-0.6.5-1.fc22
Comment 31 Fedora Update System 2015-04-01 21:43:50 EDT
Package hawkey-0.5.4-1.fc22, dnf-0.6.5-1.fc22:
* should fix your issue,
* was pushed to the Fedora 22 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing hawkey-0.5.4-1.fc22 dnf-0.6.5-1.fc22'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2015-5337/hawkey-0.5.4-1.fc22,dnf-0.6.5-1.fc22
then log in and leave karma (feedback).
Comment 32 Fedora Update System 2015-04-03 07:39:18 EDT
dnf-plugins-extras-0.0.6-2.fc22,yum-utils-1.1.31-505.fc22,yum-3.4.3-505.fc22,hawkey-0.5.4-1.fc22,dnf-0.6.5-1.fc22 has been submitted as an update for Fedora 22.
https://admin.fedoraproject.org/updates/dnf-plugins-extras-0.0.6-2.fc22,yum-utils-1.1.31-505.fc22,yum-3.4.3-505.fc22,hawkey-0.5.4-1.fc22,dnf-0.6.5-1.fc22
Comment 33 Richard W.M. Jones 2015-04-03 07:40:04 EDT
Yes, this version now appears to work reliably.
Comment 34 Fedora Update System 2015-04-06 14:49:06 EDT
dnf-plugins-extras-0.0.6-2.fc22, yum-3.4.3-505.fc22, dnf-0.6.5-1.fc22, yum-utils-1.1.31-505.fc22, hawkey-0.5.4-1.fc22 has been pushed to the Fedora 22 stable repository.  If problems still persist, please make note of it in this bug report.
Comment 35 Fedora Update System 2015-04-16 05:05:26 EDT
dnf-plugins-core-0.1.5-2.fc21,dnf-0.6.4-5.fc21 has been submitted as an update for Fedora 21.
https://admin.fedoraproject.org/updates/dnf-plugins-core-0.1.5-2.fc21,dnf-0.6.4-5.fc21
Comment 36 Fedora Update System 2015-05-08 03:33:34 EDT
dnf-plugins-core-0.1.5-2.fc21, dnf-0.6.4-5.fc21 has been pushed to the Fedora 21 stable repository.  If problems still persist, please make note of it in this bug report.
