Bug 1714706

Summary: dnf install <package> sometimes ends in: [Errno 2] No such file or directory: '/var/cache/dnf/...
Product: Red Hat Enterprise Linux 8 Reporter: Andrei Stepanov <astepano>
Component: dnf Assignee: Packaging Maintenance Team <packaging-team-maint>
Status: CLOSED NOTABUG QA Contact: BaseOS QE Security Team <qe-baseos-security>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 8.0 CC: brad.baker, dhellmann, dmach, james.antill, mdomonko
Target Milestone: rc   
Target Release: 8.0   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2019-06-10 11:36:50 UTC Type: Bug

Description Andrei Stepanov 2019-05-28 16:29:09 UTC
dnf install sometimes finishes unsuccessfully:

Last metadata expiration check: 0:00:01 ago on Tue 28 May 2019 10:36:14 AM EDT.
Dependencies resolved.
================================================================================
 Package               Arch       Version            Repository            Size
================================================================================
Installing:
 createrepo_c          x86_64     0.11.0-1.el8       rhel-8-appstream      76 k
Installing dependencies:
 createrepo_c-libs     x86_64     0.11.0-1.el8       rhel-8-appstream     101 k
 drpm                  x86_64     0.3.0-14.el8       rhel-8-appstream      71 k

Transaction Summary
================================================================================
Install  3 Packages

Total size: 249 k
Installed size: 556 k
Downloading Packages:
[SKIPPED] createrepo_c-0.11.0-1.el8.x86_64.rpm: Already downloaded             
[SKIPPED] createrepo_c-libs-0.11.0-1.el8.x86_64.rpm: Already downloaded        
[SKIPPED] drpm-0.3.0-14.el8.x86_64.rpm: Already downloaded                     
Running transaction check
Waiting for process with pid 6473 to finish.
[Errno 2] No such file or directory: '/var/cache/dnf/rhel-8-appstream-b9ec074453e13071/packages/createrepo_c-0.11.0-1.el8.x86_64.rpm'
The downloaded packages were saved in cache until the next successful transaction.
You can remove cached packages by executing 'dnf clean packages'.


I do not have a reproducer, but it happens occasionally.

dnf-4.0.9.2-5.el8.noarch

Comment 1 Daniel Mach 2019-06-03 11:29:25 UTC
Weren't there multiple dnf processes running on the same system?

It seems that you've run a similar transaction twice:
dnf#1: download RPMs
dnf#2: report [SKIPPED], RPMs were downloaded to cache by dnf#1
dnf#1: runs RPM transaction
dnf#2: waits on dnf#1 (Waiting for process with pid 6473 to finish)
dnf#1: RPM transaction finishes
dnf#1: removes installed RPMs from cache
dnf#2: tries to continue, but RPMs cannot be found in cache (were removed by dnf#1 in the previous step)

I'd prefer to keep the existing behavior and close the bug.
Extending the critical section would lead to worse user experience (DNF would wait for other operations to finish more frequently).
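
The interleaving listed above can be replayed in miniature. The following is a simplified sketch, not DNF's actual code: two threads stand in for dnf#1 and dnf#2, a threading.Lock stands in for whatever lock guards the RPM transaction, and the file name and the simulate_race function are hypothetical. The cleanup happens while the lock is still held only to keep the sketch deterministic.

```python
import os
import tempfile
import threading


def simulate_race():
    """Replay the dnf#1 / dnf#2 interleaving with two threads (illustrative only)."""
    cache_dir = tempfile.mkdtemp()
    rpm_path = os.path.join(cache_dir, "example.rpm")  # hypothetical cached package
    transaction_lock = threading.Lock()  # stand-in for the transaction lock
    downloaded = threading.Event()
    dnf2_checked = threading.Event()
    result = {}

    def dnf1():
        # dnf#1: download RPMs into the shared cache
        with open(rpm_path, "wb") as f:
            f.write(b"payload")
        with transaction_lock:
            downloaded.set()
            # dnf#1: runs the RPM transaction while dnf#2 inspects the cache
            dnf2_checked.wait()
            # dnf#1: removes installed RPMs from the cache after success
            os.remove(rpm_path)

    def dnf2():
        downloaded.wait()
        # dnf#2: reports [SKIPPED] because the file is already in the cache
        result["skipped"] = os.path.exists(rpm_path)
        dnf2_checked.set()
        # dnf#2: "Waiting for process with pid ... to finish"
        with transaction_lock:
            try:
                with open(rpm_path, "rb"):
                    pass
                result["errno"] = None
            except FileNotFoundError as exc:
                # dnf#2: the cached RPM is gone
                result["errno"] = exc.errno

    threads = [threading.Thread(target=dnf1), threading.Thread(target=dnf2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    os.rmdir(cache_dir)
    return result
```

In this sketch the second thread sees the file when it checks the cache and then fails to open it once it gets the lock, which mirrors the "[SKIPPED] ... Already downloaded" lines followed by "[Errno 2] No such file or directory" in the description.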

Comment 2 Andrei Stepanov 2019-06-03 12:00:10 UTC
Okay, let's close. I just wanted to make sure you did not miss an important bug. I am fine with `NOTABUG`.

Comment 3 Brad 2021-06-10 13:50:53 UTC
This bug actually just impacted us. We're using Rancher on VMware, which deploys Kubernetes nodes from a pre-created VMware VM template with CentOS 8.3. In our case it manifested as follows:

1) The VMware CentOS template was unpatched - curl was out of date on it.
2) Our Rancher installation uses node templates which call a "package update" (dnf update) via cloud-init.
3) During the deployment process it seems Rancher calls dnf install curl.
4) At the same time our cloud-init process calls dnf update.
5) The two DNF processes collide as described above. This causes the second process, which is installing curl, to exit with status code 1.
6) Rancher interprets the status code of 1 as an unsuccessful deployment and ends the deployment process.
7) The whole process loops and repeats indefinitely.

There are some workarounds which we're considering:

1) Run Packer to update our CentOS template so that curl isn't out of date and thus this conflict doesn't occur. This is only a band-aid though, because any time curl becomes out of date the problem will resurface.
2) Don't do a dnf update in our cloud-init template - but then we end up with unpatched Rancher/Kubernetes nodes. Not ideal.
3) Try to get SUSE/Rancher to adjust their deployment scripts so that they don't try to run DNF while another DNF process may be running in cloud-init.
4) Try to get Red Hat/CentOS to fix DNF to avoid collisions like this.

I strongly feel #4 is the best solution to this issue. Even if SUSE/Rancher fixes their scripts, other people are going to encounter this problem, and it can be a very difficult/obscure problem to troubleshoot.
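
For workaround 3, one shape the caller-side change could take is a retry around the failing command. This is only a sketch under assumptions: run_with_retry, the attempt count, and the delay are hypothetical, and nothing here reflects how Rancher's deployment scripts are actually written. It treats a nonzero exit status as possibly transient and tries again after a pause instead of failing the deployment on the first attempt.

```python
import subprocess
import time


def run_with_retry(cmd, attempts=3, delay_seconds=5.0):
    """Run cmd, retrying on a nonzero exit status; return the last exit status."""
    status = 1
    for attempt in range(1, attempts + 1):
        status = subprocess.run(cmd).returncode
        if status == 0:
            break
        if attempt < attempts:
            # Give a concurrently running package operation time to finish.
            time.sleep(delay_seconds)
    return status


# Hypothetical usage in a provisioning script:
# status = run_with_retry(["dnf", "-y", "install", "curl"])
```

A retry does not remove the underlying collision; it only keeps a single failed attempt from aborting the whole deployment.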