This service will be undergoing maintenance at 00:00 UTC, 2017-10-23 It is expected to last about 30 minutes
Bug 998113 - [locking] dnf makecache stuck, blocks operation
[locking] dnf makecache stuck, blocks operation
Status: CLOSED NOTABUG
Product: Fedora
Classification: Fedora
Component: dnf (Show other bugs)
19
Unspecified Unspecified
unspecified Severity unspecified
: ---
: ---
Assigned To: packaging-team-maint
Fedora Extras Quality Assurance
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2013-08-17 07:32 EDT by Karel Volný
Modified: 2014-09-09 05:04 EDT (History)
4 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-09-09 05:04:06 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Karel Volný 2013-08-17 07:32:29 EDT
Description of problem:
I cannot run dnf upgrade as it says metadata-cache is locked.

Version-Release number of selected component (if applicable):
dnf-0.3.10-1.giteb9dddb.fc19.noarch

How reproducible:
depending on the trigger condition (network timeout?)

Steps to Reproduce:
1. get `dnf -v makecache timer` command (cronjob) stuck
2. dnf upgrade

Actual results:
[root@kvolny ~]# dnf upgrade
metadata-cache already locked by 3575
  The application with PID 3575 is: dnf
    Paměť  :  26 M RSS (430 MB VSZ)
    Spuštěn: Fri Aug 16 03:45:07 2013 - před 1 day(s) 9:10:26
    State  : Spí


Expected results:
(upgrade goes as usual)


Additional info:
trying strace, it shows timeouts repeating:

[root@kvolny ~]# strace -y -p 3575
Process 3575 attached
select(13, [12<socket:[44973]>], [], [], {0, 276892}) = 0 (Timeout)
poll([{fd=12<socket:[44973]>, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
select(13, [12<socket:[44973]>], [], [], {1, 0}) = 0 (Timeout)
poll([{fd=12<socket:[44973]>, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 0 (Timeout)
...


from dnf.log:

Aug 16 02:45:07 INFO Metadata cache refreshed recently.
Aug 16 03:45:07 INFO --- logging initialized ---
Aug 16 03:45:07 DEBUG Config time: 0.006
Aug 16 03:45:07 DEBUG dnf version: 0.3.10
Aug 16 03:45:07 SUBDEBUG COMMAND: dnf -v makecache timer 
Aug 16 03:45:07 SUBDEBUG Installroot: /
Aug 16 03:45:07 SUBDEBUG Ext Commands: timer
Aug 16 03:45:07 DEBUG Vytváření cache souborů pro všechna metadata.
Aug 16 03:45:07 DEBUG rpmfusion-nonfree-debuginfo: will expire after 378557 seconds.
Aug 16 03:45:07 DEBUG fedora: metadata will expire after 567 seconds and will be refreshed now
Aug 16 03:45:07 DEBUG rpmfusion-nonfree-updates-debuginfo: metadata will expire after 568 seconds and will be refreshed now
Aug 16 03:45:07 DEBUG rpmfusion-free-updates: metadata will expire after 568 seconds and will be refreshed now
Aug 16 03:45:07 DEBUG adobe-linux-x86_64: metadata will expire after 568 seconds and will be refreshed now
Aug 16 03:45:07 DEBUG rpmfusion-nonfree-updates: metadata will expire after 569 seconds and will be refreshed now
Aug 16 03:45:07 DEBUG rpmfusion-free: metadata will expire after 569 seconds and will be refreshed now
Aug 16 03:45:07 DEBUG rhpkg: will expire after 120658 seconds.
Aug 16 03:45:07 DEBUG updates-debuginfo: metadata will expire after 573 seconds and will be refreshed now
Aug 16 03:45:07 DEBUG fedora-debuginfo: will expire after 378559 seconds.
Aug 16 03:45:07 DEBUG updates: metadata will expire after 584 seconds and will be refreshed now
Aug 16 03:45:07 DEBUG rpmfusion-nonfree: metadata will expire after 588 seconds and will be refreshed now
Aug 16 03:45:07 DEBUG rpmfusion-free-debuginfo: will expire after 378559 seconds.
Aug 16 03:45:07 DEBUG rpmfusion-free-updates-debuginfo: metadata will expire after 589 seconds and will be refreshed now
Aug 16 03:45:07 DEBUG qa-tools: will expire after 120659 seconds.
Aug 16 03:45:07 DEBUG beaker-client: will expire after 120660 seconds.
Aug 16 03:45:07 DEBUG not found deltainfo for: RPM Fusion for Fedora 19 - Nonfree - Debug
Aug 17 12:55:33 INFO --- logging initialized ---
Aug 17 12:55:33 DEBUG Config time: 0.007
Aug 17 12:55:33 DEBUG dnf version: 0.3.10
Aug 17 12:55:33 SUBDEBUG COMMAND: dnf upgrade 
Aug 17 12:55:33 SUBDEBUG Installroot: /
Aug 17 12:55:33 SUBDEBUG Ext Commands: 
Aug 17 12:55:33 CRITICAL metadata-cache already locked by 3575
Aug 17 12:55:33 CRITICAL   The application with PID 3575 is: dnf
Aug 17 12:55:33 CRITICAL     Paměť  :  26 M RSS (430 MB VSZ)
Aug 17 12:55:33 CRITICAL     Spuštěn: Fri Aug 16 03:45:07 2013 - před 1 day(s) 9:10:26
Aug 17 12:55:33 CRITICAL     State  : Spí


after killing the process, everything goes ok
Comment 1 Ales Kozumplik 2013-08-19 03:03:19 EDT
This is closely related to the bug 922441. I can only wonder what causes the hang in the makecache process: it is probably a network communication hang but to honestly blame libreport we'd need to catch it red-handed (hopefully this [1] will help it in the future).

Leaving this open for now, hoping to reproduce.

[1] https://github.com/akozumpl/dnf/commit/d46d2ff51eb274daa73cbbab28a6c8508776b9ba
Comment 2 Karel Volný 2013-08-20 10:02:08 EDT
(In reply to Ales Kozumplik from comment #1)
> This is closely related to the bug 922441.

looking at that ... would that be extremely hard (or undesired) to implement some inter process communication so that the program run by the user would be able to get the status of the one running in background, show the background downloads progress, offer to interrupt the background process, etc.?

> I can only wonder what causes the hang in the makecache process: it is probably
> a network communication hang but to honestly blame libreport we'd need to catch
> it red-handed (hopefully this [1] will help it in the future).

ugh, what does libreport has to do with this?
Comment 3 Ales Kozumplik 2013-08-20 10:19:36 EDT
(In reply to Karel Volný from comment #2)
> looking at that ... would that be extremely hard (or undesired) to implement
> some inter process communication so that the program run by the user would
> be able to get the status of the one running in background, show the
> background downloads progress, offer to interrupt the background process,
> etc.?

It's a possibility given that both processes belong to the same user.

> 
> > I can only wonder what causes the hang in the makecache process: it is probably
> > a network communication hang but to honestly blame libreport we'd need to catch
> > it red-handed (hopefully this [1] will help it in the future).
> 
> ugh, what does libreport has to do with this?

sorry, I meant 'librepo'.
Comment 4 Karel Volný 2013-08-20 10:57:10 EDT
(In reply to Ales Kozumplik from comment #3)
> (In reply to Karel Volný from comment #2)
> > looking at that ... would that be extremely hard (or undesired) to implement
> > some inter process communication so that the program run by the user would
> > be able to get the status of the one running in background, show the
> > background downloads progress, offer to interrupt the background process,
> > etc.?
> 
> It's a possibility given that both processes belong to the same user.

well, except for read only operations where you don't need to acquire any locks anyways, does dnf allow other user than root (or someone using sudo to become root)? - I guess "same user" rule would cover 100% cases here ...

...
> sorry, I meant 'librepo'.

:-)
Comment 5 Ales Kozumplik 2013-08-21 03:05:22 EDT
> well, except for read only operations where you don't need to acquire any
> locks anyways, does dnf allow other user than root (or someone using sudo to
> become root)? - I guess "same user" rule would cover 100% cases here ...
> 

yes, DNF runs under the normal user, it just will fail to do any actual RPM transaction of course.

about the read-only locking, you are right, this is one thing I'm considering, please see bug 980227 comment 2.
Comment 6 Karel Volný 2013-08-29 10:35:47 EDT
(In reply to Ales Kozumplik from comment #5)
> > well, except for read only operations where you don't need to acquire any
> > locks anyways, does dnf allow other user than root (or someone using sudo to
> > become root)? - I guess "same user" rule would cover 100% cases here ...
> > 
> 
> yes, DNF runs under the normal user, it just will fail to do any actual RPM
> transaction of course.

I thought it gains root for cache updates ...

> about the read-only locking, you are right, this is one thing I'm
> considering, please see bug 980227 comment 2.

uh, that seems far in the future ... can something be done about the makecache problem with the current implementatino?
Comment 7 Ales Kozumplik 2014-09-09 05:04:06 EDT
Closing this. DNF's metadata cache is heavily built on the idea that there is only one local copy of the repository metadata. If a background process that creates this metadata is stuck it can either be because of:

a) a persistent network issue. In that case terminating the background process and running DNF from the command line will result in a stuck operation again. Also, since this has been reported we have tweaked the timeouting values for librepo so eventually all such DNF processes finish or fail with a network error.

b) a transient network error. In that case the user best knows what to do and can kill the background process. Again, should not  be necessary.

c) an actual bug in the DNF/librepo networking code that prevents DNF from terminating successfully or with an error---such should be reported separately and properly debugged.

For possibilities of running multiple DNF with multiple caches (using different installroots) see bug 1124316.

For a request of blocking DNF executions see bug 1049028.

Note You need to log in before you can comment on or make changes to this bug.