Bug 1229823

Summary: dnf idles and does nothing in the middle of a process
Product: [Fedora] Fedora Reporter: Ali Akcaagac <aliakc>
Component: dnfAssignee: rpm-software-management
Status: CLOSED EOL QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 22CC: aliakc, jsilhan, jzeleny, mluscon, packaging-team-maint, pnemade, tim.lauridsen, vmukhame
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-07-19 14:44:21 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
test.sh script for output testing.
none
log.txt output missing half of the stuff we'd expected to see none

Description Ali Akcaagac 2015-06-09 17:44:24 UTC
I don't know how to describe it.

We've been running a script that basicly does this

dnf install <list of packages> --nogpgcheck --installroot=<somedir>

The output is piped into a file.

Now what happens ... and this is happening quite often ... is this:

quite often we see that dnf updates (downloads metadata) into an empty

<somedir>/var/lib/dnf/22

directory. But dnf doesn't output anything into xterm, console or whatever. Only by watching the said directory we see that dnf indeed does something.

But then from one second to another it simply halts and doesn't do anything. Not all repositories data have been downloaded.

Right now - as I type this - we are waiting over 30 minutes for the process to go on. Nothing. Even tap doesn't show any process eating... ps -aux doesn't show a defunct process.

Two things we don't understand:

a) Why does dnf often do not output anything - while it should at least show that it's downloading metadata from some location. This is even happening if we start from a clean directory (metadata directory).
b) Why does dnf halt often in the middle of the process stopping the entire workflow from continuing. It looks like the process died but it hasn't.

We've been having this quite often lately and don't know the cause.

Comment 1 Ali Akcaagac 2015-06-09 17:48:41 UTC
The correct path is:

<sonedir/var/cache/dnf/i386/22/<the files there>

We use &> to pipe the output to a file and tail -f log it. Sadly the file remains 0 bytes.

Comment 2 Ali Akcaagac 2015-06-09 17:51:38 UTC
We just aborted a couple of times and restarted the script. Now it looks like it has processed all metadata and continues. Output is showing now. Still, there is no reason for dnf to go idle in the middle of a process for no reason. There is no background dnf running, nothing that would in any ways hinder dnf to continue working. Nothing. This is only happening with *get metadata from server*. There is also no reason for a 100mbit line to not get the data. Stalled servers I don't think so. At least in all the years we never had such issues with yum.

Comment 3 Radek Holy 2015-06-10 07:43:05 UTC
I wonder whether this can be related to bug 1212320.
And I agree that we should log something.

Comment 4 Ali Akcaagac 2015-06-10 08:15:50 UTC
Well I only quickly peeked over that bug... But let's make things a bit more clear...

I believe the problem that I describe is related to something completely else.

Background:
During the conversion of our infrastructure from yum to dnf, we detected a massive slowdown of operations. During the time when we've been using Fedora 20, we had a feeling how long certain processes takes. For example compiling the entire XFCE4.12 including yum builddep, yum installation etc. requires just 50 minutes. In worst case if builddeps are required it goes up to 80-90 minutes. With dnf we ended up in 2.5-3.0 hours what used to be 50 minutes and 4.0-5.0 hours what used to be 80-90 minutes.

The test machine isn't the fastest, we know.

We started performance tests with the filesystem underneath (xfs) but the performance for that system is ok. So nothing unusual compared to the performance of Fedora 20.

Then we've saw that also other parts like installing and de-installing uses much longer using dnf than yum (at least we compare dnf -> Fedora 22 with yum -> Fedora 20).

Last night I started to write some small script to measure the time that is needed for dnf to install/remove/reinstall/update and yum to install/remove/reinstall/update. To make it simple, it's nothing special and measures only the relative execution time for our test system.

time dnf reinstall somepackage.rpm
sync
sleep 5
sync
time yum reinstall somepackage.rpm

The test script is not finished yet. But during the execution I figured out that output from DNF as well as YUM on Fedora 22 is being eaten. That is: output is entirely missing.

We also detected that xfce4-terminal (which uses vte) is eating characters and output from other tools as well. Specially if it's piped or used together like the "time dnf" construction.

I filled a report about this to:

https://bugzilla.redhat.com/show_bug.cgi?id=1230021

You can see from the output above, how linefeeds are being eaten.

We started the same stuff using xterm, same issue. So the problem might not be vte, xfce4-terminal, xterm... It might be a lower component of X or even the Kernel itself. Maybe some python module preventing output ? Who knows ?

I hacked up a small test script (see attachment) and let it ran by issuing the following command:

sudo sh test.sh &> log.txt

And the ouput is missing half of the expected information (see attachment). I guarantee, that the file (log.txt) has not been touched, modified or altered in any ways.

We are talking about exactly this issue in this bug.

Comment 5 Ali Akcaagac 2015-06-10 08:16:18 UTC
Created attachment 1037155 [details]
test.sh script for output testing.

Comment 6 Ali Akcaagac 2015-06-10 08:17:03 UTC
Created attachment 1037156 [details]
log.txt output missing half of the stuff we'd expected to see

Comment 7 Ali Akcaagac 2015-06-10 08:36:22 UTC
Something has changed if I look at the log above again, because we were pulling in some updates the moment before.

Nonetheless. The above log should at least contain output that it's processing the repositories. From what I remember, yum is doing this on Fedora 20.

But then... just a side note... Look at the real time (this is not yet a valid bug report, we need to do more speed tests here). Do you see how dnf uses 3 minutes to pull in all repo data while yum only needs 1 minutes ?

Re-running the script:
dnf
real	2m42.438s
user	1m44.702s
sys	0m8.960s

yum
real	1m33.239s
user	0m18.528s
sys	0m2.813s

Comment 8 Ali Akcaagac 2015-06-10 08:57:55 UTC
The changes pulled in were related to mesa, perl and some other parts. Logged out and logged into XFCE 4.12 again and re-run the above script. The thing is hard to describe. It might be related to logging out and in again.

When we ran the script a couple of times, we didn't even get output..

Say we saw an output similar to this when using tail -f log.txt

------------------------------------------
dnf clean
... the cleaning message ...

<nothing>
------------------------------------------
end.

All echo's, further times etc where missing. Loading that file in VIM showed the same as tail -f did. Either the mesa stuff fixed something (which I don't expect because xfce4 barely uses it) or the logging in and out has reset an uncondition state that shows up during long term running of the XOrg session.

Once this happens again I will report back.

Comment 9 Ali Akcaagac 2015-06-10 09:17:37 UTC
Oh I nearly forgot to explain what we mean with "does nothing in the middle of a process" by explaining the backgrouds why we came to this at all...

After we had the issue as explained above (output being eaten and no further response), we dived into the /var/lib/dnf/i386/22 directory and did a frequent ls -l inside that (every couple of seconds).

We saw that there was no progress in pulling down the repository metadata. This has nothing to do with two dnf processes interferring each others as your bug is showing above. It has simply to do that nothing happens anymore in the dnf metadata directory.

We would expected something like this:

-bash-4.3$ ls -l
total 71040
drwxr-xr-x 3 root root       21 Jun  6 21:54 adobe-linux-i386
-rw-r--r-- 1 root root      607 Jun  6 21:54 adobe-linux-i386-filenames.solvx
-rw-r--r-- 1 root root     6105 Jun  6 21:55 adobe-linux-i386.solv
-rw-r--r-- 1 root root        2 Jun 10 10:53 expired_repos.json
drwxr-xr-x 4 root root       55 Jun 10 10:50 fedora
-rw-r--r-- 1 root root 30445939 Jun 10 10:51 fedora-filenames.solvx
-rw-r--r-- 1 root root 13632525 Jun 10 10:53 fedora.solv
drwxr-xr-x 3 root root       21 Jun 10 10:49 google
drwxr-xr-x 3 root root       21 Jun 10 10:52 google-chrome
-rw-r--r-- 1 root root     1764 Jun 10 10:52 google-chrome-filenames.solvx
-rw-r--r-- 1 root root     7366 Jun 10 10:53 google-chrome.solv
drwxr-xr-x 3 root root       21 Jun 10 10:50 google-earth
-rw-r--r-- 1 root root     7553 Jun 10 10:50 google-earth-filenames.solvx
-rw-r--r-- 1 root root     5904 Jun 10 10:53 google-earth.solv
-rw-r--r-- 1 root root     1764 Jun 10 10:49 google-filenames.solvx
-rw-r--r-- 1 root root     7366 Jun 10 10:53 google.solv
drwxr-xr-x 3 root root       38 Jun 10 10:49 kde
-rw-r--r-- 1 root root       82 Jun 10 10:49 kde-filenames.solvx
-rw-r--r-- 1 root root     5509 Jun 10 10:53 kde.solv
drwxr-xr-x 3 root root       38 Jun 10 10:49 kde-testing
-rw-r--r-- 1 root root       82 Jun 10 10:49 kde-testing-filenames.solvx
-rw-r--r-- 1 root root     5509 Jun 10 10:53 kde-testing.solv
drwxr-xr-x 3 root root       38 Jun 10 10:50 kde-unstable
-rw-r--r-- 1 root root       82 Jun 10 10:50 kde-unstable-filenames.solvx
-rw-r--r-- 1 root root     5509 Jun 10 10:53 kde-unstable.solv
-rw-r--r-- 1 root root        0 Jun 10 09:39 last_makecache
drwxr-xr-x 4 root root       53 Jun 10 10:52 rpmfusion-free
-rw-r--r-- 1 root root   289347 Jun 10 10:52 rpmfusion-free-filenames.solvx
-rw-r--r-- 1 root root   198183 Jun 10 10:53 rpmfusion-free.solv
drwxr-xr-x 3 root root       38 Jun 10 10:52 rpmfusion-free-updates
-rw-r--r-- 1 root root       82 Jun 10 10:52 rpmfusion-free-updates-filenames.solvx
-rw-r--r-- 1 root root     4782 Jun 10 10:53 rpmfusion-free-updates.solv
drwxr-xr-x 3 root root       38 Jun 10 10:52 rpmfusion-free-updates-testing
-rw-r--r-- 1 root root       82 Jun 10 10:52 rpmfusion-free-updates-testing-filenames.solvx
-rw-r--r-- 1 root root     4782 Jun 10 10:53 rpmfusion-free-updates-testing.solv
drwxr-xr-x 3 root root       38 Jun 10 10:52 rpmfusion-nonfree
-rw-r--r-- 1 root root    86576 Jun 10 10:52 rpmfusion-nonfree-filenames.solvx
-rw-r--r-- 1 root root    71325 Jun 10 10:53 rpmfusion-nonfree.solv
drwxr-xr-x 3 root root       38 Jun 10 10:52 rpmfusion-nonfree-updates
-rw-r--r-- 1 root root       82 Jun 10 10:52 rpmfusion-nonfree-updates-filenames.solvx
-rw-r--r-- 1 root root     4782 Jun 10 10:53 rpmfusion-nonfree-updates.solv
drwxr-xr-x 3 root root       38 Jun 10 10:49 rpmfusion-nonfree-updates-testing
-rw-r--r-- 1 root root       82 Jun 10 10:49 rpmfusion-nonfree-updates-testing-filenames.solvx
-rw-r--r-- 1 root root     4782 Jun 10 10:53 rpmfusion-nonfree-updates-testing.solv
-rw-r--r-- 1 root root  1039500 Jun 10 10:53 @System.solv
drwxr-xr-x 4 root root       55 Jun 10 10:53 updates
-rw-r--r-- 1 root root  4870515 Jun 10 10:53 updates-filenames.solvx
-rw-r--r-- 1 root root  1106996 Jun 10 10:53 updates.solv
drwxr-xr-x 4 root root       55 Jun 10 10:50 updates-testing
-rw-r--r-- 1 root root 11916983 Jun 10 10:50 updates-testing-filenames.solvx
-rw-r--r-- 1 root root  5348036 Jun 10 10:53 updates-testing.solv
-rw-r--r-- 1 root root  2882834 Jun 10 10:50 updates-testing-updateinfo.solvx
-rw-r--r-- 1 root root   680182 Jun 10 10:53 updates-updateinfo.solvx
-bash-4.3$ 

But received something like that:

-bash-4.3$ ls -l
total 71040
drwxr-xr-x 3 root root       21 Jun  6 21:54 adobe-linux-i386
-rw-r--r-- 1 root root      607 Jun  6 21:54 adobe-linux-i386-filenames.solvx
-rw-r--r-- 1 root root     6105 Jun  6 21:55 adobe-linux-i386.solv
-rw-r--r-- 1 root root        2 Jun 10 10:53 expired_repos.json
drwxr-xr-x 4 root root       55 Jun 10 10:50 fedora
-rw-r--r-- 1 root root 30445939 Jun 10 10:51 fedora-filenames.solvx
-rw-r--r-- 1 root root 13632525 Jun 10 10:53 fedora.solv
-rw-r--r-- 1 root root  1039500 Jun 10 10:53 @System.solv
drwxr-xr-x 4 root root       55 Jun 10 10:53 updates
-rw-r--r-- 1 root root  4870515 Jun 10 10:53 updates-filenames.solvx
-rw-r--r-- 1 root root  1106996 Jun 10 10:53 updates.solv
drwxr-xr-x 4 root root       55 Jun 10 10:50 updates-testing
-rw-r--r-- 1 root root 11916983 Jun 10 10:50 updates-testing-filenames.solvx
-rw-r--r-- 1 root root  5348036 Jun 10 10:53 updates-testing.solv
-rw-r--r-- 1 root root  2882834 Jun 10 10:50 updates-testing-updateinfo.solvx
-rw-r--r-- 1 root root   680182 Jun 10 10:53 updates-updateinfo.solvx
-bash-4.3$ 

This is now something that I just created out of my mind. By looking inside that directory with ls -l we saw that it was doing nothing anymore and idles there for over 30 minutes (not one byte has changed since then). We have been seeing this for various times after we switched to Fedora 22 (27.05.2015). This also got us to the point "no textual output, we can't see what's happening". It's like the entire dnf process came to a dead end. Neither top, ps, or other things have shown activity. Only aborting dnf and having it start over again (done a couple of times) made it continue working.

We wonder that the character eating (inside xterm, vte, xfce4-terminal) might be related to dnf becoming idle because most of the things happening within linux is reading and writing of block and character devices.

Some *condition* that causes this (... which shows up during a workday ...) but disappears during reboot or login/logout (from X in my case above)... I can not describe it more precisely because Fedora 22 is quite new and there might be a ton of *missbehaviours* that's being tracked by various people. Packages going hand in hand with other packages etc.

Comment 10 Honza Silhan 2015-07-24 12:20:12 UTC
Ali do you need to wait even if you don't set `--installroot` switch? How long was the installed package list?

Comment 11 Ali Akcaagac 2015-08-24 09:45:25 UTC
The entire installed package list..... hmmmmm?

Actually this was during downloading og the meta-data so I am not sure whether DNF was processing any package list at that time. But the package list was a normal Fedora-22 Workstation install. The package list was about 1800 packages which should be stripped down to about 700 packages.

But yet I am still having "stalls" in the middle of either downloading meta-data or package updates with current DNF. Last time this morning on a 100mbit line. I am not sure but I belive it has something to do with bandwidth throtteling because it downloaded 10 updates or so and stalled at one. While all packages had around 1-2mb/s, this particular one had just around 100kb/s or even less and then hung. Had to ctrl+c and restart DNF a couple of times until it russhed through. No installroot was used this morning.

-bash-4.3$ dnf --version
1.1.0
  Installed: dnf-0:1.1.0-2.fc22.noarch at 2015-08-15 08:01
  Built    : Fedora Project at 2015-08-12 16:58

  Installed: rpm-0:4.12.0.1-12.fc22.i686 at 2015-06-30 15:25
  Built    : Fedora Project at 2015-06-29 10:34
-bash-4.3$

Comment 12 Ali Akcaagac 2016-03-14 15:23:47 UTC
Reporting back. This issue is still happening quite often.

-bash-4.3$ rpm -qa | grep -i "dnf" | sort
dnf-1.1.7-2.fc22.noarch
dnf-conf-1.1.7-2.fc22.noarch
dnf-plugins-core-0.1.17-1.fc22.noarch
dnf-yum-1.1.7-2.fc22.noarch
python-dnf-1.1.7-2.fc22.noarch
python-dnf-plugins-core-0.1.17-1.fc22.noarch
-bash-4.3$

Comment 13 Fedora Admin XMLRPC Client 2016-07-08 09:27:41 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 14 Fedora End Of Life 2016-07-19 14:44:21 UTC
Fedora 22 changed to end-of-life (EOL) status on 2016-07-19. Fedora 22 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.