Description of problem: Running journalctl takes very long time, over 25 minutes. Version-Release number of selected component (if applicable): systemd-208-9.fc20.x86_64 How reproducible: Always Steps to Reproduce: 1. time journalctl | grep xyz 2. 3. Actual results: real 25m31.478s user 11m2.966s sys 2m36.218s Expected results: Way faster execution. Additional info: # journalctl --disk-usage Journals take up 3.6G on disk. # journalctl --until=-1m | head -1 -- Logs begin at Sun 2013-07-28 18:25:57 CEST, end at Thu 2014-01-02 01:20:02 CE The same question on the text files in /var/log takes: # time grep -r --exclude-dir=journal xyz /var/log ... real 1m6.362s user 0m2.253s sys 0m1.201s # du -sh --exclude=journal /var/log 620M /var/log System info: # cat /proc/cpuinfo | grep "model name" model name : AMD Turion(tm) II Neo N54L Dual-Core Processor model name : AMD Turion(tm) II Neo N54L Dual-Core Processor # free -h total used free shared buffers cached Mem: 1.8G 1.7G 127M 1.1M 398M 826M -/+ buffers/cache: 523M 1.3G Swap: 4.3G 133M 4.1G The system is a home web and mail server running at runlevel 3.
It should be added that this also effects systemctl: # time systemctl status postfix.service postfix.service - Postfix Mail Transport Agent Loaded: loaded (/usr/lib/systemd/system/postfix.service; enabled) Active: active (running) since Sun 2014-01-05 18:16:48 CET; 2 weeks 6 days ago Main PID: 6548 (master) CGroup: /system.slice/postfix.service ├─ 6548 /usr/libexec/postfix/master -w ├─12919 qmgr -l -t fifo -u ├─12922 tlsmgr -l -t unix -u ├─12937 pickup -l -t fifo -u └─14566 anvil -l -t unix -u [ten log lines showed] Hint: Some lines were ellipsized, use -l to show in full. real 0m42.251s user 0m0.026s sys 0m0.361s
58 days later, and the problem still persist: # time journalctl | grep xyz ... real 20m6.684s user 12m18.541s sys 2m44.568s # # time grep -r --exclude-dir=journal xyz /var/log ... real 1m47.507s user 0m2.836s sys 0m1.631s # Things was a tad faster at my two workstation computers at home and at work (both Dell T1650 with E3-1270 Xeon and 16GB of RAM) Home: # time journalctl | grep xyz ... real 0m46.057s user 0m18.918s sys 0m3.308s # time grep -r --exclude-dir=journal xyz /var/log ... real 0m4.424s user 0m0.101s sys 0m0.038s # Work: # time journalctl | grep xyz ... real 0m40.357s user 0m11.891s sys 0m1.627s # time grep -r --exclude-dir=journal xyz /var/log real 0m6.646s user 0m0.125s sys 0m0.040s # But they have smaller journals, which might be the answer, 772.6M at home, and 553.4M at work, compared to 4.0G for my server that is the really sluggish one. Note that 'systemctl status <whatever>' is equally sluggish, as it reads the journal, and prints some log-lines at the end of the status message.
158 days later, and still no action on this issue... This time it was a tad faster, but still waaaay too long time: # time journalctl | grep xyz ... real 12m8.549s user 8m55.324s sys 2m16.010s # journalctl --disk-usage Journals take up 3.9G on disk. # journalctl --until=-1m | head -1 -- Logs begin at Fri 2014-03-14 23:07:16 CET, end at Sat 2014-06-07 23:05:48 CEST. -- # time grep -r --exclude-dir=journal xyz /var/log ... real 1m52.778s user 0m4.206s sys 0m2.491s # du -sh --exclude=journal /var/log 1.1G /var/log # rpm -qa systemd* | sort systemd-208-16.fc20.x86_64 systemd-libs-208-16.fc20.x86_64 systemd-python-208-16.fc20.x86_64 systemd-ui-3-1.fc20.x86_64
200 days later, and still no action on this issue... The problem persist. # time journalctl | grep xyz ... real 31m5.530s user 14m11.046s sys 18m21.174s # journalctl --disk-usage Journals take up 4.0G on disk. # journalctl --until=-1m | head -1 -- Logs begin at Thu 2014-05-01 16:29:17 CEST, end at Sat 2014-07-19 13:31:01 CEST. -- # time grep -r --exclude-dir=journal xyz /var/log ... real 2m8.209s user 0m4.898s sys 0m2.757s # du -sh --exclude=journal /var/log 1.3G /var/log # rpm -qa systemd* | sort systemd-208-19.fc20.x86_64 systemd-libs-208-19.fc20.x86_64 systemd-python-208-19.fc20.x86_64 systemd-ui-3-1.fc20.x86_64
252 days later, and still no response whatsoever from the maintainers... The journal is still sluggish. Se below: # time journalctl | grep xyz ... real 29m36.162s user 14m14.193s sys 17m38.224s # journalctl --disk-usage Journals take up 4.0G on disk. # journalctl --until=-1m | head -1 -- Logs begin at Thu 2014-06-05 20:34:43 CEST, end at Tue 2014-09-09 21:33:17 CEST. -- Compare this with the time it takes to do the same with syslog text files covering the same period. # time grep -r --exclude-dir=journal xyz /var/log ... real 2m9.329s user 0m5.054s sys 0m2.879s # du -sh --exclude=journal /var/log 1.4G /var/log # rpm -qa systemd* | sort systemd-208-21.fc20.x86_64 systemd-libs-208-21.fc20.x86_64 systemd-python-208-21.fc20.x86_64 systemd-ui-3-1.fc20.x86_64 A simple 'systemctl status' also takes quite a long time to complete: # time systemctl status postfix.service postfix.service - Postfix Mail Transport Agent Loaded: loaded (/usr/lib/systemd/system/postfix.service; enabled) Active: active (running) since Sun 2014-05-04 20:29:11 CEST; 4 months 6 days ago Main PID: 1350 (master) CGroup: /system.slice/postfix.service ├─ 1350 /usr/libexec/postfix/master -w ├─ 3306 qmgr -l -t fifo -u ├─ 3346 tlsmgr -l -t unix -u ├─27379 pickup -l -t fifo -u └─28486 anvil -l -t unix -u ... some log lines ... real 0m14.598s user 0m0.039s sys 0m0.244s (it is the log lines that take time, making the 'systemctl status' command really annoying to use)
What filesystem is used for /var/log/journal?
/var/log/journal resides on a / raid1 partition (consisting of three drives, initially two) using ext4. The same partition as the text based logs refereed to earlier.
(In reply to Lars E. Pettersson from comment #7) > /var/log/journal resides on a / raid1 partition (consisting of three drives, > initially two) using ext4. I have to correct myself. That should be ext3, not ext4.
334 days later, and still no action on this bug... The same tests as before: # time journalctl | grep xyz ... real 51m57.719s user 18m13.949s sys 36m12.876s # journalctl --disk-usage Journals take up 3.9G on disk. # journalctl --until=-1m | head -1 -- Logs begin at Sat 2014-06-07 15:52:06 CEST, end at Sun 2014-11-30 17:16:02 CET. -- # time grep -r --exclude-dir=journal xyz /var/log ... real 1m50.120s user 0m4.717s sys 0m2.714s # du -sh --exclude=journal /var/log 1.3G /var/log # rpm -qa systemd* | sort systemd-208-28.fc20.x86_64 systemd-libs-208-28.fc20.x86_64 systemd-python-208-28.fc20.x86_64 systemd-ui-3-1.fc20.x86_64 # time systemctl status postfix.service postfix.service - Postfix Mail Transport Agent Loaded: loaded (/usr/lib/systemd/system/postfix.service; disabled) Active: inactive (dead) real 0m8.445s user 0m0.016s sys 0m0.139s
(In reply to Lars E. Pettersson from comment #9) > 334 days later, and still no action on this bug... Well, not entirely true, as one of the systemd maintainers asked what filesystem you were using. Apparently there was some big issues with btrfs and journal logs, so he was trying to rule that out (which we have since you are using ext3). Whats the output of 'journalctl --verify' ? Perhaps you have a lot of corrupt logs and it's taking a long time to construct the index. Does doing a 'journalctl --update-catalog' help any? Also, if you have space, can you try copying all the journal files into another directory and operating on them with journalctl --root? This would perhaps show if there was some kind of fragmentation or fs issue that was fixed or helped by copying them.
(In reply to Kevin Fenzi from comment #10) > (In reply to Lars E. Pettersson from comment #9) > > 334 days later, and still no action on this bug... > > Well, not entirely true, as one of the systemd maintainers asked what > filesystem you were using. Well, asking what file system I use and then silently ignoring my answer is not much of action, at least not in my book. I did not even know why the question was asked, seemed a bit off-topic to me... So, thanks for providing information on why the question of file system was asked. OK, ran 'journalctl --verify' and it found 103 OK files, and 3 that failed with the faults "Invalid tail monotonic timestamp", "Invalid tail monotonic timestamp", and "Data number mismatch". Compared to other systems, nothing particular, i.e. the journal seem to be OK and healthy. Tested making a copy, as you asked me to do, and ran the time test on that. # time journalctl --root /root/journal/journal | grep xyz ... real 52m22.032s user 18m2.488s sys 36m55.334s I.e. no improvement at all... Tried the switch '--update-catalog' with the following result. # journalctl --update-catalog # time journalctl | grep xyz ... real 52m36.395s user 17m55.600s sys 37m13.684s Well, that did not improve things either. Any other thing I could test?
Nothing else is leaping to my mind. Currently journal compresses with xz which saves a lot of space, but is very slow... From irc: Dec 01 13:25:00 <zbyszek> nirik: not specifically on this bug, but yes, we're working on improving journal speed. Dec 01 13:25:18 <zbyszek> I'm currently working on making lz4 compression the default for journal. Dec 01 13:25:56 <zbyszek> It has been optional for a few months, but the on-disk format was incompatible with the lz4 binary, so I didn't want to make it the default until that was fixed. Dec 01 13:26:32 <zbyszek> Lz4 upstream has added a "frame" api which provides compatibility, and we'll be switching over to it and making lz4 the default instead of xz. Dec 01 13:27:06 <zbyszek> This should provide a noticable speedup, since compression/decompression of large blocks was very costly. Dec 01 13:27:29 <zbyszek> XZ speed was also improved a lot through better setings in the meanwhile. Dec 01 13:28:00 <zbyszek> There have been other fixes too, but nothing which would allow us to say that "this bug is fixed".
I pushed a couple of commits upstream to optimize journal iteration: http://cgit.freedesktop.org/systemd/systemd/commit/?id=b7c88ab8cc7d55a43450bf3dea750f95f2e910d6 http://cgit.freedesktop.org/systemd/systemd/commit/?id=f8b5a3b75fb55f0acb85c21424b3893c822742e9 I have a couple of more ideas for improvement, but the implementation's not working yet.
Here's another one: http://cgit.freedesktop.org/systemd/systemd/commit/?id=69adae5168da231c6cf319f708860954701b25ed
(In reply to Michal Schmidt from comment #13) > I pushed a couple of commits upstream to optimize journal iteration: > > http://cgit.freedesktop.org/systemd/systemd/commit/ > ?id=b7c88ab8cc7d55a43450bf3dea750f95f2e910d6 > > http://cgit.freedesktop.org/systemd/systemd/commit/ > ?id=f8b5a3b75fb55f0acb85c21424b3893c822742e9 These are now superseded by the patchset around: http://cgit.freedesktop.org/systemd/systemd/commit/?id=7943f42275025e1b6642b580b19b24dfab8dee61
This message is a reminder that Fedora 20 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 20. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '20'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 20 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged change the 'version' to a later Fedora version prior this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
journalctl performs better in F22.